Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add monadic traversal with accumulator to Data.Traversable #65

Closed
lykahb opened this issue May 17, 2022 · 37 comments
Closed

Add monadic traversal with accumulator to Data.Traversable #65

lykahb opened this issue May 17, 2022 · 37 comments
Labels
approved Approved by CLC vote base-4.18 Implemented in base-4.18 (GHC 9.6)

Comments

@lykahb
Copy link

lykahb commented May 17, 2022

I suggest adding new utility functions to Data.Traversable that allow monadic effects when traversing the data with accumulator:

mapAccumM :: (Monad m, Traversable t) => (a -> b -> m (a, c)) -> a -> t b -> m (a, t c)
forAccumM :: (Monad m, Traversable t) => a -> t b -> (a -> b -> m (a, c)) -> m (a, t c)

Cabal has an implementation of it at Distribution.Utils.MapAccum:

mapAccumM f s t = runStateM (traverse (\x -> StateM (\s' -> f s' x)) t) s

We can reuse this implementation.

Motivation

My use case is processing a list of entries with the cache of auxiliary data that needs expensive IO operations to compute. Cabal also uses mapAccumM for caching and discards the accumulator.

Alternative

It gets easier to use larger functions when a function is the last argument. A few times Cabal flips the arguments of mapAccumM or defines the processing function outside of the mapAccumM call. So, this may be more ergonomic:

forAccumM :: (Monad m, Traversable t) => a -> t b -> (a -> b -> m (a, c)) -> m (a, t c)

Impact

A search of "mapAccumM" on github yields nearly 200 results. After this change, a name clash would break many of those modules. However, this also shows that there is a need for the function.

@treeowl
Copy link

treeowl commented May 17, 2022

If you change the argument order a bit, you'll see that mapAccumM is really just mapM in the StateM monad. I think it should use mapM rather than traverse because there is an (admittedly not huge) chance that the Traversable in question has a specially optimized mapM for "strict" monads. I also think the Functor and Applicative instances for StateM should use monadic implementations to avoid accumulating thunks to inspect pairs. All together:

import Control.Applicative
import Data.Traversable
import Control.Monad
import Data.Coerce

myMapAccumM
  :: (Monad m, Traversable t)
  => (a -> s -> m (s, b))
  -> t a -> s -> m (s, t b)
myMapAccumM f = runStateM #. mapM (StateM #. f)

newtype StateM s m a = StateM
  { runStateM :: s -> m (s, a) }

instance Monad m => Functor (StateM s m) where
  fmap = liftM

instance Monad m => Applicative (StateM s m) where
  pure a = StateM $ \s -> pure (s, a)
  liftA2 = liftM2
  (<*>) = ap
  m *> n = StateM $ \s -> runStateM m s >>= \(s', _) -> runStateM n s'

instance Monad m => Monad (StateM s m) where
  m >>= f = StateM $ \s -> runStateM m s >>= \(s', a) -> runStateM (f a) s'
  (>>) = (*>)

-- Stolen from profunctors
(#.) :: Coercible b c => p b c -> (a -> b) -> a -> c
(#.) _ = coerce
{-# INLINE (#.) #-}

Of course, you can define your mapAccumM in terms of myMapAccumM by just flipping arguments around. The argument order is probably mostly a matter of taste, but there might be some minor performance implications in some cases.

@andreasabel
Copy link
Member

I think mapAccumL is already a sort of monster that can be avoided by using mapM with State.
So, why not just using StateT on your monad to get the accumulator?

Of course, your monad might already implementing MonadState. The lesson we learned in Agda development is to not make our main monad TCM instance of MonadState, MonadExcept, MonadReader etc. This way, we can stack additional effects like StateT on top of TCM whenever we need them temporarily.
https://github.com/agda/agda/blob/78986123103190e62ba175f85957c7432615285d/src/full/Agda/TypeChecking/Monad/Base.hs#L4207-L4212

@tomjaguarpaw
Copy link
Member

Of course, your monad might already implementing MonadState

I don't think that matters, does it? Unless I'm missing something, naked MonadState operations will be handled by the StateT that's providing the "accumulating" functionality, and lift of MonadState operations will be delegated to the wrapped monad.

@andreasabel
Copy link
Member

Indeed, MonadState does not lift through StateT, but always refers to the outermost StateT (https://hackage.haskell.org/package/mtl-2.3/docs/Control-Monad-State-Class.html#t:MonadState):

type My = State Integer

f :: Char -> StateT String My Int
f c = do
  modify (c:)
  lift $ modify succ
  return $ ord c

test :: (([Int], String), Integer)
test = mapM f "Hello, world!" `runStateT` "" `runState` 0

@Bodigrim
Copy link
Collaborator

@lykahb could you please prepare a GHC merge request, so that we have a specific design to vote on? Does it fit into Data.Traversable better than into Control.Monad?

@lykahb
Copy link
Author

lykahb commented May 30, 2022

@Bodigrim Do you think it is better to split the voting into two parts? At first, vote whether we need mapAccumM/forAccumM, and then vote for implementation or discuss it at the merge request?

In my opinion, Data.Traversable is the most fitting module. Putting the functions there would follow the pattern for the functions mapM and forM, which also have both the Traversable and Monad constraints. They are defined in Data.Traversable and re-exported by the module Control.Monad. I do not have a strong opinion if they should be re-exported.

@Bodigrim
Copy link
Collaborator

Yes, we can have separate votes, but please put up a draft MR first, so that we know that there exists at least one working implementation of the proposal.

@lykahb
Copy link
Author

lykahb commented May 31, 2022

Submitted a merge request https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8384.

@treeowl
Copy link

treeowl commented May 31, 2022

Your decision to use a lazy state transformer needs explanation/justification.

@treeowl
Copy link

treeowl commented May 31, 2022

To be more explicit, I think this should probably use a strict state transformer, like Control.Monad.Trans.State.Strict. The lazy one is only good for rather special purposes, and I think is quite likely to lead to memory leaks in this context.

@lykahb
Copy link
Author

lykahb commented May 31, 2022

I believe that the lazy versions of the functions are the default. And there is a convention that the lazy functions have a plain name, and the strict ones have a ' suffix. The related module Data.Foldable has foldMap/foldMap', foldr/foldr'.

Do you suggest adding the strict versions or making the transformer strict by default?

As for the details of the implementation I think that it is better to re-use an existing implementation of the state monad, rather than create a new one. So, the definitions are copied from transformers library.

@treeowl
Copy link

treeowl commented May 31, 2022

The strict state monad transformer doesn't force the state; it only forces the pairs. It should be thought of as the "default" state monad transformer. The lazy version is a weird beast. It doesn't strictly obey the monad laws, and it tends to break intuition about what gets calculated when. As a general rule, it should only be used when its special behavior is specifically desired. It would be valuable to consider actual applications for mapAccumM. Which ones would benefit from a lazy state transformer? How counterintuitive will that laziness be in contexts where it's not needed?

@lykahb
Copy link
Author

lykahb commented May 31, 2022

This is a convincing argument. The lazy variant seems to be more general, and it is the default in the transformers library. But for the use cases I've seen the laziness wouldn't be useful. Updated the MR to use the strict state transformer.

@tomjaguarpaw
Copy link
Member

tomjaguarpaw commented Jun 1, 2022

The lazy variant seems to be more general

I believe that it's the other way round: the strict variant is more general, in the sense that you can use the strict variant to implement a variant with the behaviour and space asymptotics of the lazy version, but not vice versa.

EDIT: Having thought about it for a bit I suspect that neither can be used to implement the other.

@konsumlamm
Copy link
Contributor

Since Data.Traversable has mapAccumL/mapAccumR, would it make sense to also have left/right variants here?

@lykahb
Copy link
Author

lykahb commented Jun 3, 2022

It seems to me that the functions mapAccumM/forAccumM are similar to mapM/forM/sequence/sequenceA and other traversals that have an Applicative or Monad constraint. They go from left to right. The right-to-left functions like foldr can be powerful tools when there is laziness. For the monadic traversals where the evaluation is done for every element, the right variant wouldn't have more expressive power. Edit: evaluation is not necessarily done for every element if the effects short-circuit.

If someone needs to traverse a structure right-to-left, they reverse the structure. I'd be curious to see the use cases for right-to-left monadic traversal in the wild. I couldn't find examples of it with search on hoogle and github. If it is done manually with recursion, it would be hard to find, though.

@lykahb
Copy link
Author

lykahb commented Jun 22, 2022

I updated the implementation to address the feedback three weeks ago. The Merge Request introduces these two functions:

mapAccumM
  :: (Monad m, Traversable t)
  => (s -> a -> m (s, b)) -> s -> t a -> m (s, t b)

forAccumM
  :: (Monad m, Traversable t)
  => s -> t a -> (s -> a -> m (s, b)) -> m (s, t b)

Without feedback I do not have work to do for this proposal. Should this be put to vote?

@Bodigrim
Copy link
Collaborator

With regards to prior art and impact, Hackage Search reveals that mapAccumM was implemented (with small variations) in 19 packages. There is however no prior art at all for forAccumM.

@lykahb sorry for delay. I posted this proposal to Reddit, hopefully we can source a bit more feedback before putting this to vote.

@why-not-try-calmer
Copy link

With regards to prior art and impact, Hackage Search reveals that mapAccumM was implemented (with small variations) in 19 packages. There is however no prior art at all for forAccumM.

@lykahb sorry for delay. I posted this proposal to Reddit, hopefully we can source a bit more feedback before putting this to vote.

Please don't forget about https://discourse.haskell.org/. It's a great fit for such decisions.

@benjamin-hodgson
Copy link

A minor suggestion: It took me a moment to think about why the constraint needed to be Monad and not Applicative. Just wanted to suggest adding a note to the documentation explaining why that is. (Although it may be the sort of thing where the skills required to understand the docs are the same as the skills required to figure it out for yourself, idk.)

@Bodigrim
Copy link
Collaborator

Bodigrim commented Jun 25, 2022

Please don't forget about https://discourse.haskell.org/. It's a great fit for such decisions.

Feel free to post a link there, but please encourage to discuss the proposal here, not on Discourse.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Jul 6, 2022

@lykahb happy to trigger a vote?

@lykahb
Copy link
Author

lykahb commented Jul 6, 2022

@Bodigrim I updated the proposal to include both mapAccumM and forAccumM. This matches the implementation in MR. Happy to trigger a vote.

@treeowl
Copy link

treeowl commented Jul 7, 2022

As I commented on the MR, the Functor instance for StateT should have a Monad constraint. You can then write fmap = liftM (or inline that; whatever). Currently it introduces weird laziness.

@Bodigrim
Copy link
Collaborator

I agree that instance Functor StateT should be expressed via Monad, but it's unused anyway, so this is a tangential question. Let's get to the crux first.


Dear CLC members, the proposal is to expand Data.Traversable with

mapAccumM
  :: forall m t s a b. (Monad m, Traversable t)
  => (s -> a -> m (s, b))
  -> s -> t a -> m (s, t b)
mapAccumM f s t = coerce (mapM @t @(StateT s m) @a @b) (StateT #. flip f) t s

and

forAccumM
  :: (Monad m, Traversable t)
  => s -> t a -> (s -> a -> m (s, b)) -> m (s, t b)
forAccumM s t f = mapAccumM f s t

The full implementation is available at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8384/diffs

Please vote on these two additions separately. If you disagree with the proposal, please indicate whether some changes may sway your opinion (e. g., move to Control.Monad or somewhere else, rename, change constraints).

@tomjaguarpaw @chessai @mixphix @emilypi @cgibbard


+1 for mapAccumM, -1 for forAccumM for the reasons explained in #65 (comment): there are plenty or prior art for mapAccumM and none for forAccumM.

@cgibbard
Copy link
Contributor

+1 for both. I think given the rest of the contents of the module, lacking a for-variant would seem out of place.

@treeowl
Copy link

treeowl commented Jul 12, 2022

I agree that instance Functor StateT should be expressed via Monad, but it's unused anyway, so this is a tangential question. Let's get to the crux first.

I agree that it's tangential, but your claim that it's not used is unjustified. mapM for the relevant container may use it.

@mixphix
Copy link
Collaborator

mixphix commented Jul 12, 2022

+1 for both!

@lykahb
Copy link
Author

lykahb commented Jul 13, 2022

Updated the MR with fmap = liftM.

@tomjaguarpaw
Copy link
Member

+1 for both


The prior art for mapAccumM is convincing (and there is additional prior art for the same function under the name mapAccumLM) and I don't think we should add the map version without the for version.

@tomjaguarpaw
Copy link
Member

Even more tangential ...

but your claim that it's not used is unjustified. mapM for the relevant container may use it.

How? StateT and its functions are not exposed in the public API.

@treeowl
Copy link

treeowl commented Jul 14, 2022

@tomjaguarpaw traverse may (lawfully) use fmap, pure, <*>, and/or liftA2. mapM may additionally (lawfully) use >>=.

@tomjaguarpaw
Copy link
Member

Ah, interesting. Of course, the point of the Monad interface is so that it can be used without direct access to the concrete operations and the underlying type. I think it's the fact that there are two interacting typeclasses (Monad and Traversable) that makes this unintuitive for me.

@Bodigrim
Copy link
Collaborator

@chessai @emilypi just a gentle reminder to vote.

@chessai
Copy link
Member

chessai commented Jul 15, 2022

+1 for both

@Bodigrim
Copy link
Collaborator

...which gives as 5 votes in favor of mapAccumM and 4 votes in favor of forAccumM, so both are approved. Thanks all.

@Bodigrim Bodigrim added the approved Approved by CLC vote label Jul 16, 2022
ghc-mirror-bot pushed a commit to ghc/ghc that referenced this issue Jul 19, 2022
ghc-mirror-bot pushed a commit to ghc/ghc that referenced this issue Jul 19, 2022
@chshersh
Copy link
Member

I'm trying to summarise the state of this proposal as part of my volunteering effort to track the progress of all approved CLC proposals.

Field Value
Authors @lykahb
Status merged
base version 4.18.0.0
Merge Request (MR) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8384
Blocked by nothing
CHANGELOG entry present
Migration guide not needed

Please, let me know if you find any mistakes 🙂

@chshersh chshersh added the base-4.18 Implemented in base-4.18 (GHC 9.6) label Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved by CLC vote base-4.18 Implemented in base-4.18 (GHC 9.6)
Projects
None yet
Development

No branches or pull requests