V1.2 proposal #62

Merged

Conversation

lehins
Contributor

@lehins lehins commented May 27, 2020

Context

Following @lehins' performance analysis of Haskell pseudo-random number libraries and the ensuing discussion, @lehins, @idontgetoutmuch and @curiousleo with help from @Shimuuar set out to improve random as both an interface for and implementation of a pseudo-random number generator for Haskell.

Our goals were to fix #25 (filed in 2015) and #51 (filed in 2018), see "Quality" and "Performance" below.

In the process of tackling these two issues, we addressed a number of other issues too (see "Other issues addressed" below) and added a monadic interface to the library so monadic pseudo-random number generators can be used interchangeably with random, see "API changes" below.

This PR is the result of that effort. The changes are considerable. To signal this, we propose to release this as version 1.2 (the previous released version is 1.1, from 2014).

However, the API changes are generally backwards-compatible, see "Compatibility" below.

Quality (#25)

We created an environment for running statistical pseudo-random number generator tests, tested random v1.1 and splitmix using dieharder, TestU01, PractRand and other test suites and recorded the results.

The results clearly show that the split operation in random v1.1 produces pseudo-random number generators which are correlated, corroborating #25. The split operation in splitmix showed no weakness in our tests.

As a result, we replaced the pseudo-random number generator implementation in random by the one provided by splitmix.

Performance (#51)

@lehins' performance analysis has the data for random v1.1. It is slow, and using faster pseudo-random number generators via random v1.1 makes them slow.

By switching to splitmix and improving the API, this PR speeds up pseudo-random number generation with random by one to three orders of magnitude, depending on the number type. See Benchmarks for details.

API changes

StatefulGen

The major API addition in this PR is the definition of a new class StatefulGen:

-- | 'StatefulGen' is an interface to monadic pseudo-random number generators.
class Monad m => StatefulGen g m where
  {-# MINIMAL (uniformWord32|uniformWord64) #-}

  uniformWord32 :: g -> m Word32 -- default implementation in terms of uniformWord64
  uniformWord64 :: g -> m Word64 -- default implementation in terms of uniformWord32
  -- plus methods for other word sizes and for byte strings
  -- all have default implementations so the MINIMAL pragma holds

Conceptually, in StatefulGen g m, g is the type of the generator, and m the underlying monad.
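As a rough illustration of what "g is the generator, m the monad" means in practice, here is a minimal sketch, assuming the `System.Random.Stateful` module as shipped with random 1.2. The function `rollDice` is a hypothetical name used only for this example; it is written once against `StatefulGen` and then run with a pure `StdGen` through the `StateGenM` adapter.

```haskell
import Control.Monad (replicateM)
import System.Random (mkStdGen)
import System.Random.Stateful (StatefulGen, runStateGen_, uniformRM)

-- Hypothetical helper: works for any generator 'g' usable in monad 'm'.
rollDice :: StatefulGen g m => Int -> g -> m [Int]
rollDice n g = replicateM n (uniformRM (1, 6) g)

main :: IO ()
main =
  -- 'runStateGen_' threads the pure 'StdGen' through a 'State' monad
  -- and hands 'rollDice' a 'StateGenM' proxy as the generator argument.
  print (runStateGen_ (mkStdGen 2020) (rollDice 5))
```

The same `rollDice` should also accept a mutable generator such as `MWC.Gen`, given an instance like the one shown next.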

This definition is generic enough to accommodate, for example, the Gen type from mwc-random, which itself abstracts over the underlying primitive monad and state token. This is the full instance declaration (provided here as an example - this instance is not part of random as random does not depend on mwc-random):

instance (s ~ PrimState m, PrimMonad m) => StatefulGen (MWC.Gen s) m where
  uniformWord8 = MWC.uniform
  uniformWord16 = MWC.uniform
  uniformWord32 = MWC.uniform
  uniformWord64 = MWC.uniform
  uniformShortByteString n g = unsafeSTToPrim (genShortByteStringST n (MWC.uniform g))

Four StatefulGen instances ("monadic adapters") are provided for pure generators to enable their use in monadic code. The documentation describes them in detail.

Uniform and UniformRange

The Random typeclass has conceptually been split into Uniform and UniformRange. The Random typeclass is still included for backwards compatibility. Uniform is for types where it is possible to sample from the type's entire domain; UniformRange is for types where one can sample from a specified range.
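To make the split concrete, here is a small sketch, assuming the random 1.2 API (`uniform`, `uniformR`, and the `Uniform` class method `uniformM`). The `Coin` type and the `sample` value are hypothetical and exist only for this example. `Word8` can be sampled over its whole domain, `Integer` only over a given range, and a user-defined enumeration can get a `Uniform` instance by delegating to a range draw.

```haskell
import Data.Word (Word8)
import System.Random (StdGen, mkStdGen, uniform, uniformR)
import System.Random.Stateful (Uniform (..), uniformRM)

-- Hypothetical user type with a finite domain.
data Coin = Heads | Tails
  deriving (Show, Eq, Enum, Bounded)

-- Sample the whole domain by drawing an index in [0, 1].
instance Uniform Coin where
  uniformM g = toEnum <$> uniformRM (0, 1) g

sample :: (Word8, Integer, Coin)
sample = (w, i, c)
  where
    g0 = mkStdGen 1
    (w, g1) = uniform g0 :: (Word8, StdGen)     -- whole domain of Word8
    (i, g2) = uniformR (10 :: Integer, 20) g1   -- Integer only has a range
    (c, _) = uniform g2 :: (Coin, StdGen)       -- uses the instance above

main :: IO ()
main = print sample
```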

Changes left out

There were changes we considered and decided against including in this PR.

Some pseudo-random number generators are splittable, others are not. A good way of communicating this is to have a separate typeclass, Splittable, say, which only splittable generators implement. After long discussions (see this issue and this PR), we decided against adding Splittable: the interface changes would either have been backwards-incompatible or too complex. For now, split stays part of the RandomGen typeclass. The new documentation suggests that split should call error if the generator is not splittable.

Due to floating point rounding, generating a floating point number in a range can yield surprising results. There are techniques to generate floating point numbers in a range with actual guarantees, but they are more complex and likely slower than the naive methods, so we decided to postpone this particular issue.
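One way to see the kind of surprise meant here is the following sketch, which uses only base and is not taken from the library. It assumes IEEE-754 single precision with round-to-nearest-even, and `naiveRange` and `almostOne` are hypothetical names. Scaling a sample that is strictly below 1 into the range 1 to 2 can still land exactly on the upper bound.

```haskell
-- Naive scaling of a unit-interval sample 'u' into the range [lo, hi).
naiveRange :: Float -> Float -> Float -> Float
naiveRange lo hi u = lo + u * (hi - lo)

-- Largest 'Float' strictly below 1, i.e. (2^24 - 1) / 2^24.
almostOne :: Float
almostOne = encodeFloat (2 ^ (24 :: Int) - 1) (-24)

main :: IO ()
main = do
  -- The sample itself is inside [0, 1) ...
  print (almostOne < 1)
  -- ... but 1 + almostOne is a tie between two floats and rounds to 2.
  print (naiveRange 1 2 almostOne == 2)
```

So a caller who expects a half-open range can observe the excluded endpoint, which is the sort of behaviour a more careful method would have to rule out.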

Ranges on the real number line can be inclusive or exclusive in the lower and upper bound. We considered API designs that would allow users to communicate precisely what kind of range they wanted to generate. This is particularly relevant for floating point numbers. However, we found that such an API would make more sense in conjunction with an improved method for generating floating point numbers, so we postponed this too.

Compatibility

We strove to make changes backwards compatible where possible and desirable.

The following changes may break existing packages:

  • import clashes, e.g. with the new functions uniform and uniformR
  • randomIO and randomRIO were extracted from the Random class into separate functions, which means some packages need to adjust how they import them
  • StdGen is no longer an instance of Read
  • requires base >= 4.10 (GHC-8.2)

In addition, genRange and next have been deprecated.

We have built all of Stackage against the code in this PR, and confirmed that no other build breakages occurred.

For more details, see this comment and the "Compatibility" section in the docs.

Other issues addressed

This PR also addresses #26, #44, #53, #55, #58 and #59, see Issues Addressed for details.

@Shimuuar
Contributor

Loss of ability to save/restore state for mutable generators is regrettable, since now each generator will have to create separate implementations, and it makes it much more difficult to implement an API for uniform generator initialization etc.

I think the crux of the problem is that the generator for a pure PRNG is just some normal value of type g, while mutable PRNGs are wrappers over some mutable buffer, so they're generally parametrized by a state token and we get something like Gen s. So the PRNG is represented by Gen :: * -> *, while Gen s is a concrete type instance of the PRNG.

@curiousleo
Contributor

Just to make the difference completely clear, here is the direct comparison between the original proposal (#61) and the version without the state token parameter (this PR).

From #61:

-- | 'MonadRandom' is an interface to monadic pseudo-random number generators.
class Monad m => MonadRandom g s m | g m -> s where
  -- | Represents the state of the pseudo-random number generator for use with
  -- 'thawGen' and 'freezeGen'.
  --
  -- @since 1.2
  type Frozen g = (f :: Type) | f -> g
  {-# MINIMAL freezeGen,thawGen,(uniformWord32|uniformWord64) #-}
  -- | Restores the pseudo-random number generator from its 'Frozen'
  -- representation.
  --
  -- @since 1.2
  thawGen :: Frozen g -> m (g s)
  -- | Saves the state of the pseudo-random number generator to its 'Frozen'
  -- representation.
  --
  -- @since 1.2
  freezeGen :: g s -> m (Frozen g)
  -- | @uniformWord32R upperBound g@ generates a 'Word32' that is uniformly
  -- distributed over the range @[0, upperBound]@.
  --
  -- @since 1.2
  uniformWord32R :: Word32 -> g s -> m Word32
  uniformWord32R = unsignedBitmaskWithRejectionM uniformWord32

This PR:

-- | 'MonadRandom' is an interface to monadic pseudo-random number generators.
class Monad m => MonadRandom g m where
  -- | @uniformWord32R upperBound g@ generates a 'Word32' that is uniformly
  -- distributed over the range @[0, upperBound]@.
  --
  -- @since 1.2
  uniformWord32R :: Word32 -> g -> m Word32
  uniformWord32R = unsignedBitmaskWithRejectionM uniformWord32
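For readers unfamiliar with the default used above, the following is a sketch of the general "bitmask with rejection" technique, not the library's own code. It uses only base; the `Lcg` generator, `nextWord32`, `word32UpTo` and `draws` are hypothetical names introduced for this example. The idea is to mask each raw word down to the smallest all-ones bit pattern that covers the upper bound and to retry whenever the masked value still overshoots.

```haskell
import Data.Bits (complement, countLeadingZeros, shiftR, (.&.), (.|.))
import Data.Word (Word32, Word64)

-- Hypothetical toy generator (a 64-bit LCG); not part of 'random'.
newtype Lcg = Lcg Word64

-- Advance the LCG and return the high 32 bits of the new state.
nextWord32 :: Lcg -> (Word32, Lcg)
nextWord32 (Lcg s) = (fromIntegral (s' `shiftR` 32), Lcg s')
  where
    s' = s * 6364136223846793005 + 1442695040888963407

-- Uniform over [0, upperBound] by masking and rejecting overshoots.
word32UpTo :: Word32 -> Lcg -> (Word32, Lcg)
word32UpTo upperBound = go
  where
    -- Smallest mask of the form 2^k - 1 that is >= upperBound.
    mask = complement 0 `shiftR` countLeadingZeros (upperBound .|. 1)
    go g =
      let (w, g') = nextWord32 g
          x = w .&. mask
       in if x > upperBound then go g' else (x, g')

-- Draw 'n' values in [0, b], threading the generator by hand.
draws :: Int -> Word32 -> Lcg -> [Word32]
draws 0 _ _ = []
draws n b g = let (x, g') = word32UpTo b g in x : draws (n - 1) b g'

main :: IO ()
main = print (draws 10 5 (Lcg 1))
```

Because the mask is less than twice the bound plus one, each attempt is accepted with probability above one half, so the expected number of raw words per result stays below two.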

@haskell haskell locked and limited conversation to collaborators Jun 3, 2020
@haskell haskell deleted a comment from Profpatsch Jun 3, 2020
@haskell haskell deleted a comment from lehins Jun 3, 2020
@cartazio
Contributor

cartazio commented Jun 3, 2020

For those who don’t read the news, nyc is a bit complicated this past month or two and even more so this past week, with curfew. There is no sane planet where expecting me to be responsive and fast is possible when there’s that sort of stuff going on.

I’ve deleted the comments that aren’t around technical discourse (and are oblivious to larger world events that might lead to slow replies).

@cartazio
Contributor

cartazio commented Jun 3, 2020

To be clear: I’m keen to focus on this, but the past few weeks in nyc are not a reasonable environment to facilitate that.

@haskell haskell unlocked this conversation Jun 3, 2020
@cartazio
Contributor

cartazio commented Jun 3, 2020

Pardon this morning, I was a bit grumpy and was dealing with some asthma and packing this morning. Reviewing this squashed patch this evening

@Boarders

Boarders commented Jun 4, 2020

@Shimuuar : is there any way to add this back in the form of a super class with the lost functionality put back in or is the argument simply over whether this is good design in the first place?

@lehins
Contributor Author

lehins commented Jun 4, 2020

@Boarders I was literally just discussing this on a private channel. Were you listening? ;P
See my comment about this on PR that handles community feedback adjustments: idontgetoutmuch#144 (comment)

@lehins lehins changed the title V1.2 squashed proposal V1.2 proposal Jun 4, 2020
@lehins
Contributor Author

lehins commented Jun 4, 2020

Current master has been reset to the version of random-1.1, which was used as base for this proposal. All work that conflicts with current proposal has been backed up in https://github.com/haskell/random/tree/master-backup branch.

This PR now has all of git commits since beginning of proposal, which we will probably squash as one mega commit before merging into master. For now we encourage anyone with interest in the newly added API to review and comment on this PR. Looking forward to some feedback

I also updated the description of this PR to reflect the actual proposal.

@idontgetoutmuch
Member

Great work!

@lehins lehins mentioned this pull request Jun 4, 2020
@ulysses4ever
Contributor

@lehins the text of the first post now reads:

The major API addition in this PR is the definition of a new class MonadRandom:

    ...
    class Monad m => StatefulGen g m where

Note that StatefulGen != MonadRandom, which is a bit confusing.

@lehins
Contributor Author

lehins commented Jun 5, 2020

@ulysses4ever Thanks for catching it, I updated the text. The class was renamed recently to avoid what turned out to be a common confusion about what the intended use for the class should be. If you see MonadRandom in some other place, please ping me, it needs to be renamed to StatefulGen.

@ulysses4ever
Contributor

@lehins sure! Just to be clear, I meant this very page (#62). And it still references MonadRandom in several places in the first post.

@Bodigrim Bodigrim self-requested a review June 5, 2020 20:31
@chessai
Member

chessai commented Jun 5, 2020

@lehins the link to the PR in the original comment is dead, presumably because it has been deleted. also, a hyperlink for StatefulGen is still labelled 'MonadRandom'.

@chessai chessai self-requested a review June 5, 2020 22:29
@chessai
Member

chessai commented Jun 5, 2020

This is very thorough! Going to be digesting this for a while.

First thought: Is GHC >= 8.2 required just for -XTypeFamilyDependencies? Or is there more to it? Ideally in core libraries we would have a wider support window.

.travis.yml Outdated Show resolved Hide resolved
performance is over x1000 times better; the minimum performance
increase for the types listed below is more than x36.

Name | 1.1 Mean | 1.2 Mean
Member


These numbers are simply phenomenal.

@@ -1,3 +1,112 @@
# 1.2

1. Breaking change which mostly maintains backwards compatibility, see
Member


these changes are great!

@Shimuuar
Contributor

I would say that approaches are pretty much equivalent in power. StatefulGen dispatches on function parameter and requires carrying that parameter around. MonadRandom dispatches on monad types and requires monad transformer/custom newtypes/etc.

@chessai
Member

chessai commented Jun 19, 2020

@chessai No worries. I should construct a puzzle in the readme "Try to implement this [...] with both StatefulGen and MonadRandom" and point to it everytime someone asks me this question ;)

I actually think something along these lines would be a good idea. It's a pretty natural question and one that might get brought up in the issue tracker. Having something to point to and then "Close and comment" would be pretty useful.

@chessai chessai closed this Jun 19, 2020
@chessai chessai reopened this Jun 19, 2020
@chessai
Member

chessai commented Jun 19, 2020

Oops, when I wrote "Close and comment", I had a brain malfunction and ended up hitting the "Close and comment" button.

@lehins
Contributor Author

lehins commented Jun 19, 2020

I would say that approaches are pretty much equivalent in power.

Not quite.

  • It is impossible to use MonadRandom m together with the concept of FrozenGen f m

requires monad transformer/custom newtypes/etc.

  • This means that a user needs to use a custom monad transformer, which is always more involved.
  • It is impossible to use two different RNG types in the same monad
  • You are forced to use a transformer and lift . lift . lift .. ugliness to mix with other monads
  • ... probably other things that I have not yet considered

Until someone can show me an example where MonadRandom is at least a little bit more user friendly than StatefulGen it will always be strictly less powerful in my books.
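The "two RNG types in the same monad" point can be sketched as follows, assuming the monadic adapters from random 1.2 (`newIOGenM`, `newAtomicGenM`). Here the two adapter types merely stand in for unrelated generator types such as those from mwc-random and pcg-random, which are not dependencies of this example, and `twoGens` is a hypothetical name. Both generators are used directly in `IO`, with no transformer and no newtype.

```haskell
import Data.Word (Word8)
import System.Random (mkStdGen)
import System.Random.Stateful (newAtomicGenM, newIOGenM, uniformM, uniformRM)

-- Two generators of different types, side by side in plain IO.
twoGens :: IO (Word8, Int)
twoGens = do
  ioGen <- newIOGenM (mkStdGen 1)          -- IORef-backed adapter
  atomicGen <- newAtomicGenM (mkStdGen 2)  -- atomically updated adapter
  a <- uniformM ioGen                      -- instance chosen by 'ioGen'
  b <- uniformRM (1, 6) atomicGen          -- instance chosen by 'atomicGen'
  pure (a, b)

main :: IO ()
main = twoGens >>= print
```

The instance is selected by the generator argument rather than by the monad, which is why both calls type-check in the same `do` block.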

@chessai
Member

chessai commented Jun 19, 2020

I would say that approaches are pretty much equivalent in power.

Not quite.

  • It is impossible to use MonadRandom m together with the concept of FrozenGen f m

requires monad transformer/custom newtypes/etc.

  • This means that a user needs to use a custom monad transformer, which is always more involved.
  • It is impossible to use two different RNG types in the same monad
  • You are forced to use a transformer and lift . lift . lift .. ugliness to mix with other monads
  • ... probably other things that I have not yet considered

Until someone can show me an example where MonadRandom is at least a little bit more user friendly than StatefulGen it will always be strictly less powerful in my books.

This would be another great set of points to include in any explanation.

@lehins
Contributor Author

lehins commented Jun 19, 2020

@Shimuuar Here is a puzzle for you. Use Gen from mwc-random and Gen from pcg-random directly in IO with MonadRandom. You can't, which means it is less powerful. Try to use them in different monads, but without defining newtypes. You can't, which means it is less powerful.

@Shimuuar
Contributor

It is impossible to use MonadRandom m together with the concept of FrozenGen f m

It's very much possible. But it requires structuring library differently and to expose only frozen variant of generator. Mutable generator is implementation detail hidden in some reader monad

class MonadRandom g m | m -> g where
  save    :: m g 
  restore :: g -> m ()
  ...

requires monad transformer/custom newtypes/etc.

  • This means that a user needs to use a custom monad transformer, which is always more involved.

True. I however don't see it as a big problem. In pre-DerivingVia era I would probably agree with you.

  • It is impossible to use two different RNG types in the same monad

And this is example of more power! However StatefulGen is somewhat limited in this respect. It's impossible to use SplitMix and say some LCG without resorting to IO/ST tricks with IORefs.

  • You are forced to use a transformer and lift . lift . lift .. ugliness to mix with other monads

Whole idea of this style of API is to rely on typeclasses to do all the lifting.

@Shimuuar Here is a puzzle for you. Use Gen from mwc-random and Gen from pcg-random directly in IO with MonadRandom. You can't, which means it is less powerful. Try to use them in different monads, but without defining newtypes. You can't, which means it is less powerful.

It's like saying walk over there but don't move your legs. In an API that dispatches on the type of monad one has to define new types. The mirror-image puzzle is: add generation of randomness in some function 3 layers deep in the call graph, but don't pass any extra parameters manually.

@lehins
Contributor Author

lehins commented Jun 19, 2020

@Shimuuar, with @chessai we were discussing this MonadRandom, not this one class MonadRandom g m | m -> g where, which you and I have discussed before here. Latter one is a bit more powerful than the linked one, but still less general than StatefulGen g m, for all the transformer related notes.

It's like saying walk over there but don't move your legs.

You never know, one might be able to walk on hands ;P

@Shimuuar The whole point is that a stack of transformers is not user friendly AT ALL. And even if you cook up a proof that the two classes are equivalent in their power with definition of newtype wrappers and such, from the user's perspective, either class MonadRandom m where or class MonadRandom g m | m -> g where will always be harder and more cumbersome to use than class MonadRandom g m where (i.e. StatefulGen). For all my projects at work and the personal ones as well I use ReaderT IO pattern (i.e. RIO) and it has been a blessing ever since I've started doing that. Both of the suggested classes would be a drag to use with such a pattern. As a proof of concept I was able to integrate StatefulGen into a project with 30K lines of code that is structured with RIO and uses randomness quite a bit. It took me only a few hours to integrate it. Sticking a transformer in that project would make it a significantly harder task. So, they are definitely not equivalent.

@idontgetoutmuch
Member

What about MonadRandom.

@chessai This has been discussed on many occasions. MonadRandom is strictly less powerful than StatefulGen is. Brent can effectively deprecate it or continue using it. It doesn't matter

FYI @chessai @lehins I wrote to @byorgey about this just so he is aware.

@Shimuuar
Contributor

You never know, one might be able to walk on hands ;P

That's my point! It will look ridiculous and won't work well.

For all my projects at work and the personal ones as well I use ReaderT IO pattern

I think this is the focal point of disagreement. MonadRandom is a natural choice for projects structured in mtl-style: (MonadThis m, MonadThat m) => m Whatever. It however doesn't work at all for RIO, since there's no reasonable instance for IO or for that matter the RIO monad. StatefulGen is a natural choice for RIO but is somewhat awkward with mtl-style.

Similarly, integrating MonadRandom into an mtl-style project seems to be a very easy task. Slap MonadRandom into the context here and there and add an instance. Even with deriving machinery it seems to be a rather simple task.

In the end all words have been said and all we can do is agree to disagree.

@lehins
Contributor Author

lehins commented Jun 20, 2020

StatefulGen is natural choice for RIO but is somewhat awkward with mtl-style.

@Shimuuar You are really wrong about this! Behold the proof.

Let's look at a ludicrous example using mtl style programming and mix it with RandT and getRandom from the MonadRandom package:

randomWithRejectionN :: (Random a, RandomGen g, Monad m) => Int -> (a -> m Bool) -> g -> m ([a], g)
randomWithRejectionN n f g = fmap swap $ runWriterT $ flip execRandT g $ evalStateT go 0
  where
    go = do
      i <- get
      when (i < n) $ do
        v <- lift getRandom
        reject <- lift $ lift $ lift $ f v
        unless reject $ do
          lift (lift (tell [v]))
          put (i + 1)
        go

Running it we produce a list of odd numbers in a very convoluted way.

λ> randomWithRejectionN 5 (pure . even) (mkStdGen 217) :: IO ([Int8], StdGen)
([37,3,23,65,45],StdGen {unStdGen = SMGen 18077526032776641972 15251669095119325999})

Question is, what do we need to do in order to use StatefulGen instead of MonadRandom in the above example.

  1. Naturally we need to create an instance for RandT, but what would g be in this case? For pure RNGs, g acts simply as a proxy which allows us to select the appropriate StatefulGen instance, while for true mutable generators it would be the actual state (i.e mutable vector). In other words we need to define it, let's call it RandGenM
data RandGenM g = RandGenM

instance (Monad m, RandomGen g) => StatefulGen (RandGenM g) (RandT g m) where
  uniformWord32R r = applyRand (genWord32R r)
  ...

instance (Monad m, RandomGen g) => RandomGenM (RandGenM g) g (RandT g m) where
  applyRandomGenM = applyRand

applyRand :: Applicative m => (g -> (a, g)) -> RandGenM g -> RandT g m a
applyRand f _ = liftRandT (pure . f)

Note that the whole instance is isomorphic to StateGenM, which is expected because RandT is a wrapper around StateT

  2. Now let's look at what we need to change in our function:
randomWithRejectionN ::
     forall a g m. (Random a, RandomGen g, Monad m) => Int -> (a -> m Bool) -> g -> m ([a], g)
randomWithRejectionN n f g =
  fmap swap $ runWriterT $ flip execRandT g $ evalStateT go 0
  where
    go = do
      i <- get
      when (i < n) $ do
        v <- lift $ randomM (RandGenM :: RandGenM g)
        reject <- lift $ lift $ lift $ f v
        unless reject $ do
          lift (lift (tell [v]))
          put (i + 1)
        go

That's right, the only change that was needed is: getRandom -> randomM (RandGenM :: RandGenM g) and of course invoking this function when compared to the original will give us identical results.

Defining some helper functions as I did for StateGenM (e.g. runStateGen) would let us avoid ScopedTypeVariables, and adding ReaderT into the mix would even hide this proxy, but that is beside the point.

@Shimuuar With this concrete example on the screen, can you tell me how MonadRandom is more friendly for mtl?

And that is what I keep trying to say to you, StatefulGen is just like MonadRandom, but on steroids. It is more general and user friendly. I can use it with RIO, mtl, mwc-random, ... and the most I have to do is define a proxy-like type.

If I wanted to go the other way around and try to define MWC.Gen for MonadRandom I would have to define the whole new monad transformer, which is not only an unnecessary complexity for the user, but also a potential performance overhead.

I really hope with this example we can agree to agree.

PS. This is a way to add ReaderT into the mix in order to hide the proxy type. This time I used StateGenM and as you can see not much has changed from the RandT versions above.

askRandom :: (Random a, RandomGenM g r m, MonadReader g m) => m a
askRandom = ask >>= randomM

randomWithRejectionN ::
     (Random a, RandomGen g, Monad m) => Int -> (a -> m Bool) -> g -> m ([a], g)
randomWithRejectionN n f g =
  fmap swap $ runWriterT $ execStateGenT g $ runReaderT $ evalStateT go 0
  where
    go = do
      i <- get
      when (i < n) $ do
        v <- lift $ askRandom
        reject <- lift $ lift $ lift $ lift $ f v
        unless reject $ do
          lift (lift (tell [v]))
          put (i + 1)
        go

CC @chessai hopefully this demystifies the relation of MonadRandom and StatefulGen a bit. I'll add these examples into readme sometime later after we merge this proposal.

@Bodigrim
Contributor

Well, random did not provide MonadRandom before. The question of doing so is orthogonal to this PR and can potentially be raised even against the existing design. I encourage interested parties to create a new issue and discuss it there. Let's stick to the agenda, folks, this discussion is long enough and multithreaded already.

From my point of view, once idontgetoutmuch#169 lands into this PR (which is really couple of keystrokes away, Levenshtein distance is 8), we are good to merge.

@Shimuuar
Contributor

But example above is strawman of mtl. Its central idea is to rely on type classes to dispatch function. So lift $ lift $ lift mean one's working with bare transformers. In mtl style I would expect something like:

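-- Sketch only: 'MonadRandom g m' and 'randomM' refer to the hypothetical
-- monad-dispatched class discussed above, not to an existing API.
randomWithRejectionN ::
     (Random a, MonadRandom g m) => Int -> (a -> m Bool) -> m [a]
randomWithRejectionN n f = go n
  where
    go 0 = return []
    go k = do
      x <- randomM
      reject <- f x
      if reject then go (k-1) else (x:) <$> go (k-1)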

PRNG state is abstracted away, no needless complication from gratuitous Writer/State monad. I think that StatefulGen approach will have vaguely similar structure.

I still maintain that both approaches have similar expressive power.

P.S. While this exchange is quite interesting it shouldn't affect merge of this PR I think

@lehins
Contributor Author

lehins commented Jun 20, 2020

@Bodigrim, I think you are missing the point here. This question came up already a few times and it has to do with the fact that this PR provides StatefulGen, which is an alternative solution to MonadRandom. I think it is important to discuss this question because both of these classes should not live in the same package at the same time. Consequently it means that this question can't be addressed later thus making it very much on the line and not orthogonal. Many people mistakenly think that MonadRandom provides a sufficient interface for all use cases we are trying to handle.

@Shimuuar looks like you are missing the point as well. But whatever, I am not gonna waste my time anymore on trying to convince you.

Putting all that nonsense behind us we are almost ready to get this PR merged with a release that will follow right after.

Here is a summary of what's left to do:

Submit a bunch of precrafted PRs to a few affected repositories (QuickCheck, MonadRandom, uuid, ....)

@curiousleo
Contributor

@chessai wrote:

One thing which I think is missing (correct me if I'm wrong): There's a lot of very compelling information about how this approach is better than the old one, just based on issues it addresses/benchmarks, but I think it would be great to have an explanation of what was wrong with the old approach, and why. Additionally, if we are losing anything over the old approach, documenting that would be helpful as well. It's hard to see what looks like almost a strict improvement and not ask what we're losing.

@lehins replied:

To satisfy your curiosity now: the reason is that generating a random number is much faster when we know that the number generated is in the range from 0 to 2^n - 1, i.e. Word*. The old approach with next does not have that invariant and forces the library to go through Integer, which is terribly slow.

@curiousleo I don't think this needs to be in the haddocks anywhere, but certainly somewhere as a note in the comments of the source code. Wherever you think would be appropriate.

This is what the Haddock for next says right now:

  -- | Returns an 'Int' that is uniformly distributed over the range returned by
  -- 'genRange' (including both end points), and a new generator. Using 'next'
  -- is inefficient as all operations go via 'Integer'. See
  -- [here](https://alexey.kuleshevi.ch/blog/2019/12/21/random-benchmarks) for
  -- more details. It is thus deprecated.
  next :: g -> (Int, g)

https://github.com/idontgetoutmuch/random/blob/83052c15f0692f41c98ec6c048e541bc47449850/src/System/Random/Internal.hs#L106-L112

It's not mentioning rejection sampling, and the explanation is short, but I think the message that next is to be avoided comes across. I do not know how much of the performance hit compared to the new word-sized API is due to Integer conversion and how much of it is due to rejection sampling. However, in benchmarks where rejection sampling is required with the new API, the new API is still orders of magnitude faster. This makes it likely that the Integer conversion is really responsible. In that sense, I think this Haddock actually captures the problem with next well. I see no reason to add more explanatory text right now.

This patch is mostly backwards compatible. See "Breaking Changes" below
for the full list of backwards incompatible changes.

This patch fixes quality and performance issues, addresses additional
miscellaneous issues, and introduces a monadic API.

Issues addressed
================

Priority issues fixed in this patch:

- Title: "The seeds generated by split are not independent"
  Link:  haskell#25
  Fixed: changed algorithm to SplitMix, which provides a robust 'split'
  operation

- Title: "Very low throughput"
  Link:  haskell#51
  Fixed: see "Performance" below

Additional issues addressed in this patch:

- Title: "Add Random instances for tuples"
  Link:  haskell#26
  Addressed: added 'Uniform' instances for up to 6-tuples

- Title: "Add Random instance for Natural"
  Link:  haskell#44
  Addressed: added 'UniformRange' instance for 'Natural'

- Title: "incorrect distribution of randomR for floating-point numbers"
  Link:  haskell#53
  Addressed: see "Regarding floating-point numbers" below

- Title: "System/Random.hs:43:1: warning: [-Wtabs]"
  Link:  haskell#55
  Fixed: no more tabs

- Title: "Why does random for Float and Double produce exactly 24 or 53 bits?"
  Link:  haskell#58
  Fixed: see "Regarding floating-point numbers" below

- Title: "read :: StdGen fails for strings longer than 6"
  Link:  haskell#59
  Addressed: 'StdGen' is no longer an instance of 'Read'

Regarding floating-point numbers: with this patch, the relevant
instances for 'Float' and 'Double' sample more bits than before but do
not sample every possible representable value. The documentation now
clearly spells out what this means for users.

Quality (issue 25)
==================

The algorithm [1] in version 1.1 of this library fails empirical PRNG
tests when used to generate "split sequences" as proposed in [3].

SplitMix [2] passes the same tests. This patch changes 'StdGen' to use
the SplitMix implementation provided by the splitmix package.

Test batteries used: dieharder, TestU01, PractRand.

[1]: P. L'Ecuyer, "Efficient and portable combined random number
generators". https://doi.org/10.1145/62959.62969

[2]: G. L. Steele, D. Lea, C. H. Flood, "Fast splittable pseudorandom
number generators". https://doi.org/10.1145/2714064.2660195

[3]: H. G. Schaathun, "Evaluation of splittable pseudo-random
generators". https://doi.org/10.1017/S095679681500012X

Performance (issue 51)
======================

The "improvement" column in the following table is a multiplier: the
improvement for 'random' for type 'Float' is 1038, so this operation is
1038 times faster with this patch.

| Name                    | Mean (1.1) | Mean (patch) | Improvement|
| ----------------------- | ---------- | ------------ | ---------- |
| pure/random/Float       |         30 |         0.03 |        1038|
| pure/random/Double      |         52 |         0.03 |        1672|
| pure/random/Integer     |         43 |         0.33 |         131|
| pure/uniform/Word       |         44 |         0.03 |        1491|
| pure/uniform/Int        |         43 |         0.03 |        1512|
| pure/uniform/Char       |         17 |         0.49 |          35|
| pure/uniform/Bool       |         18 |         0.03 |         618|

API changes
===========

StatefulGen
-----------

This patch adds a class 'StatefulGen':

    -- | 'StatefulGen' is an interface to monadic pseudo-random number generators.
    class Monad m => StatefulGen g m where
      uniformWord32 :: g -> m Word32 -- default implementation in terms of uniformWord64
      uniformWord64 :: g -> m Word64 -- default implementation in terms of uniformWord32
      -- plus methods for other word sizes and for byte strings
      -- all have default implementations so the MINIMAL pragma holds

In 'StatefulGen g m', 'g' is the type of the generator and 'm' the underlying
monad.

Four 'StatefulGen' instances ("monadic adapters") are provided for pure
generators to enable their use in monadic code. The documentation
describes them in detail.

FrozenGen
---------

This patch also introduces a class 'FrozenGen':

    -- | 'FrozenGen' is designed for stateful pseudo-random number generators
    -- that can be saved as and restored from an immutable data type.
    class StatefulGen (MutableGen f m) m => FrozenGen f m where
      type MutableGen f m = (g :: Type) | g -> f
      freezeGen :: MutableGen f m -> m f
      thawGen :: f -> m (MutableGen f m)

'f' is the type of the generator's state "at rest" and 'm' the underlying
monad. 'MutableGen' is defined as an injective type family via 'g -> f' so for
any generator 'g', the type 'f' of its at-rest state is well-defined.

Both 'StatefulGen' and 'FrozenGen' are generic enough to accommodate, for
example, the 'Gen' type from the 'mwc-random' package, which itself abstracts
over the underlying primitive monad and state token. The documentation shows
the full instances.
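The save/restore round trip can be sketched like this, assuming the random 1.2 API, in which `IOGen` is the frozen counterpart of the mutable `IOGenM`. The name `replayDemo` is hypothetical. A snapshot taken with 'freezeGen' is thawed again later, and the restored generator replays the same value that the original produced after the snapshot.

```haskell
import Data.Word (Word64)
import System.Random (StdGen, mkStdGen)
import System.Random.Stateful (IOGen (..), freezeGen, thawGen, uniformM)

replayDemo :: IO (Word64, Word64)
replayDemo = do
  gen <- thawGen (IOGen (mkStdGen 7))            -- frozen state -> mutable gen
  _ <- uniformM gen :: IO Word64                 -- advance the generator once
  snapshot <- freezeGen gen :: IO (IOGen StdGen) -- save the current state
  x <- uniformM gen                              -- next value after the snapshot
  gen' <- thawGen snapshot                       -- restore into a fresh gen
  y <- uniformM gen'                             -- replays the same value
  pure (x, y)

main :: IO ()
main = do
  (x, y) <- replayDemo
  print (x == y)
```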

'Uniform' and 'UniformRange'
----------------------------

The 'Random' typeclass has conceptually been split into 'Uniform' and
'UniformRange'. The 'Random' typeclass is still included for backwards
compatibility. 'Uniform' is for types where it is possible to sample
from the type's entire domain; 'UniformRange' is for types where one can
sample from a specified range.

Breaking Changes
================

This patch introduces these breaking changes:

* requires 'base >= 4.8' (GHC-7.10)
* 'StdGen' is no longer an instance of 'Read'
* 'randomIO' and 'randomRIO' were extracted from the 'Random' class into
  separate functions

In addition, there may be import clashes with new functions, e.g. 'uniform' and
'uniformR'.

Deprecations
============

This patch introduces 'genWord64', 'genWord32' and similar methods to
the 'RandomGen' class. The significantly slower method 'next' and its
companion 'genRange' are now deprecated.
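For generator authors, a minimal sketch of an instance written against the word-sized methods rather than 'next' might look as follows. It assumes the 'RandomGen' class as described in this patch (random 1.2), where 'split' is still a class method. The `Lcg` type and `rolls` are hypothetical, and the generator is a toy that is not meant to be of good quality.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)
import System.Random (RandomGen (..), uniformR)

-- Hypothetical toy generator; only here to show the shape of an instance.
newtype Lcg = Lcg Word64

instance RandomGen Lcg where
  -- Provide a word-sized primitive instead of 'next' and 'genRange'.
  genWord64 (Lcg s) = (s' `xor` (s' `shiftR` 33), Lcg s')
    where
      s' = s * 6364136223846793005 + 1442695040888963407
  -- This toy generator is not splittable, so 'split' reports an error.
  split _ = error "Lcg is not splittable"

-- Draw 'n' dice rolls, threading the generator by hand.
rolls :: Int -> Lcg -> [Int]
rolls 0 _ = []
rolls n g = let (x, g') = uniformR (1, 6) g in x : rolls (n - 1) g'

main :: IO ()
main = print (rolls 10 (Lcg 42))
```

All other 'RandomGen' methods fall back to their defaults in terms of 'genWord64', so range sampling never goes through 'Integer'-based 'next'.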

Co-authored-by: Alexey Kuleshevich <alexey@kuleshevi.ch>
Co-authored-by: idontgetoutmuch <dominic@steinitz.org>
Co-authored-by: Leonhard Markert <curiousleo@users.noreply.github.com>

Successfully merging this pull request may close these issues.

The seeds generated by split are not independent
9 participants