
```
                             Alexey Kuleshevich
                             
                           Boulder Haskell Meetup
                           
                             15th of July 2025
                        
                       Overhaul of the `random` package
                             
                             
                             
```

## History

### Ancient

* At some point in the 90s the `Random` module came into being and was included with GHC for many years, until about a decade ago. The latest known versions of `random` that were wired in with GHC:
  * `random-1.1` briefly in `ghc-8.4.2` and `ghc-8.2.2`
  * before that `<= random-1.0.0.3` in `ghc-7.0.4` and older.

* February 1999 - Original version made it into [Haskell 98
  report](https://www.haskell.org/onlinereport/random.html) and is included in
  [`haskell98`](https://hackage.haskell.org/package/haskell98) package.

* July 2005 - first ticket on ghc tracker
  [#427](https://gitlab.haskell.org/ghc/ghc/-/issues/427) about `Random.StdGen` being
  slow, which hasn't been fixed until now.

* November of 2007 -
  [`random-1.0.0.0`](https://hackage.haskell.org/package/random-1.0.0.0) is available on
  Hackage


### Recent

* December 2019 - I wrote a blog post about how terrible `random`'s performance is: [Random benchmarks](https://alexey.kuleshevi.ch/blog/2019/12/21/random-benchmarks/).

* February 2020 - Dominic Steinitz confirmed my benchmark results from the blogpost and asked if I'd be willing to collaborate on getting those issues with `random` resolved.

* February 2020 - June 2020: Dominic Steinitz, Leonhard Markert and I worked really hard on improving `random` by solving its most serious issues.

* 23rd of June of 2020 - all the hard work paid off and culminated in [`random-1.2.0`](https://hackage.haskell.org/package/random-1.2.0) being released.
  
* 6th of January 2025 - [`random-1.3.0`](https://hackage.haskell.org/package/random-1.3.0) is released with a few new cool features and improvements.

## Original interface

A type class for Pseudo Random Number Generator (PRNG) implementers:

```haskell
class RandomGen g where
  genRange :: g -> (Int, Int)
  next     :: g -> (Int, g)
  split    :: g -> (g, g)
```

A type class for values that can be generated using any pure PRNG that provides a `RandomGen`
instance:

```haskell
class Random a where
  randomR :: RandomGen g => (a, a) -> g -> (a, g)
  random  :: RandomGen g => g -> (a, g)

  randomRs :: RandomGen g => (a, a) -> g -> [a]
  randoms  :: RandomGen g => g -> [a]

  randomRIO :: (a, a) -> IO a
  randomIO  :: IO a
```

### Using original interface
#### Sample data type

In [1]:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE NamedFieldPuns #-}
import Text.Printf

newtype AreaCode = AreaCode { unAreaCode :: Int }
  deriving (Eq, Show, Num)

-- | The North American Numbering Plan phone, eg. +1-555-123-4567
data Phone = Phone { phoneAreaCode :: AreaCode
                   , phoneLocalNumber :: Int
                   }

instance Show Phone where
  show Phone {phoneAreaCode, phoneLocalNumber} =
    let areaCode = unAreaCode phoneAreaCode
        (phonePrefix, phoneLineNumber) = phoneLocalNumber `quotRem` 10000
     in printf "+1-%03d-%03d-%04d" areaCode phonePrefix phoneLineNumber

#### Generating random data

In [2]:
import System.Random

randomPhone :: RandomGen g => [AreaCode] -> g -> (Phone, g)
randomPhone areaCodes g =
  let (i, g') = randomR (0, length areaCodes - 1) g
      (phoneLocalNumber, g'') = randomR (0, 9999999) g'
   in (Phone {phoneAreaCode = areaCodes !! i, phoneLocalNumber}, g'')

In [3]:
gen = mkStdGen 2025
(tollFreePhone, gen') = randomPhone [800,833,844,855,866,877,888] gen
tollFreePhone

+1-844-322-2926

In [4]:
(coloradoPhone, gen'') = randomPhone [303,720] gen'
coloradoPhone

+1-303-721-3261

#### Splitting pure generator

In [5]:
:set -Wno-deprecations
let (gen1, gen2) = split gen''
-- Run in parallel in separate threads or even computers
randomPhone [800,833,844,855,866,877,888] gen1
randomPhone [800,833,844,855,866,877,888] gen2

(+1-800-475-6817,StdGen {unStdGen = SMGen 2829820480049211708 12496735338297977245})

(+1-877-206-7584,StdGen {unStdGen = SMGen 4870536165017295068 492325715607432473})

## Issues that we wanted to solve

* Improve quality of the default pseudo-random number generator (PRNG)

* Fix abysmally slow performance

* Design replacement for an inaccurate `Random` type class

* Design an interface that can also work with stateful generators

## Problem
### StdGen

Original PRNG implementation:

* Bad quality of randomness

* Generated only ~31bits of data at a time (`[1, 2147483562]` range)

* Bad performance characteristics

* Splitting produced sequences that were not independent


### StdGen (Solution)

Switched to [`splitmix`](https://hackage.haskell.org/package/splitmix)
package for `StdGen` implementation:

* Very good quality of randomness (passes most tests, eg. Dieharder)

* Fastest PRNG implementation in Haskell

* Generates 64bits of random data in one iteration

* Splitting produces independent sequences

## Problem
### Poor performance

* Previous implementation of `StdGen` was slow.

* Generation of **all** types went through `Integer`.

* `genRange` was a historical mistake: it existed only to accommodate PRNGs that produce values
  in unusual ranges, like the original `StdGen`.

### Performance (Solutions)

* Switched `StdGen` to `splitmix`, as was already mentioned earlier.

* Used [bitmask with rejection technique](https://www.pcg-random.org/posts/bounded-rands.html)
  for generating values in custom ranges.

* Major redesign of `RandomGen` class:

  * Deprecated `genRange` and `next` in favor of `genWord64` and/or `genWord32`

  * Made generation of other bit widths customizable: `genWord[8|16|32|64]` and `genWord[32|64]R`.

  * Allowed customization of array generation with `genShortByteString` and later with `unsafeUniformFillMutableByteArray`. More on this later.
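
The bitmask with rejection technique listed above can be sketched with nothing but `base`. This is an illustration of the idea, not the code used inside `random`: `Toy`, `nextWord64`, `boundedWord64` and `draws` are hypothetical names, and the generator is a bare SplitMix64-style step included only so that the sketch is self-contained.

```haskell
import Data.Bits (complement, countLeadingZeros, shiftR, xor, (.&.), (.|.))
import Data.Word (Word64)

-- | Hypothetical toy generator state: a SplitMix64-style counter.
newtype Toy = Toy Word64

-- | Advance the counter and mix its bits into a 64-bit output.
nextWord64 :: Toy -> (Word64, Toy)
nextWord64 (Toy s) = (mix s', Toy s')
  where
    s' = s + 0x9e3779b97f4a7c15
    mix z0 =
      let z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
          z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
       in z2 `xor` (z2 `shiftR` 31)

-- | Uniform value in the inclusive range [0, m].
boundedWord64 :: Word64 -> Toy -> (Word64, Toy)
boundedWord64 m = go
  where
    -- Smallest all-ones mask that covers m, e.g. m = 5 gives mask = 7.
    mask :: Word64
    mask = complement 0 `shiftR` countLeadingZeros (m .|. 1)
    go g =
      let (w, g') = nextWord64 g
          x = w .&. mask
       in if x <= m
            then (x, g') -- accepted: every value in [0, m] is equally likely
            else go g'   -- rejected: retry with fresh bits, no modulo bias

-- | Draw n bounded values, threading the generator by hand.
draws :: Int -> Word64 -> Toy -> [Word64]
draws n m g0 = take n (go g0)
  where
    go g = let (x, g') = boundedWord64 m g in x : go g'
```

Because the mask is the smallest power of two minus one that covers `m`, each attempt is accepted with probability above one half, so the expected number of retries stays below one.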

### Performance (Solution)

New definition of `RandomGen` class (default implementations are omitted):

```haskell
class RandomGen g where
  {-# MINIMAL split,(genWord32|genWord64|(next,genRange)) #-}
  genWord8 :: g -> (Word8, g)
  genWord16 :: g -> (Word16, g)
  genWord32 :: g -> (Word32, g)
  genWord64 :: g -> (Word64, g)
  genWord32R :: Word32 -> g -> (Word32, g)
  genWord64R :: Word64 -> g -> (Word64, g)
  -- | @since 1.3.0
  unsafeUniformFillMutableByteArray :: 
    MutableByteArray s -> Int -> Int -> g -> ST s g

  -- Deprecated:
  genRange :: g -> (Int, Int)
  next :: g -> (Int, g)
  split :: g -> (g, g)
  -- | @since 1.2.0
  genShortByteString :: Int -> g -> (ShortByteString, g)
```

*Note* - None of these functions need to be used directly; they only affect PRNG implementers. Regular users therefore don't need to worry about these redesigns and deprecations.
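
For PRNG implementers, a minimal instance could look like the sketch below, assuming `random-1.2` or newer. `Lcg` and `rollDie` are hypothetical names, and the linear congruential step is a deliberately poor toy generator that only serves to show which methods have to be supplied.

```haskell
import Data.Word (Word64)
import System.Random (RandomGen (..), uniformR)

-- | Hypothetical toy generator: a 64-bit linear congruential generator.
-- Its statistical quality is poor; it only demonstrates the class.
newtype Lcg = Lcg Word64

instance RandomGen Lcg where
  -- Supplying genWord64 is enough: the other genWord* methods have defaults.
  genWord64 (Lcg s) = (s', Lcg s')
    where
      s' = s * 6364136223846793005 + 1442695040888963407
  -- This toy generator has no principled way to split.
  split _ = error "Lcg: splitting is not supported"

-- | Regular users never call genWord64; they go through uniformR and friends.
rollDie :: Lcg -> (Int, Lcg)
rollDie = uniformR (1, 6)
```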

## Problems
### Incorrect interface

`Random` class is expected to produce uniform distribution, but:

* `Integer` has infinitely many values (bad `random`)
* `Double`/`Float` (bad `random`):
  * try to represent real values, which have an infinite range
  * floating point values have a limited range, eg. roughly `-1.8e308` to `1.8e308` for `Double`
  * representable values are not equidistant from each other
  * special values `+/-Infinity`, `NaN` and `-0`
* Custom types, eg. `Uuid` or `RGB`  (bad `randomR`)

### Incorrect interface (Solution)

The correct approach is to split the `Random` class into two concepts: `Uniform` and `UniformRange`.

```haskell
class Uniform a where
  uniformM :: StatefulGen g m => g -> m a

class UniformRange a where
  uniformRM :: StatefulGen g m => (a, a) -> g -> m a
  -- | @since 1.3.0
  isInRange :: (a, a) -> a -> Bool
```

Taking this approach makes it possible for `Integer`, `Float` and `Double` to get
instances for `UniformRange` class, but not for `Uniform`.

In [6]:
import System.Random.Stateful
import Data.Word

data RGB a = RGB { red   :: a
                 , green :: a
                 , blue  :: a
                 } deriving (Eq, Show)

instance Uniform a => Uniform (RGB a) where
  uniformM g = RGB <$> uniformM g <*> uniformM g <*> uniformM g

In [7]:
let gen = mkStdGen 1234
fst $ uniform gen :: RGB Word8

RGB {red = 103, green = 35, blue = 253}

**Notes**:

* `StatefulGen` vs `RandomGen`: monadic vs pure
* There were suggestions of deprecating `Random`. We thought it would result in too much breakage.

```haskell
uniformM :: (Uniform a, StatefulGen g m) => g -> m a
```
```haskell
uniform :: (Uniform a, RandomGen g) => g -> (a, g)
```

## Problem
### Stateful monadic generators lack interface

* State transformer monad - when pure generator is the actual state in `StateT` from `transformers` or `RandT` from `MonadRandom`

* Mutable variables - when pure generator is stored in a `STRef`, `IORef` or `TVar`

* Generators that depend on a large mutable state that has to be stored in something like a mutable vector. Here are packages that provide such generators in Haskell:
  
  * [`mwc-random`](http://hackage.haskell.org/package/mwc-random)
  * [`pcg-random`](http://hackage.haskell.org/package/pcg-random)
  * [`sfmt`](https://hackage.haskell.org/package/sfmt)
  * [`mersenne-random`](https://hackage.haskell.org/package/mersenne-random)

## State transformer monad

Passing the generator around manually is very inconvenient.

In [8]:
uniformPhone :: RandomGen g => [AreaCode] -> g -> (Phone, g)
uniformPhone areaCodes g =
  let (i, g') = uniformR (0, length areaCodes - 1) g
      (phoneLocalNumber, g'') = uniformR (0, 9999999) g'
   in (Phone {phoneAreaCode = areaCodes !! i, phoneLocalNumber}, g'')

A natural solution to this problem is to use either the `StateT` or the `RandT` monad.

In [9]:
import Control.Monad.State.Strict

uniformPhone :: RandomGen g => [AreaCode] -> g -> (Phone, g)
uniformPhone areaCodes =
  runState $ do
    i <- state $ uniformR (0, length areaCodes - 1)
    phoneLocalNumber <- state $ uniformR (0, 9999999)
    pure Phone {phoneAreaCode = areaCodes !! i, phoneLocalNumber}

## New approach

Using `StateT` vs `StatefulGen` and `UniformRange`:

In [10]:
uniformPhoneMonadState :: 
  (RandomGen g, MonadState g m) => [AreaCode] -> m Phone
uniformPhoneMonadState areaCodes = do
  i <- state $ uniformR (0, length areaCodes - 1)
  phoneLocalNumber <- state $ uniformR (0, 9999999)
  pure Phone {phoneAreaCode = areaCodes !! i, phoneLocalNumber}

In [11]:
uniformPhoneM :: StatefulGen g m => [AreaCode] -> g -> m Phone
uniformPhoneM areaCodes gen = do
  i <- uniformRM (0, length areaCodes - 1) gen
  phoneLocalNumber <- uniformRM (0, 9999999) gen
  pure Phone {phoneAreaCode = areaCodes !! i, phoneLocalNumber}

```haskell
runState :: State g a -> g -> (a, g)
```

In [12]:
gen = mkStdGen 1234
runState (uniformPhoneMonadState [303, 720]) gen
runState (uniformPhoneM [303, 720] StateGenM) gen

(+1-720-623-8461,StdGen {unStdGen = SMGen 2239102407005124922 13478418381427711195})

(+1-720-623-8461,StdGen {unStdGen = SMGen 2239102407005124922 13478418381427711195})

```haskell
runStateGen :: 
  RandomGen g => g -> (StateGenM g -> State g a) -> (a, g)
```

Note that `StateGenM` is merely a proxy type:

```haskell
data StateGenM g = StateGenM
```

In [13]:
runStateGen gen (uniformPhoneM [303, 720])

(+1-720-623-8461,StdGen {unStdGen = SMGen 2239102407005124922 13478418381427711195})

### StateGenM and MTL

Because `StatefulGen` instance for `StateGenM` utilizes `MonadState` underneath, it composes very nicely with other monad transformers:

In [14]:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Reader

uniformPhoneReaderM ::
  (MonadReader [AreaCode] m, StatefulGen g m) => g -> m Phone
uniformPhoneReaderM gen = do
  areaCodes <- ask
  uniformPhoneM areaCodes gen

{- HLINT ignore "Eta reduce" -}
uniformPhoneMTL :: RandomGen g => [AreaCode] -> g -> (Phone, g)
uniformPhoneMTL areaCodes gen = 
  runReader (runStateT (uniformPhoneReaderM StateGenM) gen) areaCodes

uniformPhoneMTL' :: RandomGen g => [AreaCode] -> g -> (Phone, g)
uniformPhoneMTL' areaCodes gen = 
  runReader (runStateGenT gen uniformPhoneReaderM) areaCodes

## New approach - What's the point?

**The claim is**: this new approach works not only with pure `RandomGen`
generators, but also with truly mutable ones!

### Mutable variables
#### STGenM - state thread monad

For example when we are working in `ST` monad:

In [15]:
import Control.Monad.ST
import Data.STRef

uniformPhoneST :: 
  RandomGen g => STRef s [AreaCode] -> STGenM g s -> ST s Phone
uniformPhoneST areaCodesRef stGen = do
  areaCodes <- readSTRef areaCodesRef
  phoneNumber <- uniformPhoneM areaCodes stGen
  modifySTRef' areaCodesRef (filter (/= phoneAreaCode phoneNumber))
  pure phoneNumber

uniformPhone :: RandomGen g => [AreaCode] -> g -> (Phone, g)
uniformPhone areaCodes g = runSTGen g $ \stGen -> do
  areaCodesRef <- newSTRef areaCodes
  uniformPhoneST areaCodesRef stGen

#### AtomicGenM - Concurrency 

Concurrent setup cannot be done with `StateT` or `ST`:

In [16]:
import Control.Concurrent.Async (replicateConcurrently)

uniformPhones :: RandomGen g => [AreaCode] -> g -> IO [Phone]
uniformPhones areaCodes g = do
  atomicGen <- newAtomicGenM g
  n <- uniformRM (1, 5) atomicGen
  replicateConcurrently n (uniformPhoneM areaCodes atomicGen)

In [17]:
uniformPhones [303, 720] (mkStdGen 12)

[+1-303-255-4052,+1-720-302-4107,+1-303-962-5381]

### StatefulGen explained

`StatefulGen` looks much like a monadic version of the `RandomGen` class:

```haskell
class Monad m => StatefulGen g m where
  {-# MINIMAL (uniformWord32|uniformWord64) #-}
  uniformWord32R :: Word32 -> g -> m Word32
  uniformWord64R :: Word64 -> g -> m Word64
  uniformWord8 :: g -> m Word8
  uniformWord16 :: g -> m Word16
  uniformWord32 :: g -> m Word32
  uniformWord64 :: g -> m Word64
  uniformByteArrayM :: Bool -> Int -> g -> m ByteArray
```

### StatefulGen instances 

```haskell
(RandomGen g, MonadState g m) => StatefulGen (StateGenM g) m
(RandomGen g, MonadIO m)      => StatefulGen (IOGenM g) m
(RandomGen g, MonadIO m)      => StatefulGen (AtomicGenM g) m
RandomGen g                   => StatefulGen (STGenM g s) (ST s)
RandomGen g                   => StatefulGen (TGenM g) STM
```

We already saw examples of using `StateGenM g`, `STGenM g s` and `AtomicGenM g`.
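
`IOGenM` has not been shown yet, so here is a minimal sketch of it, assuming `random-1.2` or newer. `rollDice` is a hypothetical helper; the generator lives in an `IORef` and is updated on every draw.

```haskell
import Control.Monad (replicateM)
import System.Random (mkStdGen)
import System.Random.Stateful (newIOGenM, uniformRM)

-- | Roll a six-sided die n times using a generator stored in an IORef.
rollDice :: Int -> IO [Int]
rollDice n = do
  -- Wrap a pure StdGen into a mutable IOGenM
  gen <- newIOGenM (mkStdGen 2025)
  -- Every call to uniformRM reads and updates the same mutable generator
  replicateM n (uniformRM (1, 6) gen)
```

Unlike `AtomicGenM`, `IOGenM` is not safe to share between threads, so it fits single-threaded `IO` code.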

### Mutable vs Immutable

If a mutable generator also has a frozen immutable counterpart, there is an interface for bridging one to another. Stateful generators provided by `random` out of the box:

```haskell
data StateGenM g = StateGenM
newtype StateGen g = StateGen g

newtype STGenM g s = STGenM (STRef s g)
newtype STGen g = STGen g

newtype IOGenM g = IOGenM (IORef g)
newtype IOGen g = IOGen g

newtype AtomicGenM g = AtomicGenM (IORef g)
newtype AtomicGen g = AtomicGen g

newtype TGenM g = TGenM (TVar g)
newtype TGen g = TGen g
```

### FrozenGen and ThawedGen

This is where `FrozenGen` comes in:

```haskell
class StatefulGen (MutableGen f m) m => FrozenGen f m where
  {-# MINIMAL (modifyGen | (freezeGen, overwriteGen)) #-}
  type MutableGen f m = (g :: Type) | g -> f
  freezeGen :: MutableGen f m -> m f
  -- | @since 1.3.0
  modifyGen :: MutableGen f m -> (f -> (a, f)) -> m a
  -- | @since 1.3.0
  overwriteGen :: MutableGen f m -> f -> m ()
```

```haskell
-- | @since 1.3.0
class FrozenGen f m => ThawedGen f m where
  thawGen :: f -> m (MutableGen f m)
```

### FrozenGen and ThawedGen (continued)

```haskell
withMutableGen :: ThawedGen f m => f -> (MutableGen f m -> m a) -> m (a, f)
withMutableGen fg action = do
  g <- thawGen fg
  res <- action g
  fg' <- freezeGen g
  pure (res, fg')
```

In [25]:
import Control.Concurrent.STM

atomically $ 
  withMutableGen (TGen (mkStdGen 1)) (uniformPhoneM [505, 575])

(+1-575-603-9916,TGen {unTGen = StdGen {unStdGen = SMGen 17906158935611293232 10451216379200822465}})

## Generators that depend on a large mutable state

`mwc-random-0.15.0.1` is capable of using this new interface because it provides these two instances:

```haskell
instance (s ~ PrimState m, PrimMonad m) => StatefulGen (MWC.Gen s) m

instance PrimMonad m => FrozenGen MWC.Seed m where
  type MutableGen MWC.Seed m = MWC.Gen (PrimState m)
```

In [19]:
import System.Random.MWC (createSystemSeed)
seed <- createSystemSeed
(phone, seed') <- withMutableGen seed (uniformPhoneM [800,888])
print phone

+1-888-829-6420

## Seed

Ability to convert PRNGs to a `ByteArray` and back:

```haskell
newtype Seed g = Seed ByteArray

class ( KnownNat (SeedSize g)
      , 1 <= SeedSize g, Typeable g
      ) => SeedGen g where
  type SeedSize g :: Nat
  {-# MINIMAL (fromSeed, toSeed) | (fromSeed64, toSeed64) #-}
  fromSeed :: Seed g -> g
  toSeed :: g -> Seed g
  fromSeed64 :: NonEmpty Word64 -> g
  toSeed64 :: g -> NonEmpty Word64
```

### Seed usage example

Initially we create a generator, use it and then store its representation into a file:

In [20]:
import qualified Data.ByteString as BS

(_phone, gen) = uniformPhone [303,720] (mkStdGen 12345678)
BS.writeFile "demo-seed.bin" $ unSeedToByteString $ toSeed gen


Then we can conveniently restore it from file, use it and write its modified version back to the same file:

In [21]:
withSeedFile @(IOGen StdGen) "demo-seed.bin" $ \seed -> 
  withSeedMutableGen seed (uniformShuffleListM [1..10])

[5,2,4,8,3,6,7,10,1,9]

### Random binary data

A naive approach to generate random binary data, a.k.a. `ByteString`, is to generate a
list of `Word8`s and pack it:

In [22]:
randomByteStringNaive :: RandomGen g => Int -> g -> BS.ByteString
randomByteStringNaive n = BS.pack . take n . randoms

This approach is very inefficient for two reasons:

* In order to generate a `Word8` we have to generate 64 bits of random data, which means 56
  perfectly good random bits are discarded for every byte in the `ByteString`.
* The intermediate list, if not fused, will cause unnecessary allocations.

### Random binary data (Solution)

Allocate a chunk of memory of a desired size and write 64bits at a time,
while making sure that the machine's CPU endianness does not affect the outcome.

In [23]:
seed <- createSystemSeed
mwcGen <- thawGen seed
BS.unpack <$> uniformByteStringM 15 mwcGen

[97,145,49,37,25,122,71,20,239,164,80,164,226,52,81]

In [24]:
BS.unpack $ runStateGen_ (mkStdGen 2021) (uniformByteStringM 15)

[78,232,117,189,13,237,63,84,228,82,19,36,191,5,128]
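
The core idea behind the solution, using every bit of each 64-bit word and fixing the byte order so that CPU endianness cannot affect the result, can be sketched with plain lists. This is not how `uniformByteStringM` is implemented (it writes into allocated memory); `word64ToBytesLE` and `bytesFromWords` are hypothetical names, and the little-endian order is an assumption made for this illustration.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64, Word8)

-- | Split one Word64 into its 8 bytes, least significant byte first.
-- Shifting is defined on the value, so the result is the same on any CPU.
word64ToBytesLE :: Word64 -> [Word8]
word64ToBytesLE w = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 7]]

-- | Take n bytes from a stream of 64-bit words, consuming all 8 bytes
-- of a word before moving on to the next one.
bytesFromWords :: Int -> [Word64] -> [Word8]
bytesFromWords n = take n . concatMap word64ToBytesLE
```

With this scheme 15 bytes need only two 64-bit words, whereas the naive approach would generate fifteen.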

## Not all PRNGs are splittable

Example implementation of `split` in `mersenne-random-pure64`:
```haskell
instance RandomGen PureMT where
  split = error "System.Random.Mersenne.Pure: unable to split the mersenne twister"
```

Another example from `pcgen`:
```haskell
instance RandomGen PCGen where
    -- The only real spec here is that the two result generators be
    -- dissimilar from each other and also from the input generator.
    -- So we just do some nonsense shuffling around to achieve that.
    split gen@(PCGen state inc) = ... 
    -- no statistical foundation for this!
```

### Solution is `SplitGen`

Creating a separate type class solves this issue for us:

```haskell
class RandomGen g where
  ...
  -- Deprecated:
  split :: g -> (g, g)

class RandomGen g => SplitGen g where
  splitGen :: g -> (g, g)
```

### A few words on `randomM` and `randomRM`

Thanks to `FrozenGen` it was possible to also create monadic versions of functions from `Random` class:

```haskell
randomM :: forall a g m. (Random a, RandomGen g, FrozenGen g m) =>
  MutableGen g m -> m a
randomRM :: forall a g m. (Random a, RandomGen g, FrozenGen g m) =>
  (a, a) -> MutableGen g m -> m a
```

Note - in `random-1.2.x` the `RandomGenM` class was used to define these functions; that class has now been deprecated.

## Stats for `random-1.2`

After 5 months of work:
* [266 commits](https://github.com/idontgetoutmuch/random/compare/v1.1...v1.2-proposal)
* 150 total pull requests with [100 of them merged](https://github.com/idontgetoutmuch/random/pulls?q=is%3Apr+is%3Amerged).
* Closed 6 existing issues and partially addressed at least 3 more.
* Exploration in API design yielded a discovery of 1 bug in GHC:
  [#18021](https://gitlab.haskell.org/ghc/ghc/-/issues/18021)

## Summary of achievements

* The quality of generated random values got much better
* Astonishing performance improvement. It only took 15 years from the first report of it
  being slow.
* Interface has been expanded dramatically
* Amount of documentation was increased quite a bit
* Modern test and benchmark suites have been added
* Very little breakage, majority of the functionality was kept backwards compatible.

### Since `random-1.2` release:

* Generic deriving for `Uniform` and `UniformRange` classes
* Further improvements to floating point number generation
* Type safety for distinguishing splittable vs non-splittable generators
* Mutable generator that works in `STM`.
* Support for serialization of generators with `SeedGen` type class
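
As an illustration of the first item, generic deriving of `Uniform` might be used as sketched below, assuming a version of `random` that ships this feature (to my knowledge `random-1.2.1` or newer). `Suit`, `Hand` and `randomHand` are hypothetical names.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)
import System.Random.Stateful (Uniform, mkStdGen, uniform)

-- | An enumeration: every constructor is equally likely.
data Suit = Clubs | Diamonds | Hearts | Spades
  deriving (Eq, Show, Generic, Uniform)

-- | A product type: each field is generated with its own Uniform instance.
data Hand = Hand Suit Suit
  deriving (Eq, Show, Generic, Uniform)

-- | Deterministically produce a hand from an integer seed.
randomHand :: Int -> Hand
randomHand seed = fst (uniform (mkStdGen seed))
```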

## Potential future plans

* Interface for generating complex data structures, eg. lists, arrays, trees, etc.
* More operations (eg. picking random element, randomly selected sub-lists or sub-maps, etc.)
* Interface for lazy generators (eg. QuickCheck style that uses splittability)
* Support of splittability in `Random`, `Uniform` and `UniformRange` instances
* Reader like interface that avoids passing around mutable generator.



-------

# Thank You!


### Questions?

-------