Make `Fake` a monad transformer #32

ivanbakel · 2021-06-09T19:12:29Z

Motivation

At my work, we were running lots of Fakes in a monad stack to generate very large amounts of data, but it was quite slow. When I did some profiling, I found that the biggest bottleneck was work that fakedata repeats every time you generate a Fake - namely, parsing the YAML files for the data and building a cache for accessing it.

That meant that we should have been running all of our code inside a single Fake, to take advantage of the cache - but then we would have had to run our monad code in IO, since that's what's required by Fake. So instead, we created this monad transformer:

newtype FakeT m a = FakeT (FakerSettings -> m a)

which works just like Fake, but with any monad stack we want, and still shares the cache.
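To make the shape of this concrete, here is a minimal sketch of a reader-style transformer like the one above. It is a toy model and not the code in this PR: `Settings` is a hypothetical stand-in for `FakerSettings`, and `askSettings`, `greeting` and `example` are names invented for illustration. The point is only that every action receives the same settings value, and that the base monad does not have to be IO.

```haskell
import Data.Functor.Identity (Identity (..))

-- Hypothetical stand-in for FakerSettings (the real type also carries the cache).
data Settings = Settings { locale :: String }

-- Same shape as the newtype in the description: a function from settings.
newtype FakeT m a = FakeT { runFakeT :: Settings -> m a }

instance Functor m => Functor (FakeT m) where
  fmap f (FakeT g) = FakeT (fmap f . g)

instance Applicative m => Applicative (FakeT m) where
  pure x = FakeT (const (pure x))
  -- Both sides are handed the same settings value.
  FakeT f <*> FakeT g = FakeT (\s -> f s <*> g s)

instance Monad m => Monad (FakeT m) where
  -- The continuation is run with the same settings as the first action.
  FakeT g >>= k = FakeT (\s -> g s >>= \a -> runFakeT (k a) s)

-- Read the shared settings from inside the transformer (illustrative helper).
askSettings :: Applicative m => FakeT m Settings
askSettings = FakeT pure

greeting :: Monad m => FakeT m String
greeting = do
  s <- askSettings
  pure ("hello from " ++ locale s)

-- Runs over Identity: the transformer itself imposes no IO requirement.
example :: (String, String)
example = runIdentity (runFakeT ((,) <$> greeting <*> greeting) (Settings "en"))
```

Both components of `example` are built from the one `Settings` value passed to `runFakeT`, which is the sharing behaviour described above.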

We then lifted our Fakes to FakeTs

class MonadFake m where
  liftFake :: Fake a -> m a

to allow us to run them using the FakerSettings from FakeT.
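As a rough sketch of how such a class can thread a plain Fake through a stack, the toy model below defines `MonadFake` with two instances. The instance bodies are assumptions made for illustration and are not confirmed by this thread; `Settings` is a hypothetical stand-in for `FakerSettings`, and `EnvT` is a hand-rolled reader-style wrapper standing in for a transformer from the `transformers` package so that the example needs only `base`.

```haskell
import Control.Monad.IO.Class (MonadIO (..))

-- Hypothetical stand-in for FakerSettings.
data Settings = Settings { locale :: String }

newtype FakeT m a = FakeT { runFakeT :: Settings -> m a }

-- In this sketch, Fake is the transformer specialised to IO.
type Fake = FakeT IO

instance Functor m => Functor (FakeT m) where
  fmap f (FakeT g) = FakeT (fmap f . g)

instance Applicative m => Applicative (FakeT m) where
  pure x = FakeT (const (pure x))
  FakeT f <*> FakeT g = FakeT (\s -> f s <*> g s)

instance Monad m => Monad (FakeT m) where
  FakeT g >>= k = FakeT (\s -> g s >>= \a -> runFakeT (k a) s)

class MonadFake m where
  liftFake :: Fake a -> m a

-- Assumed instance: run the IO-based Fake with the settings already in scope.
instance MonadIO m => MonadFake (FakeT m) where
  liftFake (FakeT f) = FakeT (liftIO . f)

-- Minimal reader-style wrapper, standing in for a transformers type.
newtype EnvT r m a = EnvT { runEnvT :: r -> m a }

instance Functor m => Functor (EnvT r m) where
  fmap f (EnvT g) = EnvT (fmap f . g)

instance Applicative m => Applicative (EnvT r m) where
  pure x = EnvT (const (pure x))
  EnvT f <*> EnvT g = EnvT (\r -> f r <*> g r)

instance Monad m => Monad (EnvT r m) where
  EnvT g >>= k = EnvT (\r -> g r >>= \a -> runEnvT (k a) r)

-- Assumed instance: pass the Fake down to the layer below, ignoring the environment.
instance MonadFake m => MonadFake (EnvT r m) where
  liftFake = EnvT . const . liftFake

askEnv :: Applicative m => EnvT r m r
askEnv = EnvT pure

-- A plain Fake written against the base type.
localeName :: Fake String
localeName = FakeT (pure . locale)

-- The same Fake used two layers up, still reading the outer settings.
labelled :: EnvT String Fake String
labelled = do
  prefix <- askEnv
  loc <- liftFake localeName
  pure (prefix ++ loc)

runExample :: IO String
runExample = runFakeT (runEnvT labelled "locale=") (Settings "en")
```

Here `localeName` never learns about `EnvT`; `liftFake` is what lets it run with the settings supplied once at the bottom of the stack.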

Using this monad transformer, we were able to share a single FakerSettings, including the cache, across lots of Fake values, without having to rewrite everything in terms of IO. This was a massive performance improvement. Since it was useful to us, I thought it would be good to contribute back to upstream.

Breaking change

This is a breaking change, since importing Fake(..) is no longer enough to import the Fake constructor (which is now actually a pattern synonym).

Future work

~~I still want to go through Faker.Combinators to generalise them to FakeT m instead of Fake, but I thought I would share these changes first.~~ This is done now.

psibi

LGTM. We can ignore the CI issue for the nightly builds. I have left a minor comment, and the changelog also needs an update. Thank you!

psibi · 2021-06-10T06:40:50Z

fakedata.cabal

@@ -609,6 +610,7 @@ library
    , template-haskell
    , text
    , time
+    , transformers


Can you make the changes in package.yaml instead, since we generate the cabal file from there?

Should I commit the updated cabal file as well?

Yes, the generated cabal file needs to be updated.

Also, can you add a performance note about this (similar to what you mentioned in the PR description) in README.md:

Using this monad transformer, we were able to share a single FakerSettings, including the cache, across lots of Fake values, without having to rewrite everything in terms of IO. This was a massive performance improvement. Since it was useful to us, I thought it would be good to contribute back to upstream.

Also, please give an example of well-performing code and of poorly performing code using this MR.

Alright, I've moved those changes to package.yaml, including a couple of dependencies that were only in the .cabal file.

I've now added a README section which describes how to use FakeT for better performance - let me know what you think.
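The README text itself is not quoted in this thread. As an illustration of the contrast it is meant to describe, the toy model below counts how often the expensive setup runs in each style. Everything here is hypothetical: `Settings` stands in for `FakerSettings`, `buildSettings` stands in for the YAML parsing and cache construction, and `runOnce`, `firstName` and `manyOf` are invented names rather than fakedata functions.

```haskell
import Control.Monad (replicateM)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for FakerSettings and its cache.
newtype Settings = Settings { defaultName :: String }

newtype FakeT m a = FakeT (Settings -> m a)

-- Stand-in for the expensive setup; the counter records each invocation.
buildSettings :: IORef Int -> IO Settings
buildSettings counter = do
  modifyIORef' counter (+ 1)
  pure (Settings "alice")

-- Run one FakeT action, paying the setup cost exactly once per call.
runOnce :: MonadIO m => IORef Int -> FakeT m a -> m a
runOnce counter (FakeT f) = liftIO (buildSettings counter) >>= f

firstName :: Applicative m => FakeT m String
firstName = FakeT (pure . defaultName)

-- Repeat an action inside the transformer, reusing the same settings.
manyOf :: Applicative m => Int -> FakeT m a -> FakeT m [a]
manyOf n (FakeT f) = FakeT (replicateM n . f)

-- Poorly performing style: a fresh run (and fresh setup) per value.
slow :: IORef Int -> IO [String]
slow counter = replicateM 1000 (runOnce counter firstName)

-- Well-performing style: one run that generates every value.
fast :: IORef Int -> IO [String]
fast counter = runOnce counter (manyOf 1000 firstName)

-- Number of setups performed by each style.
setupCounts :: IO (Int, Int)
setupCounts = do
  slowCounter <- newIORef 0
  _ <- slow slowCounter
  fastCounter <- newIORef 0
  _ <- fast fastCounter
  (,) <$> readIORef slowCounter <*> readIORef fastCounter
```

In this model `setupCounts` yields `(1000, 1)`: the first style repeats the setup for every value, while the second performs it once and shares it.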

@FAKET

This is a use-case that came up in my work, where we were generating lots of 'Fake' values. The trouble with running 'Fake' is that parsing YAML and building the cache are very expensive, but all the effort involved is thrown away after the 'Fake' is run - so it makes sense to generate as much data as possible all in the same 'Fake'. But if you're using a monad stack, you would have to put 'Fake' at the bottom - when instead it makes more sense as a transformer, which can share a single YAML parser and cache between many 'Fake' runs. This adds the 'FakeT' monad transformer, and makes 'Fake' a synonym for @FakeT IO@, preserving as much backwards compatibility as possible with functions and pattern synonyms.

Following on from the 'FakeT' transformer, it makes sense to be able to "lift" the default 'Fake' values into the monad stack, in order to allow them to actually share the transformer cache and generator. This is done by the 'MonadFake' class, which comes with an instance for 'FakeT', as well as some of the other transformers defined in the `transformers` package.

This allows users to use combinators on any 'FakeT', instead of just the base 'Fake' monad.

psibi · 2021-06-10T15:30:19Z

Thanks!

README.md

psibi requested changes Jun 10, 2021

ivanbakel added 4 commits June 10, 2021 10:55

Add regex-tdfa, QuickCheck to package.yaml

ada99a7

Generalise combinators to FakeT

494201d

This allows users to use combinators on any 'FakeT', instead of just the base 'Fake' monad.

ivanbakel force-pushed the faker-transformer branch from 8c4d2cb to 494201d Compare June 10, 2021 09:59

Add README section about FakeT performance impact

9750120

psibi approved these changes Jun 10, 2021

psibi merged commit d27461f into fakedata-haskell:master Jun 10, 2021

psibi reviewed Jun 10, 2021
