Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Fake a monad transformer #32

Merged
merged 5 commits into from
Jun 10, 2021

Conversation

ivanbakel
Copy link
Contributor

@ivanbakel ivanbakel commented Jun 9, 2021

Motivation

At my work, we were running lots of Fakes in a monad stack to generate very large amounts of data, but it was quite slow. When I did some profiling, I found that the biggest bottlenecks were work that fakedata does every time you generate a Fake - namely, parse the YAML files for the data, and build a cache for accessing it.

That meant that we should have been running all of our code inside a single Fake, to take advantage of the cache - but then we would have had to run our monad code in IO, since that's what's required by Fake. So instead, we created this monad transformer:

newtype FakeT m a = FakeT (FakerSettings -> m a)

which works just like Fake, but with any monad stack we want, and still shares the cache.

We then lifted our Fakes to FakeTs

class MonadFake m where
  liftFake :: Fake a -> m a

to allow us to run them using the FakerSettings from FakerT.

Using this monad transformer, we were able to share a single FakerSettings, including the cache, across lots of Fake values, without having to rewrite everything in terms of IO. This was a massive performance improvement. Since it was useful to us, I thought it would be good to contribute back to upstream.

Breaking change

This is a breaking change, since importing Fake(..) is no longer enough to import the Fake constructor (which is now actually a pattern synonym).

Future work

I still want to go though Faker.Combinators to generalise them to FakerT m instead of Fake, but I thought I would share these changes first. This is done now.

Copy link
Member

@psibi psibi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can ignore the CI issue for the nightly builds, I have left a minor comment and also the changelog needs update. Thank you!

@@ -609,6 +610,7 @@ library
, template-haskell
, text
, time
, transformers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do the changes in package.yaml instead since we generate the cabal file from there ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I commit the updated cabal file as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the generated cabal file needs to be updated.

Copy link
Member

@psibi psibi Jun 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you add a performance note about this (similar to what you mentioned in the PR description) in README.md:

Using this monad transformer, we were able to share a single FakerSettings, including the cache, across lots of Fake values, without having to rewrite everything in terms of IO. This was a massive performance improvement. Since it was useful to us, I thought it would be good to contribute back to upstream.

And also give an example of a good performance and a non good performance code using this MR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've moved those changes to package.yaml, including a couple of dependencies that were only in the .cabal file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now added a README section which describes how to use FakeT for better performance - let me know what you think.

This is a use-case that came up in my work, where we were generating
lots of 'Fake' values. The trouble with running 'Fake' is that parsing
YAML and building the cache are very expensive, but all the effort
involved is thrown away after the 'Fake' is run - so it makes sense to
generate as much data as possible all in the same 'Fake'. But if you're
using a monad stack, you would have to put 'Fake' at the bottom - when
instead it makes more sense as a transformer, which can share a single
YAML parser and cache between many 'Fake' runs.

This adds the 'FakeT' monad transformer, and makes 'Fake' a synonym for
@FAKET IO@, preserving as much backwards-compatibility as possible with
functions and pattern synonyms.
Following on from the 'FakeT' transformer, it makes sense to be able to
"lift" the default 'Fake' values into the monad stack, in order to allow
them to actually share the transformer cache and generator. This is done
by the 'MonadFake' class, which comes with an instance for 'FakeT', as
well as some of the other transformers defined in the `transformers`
package.
This allows users to use combinators on any 'FakeT', instead of just the
base 'Fake' monad.
@psibi psibi merged commit d27461f into fakedata-haskell:master Jun 10, 2021
@psibi
Copy link
Member

psibi commented Jun 10, 2021

Thanks!

README.md Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants