feat(gatsby): use json-stream-stringify to serialize redux state #9370
Conversation
Well that was quick!
Would you be able to test this with your local site and validate that it improves the scalability of gatsby? Additionally, testing on a smaller site and validating similar performance would be nice to have, but I don't anticipate this causing a lot of issues.
One other note is that json-stringify-safe handled circular references, whereas I don't think this new approach does. I'm not sure whether that's a huge concern, but something worth considering!
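To make the circular-reference concern concrete, here is a minimal sketch of the replacer-based approach that cycle-safe stringifiers like `json-stringify-safe` take (the real library also handles repeated non-cyclic references more carefully; this is just an illustration):

```javascript
// Plain JSON.stringify throws a TypeError on circular structures, while a
// replacer that tracks visited objects can substitute a placeholder instead.
const state = { nodes: { a: { id: 'a' } } };
state.nodes.a.parent = state; // introduce a circular reference

let plainFailed = false;
try {
  JSON.stringify(state);
} catch (e) {
  plainFailed = true; // TypeError: Converting circular structure to JSON
}

function safeStringify(value) {
  const seen = new WeakSet();
  return JSON.stringify(value, (key, val) => {
    if (typeof val === 'object' && val !== null) {
      if (seen.has(val)) return '[Circular]';
      seen.add(val);
    }
    return val;
  });
}

const json = safeStringify(state);
console.log(plainFailed); // true
console.log(json.includes('[Circular]')); // true
```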
We got a fair number of bug reports about circular references in the past, so we definitely need to handle that.
@rametta looks like we can keep the circular-reference safe
Are you able to check that out and validate it still works with your site?
@DSchau That's a good idea. I will try that approach and let you guys know.
Here's another JSON streaming utility, https://www.npmjs.com/package/bfj, that has some handling for circular references.
@pieh even better 👌 Thanks for that link!
@KyleAMathews I agree that this is not ideal to slow down the build, but I think it's worth it. Here is why:
Here are my findings after experimenting with different stringifying methods:
- 50,000 records (~134 MB file)
- 100,000 records (~251 MB file)
- 500,000 records (~1.24 GB file)
But does it actually slow down the builds, or just slow down persisting redux state to a file? It mentions:
So this might not be a bad idea and might actually speed up builds. Plus there are options we can play with so it isn't incredibly slow.
Unfortunately, in my tests BFJ was not handling all types of circular references, so I switched my PR to use json-stream-stringify instead, which seems to handle everything flawlessly. It's a bit slower, but since it's async it should not be a problem.
Both. I've updated my little benchmark table above with the new lib.
@rametta would you be able to run performance benchmarks with a sample site (e.g. something in benchmarks) to ensure this doesn't introduce a performance regression?
@DSchau Here are my results with the create pages benchmark. (After every test, I deleted the .cache and public folders manually.)
Sorry for not being more explicit. Can you use the markdown benchmark instead? The create pages benchmark basically doesn't add anything to the cache, so it's not a good test for this.
E.g. I just tried a 50k markdown site and it OOMed when it tried to write out the cache file.
@KyleAMathews No problem, so I ran the markdown benchmark a few times, and realized I would have to make each created node have VERY large content, so I replaced the
Then when I switched to use my PR, it would build no problem. (Granted, I had to give 12GB of memory to my node command, but at least it mimics the project I am working on now that led me to find this issue.) When my build passed I had a redux-state.json file of about 990MB, whereas current gatsby would crash anywhere around a 500MB file size, because that seems to be the limit of the JSON.stringify or safe-stringify implementation.
Awesome! And also no perf regressions?
I don't think so. If there is, I don't know how to find them.
I meant: could you do what you did in #9370 (comment) with the markdown benchmark?
@KyleAMathews here are the Markdown benchmarks with a large amount of data passed into each node
@rametta one neat trick I built into that markdown pages benchmark is that you can conditionally (with environment variables) change the amount of content for each page :) So here's how that works:
This tests both, so consider checking it out! (Thanks @KyleAMathews)
Not sure what you mean by this?
The raw node content is in the redux state (though not the AST, as you mention).
@KyleAMathews yeah, whoops! I was thinking only the cache was touched here, but updating that env var will increase the size of the generated node too, so it'll actually test both. The other info is accurate :)
Ah okay, that's good to know, thanks!
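The env-var trick described above can be sketched roughly like this (the variable name `NUM_ROWS` and the sizing formula are illustrative guesses, not the benchmark's actual code): one environment variable scales the generated markdown body, which in turn scales both the node in redux state and the cached transformer output.

```javascript
// Hypothetical sketch: size each generated page's body from an env var,
// so one knob controls how heavy the serialized state gets.
const numRows = Number(process.env.NUM_ROWS || 1);
const body = 'lorem ipsum\n'.repeat(numRows * 100);
console.log(body.length === numRows * 100 * 12); // true
```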
Even without extra rows, the markdown benchmark OOMed somewhere before 50k pages as I mentioned above in #9370 (comment). Just tested this PR with the same benchmark and 100k pages and it finished just fine! Super exciting!

```
Kyles-MacBook-Pro:markdown (stringify-stream $)$ NUM_PAGES=100000 gatsby build
success open and validate gatsby-configs — 0.006 s
success load plugins — 0.109 s
success onPreInit — 0.715 s
success delete html and css files from previous builds — 0.004 s
success initialize cache — 22.712 s
success copy gatsby files — 0.058 s
success onPreBootstrap — 0.012 s
success source and transform nodes — 82.279 s
success building schema — 5.159 s
success createPages — 29.534 s
success createPagesStatefully — 0.017 s
success onPreExtractQueries — 0.000 s
success update schema — 12.211 s
success extract queries from components — 0.195 s
success run graphql queries — 507.226 s — 100001/100001 197.16 queries/second
success write out page data — 1.296 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.353 s
info bootstrap finished - 664.473 s
success Building production JavaScript and CSS bundles — 25.580 s
success Building static HTML for pages — 35.440 s — 100001/100001 3060.76 pages/second
info Done building in 726.477 sec
```
This is awesome @rametta!
Will just update yarn.lock again (seems like yarn removed a bunch of integrity SHAs when you changed the dependency).
Holy buckets, @rametta — we just merged your PR to Gatsby! 💪💜 Gatsby is built by awesome people like you. Let us say “thanks” in two ways:
If there’s anything we can do to help, please don’t hesitate to reach out to us: tweet at @gatsbyjs and we’ll come a-runnin’. Thanks again!
…sbyjs#9370) This PR changes the way the internal gatsby store is saved to disk to account for sites that have very large payloads. Closes issue gatsbyjs#9362
I did some work on the above and noticed another problem: the way this handles cyclic refs (or rather, repeated refs to the same object) is cool (see
So I think we should (at least temporarily) revert this and fix those issues before applying this change again, because right now the cache is basically broken.
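The cache breakage can be illustrated with a stdlib-only sketch. If the serializer writes a `{"$ref": ...}` marker the second time it sees an object (roughly what `json-stream-stringify`'s cycle handling does; the exact marker format here is simplified), a plain `JSON.parse` on load hands back the marker object instead of the shared value, and object identity is lost:

```javascript
// Toy dedupe-on-write serializer: repeated objects become $ref markers.
function stringifyWithRefs(value) {
  const seen = new Map();
  return JSON.stringify(value, function (key, val) {
    if (typeof val === 'object' && val !== null) {
      if (seen.has(val)) return { $ref: seen.get(val) };
      seen.set(val, key || '$');
    }
    return val;
  });
}

const node = { id: 'a' };
const state = { byId: { a: node }, all: [node] }; // same object referenced twice

// Reloading with plain JSON.parse does NOT resolve the markers.
const reloaded = JSON.parse(stringifyWithRefs(state));
console.log(reloaded.byId.a.id); // 'a' — the first occurrence survives
console.log(reloaded.all[0].$ref !== undefined); // true — the second is now a marker
```

So the write side avoids OOM, but the read side would need a matching $ref-resolving parser before the persisted `.cache/redux-state.json` is usable again.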
I've run the same
So it took an additional ~6 minutes to wait for
…ate (gatsbyjs#9370)" This reverts commit c334075.
PR to revert this: #9896
The current implementation of using `json-stream-stringify` is causing a lot of problems and I think this should be reverted - see #9370 (comment). Shortly put: this does help with not getting OOM, but it essentially makes `.cache/redux-state.json` not work at all.
What’s the best way to incorporate these changes without forking Gatsby? A PR to make it an option for large sites? Fix the two issues mentioned by @pieh above (cyclic refs; a, b, c)?