Fix out-of-memory error when building hundreds of pages #2969
Conversation
Deploy preview ready! Built with commit e51c389 |
Force-pushed from 04917e5 to 5161f69
So why would 7dd5f3b slow down running queries? Seems like that'd just slow down inferring the schema. This change seems pretty reasonable except I'd prefer a change that limited the number of queries we're running in parallel to something like 10-15 (testing would help) as that would still reduce peak memory usage a lot while not slowing things down as much. |
I'm working on adding perf testing fairly soon. It'll measure peak & on-going memory usage + build times. That'll make it easier to work on PRs like this and ensure we're actually improving things. |
Perhaps try Bluebird's concurrency limit http://bluebirdjs.com/docs/api/promise.map.html#map-option-concurrency You can also record peak memory using https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process |
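Besides the external `time`-based approach in the Stack Overflow link, memory can also be watched from inside the build itself by polling Node's `process.memoryUsage()` API. A minimal illustrative sketch, not Gatsby code (`trackPeak` and the fake allocation are hypothetical):

```javascript
// Poll resident set size on an interval and remember the maximum.
function trackPeak() {
  let peak = 0
  const timer = setInterval(() => {
    peak = Math.max(peak, process.memoryUsage().rss)
  }, 50)
  return {
    stop() {
      clearInterval(timer)
      // Take one final sample so very short runs still register.
      return Math.max(peak, process.memoryUsage().rss)
    },
  }
}

const tracker = trackPeak()
// ...the build work would happen here; simulate an allocation instead.
const buf = Buffer.alloc(10 * 1024 * 1024).fill(1)
const peakBytes = tracker.stop()
console.log(`peak RSS: ${(peakBytes / 1024 / 1024).toFixed(1)} MB (buf: ${buf.length} bytes)`)
```

This only sees the Node heap and native allocations of the current process, so the OS-level `time` utility remains the more complete measurement.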
Without this fix, the peak is 1.5GB, which is the node.js memory limit :) I'll try out different concurrency settings and report back the times and memory usage. |
@KyleAMathews Here are some results from my MacBook Pro with a 2.6GHz dual-core i5 CPU, for ~460 pages with the json and remark plugins. Obviously, these results are far from "scientific" and could be completely different if someone has e.g. 10 pages, or different queries.
Promise.mapSeries: 450MB and 55s
Promise.map with concurrency 2: 446MB and 51s
Promise.map with concurrency 3: 465MB and 52s
Promise.map with concurrency 5: 440MB and 54s
Promise.map with concurrency 10: 465MB and 58s
Promise.map with concurrency 100: 870MB and 88s
Promise.map with concurrency 250: 1526MB and 101s
From these results it looks like concurrency of 2 works best, though maybe setting it to something like |
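For reference, the behavior of Bluebird's `Promise.map(items, fn, { concurrency })` can be approximated with platform promises alone. A hypothetical sketch (`mapWithConcurrency` is not from the PR): at most `concurrency` mapper calls are in flight at once, and results keep input order:

```javascript
// Start a small pool of workers that pull the next index off a
// shared counter; each worker processes items one at a time.
function mapWithConcurrency(items, fn, concurrency) {
  const results = new Array(items.length)
  let next = 0
  async function worker() {
    while (next < items.length) {
      const i = next++ // claim an index synchronously, then await
      results[i] = await fn(items[i], i)
    }
  }
  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    worker
  )
  return Promise.all(workers).then(() => results)
}
```

Given the benchmark above, something like `mapWithConcurrency(ids, runQuery, 2)` would match the best-performing setting, where `runQuery` stands in for the per-page query work.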
@@ -90,8 +90,9 @@ const runQueriesForIds = ids => {
     return Promise.resolve()
   }
   const state = store.getState()
-  return Promise.all(
-    ids.map(id => {
+  return Promise.mapSeries(
we should maybe try and not depend on bluebird's static methods as it makes switching to the native promise eventually harder
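In that spirit, a sequential map needs nothing Bluebird-specific. A minimal platform-Promise sketch (a hypothetical helper, not the PR's code) of what `Promise.mapSeries` does:

```javascript
// Run the mapper for one item at a time; each iteration awaits the
// previous one, so only a single call is in flight at any moment.
async function mapSeries(items, fn) {
  const results = []
  for (const [i, item] of items.entries()) {
    results.push(await fn(item, i))
  }
  return results
}
```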
Force-pushed from d800ff6 to 8ab36c6
Deploy preview failed. Built with commit db8a1ae4b69555935f1061dafa41731725df094e https://app.netlify.com/sites/using-glamor/deploys/5a1c3fc28198760772b713bd |
I can confirm this fixes memory issues when trying to build ~1470 pages 👍 |
Force-pushed from db8a1ae to e4265ed
This is really great! Thanks for diving into Gatsby's innards and finding the problem — been thinking a lot recently about how to add automated perf testing to Gatsby which will make it a lot easier to track down problem spots like this + prevent regressions. Per @jquense's comment — could you use async.js' concurrency controls instead of Bluebird's non-standard Promise concurrency? |
Also, the # of CPUs doesn't matter since JS is single-threaded. So please back that out. |
@KyleAMathews I don't really know why you don't want to keep using bluebird, if this code runs only on the server side (you could also import it as |
I'm ok using bluebird as a utils library; the issue is that it's assuming the global Promise object is a bluebird promise, which makes switching to just the platform Promise harder. E.g. we want a lodash approach, not prototype extensions |
(and yes other code does this too :) we want to limit its spread) |
Yeah — we don't want to use Bluebird's promise extensions. The only reason we switched to it was part for performance and part for its better error messages — both of which are improving in Node core so I can definitely see us pulling out Bluebird in the future which would speed up Gatsby some as we'd be loading less JS. |
Force-pushed from e4265ed to 78fd94a
@KyleAMathews @jquense Please review it again :) |
Force-pushed from 78fd94a to e51c389
Looks great! Thanks for the investigation and staying with us on the back and forth on this PR! :-) |
The latest version of Gatsby crashes when trying to build ~500 pages using the filesystem source plugin and json transformer plugin. The same error has been reported in #2796. The issue was most likely introduced in 7dd5f3b, though I haven't been able to figure out where exactly. However, the time to finish the "run graphql queries" step is significantly longer in 1.9.33 than in 1.9.32 (for a number of pages that doesn't cause 1.9.33 to crash). It looks like it's caused by the queryRunner function, which is called for every page; but if there are too many pages, it never gets resolved. This is probably more of a workaround than a real fix, because whatever Gatsby is doing there, it probably shouldn't use up to 1.5GB of memory to process 500 JSON files, even if it's processing them all in parallel. At least now the build finishes! :) Also, it takes ~16-20s to run the "run graphql queries" step for these 500 pages in 1.9.118 with this fix, while in 1.9.32 it takes ~2s. However, as I mentioned earlier, it's (mostly) caused by changes introduced in 1.9.33, not by this fix.