[v2] Hulksmash build slowdowns on larger sites #6226

Merged: 32 commits, merged on Jul 11, 2018

9 participants

KyleAMathews commented Jun 29, 2018

Spent the afternoon and evening going through the critical path for creating pages.

With these changes, a 5,000-page site builds in ~37 seconds, a very significant
improvement over the current beta.

This does necessitate one breaking change: no longer creating nodes for each page. Frankly,
this was more of academic interest and for debugging purposes, and given that it adds
significant slowdowns and should rarely if ever be used in sites, we should
hopefully be fine letting it go. In any case, if people are using it, there are
far better ways of querying the same data.

TODO

  • Calculate how much faster large sites are with this PR vs. current beta.
  • Remove profile calls
  • Do node profiling on adding thousands of pages
  • Do memory profiling on the same site
  • Evaluate why "write out page data" gets really slow at 25k pages
  • Test freeCodeCamp
  • Add benchmarking folder and first two sites: a minimal in-memory createPages site and a larger one that writes out markdown files
  • Add progress indicator for longer-running steps like GraphQL queries and page building, with a final status at the end
  • Remove extra packages from benchmark sites
KyleAMathews commented Jun 29, 2018

Deploy preview for using-glamor failed.

Built with commit fa32e5d

https://app.netlify.com/sites/using-glamor/deploys/5b457017c6aed64e9461ef06

KyleAMathews commented Jun 29, 2018

Hmm, and a 25,000-page site builds in ~7.5 minutes. The pages metadata file is now enormous (~4 MB), so we'll need to fix that, but this is all very promising.

KyleAMathews commented Jun 29, 2018

Hmmm, gatsbyjs.org isn't building because StaticQueries aren't being run during builds. Is this a known issue? I've been heads down the last few days.

pieh commented Jun 29, 2018

> This does necessitate one breaking change: no longer creating nodes for each page. Frankly, this was more of academic interest and for debugging purposes, and given that it adds significant slowdowns and should rarely if ever be used in sites, we should hopefully be fine letting it go. In any case, if people are using it, there are far better ways of querying the same data.

I think this would break the current implementation of the sitemap plugin, but we can get that info from the redux store instead of querying for it. This would also allow us to skip the "update schema" step (which is currently needed to add SitePage-related queries to the schema).

Switching the nodes reducer to a Map and mutating it, instead of creating new state for every CREATE_NODE action, will definitely help here too.
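pieh's suggestion can be illustrated with two toy reducers. This is a sketch under assumptions: the `{ type, payload }` action shape and the reducer names are hypothetical, not Gatsby's actual nodes reducer. The point is the cost model: copying the state on every `CREATE_NODE` makes creating n nodes quadratic, while mutating a `Map` keeps it linear.

```javascript
// Immutable version: every action copies the whole state object, so adding
// n nodes does O(n) work per action and O(n^2) work overall.
const immutableNodesReducer = (state = {}, action) => {
  switch (action.type) {
    case "CREATE_NODE":
      return { ...state, [action.payload.id]: action.payload }
    default:
      return state
  }
}

// Mutating version: one Map is updated in place, so each CREATE_NODE is
// amortized O(1) and adding n nodes is O(n) overall.
const mutatingNodesReducer = (state = new Map(), action) => {
  switch (action.type) {
    case "CREATE_NODE":
      state.set(action.payload.id, action.payload)
      return state // same object every time
    default:
      return state
  }
}
```

The trade-off is that the mutating reducer always returns the same object, so code that detects changes by reference equality (as redux conventions assume) will not notice updates.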

> Hmmm, gatsbyjs.org isn't building because StaticQueries aren't being run during builds. Is this a known issue? I've been heads down the last few days.

I'm not seeing this here

m-allanson commented Jun 29, 2018

> Hmmm, gatsbyjs.org isn't building because StaticQueries aren't being run during builds. Is this a known issue? I've been heads down the last few days.

> I'm not seeing this here

Me neither. Seems like nice speedups for .org. Unscientific tests running gatsby build a few times (with a warm cache):

current beta
✨  Done in 130.25s.
✨  Done in 97.00s.
✨  Done in 163.13s.

this PR
✨  Done in 72.71s.
✨  Done in 70.16s.
✨  Done in 74.66s.
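Since these runs are noisy, a quick summary helps compare them. This sketch only does the arithmetic on the six timings quoted; the median is used as the headline figure because it is less sensitive to outliers such as the 163 s run.

```javascript
// Warm-cache `gatsby build` timings (seconds) from the comment above.
const runs = {
  currentBeta: [130.25, 97.0, 163.13],
  thisPR: [72.71, 70.16, 74.66],
}

const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length
const median = xs => {
  const s = [...xs].sort((a, b) => a - b) // copy so the input isn't reordered
  const mid = Math.floor(s.length / 2)
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2
}

const summary = {}
for (const [name, xs] of Object.entries(runs)) {
  summary[name] = { mean: mean(xs), median: median(xs), min: Math.min(...xs), max: Math.max(...xs) }
}
const speedup = summary.currentBeta.median / summary.thisPR.median
console.log(summary, `median speedup: ${speedup.toFixed(2)}x`)
```

On these numbers both the median (130.25 s vs 72.71 s) and the mean (about 130.1 s vs 72.5 s) put this PR at roughly 1.8x faster, and its runs are also far more consistent (70 to 75 s vs 97 to 163 s).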

Edit: oh yeah, gatsby develop doesn't seem quite right. Content changes take a long time and StaticQuery doesn't always update.

KyleAMathews commented Jun 29, 2018

> Switching the nodes reducer to a Map and mutating it, instead of creating new state for every CREATE_NODE action, will definitely help here too.

Oh, great point. That would speed up creating the SitePage nodes a ton. I'll try that too in this PR before declaring SitePage nodes dead.

KyleAMathews commented Jun 29, 2018

> I'm not seeing this here

Hrrmmm... weird. I deleted node_modules and yarn.lock. Will keep poking at this.

KyleAMathews commented Jun 29, 2018

> This would also allow us to skip the "update schema" step (which is currently needed to add SitePage-related queries to the schema).

This would still be nice, though it seems your changes have already made schema generation a lot faster.

KyleAMathews commented Jun 30, 2018

Finally got gatsbyjs.org to work: I had deleted public/static/d without also deleting the .cache directory 🤦‍♂️

Builds take anywhere from ~55 to 80 seconds with a warm cache. Not bad!
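The mix-up above is a reminder that generated output and the cache should be cleared together; presumably the cache still assumed the deleted files existed. A minimal Node sketch, assuming a site with `.cache` and `public` at its root. `cleanBuild` is a hypothetical helper, not a Gatsby command, and `fs.rmSync` needs Node 14.14 or later.

```javascript
const fs = require("fs")
const path = require("path")

// Hypothetical helper: delete build output and cache together so neither
// refers to files the other no longer has.
function cleanBuild(siteDir) {
  for (const dir of [".cache", "public"]) {
    // force: true makes this a no-op when the directory is already gone.
    fs.rmSync(path.join(siteDir, dir), { recursive: true, force: true })
  }
}

// Usage: cleanBuild(process.cwd())
```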

gatsbybot commented Jun 30, 2018

Deploy preview for using-drupal ready!

Built with commit b647bb1

https://deploy-preview-6226--using-drupal.netlify.com

gatsbybot commented Jun 30, 2018

Deploy preview for gatsbygram ready!

Built with commit b647bb1

https://deploy-preview-6226--gatsbygram.netlify.com

KyleAMathews added some commits Jun 30, 2018

KyleAMathews commented Jul 4, 2018

This is coming along!

Just built a 25k-page site in a bit over 2 minutes:

success open and validate gatsby-config — 0.082 s
success onPreBootstrap — 0.071 s
success delete html and css files from previous builds — 0.007 s
success copy gatsby files — 0.043 s
success source and transform nodes — 0.019 s
success building schema — 0.133 s
success createPages — 33.258 s
success createPagesStatefully — 0.184 s
success onPreExtractQueries — 0.000 s
success update schema — 0.056 s
success extract queries from components — 0.075 s
success run graphql queries — 21.329 s
success write out page data — 32.011 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.004 s

info bootstrap finished - 91.063 s

success Building production JavaScript and CSS bundles — 13.668 s
success Building static HTML for pages — 29.937 s
info Done building in 134.79 sec

Creating pages can probably still be made a lot faster. Writing out page data is getting weirdly slow with large numbers of pages, but it's progress.
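To see where bootstrap time goes in a log like the one above, the step timings can be pulled out and sorted. A throwaway sketch: the line format is taken from the log, and `slowestSteps` is a hypothetical helper name, not part of Gatsby.

```javascript
// Extract "success <step> <em dash> <seconds> s" lines and return the n slowest.
// \u2014 is the em dash used as the separator in the build output.
function slowestSteps(log, n = 3) {
  const steps = []
  for (const line of log.split("\n")) {
    const m = line.trim().match(/^success (.+?) \u2014 ([\d.]+) s$/)
    if (m) steps.push({ step: m[1], seconds: parseFloat(m[2]) })
  }
  return steps.sort((a, b) => b.seconds - a.seconds).slice(0, n)
}
```

On the Jul 4 log this returns createPages (33.3 s), write out page data (32.0 s) and run graphql queries (21.3 s), which together account for about 87 of the 91 s of bootstrap.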

KyleAMathews added some commits Jul 4, 2018

Use forEach instead of reduce when prepping page data
For building a 25k page site, it reduced the time spent writing out page
data from 32 seconds to 0.32 seconds 😱
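The commit message does not show the code, so the following is only a guess at the kind of pattern involved: a `reduce` whose callback copies the accumulator on each step (here via object spread) is quadratic, while a `forEach` that mutates one object is linear. The page fields `path` and `jsonName` are assumptions for the sketch.

```javascript
// Hypothetical page objects; field names are assumptions for this sketch.
const makePages = n =>
  Array.from({ length: n }, (_, i) => ({ path: `/page-${i}/`, jsonName: `page-${i}.json` }))

// Quadratic: the spread copies every key accumulated so far on each step,
// so building a map of n pages costs O(n^2) property copies.
function prepWithReduce(pages) {
  return pages.reduce((acc, page) => ({ ...acc, [page.path]: page.jsonName }), {})
}

// Linear: a single object is mutated in place, O(n) total.
function prepWithForEach(pages) {
  const out = {}
  pages.forEach(page => {
    out[page.path] = page.jsonName
  })
  return out
}
```

`reduce` itself is not the problem; a `reduce` callback that mutates and returns the same accumulator is linear too. The cost comes from copying.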
KyleAMathews commented Jul 6, 2018

Aaaandddd dropped the 25k-page site build another 75%, to 32 seconds :-D

/p/t/my-hello-world gatsby build
success open and validate gatsby-config — 0.007 s
success onPreBootstrap — 0.027 s
success delete html and css files from previous builds — 0.005 s
success copy gatsby files — 0.036 s
success source and transform nodes — 0.015 s
success building schema — 0.088 s
success createPages — 4.843 s
success createPagesStatefully — 0.111 s
success onPreExtractQueries — 0.003 s
success update schema — 0.055 s
success extract queries from components — 0.076 s
success run graphql queries — 8.749 s
success write out page data — 0.328 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.001 s

info bootstrap finished - 16.599 s

success Building production JavaScript and CSS bundles — 3.022 s
success Building static HTML for pages — 12.919 s
info Done building in 32.619 sec

A 10k-page site builds in ~18 seconds and a 100k-page site builds in 175 seconds.
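The reported totals imply a per-page cost that can be checked with a little arithmetic. This sketch uses only the three numbers in this comment (10k pages in ~18 s, 25k in 32.619 s, 100k in 175 s) and ignores fixed startup overhead.

```javascript
// Totals reported in this comment; rough arithmetic, not a benchmark.
const reported = [
  { pages: 10000, seconds: 18 },
  { pages: 25000, seconds: 32.619 },
  { pages: 100000, seconds: 175 },
]

const perPage = reported.map(({ pages, seconds }) => ({
  pages,
  msPerPage: (seconds * 1000) / pages,
  pagesPerSecond: pages / seconds,
}))
console.table(perPage)
```

All three come out between about 1.3 and 1.8 ms per page, so in these runs build time grows roughly linearly with page count.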

KyleAMathews added some commits Jul 6, 2018

WIP commit to dramatically speed up graphql queries
Not really loving all the nested caching I'm adding... but it's
necessary. Will revisit this Monday to see if there's a cleaner way
to cache things.
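The commit does not show the caching it adds, so as a hypothetical illustration, "nested caching" of query results might look like a two-level memo: one `Map` per query text, each holding results per set of variables. `createNestedCache` and `runQuery` are invented names; this is not the code from the commit.

```javascript
// Two-level memoization: outer Map keyed by query text, inner Map keyed by
// serialized variables. runQuery stands in for the expensive work.
function createNestedCache(runQuery) {
  const byQuery = new Map()
  return function cachedRun(query, variables = {}) {
    let byVars = byQuery.get(query)
    if (!byVars) {
      byVars = new Map()
      byQuery.set(query, byVars)
    }
    // JSON.stringify is key-order sensitive; acceptable for a sketch where
    // callers build variables objects consistently.
    const key = JSON.stringify(variables)
    if (!byVars.has(key)) byVars.set(key, runQuery(query, variables))
    return byVars.get(key)
  }
}
```

A flatter alternative is a single `Map` keyed by a combined string of the query text and the serialized variables, which trades the nesting for one lookup.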
zachgibson commented Jul 8, 2018

> Frankly, this was more of academic interest and for debugging purposes, and given that it adds significant slowdowns and should rarely if ever be used in sites, we should hopefully be fine letting it go.

Is this still the case, or will this be a viable solution? I’m looking at using Gatsby for a 400k-page site, but I'm not sure whether things would just explode trying to do this.

KyleAMathews commented Jul 9, 2018

@zachgibson not sure I understand what you mean?

zachgibson commented Jul 9, 2018

@KyleAMathews It seemed your comment was saying this PR was an experiment. I was wondering if the techniques you ended up finding in this work would end up getting merged into core Gatsby.

m-allanson commented Jul 9, 2018

@zachgibson the plan is to merge this in to Gatsby core as soon as we can!

KyleAMathews commented Jul 10, 2018

Charted memory usage when building a 10k-page markdown site. We can probably reduce this quite a bit, but it's acceptable for now.

[Screenshot: memory usage chart for the 10k-page markdown build, taken 2018-07-10]
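Data for a memory chart like the one in this comment can be collected by sampling `process.memoryUsage()` while the work runs. A minimal sketch: `sampleMemory`, the 500 ms default interval and the MB conversion are arbitrary choices, and how the original chart was produced is not stated.

```javascript
// Start sampling; returns a stop() function that reports samples and peak RSS.
function sampleMemory(intervalMs = 500) {
  const samples = []
  const take = () => {
    const { rss, heapUsed } = process.memoryUsage()
    samples.push({ t: Date.now(), rssMB: rss / 1048576, heapUsedMB: heapUsed / 1048576 })
  }
  take() // initial sample
  const timer = setInterval(take, intervalMs)
  return function stop() {
    clearInterval(timer)
    take() // final sample
    const peakRssMB = Math.max(...samples.map(s => s.rssMB))
    return { samples, peakRssMB }
  }
}
```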

KyleAMathews merged commit 27c644b into master Jul 11, 2018

1 of 4 checks passed

deploy/netlify Deploy preview failed.
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr The Travis CI build is in progress
continuous-integration/travis-ci/push The Travis CI build passed

KyleAMathews deleted the speed-large-site branch Jul 11, 2018

tradziej commented Jul 17, 2018

@KyleAMathews I don't know how this works, but is there a need to keep all those empty directories (public/static/d/*) after the build (gatsby build) has finished?

fs.ensureDir(`${program.directory}/public/static/d/${i}`)

KyleAMathews commented Jul 17, 2018

@tradziej yeah :-) I'm going to put up a PR in a sec to not pre-create the folders.
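Not pre-creating the folders amounts to creating a directory only when the first file is written into it. A sketch using Node's built-in `fs` (`mkdirSync` with `recursive: true` needs Node 10.12+); `writeFileLazily` is a hypothetical helper. fs-extra, which provides the `ensureDir` used in the snippet quoted earlier, also has `outputFile`, which creates missing parent directories in the same way.

```javascript
const fs = require("fs")
const path = require("path")

// Create the parent directory on demand instead of pre-creating every
// public/static/d/<i> folder up front.
function writeFileLazily(filePath, contents) {
  // recursive: true is a no-op if the directory already exists.
  fs.mkdirSync(path.dirname(filePath), { recursive: true })
  fs.writeFileSync(filePath, contents)
}
```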

stoltzrobin commented Aug 16, 2018

@KyleAMathews Interesting to see your benchmarks. I'm currently trying out building ~12k pages and it takes around 22 minutes. Our bottleneck is the GraphQL queries, which take around 1000-1500 s to complete. What type of website did you build with these timings? In your output the GraphQL queries don't take very long to run; how many queries do you run?

KyleAMathews commented Aug 16, 2018

The benchmark sites are here: https://github.com/gatsbyjs/gatsby/tree/master/benchmarks

Would love to hear more details about your site in another issue to see if we can figure out how to optimize it!

stoltzrobin commented Aug 16, 2018

@KyleAMathews thanks, will check that out.

Will post another issue if I can't find any other ways to do it.

I've created an issue for this now #7373
