[v2] Hulksmash build slowdowns on larger sites #6226

Merged: 32 commits, Jul 11, 2018

Contributor

@KyleAMathews KyleAMathews commented Jun 29, 2018

Spent the afternoon and evening going through critical path for creating pages.

With these changes, a 5000-page site builds in ~37 seconds, a very significant
improvement over the current beta.

This does necessitate one breaking change, namely no longer creating nodes for each page. This frankly
was more of academic interest and for debugging purposes, and given that it adds a
large amount of slowdown and should rarely if ever be used in sites, we should
hopefully be fine letting it go. In any case, if people are using it, there are
far better ways of querying the same data.

TODO

  • Evaluate why "write out pages data" gets really slow at 25k
  • Do node profiling on adding thousands of pages
  • Remove profile calls
  • Calculate how much faster large sites are with this PR vs. current beta.
  • Do memory profiling on same site.
  • Test freeCodeCamp
  • Add benchmarking folder and first two sites: a minimal one that runs createPages in memory and a larger one that writes out markdown files
  • Add progress indicator for longer-running things like graphql queries + page building w/ final status at end
  • Remove extra packages from benchmark sites

@KyleAMathews
Contributor Author

KyleAMathews commented Jun 29, 2018

Deploy preview for using-glamor failed.

Built with commit fa32e5d

https://app.netlify.com/sites/using-glamor/deploys/5b457017c6aed64e9461ef06

@KyleAMathews
Contributor Author

Hmm, and a 25,000-page site builds in ~7.5 minutes. The file for pages metadata is now enormous (~4 MB), so we'll need to fix that, but this is all very promising.

@KyleAMathews
Contributor Author

Hmmm, gatsbyjs.org isn't building since StaticQueries aren't being run during builds. Is this a known issue? Been heads down last few days.

}

// Delete internal data from pageContext
delete result.pageContext.jsonName
Contributor

Refs: #5096

@pieh
Contributor

pieh commented Jun 29, 2018

This does necessitate one breaking change, namely no longer creating nodes for each page. This frankly
was more of academic interest and for debugging purposes, and given that it adds a
large amount of slowdown and should rarely if ever be used in sites, we should
hopefully be fine letting it go. In any case, if people are using it, there are
far better ways of querying the same data.

I think this would break the current implementation of the sitemap plugin, but we can get that info from the redux store instead of querying for it. This would also allow us to skip the update schema step (which is currently needed to add SitePage-related queries to the schema).

Switching the nodes reducer to a Map and mutating it instead of creating new state for every CREATE_NODE action will definitely help here too.

Hmmm, gatsbyjs.org isn't building since StaticQueries aren't being run during builds. Is this a known issue? Been heads down last few days.

I'm not seeing this here
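
The reducer suggestion above can be illustrated with a minimal sketch. Both reducers below are hypothetical stand-ins written for this example (the names `nodesReducerImmutable` and `nodesReducerMap` and the action payload shape are assumptions, not Gatsby's actual code). The first copies the whole state object on every `CREATE_NODE`; the second mutates a single shared `Map`.

```javascript
// Hypothetical reducers for illustration only; not Gatsby's real implementation.

// Immutable style: every CREATE_NODE copies all existing entries into a new
// object, so adding n nodes does O(n^2) copying overall.
function nodesReducerImmutable(state = {}, action) {
  switch (action.type) {
    case "CREATE_NODE":
      return { ...state, [action.payload.id]: action.payload }
    default:
      return state
  }
}

// Map style: one shared Map is mutated in place, so each CREATE_NODE is a
// single constant-time insertion.
function nodesReducerMap(state = new Map(), action) {
  switch (action.type) {
    case "CREATE_NODE":
      state.set(action.payload.id, action.payload)
      return state
    default:
      return state
  }
}

// Assumed action shape: { type, payload: { id, ... } }
const actions = Array.from({ length: 1000 }, (_, i) => ({
  type: "CREATE_NODE",
  payload: { id: `node-${i}`, path: `/page-${i}/` },
}))

// Passing undefined as the initial value lets each reducer's default apply.
const objectState = actions.reduce(nodesReducerImmutable, undefined)
const mapState = actions.reduce(nodesReducerMap, undefined)

console.log(Object.keys(objectState).length, mapState.size)
```

The trade-off is that the mutated `Map` keeps the same reference after every action, so any consumer that relies on reference equality to detect state changes would need another signal.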

@m-allanson
Contributor

m-allanson commented Jun 29, 2018

Hmmm, gatsbyjs.org isn't building since StaticQueries aren't being run during builds. Is this a known issue? Been heads down last few days.

I'm not seeing this here

Me neither. Seems like nice speedups for .org. Unscientific tests running gatsby build a few times (with warm cache):

current beta
✨  Done in 130.25s.
✨  Done in 97.00s.
✨  Done in 163.13s.

this PR
✨  Done in 72.71s.
✨  Done in 70.16s.
✨  Done in 74.66s.

Edit: oh yeah, gatsby develop doesn't seem quite right. Content changes take a long time and StaticQuery doesn't always update.
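
For a rough sense of scale, the six timings above can be averaged. This is only arithmetic on the numbers quoted in the comment; the variable names are made up for the example, and three runs per side is too few to draw firm conclusions.

```javascript
// Timings in seconds, copied from the comment above.
const currentBeta = [130.25, 97.0, 163.13]
const thisPR = [72.71, 70.16, 74.66]

// Arithmetic mean of an array of numbers.
const mean = xs => xs.reduce((sum, x) => sum + x, 0) / xs.length

const betaMean = mean(currentBeta)
const prMean = mean(thisPR)
const speedup = betaMean / prMean

console.log(`current beta mean: ${betaMean.toFixed(2)} s`)
console.log(`this PR mean: ${prMean.toFixed(2)} s`)
console.log(`speedup: ${speedup.toFixed(2)}x`)
```

The PR runs are also much more tightly clustered (about 70 to 75 seconds) than the beta runs (97 to 163 seconds).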

@KyleAMathews
Contributor Author

Switching the nodes reducer to a Map and mutating it instead of creating new state for every CREATE_NODE action will definitely help here too.

Oh, great point. That would speed up creating the SitePage nodes a ton. I'll try that too in this PR before declaring SitePage nodes dead.

@KyleAMathews
Contributor Author

I'm not seeing this here

Hrrmmm... weird. I deleted node_modules and yarn.lock. Will keep poking at this.

@KyleAMathews
Contributor Author

This would also allow us to skip the update schema step (which is currently needed to add SitePage-related queries to the schema).

This would be nice still. Though it seems your changes have been making the schema generation a lot faster.

@@ -13,7 +13,7 @@ module.exports = async (program: any) => {

 debug(`generating static HTML`)
 // Reduce pages objects to an array of paths.
-const pages = store.getState().pages.map(page => page.path)
+const pages = [...store.getState().pages.values()].map(page => page.path)
Contributor

Array.from(store.getState().pages.values(), page => page.path) maybe?

Contributor Author

Oh haha — didn't know this was part of Array.from. Yeah, totally makes sense.

Contributor

I just learned it myself!
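
The difference discussed in this thread can be seen in a small standalone sketch. The `pages` Map below is a made-up stand-in for `store.getState().pages`; only `Array.from` and the spread syntax are taken from the review comments.

```javascript
// Stand-in for store.getState().pages once it is a Map keyed by path.
const pages = new Map([
  ["/", { path: "/" }],
  ["/about/", { path: "/about/" }],
  ["/blog/", { path: "/blog/" }],
])

// Spread first, then map: builds an intermediate array of page objects.
const viaSpread = [...pages.values()].map(page => page.path)

// Array.from takes an optional map function as its second argument,
// so the paths are produced in a single pass with no intermediate array.
const viaArrayFrom = Array.from(pages.values(), page => page.path)

console.log(viaSpread)
console.log(viaArrayFrom)
```

Both forms yield the same array of paths; the `Array.from` form just skips the temporary array, which can matter when there are tens of thousands of pages.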

@@ -81,7 +90,7 @@ const findIdsWithoutDataDependencies = () => {
 // paths.
 const notTrackedIds = _.difference(
 [
-...state.pages.map(p => p.path),
+...[...state.pages.values()].map(p => p.path),
 ...[...state.staticQueryComponents.values()].map(c => c.jsonName),
Contributor

We should probably do Array.from(set, mapFn) here as well.

@KyleAMathews
Contributor Author

Finally got gatsbyjs.org to work — had deleted public/static/d without also deleting the .cache directory 🤦‍♂️

Builds are taking anywhere between ~55-80 seconds on a warm cache. Not bad!

@gatsbybot
Collaborator

gatsbybot commented Jun 30, 2018

Deploy preview for using-drupal ready!

Built with commit b647bb1

https://deploy-preview-6226--using-drupal.netlify.com

@gatsbybot
Collaborator

gatsbybot commented Jun 30, 2018

Deploy preview for gatsbygram ready!

Built with commit b647bb1

https://deploy-preview-6226--gatsbygram.netlify.com

@KyleAMathews
Contributor Author

This is coming along!

Just built a 25k page site in a bit over 2 minutes

success open and validate gatsby-config — 0.082 s
success onPreBootstrap — 0.071 s
success delete html and css files from previous builds — 0.007 s
success copy gatsby files — 0.043 s
success source and transform nodes — 0.019 s
success building schema — 0.133 s
success createPages — 33.258 s
success createPagesStatefully — 0.184 s
success onPreExtractQueries — 0.000 s
success update schema — 0.056 s
success extract queries from components — 0.075 s
success run graphql queries — 21.329 s
success write out page data — 32.011 s
success write out redirect data — 0.001 s
success onPostBootstrap — 0.004 s

info bootstrap finished - 91.063 s

success Building production JavaScript and CSS bundles — 13.668 s
success Building static HTML for pages — 29.937 s
info Done building in 134.79 sec

Creating pages can probably still be made a lot faster. Writing out page data is getting weirdly slow with a large number of pages, but it's progress.

@KyleAMathews
Contributor Author

Aaaandddd dropped the 25k-page site build another 75% to 32 seconds :-D

/p/t/my-hello-world gatsby build
success open and validate gatsby-config — 0.007 s
success onPreBootstrap — 0.027 s
success delete html and css files from previous builds — 0.005 s
success copy gatsby files — 0.036 s
success source and transform nodes — 0.015 s
success building schema — 0.088 s
success createPages — 4.843 s
success createPagesStatefully — 0.111 s
success onPreExtractQueries — 0.003 s
success update schema — 0.055 s
success extract queries from components — 0.076 s
success run graphql queries — 8.749 s
success write out page data — 0.328 s
success write out redirect data — 0.002 s
success onPostBootstrap — 0.001 s

info bootstrap finished - 16.599 s

success Building production JavaScript and CSS bundles — 3.022 s
success Building static HTML for pages — 12.919 s
info Done building in 32.619 sec

A 10k page site builds in ~18 seconds and a 100k page site builds in 175 seconds.
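
As a quick check on these figures, the sketch below recomputes the reduction between the two 25k-page builds and the pages-per-second rate for each site size. It uses only the numbers reported in this thread; the variable names are made up for the example, and the 10k and 100k times are the approximate values quoted above.

```javascript
// Total build times in seconds for the 25k-page site, from the two logs above.
const before = 134.79
const after = 32.619

// Fraction of build time removed between the two runs.
const reduction = (before - after) / before

// Reported totals for the three site sizes.
const builds = [
  { pages: 10000, seconds: 18 },
  { pages: 25000, seconds: 32.619 },
  { pages: 100000, seconds: 175 },
]

// Pages built per second at each size.
const throughput = builds.map(b => ({
  pages: b.pages,
  pagesPerSecond: b.pages / b.seconds,
}))

console.log(`reduction: ${(reduction * 100).toFixed(1)}%`)
throughput.forEach(t =>
  console.log(`${t.pages} pages: ${t.pagesPerSecond.toFixed(0)} pages/s`)
)
```

The rate stays in the same range (several hundred pages per second) across all three sizes, which suggests build time is now scaling roughly linearly with page count on these benchmark sites.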

Not really loving all the nested caching I'm adding... but it's necessary. Will revisit this Monday to see if there's a cleaner way to cache things.

@zachgibson

This frankly was more of academic interest and for debugging purposes, and given that it adds a large amount of slowdown and should rarely if ever be used in sites, we should hopefully be fine letting it go.

Is this still the case, or will this be a viable solution? I’m looking at using Gatsby for a 400k page site, but not sure if things would just explode trying to do this.

@KyleAMathews
Contributor Author

@zachgibson not sure I understand what you mean?

@zachgibson

@KyleAMathews It seemed your comment was saying this PR was an experiment. I was wondering if the techniques you ended up finding in this work would end up getting merged into core Gatsby.

@m-allanson
Contributor

@zachgibson the plan is to merge this into Gatsby core as soon as we can!

@tradziej

@KyleAMathews I don't know how this works, but is there a need to keep all those empty directories (public/static/d/*) after the build (gatsby build) has finished?

fs.ensureDir(`${program.directory}/public/static/d/${i}`)

@KyleAMathews
Contributor Author

@tradziej yeah :-) I'm going to put up a PR in a sec to not pre-create the folders.
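
One way to avoid pre-creating folders is to create each directory only at the moment a file is written into it. The sketch below is an assumption about how that could look, not the follow-up PR's actual code: `writeDataFile`, the `bucket` argument, and the `.json` file name are hypothetical, and it uses Node's built-in `fs.mkdirSync` with `recursive: true` rather than the `fs.ensureDir` call quoted above.

```javascript
const fs = require("fs")
const os = require("os")
const path = require("path")

// Hypothetical helper: ensures the target directory exists right before
// writing, so directories that never receive a file are never created.
function writeDataFile(publicDir, bucket, jsonName, data) {
  const dir = path.join(publicDir, "static", "d", String(bucket))
  // recursive: true creates missing parents and does not throw if dir exists.
  fs.mkdirSync(dir, { recursive: true })
  const filePath = path.join(dir, `${jsonName}.json`)
  fs.writeFileSync(filePath, JSON.stringify(data))
  return filePath
}

// Demo in a throwaway temp directory standing in for the site's public folder.
const publicDir = fs.mkdtempSync(path.join(os.tmpdir(), "public-"))
const written = writeDataFile(publicDir, 42, "index", { hello: "world" })

// Only the bucket that actually received a file exists.
const buckets = fs.readdirSync(path.join(publicDir, "static", "d"))
console.log(buckets, written)
```

With this approach the number of directories under public/static/d tracks the number of buckets in use rather than a fixed upper bound.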

@stoltzrobin
Contributor

@KyleAMathews Interesting to see your benchmarks. At the moment I'm trying to build ~12k pages and it takes around 22 minutes. Our bottleneck is the GraphQL queries, which take around 1000s-1500s to complete. What type of website did you build to get these timings? In your output the GraphQL queries don't take that long to run; how many queries do you run?

@KyleAMathews
Contributor Author

The benchmark sites are at https://github.com/gatsbyjs/gatsby/tree/master/benchmarks

Would love to hear more details about your site in another issue to see if we can figure out how to optimize it!

@stoltzrobin
Contributor

stoltzrobin commented Aug 16, 2018

@KyleAMathews thanks, will check that out.

Will post another issue if I can't find out any other ways to do it.

I've created an issue for this now #7373

9 participants