perf(gatsby): Shortcut trivial queries by id, potential huge win #20609

pvdz · 2020-01-14T21:52:26Z

While there is a check for queries by id, this one circumvents a few more steps. It prevents having to build up an array based on type. Instead, if it sees a query by id, it will immediately just fetch that node directly.

This makes many large sites that use plain queries by id a helluvalot faster. One site that made me look into this problem with 145k pages went down from 4.5 hours to about 5 minutes (that is not a typo).
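
As a rough illustration of the idea (not the code in this PR), the sketch below checks whether a filter is nothing more than an equality test on `id` and, if so, answers it with a single `Map` lookup instead of scanning. The store layout, the filter shape `{ id: { eq: ... } }`, and the names `nodesById`, `getTrivialIdFilter`, `scanAllNodes`, and `runQuery` are all assumptions made for this example.

```javascript
// Hypothetical in-memory store: node id -> node. Not Gatsby's actual store.
const nodesById = new Map([
  ["a1", { id: "a1", internal: { type: "Page" }, title: "Home" }],
  ["b2", { id: "b2", internal: { type: "Post" }, title: "Hello" }],
  ["c3", { id: "c3", internal: { type: "Post" }, title: "World" }],
])

// Returns the id if the filter is exactly `{ id: { eq: "<id>" } }`, else undefined.
function getTrivialIdFilter(filter) {
  const fields = Object.keys(filter)
  if (fields.length !== 1 || fields[0] !== "id") return undefined
  if (typeof filter.id !== "object" || filter.id === null) return undefined
  const ops = Object.keys(filter.id)
  if (ops.length !== 1 || ops[0] !== "eq") return undefined
  return typeof filter.id.eq === "string" ? filter.id.eq : undefined
}

// Slow path: visit every node and test it (only `eq` is supported in this sketch).
function scanAllNodes(filter, typeName) {
  return [...nodesById.values()].filter(
    node =>
      node.internal.type === typeName &&
      Object.entries(filter).every(([field, ops]) => node[field] === ops.eq)
  )
}

function runQuery(filter, typeName) {
  const id = getTrivialIdFilter(filter)
  if (id === undefined) return scanAllNodes(filter, typeName)
  // Fast path: one hash lookup, no per-type array is built.
  const node = nodesById.get(id)
  return node && node.internal.type === typeName ? [node] : []
}
```

In this sketch the fast path still checks the node's type, so a by-id query against the wrong type returns an empty result rather than a node of another type.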

KyleAMathews · 2020-01-14T21:54:49Z

🔥

DSchau · 2020-01-14T21:59:41Z

Nice!! Well done 👏

TylerBarnes · 2020-01-14T22:07:46Z

😮 wow! amazing work!

wardpeet

Wow! 🔥 This looks awesome, I'm excited to see what it does for .org.

I've added a few nitpicks & questions. I have no idea how redux/nodes work so I'm going to defer that part to our graphql gurus.

packages/gatsby/src/redux/nodes.js

packages/gatsby/src/redux/run-sift.js

vladar

This sounds fantastic! But I'd also wait for a review from @freiksenet as this is one of the most complex parts in Gatsby core and I don't have enough expertise on it yet.

vladar · 2020-01-15T09:11:15Z

packages/gatsby/src/redux/nodes.js

+  }
+
+  return node
+}


This function looks a lot like addResolvedNodes. I guess it is a special case of it so maybe we could re-use common logic of those two.

The important point is that it doesn't iterate the entire list of nodes. That's very important when scaling up. And since nodes is a Map, there's no short-circuit mechanism like .some or .every. Any short-circuiting approach would mean doing a full loop through all elements, or worse (generating an array with all keys or something, before being able to do a partial loop).

If we were to track a shadow array with all keys of the map, then short-circuiting would be an option. This would cut down mandatory Map-induced O(n) or O(n^2) loops by half. Not relevant for big O, but very relevant if the total runtime is "only" 2h. But that's for another time. And it would still be miles worse than the O(1) operation (one hash lookup) the new function offers.

There are really just three steps in the new function: get nodes by type, fetch node by index, decorate node. The existing function does the first and, in a loop, the second.
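
To make the comparison concrete, here is a minimal sketch of the three steps next to a loop-based variant. It is not the code from `nodes.js`: the nested `Map` layout, the `resolved` property, and the names `nodesByType`, `resolvedByType`, `decorate`, `getDecoratedNodesOfType`, and `getDecoratedNodeById` are hypothetical and chosen only for illustration.

```javascript
// Hypothetical store layout: type name -> (node id -> node).
const nodesByType = new Map([
  ["Post", new Map([
    ["b2", { id: "b2", title: "Hello" }],
    ["c3", { id: "c3", title: "World" }],
  ])],
])

// Hypothetical resolved-field cache: type name -> (node id -> resolved fields).
const resolvedByType = new Map([
  ["Post", new Map([["b2", { slug: "/hello" }]])],
])

// Decorate: return a copy of the node with its resolved fields attached, if any.
function decorate(node, resolvedForType) {
  const resolved = resolvedForType && resolvedForType.get(node.id)
  return resolved ? { ...node, resolved } : node
}

// Loop-based variant: get nodes by type once, then do the per-node work
// for every node of that type.
function getDecoratedNodesOfType(typeName) {
  const nodes = nodesByType.get(typeName) || new Map()
  const resolvedForType = resolvedByType.get(typeName)
  return [...nodes.values()].map(node => decorate(node, resolvedForType))
}

// Direct variant: the same three steps for a single node, with no loop.
function getDecoratedNodeById(id, typeName) {
  const nodes = nodesByType.get(typeName) // step 1: get nodes by type
  const node = nodes && nodes.get(id) // step 2: fetch the node (one hash lookup)
  if (!node) return undefined
  return decorate(node, resolvedByType.get(typeName)) // step 3: decorate
}
```

The two functions share only the small `decorate` helper here, which hints at why merging them further may not buy much.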

While it's possible to share code, I'm not seeing a solution that isn't at least as bad, if not worse, in terms of maintenance than the existing duplication. I am open to suggestions.

pvdz · 2020-01-15T09:24:06Z

I had jumped on a call with @freiksenet for a sanity check to confirm this before creating these PRs :) But I welcome his review.

pvdz requested a review from a team as a code owner January 14, 2020 21:52

wardpeet reviewed Jan 14, 2020

vladar reviewed Jan 15, 2020

vladar requested a review from freiksenet January 15, 2020 09:16

pvdz force-pushed the real-by-id-shortcut branch from ad345a8 to 3e2e62e Compare January 15, 2020 09:35

freiksenet previously approved these changes Jan 15, 2020

pvdz dismissed freiksenet’s stale review via 7ee8127 January 15, 2020 11:53

pvdz force-pushed the real-by-id-shortcut branch from 3e2e62e to 7ee8127 Compare January 15, 2020 11:53

freiksenet approved these changes Jan 15, 2020

pvdz merged commit fa4ff69 into master Jan 15, 2020

delete-merged-branch bot deleted the real-by-id-shortcut branch January 15, 2020 15:47

This was referenced Jan 15, 2020

Is there a hard limit on maximum number of pages that Gatsby can build? #20338

Closed

perf(gatsby): Create index on the fly for non-id index #20729

Merged

ascorbic mentioned this pull request Jan 23, 2020

Explore page build optimisations #20785

Closed

pvdz mentioned this pull request Jan 28, 2020

[Request] Real-world Gatsby sites (50k+ pages) #19512

Closed

me4502 mentioned this pull request Mar 6, 2020

Direct-ID lookup for nodes in graphql fails in some instances #22004

Closed
