Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(gatsby): Shortcut trivial queries by id, potential huge win #20609

Merged
merged 1 commit into from Jan 15, 2020

Conversation

pvdz
Copy link
Contributor

@pvdz pvdz commented Jan 14, 2020

While there is a check for queries by id, this one circumvents a few more steps. It prevents having to build up an array based on type. Instead, if it sees a query by id, it will immediately just fetch that node directly.

This makes many large sites that use plain queries by id a helluvalot faster. One site that made me look into this problem with 145k pages went down from 4.5 hours to about 5 minutes (that is not a typo).

@pvdz pvdz requested a review from a team as a code owner January 14, 2020 21:52
@KyleAMathews
Copy link
Contributor

🔥

@DSchau
Copy link
Contributor

DSchau commented Jan 14, 2020

Nice!! Well done 👏

@TylerBarnes
Copy link
Contributor

😮 wow! amazing work!

Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! 🔥 This looks awesome, i'm excited what it should do for .org.

I've added a few nitpicks & questions. I have no idea how redux/nodes work so I'm going to defer that part to our graphql gurus.

packages/gatsby/src/redux/nodes.js Outdated Show resolved Hide resolved
packages/gatsby/src/redux/run-sift.js Show resolved Hide resolved
Copy link
Contributor

@vladar vladar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds fantastic! But I'd also wait for a review from @freiksenet as this is one of the most complex parts in Gatsby core and I don't have enough expertise on it yet.

}

return node
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks a lot like addResolvedNodes. I guess it is a special case of it so maybe we could re-use common logic of those two.

Copy link
Contributor Author

@pvdz pvdz Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important point is that it doesn't iterate the entire list of nodes. That's very important when scaling up. And since nodes is a Map, there's no short circuit mechanism like .some or .every. Any short cicuiting approach would mean doing a full loop through all elements, or worse (generate an array with all keys or smth, before being able to do a partial loop).

If we were to track a shadow array with all keys of the map then short circuiting would be an option. This would cut down mandatory Map induced O(n) or O(n^2) loops by half. Not relevant for big oh, but very relevant if the total runtime is "only" 2h. But that's for another time. And would still be miles worse than the O(1) operation (-> one hash lookup) the new function offers.

There's really just three steps in the new function; get nodes by type, fetch node by index, decorate node. The existing function does the first and, in a loop, the second.

While it's possible to share code, I'm not seeing a solution where the solution isn't at least as bad-if-not-worse in terms of maintanance than the existing duplication. I am open to suggestions.

@pvdz
Copy link
Contributor Author

pvdz commented Jan 15, 2020

I had jumped on a call with @freiksenet for a sanity check to confirm this before creating these PR's :) But I welcome his review.

freiksenet
freiksenet previously approved these changes Jan 15, 2020
While there is a check for queries by `id`, this one circumvents a few more steps. It prevents having to build up an array based on type. Instead, if it sees a query by `id`, it will immediately just fetch that node directly.

This makes many sites that use plain queries by id a helluvalot faster. One site that made me look into this problem went down from 4.5 hours to about 5 minutes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants