Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(query): improve query performance #17682

Merged
merged 10 commits into from Sep 25, 2019
Merged

Conversation

@me4502
Copy link
Member

me4502 commented Sep 17, 2019

Description

One of the big caveats currently of large sites in Gatsby is having to use gatsby-node when performing large numbers of queries. This has downsides, such as losing the ability to reload pages when the data changes.

I encountered an issue recently which made it much more preferable to use page queries, so I've looked into /why/ this is slow.

This PR is only a start, but it's taken 18k page queries from 11 minutes to 4.5 minutes, so over a 50% reduction in build time, which I feel is a decent improvement.

@me4502 me4502 requested a review from gatsbyjs/core Sep 17, 2019
@KyleAMathews

This comment has been minimized.

Copy link
Contributor

KyleAMathews commented Sep 17, 2019

Wow! This is awesome.

Could you check how this improves the markdown benchmark scores? https://github.com/gatsbyjs/gatsby/blob/master/benchmarks/markdown/README.md

@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 17, 2019

I just ran the benchmark, it doesn't seem to be as dramatic in there (assuming I ran it correctly)

I've been testing with my normal site in develop though, and I feel this would make more of an impact there.

Here are the benchmark times for build,

Before: 424.935
After: 381.775

I ran it, then checked out the other branch then ran it again - is that correct?

@KyleAMathews

This comment has been minimized.

Copy link
Contributor

KyleAMathews commented Sep 17, 2019

That's a good way to test things.

Query running used to at least have more caching than during development. That might have changed with all the changes @gatsbyjs/core has been doing.

@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 17, 2019

I've worked out why it's much faster in my site - due to extra data dependencies this code gets called a lot more often for the same page count.

@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 17, 2019

A vast majority of the time is now spent within Sift itself. I ran a test with Loki, and my local site that was 11 minutes before this change (using Sift), is now 80 seconds with Loki.

Edit: Now 70 :)

@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 17, 2019

So just as a final summary of this PR:

Using Loki, queries are almost instant from my testing. The second the query needs to offload to Sift it becomes slower. It's still magnitudes faster than it was before this PR, but Sift now appears to be the biggest slowdown.

@muescha

This comment has been minimized.

Copy link
Contributor

muescha commented Sep 17, 2019

In node-model you also changed to new Set() ... Has from includes
Maybe revert this also

@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 17, 2019

Ah thanks, I’ll do that when I’m back at a computer (if a Gatsby team member wants to merge before then feel free to change those two instances in node-model)

@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 17, 2019

Using Loki, queries are almost instant from my testing. The second the query needs to offload to Sift it becomes slower. It's still magnitudes faster than it was before this PR, but Sift now appears to be the biggest slowdown.

sift + redux store doesn't have indexing (on fields other than id - see explanation below) and it was primary reason to add loki backend which scales much better as datasets grows

I just ran the benchmark, it doesn't seem to be as dramatic in there (assuming I ran it correctly)

I think this is because of query/filter we use in this benchmark:

fakeMarkdown(id: { eq: $id }) {

In sift we have special case for this:

// If the the query for single node only has a filter for an "id"
// using "eq" operator, then we'll just grab that ID and return it.
if (isEqId(firstOnly, siftFilter)) {
const node = getNode(siftFilter[0].id[`$eq`])
if (
!node ||
(node.internal && !nodeTypeNames.includes(node.internal.type))
) {
return []
}
return [node]
}

So this is one case where redux+sift is in the same ballpark as loki when it comes to filtering performance (mostly because we don't trully use sift there)

Might be interesting to change this query from selecting id to something else (or add something else to filter so we don't hit this optimization) and see results then

@pieh pieh self-assigned this Sep 17, 2019
@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 17, 2019

Ah that makes sense, thanks. I’ll make the query a bit more complex in the benchmark in the morning as well :)

@KyleAMathews

This comment has been minimized.

Copy link
Contributor

KyleAMathews commented Sep 17, 2019

Really exciting to see the speedups on your site! We're looking to make Loki default soon so other sites can see the benefits. Thanks for all your investigation work on this.

@me4502 me4502 force-pushed the clipchamp:feature/speedy-queries branch from 6078874 to 7c7f77c Sep 22, 2019
@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 24, 2019

@me4502 let me know if you are fine with pushing to your branch? I did some adjustments to query benchmark to be able to test this (I probably will create different PR for it) - but results are astounding:

Current master:

QUERY_LINKED_NODES=1 gatsby build
[...]
success run page queries - 56.825 s — 901/901 15.86 queries/second
[...]
info Done building in 68.015 sec

With this PR:

QUERY_LINKED_NODES=1 gatsby build
[...]
success run page queries - 11.681 s — 901/901 77.30 queries/second
[...]
info Done building in 26.15 sec

I've also created some tests in #17813 to catch issues I manually discovered earlier in review, so would be good to get that one in before this PR

@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 24, 2019

Yeah I'm happy with you pushing to the branch :)

@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 24, 2019

@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 24, 2019

Hmm, that option appears to already be checked. It may be a setting on the organisation that the fork was made in preventing pushes, I don't think I have the ability to change those settings. If you send a patch file I could apply it?

@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 24, 2019

Don't worry about it - I'll create standalone PR for that benchmark change (and then you could rebase/merge master in)

@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 24, 2019

#17837

Takeaways from this benchmark is that query running slow downs in master pretty quickly as more and more data deps are being tracked, and with this PR queries per second metric is pretty stable

@me4502 me4502 force-pushed the clipchamp:feature/speedy-queries branch from 7c7f77c to 7c24322 Sep 24, 2019
@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 24, 2019

Just rebased on master, thanks :)

@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 24, 2019

I hope to finish tests ( #17813 ) soon and merge that one in too (just verifying if my changes there to mock persistent loki storage don't break regular usage of loki there now)

@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 24, 2019

Tests pr was merged - could you rebase again?

@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 24, 2019

I think this is in good spot right now, let's just wait for

Tests pr was merged - could you rebase again?

and we can merge this in

me4502 added 10 commits Sep 17, 2019
This reverts commit 901d462.
@me4502 me4502 force-pushed the clipchamp:feature/speedy-queries branch from 7c24322 to be623ff Sep 24, 2019
@me4502

This comment has been minimized.

Copy link
Member Author

me4502 commented Sep 24, 2019

I rebased, and the checks have all passed :)

@pieh
pieh approved these changes Sep 25, 2019
Copy link
Contributor

pieh left a comment

Awesome work on this @me4502!

Will merge this first thing tomorrow!

@pieh pieh merged commit a1b1396 into gatsbyjs:master Sep 25, 2019
20 checks passed
20 checks passed
Danger All good
Details
Peril All green. Well done.
Details
ci/circleci: bootstrap Your tests passed on CircleCI!
Details
ci/circleci: e2e_tests_development_runtime Your tests passed on CircleCI!
Details
ci/circleci: e2e_tests_gatsby-image Your tests passed on CircleCI!
Details
ci/circleci: e2e_tests_path-prefix Your tests passed on CircleCI!
Details
ci/circleci: e2e_tests_production_runtime Your tests passed on CircleCI!
Details
ci/circleci: integration_tests_gatsby_pipeline Your tests passed on CircleCI!
Details
ci/circleci: integration_tests_long_term_caching Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
ci/circleci: starters_validate Your tests passed on CircleCI!
Details
ci/circleci: themes_e2e_tests_development_runtime Your tests passed on CircleCI!
Details
ci/circleci: themes_e2e_tests_production_runtime Your tests passed on CircleCI!
Details
ci/circleci: unit_tests_node10 Your tests passed on CircleCI!
Details
ci/circleci: unit_tests_node12 Your tests passed on CircleCI!
Details
ci/circleci: unit_tests_node8 Your tests passed on CircleCI!
Details
ci/circleci: unit_tests_www Your tests passed on CircleCI!
Details
ci/circleci: windows_unit_tests Your tests passed on CircleCI!
Details
cypress: default-group 67 tests passed in 00:29
Details
unit_tests_windows Build #20190924.99 succeeded
Details
@pieh

This comment has been minimized.

Copy link
Contributor

pieh commented Sep 25, 2019

Published gatsby@2.15.24

@me4502 me4502 deleted the clipchamp:feature/speedy-queries branch Sep 25, 2019
@sw-yx

This comment has been minimized.

Copy link
Contributor

sw-yx commented Oct 1, 2019

this is awesome!! question from the audience.. what is Loki and Sift?

@jonniebigodes

This comment has been minimized.

Copy link
Contributor

jonniebigodes commented Oct 1, 2019

@sw-yx sift and lokijs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.