Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gatsby-source-graphql): Query batching #22347

Merged
merged 16 commits into from
Mar 25, 2020

Conversation

vladar
Copy link
Contributor

@vladar vladar commented Mar 16, 2020

Description

This PR allows gatsby-source-graphql users to improve performance 2-10 times by introducing query batching strategy along with a new environment variable for controlling query concurrency.

By default, gatsby-source-graphql executes each query in a separate network request. Query batching strategy implemented in this PR merges several queries into a single query and saves several network round-trips.

Consider the following queries:

{
  query: `query(id: Int!) {
    node(id: $id) {
      foo
    }
  }`,
  variables: { id: 1 },
}
{
  query: `query(id: Int!) {
    node(id: $id) {
      bar
    }
  }`,
  variables: { id: 2 },
}

They will be merged into a single query:

{
  query: `
    query(
      $gatsby0_id: Int!
      $gatsby1_id: Int!
    ) {
      gatsby0_node: node(id: $gatsby0_id) {
        foo
      }
      gatsby1_node: node(id: $gatsby1_id) {
        bar
      }
    }
  `,
  variables: {
    gatsby0_id: 1,
    gatsby1_id: 2,
  }
}

Then the response will be transformed back to the shape expected for original queries.

Caveat: Batching is only possible for queries starting approximately at the same time. In other words, it is bounded by the number of parallel GraphQL queries executed by Gatsby (by default it is 4).

To work around this limitation this PR also introduces a new environment variable GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY.

By tweaking this concurrency level and number of queries in batch it possible to significantly improve the performance in some cases.

Bench on my test site (queries per second, bigger is better):

no batching 5 queries per batch 10 queries per batch
Concurrency: 4 (baseline) 5 q/s 4.8 q/s 4.8 q/s
Concurrency: 20 18.95 q/s 19.07 q/s 22.73 q/s
Concurrency: 50 15 q/s 40.3 q/s 45 q/s
Concurrency: 100 server died :) 44.3 q/s 48 q/s

Documentation

The documentation is outlined in the README.md (happy for any advice)

Related Issues

Fixes #13425

@vladar vladar requested review from a team as code owners March 16, 2020 19:17
Copy link
Contributor

@laurieontech laurieontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome Vlad! I know this is a WIP so did an editing pass on the README and can do a more thorough review when it's ready.

packages/gatsby-source-graphql/README.md Outdated Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Outdated Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Outdated Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Outdated Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Outdated Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Outdated Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Show resolved Hide resolved
packages/gatsby-source-graphql/README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@freiksenet freiksenet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!

Some minor comments.

@laurieontech
Copy link
Contributor

I'd actually prefer that we remove the object wrapper from those queries to make it a valid graphql query. It'll help clarify things since otherwise everything is a js tag. I'm mixed on the js vs. json. I'll let you pick which you prefer.

@vladar
Copy link
Contributor Author

vladar commented Mar 20, 2020

@laurieontech So GraphQL request is sent to the server as JSON:

{
  "query": "query ($id: Int!) { foo(id: $id) }",
  "variables": {
    "id": "id",
  }
}

And batching transforms both the query and the variables. So we must somehow reflect it in the example. That's why we have this wrapper vs just graphql query.

P.S. Thanks a lot for your feedback on docs 🌮

vladar and others added 12 commits March 23, 2020 12:40
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
Co-Authored-By: LB <laurie@gatsbyjs.com>
@vladar vladar force-pushed the source-graphql-query-batching branch from 6860b72 to 1e4cb72 Compare March 23, 2020 07:56
@laurieontech
Copy link
Contributor

Ah, that makes sense. Hadn't thought about the transformation piece. In that case I'd go for JS as well. So keep it as is :)

@vladar vladar added the bot: merge on green Gatsbot will merge these PRs automatically when all tests passes label Mar 25, 2020
@gatsbybot gatsbybot merged commit 2a4c7fd into master Mar 25, 2020
@delete-merged-branch delete-merged-branch bot deleted the source-graphql-query-batching branch March 25, 2020 08:25
@vladar
Copy link
Contributor Author

vladar commented Mar 25, 2020

Published in gatsby@2.20.5 and gatsby-source-graphql@2.3.0

@yaacovCR
Copy link
Contributor

yaacovCR commented Apr 6, 2020

@vladar, any chance this batching link can be broken off into a separate package, even included within graphql-tools?

Would be helpful to community, I think.

@vladar
Copy link
Contributor Author

vladar commented Apr 6, 2020

The problem is that some parts of this code rely on Gatsby validations for simplicity. So it may fail in several edge cases outside of Gatsby. It is possible to make it work everywhere, it just requires time.

I think a good idea would be to wait for feedback from Gatsby users and then move this to a separate package (or graphql-tools-fork).

yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Does not pass `info` argument to the batched executor call, disabling any executors that used extensions annotated onto info.

TODO:
- Add testing!
- Extensions support should be added by a custom option?

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Does not pass `info` argument to the batched executor call, disabling any executors that used extensions annotated onto info.

TODO:
- Add testing!
- Extensions support should be added by a custom option?

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Does not pass `info` argument to the batched executor call, disabling any executors that used extensions annotated onto info.

TODO:
- Add testing!
- Extensions support should be added by a custom option?

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954

fix
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add additional testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add additional testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add additional testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
WIP:

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

TODO:
- Add additional testing!

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
yaacovCR added a commit to ardatan/graphql-tools that referenced this pull request Aug 31, 2020
When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954
ardatan pushed a commit to ardatan/graphql-tools that referenced this pull request Sep 1, 2020
* enable batch execution

When `batch` is set to true for a given subschemaConfig, batches all delegated root fields into a combined request passed to the executor. Moreover, batches all requests to a given subschema into the minimum number of requests, collecting queries and mutations separately, preserving operation order. Distributes properly pathed errors to the originating requests.

Adapted from Gatsby query batcher by @vladar.

Caveats:
* Uses a Dataloader under the hood, which is created anew upon each request -- relies on a unique context argument per request to make this happen!
* Passed `info` argument from first executor call to the batched executor call, making info argument unreliable.

Related:

gatsbyjs/gatsby#22347 (comment)
#1710 (comment)
#1959 (comment)
#1954

* fix(delegate): pass extensions to executor/subscriber
@gmac
Copy link

gmac commented Oct 7, 2021

For posterity, this PR now has a shoutout in the GraphQL Tools documentation here: https://www.graphql-tools.com/docs/batch-execution#batch-execution. The GraphQL Tools implementation is more generic, but is unflappably based on the excellent work in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot: merge on green Gatsbot will merge these PRs automatically when all tests passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gatsby-source-graphql: n+1 queries to remote GraphQL APIs, and other concerns
6 participants