parser: limit maximum number of tokens #3684

Merged: 2 commits into graphql:main, Aug 8, 2022

Conversation

@IvanGoncharov (Member) commented Jul 28, 2022

Motivation: Parser CPU and memory usage are linear in the number of tokens in a document; however, in extreme cases, they become quadratic due to memory exhaustion.
On my machine, it happens on queries with 2k tokens.
For example:

```
{ a a <repeat 2k times> a }
```

It takes 741ms on my machine.
But if we create a document of the same size with a smaller number of tokens, it is a lot faster.
Example:

```
{ a(arg: "a <repeat 2k times> a") }
```

Now it takes only 17ms to process, which is 43 times faster.
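
A minimal sketch of how to reproduce this comparison (illustrative only; the repeat count and timings are machine-specific):

```
import { parse } from 'graphql';

// Build two documents of similar byte size: one with many short tokens,
// one with a single long string token. N is an arbitrary repeat count.
const N = 2000;
const manyTokens = `{ ${'a '.repeat(N)}}`;
const fewTokens = `{ a(arg: "${'a '.repeat(N)}") }`;

for (const [label, body] of [
  ['many tokens', manyTokens],
  ['few tokens', fewTokens],
] as const) {
  const start = performance.now();
  parse(body);
  console.log(`${label}: ${(performance.now() - start).toFixed(1)}ms`);
}
```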

If we just limited document size, we would have to make the limit very small, since a token can take as few as two bytes (e.g. ` a`).
But that would create issues for legitimate documents with long tokens (comments, descriptions, strings, long names, etc.).

That's why this PR adds a mechanism to limit the number of tokens in a parsed document.
The exact same mechanism is implemented in graphql-java; see:
graphql-java/graphql-java#2549

I also tried the alternative approach of counting nodes, and it gives a
slightly better approximation of how many resources would be consumed.
However, unlike tokens, AST nodes are an implementation detail of graphql-js,
so the count is impossible to replicate in other implementations (e.g. to compute
this number on a client).
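
For usage, the new option is a `maxTokens` entry on `ParseOptions` (that is the name used in graphql-js releases containing this change); when the lexer produces more tokens than the limit, `parse` throws a syntax error. A minimal sketch, with an arbitrary example limit:

```
import { parse } from 'graphql';

const body = '{ a a a a a }';

try {
  // The limit value here (1000) is an arbitrary example, not a recommended default.
  const document = parse(body, { maxTokens: 1000 });
  console.log('parsed', document.definitions.length, 'definition(s)');
} catch (error) {
  // parse() throws a GraphQLError once the token limit is exceeded.
  console.error(String(error));
}
```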

IvanGoncharov added the PR: feature 🚀 (requires increase of "minor" version number) label on Jul 28, 2022
IvanGoncharov requested review from yaacovCR and a team on Jul 28, 2022 16:10



@github-actions bot commented

> @github-actions run-benchmark

@saihaj Please, see benchmark results here: https://github.com/graphql/graphql-js/runs/7580332109?check_suite_focus=true#step:6:1

@saihaj (Member) left a review comment

I think this should be something that lives in server-side code, not in the reference implementation. Users of the library can choose to limit the tokens in the onParse phase.

@yaacovCR (Contributor) commented Aug 2, 2022

I think that by having the ability to throw an error within parse, you don't have to have a pre-parse step that separately counts the number of tokens?

Review threads on src/language/parser.ts (outdated, resolved)
@yaacovCR (Contributor) commented Aug 2, 2022

@IvanGoncharov looks good to me; I suggested changes to the wording of a few of the comments, if you think they're an improvement.

@IvanGoncharov (Member, Author) commented

> looks good to me; I suggested changes to the wording of a few of the comments, if you think they're an improvement.

Thanks, @yaacovCR, I merged those.

> I think this should be something that lives in server-side code, not in the reference implementation. Users of the library can choose to limit the tokens in the onParse phase.

@saihaj The problem here is that parse is sync, so, as @yaacovCR pointed out, the only other option is to count the tokens as a separate step. But that would have a performance impact.
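
For illustration, that separate counting step would roughly look like the sketch below: lexing the document once just to count tokens and then again inside parse, which is where the performance impact comes from. This uses the public Lexer from graphql-js and is not something this PR adds:

```
import { Lexer, Source, TokenKind, parse } from 'graphql';

// Pre-parse token counting: lex once to count, then parse (which lexes again).
function parseWithPreCount(body: string, maxTokens: number) {
  const source = new Source(body);
  const lexer = new Lexer(source);

  let tokenCount = 0;
  while (lexer.advance().kind !== TokenKind.EOF) {
    if (++tokenCount > maxTokens) {
      throw new Error(`Document contains more than ${maxTokens} tokens.`);
    }
  }

  return parse(source); // the same tokens are lexed a second time here
}
```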

@saihaj (Member) commented Aug 2, 2022

> > looks good to me; I suggested changes to the wording of a few of the comments, if you think they're an improvement.
>
> Thanks, @yaacovCR, I merged those.
>
> > I think this should be something that lives in server-side code, not in the reference implementation. Users of the library can choose to limit the tokens in the onParse phase.
>
> @saihaj The problem here is that parse is sync, so, as @yaacovCR pointed out, the only other option is to count the tokens as a separate step. But that would have a performance impact.

Maybe we should consider making parse async too 🤔

@michaelstaib (Member) commented

@IvanGoncharov will you have a default max token size, or is it just a new option that you add, leaving the setting up to the user?

@IvanGoncharov (Member, Author) commented Aug 4, 2022

> will you have a default max token size, or is it just a new option that you add, leaving the setting up to the user?

@michaelstaib The idea is to leave the default limit to higher-level libraries.
Since parse is used for all types of documents (e.g., SDL files), we can't choose one limit that serves all use cases.
That said, all tools/libraries that implement pipelines (e.g., server libraries) are encouraged to set some limit by default.
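
A hypothetical sketch of what that could look like in a higher-level library (the wrapper name and the default value below are made up for illustration):

```
import { parse } from 'graphql';
import type { DocumentNode, ParseOptions } from 'graphql';

// Arbitrary example default; each library would pick a value suited to its use case.
const DEFAULT_MAX_TOKENS = 10_000;

export function parseWithLimit(
  body: string,
  options: ParseOptions = {},
): DocumentNode {
  // Callers can still override or disable the limit by passing their own maxTokens.
  return parse(body, { maxTokens: DEFAULT_MAX_TOKENS, ...options });
}
```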

@IvanGoncharov (Member, Author) commented

> Maybe we should consider making parse async too 🤔

@saihaj This is a much bigger task.
I'm also not sure what API such a parser would have: the AST is not an array but a tree, so it's not clear what its return value would look like.
Note: by the parser being sync, I meant that you can't interrupt it, so you always get a full tree.
If the parser simply returned a promise of the AST, that wouldn't change the situation in that respect.

I propose merging this as a solution to the problem and returning to the discussion if we get an async parser in the future.

IvanGoncharov merged commit 9df9079 into graphql:main on Aug 8, 2022
IvanGoncharov deleted the pr_branch4 branch on August 8, 2022 at 17:02
IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this pull request Aug 16, 2022
Backport of graphql#3684
IvanGoncharov added a commit to IvanGoncharov/graphql-js that referenced this pull request Aug 16, 2022
Backport of graphql#3684