Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize OverlappingFieldsCanBeMerged #296

Closed
objmagic opened this issue Nov 20, 2017 · 16 comments · Fixed by sangria-graphql-org/sangria#12
Closed

Optimize OverlappingFieldsCanBeMerged #296

objmagic opened this issue Nov 20, 2017 · 16 comments · Fixed by sangria-graphql-org/sangria#12

Comments

@objmagic
Copy link

objmagic commented Nov 20, 2017

QueryReducer.rejectComplexQueries become stuck with a 412KBytes GraphQL query selecting ~150000 fields. Did a profiling and stacktrace shows we are doing tremendous amount of work in OverlappingFieldsCanBeMerged

An example of the query is at here

We are using:

  • Sangria version 1.3.2
  • JVM 8 (with some Twitter internal customization)

A screenshot of part of the stacktrace (sorry I cannot provide complete stacktrace).

image

@OlegIlyenko OlegIlyenko self-assigned this Nov 20, 2017
@OlegIlyenko
Copy link
Member

OlegIlyenko commented Nov 20, 2017

Thanks for reporting the issue! Also, the example query would be quite useful. I think the main issue is with the OverlappingFieldsCanBeMerged validation. It is one of the most complex validations, and in this particular query, it probably does a lot of comparisons between all these top-level selection sets.

I will investigate the root cause.

@objmagic
Copy link
Author

@OlegIlyenko

Thanks a lot for your prompt reply. I would love to work through this issue with you.

Let me see if I can get a better stacktrace.

@OlegIlyenko
Copy link
Member

This would be awesome! 👍 As a first step, I think it would be really helpful to verify that mentioned validation is indeed the bottleneck. Can you please exclude it in your tests:

QueryReducerExecutor.reduceQueryWithoutVariables(
  ...
  queryReducers = ...,
  queryValidator = 
    new RuleBasedQueryValidator(
      QueryValidator.allRules
        .filterNot(_.isInstanceOf[OverlappingFieldsCanBeMerged])))

// or

Executor.execute(
  ...
  queryReducers = ...,
  queryValidator = 
    new RuleBasedQueryValidator(
      QueryValidator.allRules
        .filterNot(_.isInstanceOf[OverlappingFieldsCanBeMerged])))

If we can verify that it cases the slowdown, then we can focus on OverlappingFieldsCanBeMerged.

@objmagic
Copy link
Author

remove the OverlappingFieldsCanBeMerged validator solves the issue.

@OlegIlyenko
Copy link
Member

OlegIlyenko commented Nov 20, 2017

Great, this means that we can isolate the issue and focus the effort on optimising the OverlappingFieldsCanBeMerged validation. Maybe you could also change the issue title to reflect this fact?

If you would like to investigate the issue as well, here are some useful resources:

@OlegIlyenko
Copy link
Member

OlegIlyenko commented Nov 20, 2017

Now that I think about it, recent bugfix introduced a change that reduces usage of the cache. i described this issue here: graphql/graphql-js#780 (review)

This might be a promising path to investigate. For example, we can revert the bugfix and see how it performs without it. This commit introduced the bugfix: 9866a7b#diff-9ce8420e3a37d01d401ac1bc89e17e6e

@OlegIlyenko
Copy link
Member

The validation rules are normally completely self-contained. So you can just copy/paste this old version of the validation under a different name in your test and add it in the RuleBasedQueryValidator instead of the new version of OverlappingFieldsCanBeMerged.

@objmagic objmagic changed the title QueryReducer.rejectComplexQueries unable to reject really large query Optimize OverlappingFieldsCanBeMerged Nov 20, 2017
@OlegIlyenko OlegIlyenko removed their assignment Nov 20, 2017
@objmagic
Copy link
Author

Thanks. I'll try to find some time to debug this.

@objmagic
Copy link
Author

Notice this problem can introduce serious problem. When doing curl on our API server, it takes 8~10 mins to get back a bad result. That said, attacker construct maliciously large query to DDoS your production service.

@OlegIlyenko
Copy link
Member

Indeed, this is a concern.

Assuming that parsing and OverlappingFieldsCanBeMerged as well as other validations are well optimized. If incoming request is allowed to have unlimited payload size, then eventually parsing and validation become too slow, it is just a matter of payload size.

So far I thought about following possible solutions that generally implemented outside of GraphQL execution:

  • Always limit the request payload size
  • Use persisted queries, if possible
  • Track execution time and cancel the execution if it takes more time than a threshold (this can be also limited in scope, e.g. only parsing + validation). This is actually something that library can make easier to do. Since parsing and validation are CPU-bound tasks, it should be relatively easy to add simple cancellation mechanism (it becomes harder when IO with external systems is involved).

I would be very interested to hear you opinion and ideas on possible solutions and countermeasures for this issue.

@objmagic
Copy link
Author

For now, we added a simple HTTP filter to filter out request that is above certain threshold.

@objmagic
Copy link
Author

objmagic commented Dec 1, 2017

image

@OlegIlyenko using old version of the validation file and got the result above. The situation is still bad but I got part of the result back and then got bunch of HTTP timeout failures. That said, it seems the bug fix does not matter much here.

@OlegIlyenko
Copy link
Member

I see, thanks for investigating this possibility! I guess we need to look at the original algorithm and try to optimise it.

@objmagic
Copy link
Author

objmagic commented Dec 1, 2017

Thanks. I wouldn't worry much now since we have put an HTTP filter at the very front.
Let's keep this issue open, just for other people who might have the same concern.

@SimonAdameit
Copy link
Contributor

We also experienced this issue at XING and invested some time to fix it. Here is the PR: #458 and here is the description of the algorithm.

@yanns
Copy link
Contributor

yanns commented Sep 27, 2022

The faster version is now the default one starting from https://github.com/sangria-graphql/sangria/releases/tag/v3.0.0

@yanns yanns closed this as completed Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants