Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query validation caching #2018

Open
k0ka opened this issue Jan 3, 2022 · 10 comments
Open

Query validation caching #2018

k0ka opened this issue Jan 3, 2022 · 10 comments
Labels
enhancement A feature or improvement

Comments

@k0ka
Copy link
Collaborator

k0ka commented Jan 3, 2022

Are you willing to provide a PR for this issue or aid in developing it?

Yes.

What problem does this feature proposal attempt to solve?

Query validation takes a lot of time and resources (about 30-40% of processing time). However, a typical website uses a limited amount of queries that are compiled on the client. So we may cache the result of query validation, and identical queries will be processed much faster.

Which possible solutions should be considered?

There are only two hard things in Computer Science: cache invalidation and naming things. --Phil Karlton

In order to cache things, we have to figure out when to invalidate this cache. The validation is performed in the webonyx/graphql-php GraphQL::promiseToExecute()

Here is what it depends on:

  1. Schema
  2. Query
  3. Validation rules

The first two are presented as simple AST. The schema is already cached, and I opened #2017 to cache queries. So we can easily realize when to invalidate them.

The latter is more difficult. Rules are php classes that can't be easily tracked. They can also depend on any other data or code. For example, the QueryComplexity validator depends on all passed query variables.

I guess we can add a function getHash() to the ProvidesValidationRules interface. It should return the hash of all the data it depends on. So if anything is changed, we will be able to check it and invalidate the cache. For the default lighthouse ValidationRulesProvider it should return the hash of corresponding config values + query variables if the QueryComplexity validator is enabled.

Given these 3 hashes, we can store the result of validation and be assured it has not been changed.

I guess we should also extract the validation method in the webonyx/graphql-php GraphQL::promiseToExecute() to execute it independently.

Do you have any objections or suggestions? This may improve Lighthouse performance significantly.

@spawnia spawnia added the enhancement A feature or improvement label Jan 11, 2022
@spawnia
Copy link
Collaborator

spawnia commented Jan 11, 2022

Unless you have somebody DDOS'ing your server, I would expect invalid queries not to be repeated often. Thus, we could just cache queries that have executed successfuly, and then just skip validation entirely when they are queried again.

@k0ka
Copy link
Collaborator Author

k0ka commented Jan 15, 2022

I agree that in normal circumstances there should not be many queries with validation errors. However it's easy to implement (we'll store an array of validation errors and it'll be empty for validated queries).

Also I can imagine where this might be helpful. For example, old clients might spam old requests that are not validated against a new schema. Also a lot of DoS attacks are very dumb and reuse the same query.

@LastDragon-ru
Copy link
Contributor

LastDragon-ru commented Nov 30, 2022

Today I'm trying to optimize tests and found that query validation takes ~30% of the time :(

image

Look like we can pass empty validation rules into GraphQLBase::executeQuery() for known valid queries (by overriding $this->providesValidationRules). But I'm not sure how safe is it?

lighthouse/src/GraphQL.php

Lines 262 to 271 in 8f3b9fa

$result = GraphQLBase::executeQuery(
$schema,
$query,
$rootValue,
$context,
$variables,
$operationName,
null,
$this->providesValidationRules->validationRules()
);

@spawnia
Copy link
Collaborator

spawnia commented Nov 30, 2022

I think @k0ka has outlined quite nicely what would be necessary to safely recognize when validation for a query should lead to identical results. If we get this part right, we can basically extract the validation from the base GraphQL library and run it separately, caching its result: https://github.com/webonyx/graphql-php/blob/7a78690e99548de8155f605772eecf615329ee16/src/GraphQL.php#L130-L143

@LastDragon-ru
Copy link
Contributor

I've done a quick test and seems caching will not give any improvements for (our) tests :( The query and variables are almost always different -> new hash -> cache miss. For real application, the result probably will be similar or negligible because variables are usually different between requests.

Seems the only way is to separate all rules into two categories:

  1. Dependent from query variables - they should be run always, it will also solve the hash calculation problem for UploadedFile (my variant use json_encode and it fails in this case).

  2. Independent - only one time for each query

@spawnia
Copy link
Collaborator

spawnia commented Nov 30, 2022

The variables influence the result, but they are not relevant for basic document validation. See https://github.com/webonyx/graphql-php/blob/7a78690e99548de8155f605772eecf615329ee16/src/GraphQL.php#L143 - variables are not passed to DocumentValidator::validate().

@LastDragon-ru
Copy link
Contributor

@spawnia
Copy link
Collaborator

spawnia commented Nov 30, 2022

As I see they are passed into QueryComplexity ?

Right. I think we can check if lighthouse.security.max_query_complexity is enabled, and include the variables in the hash based on that. We can document that for optimal performance of the validation cache this rule needs to be disabled.

@LastDragon-ru
Copy link
Contributor

I think we can check if lighthouse.security.max_query_complexity is enabled, and include the variables in the hash based on that.

Or just run QueryComplexity rule always (implementation will be simpler)

@LastDragon-ru
Copy link
Contributor

I've found another edge case: in our application introspection is available only for user with specific permission. It doesn't work with cached validation :) Looks like the setting with "run always" rules will be useful in any case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A feature or improvement
Projects
None yet
Development

No branches or pull requests

3 participants