-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce number of queries during GraphQL execution #1386
Conversation
d0afe24
to
8929ee8
Compare
Rebased to latest master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skipped over some of the tests and I can't claim to understand all of the SQL or whether all the right values are inserted into the SQL because there's too much to digest overall.
What I did understand looked good!
That leaves us with the following combinations of whether the parent and | ||
child store a list or a scalar value, and whether the parent is derived: | ||
|
||
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to maybe check in query-prefetching.org
instead? GitHub can render that just like Markdown. Then we don't have the somewhat ugly HTML generated from Org Mode in the repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! I didn't know that is possible. Since we now have an RFC repo, I wonder if that doc should really go there - otherwise, it's going to be the only document about implementation in graph-node
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not quite an engineering plan but I agree, it's a good place to put it. Could you create an engineering plan PR with this document as PLAN-0001: SQL query combination
?
|
||
### Type A | ||
|
||
Use when parent is derived and child is a list (_tt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do the _tt
, _tf
, tf_
etc. stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shows which part of the table above each case covers, i.e. _tt
is for "doesn't matter if parent is a list; parent is derived; child is a list" I mostly put it there to make sure I covered all possibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I still don't know what they stand for? _
is don't care, t
and f
are... what? 😁
Perhaps we can put something human-readable in the parentheses there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t
is true and f
is false, but I'll just take it out, since it just repeats what the sentence says in more concise form. It summarizes for which row in the table we use a query of that type.
order by parent_id, pos | ||
|
||
When there is only one window, we can simplify the above query. The | ||
simplifcation basically inlines the `matches` CTE. That is important as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: simplifcation
-> simplification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This typo hasn't been fixed yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't push updates to the doc since I was waiting whether you agree to put it in the RFC repo
graph/src/data/query/error.rs
Outdated
// Using single query and prefetch resolution yield different results | ||
IncorrectPrefetchResult { | ||
single: q::Value, | ||
prefetch: q::Value, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still prefer slow
instead of single
. Single sounds like a single query is executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed it.
graph/src/data/query/error.rs
Outdated
@@ -194,6 +200,10 @@ impl fmt::Display for QueryExecutionError { | |||
} | |||
TooDeep(max_depth) => write!(f, "query has a depth that exceeds the limit of `{}`", max_depth), | |||
UndefinedFragment(frag_name) => write!(f, "fragment `{}` is not defined", frag_name), | |||
IncorrectPrefetchResult{ .. } => write!(f, "Running query with prefetch \ | |||
and single query resolution yielded different results. \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
@@ -140,7 +142,10 @@ impl EntityData { | |||
// Simply ignore keys that do not have an underlying table | |||
// column; those will be things like the block_range that | |||
// is used internally for versioning | |||
if let Some(column) = table.column(&SqlName::from_snake_case(key)).ok() { | |||
if key == "parent_id" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't entities have a parentId
, parent_id
or parentID
field? Would this cause issues here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use __parent_id
or something that is reserved in GraphQL and therefor shouldn't appear in subgraph schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Postgres actually allows $
in identifiers, which is illegal in GraphQL names; I'll just change it to g$parent_id
(g
for 'graph) That way, it's impossible to collide with a GraphQL name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also renamed pos
in the queries to g$pos
to avoid possible collisions.
} | ||
} | ||
|
||
/// Convebience to pass the name of the column to order by around. If `name` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: Convebience
-> Convenience
.into_iter() | ||
.map(|child_type| { | ||
let attribute = WindowAttribute::Scalar("favorite_color".to_owned()); | ||
let link = EntityLink::Direct(attribute); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we indirectly testing the other EntityLink
variant or should we add tests for this here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are covered by the tests in core/tests/interfaces.rs
and graphql/tests/query.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, good.
Addressed all review comments, except for where to put the implementation notes for all this. I think they really belong in the RFC repo now as that will contain similar material in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One typo remains and let's move the implementation notes into an engineering plan even if it isn't a proper one.
c82f699
to
87bc506
Compare
Removed the implementation details doc and opened graphprotocol/rfcs#2 for it. That should address all the comments in the review. |
We need more flexibility in how the query is generated.
We prematurely converted into a string, which caused unnecessary complications.
Before we passed first and skip separately, which was more awkward
This reduces the number of queries needed to respond to a query with nested associations quite dramatically. For a query like `parents { id children { id } }` we run only two queries rather than `number(parents) + 1` many queries. Part of addressing #857
When the GraphQL query contains a `@verify` directive, run the query both with prefetching and with the old single query resolution and compare the results. If they differ, return an error. Note that the `@verify` directive must be on a `query something @verify { .. }` construct; it does not work with anonymous queries.
We want to make sure that we do not inadvertently resolve objects individually when we already prefetched results so that we do not inadvertently run into the reduced performance of the old approach. Instead, we want queries to fail if prefetching does not return all the results needed for resolution so that we can fix those bugs.
…rent The queries without window functionality are quite a bit simpler; if there is only one parent, there is no need for using a window function.
That situation should be impossible, and indicates a bug in our code
The plain struct construction is incredibly annoying since there are a ton of tests that need to be changed everytime some detail of how an EntityQuery is represented changes.
All they were doing was make the tests harder to read
We now add `schema.table` to the Table struct in relational layouts so that tables can be used without passing additional information in query generation.
Different implementations of an interface may store the same attribute in different ways (different columns and/or difference in derived status of attribute) To accomodate that, reformulate how we handle windowing and generate SQL queries
This query took 50s to load EthereumContractHandlers for 100 EthereumContractMappings. After rewriting the query, it takes 280ms.
They were always the same value, and it's therefore unnecessary to pass them in.
That ensures that we will not possibly collide with any attribute the user might provide since '$' is not legal in GraphQL names.
This makes sure we don't possibly collide with a user's GraphQL property 'pos'
We now use 'prefetch:' instead of 'r:'
87bc506
to
3097ec8
Compare
Rebased to latest master |
For a query like
parents { id children { id } }
we used to run1 + number(parents)
queries, one to get the parents, and then one query to get the children of each parent. The more deeply a query was nested, the more this effect compounded.With this PR, we will now run 2 queries: one to get the parents, and one to get all children for the parents. Details of how this is done can be found in this document
It is possible to have
graph-node
compare the results of execution with and without prefetching by adding a@verify
directive to the query, as inquery stuff @verify { .. }
(this must use the form with an explicitquery
keyword) In this mode, the query will be executed in the old and the new way; if the two results differ, the query returns an error that contains both result for manual comparison.It is also possible to completely turn off prefetching by setting the environment variable
GRAPH_GRAPHQL_NO_PREFETCH
; if that variable is set to anything at all, query execution behaves like it did before this PR. (see #1340)Supersedes PR #1341