Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce number of queries during GraphQL execution #1386

Merged
merged 35 commits into from
Dec 11, 2019
Merged

Conversation

lutter
Copy link
Collaborator

@lutter lutter commented Nov 27, 2019

For a query like parents { id children { id } } we used to run 1 + number(parents) queries, one to get the parents, and then one query to get the children of each parent. The more deeply a query was nested, the more this effect compounded.

With this PR, we will now run 2 queries: one to get the parents, and one to get all children for the parents. Details of how this is done can be found in this document

It is possible to have graph-node compare the results of execution with and without prefetching by adding a @verify directive to the query, as in query stuff @verify { .. } (this must use the form with an explicit query keyword) In this mode, the query will be executed in the old and the new way; if the two results differ, the query returns an error that contains both result for manual comparison.

It is also possible to completely turn off prefetching by setting the environment variable GRAPH_GRAPHQL_NO_PREFETCH; if that variable is set to anything at all, query execution behaves like it did before this PR. (see #1340)

Supersedes PR #1341

@lutter lutter force-pushed the lutter/prefetch branch 2 times, most recently from d0afe24 to 8929ee8 Compare December 3, 2019 00:26
@lutter
Copy link
Collaborator Author

lutter commented Dec 3, 2019

Rebased to latest master

Copy link
Contributor

@Jannis Jannis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skipped over some of the tests and I can't claim to understand all of the SQL or whether all the right values are inserted into the SQL because there's too much to digest overall.

What I did understand looked good!

That leaves us with the following combinations of whether the parent and
child store a list or a scalar value, and whether the parent is derived:

<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to maybe check in query-prefetching.org instead? GitHub can render that just like Markdown. Then we don't have the somewhat ugly HTML generated from Org Mode in the repo.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! I didn't know that is possible. Since we now have an RFC repo, I wonder if that doc should really go there - otherwise, it's going to be the only document about implementation in graph-node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not quite an engineering plan but I agree, it's a good place to put it. Could you create an engineering plan PR with this document as PLAN-0001: SQL query combination?


### Type A

Use when parent is derived and child is a list (_tt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do the _tt, _tf, tf_ etc. stand for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shows which part of the table above each case covers, i.e. _tt is for "doesn't matter if parent is a list; parent is derived; child is a list" I mostly put it there to make sure I covered all possibilities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I still don't know what they stand for? _ is don't care, t and f are... what? 😁

Perhaps we can put something human-readable in the parentheses there?

Copy link
Collaborator Author

@lutter lutter Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t is true and f is false, but I'll just take it out, since it just repeats what the sentence says in more concise form. It summarizes for which row in the table we use a query of that type.

order by parent_id, pos

When there is only one window, we can simplify the above query. The
simplifcation basically inlines the `matches` CTE. That is important as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: simplifcation -> simplification

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typo hasn't been fixed yet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't push updates to the doc since I was waiting whether you agree to put it in the RFC repo

Comment on lines 58 to 62
// Using single query and prefetch resolution yield different results
IncorrectPrefetchResult {
single: q::Value,
prefetch: q::Value,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer slow instead of single. Single sounds like a single query is executed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it.

@@ -194,6 +200,10 @@ impl fmt::Display for QueryExecutionError {
}
TooDeep(max_depth) => write!(f, "query has a depth that exceeds the limit of `{}`", max_depth),
UndefinedFragment(frag_name) => write!(f, "fragment `{}` is not defined", frag_name),
IncorrectPrefetchResult{ .. } => write!(f, "Running query with prefetch \
and single query resolution yielded different results. \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -140,7 +142,10 @@ impl EntityData {
// Simply ignore keys that do not have an underlying table
// column; those will be things like the block_range that
// is used internally for versioning
if let Some(column) = table.column(&SqlName::from_snake_case(key)).ok() {
if key == "parent_id" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't entities have a parentId, parent_id or parentID field? Would this cause issues here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use __parent_id or something that is reserved in GraphQL and therefor shouldn't appear in subgraph schemas.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres actually allows $ in identifiers, which is illegal in GraphQL names; I'll just change it to g$parent_id (g for 'graph) That way, it's impossible to collide with a GraphQL name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also renamed pos in the queries to g$pos to avoid possible collisions.

}
}

/// Convebience to pass the name of the column to order by around. If `name`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Convebience -> Convenience

store/postgres/src/relational_queries.rs Outdated Show resolved Hide resolved
graphql/src/store/prefetch.rs Show resolved Hide resolved
.into_iter()
.map(|child_type| {
let attribute = WindowAttribute::Scalar("favorite_color".to_owned());
let link = EntityLink::Direct(attribute);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we indirectly testing the other EntityLink variant or should we add tests for this here as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are covered by the tests in core/tests/interfaces.rs and graphql/tests/query.rs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, good.

@lutter
Copy link
Collaborator Author

lutter commented Dec 9, 2019

Addressed all review comments, except for where to put the implementation notes for all this. I think they really belong in the RFC repo now as that will contain similar material in the future.

Copy link
Contributor

@Jannis Jannis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One typo remains and let's move the implementation notes into an engineering plan even if it isn't a proper one.

@lutter
Copy link
Collaborator Author

lutter commented Dec 10, 2019

Removed the implementation details doc and opened graphprotocol/rfcs#2 for it. That should address all the comments in the review.

We need more flexibility in how the query is generated.
We prematurely converted into a string, which caused unnecessary
complications.
Before we passed first and skip separately, which was more awkward
This reduces the number of queries needed to respond to a query with nested
associations quite dramatically. For a query like `parents { id children {
id } }` we run only two queries rather than `number(parents) + 1` many
queries.

Part of addressing #857
When the GraphQL query contains a `@verify` directive, run the query both
with prefetching and with the old single query resolution and compare the
results. If they differ, return an error.

Note that the `@verify` directive must be on a `query something @verify {
.. }` construct; it does not work with anonymous queries.
We want to make sure that we do not inadvertently resolve objects
individually when we already prefetched results so that we do not
inadvertently run into the reduced performance of the old
approach. Instead, we want queries to fail if prefetching does not return
all the results needed for resolution so that we can fix those bugs.
…rent

The queries without window functionality are quite a bit simpler; if there
is only one parent, there is no need for using a window function.
That situation should be impossible, and indicates a bug in our code
The plain struct construction is incredibly annoying since there are a ton
of tests that need to be changed everytime some detail of how an
EntityQuery is represented changes.
All they were doing was make the tests harder to read
We now add `schema.table` to the Table struct in relational layouts so that
tables can be used without passing additional information in query
generation.
Different implementations of an interface may store the same attribute in
different ways (different columns and/or difference in derived status of
attribute) To accomodate that, reformulate how we handle windowing and
generate SQL queries
This query took 50s to load EthereumContractHandlers for 100
EthereumContractMappings. After rewriting the query, it takes 280ms.
They were always the same value, and it's therefore unnecessary to pass
them in.
That ensures that we will not possibly collide with any attribute the user
might provide since '$' is not legal in GraphQL names.
This makes sure we don't possibly collide with a user's GraphQL property
'pos'
We now use 'prefetch:' instead of 'r:'
@lutter
Copy link
Collaborator Author

lutter commented Dec 11, 2019

Rebased to latest master

@lutter lutter merged commit 3097ec8 into master Dec 11, 2019
@lutter lutter deleted the lutter/prefetch branch December 11, 2019 21:15
@Jannis Jannis mentioned this pull request Dec 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants