Reduce number of queries during GraphQL execution #1386

lutter · 2019-11-27T20:52:07Z

For a query like `parents { id children { id } }` we used to run `1 + number(parents)` queries: one to get the parents, and then one query per parent to get its children. The more deeply a query was nested, the more this effect compounded.

With this PR, we now run 2 queries: one to get the parents, and one to get all children for those parents. Details of how this is done can be found in this document.
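To make the difference in query counts concrete, here is a minimal in-memory sketch. Everything in it (`Store`, `Child`, `load_parents`, `children_of`, `children_of_all`, `sample_store`) is a hypothetical stand-in for illustration and not the actual graph-node API; it only counts how many "queries" each approach issues.

```rust
use std::collections::HashMap;

/// Hypothetical child row; not a graph-node type.
#[derive(Debug, Clone, PartialEq)]
pub struct Child {
    pub id: String,
    pub parent_id: String,
}

/// Hypothetical in-memory store that counts the queries it runs.
pub struct Store {
    pub parents: Vec<String>,
    pub children: Vec<Child>,
    pub queries_run: usize,
}

impl Store {
    /// One query to load the ids of all parents.
    pub fn load_parents(&mut self) -> Vec<String> {
        self.queries_run += 1;
        self.parents.clone()
    }

    /// Old approach: one query for the children of a single parent.
    pub fn children_of(&mut self, parent_id: &str) -> Vec<Child> {
        self.queries_run += 1;
        self.children
            .iter()
            .filter(|c| c.parent_id == parent_id)
            .cloned()
            .collect()
    }

    /// Prefetch approach: one query for the children of all parents,
    /// grouped by parent id afterwards.
    pub fn children_of_all(&mut self, parent_ids: &[&str]) -> HashMap<String, Vec<Child>> {
        self.queries_run += 1;
        let mut grouped: HashMap<String, Vec<Child>> = HashMap::new();
        for child in &self.children {
            if parent_ids.contains(&child.parent_id.as_str()) {
                grouped
                    .entry(child.parent_id.clone())
                    .or_default()
                    .push(child.clone());
            }
        }
        grouped
    }
}

/// Sample data: p1 has two children, p2 has one, p3 has none.
pub fn sample_store() -> Store {
    let child = |id: &str, parent_id: &str| Child {
        id: id.to_owned(),
        parent_id: parent_id.to_owned(),
    };
    Store {
        parents: vec!["p1".to_owned(), "p2".to_owned(), "p3".to_owned()],
        children: vec![child("c1", "p1"), child("c2", "p1"), child("c3", "p2")],
        queries_run: 0,
    }
}
```

With three parents, calling `load_parents` and then `children_of` for each parent issues four queries, while `load_parents` followed by a single `children_of_all` issues two, regardless of how many parents there are.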

It is possible to have graph-node compare the results of execution with and without prefetching by adding a `@verify` directive to the query, as in `query stuff @verify { .. }` (this must use the form with an explicit `query` keyword). In this mode, the query is executed both the old and the new way; if the two results differ, the query returns an error that contains both results for manual comparison.

It is also possible to turn off prefetching completely by setting the environment variable `GRAPH_GRAPHQL_NO_PREFETCH`; if that variable is set to anything at all, query execution behaves like it did before this PR (see #1340).
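The verification mode can be pictured roughly like this. This is a simplified sketch, not the code in this PR: results are plain `String`s instead of `q::Value`, `execute_verified` is a hypothetical helper, and the error variant is a cut-down version of `IncorrectPrefetchResult` (using the field name `slow` that the review settled on).

```rust
/// Cut-down, hypothetical version of the error type; the real variant
/// carries `q::Value` results rather than strings.
#[derive(Debug, PartialEq)]
pub enum QueryExecutionError {
    IncorrectPrefetchResult { slow: String, prefetch: String },
}

/// Run the query both ways and compare. `slow` and `prefetch` are
/// hypothetical closures standing in for the two execution strategies.
pub fn execute_verified<S, P>(slow: S, prefetch: P) -> Result<String, QueryExecutionError>
where
    S: FnOnce() -> String,
    P: FnOnce() -> String,
{
    let slow_result = slow();
    let prefetch_result = prefetch();
    if slow_result == prefetch_result {
        // Both strategies agree, so either result can be returned.
        Ok(prefetch_result)
    } else {
        // Return both results so they can be compared manually.
        Err(QueryExecutionError::IncorrectPrefetchResult {
            slow: slow_result,
            prefetch: prefetch_result,
        })
    }
}
```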
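A check of the "set to anything at all" kind could look like the sketch below. The helper names are hypothetical and this is not necessarily how graph-node reads the variable; in particular, whether an empty value counts as "set" is an assumption here (with `std::env::var_os`, it does).

```rust
use std::env;
use std::ffi::OsString;

/// Name of the environment variable from the PR description.
pub const NO_PREFETCH_VAR: &str = "GRAPH_GRAPHQL_NO_PREFETCH";

/// Hypothetical pure helper: prefetching is off whenever the variable
/// has any value, including values like "0" or "false".
pub fn prefetch_disabled(value: Option<OsString>) -> bool {
    value.is_some()
}

/// Hypothetical wrapper that reads the process environment.
pub fn prefetch_disabled_from_env() -> bool {
    prefetch_disabled(env::var_os(NO_PREFETCH_VAR))
}
```

Note that under this reading `GRAPH_GRAPHQL_NO_PREFETCH=false` still disables prefetching; the variable has to be unset to turn prefetching back on.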

Supersedes PR #1341

lutter · 2019-12-03T00:26:31Z

Rebased to latest master

Jannis

I skipped over some of the tests and I can't claim to understand all of the SQL or whether all the right values are inserted into the SQL because there's too much to digest overall.

What I did understand looked good!

Jannis · 2019-12-09T09:02:51Z

docs/implementation/query-prefetching.md

+That leaves us with the following combinations of whether the parent and
+child store a list or a scalar value, and whether the parent is derived:
+
+<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">


Do you want to maybe check in query-prefetching.org instead? GitHub can render that just like Markdown. Then we don't have the somewhat ugly HTML generated from Org Mode in the repo.

Neat! I didn't know that is possible. Since we now have an RFC repo, I wonder if that doc should really go there - otherwise, it's going to be the only document about implementation in graph-node.

It's not quite an engineering plan but I agree, it's a good place to put it. Could you create an engineering plan PR with this document as PLAN-0001: SQL query combination?

Jannis · 2019-12-09T09:10:46Z

docs/implementation/query-prefetching.md

+
+### Type A
+
+Use when parent is derived and child is a list (_tt)


What do the _tt, _tf, tf_ etc. stand for?

It shows which part of the table above each case covers, i.e. _tt stands for "doesn't matter if parent is a list; parent is derived; child is a list". I mostly put it there to make sure I covered all possibilities.

I think I still don't know what they stand for? _ is don't care, t and f are... what? 😁

Perhaps we can put something human-readable in the parentheses there?

t is true and f is false, but I'll just take it out, since it just repeats what the sentence says in more concise form. It summarizes for which row in the table we use a query of that type.

Jannis · 2019-12-09T09:22:29Z

docs/implementation/query-prefetching.md

+     order by parent_id, pos
+
+When there is only one window, we can simplify the above query. The
+simplifcation basically inlines the `matches` CTE. That is important as


Typo: simplifcation -> simplification

This typo hasn't been fixed yet.

I didn't push updates to the doc since I was waiting to see whether you agree to put it in the RFC repo.

Jannis · 2019-12-09T10:41:08Z

graph/src/data/query/error.rs

+    // Using single query and prefetch resolution yield different results
+    IncorrectPrefetchResult {
+        single: q::Value,
+        prefetch: q::Value,
+    },


I still prefer slow instead of single. Single sounds like a single query is executed.

Renamed it.

Jannis · 2019-12-09T11:20:41Z

graph/src/data/query/error.rs

@@ -194,6 +200,10 @@ impl fmt::Display for QueryExecutionError {
            }
            TooDeep(max_depth) => write!(f, "query has a depth that exceeds the limit of `{}`", max_depth),
            UndefinedFragment(frag_name) => write!(f, "fragment `{}` is not defined", frag_name),
+            IncorrectPrefetchResult{ .. } => write!(f, "Running query with prefetch \
+                           and single query resolution yielded different results. \


Jannis · 2019-12-09T12:42:01Z

store/postgres/src/relational_queries.rs

@@ -140,7 +142,10 @@ impl EntityData {
                    // Simply ignore keys that do not have an underlying table
                    // column; those will be things like the block_range that
                    // is used internally for versioning
-                    if let Some(column) = table.column(&SqlName::from_snake_case(key)).ok() {
+                    if key == "parent_id" {


Can't entities have a parentId, parent_id or parentID field? Would this cause issues here?

We could use __parent_id or something that is reserved in GraphQL and therefore shouldn't appear in subgraph schemas.

Postgres actually allows $ in identifiers, which is illegal in GraphQL names; I'll just change it to g$parent_id (g for 'graph'). That way, it's impossible to collide with a GraphQL name.

I also renamed pos in the queries to g$pos to avoid possible collisions.
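For illustration, here is a small check based on the GraphQL `Name` grammar (`/[_A-Za-z][_0-9A-Za-z]*/`). The function `is_graphql_name` is a hypothetical helper written for this comment, not something from graph-node or graphql_parser; it only shows why a column name containing `$` cannot clash with a field from a subgraph schema.

```rust
/// Returns true if `name` matches the GraphQL `Name` grammar:
/// a letter or underscore, followed by letters, digits, or underscores.
pub fn is_graphql_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {}
        // Empty names and names starting with anything else are invalid.
        _ => return false,
    }
    chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
}
```

Under this check `parent_id` and `pos` are valid names a user could pick, `__parent_id` is syntactically valid (the double underscore prefix is only reserved by convention for introspection), and `g$parent_id` and `g$pos` are rejected outright.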

Jannis · 2019-12-09T12:53:47Z

store/postgres/src/relational_queries.rs

+    }
+}
+
+/// Convebience to pass the name of the column to order by around. If `name`


Typo: Convebience -> Convenience

store/postgres/src/relational_queries.rs

graphql/src/store/prefetch.rs

Jannis · 2019-12-09T14:43:40Z

store/postgres/tests/store.rs

+            .into_iter()
+            .map(|child_type| {
+                let attribute = WindowAttribute::Scalar("favorite_color".to_owned());
+                let link = EntityLink::Direct(attribute);


Are we indirectly testing the other EntityLink variant or should we add tests for this here as well?

Yes, they are covered by the tests in core/tests/interfaces.rs and graphql/tests/query.rs

Ah, ok, good.

lutter · 2019-12-09T20:39:13Z

Addressed all review comments, except for where to put the implementation notes for all this. I think they really belong in the RFC repo now as that will contain similar material in the future.

Jannis

One typo remains and let's move the implementation notes into an engineering plan even if it isn't a proper one.

lutter · 2019-12-10T16:19:18Z

Removed the implementation details doc and opened graphprotocol/rfcs#2 for it. That should address all the comments in the review.

This reduces the number of queries needed to respond to a query with nested associations quite dramatically. For a query like `parents { id children { id } }` we run only two queries rather than `number(parents) + 1` many queries. Part of addressing #857

@verify

When the GraphQL query contains a `@verify` directive, run the query both with prefetching and with the old single query resolution and compare the results. If they differ, return an error. Note that the `@verify` directive must be on a `query something @verify { .. }` construct; it does not work with anonymous queries.

We want to make sure that we do not inadvertently resolve objects individually when we already prefetched results so that we do not inadvertently run into the reduced performance of the old approach. Instead, we want queries to fail if prefetching does not return all the results needed for resolution so that we can fix those bugs.

Different implementations of an interface may store the same attribute in different ways (different columns and/or a difference in the derived status of the attribute). To accommodate that, reformulate how we handle windowing and generate SQL queries.

lutter · 2019-12-11T01:34:52Z

Rebased to latest master

lutter force-pushed the lutter/prefetch branch 2 times, most recently from d0afe24 to 8929ee8 Compare December 3, 2019 00:26

lutter mentioned this pull request Dec 6, 2019

Enable time-travel GraphQL queries #1397

Merged

Jannis mentioned this pull request Dec 9, 2019

Improve restart time of subgraphs with many data sources #1081

Closed

Jannis requested changes Dec 9, 2019

Jannis requested changes Dec 10, 2019

lutter force-pushed the lutter/prefetch branch from c82f699 to 87bc506 Compare December 10, 2019 16:18

Jannis approved these changes Dec 10, 2019

lutter added 19 commits December 10, 2019 17:31

graph: Add generic Value::from for vectors

379c269

graphql: Avoid some unnecessary clones of ExecutionContext

5ddceb4

store: Refactor running EntityQuery against JSONB storage

7dd8647

We need more flexibility in how the query is generated.

store: Simplify how we pass the order direction of an EntityQuery around

658cc78

We prematurely converted into a string, which caused unnecessary complications.

store: Pass the entire EntityRange around when running an EntityQuery

2107317

Before we passed first and skip separately, which was more awkward

store: Add an optional window to EntityQuery

cad2874

store: Add window functionality into the SQL queries we run

21c97f2

graphql: Do not pass initial_value into execute_root_selection_set

b5bfa9d

graphql: Pass field, not just its name to Resolver.resolve_objects

eb54778

graphql: Add prefetch to resolvers

d8a1d23

graphql: Add extension traits for some graphql_parser schema and quer…

d4e233f

…y types

mock: Set the __typename on all entities

34de342

core: Use a logger that respects GRAPH_LOG in tests

df57ab8

store: Prefer 'id' column for filtering arrays

21ed337

graphql: Allow turning prefetch off by setting GRAPH_GRAPHQL_NO_PREFETCH

37d8341

graphql: When prefetching, only window when there is more than one pa…

58d1189

…rent

The queries without window functionality are quite a bit simpler; if there is only one parent, there is no need for using a window function.

lutter added 16 commits December 10, 2019 17:31

graphql: Have Resolver.prefetch return a Vec of errors

5fcdb93

graphql: Panic when we get more than one child for a non-derived field

2b73e40

That situation should be impossible, and indicates a bug in our code

graphql: Rename Resolver.get_child to Resolver.get_prefetched_child

0a8e67e

graph: Force construction of EntityQuery to use the new() method

3584e4f

The plain struct construction is incredibly annoying since there are a ton of tests that need to be changed every time some detail of how an EntityQuery is represented changes.

graph: Make EntityQuery.order_by() more ergonomic to use

3ca9c49

store: Remove a ton of unneeded EntityFilter::And from the tests

66ca5ff

All they were doing was make the tests harder to read

server, store: Prefer EntityRange.first/skip over .range

b0491de

store: Add the qualified name to a table

d6eeced

We now add `schema.table` to the Table struct in relational layouts so that tables can be used without passing additional information in query generation.

store: Speed up JSONB prefetch query when parent has array of child ids

3b9dbc1

This query took 50s to load EthereumContractHandlers for 100 EthereumContractMappings. After rewriting the query, it takes 280ms.

store: Remove pos and conj parameters from limit_per_window

e226b5a

They were always the same value, and it's therefore unnecessary to pass them in.

graphql, store: Rename the 'parent_id' attribute to 'g$parent_id'

46c7533

That ensures that we will not possibly collide with any attribute the user might provide since '$' is not legal in GraphQL names.

store: Rename the special 'pos' column in queries to 'g$pos'

9f585da

This makes sure we don't possibly collide with a user's GraphQL property 'pos'

graphql: Change the prefix for prefetched attributes

4cad094

We now use 'prefetch:' instead of 'r:'

graph, graphql: Rename 'IncorrectPrefetchResult.single' to 'slow'

76e649b

store: Fix type in comment

3097ec8

lutter force-pushed the lutter/prefetch branch from 87bc506 to 3097ec8 Compare December 11, 2019 01:34

lutter merged commit 3097ec8 into master Dec 11, 2019

lutter deleted the lutter/prefetch branch December 11, 2019 21:15

Jannis mentioned this pull request Dec 23, 2019

SQL query combination #857

Closed
