Introduce planning phase to query execution #304

Open
wants to merge 97 commits into
from

@JeffRMoore
Contributor

This PR is intended to address issue #26 and improve developer experience around writing resolving functions.

When writing resolving functions, it can be useful to have foreknowledge of how the execution
engine will evaluate the query. This might allow the resolve function to request only the necessary
fields or to optimistically apply joins.

Currently, applying these optimizations means processing the query AST in a way parallel to
how the execution engine might do so. One must understand aliasing, fragments and directives
and the structure of the AST.

This PR exposes query evaluation foreknowledge by splitting execution into two phases,
a planning phase and an evaluation phase.

In the planning phase the AST is analyzed and a hierarchical plan structure is created indicating
how the executor will evaluate the query. The structure of this plan mirrors the structure of the schema,
not the structure of the query.

This planning information takes the place of GraphQLResolveInfo in the calls
to resolving functions on the schema.

Pre-calculating this information serves two purposes:

  1. Provides a schema-oriented interface to resolving functions, predicting what will occur later.
  2. Avoids some recalculation during query execution when evaluating list results.

Examples

A set of examples is available here for various resolver function cases:

https://github.com/JeffRMoore/graphql-optimization-examples

Simple Example

from https://github.com/JeffRMoore/graphql-optimization-examples/blob/master/ex1.js

For this query on a user object

{
  hombre:user(id: "1") {
    id
    ...NameFrag
  }
}
fragment NameFrag on User {
  nombre:name
}

Here is part of a resolver function on a user field of the root query

function (source, args, info) {
  var fieldNames = Object.keys(info.returned.fields);

  console.log('WILL RESOLVE', info.fieldName, 'on', info.parentType.name);
  console.log( '    with fields', fieldNames);

  ...
}

Which produces the following output when executed

WILL RESOLVE user on Query
    with fields [ 'id', 'name' ]
RESULT:
{ hombre: { id: '1', nombre: 'Dan' } }

This shows how a resolve function can use the plan structure to perform a selection
of only the fields that will be accessed later.
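
As a rough sketch of that idea, a resolver could turn the planned field names into a column list for a SQL query. The buildUserQuery helper, the users table, and the assumption that field names match column names are all hypothetical; the fakeInfo object is a hand-built stand-in that only carries the returned.fields shape described in this PR.

```javascript
// Hypothetical helper: build a SELECT statement that fetches only the
// columns the executor is going to read.
// Assumption: GraphQL field names are identical to column names.
function buildUserQuery(info) {
  const columns = Object.keys(info.returned.fields);
  // Always fetch the primary key so the row can be identified.
  if (!columns.includes('id')) {
    columns.unshift('id');
  }
  return 'SELECT ' + columns.join(', ') + ' FROM users WHERE id = ?';
}

// Hand-built stand-in for the plan passed as the third resolver argument.
const fakeInfo = {
  kind: 'resolve',
  fieldName: 'user',
  returned: {
    kind: 'select',
    fields: { id: [ {} ], name: [ {} ] }
  }
};

console.log(buildUserQuery(fakeInfo));
// prints "SELECT id, name FROM users WHERE id = ?"
```

A real resolver would still need to map field names to columns and bind the id argument; the point is only that the column list comes from the plan rather than from walking the AST.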

Complex Example

from https://github.com/JeffRMoore/graphql-optimization-examples/blob/master/ex6.js

For this query on a user object with a nested location object

  {
    user(id: "1") {
      where: location {
        city
      }
    }
  }

Here is part of a resolver function on a user field of the root query

resolve: (source, args, info) => {
  const userFields = info.returned.fields;
  const userFieldNames = Object.keys(userFields);

  console.log('WILL RESOLVE',
    info.fieldName, 'on', info.parentType.name);
  console.log( '    with fields', userFieldNames);

  if (userFields.location) {
    userFields.location.forEach(fieldPlan => {
      const locationFields = fieldPlan.returned.fields;
      const locationFieldNames = Object.keys(locationFields);

      console.log('WILL RESOLVE',
        fieldPlan.fieldName, 'on', fieldPlan.parentType.name);
      console.log( '    with fields', locationFieldNames);
    });
  }

  ...
}

Which produces the following output when executed

WILL RESOLVE user on Query
    with fields [ 'location' ]
WILL RESOLVE location on User
    with fields [ 'city' ]
RESULT:
{ user: { where: { city: 'London' } } }

Here, we are looking ahead into the location field of user to see what fields
of location will be queried.

We have to use forEach to do this because any given field could be resolved
multiple times by the execution engine. A field can be resolved multiple times
with different arguments or multiple times with the same arguments. The execution
engine cannot make presumptions about the return value being the same or different.
(The resolver could literally be returning random numbers.)

This resolver could collect the nested field set and pass a query to a backend
document store. Or it could use the location information to add a join clause to
the master query for user.
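
A minimal sketch of collecting the nested field set, assuming the fields map described in the data structure tables of this PR. The collectNestedFieldNames helper, the country field, and the hand-built plan objects are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: gather the names of every field requested on a nested
// field, across all of the plans for that field (a field may be planned more
// than once, for example under different aliases or arguments).
function collectNestedFieldNames(selectionFields, fieldName) {
  const plans = selectionFields[fieldName] || [];
  const names = new Set();
  plans.forEach(fieldPlan => {
    const returned = fieldPlan.returned;
    // Only a selection plan carries a `fields` map.
    if (returned.kind === 'select') {
      Object.keys(returned.fields).forEach(name => names.add(name));
    }
  });
  return Array.from(names);
}

// Hand-built stand-in for info.returned.fields on the user resolver.
const userFields = {
  location: [
    { fieldName: 'location',
      returned: { kind: 'select', fields: { city: [ {} ] } } },
    { fieldName: 'location',
      returned: { kind: 'select', fields: { city: [ {} ], country: [ {} ] } } }
  ]
};

const locationColumns = collectNestedFieldNames(userFields, 'location');
// Join only when the query actually reads something from location.
const needsJoin = locationColumns.length > 0;
console.log(needsJoin, locationColumns);
```

The union over all plans is deliberately conservative: it fetches every column any of the planned resolutions might read.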

The Planning Data Structures

GraphQLResolvingPlan

The GraphQLResolvingPlan structure describes the process of calling the resolve function to
resolve the value of a field on a GraphQLObjectType parent.

The GraphQLResolvingPlan data structure will be passed to the resolve function as the
third parameter, info.

type GraphQLFieldResolveFn = (
  source: mixed,
  args: {[argName: string]: mixed},
  info: GraphQLResolvingPlan
) => mixed

The first accessible plan data structure during query execution will be a GraphQLResolvingPlan
when resolve is called on the fields of the root GraphQLObjectType.

(There is a GraphQLOperationPlan that describes the operation, but it is not accessible
to functions attached to the schema.)

GraphQLResolvingPlan contains the following fields as well as the
"All Fields" and "GraphQLResolveInfo" fields described later.

Field | Type | Description
kind | string | always 'resolve' for a GraphQLResolvingPlan
fieldDefinition | GraphQLFieldDefinition | The definition of the field to be resolved
args | { [key: string]: mixed } | The arguments that will be passed to resolve
returned | GraphQLCompletionPlan | The plan which will be evaluated over the return value from resolve

The returned field describes the next step the execution engine will take depending
on the type of the field on GraphQLObjectType.

type GraphQLCompletionPlan =
  GraphQLSerializationPlan |
  GraphQLMappingPlan |
  GraphQLSelectionPlan |
  GraphQLCoercionPlan;

Plan Type | Description
GraphQLSerializationPlan | for GraphQLScalarType and GraphQLEnumType fields, the return value will be serialized.
GraphQLMappingPlan | for GraphQLListType fields, the return value will be mapped; elements of the list will be further processed.
GraphQLSelectionPlan | for GraphQLObjectType fields, the return value will have its fields selected.
GraphQLCoercionPlan | for GraphQLUnionType or GraphQLInterfaceType fields, the return value will be coerced to the proper runtime type.
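
Because every completion plan carries a kind, code that inspects a plan can dispatch on it. The following is a sketch only: describePlan is a hypothetical helper, the Human and Droid type names are made up, and the plan objects are hand-built with just the fields named in this description.

```javascript
// Hypothetical helper: summarize a completion plan by dispatching on `kind`.
function describePlan(plan) {
  switch (plan.kind) {
    case 'serialize':
      return 'serialize';
    case 'map':
      // A mapping plan wraps the plan applied to each list element.
      return 'map(' + describePlan(plan.listElement) + ')';
    case 'select':
      return 'select{' + Object.keys(plan.fields).join(',') + '}';
    case 'coerce':
      return 'coerce<' + Object.keys(plan.typeChoices).join('|') + '>';
    default:
      throw new Error('Unknown plan kind: ' + plan.kind);
  }
}

// Hand-built plan for a field returning a list of an abstract type.
const friendsPlan = {
  kind: 'map',
  listElement: {
    kind: 'coerce',
    typeChoices: {
      Human: { kind: 'select', fields: { name: [] } },
      Droid: { kind: 'select', fields: { name: [] } }
    }
  }
};

console.log(describePlan(friendsPlan));
// prints "map(coerce<Human|Droid>)"
```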

GraphQLSerializationPlan

GraphQLSerializationPlan describes the process of calling the serialize function
on a FieldDefinition. This is the leaf node of the planning tree.

GraphQLSerializationPlan contains the following fields as well as the
"All Fields" fields described later.

Field | Type | Description
kind | string | always 'serialize' for a GraphQLSerializationPlan

GraphQLMappingPlan

GraphQLMappingPlan describes the process of iterating over the elements of a
GraphQLListType and completing each element value. This is an internal node in the
planning tree and is not passed directly to functions registered with the schema, but
instead is a child plan depending on the structure of the schema.

GraphQLMappingPlan contains the following fields as well as the
"All Fields" fields described later.

Field | Type | Description
kind | string | always 'map' for a GraphQLMappingPlan
listElement | GraphQLCompletionPlan | The plan which will be evaluated for each element in the list
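
Since a mapping plan is only reached as a child plan, a resolver for a list field has to step through listElement to find out what happens to each element. A small sketch, where elementPlanOf is a hypothetical helper and the nested-list plan is hand-built:

```javascript
// Hypothetical helper: follow listElement through any number of nested
// lists until a non-list plan is reached.
function elementPlanOf(plan) {
  let current = plan;
  while (current.kind === 'map') {
    current = current.listElement;
  }
  return current;
}

// Hand-built plan for a field typed as a list of lists of scalars.
const matrixPlan = {
  kind: 'map',
  listElement: {
    kind: 'map',
    listElement: { kind: 'serialize' }
  }
};

console.log(elementPlanOf(matrixPlan).kind);
// prints "serialize"
```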

GraphQLSelectionPlan

A GraphQLSelectionPlan indicates which fields will be selected from a GraphQLObjectType
as part of the return value completion process.

If the GraphQLObjectType has defined an isTypeOf function, this function will be called
before the selection operation is applied to verify that the type of the runtime value
matches the expected type. If no isTypeOf is defined, the value is presumed to be of
that type and evaluation proceeds. isTypeOf may also be called during the coercion process.
(see below)

The GraphQLSelectionPlan data structure will be passed to the isTypeOf function as the
second parameter, info.

export type GraphQLIsTypeOfFn = (
  value: mixed,
  info?: GraphQLSelectionPlan
) => boolean

GraphQLSelectionPlan contains the following fields as well as the
"All Fields" fields described later.

Field | Type | Description
kind | string | always 'select' for a GraphQLSelectionPlan
fields | {[fieldName: string]: [ GraphQLResolvingPlan ]} | Maps the names of fields that will be resolved on the GraphQLObjectType to a list of GraphQLResolvingPlan.
fieldPlansByAlias | {[alias: string]: GraphQLResolvingPlan} | Maps object keys that will appear in the query results to a GraphQLResolvingPlan.

The keys of fields match the names of fields on the GraphQLObjectType so this will be the most
common way of access. fieldPlansByAlias contains the exact same plans, just with a different
organization.

Each value in fields is a list because the execution engine may attempt to resolve any given
field multiple times, if for example it had parameters or was aliased.
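
To illustrate the two organizations, here is a hand-built selection plan for a query that requests the same field twice under different aliases, e.g. `{ small: picture(size: 50) large: picture(size: 500) }`. The picture field and its size argument are hypothetical, and the plan objects only carry the fields needed for the illustration.

```javascript
// Two resolving plans for the same schema field, differing in arguments.
const smallPlan = { kind: 'resolve', fieldName: 'picture', args: { size: 50 } };
const largePlan = { kind: 'resolve', fieldName: 'picture', args: { size: 500 } };

const selection = {
  kind: 'select',
  // Keyed by schema field name: one entry holding both plans.
  fields: { picture: [ smallPlan, largePlan ] },
  // Keyed by result key: one plan per alias.
  fieldPlansByAlias: { small: smallPlan, large: largePlan }
};

// Schema-oriented access: every size that will be requested for picture.
const sizes = selection.fields.picture.map(plan => plan.args.size);
console.log(sizes);

// Both organizations refer to the same plan objects.
console.log(selection.fieldPlansByAlias.large === selection.fields.picture[1]);
```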

GraphQLCoercionPlan

A GraphQLCoercionPlan describes how to process a value of an abstract type
(GraphQLInterfaceType or GraphQLUnionType) based on the runtime type value.

A plan for each possible GraphQLObjectType type is constructed and placed in
typeChoices by type name.

A value is resolved in one of two ways:

  1. If the abstract type declares a resolveType function, that function is called and
    the name of the type is used to determine which plan to proceed with.
  2. Otherwise, a selection operation is started for each possible type, querying that
    type's isTypeOf function to determine if the value to coerce is an instance of that
    type. If no isTypeOf is defined, that plan will not be evaluated.

If the type cannot be resolved it is an error. This is usually due to an error in
constructing the schema.
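
The two strategies can be sketched roughly as follows. This is a simplified paraphrase, not the code in this PR: chooseSelectionPlan is a hypothetical name, and Dog, Cat and Pet are plain objects standing in for schema types.

```javascript
// Simplified sketch of the two resolution strategies described above.
function chooseSelectionPlan(abstractType, coercionPlan, value) {
  if (typeof abstractType.resolveType === 'function') {
    // Strategy 1: ask the abstract type, then look the plan up by type name.
    const runtimeType = abstractType.resolveType(value, coercionPlan);
    const chosen = runtimeType && coercionPlan.typeChoices[runtimeType.name];
    if (!chosen) {
      throw new Error('Could not resolve a type for ' + abstractType.name);
    }
    return chosen;
  }
  // Strategy 2: try each possible type's isTypeOf; types without one are skipped.
  for (const possibleType of abstractType.getPossibleTypes()) {
    const candidate = coercionPlan.typeChoices[possibleType.name];
    if (typeof possibleType.isTypeOf === 'function' &&
        possibleType.isTypeOf(value, candidate)) {
      return candidate;
    }
  }
  throw new Error('Could not resolve a type for ' + abstractType.name);
}

// Plain-object stand-ins for two object types and one union.
const Dog = { name: 'Dog', isTypeOf: value => 'barks' in value };
const Cat = { name: 'Cat', isTypeOf: value => 'meows' in value };
const Pet = { name: 'Pet', getPossibleTypes: () => [ Dog, Cat ] };

const petPlan = {
  kind: 'coerce',
  typeChoices: {
    Dog: { kind: 'select', fields: {} },
    Cat: { kind: 'select', fields: {} }
  }
};

const chosenPlan = chooseSelectionPlan(Pet, petPlan, { meows: true });
console.log(chosenPlan === petPlan.typeChoices.Cat);
// prints "true"
```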

The GraphQLCoercionPlan data structure will be passed to the resolveType function as the
second parameter, info.

export type GraphQLTypeResolveFn = (
  value: mixed,
  info?: GraphQLCoercionPlan
) => ?GraphQLObjectType

GraphQLCoercionPlan contains the following fields as well as the
"All Fields" fields described later.

Field | Type | Description
kind | string | always 'coerce' for a GraphQLCoercionPlan
typeChoices | {[typeName: string]: GraphQLSelectionPlan} | A map of possible plans organized by type name.

Fields in All Plans

All plans (GraphQLResolvingPlan, GraphQLSerializationPlan, GraphQLMappingPlan,
GraphQLSelectionPlan, and GraphQLCoercionPlan) contain the following fields.

Field | Type | Description
kind | string | The kind of plan
fieldName | string | The name of the current field in the parent context
fieldASTs | Array<Field> | Portion of the AST that generated this plan
returnType | GraphQLObjectType | The type of the value returned after evaluating the planned operation
parentType | GraphQLCompositeType | The parent context on which a selection was last performed
Fields in GraphQLResolveInfo Plans

The plans that can be passed to schema functions (GraphQLResolvingPlan,
GraphQLSelectionPlan, and GraphQLCoercionPlan) contain the following additional fields.

Field | Type | Description
schema | GraphQLSchema | The schema instance that the query is being executed on
fragments | { [fragmentName: string]: FragmentDefinition } | Fragment definitions included in the query
rootValue | mixed | The root value passed to execute
operation | OperationDefinition | Description of operation being executed
variableValues | { [variableName: string]: mixed } | Post-processed variable values passed to execute

Open Issues

Big Diff

Sorry.

I know it's hard to review something like this. See the Changes section; I hope that helps.

invariants vs Errors

This is the first time I've used typeflow so I'm not quite sure the invariant statements
I added are totally correct. Also, I was very confused about when to use invariant vs
throwing an error.

To me invariant is used when you expect the check to be removed in production. I'm not
sure that's the general usage in this code.

Please review this carefully, my concern is that I've hidden some error in my ignorance.

Backward compatibility

getObjectType has been moved from definition.js to execute.js to be more like
default field resolver handling. This was necessary to avoid having isTypeOf accept
a union of types.

An invariant test is triggered if type resolution fails where before it would silently return null.

An invariant test is triggered if two types with the same name implement the same
GraphQLInterfaceType, or two types of the same name are assigned to the same GraphQLUnionType.

I left GraphQLResolvingInfo in for BC even though it isn't used any more in case someone
has referenced it.

Resolving functions

I experimented with adding a resolve function to each type of plan. This would allow
more conditional logic to be moved from the evaluation phase to the planning phase.

It also creates an interesting possibility of being able to pass the top level plan to
a transformation function which might analyze the query plan and return a DIFFERENT plan
with resolver functions replaced or wrapped to implement new behaviors.

For another time.

Changes

definition.js:

  • Introduced plan data structures: GraphQLResolvingPlan, GraphQLCompletionPlan, GraphQLSerializationPlan, GraphQLMappingPlan, GraphQLSelectionPlan, GraphQLCoercionPlan
  • Introduced GraphQLIsTypeOfFn ala GraphQLResolvingFn
  • Introduced GraphQLTypeResolveFn ala GraphQLResolvingFn
  • Changed the signature of info in GraphQLResolvingFn from GraphQLResolveInfo to GraphQLResolvingPlan
  • Changed the signature of info in GraphQLIsTypeOfFn from GraphQLResolveInfo to GraphQLSelectionPlan
  • Changed the signature of info in GraphQLTypeResolveFn from GraphQLResolveInfo to GraphQLCoercionPlan
  • Changed GraphQLResolveInfo to be the union of GraphQLResolvingPlan, GraphQLSelectionPlan, and GraphQLCoercionPlan
  • moved default type resolution functions getTypeOf and getObjectType to execute.js to live alongside the default field resolution function

execute.js

General

  • introduce GraphQLOperationPlan type
  • change execute to call planOperation function
  • change the signature of defaultResolveFn to accept a plan
  • Add findTypeWithResolveType method to support Coercion operation
  • Add findTypeWithIsTypeOf method to support Coercion operation using logic from getTypeOf from definition.js

executeOperation

  • introduce planOperation analog to executeOperation that produces a GraphQLOperationPlan
  • change executeOperation signature to accept a GraphQLOperationPlan instead of an OperationDefinition
  • move call to collectFields in executeOperation to planOperation

executeFields and executeFieldsSerially

  • introduce planFields analog to executeFields and executeFieldsSerially that produces a fieldPlans and fieldsList results
  • removed parentType from signature of executeFields and executeFieldsSerially since this is now part of the plan
  • changed the fields parameter to executeFields and executeFieldsSerially to be a map of GraphQLResolvingPlan instead of Array

resolveField

  • introduce planResolveField analog to resolveField that returns a GraphQLResolvingPlan
  • added a plan parameter of type GraphQLResolvingPlan to resolveField
  • remove parentType and fieldASTs parameters from resolveField since these are part of the plan
  • Move FieldDefinition lookup from resolveField to planResolveField
  • Move call to getArgumentValues from resolveField to planResolveField
  • Remove construction of GraphQLResolveInfo from resolveField

resolveOrError

  • Change the info parameter on resolveOrError to accept a GraphQLResolvingPlan instead of GraphQLResolveInfo

completeValueCatchingError

  • Change the signature of completeValueCatchingError to accept a GraphQLCompletionPlan instead of GraphQLResolveInfo
  • Remove fieldASTs field from completeValueCatchingError, this information is now in the plan

completeValue

  • introduce planCompleteValue analog to completeValue that returns a GraphQLCompletionPlan
  • Change the signature of completeValue to accept a GraphQLCompletionPlan instead of GraphQLResolveInfo
  • move call to collectFields from completeValue to planSelection
  • Refactor coercion logic to be based on pre-calculated plans for each type
JeffRMoore added some commits Feb 25, 2016
@JeffRMoore JeffRMoore Extract Execution context building to a separate modules so that Exec…
…utionContext type can be exported, enabling plan building to be a separate module.
88c7455
@JeffRMoore JeffRMoore Create a planning module and move some initial functionality there th…
…at is only used in the planning phase
ecf7b8e
@JeffRMoore JeffRMoore Remove accidentally duplicated code; initial call to planOperation 72bccc2
@JeffRMoore JeffRMoore Introduce the OperationExecutionPlan type 08c08f0
@JeffRMoore JeffRMoore Begin executing based on the plan b8f16a6
@JeffRMoore JeffRMoore Introduce a FieldResolvingPlan 4c78283
@JeffRMoore JeffRMoore Execute based on the FieldResolvingPlan as far forward as it has been…
… calculated
9b598b0
@JeffRMoore JeffRMoore Create plans for CompleteValue execution, rename plan names to be mor…
…e consistent
4ffca0a
@JeffRMoore JeffRMoore Propigate execution plan into CompleteValue f4f561a
@JeffRMoore JeffRMoore Correct misspelling add kind field to Execution Plans 0fca85b
@JeffRMoore JeffRMoore Checkpoint in the middle of converting CompleteValue to use Execution…
… Plans
15f48cd
@JeffRMoore JeffRMoore Checkpoint in the middle of converting CompleteValue to use Execution…
… Plans, completed select support
a79d4e3
@JeffRMoore JeffRMoore Checkpoint converting completeValue over to execution plans b033bf8
@JeffRMoore JeffRMoore Remove function name suffixes now that we have collapsed into only us…
…ing execution plans
371448b
@JeffRMoore JeffRMoore Execution Plans are no longer optional 6a06fe1
@JeffRMoore JeffRMoore Remove ignoring execution plan 4b017d7
@JeffRMoore JeffRMoore Harmonize names of execution plans 93d8091
@JeffRMoore JeffRMoore Fix imports of execution plan types a1f8aa3
@JeffRMoore JeffRMoore Indeed, it is in intional to omit fields, change comment to reflect that db8ad59
@JeffRMoore JeffRMoore Remove wrapper around switch statement a1f6c3e
@JeffRMoore JeffRMoore Add kind as a constant on execution plan type definitions 8405ebb
@JeffRMoore JeffRMoore Fix incorrectly imported type definitions 5f8a43f
@JeffRMoore JeffRMoore Fix broken CoercionExecutionPlan definition d3bd76a
@JeffRMoore JeffRMoore Remove unnecessary invariant declarations d7e34b7
@JeffRMoore JeffRMoore Change ExecuteFields to accept a plan instead of a list of fields a4beecf
@JeffRMoore JeffRMoore No need to capture fields, this wasy a temporary artifact of transition f34ccac
@JeffRMoore JeffRMoore Rename ExecutionPlan to CompletionExecutionPlan. That's more indicati…
…ve of what it is. Frees up ExecutionPlan for more generic uses
d841bdc
@JeffRMoore JeffRMoore Capture type information in the Execution Plans 36fb2d1
@JeffRMoore JeffRMoore move fieldASTs information into the Execution Plans and out of the ca…
…lling chain.
cdbb5d5
@JeffRMoore JeffRMoore Narrow the width of the executeFields methods by removing the type pa…
…rameter, which is also on the plan
787c4e6
@JeffRMoore JeffRMoore Rename returnType to type to harmonize the name of the type field on …
…Execution plans
031bc44
@JeffRMoore JeffRMoore Remove more caller type propigation in favor of the Execution plan 8eb0381
@JeffRMoore JeffRMoore Brain dump on what is left to complete cae4343
@JeffRMoore JeffRMoore Document parallels between CompleteValue and planCompletion and Add i…
…nvariants to execution for conditions tested during planning
4b3db81
@JeffRMoore JeffRMoore Remove innerType, it does not provide useful additional information bef596b
@JeffRMoore JeffRMoore Additional Documentation 7fe185e
@JeffRMoore JeffRMoore Add fields from GraphQLResolveInfo to ResolvingExecutionPlan f6f035c
@JeffRMoore JeffRMoore Remove duplicate type information 4538e19
@JeffRMoore JeffRMoore There is already a resolver signature defined a8aef5f
@JeffRMoore JeffRMoore Organize TODO list 6ae2dba
@JeffRMoore JeffRMoore Harmonize and document plan terminology 52cc623
@JeffRMoore JeffRMoore Wire up the OperationExecutionPlan a61e372
@JeffRMoore JeffRMoore Remove unnecessary strategy in selection plan d984c28
@JeffRMoore JeffRMoore Merge GraphQLResolveInfo fields into GraphQLTypeResolvingPlan 2a816db
@JeffRMoore JeffRMoore Merge GraphQLResolveInfo fields into GraphQLSelectionCompletionPlan b7fcb3c
@JeffRMoore JeffRMoore Move plan type definitions to definitions.js 7a921cd
@JeffRMoore JeffRMoore Use the plan instead of GraphQLResolveInfo as the parameter to resolv…
…e and source of error information data
fe82467
@JeffRMoore JeffRMoore Change order of plan parameters 20a693d
@JeffRMoore JeffRMoore Simplify the parameter signature of resolveOrError f3ed20e
@JeffRMoore JeffRMoore Add some messaging for unreachable conditions 1b61e5e
@JeffRMoore JeffRMoore Giving up on getting reduce past typeflow b408f08
@JeffRMoore JeffRMoore Remove unnecessary typeflow boilerplate f23a827
@JeffRMoore JeffRMoore Recombine files so Pull request diff is not ridiculous a540d1e
@JeffRMoore JeffRMoore Re-consolidate plan.js into execute.js so that Pull request diffs are…
… cleaner
2c18391
@JeffRMoore JeffRMoore Distintions without difference: clean up diff d79a7ca
@JeffRMoore JeffRMoore Clean up plans removing items I don't know how to handle 802f50d
@JeffRMoore JeffRMoore Rename kind constants 067f1cd
@JeffRMoore JeffRMoore Rename innerCompletionPlan to completionPlan eba24dc
@JeffRMoore JeffRMoore Clean up plan naming d8a84f8
@JeffRMoore JeffRMoore Notes b8aa771
@JeffRMoore JeffRMoore Create NOTES.md 81302b8
@JeffRMoore JeffRMoore Update NOTES.md 344ae1e
@JeffRMoore JeffRMoore Update NOTES.md 224cce8
@JeffRMoore JeffRMoore Merge branch 'master' of https://github.com/JeffRMoore/graphql-js e10b2ec
@JeffRMoore JeffRMoore Update NOTES.md 76be755
@JeffRMoore JeffRMoore Merge branch 'master' of https://github.com/JeffRMoore/graphql-js e4dad0f
@JeffRMoore JeffRMoore Factor out duplicate code, so that isTypeOf will only receive one typ…
…e of plan
1b5e2bb
@JeffRMoore JeffRMoore Fix spacing fa2becb
@JeffRMoore JeffRMoore Move getObjectType logic from definitions to execution, setting up is…
…TypeOf to have a simpler signature in future refactoring
b87c2bc
@JeffRMoore JeffRMoore For each of the three resolving functions, we are now able to specify…
… which concrete plan they will receive in their info parameter
9b766d6
@JeffRMoore JeffRMoore Add returnType back to GraphQLCoercionPlan 43c0c27
@JeffRMoore JeffRMoore Update comments with evaluate terminology 5a71e1d
@JeffRMoore JeffRMoore Clean up invariant messages d94c4cd
@JeffRMoore JeffRMoore Remove NOTES.md file c7a6296
@JeffRMoore JeffRMoore Change the order of Plan type declarations to match documentation and…
… evaluation order
6877224
@JeffRMoore JeffRMoore Rename generic key portion of type to more explainatory alias or type…
…Name
95e33b3
@JeffRMoore JeffRMoore Introduce term alias instead of generic key for more descriptive typing 3d9fa60
@JeffRMoore JeffRMoore Remove resolveFn and store fieldDefinition instead which is more usef…
…ul, bringing resolve in line with the other plans, so that I don't have to document why its different
1b75c3a
@JeffRMoore JeffRMoore Incorrectly restored method signature 53a6e15
@JeffRMoore JeffRMoore Add precomputed fieldList property to selection plans, making the sim…
…ple resolver cases simple
7949a59
@JeffRMoore JeffRMoore Rename completionPlan to elementPlan for GraphQLMappingPlan to avoid …
…confusion when chaining fields.
65c774e
@JeffRMoore JeffRMoore Rename type to returnType for documentation consistency 0efee5b
@JeffRMoore JeffRMoore Differences without distinction: restore original order of operation …
…to make diff easier to understand
73321f2
@JeffRMoore JeffRMoore Move GraphQLOperationPlan to execute.js since it is not exposed publicly f8fd245
@JeffRMoore JeffRMoore Rename strategy to concurrencyStrategy 6b6ecb8
@JeffRMoore JeffRMoore Add an invariant in case types are setup incorrectly 9feb9f3
@JeffRMoore JeffRMoore Rename fieldPlans to fieldPlansByAlias and fieldList to fieldPlans dfd4259
@JeffRMoore JeffRMoore forgot to change the type of the fieldPlans array
675feec
@leebyron
Contributor
leebyron commented Mar 4, 2016

Whoa, this is awesome work!

It's going to take me a while to digest all of this, so thanks in advance for your patience.

Also, since this library is designed to be a reference implementation of the spec, the next step will probably be finding the minimum viable variation of this which can be fitted into the working draft spec.

@leebyron
Contributor
leebyron commented Mar 4, 2016

Also I'm curious if while investigating this you've done any kind of performance testing? I'm really interested in knowing the relative size of the GraphQL schema you're working with and if you found any concrete performance effects from this change. I imagine it could be faster because there's some good memoization happening, but also possibly slower because of the additional step.

@leebyron
Contributor
leebyron commented Mar 4, 2016

The other thing I'd love to see are examples of usage of this API. Love the examples of console logging the data structure in the resolver - but what's fuzzier for me is how this fulfills the primary use cases cited in #26 - specifically, how might you write an optimized SQL query by using this tool? What's the API?

@JeffRMoore JeffRMoore Rename plan traversal fields to form a more fluent interface: complet…
…ionPlan to returned, elementPlan to listElement, fieldPlans to fields, and selectionPlansByType to typeChoices
f97def5
@JeffRMoore
Contributor

Thanks for taking a look, I know it's a lot to process.

When I woke up this morning I realized that the xyzPlan suffixes were pretty unwieldy so I changed the names as follows to make the interface more fluent:

fieldPlans to fields, selectionPlansByType to typeChoices, elementPlan to listElement and completionPlan to returned.

Sorry for the instability. I haven't worked much on github and didn't realize when I pushed it to my repo it would update the PR automatically. I've updated the documentation above and the examples.

@JeffRMoore
Contributor

I have not done any performance testing on whether queries take more or less time because of the planning phase. My goal was mostly to improve the experience of consuming the info parameter in the resolve functions.

The overhead is proportional to the number of fields in the query, since planning creates one plan object per field in the query.

That overhead is magnified if your query accesses abstract types, since each possible type choice needs to be predicted.

Whether or not that overhead time is recovered depends on how many list elements the plan construction time can be amortized over.

@JeffRMoore
Contributor

Good point regarding the examples. I'll work on making them more concrete.

@JeffRMoore
Contributor

I ran some performance tests with a query that returns a list of objects with 12 string fields using 1 named fragment.

n | released version | planning version
1 | 2,312 | 2,186
10 | 1,865 | 2,038
100 | 550 | 1,217
1000 | 74 | 252

n is the number of elements in the list, result is queries per second.

Here is with caching ASTs:

n | released version | planning version
1 | 36,519 | 20,134
10 | 7,042 | 11,947
100 | 779 | 2,456
1000 | 70 | 275

The performance crossover for this query was just between n=2 and n=3
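
For reference, the relative throughput at each n can be worked out from the numbers in the two tables. The snippet is only arithmetic over those measurements; speedups is a hypothetical helper name.

```javascript
// Queries per second from the two tables: [n, released, planning].
const withoutAstCache = [
  [1, 2312, 2186], [10, 1865, 2038], [100, 550, 1217], [1000, 74, 252]
];
const withAstCache = [
  [1, 36519, 20134], [10, 7042, 11947], [100, 779, 2456], [1000, 70, 275]
];

// A ratio above 1 means the planning version handled more queries per second.
function speedups(rows) {
  return rows.map(([n, released, planning]) => ({
    n,
    ratio: planning / released
  }));
}

speedups(withoutAstCache).concat(speedups(withAstCache)).forEach(row => {
  console.log('n=' + row.n, 'planning/released =', row.ratio.toFixed(2));
});
```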

I also experimented with caching query plans. Caching the operation plan at n=1 resulted in 67,080 qps.

I did a version of caching field resolving plans (which could be shared between queries) and at n=1 ended up at 43,000. (basically erasing the penalty for planning)

Embedding rootValue and variableValues precludes plan caching for any query. Without these fields, the operation plan can be cached as a function of the input query text, just as the AST can be cached.

Embedding fieldASTs, fragments, operation as well as rootValue and variableValues prevents caching field resolving plans. Note that if they could be cached, they could be used between queries, even for planning never seen before queries.

Caching field resolving plans for fields with no arguments is trivial to write as well as caching operation plans for queries with no variables.
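
A minimal sketch of the operation plan cache for the no-variables case, keyed by query text. planOperationStub is a stand-in for the real planning step, the 'operation' kind and the cache names are hypothetical, and the approach assumes the cached plan does not embed rootValue or variableValues.

```javascript
// Cache of operation plans keyed by query text.
// Assumption: plans hold no per-request data (rootValue, variableValues).
const operationPlanCache = new Map();
let planningRuns = 0;

// Stand-in for the planning step; a real planner would take a schema and AST.
function planOperationStub(queryText) {
  planningRuns += 1;
  return { kind: 'operation', queryText };
}

// Return the cached plan for this query text, planning it on first use.
function getOperationPlan(queryText) {
  let plan = operationPlanCache.get(queryText);
  if (!plan) {
    plan = planOperationStub(queryText);
    operationPlanCache.set(queryText, plan);
  }
  return plan;
}

const firstPlan = getOperationPlan('{ user(id: "1") { id } }');
const secondPlan = getOperationPlan('{ user(id: "1") { id } }');
console.log(firstPlan === secondPlan, planningRuns);
// prints "true 1"
```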

My gut feel is that implementing caching of field resolving plans for fields with arguments won't pay off.

Implementing variable substitution for operation plans would be somewhat difficult to write, but could be a big win (for the low n case).

I feel like the sweet spot for graphql is providing an internal api where one might expect a bounded set of query texts, relying mostly on variables. (this is my use case, which is why I'm interested in execution phase performance over validation or parsing.)

I also think (hope) that these cache-precluding data items will not be necessary with the other fields introduced to the plans. I may also be missing something.

@freiksenet
Contributor

This is amazing stuff, I've attempted to build a similar thing to this, but got turned off by the sheer complexity. I am very glad someone is working on this!

@charlieschwabacher

This is so great! Both the changes and how completely they are documented.

I have been experimenting w/ a 'schema defined backend' - this will make it much easier to convert a graphql query into an optimized SQL or cypher query at the root level. Being able to define resolution only once will be a huge win.

@JeffRMoore
Contributor

Thanks.

Regarding the caching of fragments, that was speculative. I'm not suggesting removing those here, my goal for this PR is maximum backward compatibility. I think removing rootValue would be a big blow to usability and I don't think its worth it. Thus, caching plans would require cloning to fill in the relevant info. I'll eventually get to testing out whether that's worth it or not.

As far as the performance loss to planning, after profiling, I found some areas of work duplication during the planning process. I have hope that removing that duplication will bring the n=1 record times back to near parity.

For now, I'm working on a benchmark suite, which I plan to submit as a separate PR.

One obstacle that I've had during this is trying to understand what should fieldASTs, returnType, fieldName and parentType contain for calls to resolve, resolveType and isTypeOf. resolve is easy to understand, except in the nonNullable case.

isTypeOf and resolveType are less clear. Inside the info received by isTypeOf, should returnType be the value of the parent field being resolved or the current context? Should it be the nonNull wrapper or the inner type value?

Consider resolving a field that contains an array of non-nullable union types. What does and should be passed to the inner call to resolveType? to isTypeOf if resolveType is not specified?

The test suite is amazing and I could never have done this work without it, but on these matters it's silent. If someone wanted to help out, augmenting the test suite with tests for these four fields under a variety of nested and non-null conditions would really help, and I think it is necessary for this PR to be merged.

@JeffRMoore
Contributor

There is a test which checks for schema and rootValue, but not any of the other fields in GraphQLResolveInfo. The suite should have the same kind of test for the other fields, especially fieldASTs, returnType, and parentType in nested and nonNull conditions, and for all three functions that receive GraphQLResolveInfo.

https://github.com/graphql/graphql-js/blob/master/src/execution/__tests__/union-interface.js#L350

@denvned denvned commented on the diff Mar 19, 2016
src/execution/execute.js
+ exeContext: ExecutionContext,
+ returnType: GraphQLAbstractType,
+ fieldASTs: Array<Field>,
+ fieldName: string,
+ parentType: GraphQLCompositeType
+): GraphQLCoercionPlan {
+ const abstractType = ((returnType: any): GraphQLAbstractType);
+ const possibleTypes = abstractType.getPossibleTypes();
+ const typeChoices = Object.create(null);
+ possibleTypes.forEach(possibleType => {
+ invariant(
+ !typeChoices[possibleType.name],
+ 'Two types cannot have the same name "${possibleType.name}"' +
+ 'as possible types of abstract type ${abstractType.name}'
+ );
+ typeChoices[possibleType.name] = planSelection(
@denvned
denvned Mar 19, 2016 Contributor

It might make sense to calculate typeChoices lazily, otherwise the number of nodes in a plan tree might grow exponentially with each nesting level of GraphQLInterface.

@denvned
denvned Mar 20, 2016 Contributor

Not as important, but it probably also makes sense to compute child plans lazily in general, because, for example, if a field resolves to null, we might not need to evaluate its child plan subtrees at all. In some cases performance improvement might be noticeable.

@JeffRMoore
Contributor

My primary goal for creating the planning phase was to be able to expose the plan to the resolve function so that it could prefetch items that will be resolved later, or exclude information that would not be used. Lazy creation of plans would defeat that. The only way I can think to do that is to pass a callback to the resolve function that materializes the plan if asked for. This would add more conditional checks back into the execution phase.

There is definitely extra overhead if plans are created for typeChoices and then not used, especially if there are many typeChoices.

I had not considered the case of resolving to null, I'll cover that case in the benchmarks I am building to see what the tradeoffs are. Thanks.

@denvned
Contributor
denvned commented Mar 21, 2016

@JeffRMoore First of all, thank you very much for the great work!

Lazy creation of plans would defeat that.

I don't think so.

The only way I can think to do that is to pass a callback to the resolve function that materializes the plan if asked for. This would add more conditional checks back into the execution phase.

If we use JS getters, no additional changes to resolvers or the executor are needed. For example, in the case of GraphQLCoercionPlan we could write:

...
const typeChoiceCache = {};
possibleTypes.forEach(possibleType => {
  Object.defineProperty(
    typeChoices,
    possibleType.name,
    {
      enumerable: true,
      get() {
        if (!Object.prototype.hasOwnProperty.call(typeChoiceCache, possibleType.name)) {
          typeChoiceCache[possibleType.name] = planSelection(...);
        }
        return typeChoiceCache[possibleType.name];
      }
    }
  );
});
...

Also, we could memoize the coercion plan in the corresponding AST node, to guarantee that even if a user decided to walk the whole plan tree, it does not grow exponentially. That's because, with such memoization, the plan is not a tree anymore, but a directed acyclic graph.
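A minimal sketch of that memoization idea, under stated assumptions: plans are cached per (AST node, type name) pair in a WeakMap rather than stored on the AST node itself, and `memoizedPlan`, `planCache`, `plansBuilt` and the `buildPlan` callback are hypothetical names, not part of this PR or of graphql-js.

```javascript
// Hypothetical cache: AST node -> Map(type name -> plan).
// A WeakMap lets cached plans be collected together with the document.
const planCache = new WeakMap();

// Counter used only to show how many plans were actually built.
let plansBuilt = 0;

// Returns the cached plan for (astNode, typeName), building it on first use.
// buildPlan stands in for whatever function creates a selection plan.
function memoizedPlan(astNode, typeName, buildPlan) {
  let byType = planCache.get(astNode);
  if (!byType) {
    byType = new Map();
    planCache.set(astNode, byType);
  }
  if (!byType.has(typeName)) {
    plansBuilt += 1;
    byType.set(typeName, buildPlan(astNode, typeName));
  }
  return byType.get(typeName);
}
```

Because every parent plan that reaches the same AST node with the same type gets the same plan object back, shared subtrees are stored once, which is what turns the tree into a DAG.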

I had not considered the case of resolving to null, I'll cover that case in the benchmarks I am building to see what the tradeoffs are.

Please also cover the following case:

Given a schema:

interface Node { id: ID!, child: Node }
type T1 implements Node { id: ID!, child: Node }
type T2 implements Node { id: ID!, child: Node }
type T3 implements Node { id: ID!, child: Node }
type T4 implements Node { id: ID!, child: Node }
type T5 implements Node { id: ID!, child: Node }
type T6 implements Node { id: ID!, child: Node }
type T7 implements Node { id: ID!, child: Node }
type T8 implements Node { id: ID!, child: Node }
type T9 implements Node { id: ID!, child: Node }
type T10 implements Node { id: ID!, child: Node }
type Query { root: Node }

and a query:

{root{child{child{child{child{child{child{child{child{child{id}}}}}}}}}}}

The current implementation would have to calculate and store in RAM more than 10^10 = 10,000,000,000 plan nodes...

With the proposed lazy evaluation the number of plan nodes would normally be proportional to number of AST nodes in a document.

And with the proposed memoization, the number of plan nodes is guaranteed to be proportional to the number of AST nodes in a document, even in the worst case.
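To make the arithmetic concrete, here is a rough counting model, assuming one selection plan per (abstract field, possible type) pair and ignoring every other kind of plan node. The function names are hypothetical and the model is only an approximation of what the executor would allocate.

```javascript
// Eager planning: every type choice at one level spawns a full set of
// type choices at the next level, so the counts multiply.
function eagerSelectionPlans(levels, types) {
  let total = 0;
  let atLevel = 1;
  for (let i = 0; i < levels; i++) {
    atLevel *= types; // plans at this nesting level
    total += atLevel;
  }
  return total;
}

// Memoized planning: each field AST node gets at most one plan per
// possible type, no matter how many parents reach it.
function memoizedSelectionPlans(levels, types) {
  return levels * types;
}

// The query above nests 10 fields of type Node (root plus nine child
// fields), and Node has 10 possible types.
console.log(eagerSelectionPlans(10, 10));    // 11111111110
console.log(memoizedSelectionPlans(10, 10)); // 100
```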

@JeffRMoore
Contributor

That's a nasty worst case. Very helpful. I'll explore defineProperty. Thanks.

There would be a new worst case where the number of plan nodes can grow out of proportion, if one adds lists of Node, where the members of the list can vary in type.

@JeffRMoore
Contributor

Off the top of my head, maybe union and interface should be treated differently? In an interface, the selection plan isn't really different, except possibly for returnType, which I'm confused about anyway in this case. Maybe the typeChoices for a union type can be restricted based on what is discovered during the collectFields phase?

@denvned
Contributor
denvned commented Mar 21, 2016

There would be a new worst case where the number of plan nodes can grow out of proportion, if one adds lists of Node, where the members of the list can vary in type.

A list of Nodes shouldn't be more problematic than a simple Node field, because you still have only one coercion plan node per whole list.

maybe union and interface should be treated differently?

Actually, unions are not easier than interfaces. Consider a schema:

union Node = T1 | T2 | <...> | T10

type T1 { id: ID!, child: Node }
type T2 { id: ID!, child: Node }
<...>
type T10 { id: ID!, child: Node }

type Query { root: Node }

And a query:

query {
  root { ...level1 }
}

fragment level1 on Node {
  ... on T1 { child { ...level2 } }
  ... on T2 { child { ...level2 } }
  <...>
  ... on T10 { child { ...level2 } }
}

fragment level2 on Node {
  ... on T1 { child { ...level3 } }
  ... on T2 { child { ...level3 } }
  <...>
  ... on T10 { child { ...level3 } }
}

<...>

fragment level10 on Node {
  ... on T1 { id }
  ... on T2 { id }
  <...>
  ... on T10 { id }
}

Here again, without memoization and laziness, we have to calculate and store in RAM more than 10^10 = 10,000,000,000 plan nodes...

Maybe the typeChoices for a union type can be restricted based on what is discovered during the collectFields phase?

This won't solve the problem, because selection plans for skipped types would be empty anyway. Besides, it might be useful to be able to get a selection plan for a possible type even if it is empty.

@leebyron
Contributor

There is definitely extra overhead if plans are created for typeChoices and then not used, especially if there are many typeChoices.

Unfortunately this is a common case. For example, Relay apps require many types to implement an interface Node { id: ID } and it's not uncommon to see fields which return type Node in such schemas.

@JeffRMoore
Contributor

I'm convinced, the plan size needs to be proportional to query complexity. I think this is doable but will be conceptually more complex.

@helfer
Contributor
helfer commented Apr 9, 2016

@JeffRMoore I'm curious about the performance metrics you quoted, and would love to see what the resolve functions look like that you wrote for it. Do you have a repo for that where I could reproduce them? I'm guessing these were for some flavor of SQL?
Also, I'm trying to understand where and why the current executor is slower than your solution. What parts of the query are executed differently with your query planner?

And finally, sorry for the barrage of questions, but this looks really interesting and I want to make sure I understand it.

@JeffRMoore
Contributor

@helfer At this point you should disregard the actual metrics reported. I've abandoned my original methodology. My tests were not against any particular backend. Those times were only for the time consumed by the execute process itself. I used a data structure generated in memory.

You can see my current work here: https://github.com/JeffRMoore/graphql-js/tree/benchmarks

I'm guessing it will be ready to submit by end of next weekend.

I'm standing in front of a herd of naked yaks on this one. Ended up writing a performance regression test library, which is why I've been quiet here.

https://github.com/JeffRMoore/async-benchmark-runner

The main difference in performance stems largely from the current implementation having to re-calculate collectFields on every iteration through the list.
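A simplified illustration of that difference, not the real executor: `collectFieldsSketch` is a hypothetical stand-in that only groups fields by response key (no fragments or directives), and the two list functions contrast collecting per item with collecting once up front, as a plan would allow.

```javascript
// Counts how often field collection runs.
let collectCalls = 0;

// Hypothetical stand-in for collectFields: group selections by alias or name.
function collectFieldsSketch(selections) {
  collectCalls += 1;
  const fields = Object.create(null);
  for (const sel of selections) {
    const key = sel.alias || sel.name;
    (fields[key] = fields[key] || []).push(sel);
  }
  return fields;
}

// Build one result object from the collected fields.
function project(item, fields) {
  const out = {};
  for (const key of Object.keys(fields)) {
    out[key] = item[fields[key][0].name];
  }
  return out;
}

// Per-item approach: fields are re-collected for every list member.
function completeListPerItem(items, selections) {
  return items.map(item => project(item, collectFieldsSketch(selections)));
}

// Planned approach: fields are collected once and reused for the whole list.
function completeListPlanned(items, selections) {
  const fields = collectFieldsSketch(selections);
  return items.map(item => project(item, fields));
}
```

Both functions return the same results; only the number of collection passes differs, growing with list length in the first and staying at one in the second.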

@ruslantalpa

@JeffRMoore I looked a bit at this PR. Am I right in saying that the core idea here is to "inline" fragments, variables and interfaces into the AST so that resolvers don't have to deal with that (and maybe provide the fieldAST in a slightly more digestible form)?

@JeffRMoore
Contributor

@ruslantalpa Instead of inlining into the AST, I would say that the idea is to introduce a separate intermediate representation that is tailored to the process of resolving (an execution plan). The idea is absolutely that resolve functions should not have to digest fieldASTs and fragments, and such.

@Globegitter

@JeffRMoore I found this proposal not too long ago, and it seems like a really useful addition for some of our use cases. What is the current state of the PR? It's not too easy to figure out from this discussion.

@syrusakbary

Would love to see this in the main codebase! Any updates?

@JeffRMoore
Contributor

Sorry, got distracted, will continue to be for at least two more weeks. Still planning to return to this.

@ghost ghost added the CLA Signed label Jul 12, 2016
@roippi roippi referenced this pull request in Youshido/GraphQL Jul 16, 2016
Closed

thoughts on implementing Query Complexity Analysis? #18

@mvgijssel

Thanks for all the great work! Would also love to see this shipped, as it would definitely help us!
