Idea: Cost function on fields, to allow ballpark analysis of query complexity #105

AndrewIngram · 2015-10-11T13:33:26Z

const FooType = new GraphQLObjectType({
  name: 'Foo',
  fields: () => ({
    id: globalIdField('Food', obj => obj.id),
    bar: {
      type: new GraphQLList(BarType),
      resolve: obj => getBarsForFoo(obj.id),
      resolveCost: n => n * 5,
    },

In this example, n is the number of times this field is requested across all Foos within the query. This may not be possible in pure GraphQL, since the number of results requested is dependent on how the schema is configured, but with a standardised API for connections like in Relay, it might be possible. The 5 is arbitrary, cheap fields would probably have a 0 or 1.

The end result would be the ability to reject queries over a certain cost, but also allow IDEs to warn developers that a particular query is risky.

The text was updated successfully, but these errors were encountered:

leebyron · 2016-01-09T00:56:52Z

I think this would be an awesome project for someone to start up if anyone in the community is looking for ideas.

AndrewIngram · 2016-03-03T18:17:54Z

I just found out that this has been implemented in Sangria:

http://sangria-graphql.org/learn/#query-complexity-analysis

To that end, it might not be necessary to define it in the spec, and leave it up to implementations instead.

dylanahsmith · 2016-03-11T08:18:43Z

I think this provides too narrow a view of how the cost might be calculated. For instance, the cost would at least depend on the arguments in a lot of cases, like of a paginated field (e.g. users(first: $count)). The cost might also depend on other similar selections, which might even re-use the same data, e.g. when using dataloader to batch load and cache datastore queries.

Trying to calculate the cost of a query before it is made can actually be quite difficult, so another possible approach would be to use time as the unit of cost, where a timeout would be used to abort expensive queries.

I think it should just be an implementation detail.

varrodan · 2016-04-07T22:40:03Z

One can provide reasonably faithful cost estimations for evaluating a graph query, if the average branching factors of edge types (i.e. fields) can be calculated from the contents of the data store.

In case of the Star Wars example, if you have a query like
hero { name friends { name } }
then you can estimate the number of friends expected to be retrieved by calculating the average branching factor for friends which is (4+1+3+4+1+4+3)/7 = 2.85 in case of the Star Wars knowledge base. One can further customize this estimate for friends edges running between specific types of nodes.

In case of nested queries, like
{ hero { name friends { name appearsIn friends { name } } } }
one can estimate the size of the expected result set by a sum-product.
The result set is expected to contain

H nodes for heros, plus
H * F nodes by navigating along friends from each hero (where F=2.85 above), plus
H * F * A nodes by further navigating along appearsIn from each of the previous nodes.

We described this estimate 10 years ago in an academic paper, so better estimates are likely to exist since then.

(I am new to GraphQL as a language, sorry if I misused its terminology - or gave a far too academic answer...)

leebyron · 2016-04-07T22:42:12Z

This is great additional context and insight, thanks!

dylanahsmith · 2016-04-22T15:30:43Z

I wouldn't put faith in cost estimations that rely on averages with respect to branching factors. The average might be very low but if you are trying to protect against abuse, then you really want to pay attention the upper bound on the branch factor.

Also, be careful about calculating the cost of queries that require server-side filtering, since they could scan a large number of objects and return very few results. The cost of a request isn't always about the amount of data that is returned.

Maybe this discussion can be moved to the graphql-js repo, since it doesn't seem like something to standardize in the graphql spec.

stanams · 2018-04-19T10:12:41Z

Here's an interesting package that does a pretty good cost analysis for graphql queries: https://github.com/pa-bru/graphql-cost-analysis

leebyron · 2018-04-21T19:48:53Z

Closing this aging issue.

Seems like the consensus is that field cost alone is not always enough to estimate a query’s cost. Different approaches by different libraries indicate that there are many useful approaches with different trade offs, so there is not much to incorporate into the spec to account for this

leebyron added the help wanted label Jan 9, 2016

leebyron closed this as completed Apr 21, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Idea: Cost function on fields, to allow ballpark analysis of query complexity #105

Idea: Cost function on fields, to allow ballpark analysis of query complexity #105

AndrewIngram commented Oct 11, 2015

leebyron commented Jan 9, 2016

AndrewIngram commented Mar 3, 2016

dylanahsmith commented Mar 11, 2016

varrodan commented Apr 7, 2016

leebyron commented Apr 7, 2016

dylanahsmith commented Apr 22, 2016

stanams commented Apr 19, 2018

leebyron commented Apr 21, 2018

Idea: Cost function on fields, to allow ballpark analysis of query complexity #105

Idea: Cost function on fields, to allow ballpark analysis of query complexity #105

Comments

AndrewIngram commented Oct 11, 2015

leebyron commented Jan 9, 2016

AndrewIngram commented Mar 3, 2016

dylanahsmith commented Mar 11, 2016

varrodan commented Apr 7, 2016

leebyron commented Apr 7, 2016

dylanahsmith commented Apr 22, 2016

stanams commented Apr 19, 2018

leebyron commented Apr 21, 2018