GraphQL doing unnecessary/repetitious requests/resolves. #111

Closed
johanatan opened this issue Aug 8, 2015 · 26 comments

Comments

@johanatan
Contributor

Hi,

Given the following type language schema:

type Principal {
  id: ID!
  name: String!
  partners: [Partner]
}

type Partner {
  id: ID!
  name: String!
  address: Address
  principal: Principal
}

and the following query:

{
  Partners {
    id name
    principal {
      name id
      partners { name principal { name } }
    }
  }
}

we see the following requests for data on the backend:

Received request: uid: c5efdde0-7453-4346-81f0-2303a15cc5f3, entity: Partner, filter: all()
Received request: uid: 5c7ad531-baf7-49d1-9b3a-4a10033fba55, entity: Principal, filter: id=1
Received request: uid: e8df54b7-a6d3-47f2-aaa8-7331888cb888, entity: Principal, filter: id=1
Received request: uid: 555e4cea-1cbe-4893-81ce-4c94337e6061, entity: Partner, filter: id=1
Received request: uid: 388165bb-c386-4599-9a2f-85d7cdfda373, entity: Partner, filter: id=2
Received request: uid: 909fd7e9-0921-406c-89d2-881cc827208d, entity: Partner, filter: id=1
Received request: uid: 2462476d-3436-487f-9270-68208e1e5296, entity: Partner, filter: id=2
Received request: uid: bc24de58-25ee-426b-ad77-3994e3dc03ad, entity: Principal, filter: id=1
Received request: uid: 9dea18b2-bc54-4939-b0b5-8d402d8ad1a3, entity: Principal, filter: id=1
Received request: uid: 9935c61b-e386-4fd3-ad36-fb7dff5ed954, entity: Principal, filter: id=1
Received request: uid: 6edd79b0-9333-4741-8bfd-6319909e7c5d, entity: Principal, filter: id=1

[uids are generated per unique call to resolve].
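
A minimal sketch that reproduces the request pattern in the log above, assuming two partners that share one principal (which is what the filters in the log suggest). The in-memory `backend`, the record names, and the resolver helpers are all hypothetical. This is plain JavaScript rather than graphql-js, and it walks the query depth-first, so the order of requests differs from the log even though the count matches.

```javascript
// Hypothetical in-memory backend standing in for the real data service.
// The data mirrors the log above: two partners sharing principal 1.
const partners = {
  1: { id: 1, name: 'Partner A', principalId: 1 },
  2: { id: 2, name: 'Partner B', principalId: 1 },
};
const principals = {
  1: { id: 1, name: 'Principal X', partnerIds: [1, 2] },
};

const requestLog = [];
const backend = {
  allPartners() {
    requestLog.push('Partner all()');
    return Object.values(partners);
  },
  partnerById(id) {
    requestLog.push(`Partner id=${id}`);
    return partners[id];
  },
  principalById(id) {
    requestLog.push(`Principal id=${id}`);
    return principals[id];
  },
};

// Naive resolvers: every reference is fetched again, with no memory of
// earlier fetches made during the same query.
const resolvePrincipal = (partner) => backend.principalById(partner.principalId);
const resolvePartners = (principal) =>
  principal.partnerIds.map((id) => backend.partnerById(id));

// Walks the same shape as the query:
// Partners { principal { partners { principal } } }
function runQuery() {
  return backend.allPartners().map((partner) => {
    const principal = resolvePrincipal(partner);
    return {
      name: partner.name,
      principal: {
        name: principal.name,
        partners: resolvePartners(principal).map((p) => ({
          name: p.name,
          principal: { name: resolvePrincipal(p).name },
        })),
      },
    };
  });
}

const result = runQuery();
console.log(requestLog.length); // 11 backend requests for 3 distinct records
```

The count breaks down as 1 request for the partner list, 2 for the first-level principals, 4 for the nested partners, and 4 for the innermost principals.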

These repetitious requests came as a surprise, given that one of GraphQL's biggest selling points is the elimination of N+1 queries and the introduction of intelligent caching, simplification, and reduction.

Is this a known issue?

@leebyron
Contributor

leebyron commented Aug 8, 2015

The GraphQL execution engine on its own does not do any caching for you, but simply provides the hooks to determine what level of caching is appropriate for your application.

For example, the production GraphQL server at Facebook does not talk to a database directly, but instead talks to a read-through cache.
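
A minimal sketch of what a read-through cache looks like from a resolver's point of view. It says nothing about how Facebook's production cache works; `ReadThroughCache` and `loadFromStore` are hypothetical names, and eviction and invalidation are left out entirely.

```javascript
// Minimal read-through cache sketch. `loadFromStore` is a hypothetical
// function standing in for whatever actually reads the database.
class ReadThroughCache {
  constructor(loadFromStore) {
    this.loadFromStore = loadFromStore;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) {
      // Miss: read through to the backing store and remember the value.
      this.entries.set(key, this.loadFromStore(key));
    }
    return this.entries.get(key);
  }
}

let storeReads = 0;
const principalCache = new ReadThroughCache((id) => {
  storeReads += 1;
  return { id, name: `Principal ${id}` };
});

// Resolvers talk to the cache, never to the store directly, so two
// resolve calls for the same id cost a single store read.
const first = principalCache.get(1);
const second = principalCache.get(1);
console.log(storeReads); // 1
```

Unlike a cache scoped to one query, a cache like this is shared across requests, so a real one has to decide when entries expire or are invalidated.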

@johanatan
Contributor Author

Even so, caching isn't the only issue here: what about query execution planning/simplification (which pretty much any database will do for you)? And, just to be sure, you're familiar with which parts of the GraphQL whitepaper I'm referring to (in regards to the 'selling')? Also, what in particular does GraphQL do to mitigate N+1 scenarios (as claimed)?

Further, GraphQL does seem to be essentially the front end to a database in the traditional sense, so it would make the most sense to do many (if not all) of these things there. Are there any plans to move in this direction?

[Updated]

@leebyron
Contributor

leebyron commented Aug 8, 2015

GraphQL is more of a structured API surface. I would put GraphQL closer to REST than to SQL. It's designed to be a very thin layer of structure atop arbitrary application code, and thus there is often not enough information for the GraphQL execution engine itself to perform query simplification for you.

Database front-ends which do this are able to make assumptions about the underlying storage and fetching (e.g. SQL) that GraphQL simply cannot do.

What GraphQL can do, however, is provide the structured query to your application before and during an execution so that these hooks may be used for query planning and caching.

GraphQL-JS itself is very young and so while these hooks exist, not much has been built to take advantage of them yet. One exciting example of this exploration is the reducing executor (#26) which seeks to help plan efficient SQL-style queries from a GraphQL input.

@leebyron
Contributor

leebyron commented Aug 8, 2015

> And, just to be sure, you're familiar with which parts of the GraphQL whitepaper I'm referring to (in regards to the 'selling')?

Sorry, I'm not totally certain, no.

@johanatan
Contributor Author

Oh, my apologies: looks like it was on the announcement/introduction of Relay+GraphQL found here:
http://facebook.github.io/react/blog/2015/02/20/introducing-relay-and-graphql.html

Under the first bullet labeled "Performance":

All queries flow through the framework code, where things that would otherwise be inefficient, repeated query patterns get automatically collapsed and batched into efficient, minimal queries. Likewise, the framework knows which data have been previously requested, or for which requests are currently "in flight", so queries can be automatically de-duplicated and the minimal queries can be produced.

This version of it mentioned "N+1" particularly:
https://gist.github.com/wincent/598fa75e22bdfa44cf47

So, apparently this 'performance'-centric portion of the system falls under Relay's purview rather than GraphQL's?

Anyway, great work! I'm finding the GraphQL language itself to be very well thought out/designed (and surely it will be easy to extend the reference (or other) implementation(s) with these additional features as desired).

@iyn

iyn commented Aug 25, 2015

@johanatan I believe that the performance "selling point" refers to the communication/request between client and server (e.g., a mobile app using Relay <--> an API server with GraphQL as an "interface"), so it's not just Relay's pro, but Relay + GraphQL. Still, I believe that there's definitely a "market" for a generic solution that would sit between the GraphQL server and the database/cache layer, optimizing the queries. The thing is, this seems to be very project-specific, so such a solution needs to be easy to plug in; at least that's my intuition.

@leebyron
Contributor

@iyn sums this up well.

GraphQL itself is designed to be a thin mapping between a query and functions executed on the server. It has inherent potential for exploiting parallelism and reaping great performance; however, GraphQL does not do query planning for you, as this will differ depending on your particular backend and product constraints.

I think there is ripe opportunity for the development of libraries to assist in this kind of query planning for various backends that people care about. Unfortunately these don't exist yet, since the GraphQL community is still under 2 months old! We've built some of these at Facebook, but unfortunately they're in PHP and back our custom infrastructure.

@johanatan
Contributor Author

I understand your point; however, there are very basic optimizations available (if you can call them that-- they're really just about avoiding pathological behavior). Because the semantics of the GraphQL language itself necessarily involve interaction with a graph (regardless of which backend stores the data), there are high-level optimizations available-- e.g., detecting multiple identical or partially overlapping cycles in the request, recognizing that you've already traversed the cycle (or parts of it) at least once, and returning the same results as that first traversal. This is but one example of many I could give. I would encourage you to think in abstract terms and see what optimizations are available in the abstract, apart from any storage mechanics (and I can assure you that there are many). It honestly caught me by surprise that this was not included in the language's design, as it seems to me to be implied by its mere existence. Pushing this off onto another layer is fine, but in my view it is just a side-step and a huge missed opportunity; in my opinion, the earliest point at which something is possible (and I mean possible in the abstract-- i.e., for all possible downstreams) is where that something should occur.

@leebyron
Contributor

To be clear - I'm not saying that GraphQL should not contain these optimizations, just that at present it is premature to include them and potentially accidentally make some backend integrations impossible. Again, GraphQL is early technology and at this point in its life we're expecting experimentation around backend integration and optimization strategies - as some of these become clear, we'll want to consider integrating them into the core library.

@freiksenet
Contributor

The biggest problem with optimization currently is that not all parts of the tree are known beforehand, e.g., due to the way typed fragments work with interface or union resolving. We did this full optimization before graphql-js was released, but we lacked such features in our type system. I believe it's possible to do some optimization beforehand, but I think a more reliable solution at this point is optimization through a query proxy, as described, e.g., here: https://www.reddit.com/r/reactjs/comments/3flgnu/building_a_graphql_server_with_nodejs_and_sql/ctqudkn


@johanatan
Contributor Author

@leebyron The problem with that approach as I see it is that the translation from GraphQL syntax to some number of parallel .resolve() calls to the backend is destructive-- it is impossible to deduce from a number of in-flight [parallel] calls to a backend what query was input which resulted in those calls being made (which is why I suggested making these optimizations in the front end [at the earliest possible place] "in the abstract-- for all possible backends").

@johanatan
Contributor Author

@freiksenet Would that approach not introduce some inherent latency (while the proxy waits to see if more values of a particular entity are required in close proximity to others)? Otherwise one would need some sort of begin and end messages to demarcate the boundaries of a bulk request. [Actually I do think it would be possible to introduce these demarcations manually from the GraphQL server to the DB/proxy]. For a single get of a single value of an entity however, you would now have some increased latency compared to before (when just requesting the data directly without the bulk/query boundary demarcations).

@freiksenet
Contributor

@johanatan You can sort of (ab)use the fact that looping over the fields is synchronous, so by the time the proxy gets control, all the queries from the current set of child fields will already be in the 'queue', so there are no big latency problems.
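
A rough sketch of the queueing idea, assuming a hypothetical bulk-fetch function `fetchMany` that returns rows in the same order as the keys it is given. The explicit `flush()` call stands in for the moment the proxy gets control after the synchronous loop; a real proxy would more likely schedule that itself (for example with `process.nextTick`) instead of being called by hand.

```javascript
// Sketch of a batching proxy. `fetchMany` is a hypothetical bulk-fetch
// function; it receives every distinct key queued since the last flush.
function createBatchingProxy(fetchMany) {
  let queue = [];
  return {
    // Called by each resolver; only records the request.
    request(key, onResult) {
      queue.push({ key, onResult });
    },
    // Called once the synchronous loop over sibling fields has finished.
    flush() {
      const pending = queue;
      queue = [];
      if (pending.length === 0) return;
      const uniqueKeys = [...new Set(pending.map((p) => p.key))];
      const rows = fetchMany(uniqueKeys); // one backend call per batch
      const byKey = new Map(uniqueKeys.map((key, i) => [key, rows[i]]));
      pending.forEach((p) => p.onResult(byKey.get(p.key)));
    },
  };
}

const bulkCalls = [];
const proxy = createBatchingProxy((ids) => {
  bulkCalls.push(ids);
  return ids.map((id) => ({ id, name: `Principal ${id}` }));
});

// Three sibling fields ask for principals while the executor loops synchronously.
const resolved = [];
[1, 1, 2].forEach((id) => proxy.request(id, (row) => resolved.push(row.name)));
proxy.flush();

console.log(bulkCalls); // [ [ 1, 2 ] ]
console.log(resolved); // [ 'Principal 1', 'Principal 1', 'Principal 2' ]
```

Because nothing is sent until the loop has finished, the proxy does not need to wait on a timer to see whether more keys are coming.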

@vladar

vladar commented Aug 29, 2015

What graphql-js could do without adding too much app-specific semantics is provide map in addition to resolve, so that when a type is in a "list" context it could resolve a list of field values instead of a single value.

var Partner = new GraphQLObjectType({
  name: 'Partner',
  fields: {
    principal: {
      type: Principal,
      map: (partnerList) => {
        // Fetch all principals for list of partners in one request and map them appropriately
        // (when principal is null - just fill slot in mapped list with null)
      }
    }
  }
})

In a non-list context, you could still pass a list of one item to map.

We plan to try this approach in https://github.com/webonyx/graphql-php some time soon
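
To make the proposal concrete, here is a sketch of how such a `map` hook might behave. `map` is not part of graphql-js; it is the hypothetical hook proposed above, and `fetchPrincipalsByIds`, `principalField`, and `resolveFieldForList` are made-up names used only for illustration.

```javascript
// Hypothetical bulk lookup; one call stands in for one backend request.
let backendCalls = 0;
function fetchPrincipalsByIds(ids) {
  backendCalls += 1;
  const table = { 1: { id: 1, name: 'Principal X' } };
  return ids.map((id) => table[id] || null);
}

// A field definition with the proposed (hypothetical) `map` hook. It takes
// the whole list of parents and returns one value per parent, in order.
const principalField = {
  map: (partnerList) => {
    const ids = [
      ...new Set(
        partnerList.map((p) => p.principalId).filter((id) => id != null)
      ),
    ];
    const found = fetchPrincipalsByIds(ids);
    const byId = new Map(found.filter(Boolean).map((p) => [p.id, p]));
    // Partners without a principal get null in their slot.
    return partnerList.map((p) => byId.get(p.principalId) ?? null);
  },
};

// What an executor might do for a list of parents: one `map` call in
// place of one `resolve` call per item.
function resolveFieldForList(field, parents) {
  return field.map(parents);
}

const partnerList = [
  { id: 1, principalId: 1 },
  { id: 2, principalId: 1 },
  { id: 3, principalId: null },
];
const mapped = resolveFieldForList(principalField, partnerList);
console.log(backendCalls); // 1
console.log(mapped.map((p) => (p ? p.name : null)));
// [ 'Principal X', 'Principal X', null ]
```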

@Sandreu

Sandreu commented Sep 7, 2015

In fact I don't think it should!
GraphQL doesn't know about entity relationships! It's just a graph to define relations between object types... Your principal could be different in your partner's principal!
Imagine a principal is on vacation: you can have a temporary principal for your partner, but your principal's partners would be the same!

@johanatan
Contributor Author

First of all, there is no "you" or "me" modeled in that snippet.

And the snippet captures the following:
1 - The entity Principal has a 1-to-many relationship to the entity Partner.
2 - The entity Partner has a 1-to-1 relationship to the entity Principal.

Thus for any Partner returned from the DB, there will be exactly one element in the partners list for the single Principal associated with it. I'm sure you can imagine how there would be other ways to construct cycles or partial cycles given that we are talking about a graph in the mathematical sense (and the language we are using is designed to query that graph structure).

I suppose it would be possible to diverge from this [obvious] interpretation of that type language snippet however (i.e., for the 1-to-1 side to not be reflected in the 1-to-many side) but it would be highly unorthodox and extremely confusing to any readers/consumers. In fact, this may point to a gap in the GraphQL specification itself-- namely, the lack of explicit handling of bi-directional cardinality specifiers-- leaving it to each individual author to do what is normal/expected/orthodox without enforcing it.

Regarding what is a temporary relationship or not-- I think it would be safe to assume that the underlying database is constant for the duration of a single GraphQL query request. So, your temporary principal at the beginning of the query execution is also your principal at the end of the query (conceptually, of course). Otherwise all sorts of race conditions (once the subqueries are issued in parallel) and data integrity issues (regardless of parallelism) would seem to be inherent.

@johanatan
Contributor Author

Not to mention, GraphQL must be (and is) aware of the ID of each of the entities it is requesting [the ID is the value passed into the resolve callback from GraphQL when resolving references]. Thus, for a given query, it need only request the entity instance for a given ID once.

[Notice the filter sections of the log output listed in the original bug report].
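
A sketch of the "request each ID once per query" idea, written as a wrapper in application code rather than as anything graphql-js provides. `createPerQueryFetcher` and `fetchEntity` are hypothetical names, and the sketch assumes an entity is fully identified by its type and ID for the duration of one query.

```javascript
// Wraps a hypothetical `fetchEntity(type, id)` so that, within one query,
// each (type, id) pair reaches the backend at most once.
function createPerQueryFetcher(fetchEntity) {
  const seen = new Map(); // lives only as long as this one query
  return (type, id) => {
    const key = `${type}:${id}`;
    if (!seen.has(key)) {
      seen.set(key, fetchEntity(type, id));
    }
    return seen.get(key);
  };
}

const backendRequests = [];
const fetchOnce = createPerQueryFetcher((type, id) => {
  backendRequests.push(`${type} id=${id}`);
  return { type, id };
});

// The ten by-ID lookups from the log in the original report, in order.
[
  ['Principal', 1], ['Principal', 1],
  ['Partner', 1], ['Partner', 2], ['Partner', 1], ['Partner', 2],
  ['Principal', 1], ['Principal', 1], ['Principal', 1], ['Principal', 1],
].forEach(([type, id]) => fetchOnce(type, id));

console.log(backendRequests);
// [ 'Principal id=1', 'Partner id=1', 'Partner id=2' ]
```

Creating a fresh fetcher for every incoming query keeps the cache from outliving the assumption that the data is constant while that query runs.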

@Sandreu

Sandreu commented Sep 7, 2015

In fact I'm not talking about the details of your case; I was just saying that in these kinds of recursive fetches you may have different kinds of resolve... In GraphQL there is a context dependency that you didn't have in REST. In REST, when you ask for a resource, you give its id and it gives you data; it's idempotent...
In GraphQL the two resolutions are not the same... The first principal will be resolved in the query, with your id argument, but the second may be resolved in partner, with awareness of the partner context... You may want to resolve it with different constraints or access levels.

I'm just saying that I don't think GraphQL has to know about those things; it's your execution context that matters. It's a new way of thinking, and I don't think it is GraphQL's role to deal with this kind of optimization, but your query executor's.

For example MySQL handles a cache... If you do the same query twice, the second one will be immediate.

Relay does have these kinds of considerations too... That's why it forces you to have global ids and handles a store... I don't think that would be done in GraphQL...
That would force you to handle overly complicated cases... What about multiple primary keys? What about ACLs?

I think GraphQL gives you a way to access a resource in a given context, not a way to optimize how you query/cache your data.

@johanatan
Contributor Author

I think you're missing my second point: for two equal contexts, the same resolve method should return the same value.

Also, my case details are just an example-- try thinking about a graph as a static structure for the duration of a single query and I'm sure you can see how cycles are possible.

@Sandreu

Sandreu commented Sep 7, 2015

I'm not saying it's not possible... Yes, you are right, this kind of thing could happen... But what I'm saying is that keeping track of all resolve calls with their results and context, and deeply checking context equality to avoid the risk of calling the same resolve function twice... I'm not so sure it would be an optimization...

Moreover, these kinds of cycles are so easy to track with a query executor... And I'm pretty sure your DB already does! :)

Still, I think this thread is interesting because it clarifies the boundaries of GQL and points out why context is the big value GQL provides, besides graph queries...

@leebyron
Contributor

leebyron commented Sep 9, 2015

@johanatan it sounds like you're talking about a memoization strategy on a per-type basis. This is a fine suggestion and I've seen it employed before. Just as a word of caution, it is a trade-off between computation and memory usage. When used generically across all resolver functions, in my experience this becomes a negative trade. Most resolver functions are very cheap and do not benefit from memoization. There's also the real concern within JavaScript of the objects and arrays often used being technically mutable. If you're wrapping over database queries, this may never bite you, but if you're wrapping over some ORM library you could see weird effects.

For resolver functions which truly are expensive - such as those which need to resolve to a database, I highly recommend using the memoization/caching layer of indirection you're talking about.

The reason GraphQL does not bake this in for you is because there are so many different techniques for memoization, each with their own tradeoffs. If you happen to stumble upon one which could be universally beneficial, I'd love to discuss the merits over a pull request.
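
A small illustration of the mutability concern, using a deliberately simple hypothetical `memoize` helper keyed on a single argument. The freezing step at the end is one possible mitigation among several (copying on read is another), not a recommendation from graphql-js.

```javascript
// A plain memoizer keyed by its single argument (an assumption made to
// keep the sketch short).
function memoize(fn) {
  const cache = new Map();
  return (key) => {
    if (!cache.has(key)) cache.set(key, fn(key));
    return cache.get(key);
  };
}

// Hypothetical resolver that builds a fresh object on every call.
const loadPartner = (id) => ({ id, tags: ['active'] });
const memoLoadPartner = memoize(loadPartner);

// One caller mutates the value it received...
memoLoadPartner(1).tags.push('edited-by-caller');
// ...and every later caller sees the change, because the object is shared.
console.log(memoLoadPartner(1).tags); // [ 'active', 'edited-by-caller' ]

// One possible mitigation: freeze what goes into the cache so later
// writes cannot change the cached value.
const frozenLoadPartner = memoize((id) => {
  const partner = loadPartner(id);
  Object.freeze(partner.tags);
  return Object.freeze(partner);
});
console.log(Object.isFrozen(frozenLoadPartner(1).tags)); // true
```

Freezing is shallow, which is why the nested `tags` array is frozen separately here; values handed back by an ORM may not tolerate being frozen at all.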

@johanatan
Contributor Author

Thanks, Lee. Good info.

Note that memoization wasn't the only (or original) suggestion here-- I still think that high-level semantic analysis of what's really being requested from the graph would provide ample opportunities for optimization (i.e., never calling 'resolve' in the first place when it is obvious from abstract analysis that it isn't required). It will be interesting to see if the project heads in that direction or not. If you would welcome a pull request on that angle, I would definitely be interested in giving it a try.


@leebyron
Contributor

leebyron commented Sep 9, 2015

It's definitely easier to discuss these ideas more concretely over code. I'm not sure what form this kind of analysis might take, but I'm interested to see your idea take shape.

@leebyron
Contributor

@johanatan you may be interested in https://github.com/facebook/dataloader which was released today. It's intended to make your GraphQL type definitions easier to write while solving the primary issue you brought up here.

@johanatan
Contributor Author

Ahh, very nice! Thx!

@gajus

gajus commented Sep 10, 2016

On the subject, I have written an article about the use of DataLoader to batch GraphQL requests.

http://gajus.com/blog/9/using-dataloader-to-batch-requests
