
Note about HTTP/1.1 roundtrip reduction via array-based auto-batching #5

sungam3r opened this issue Oct 10, 2019 · 18 comments

@sungam3r
Contributor

Is it worth mentioning batch requests? In this case, an array of objects is transmitted via HTTP.

@benjie
Member

benjie commented Nov 7, 2019

I wondered this too; but I think batch requests will be superseded by HTTP/2, so I'm not sure how much value there is in it. That said, it's very straightforward and I think quite widely used, so it's probably worth mentioning - perhaps in an addendum?

@sungam3r
Contributor Author

sungam3r commented Nov 7, 2019

batch requests will be superseded by HTTP2

Maybe yes, maybe no. In any case, years may pass before that happens. But batching already works. And I must say that its support does not carry any burden.

@michaelstaib
Member

@benjie

be superseded by HTTP2 so I'm not sure how

Not really, we have implemented batching with the export directive, where you can export results from a preceding query as variables in the next.

And I think the PayPal guys were looking at or discussing using that for transactions. You can build nice mutation flows with this.

POST /graphql?batchOperations=[NewsFeed, StoryComments]

query NewsFeed {
  stories {
    id @export(as: "ids")
    actor
    message
  }
}

query StoryComments {
  stories(ids: $ids) {
    comments(first: 2) {
      actor
      message
    }
  }
}

We support two ways to batch ... operation batching like the above, where you send in a document and specify the operations as a list in the URL to indicate which operations you want to batch...

And we also support the Apollo way, which is basically request batching. Batching for the sake of saving requests might not be necessary with HTTP/2, but combined with @export I see value in it, especially for mutations.

@benjie
Member

benjie commented Nov 10, 2019

@michaelstaib Sure; I was referring to straight-forward vanilla batching rather than batching with dependencies and chaining. Is there a full spec for the approach you raised? It seems you’d need a runtime on top of GraphQL to validate the types of the variables/exports before running any of the batch? That’s quite a bit more complex than the simple array-based parallel batching that Apollo Server, PostGraphile and the like support.

@michaelstaib
Member

michaelstaib commented Nov 10, 2019

@benjie

Hot Chocolate and Sangria have implemented it this way. You actually do not need a lot extra. Directives in Hot Chocolate can be associated with field middleware, so the execution is quite simple. Validation-wise, you can also do static analysis to check whether outputs are compatible with the inputs. We basically infer the variables.

Oleg and I both inferred the specification from a talk that Lee gave :)

There are also some discussions about this with the GraphQL-Java guys on the spec repo, and with Ivan, the PayPal guys and so on.

I really should put all of that together into a spec document. I think I will write a little blog post summing it all up, and then we will have all the information in one document.

https://hotchocolate.io/docs/batching

Our docs include the talk from @leebyron. They introduced this back then basically to solve something that you'd actually solve better with @defer. But, as I said, the superpower of @export and batching lies more in mutations, in my opinion.

@sungam3r
Contributor Author

Just a note - I was talking about the array-based batching approach.

@benjie
Member

benjie commented Nov 13, 2019

These seem to be radically different topics that share some similarities.

With auto-batching the requests are captured and batched together as a purely transport-level optimisation, to reduce roundtrips. They introduce no new validation, GraphQL re-writing, or any other complexities. The operations typically run independently in parallel on the server, and the results are returned as an array in the same order to the client. They have trade-offs, and HTTP/2 (or using a websocket transport) solves the problem better.
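A minimal sketch of this array-based auto-batching might look as follows. All names here are illustrative, not any server's real API: a stand-in `execute` plays the role of a GraphQL executor, and `asyncio.gather` provides both the parallelism and the order guarantee described above (results come back in request order regardless of completion order).

```python
import asyncio

async def execute(operation: dict) -> dict:
    # Stand-in for a real GraphQL executor.
    await asyncio.sleep(0)
    return {"data": {"echo": operation["query"]}}

async def handle_batch(operations: list[dict]) -> list[dict]:
    # gather() preserves input order regardless of completion order,
    # which is exactly what array-based batching clients rely on.
    return await asyncio.gather(*(execute(op) for op in operations))

batch = [{"query": "{ a }"}, {"query": "{ b }"}]
results = asyncio.run(handle_batch(batch))
```

The key property is that the batch introduces no new semantics: each element is executed exactly as it would have been as a standalone request.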

With chained batches, the queries in the batch have to be prepared in advance so that they can be issued together; otherwise later operations may not validate, as they rely on variables that are dynamically declared by previous operations' @export directives. They require a runtime to re-write the queries based on the @export directives, and must be run serially (though there may be some parallelism depending on the variable dependencies) so the variables can be updated. The operations themselves are not valid GraphQL operations until the transforms have been applied, so many GraphQL validation/linting tools would reject them.

(By the way, I think the validation issues/rewrite requirements of these chained batches could be solved by changing the syntax from query StoryComments { to query StoryComments($ids: [Int] @import(from: "ids")) {.)

@sungam3r I suggest you rename this issue to "HTTP/1.1 roundtrip reduction via array-based auto-batching".

@michaelstaib I suggest you raise a new issue about the approach you mention, which I think might be better described as something like "chained operation batches".

@michaelstaib
Member

@benjie you are right and we should have multiple issues describing the different approaches.

@sungam3r
Contributor Author

@benjie I completely agree with every word, as if I had written it myself.

@sungam3r sungam3r changed the title Note about batching Note about HTTP/1.1 roundtrip reduction via array-based auto-batching Nov 13, 2019
@sungam3r
Contributor Author

The operations typically run independently in parallel on the server

Only one small remark - queries in the batch array may contain mutations, so it is unsafe to execute them in parallel. GraphQL-dotnet executes batched requests sequentially.
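The policy described here can be sketched as a dispatch decision on the batch (a hypothetical sketch, not GraphQL-dotnet's actual code): if any operation is a mutation, fall back to strictly sequential execution; otherwise run in parallel.

```python
import asyncio

async def execute(operation: dict) -> dict:
    # Stand-in for a real GraphQL executor.
    await asyncio.sleep(0)
    return {"data": operation["query"]}

def is_mutation(operation: dict) -> bool:
    # Naive textual check; a real server would inspect the parsed document.
    return operation["query"].lstrip().startswith("mutation")

async def handle_batch(operations: list[dict]) -> list[dict]:
    if any(is_mutation(op) for op in operations):
        # Sequential: each operation may observe the previous one's effects.
        return [await execute(op) for op in operations]
    # Query-only batches are safe to run in parallel.
    return await asyncio.gather(*(execute(op) for op in operations))

results = asyncio.run(
    handle_batch([{"query": "mutation { a }"}, {"query": "{ b }"}])
)
```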

@benjie
Member

benjie commented Nov 13, 2019

Interesting. I only use query auto-batching as a transport-layer optimisation; so if, without batching, the application would have issued 4 operations at the same time over separate parallel HTTP requests (which would have run independently and in parallel), then I treat it as acceptable to run those same operations independently and in parallel when received in a batch by the server. Making it such that mutations run in series does sound like a good idea though, even though it would make query batching even slower than it is already (taking it from as slow as the slowest operation, to as slow as the slowest chain of operations).

@michaelstaib
Member

michaelstaib commented Nov 14, 2019

@benjie

Just one question here: do you not write the results to the stream as they arrive?

We have optimized batching to write to the response stream as soon as possible. Our .NET client Strawberry Shake will then start reading the results from the stream as they arrive and make them immediately available.

So, we do not wait for the full array to be finished before writing to the stream. This means that the time to first response is quite quick.
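The streaming approach described here could look roughly like this (a hypothetical sketch loosely inspired by the behavior described, not Hot Chocolate's actual code): each result is written as soon as it completes, tagged with its index so the client can reassemble the original order.

```python
import asyncio
import json

async def execute(index: int, operation: str) -> dict:
    # Stand-in executor with varying simulated latency.
    await asyncio.sleep(0.01 * (len(operation) % 3))
    return {"data": operation}

async def stream_batch(operations, write):
    # Write each result chunk as it completes, instead of waiting
    # for the whole array; the index lets the client restore order.
    async def run(i, op):
        result = await execute(i, op)
        write(json.dumps({"index": i, "payload": result}))
    await asyncio.gather(*(run(i, op) for i, op in enumerate(operations)))

chunks: list[str] = []
asyncio.run(stream_batch(["{ a }", "{ bb }"], chunks.append))
```

The trade-off is that the transport must now frame individual chunks rather than a single JSON array, which is why the client needs explicit support for reading them incrementally.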

@michaelstaib
Member

@sungam3r

I think ordered execution for mutations is exactly what operation batching gives you. With that, it is more explicit that you need an order.

The Apollo batching that is proposed here was about merging multiple requests from a client into one in order to optimize transport. Apollo, for instance, does this under the hood.

@michaelstaib
Member

Just one more general thing about performance here: when we do batching, we preserve the same DataLoader for all merged requests, meaning resolved entities are consistent and resolved more quickly across all responses of the batch.

@sungam3r
Contributor Author

I do not put an equals sign between the client's desire to make a batch request and executing all the queries in it in parallel. You can consider a batch request as a way to optimize data transmission over slow transport. Parallelizing the execution of the individual elements in the transmitted array of requests is a further optimization, which in the general case cannot be performed without an explicit indication from the client.

Similarly with the question of returning results as quickly as possible - the order of responses may be important to the client. Imagine that it performs a series of 10 mutations in a batch request against a client's bank account. You can't just start executing the individual queries in the batch in parallel. We need to specify some way for the client to indicate how the server should perform its batch request. I think that for safety reasons, by default you need to execute batched queries strictly sequentially and return the results in order.

I think ordered execution for mutations is exactly what operation batching does. With that it is more explicit that you need an order.

These two methods are orthogonal to each other. Sequential execution considerations apply to both.

@benjie
Member

benjie commented Nov 14, 2019

@michaelstaib That’s an optimisation I’ve not implemented yet; AFAIK Apollo Client can’t make use of that optimisation? apollographql/apollo-server#396

@michaelstaib
Member

michaelstaib commented Nov 14, 2019

@sungam3r

... may be important to the client. Imagine that he performs a series of 10 mutations in a batch request over a client’s bank account. You can’t just take it and start executing individual queries in batch in parallel. It is necessary to somehow specify the way the client indicates how the server should perform its batch request. I think that for security reasons, by default you need to execute batch queries strictly sequentially and return the results in order.

I get that; that is why we support @export and operation batching. But with vanilla batching, why do you not put all of that into one query request? The execution algorithm for mutations guarantees exactly that the mutations are executed serially. There is no need to use batching for that in this case. The requests that are batched together in this case are independent, since they cannot share anything.

Just as an example:

mutation {
  a 
  b
  c
}

The above mutation would execute a, b, c in exactly that order. So why batch the request in this case?

Vanilla batching for HTTP/1.1 is about reducing requests.

The batching that I referred to earlier that Hot Chocolate and Sangria support is about those sequences.

mutation a {
  a @export(as: "transactionId")
}

mutation b {
  b(transactionId: $transactionId)
}

In that case sequences make sense since in the case of paypal you might have a transactionId that is created by the first mutation and then we use that in the next mutation and so on.
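The sequencing described here can be sketched as a serial loop that threads exported values into later operations (an illustrative sketch with made-up names, not Hot Chocolate's or Sangria's actual implementation):

```python
def run_chain(operations, execute):
    # Execute operations serially; each may publish values that
    # later operations consume as variables.
    variables: dict = {}
    results = []
    for op in operations:
        result, exported = execute(op, variables)
        variables.update(exported)  # later operations see earlier exports
        results.append(result)
    return results

# Toy executor: mutation "a" exports a transactionId, mutation "b" uses it.
def execute(op, variables):
    if op == "a":
        return {"data": {"a": True}}, {"transactionId": "tx-1"}
    if op == "b":
        return {"data": {"b": variables["transactionId"]}}, {}

results = run_chain(["a", "b"], execute)
```

The serial loop is the essential constraint: "b" cannot even be validated, let alone executed, until "a" has produced its export.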

There is also a way to tune the HTTP/1.1 batching so that it guarantees order and parallelization. If you, for instance, merge the incoming batches into one query request, you could still guarantee execution order in the case of mutations, and you could parallelize in the case of queries.

basically two requests:

mutation a {
  a 
}

mutation b {
  b 
}

would become:

mutation ab {
  request_1: a 
  request_2: b 
}

I think I read a blog post about Apollo Server saying they do something like that. I could be mistaken.
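The merge idea above can be sketched as simple document rewriting (an assumed sketch, not any server's real code): single-root-field operations of the same type are combined into one document under request_N aliases, so the normal GraphQL execution rules - serial for mutations, parallel for queries - then apply to the merged document.

```python
def merge_operations(op_type: str, bodies: list[str]) -> str:
    # Alias each incoming root selection as request_N so results
    # can be split back out per original request.
    aliased = [
        f"request_{i + 1}: {body.strip()}" for i, body in enumerate(bodies)
    ]
    return f"{op_type} {{ {' '.join(aliased)} }}"

merged = merge_operations("mutation", ["a", "b"])
```

A real implementation would also have to rename colliding variables and fragments, which is where most of the complexity of this approach lives.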

I just see this as an exchange of ideas. I do not mean to be condescending, or to make the case that your solutions, or the reasons behind them, are bad in any way. Just want to make that clear. I really love discussing these topics with other implementors like @IvanGoncharov or @benjie or now @sungam3r.

@michaelstaib
Member

@benjie we support that in our .NET client, and we also built some fetcher logic for Relay; we are not that invested in Apollo.
