DataLoader dispatches together keys from different requests #21

Description

@edacostacambioupgrade

When using what I think is a standard setup (graphql-spring-boot with DataLoaderDispatcherInstrumentation and DataLoaderRegistry singleton beans), if two (HTTP) requests from different callers ask for the same data type (i.e. use the same DataLoader), all keys are enqueued and dispatched together: BatchLoader.load(List<K> keys) is called with keys merged from both requests.
I have not used the Facebook Node implementation, but from what I understand their DataLoaders are created per request, so this merging doesn't happen.
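For reference, the wiring I mean looks roughly like this (a minimal sketch in graphql-spring-boot style; the "userNames" loader is just a placeholder, and the exact factory/constructor signatures may differ between versions):

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import graphql.execution.instrumentation.dataloader.DataLoaderDispatcherInstrumentation;

@Configuration
public class DataLoaderConfig {

    @Bean
    public DataLoaderRegistry dataLoaderRegistry() {
        // keys from ALL in-flight HTTP requests end up merged into this one list
        BatchLoader<Long, String> userNameLoader = keys ->
                CompletableFuture.supplyAsync(() -> keys.stream()
                        .map(id -> "user-" + id) // stands in for a real backend call
                        .collect(Collectors.toList()));

        DataLoaderRegistry registry = new DataLoaderRegistry();
        registry.register("userNames", DataLoader.newDataLoader(userNameLoader));
        return registry;
    }

    @Bean
    public DataLoaderDispatcherInstrumentation dispatcherInstrumentation(DataLoaderRegistry registry) {
        // one instrumentation + one registry shared by every HTTP request
        return new DataLoaderDispatcherInstrumentation(registry);
    }
}
```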
While this behavior may be desirable in some cases, it comes with some drawbacks:

  • issues with keys from one request affect the other request, and this is not very deterministic (unless your backing service is smart enough to return per-key errors)
  • if one request loads 1 key and another one loads 1K keys, both will have the latency of loading 1001 keys, and again, this is not very deterministic
  • if you are propagating authentication and your backing service only takes a global authentication principal (i.e. an authorization header), you cannot send the requests together anyway; you need to split by requestor (or execution id). (You could live with this if your backing service took a per-key principal, but that would be pretty ugly, I think.)

I wonder:

  • is this behavior intentional?
  • is this a problem with the way I have it set up?
  • would you be open to a PR that lets devs choose whether or not to merge keys?

If this is an issue with my setup then you can skip the rest; otherwise:

These are the options I'm considering at the moment:

  • wrapping the BatchLoader.load(...) method with one that splits by execution id (see the sketch after this list); this solves some interference issues, but it still makes all concurrent requests wait until everyone else's data is available.
  • subclassing DataLoader to implement something like sliceIntoBatchesOfBatches but doing it by execution id. This could work, but it has a few issues:
    • most of the things I would need to change in the DataLoader class are private, so it would involve either copying code or gaining access by reflection :S
    • this is fine for the DataLoader.dispatch() method because it doesn't wait for the overall result, but dispatchAndJoin() would still wait for every request to finish. I don't mind, because I don't use it and the instrumentation only ends up calling dispatch().
    • while this approach won't make callers wait, it would still sometimes dispatch some keys of other requests "early", maybe even before they are completely enqueued, occasionally resulting in more backend requests in a non-deterministic way.
  • another option I considered is to make the DataLoader a per-request object so DataLoaders are entirely isolated. This isn't easy though: I would need to provide a means for DataFetchers to access the right DataLoader for a given request. With some effort I could keep a map by execution id, but it is not easy to manage its life-cycle (I fear I would end up with leaked instances).
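To make the first option more concrete, here is a rough sketch of the wrapping idea. The ExecutionScopedKey wrapper and the result stitching are only assumptions of mine (nothing like this exists in the library), and the combined future still completes only once every execution's group has loaded, which is exactly the waiting drawback above:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

import org.dataloader.BatchLoader;

import graphql.execution.ExecutionId;

// hypothetical key wrapper carrying the execution that enqueued the key
class ExecutionScopedKey<K> {
    final ExecutionId executionId;
    final K key;

    ExecutionScopedKey(ExecutionId executionId, K key) {
        this.executionId = executionId;
        this.key = key;
    }
}

// wraps the real BatchLoader so each backend call only contains keys from one execution
class ExecutionSplittingBatchLoader<K, V> implements BatchLoader<ExecutionScopedKey<K>, V> {

    private final BatchLoader<K, V> delegate;

    ExecutionSplittingBatchLoader(BatchLoader<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public CompletionStage<List<V>> load(List<ExecutionScopedKey<K>> keys) {
        // group the merged key list back into per-execution batches
        Map<ExecutionId, List<ExecutionScopedKey<K>>> byExecution = keys.stream()
                .collect(Collectors.groupingBy(k -> k.executionId,
                        LinkedHashMap::new, Collectors.toList()));

        // one delegate call per execution, remembering a per-key future for each key
        Map<ExecutionScopedKey<K>, CompletableFuture<V>> perKey = new LinkedHashMap<>();
        for (List<ExecutionScopedKey<K>> group : byExecution.values()) {
            List<K> rawKeys = group.stream().map(k -> k.key).collect(Collectors.toList());
            CompletableFuture<List<V>> groupResult = delegate.load(rawKeys).toCompletableFuture();
            for (int i = 0; i < group.size(); i++) {
                int idx = i;
                perKey.put(group.get(i), groupResult.thenApply(values -> values.get(idx)));
            }
        }

        // reassemble results in the order of the original (merged) key list;
        // callers still wait for every group, which is the drawback noted above
        List<CompletableFuture<V>> ordered = keys.stream().map(perKey::get).collect(Collectors.toList());
        return CompletableFuture.allOf(ordered.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> ordered.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```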

This is what I would like:
Option 1

  • DataLoader.dispatch(), DataLoader.dispatchAndJoin() and DataLoaderRegistry.dispatchAll() should take an executionId as a parameter. Depending on a data loader option, either all keys are dispatched or only the keys for that execution id. The DataLoader.load(K key) method would also need to take an execution id (or a DataFetchingEnvironment). A possible shape of this API is sketched after this list.
  • DataLoaderDispatcherInstrumentation.dispatch() passes the execution id to DataLoaderRegistry.dispatchAll()
  • DataLoaderDispatcherInstrumentation.beginExecution(instrumentationParameters).onEnd(...) calls a new method DataLoaderRegistry.discardAll(ExecutionId) (which in turn calls a new DataLoader.discard(ExecutionId) method) to make sure appropriate cleanup happens in case of errors/abortion.
  • would that be enough cleanup, or is there any case in which keys may have been queued but beginExecution(...).onEnd(...) is not called?
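Just to illustrate, this is roughly the shape of API I have in mind for option 1; none of these signatures exist today, and the names are only placeholders:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

import graphql.execution.ExecutionId;

// hypothetical execution-aware variants of the existing DataLoader methods
interface ExecutionScopedDataLoader<K, V> {

    // the key is tracked against the execution that enqueued it
    CompletableFuture<V> load(K key, ExecutionId executionId);

    // depending on a data loader option, dispatch everything or only this execution's keys
    CompletableFuture<List<V>> dispatch(ExecutionId executionId);

    List<V> dispatchAndJoin(ExecutionId executionId);

    // drop keys still queued for an aborted or failed execution
    void discard(ExecutionId executionId);
}

// hypothetical execution-aware variants of the registry methods
interface ExecutionScopedDataLoaderRegistry {

    void dispatchAll(ExecutionId executionId);

    void discardAll(ExecutionId executionId);
}
```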

Option 2
Similar, but without changing DataLoader: make DataLoaderRegistry aware of executions and keep a map of execution id -> DataLoaders (it would need to be built with DataLoader suppliers instead of DataLoaders directly). With this approach only the DataLoaderRegistry.dispatchAll() method needs to be modified to take in the execution id. In this case the DataLoaderRegistry would need to expose a means for DataFetchers to retrieve the DataLoader for a specific execution.
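A rough sketch of what such an execution-aware registry could look like (all names are placeholders, nothing here exists in the library):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

import org.dataloader.DataLoader;

import graphql.execution.ExecutionId;

// hypothetical registry that keeps one set of DataLoaders per execution
class ExecutionAwareDataLoaderRegistry {

    private final Map<String, Supplier<DataLoader<?, ?>>> suppliers = new ConcurrentHashMap<>();
    private final Map<ExecutionId, Map<String, DataLoader<?, ?>>> perExecution = new ConcurrentHashMap<>();

    // built with suppliers so fresh DataLoaders can be created per execution
    void register(String key, Supplier<DataLoader<?, ?>> supplier) {
        suppliers.put(key, supplier);
    }

    // DataFetchers ask for the loader bound to their execution
    @SuppressWarnings("unchecked")
    <K, V> DataLoader<K, V> getDataLoader(ExecutionId executionId, String key) {
        Map<String, DataLoader<?, ?>> loaders =
                perExecution.computeIfAbsent(executionId, id -> new ConcurrentHashMap<>());
        return (DataLoader<K, V>) loaders.computeIfAbsent(key, k -> suppliers.get(k).get());
    }

    // only the loaders created for this execution are dispatched
    void dispatchAll(ExecutionId executionId) {
        perExecution.getOrDefault(executionId, Collections.emptyMap())
                .values().forEach(DataLoader::dispatch);
    }

    // called when the execution finishes so instances are not leaked
    void discardAll(ExecutionId executionId) {
        perExecution.remove(executionId);
    }
}
```

The life-cycle question then reduces to making sure discardAll(...) is always called when an execution ends, the same cleanup concern as in option 1.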

Option 3
The same thing, but managed by the instrumentation: change DataLoaderDispatcherInstrumentation to take a DataLoaderRegistry supplier instead of a DataLoaderRegistry. This supplier, or the instrumentation itself, would expose a method to return the DataLoaderRegistry associated with an execution id so that DataFetchers can get the right one.
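And a loose sketch of the option 3 idea, leaving the actual instrumentation hooks out (the real instrumentation API is more involved than this; everything here is a placeholder that the instrumentation and DataFetchers would share):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

import org.dataloader.DataLoaderRegistry;

import graphql.execution.ExecutionId;

// hypothetical holder of per-execution registries, built from a registry supplier
class PerExecutionRegistries {

    private final Supplier<DataLoaderRegistry> registryFactory;
    private final Map<ExecutionId, DataLoaderRegistry> registries = new ConcurrentHashMap<>();

    PerExecutionRegistries(Supplier<DataLoaderRegistry> registryFactory) {
        this.registryFactory = registryFactory;
    }

    // created (or reused) when an execution begins; DataFetchers call this too
    DataLoaderRegistry forExecution(ExecutionId executionId) {
        return registries.computeIfAbsent(executionId, id -> registryFactory.get());
    }

    // the instrumentation dispatches only the registry belonging to one execution
    void dispatch(ExecutionId executionId) {
        DataLoaderRegistry registry = registries.get(executionId);
        if (registry != null) {
            registry.dispatchAll();
        }
    }

    // called from beginExecution(...).onEnd(...) to avoid leaking registries
    void remove(ExecutionId executionId) {
        registries.remove(executionId);
    }
}
```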
