DataLoader dispatches together keys from different requests #21

Description

@edacostacambioupgrade

When using what I think is a standard setup (graphql-spring-boot with DataLoaderDispatcherInstrumentation and DataLoaderRegistry singleton beans), if two (HTTP) requests from different callers ask for the same data type (i.e. use the same DataLoader), all keys are enqueued and dispatched together: BatchLoader.load(List<K> keys) is called with keys merged from both requests.
I have not used the Facebook Node implementation, but from what I understand their DataLoaders are created per request, so this merging doesn't happen.
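For reference, the wiring I mean looks roughly like this (a minimal sketch in graphql-spring-boot style; the "userNames" loader is just a placeholder, and the exact factory/constructor signatures may differ between versions):

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import graphql.execution.instrumentation.dataloader.DataLoaderDispatcherInstrumentation;

@Configuration
public class DataLoaderConfig {

    @Bean
    public DataLoaderRegistry dataLoaderRegistry() {
        // keys from ALL in-flight HTTP requests end up merged into this one list
        BatchLoader<Long, String> userNameLoader = keys ->
                CompletableFuture.supplyAsync(() -> keys.stream()
                        .map(id -> "user-" + id) // stands in for a real backend call
                        .collect(Collectors.toList()));

        DataLoaderRegistry registry = new DataLoaderRegistry();
        registry.register("userNames", DataLoader.newDataLoader(userNameLoader));
        return registry;
    }

    @Bean
    public DataLoaderDispatcherInstrumentation dispatcherInstrumentation(DataLoaderRegistry registry) {
        // one instrumentation + one registry shared by every HTTP request
        return new DataLoaderDispatcherInstrumentation(registry);
    }
}
```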
While this behavior may be desirable in some cases, it comes with some drawbacks:

  • issues with keys from one request affect the other request, and this is not very deterministic (unless your backing service is smart enough to return per-key errors)
  • if one request loads 1 key and another one loads 1K keys, both will have the latency of loading 1001 keys, and again, this is not very deterministic
  • if you are propagating authentication and your backing service only takes a global authentication principal (i.e. an authorization header), you cannot send the requests together anyway; you need to split by requestor (or execution id). (You could live with this if your backing service took a per-key principal, but that would be pretty ugly, I think.)

I wonder:

  • is this behavior intentional?
  • is this a problem with the way I have it set up?
  • would you be open to a PR that lets devs choose whether or not to merge keys?

If this is an issue with my setup then you can skip the rest; otherwise:

These are the options I'm considering at the moment:

  • wrapping the BatchLoader.load(...) method with one that splits by execution id (see the sketch after this list); this solves some interference issues, but it still makes all concurrent requests wait until everyone else's data is available.
  • subclassing DataLoader to implement something like sliceIntoBatchesOfBatches but doing it by execution id. This could work, but it has a few issues:
    • most of the things I would need to change in the DataLoader class are private, so it would involve either copying code or gaining access by reflection :S
    • this is fine for the DataLoader.dispatch() method because it doesn't wait for the overall result, but dispatchAndJoin() would still wait for every request to finish. I don't mind, because I don't use it and the instrumentation only ends up calling dispatch().
    • while this approach won't make callers wait, it would still sometimes dispatch some keys of other requests "early", maybe even before they are completely enqueued, occasionally resulting in more backend requests in a non-deterministic way.
  • another option I considered is to make the DataLoader a per-request object so DataLoaders are entirely isolated. This isn't easy though: I would need to provide a means for DataFetchers to access the right DataLoader for a given request. With some effort I could keep a map by execution id, but it is not easy to manage its life-cycle (I fear I would end up with leaked instances).
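To make the first option more concrete, here is a rough sketch of the wrapping idea. The ExecutionScopedKey wrapper and the result stitching are only assumptions of mine (nothing like this exists in the library), and the combined future still completes only once every execution's group has loaded, which is exactly the waiting drawback above:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.stream.Collectors;

import org.dataloader.BatchLoader;

import graphql.execution.ExecutionId;

// hypothetical key wrapper carrying the execution that enqueued the key
class ExecutionScopedKey<K> {
    final ExecutionId executionId;
    final K key;

    ExecutionScopedKey(ExecutionId executionId, K key) {
        this.executionId = executionId;
        this.key = key;
    }
}

// wraps the real BatchLoader so each backend call only contains keys from one execution
class ExecutionSplittingBatchLoader<K, V> implements BatchLoader<ExecutionScopedKey<K>, V> {

    private final BatchLoader<K, V> delegate;

    ExecutionSplittingBatchLoader(BatchLoader<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override
    public CompletionStage<List<V>> load(List<ExecutionScopedKey<K>> keys) {
        // group the merged key list back into per-execution batches
        Map<ExecutionId, List<ExecutionScopedKey<K>>> byExecution = keys.stream()
                .collect(Collectors.groupingBy(k -> k.executionId,
                        LinkedHashMap::new, Collectors.toList()));

        // one delegate call per execution, remembering a per-key future for each key
        Map<ExecutionScopedKey<K>, CompletableFuture<V>> perKey = new LinkedHashMap<>();
        for (List<ExecutionScopedKey<K>> group : byExecution.values()) {
            List<K> rawKeys = group.stream().map(k -> k.key).collect(Collectors.toList());
            CompletableFuture<List<V>> groupResult = delegate.load(rawKeys).toCompletableFuture();
            for (int i = 0; i < group.size(); i++) {
                int idx = i;
                perKey.put(group.get(i), groupResult.thenApply(values -> values.get(idx)));
            }
        }

        // reassemble results in the order of the original (merged) key list;
        // callers still wait for every group, which is the drawback noted above
        List<CompletableFuture<V>> ordered = keys.stream().map(perKey::get).collect(Collectors.toList());
        return CompletableFuture.allOf(ordered.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> ordered.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```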

This is what I would like:
Option 1

  • DataLoader.dispatch(), DataLoader.dispatchAndJoin() and DataLoaderRegistry.dispatchAll() should take an executionId as a parameter. Depending on a data loader option, either all keys are dispatched or only the keys for that execution id. The DataLoader.load(K key) method would also need to take an execution id (or a DataFetchingEnvironment). A possible shape of this API is sketched after this list.
  • DataLoaderDispatcherInstrumentation.dispatch() passes the execution id to DataLoaderRegistry.dispatchAll()
  • DataLoaderDispatcherInstrumentation.beginExecution(instrumentationParameters).onEnd(...) calls a new method DataLoaderRegistry.discardAll(ExecutionId) (which in turn calls a new DataLoader.discard(ExecutionId) method) to make sure appropriate cleanup happens in case of errors/abortion.
  • would that be enough cleanup, or is there any case in which keys may have been queued but beginExecution(...).onEnd(...) is not called?
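Just to illustrate, this is roughly the shape of API I have in mind for option 1; none of these signatures exist today, and the names are only placeholders:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

import graphql.execution.ExecutionId;

// hypothetical execution-aware variants of the existing DataLoader methods
interface ExecutionScopedDataLoader<K, V> {

    // the key is tracked against the execution that enqueued it
    CompletableFuture<V> load(K key, ExecutionId executionId);

    // depending on a data loader option, dispatch everything or only this execution's keys
    CompletableFuture<List<V>> dispatch(ExecutionId executionId);

    List<V> dispatchAndJoin(ExecutionId executionId);

    // drop keys still queued for an aborted or failed execution
    void discard(ExecutionId executionId);
}

// hypothetical execution-aware variants of the registry methods
interface ExecutionScopedDataLoaderRegistry {

    void dispatchAll(ExecutionId executionId);

    void discardAll(ExecutionId executionId);
}
```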

Option 2
Similar, but without changing DataLoader: make DataLoaderRegistry aware of executions and keep a map of execution id -> DataLoaders (it would need to be built with DataLoader suppliers instead of DataLoaders directly). With this approach only the DataLoaderRegistry.dispatchAll() method needs to be modified to take in the execution id. In this case the DataLoaderRegistry would need to expose a means for DataFetchers to retrieve the DataLoader for a specific execution.
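A rough sketch of what such an execution-aware registry could look like (all names are placeholders, nothing here exists in the library):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

import org.dataloader.DataLoader;

import graphql.execution.ExecutionId;

// hypothetical registry that keeps one set of DataLoaders per execution
class ExecutionAwareDataLoaderRegistry {

    private final Map<String, Supplier<DataLoader<?, ?>>> suppliers = new ConcurrentHashMap<>();
    private final Map<ExecutionId, Map<String, DataLoader<?, ?>>> perExecution = new ConcurrentHashMap<>();

    // built with suppliers so fresh DataLoaders can be created per execution
    void register(String key, Supplier<DataLoader<?, ?>> supplier) {
        suppliers.put(key, supplier);
    }

    // DataFetchers ask for the loader bound to their execution
    @SuppressWarnings("unchecked")
    <K, V> DataLoader<K, V> getDataLoader(ExecutionId executionId, String key) {
        Map<String, DataLoader<?, ?>> loaders =
                perExecution.computeIfAbsent(executionId, id -> new ConcurrentHashMap<>());
        return (DataLoader<K, V>) loaders.computeIfAbsent(key, k -> suppliers.get(k).get());
    }

    // only the loaders created for this execution are dispatched
    void dispatchAll(ExecutionId executionId) {
        perExecution.getOrDefault(executionId, Collections.emptyMap())
                .values().forEach(DataLoader::dispatch);
    }

    // called when the execution finishes so instances are not leaked
    void discardAll(ExecutionId executionId) {
        perExecution.remove(executionId);
    }
}
```

The life-cycle question then reduces to making sure discardAll(...) is always called when an execution ends, the same cleanup concern as in option 1.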

Option 3
The same thing, but managed by the instrumentation: change DataLoaderDispatcherInstrumentation to take a DataLoaderRegistry supplier instead of a DataLoaderRegistry. This supplier, or the instrumentation itself, would expose a method to return the DataLoaderRegistry associated with an execution id so that DataFetchers can get the right one.
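And a loose sketch of the option 3 idea, leaving the actual instrumentation hooks out (the real instrumentation API is more involved than this; everything here is a placeholder that the instrumentation and DataFetchers would share):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

import org.dataloader.DataLoaderRegistry;

import graphql.execution.ExecutionId;

// hypothetical holder of per-execution registries, built from a registry supplier
class PerExecutionRegistries {

    private final Supplier<DataLoaderRegistry> registryFactory;
    private final Map<ExecutionId, DataLoaderRegistry> registries = new ConcurrentHashMap<>();

    PerExecutionRegistries(Supplier<DataLoaderRegistry> registryFactory) {
        this.registryFactory = registryFactory;
    }

    // created (or reused) when an execution begins; DataFetchers call this too
    DataLoaderRegistry forExecution(ExecutionId executionId) {
        return registries.computeIfAbsent(executionId, id -> registryFactory.get());
    }

    // the instrumentation dispatches only the registry belonging to one execution
    void dispatch(ExecutionId executionId) {
        DataLoaderRegistry registry = registries.get(executionId);
        if (registry != null) {
            registry.dispatchAll();
        }
    }

    // called from beginExecution(...).onEnd(...) to avoid leaking registries
    void remove(ExecutionId executionId) {
        registries.remove(executionId);
    }
}
```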
