
Java port #26

Closed
xak2000 opened this issue Apr 26, 2016 · 20 comments

Comments

@xak2000

xak2000 commented Apr 26, 2016

I'm very impressed by DataLoader and am trying to port it to the Java world, but I got stuck on the following.

If I understand correctly, the key principle behind DataLoader is the JavaScript event loop. It can automatically determine when to run the batch function just by scheduling its execution on the next tick.

But Java has no event loop or anything similar, so the loader code simply can't know when to run the batch function.

I tried to do something with CompletableFuture, but with no luck. It is doable, but only by manually executing the batch function of every loader that was used to load something.

Something like this:

// DataLoader, load() and batchLoad() are the hypothetical API of my port; fetchUsers is a hypothetical batch query
DataLoader<Integer, User> userLoader = new DataLoader<>(keys -> fetchUsers(keys)); // returns one User per key
CompletableFuture<List<User>> result = CompletableFuture.supplyAsync(() -> {
  List<User> users = Collections.synchronizedList(new ArrayList<>());
  userLoader.load(1).thenAccept(users::add);
  userLoader.load(2).thenAccept(users::add);
  return users; // still empty here: nothing has been loaded yet
}).thenApply(users -> {
  userLoader.batchLoad(); // we need to run this new function to process the queue
  return users;
});
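For comparison, the manual-dispatch idea in the snippet above can be made concrete in plain Java 8 with no framework. This is a minimal sketch assuming a hypothetical ManualDataLoader class (not the API of any published library): load() only queues a key and hands back an incomplete CompletableFuture, and nothing is fetched until user code calls dispatch().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical minimal loader: queues keys on load(), batches them on dispatch().
class ManualDataLoader<K, V> {
    // Batch function: receives all queued keys, returns values in the same order.
    private final Function<List<K>, List<V>> batchFn;
    // Keys queued since the last dispatch, each with the future handed out by load().
    private final Map<K, CompletableFuture<V>> queue = new LinkedHashMap<>();

    ManualDataLoader(Function<List<K>, List<V>> batchFn) {
        this.batchFn = batchFn;
    }

    // Queue a key; repeated keys within one batch share the same future.
    synchronized CompletableFuture<V> load(K key) {
        return queue.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Run the batch function once for everything queued so far.
    // Returns the number of distinct keys dispatched.
    int dispatch() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            if (queue.isEmpty()) return 0;
            batch = new LinkedHashMap<>(queue);
            queue.clear(); // loads made from callbacks go into the next batch
        }
        List<K> keys = new ArrayList<>(batch.keySet());
        try {
            List<V> values = batchFn.apply(keys);
            for (int i = 0; i < keys.size(); i++) {
                batch.get(keys.get(i)).complete(values.get(i));
            }
        } catch (RuntimeException e) {
            // Fail every future that has not been completed yet.
            batch.values().forEach(f -> f.completeExceptionally(e));
        }
        return keys.size();
    }
}
```

Nothing is batched automatically: the caller decides when dispatch() runs, which is exactly the limitation this issue is about.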

I just want to ask for suggestions. I know this library is a port of the Haxl library for Haskell, but I know nothing about Haskell. Does it have an event loop too? Or maybe there's some other way to implement this that I can't see?

If this repo is not suitable for this kind of question (I understand it's about Java, not JS), then I apologize. I'm just hoping that someone has already tried something like this, or that DataLoader's author happens to know Java. :)

@aschrijver

Hi @xak2000, did you get any further with this?

For Java you can take a look at Vert.x from Eclipse at http://vertx.io/
It uses event loops, is lightweight, is easy to use once you are familiar with the asynchronous programming model, and is also polyglot, so you can mix and match code from a number of JVM-based languages, including JavaScript, in a single application.

@xak2000
Author

xak2000 commented Jul 4, 2016

Hi, @aschrijver. Thanks for your involvement!

Yes, I know about Vert.x, but Vert.x is a "container" for your applications (verticles). It is a very good project, no doubt. But I wanted to implement a pure Java solution, without dependencies. It is straightforward to implement DataLoader in Vert.x (because of the async nature of Vert.x and its event loop), but then it would be usable only inside a Vert.x application.

As for a pure Java implementation: I don't have time for further investigation right now, but sadly I think it is just not doable. :( Because there is no event loop, we can't know when to run the batch load function for all the loaders that have scheduled work. There are two options as I see it:

  1. Establish a contract that user code calls some dataloader.runLoaders() function at a point where all loads have already been scheduled (for example, just before returning the response to the browser).

  2. Run all batch loaders automatically 10 ms after the last loader.load call, i.e.:

    • each loader.load call schedules the batch load function to run 10 ms after that call;
    • if loader.load is called again within this 10 ms window, the scheduled batch load is cancelled and rescheduled to run 10 ms from that point.

    This way we can emulate the current DataLoader behaviour, but at the cost of 10 ms of extra response time.
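Option 2 is a classic debounce. A minimal sketch using the JDK's ScheduledExecutorService, assuming a hypothetical DebouncedDispatcher helper that wraps whatever Runnable performs the batch dispatch; every batch still waits at least delayMs after the last load, which is the latency trade-off noted above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs `dispatch` once no load() has been seen for `delayMs`.
class DebouncedDispatcher implements AutoCloseable {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    private final Runnable dispatch;
    private final long delayMs;
    private ScheduledFuture<?> pending; // the currently scheduled dispatch, if any

    DebouncedDispatcher(Runnable dispatch, long delayMs) {
        this.dispatch = dispatch;
        this.delayMs = delayMs;
    }

    // Call from every load(): cancel the previous timer and restart the countdown.
    synchronized void onLoad() {
        if (pending != null) {
            pending.cancel(false); // does not interrupt a dispatch already running
        }
        pending = timer.schedule(dispatch, delayMs, TimeUnit.MILLISECONDS);
    }

    // Stop accepting new timers; an already scheduled dispatch still runs.
    @Override
    public void close() {
        timer.shutdown();
    }
}
```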

I don't really like either of these solutions, because they lack the elegance of the original idea of DataLoader: non-intrusive, automatic batch loading.

Maybe combining these two solutions into one (schedule the batch function to run automatically, but allow running it manually if the user knows when) could give a more or less acceptable result...

@aschrijver

Vert.x does not have to be the container for your entire app. You can also run it embedded without exposing to clients. You only ship the transitive dependency on vertx-core in that case.

@xak2000
Author

xak2000 commented Jul 5, 2016

Yes, you can run Vert.x embedded, but it is still a container (embedded into your application, but still a container), and a DataLoader implementation based on the Vert.x event loop would be usable only from code run inside this embedded container, not from the rest of the application.

So whether it is embedded or not doesn't matter. You still can't write a pure Java solution that can be used in any Java app as a library (not as a framework component).

@antmdvs mentioned this issue Aug 15, 2016
@leebyron
Contributor

Gave some advice on #30 that's relevant here

@aschrijver

aschrijver commented Aug 19, 2016

Hi @xak2000, @antmdvs
I finally got some time to look more closely at DataLoader, and I now see your issue. Vert.x does not address this problem either (though there may be ways to tweak the EventBus).

But your options above, while less elegant than automatic batching, still make it a useful utility, I think. Additional options could be a maximum batch size after which the queue is dispatched automatically, dispatching at regular intervals (in Vert.x, by setting a timer), or a custom strategy that an implementer plugs in.
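The maximum-batch-size and custom-strategy ideas can be sketched in plain Java as a small plug-in interface. DispatchStrategy and BatchingQueue are hypothetical names for illustration, not the API of vertx-dataloader or any other library; manual dispatch() stays available whichever strategy is plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical plug-in point: decides after each enqueue whether to flush now.
interface DispatchStrategy {
    boolean shouldDispatch(int queuedKeys);

    // One possible strategy: flush as soon as the queue reaches maxBatchSize.
    static DispatchStrategy maxBatchSize(int maxBatchSize) {
        return queued -> queued >= maxBatchSize;
    }

    // Manual-only strategy: never flush automatically, wait for dispatch().
    static DispatchStrategy manual() {
        return queued -> false;
    }
}

// Hypothetical queue that consults the strategy on every enqueued key.
class BatchingQueue<K> {
    private final List<K> queue = new ArrayList<>();
    private final DispatchStrategy strategy;
    private final Consumer<List<K>> batchFn; // receives each flushed batch

    BatchingQueue(DispatchStrategy strategy, Consumer<List<K>> batchFn) {
        this.strategy = strategy;
        this.batchFn = batchFn;
    }

    synchronized void enqueue(K key) {
        queue.add(key);
        if (strategy.shouldDispatch(queue.size())) {
            dispatch();
        }
    }

    // Manual dispatch works regardless of the strategy in use.
    synchronized void dispatch() {
        if (queue.isEmpty()) return;
        List<K> batch = new ArrayList<>(queue);
        queue.clear();
        batchFn.accept(batch);
    }
}
```

A timer-based strategy would instead call dispatch() from a scheduled task rather than from enqueue().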

I am writing a remote service proxy for a GraphQL schema, and intend to write a vertx-dataloader that batches the individual data-fetching requests involved in an incoming query (on the client) and sends a single event bus message (JSON) to the backend GraphQL service implementation.
The data loader would be invoked after the query has been processed on the client proxy side, by explicitly calling dataloader.dispatch(), which hydrates the various data fetcher futures with values.
The implementation will first use CompositeFuture (equivalent to Promise.all), and later probably Vert.x's implementation of CompletableFuture.

In many use cases the dispatch call can be triggered automatically, e.g. in Vert.x by calling it from the endHandler of an HTTP request.

Another interesting use I am thinking of is to implement the data loader with an asynchronous, cluster-wide map implementation (an AsyncMap in vertx), so I get the caching in load-balanced nodes.

@aschrijver

BTW, it's funny that while DataLoader's tick-based batching in Node.js does not affect its asynchronous behaviour (correct?), in Vert.x it limits that behaviour slightly, since it becomes more of a delayed-execution mechanism.

@leebyron
Contributor

I would caution against using the delay to pick up the requests in a batch. I've seen that approach used before and it can introduce real latency into your system.

For JS environments that don't support the Promise job queue, this timer approach is used, but with a delay of "0ms", which in JS schedules the callback for the immediate next tick of the event loop.

@aschrijver

aschrijver commented Aug 19, 2016

Thanks, this is something I will mention in the docs, just like you did with HTTP request scope, etc.
In general, one should take care not to make blocking calls while there are items in the queue, and not to put too much logic between batches :)

I will have code up in a couple of hours at https://github.com/engagingspaces/vertx-dataloader

BTW, many thanks for creating the test coverage. It is tedious to translate to Java, but it helps a lot. I will mention you in the acknowledgements section of the README, if you don't mind.

@aschrijver

Code is there now.

@aschrijver

aschrijver commented Aug 20, 2016

@leebyron I documented the change in the Java implementation very clearly (I hope) in vertx-dataloader/manual-dispatching

I think the change doesn't diminish the power of the utility in any way. On the contrary it does give additional and fine-grained control of batching and dispatching logic.

@xak2000
Author

xak2000 commented Aug 21, 2016

Hi @aschrijver! Very glad to see you wrote some code! I opened this issue, but I'm very busy now and can't put much effort into it.

One question from a quick glance at the code: does it support multi-step loading?
What I mean:

  1. Queue the loading of some users by their IDs.
  2. Only after each user's promise is resolved can we queue the loading of that user's posts.

The problem is that dispatch will be called before any user promise is resolved, and therefore before any post load is queued. This is by design: if we don't call dispatch, the user promises will never be resolved. But! After the batch load of users completes (and dispatch returns), the resolved user promises will queue post-loading tasks. Who will make the second dispatch call after that?

OK, I now think the example is slightly incorrect: posts should be loaded by another DataLoader instance. But suppose these are not posts but the friends of each user (so they are instances of the user model).

And even if we use a second DataLoader for posts, how can we know in which order to call dispatch so that all load calls are correctly enqueued and dequeued?

Maybe I misunderstand something, but when I considered implementing DataLoader in Java, I thought about all these problems and couldn't find any reasonable solution without some event-loop-like system, where the container (Node.js, for example) dequeues and dispatches all queued jobs for you, in the right order and as many times as needed.


Edit:
OK, I think we can load posts or friends in the first iteration using something like loadFriendsByUserId(userId). But if we then need to load the friends of each friend (for example), the problem persists.

Something like this in GraphQL, if I wrote it correctly:

users {
  friends {
    friends
  }
}
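One way to handle this multi-level case in plain Java (a sketch under stated assumptions, not what vertx-dataloader or any other library does) relies on the fact that CompletableFuture.complete() runs already-registered thenAccept callbacks synchronously on the completing thread. A dispatch of level 1 therefore queues the level-2 loads before it returns, and a driver can keep dispatching all loaders until a full pass finds nothing queued. Loader and Dispatcher.dispatchAll are hypothetical names, and the sketch assumes synchronous batch functions (values come back on the dispatching thread); asynchronous batches would need the driver to wait between rounds.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical single-threaded loader with a synchronous batch function.
class Loader<K, V> {
    private final Function<List<K>, List<V>> batchFn;
    private final Map<K, CompletableFuture<V>> queue = new LinkedHashMap<>();
    int batchCalls = 0; // how many times the batch function ran

    Loader(Function<List<K>, List<V>> batchFn) { this.batchFn = batchFn; }

    CompletableFuture<V> load(K key) {
        return queue.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    boolean hasQueued() { return !queue.isEmpty(); }

    void dispatch() {
        Map<K, CompletableFuture<V>> batch = new LinkedHashMap<>(queue);
        queue.clear();
        if (batch.isEmpty()) return;
        batchCalls++;
        List<K> keys = new ArrayList<>(batch.keySet());
        List<V> values = batchFn.apply(keys);
        // complete() runs registered callbacks right here, so any load() they make
        // lands in the (now empty) queue and is picked up by the next round.
        for (int i = 0; i < keys.size(); i++) {
            batch.get(keys.get(i)).complete(values.get(i));
        }
    }
}

class Dispatcher {
    // Keep dispatching every loader until a full pass finds nothing queued.
    // Returns the number of rounds, i.e. roughly one per nesting level.
    static int dispatchAll(List<Loader<?, ?>> loaders) {
        int rounds = 0;
        while (loaders.stream().anyMatch(l -> l.hasQueued())) {
            for (Loader<?, ?> l : loaders) l.dispatch();
            rounds++;
        }
        return rounds;
    }
}
```

For the users/friends/friends query, each nesting level costs one batch call per loader, however many users or friends there are; the loop ends once callbacks stop issuing new loads.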

@aschrijver

Hmm, yours is an interesting case to investigate further.

In general, with manual dispatching as the exception, the implementation should behave exactly like its Node.js counterpart. I still haven't ported all the tests, so you'll have to take my word for it.

Just as manual dispatching makes it necessary to invoke dispatch() at an appropriate location, you are also responsible for creating load requests at the appropriate moments in the lifecycle of a batch operation.

In the future I will add features that flush the queue after a dispatchTimeout, or at regular timer intervals, depending on additional options you provide.

When you call dispatch(), individual Futures start completing (resolving) right away. The call returns immediately with a CompositeFuture of the aggregated result (similar to Promise.all). The composite is not complete yet, however. You have to set a handler on it and check for success or failure before touching the list of constituent results (in Vert.x they are still Futures, with the value <V> in future.result() if future.succeeded()).
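In plain Java 8 the closest analogue of Vert.x's CompositeFuture (or Promise.all) is CompletableFuture.allOf, which completes once every constituent has completed, and completes exceptionally if any of them failed. A sketch, assuming a hypothetical collectResults helper that mirrors the advice above: the handler inspects each constituent future for success or failure before touching its value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class Results {
    // Hypothetical helper: waits for all futures, then reports each one's outcome
    // instead of failing the whole batch on the first error.
    static <V> CompletableFuture<List<String>> collectResults(List<CompletableFuture<V>> futures) {
        CompletableFuture<Void> all =
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
        // handle() runs whether allOf succeeded or not; every constituent is done here.
        return all.handle((ignored, error) -> {
            List<String> report = new ArrayList<>();
            for (CompletableFuture<V> f : futures) {
                if (f.isCompletedExceptionally()) {
                    report.add("failed");
                } else {
                    report.add("ok: " + f.join()); // safe: f completed normally
                }
            }
            return report;
        });
    }
}
```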

On your last point, I also documented the benefits you still get even with manual dispatching (plus some additional ones) in the "differences to reference implementation" section.

@aschrijver

If you have the time later it would be nice if you created a PoC project or a Gist to test-drive your use case!

@aschrijver

aschrijver commented Aug 21, 2016

Forgot to mention that the preparation stage of the data loader lifecycle (i.e. adding load requests to the batch queue, from loader instantiation to just before dispatch) is usually very fast as long as no code blocks the event loop!

In Vert.x every invocation is asynchronous by default, and if it is not, it should be assigned to a worker verticle that runs on a thread pool. (Regular verticles always run on the same thread, so you can regard them as single-threaded and reap the rewards of that simplicity. But you probably know this, as it's the main selling point of Vert.x, besides being clustered, polyglot, modular and... much more goodness, that is 😄.)

@aschrijver

I have created engagingspaces/vertx-dataloader#3 on the Vert.x DataLoader project for this use case.

@aschrijver

I added a diagram to README.md that illustrates the direction the Vert.x DataLoader is heading (not quite there yet):

[Diagram: vertx-dataloader-concepts]

@leebyron
Contributor

Closing this issue since there's now a project and discussion to take this further.

@xak2000
Author

xak2000 commented Jan 28, 2018

For anyone interested: a pure Java 8 implementation now exists.
It still requires manual dispatching, for the reasons discussed in this thread, but graphql-java supports it natively through DataLoaderDispatcherInstrumentation.

@aschrijver

Yes! That version was derived by @bbakerman from my Vert.x implementation and has all Vert.x dependencies removed :)
