multi_run could (should?) return as soon as some requests finish #104
The idea is to use the callback functions. A callback can exit multi_run by raising an error.

library(curl)
done <- function(res){
  print("Done with a request. Going to do stuff now.")
}
pool <- new_pool()
multi_add(new_handle(url = "https://httpbin.org/delay/10"), pool = pool, done = done)
multi_add(new_handle(url = "https://httpbin.org/delay/2"), pool = pool, done = done)
system.time(multi_run(pool = pool, timeout = 5))

What are you trying to do that cannot be implemented this way?
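The early-exit idea from this comment could look roughly like the sketch below. It is only an illustration: the `first_done` condition class and the `run_until_first()` helper are made-up names, and the sketch relies on the statement above that an error raised inside a `done` callback propagates out of `multi_run()`.

```r
library(curl)

# Hypothetical helper: run the pool until the first request completes.
# The `done` callback signals a custom condition (class "first_done"),
# which unwinds out of multi_run(); tryCatch() turns it into a return value.
run_until_first <- function(urls, timeout = 30) {
  pool <- new_pool()
  done <- function(res) {
    cond <- structure(
      class = c("first_done", "error", "condition"),
      list(message = "a request finished", call = NULL, response = res)
    )
    stop(cond)
  }
  for (u in urls) {
    multi_add(new_handle(url = u), done = done, pool = pool)
  }
  tryCatch(
    {
      multi_run(timeout = timeout, pool = pool)
      NULL  # no request completed successfully before multi_run() returned
    },
    first_done = function(e) e$response
  )
}

# Example (needs network access):
# res <- run_until_first(c("https://httpbin.org/delay/10",
#                          "https://httpbin.org/delay/2"))
# res$url
```

The drawback is that the remaining requests are abandoned together with the pool, so this only fits the case where a single result is enough.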
|
Because that complicates the processing considerably: I just cannot orchestrate the callbacks for an arbitrary number of requests.

while (TRUE) {
  reqs <- multi_run(pool = pool)
  process_just_finished_requests()
  if (! reqs$pending) break
}

I actually need to use the callbacks to save the results; there is no other way to get the results, AFAIK. And using the callbacks does not help with the time lost while waiting, anyway.
|
Callbacks are just not a good way to get a nice control flow. Not even in JS, and they have great tools for them. So ideally my callbacks would just put down the result at a pre-specified place, and the actual program logic would go into my main event loop. |
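A minimal sketch of that "callbacks only record, the main loop decides" idea, in plain R. The `make_recorder()` helper and the `results` environment are hypothetical names, and the responses are simulated by hand so that the example runs without network access.

```r
# Hypothetical helper: build a `done` callback that only records the response
# under a pre-agreed key; no program logic lives in the callback itself.
make_recorder <- function(store, id) {
  force(store)
  force(id)
  function(res) {
    assign(id, res, envir = store)
    invisible(NULL)
  }
}

results <- new.env()
callbacks <- lapply(c("one", "two"), function(id) make_recorder(results, id))

# Simulate two completed requests by calling the callbacks by hand;
# with curl these would be passed as the `done` argument of multi_add().
callbacks[[1]](list(url = "https://httpbin.org/get?q=one", status_code = 200L))
callbacks[[2]](list(url = "https://httpbin.org/get?q=two", status_code = 200L))

# The main loop can then pick up whatever has arrived so far.
for (id in sort(ls(results))) {
  cat(id, "->", get(id, envir = results)$url, "\n")
}
```

Because the callbacks are generated by a closure factory, an arbitrary number of them can be created at run time without writing each one by hand.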
|
I don't quite understand. Which time is lost? You can just call So do you suggest we add a parameter to I personally think the JavaScript closure-callback design is more elegant and gives finer control than the C-style poll-and-return pattern, but I guess that's a matter of taste. |
|
I think you've been doing too much |
|
I have a bunch of functions that have this kind of logic:

function() {
  cat("one\n")
  resp1 <- async_http("https://httpbin.org/get?q=one")
  cat(resp1$url)
  cat("two\n")
  resp2 <- async_http("https://httpbin.org/get?q=two")
  cat(resp2$url)
  cat("three\n")
}

I.e. they do some HTTP calls occasionally. I want them to do the HTTP asynchronously, and I want one such function to be running while the others are waiting for their HTTP results. (Unless all of them are waiting.) This is trivial to implement in JS, but impossible in R, so I am trying to transform these functions to some construct that is roughly equivalent. I think it is mostly doable, but dealing with the callbacks is a pain in the neck.
I could, but that's the complicated thing. My functions (like the one above) are essentially dynamic, which means they can have an arbitrary number of async calls. Which means arbitrary number of callbacks, each one different. And since I don't know how the functions will look, I would probably have to dynamically generate the code of the callbacks. Etc.
Yeah, having an
Yes, if the callbacks are not too deep, and if the structure of the design is fixed in the source code. But it is also no wonder that the JS world is abandoning the callbacks and is switching to promises. Callbacks do not scale. But my main problem is that the structure is not fixed, so callbacks are just not a good fit here. |
|
Anyway, I think adding the |
|
OK, I'll try to add that. BTW, as you were using this, do you feel the busy-wait problems are solved? I never implemented unit tests for this because I don't know how. |
|
I don't know, sorry. I checked a couple of times, for various cases manually, and I haven't seen it, but it would be indeed good to test. I'll think about how we could test it. |
jeroen added a commit that referenced this issue Jul 9, 2017: 68f16cb
OK I added something. Is this what you have in mind? Do you like the parameter name? |
|
I think this will work. The name is a little bit misleading for me. Isn't |
|
OK maybe that's better indeed. |
|
Although that kind of implies it overrides |
|
Yeah, the timeout is just stronger. It has to be stronger. So yeah, it returns as soon as there are |
|
We could also call it |
Though arguments |
|
Yeah. And also, in general, arguments that can take either a logical or an integer are not a good idea... |
|
typos are not a big problem, because it will turn out immediately... |
|
I would add the argument at the end, though. |
|
Because in the middle it breaks current usage if the arguments are not named: multi_run(x, y) |
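To make the positional-matching concern concrete, here is a small sketch in plain R. The three functions are hypothetical stand-ins, not curl's code, and `new_arg` is a placeholder for whatever the new parameter is called.

```r
# Hypothetical stand-ins for the old signature and two ways of extending it.
run_old <- function(timeout = Inf, pool = NULL) {
  list(timeout = timeout, pool = pool)
}
run_mid <- function(timeout = Inf, new_arg = FALSE, pool = NULL) {
  list(timeout = timeout, new_arg = new_arg, pool = pool)
}
run_end <- function(timeout = Inf, pool = NULL, new_arg = FALSE) {
  list(timeout = timeout, pool = pool, new_arg = new_arg)
}

# An existing call that relies on argument position:
str(run_old(5, "my_pool"))
str(run_mid(5, "my_pool"))  # "my_pool" is now matched to `new_arg`, pool stays NULL
str(run_end(5, "my_pool"))  # "my_pool" is still matched to `pool`
```

Callers that name their arguments, e.g. `run_mid(timeout = 5, pool = "my_pool")`, are unaffected by either placement.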
|
I know but I like the |
|
On CRAN only But it is up to you. :) I think if you know that you want to keep sg as last argument, it is better to have
|
|
Then they have to explicitly name it. |
|
This has landed in 2.8.1 on CRAN. |
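For readers landing here later, a rough sketch of the event loop this enables. It assumes the argument is named `poll`, as in the `multi_run(timeout = Inf, poll = FALSE, pool = NULL)` signature of current curl releases; `fetch_all()` and its internals are hypothetical names used only for illustration.

```r
library(curl)

# Hypothetical helper: fetch several URLs, getting control back in the loop
# each time at least one request has completed. Assumes the URLs are unique,
# because they are used as keys in the `results` environment.
fetch_all <- function(urls, timeout = 30) {
  pool <- new_pool()
  results <- new.env()

  add_one <- function(url) {
    multi_add(
      new_handle(url = url),
      done = function(res) assign(url, res, envir = results),  # just record
      fail = function(msg) assign(url, msg, envir = results),  # record the error text
      pool = pool
    )
  }
  for (u in urls) add_one(u)

  seen <- character()
  repeat {
    # With poll = TRUE, multi_run() returns once some request has completed
    # (or the timeout has elapsed), instead of waiting for all of them.
    status <- multi_run(timeout = timeout, poll = TRUE, pool = pool)
    fresh <- setdiff(ls(results), seen)
    for (url in fresh) message("finished: ", url)  # program logic goes here
    seen <- c(seen, fresh)
    if (status$pending == 0) break
  }
  mget(urls, envir = results)
}

# Example (needs network access):
# out <- fetch_all(c("https://httpbin.org/delay/10",
#                    "https://httpbin.org/delay/2"))
```

The callbacks only store what arrived; all decisions are made in the `repeat` loop, which is the control flow asked for earlier in the thread. The sketch sets no per-request timeout, so a request that never completes would keep the loop running.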
jeroen closed this Jul 22, 2017
|
Cool! Thanks much for adding this! I built a very cool async HTTP tool with curl, I'll show it the next time we talk. |
gaborcsardi commented Jul 9, 2017

The current implementation of multi_run makes it hard to program around it without wasting precious processing time. Here is a snippet that illustrates the problem:

This outputs:

The quicker request finishes at around 2 seconds, but multi_run still waits around until the timeout period expires. I guess for some applications this makes sense, but I would argue that for most it is better to return as soon as at least one request is completed. (If multiple requests are available immediately, it is OK to process them, but waiting for other requests to complete is not optimal, I think.)

Specifying timeout = 0 does not help, either, because that effectively means that my app has to spend some time doing nothing, as I need to implement a wait-poll loop myself; or busy-waiting, which is not good, either.

Ideally, I want my app to get the control back, as soon as there is something to work on, i.e. as soon as (at least) one request has completed. This would be similar to how the standard Unix poll system call works.

I am not sure how you would want to implement this, in a way that the current semantics is still supported for backwards compatibility, but maybe another multi_poll() function could help. This would use the same internals, but it would have the new, poll-like semantics.