
Possible to implement asynchronous (i.e. non-blocking I/O) in curl? #51

Closed
zachmayer opened this issue Dec 17, 2015 · 6 comments

Comments

@zachmayer

From r-lib/httr#271:

This is not an issue, more of a question that I wanted to pose to the community. If there is already a method for achieving this, I apologize for the repeat and would appreciate being pointed in the right direction.

I use httr extensively and often find myself in situations where aggregating the data I need requires hundreds of thousands of REST calls. This becomes performance-limiting in R because it seems like every HTTP call made using httr is a blocking call.

Is there any plan or path forward for enabling asynchronous io within curl similar to what exists in Python via aiohttp or Scala via Akka-http?

@zachmayer (Author)

And a follow-up:
Here's an example from RCurl:

# Requires the RCurl package: library(RCurl)
getURIs =
function(uris, ..., multiHandle = getCurlMultiHandle(), .perform = TRUE)
{
  content = list()
  curls = list()

  # Set up one easy handle per URI and add it to the multi handle.
  for(i in uris) {
    curl = getCurlHandle()
    content[[i]] = basicTextGatherer()
    opts = curlOptions(URL = i, writefunction = content[[i]]$update, ...)
    curlSetOpt(.opts = opts, curl = curl)
    multiHandle = push(multiHandle, curl)
  }

  if(.perform) {
     # Drive all queued requests to completion, then return the gathered bodies.
     complete(multiHandle)
     lapply(content, function(x) x$value())
   } else {
     return(list(multiHandle = multiHandle, content = content))
   }
}

There is also getURIAsynchronous.
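
For context, a minimal sketch of how getURIAsynchronous is typically called (the URLs are placeholders; see ?getURIAsynchronous in RCurl for the authoritative interface):

library(RCurl)

# Download several URIs concurrently; returns the response bodies
# as a character vector, one element per URI.
uris <- c("https://example.org/a", "https://example.org/b")
txt <- getURIAsynchronous(uris)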

Here's an example use case where this would be helpful. I want to submit 1,000 requests to the server, and each request takes 10 minutes to process (the server has to look up some data and do some math that takes a long time). However, the server can handle many thousands of simultaneous requests.

Currently, I'm looping through something like this:

requests <- lapply(urls, POST, ...)

This blocks on each request, so it takes 1,000 × 10 minutes to complete. It'd be really nice to be able to send each request off to the server without blocking on the request being completed. Then, after they have all been submitted, we can block on collecting the results with a loop like this:

requests <- lapply(urls, POST, ..., async=TRUE)
results  <- lapply(requests, httr::complete)

Here, httr::complete would be similar to RCurl::complete. The second example is nice because we can submit all the requests at once and let the server start processing them before blocking on gathering the results. In theory, this loop would take ~10 minutes to complete, plus the overhead of the two lapply loops.

@jeroen (Owner)

jeroen commented Dec 18, 2015

I think the natural R solution might be a non-blocking connection object, although R is limited to 128 connections, which is kind of lame.

@zachmayer (Author)

Would that be a feature request for R core? If so, how would I submit it to R core?

@jeroen (Owner)

jeroen commented Jun 13, 2016

There is now experimental support for async requests in the dev version of curl:

install.packages("https://github.com/jeroenooms/curl/archive/master.tar.gz", repos = NULL)
library(curl)
?multi

Let me know if you have any feedback.

Here is an example: https://github.com/jeroenooms/curl/blob/master/examples/crawler.R
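
A minimal sketch of the multi interface, assuming the API described in ?multi (the callback body and URLs here are illustrative, not part of the linked example):

library(curl)

results <- list()
pool <- new_pool()

# Queue each request; the 'done' callback fires when its response arrives.
for (url in c("https://httpbin.org/get", "https://httpbin.org/ip")) {
  h <- new_handle(url = url)
  multi_add(h, done = function(res) {
    results[[res$url]] <<- rawToChar(res$content)
  }, pool = pool)
}

# Perform all queued requests concurrently; blocks until the pool is drained.
multi_run(pool = pool)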

jeroen closed this as completed Jun 13, 2016
@zachmayer (Author)

Wahoo!!!

@jeroen (Owner)

jeroen commented Sep 16, 2016

curl 2.0, which includes the async support, should be on CRAN this week.
