You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
library("listenv")
library("future")
LONG_TIME<-60LARGE_NUMBER<-1e6create_large_data<-function(ii) {
Sys.sleep(runif(1L, max=LONG_TIME)) ## Slow process
rnorm(runif(1L)*LARGE_NUMBER) ## Large data
}
x<- listenv()
for (iiin1:10) {
cat(sprintf("Future #%d\n", ii))
x[[ii]] %<=% { create_large_data(ii) }
}
If the above futures are processed on a cluster it may take some time to retrieve each of the future values because the values are large and they need to be serialized in order to be transfer back to the main processes. This may take different amount of time for different futures.
Now, if we use
y<- as.list(x)
to resolve and collect the values, we basically do so sequentially. In other words, x[[2]] won't be called until x[[1]] is completed. Now, if future x[[2]] is already resolved but x[[1]] takes a long time to be evaluated, our main process is forces to be idle until x[[1]] is resolved.
It would be better to be able to start retrieving the value of future x[[2]] in the meanwhile. In order to do this, we need to query the futures to check whether they're are resolved or not and only start retrieving values for futures that are resolved. Something like:
resolve<-function(...) UseMethod("resolve")
resolve.listenv<-function(x, ..., sleep=1.0) {
fs<- futureOf(envir=x, drop=TRUE)
resolved<-logical(length(fs))
while (!all(resolved)) {
for (iiin which(!resolved)) {
if (!resolved(fs[[ii]])) next## Retrieve value (allow for errors)
tryCatch({ value(fs[[ii]]) }, error=function(ex) {})
resolved[ii] <-TRUE
} # for (ii ...)## Wait a bit before checking againif (!all(resolved)) Sys.sleep(sleep)
} # while (...)## Touch every element to trigger removal of internal future variablefor (iiin seq_along(x)) force(x[[ii]])
x
} ## resolve() for listenv
which we then can use as:
x<- resolve(x)
x<- as.list(x)
The text was updated successfully, but these errors were encountered:
HenrikBengtsson
changed the title
https://github.com/HenrikBengtsson/listenv/issues
WISH: Add resolve() for efficiently resolving and retrieving values asynchronously
Dec 20, 2015
Consider a large number of futures like:
If the above futures are processed on a cluster it may take some time to retrieve each of the future values because the values are large and they need to be serialized in order to be transfer back to the main processes. This may take different amount of time for different futures.
Now, if we use
to resolve and collect the values, we basically do so sequentially. In other words,
x[[2]]
won't be called untilx[[1]]
is completed. Now, if futurex[[2]]
is already resolved butx[[1]]
takes a long time to be evaluated, our main process is forces to be idle untilx[[1]]
is resolved.It would be better to be able to start retrieving the value of future
x[[2]]
in the meanwhile. In order to do this, we need to query the futures to check whether they're are resolved or not and only start retrieving values for futures that are resolved. Something like:which we then can use as:
The text was updated successfully, but these errors were encountered: