usage with multiple threads? #29

Open
davharris opened this Issue Oct 26, 2016 · 6 comments

davharris commented Oct 26, 2016

This looks like a great package. It's saving me and my collaborators a lot of unnecessary computation time.

I was wondering about how the package would perform if a memoized function were running in parallel on several threads, especially with caches stored on the filesystem. Given that the hashes are deterministic, it doesn't seem like there would be a problem, but I didn't see anything specifically about it in the documentation, so I thought it would be good to ask.

Thanks in advance!

jimhester commented Oct 26, 2016

There is currently no support for this. In particular, two processes could write to the same file simultaneously and produce a corrupted file, e.g. if both processes called the function with the same arguments at the same time. To avoid this you would need some sort of file locking, maybe with the flock package, although that package is not on CRAN and would need to be tested on Windows. This would also likely need its own cache type, since file locking is only necessary for multi-process code.
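
As an illustration of the kind of locking this would involve, here is a minimal sketch that wraps a memoised call in an exclusive lock. It assumes the flock package; the names cache_dir, lock_file, slow_fn and locked_slow_fn are made up for the example, and flock's documentation should be checked for whether lock() blocks until the lock is free.

library(memoise)
library(flock)

cache_dir <- "shared_cache"       # filesystem cache shared by all processes
lock_file <- "shared_cache.lock"  # lock file guarding that cache

slow_fn <- memoise(function(x) {
  Sys.sleep(2)  # stand-in for the real computation
  x^2
}, cache = cache_filesystem(cache_dir))

locked_slow_fn <- function(x) {
  lk <- flock::lock(lock_file)  # acquire an exclusive lock on the lock file
  on.exit(flock::unlock(lk))    # always release, even on error
  slow_fn(x)                    # cache reads and writes happen under the lock
}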

davharris commented Oct 27, 2016

Thanks for the quick response!

chochkov commented Mar 29, 2017

A possible implementation could be to maintain a cache per worker, using the process ID as a prefix to the filename. That way, repeated calls within a worker would indeed be sped up.
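
A rough sketch of that per-worker idea, assuming memoise's cache_filesystem() and run on each worker; the "cache" directory and slow_fn are illustrative names:

library(memoise)

# Give every worker process its own cache directory keyed by its PID,
# so no two processes ever write to the same cache files.
worker_cache <- cache_filesystem(file.path("cache", Sys.getpid()))
slow_fn <- memoise(slow_fn, cache = worker_cache)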

I mostly have a parLapply workflow, however, which means there is nothing to speed up inside the workers. So it already helps me a lot to wrap the whole computation in a memoised function, roughly like this:

library(parallel)
library(memoise)

fn <- function() {
  cl <- makeCluster(detectCores(), outfile = '')
  # `objects` and `my.slow.computation` are defined elsewhere in my code
  tryCatch(
    parLapply(cl, objects, my.slow.computation),
    finally = stopCluster(cl)
  )
}
fn <- memoise(fn)
fn()

Perhaps that might help in other workflows too.

hadley commented Aug 6, 2017

We could use @gaborcsardi's new file locking package
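
For reference, that is presumably the filelock package; a minimal sketch of its API, with an illustrative lock path:

library(filelock)

lck <- lock("shared_cache.lock", exclusive = TRUE, timeout = Inf)  # block until acquired
# ... read from or write to the shared filesystem cache here ...
unlock(lck)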

privefl commented Aug 7, 2017

The package flock is on CRAN and it works really well.

npatwa commented Jan 2, 2018

Just to add to the ideas already given: I acquired the lock on the cache file just before the return statement in my memoised function, and released it immediately after the call to the memoised function in the calling environment.
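
A minimal sketch of that ordering, assuming the flock package; lock_file, slow_fn, mem_fn and the "cache_dir" directory are illustrative names. The lock is taken at the end of the inner function, so it is held while memoise writes the new result to the filesystem cache, and it is released by the caller once the memoised call has returned; on a cache hit the inner function never runs, so the caller only unlocks if a lock was actually taken.

library(memoise)
library(flock)

lock_file  <- "cache.lock"  # lock file shared by all processes
cache_lock <- NULL          # holds the lock handle between the two steps

slow_fn <- function(x) {
  res <- x * 2  # stand-in for the real computation
  # Take the lock just before returning, so it is held while memoise
  # writes `res` to the shared filesystem cache.
  cache_lock <<- flock::lock(lock_file)
  res
}

mem_fn <- memoise(slow_fn, cache = cache_filesystem("cache_dir"))

y <- mem_fn(21)
# memoise has finished writing by now; release the lock if one was taken.
if (!is.null(cache_lock)) {
  flock::unlock(cache_lock)
  cache_lock <- NULL
}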

@jimhester jimhester added the feature label May 4, 2018
