
usage with multiple threads? #29

Open
davharris opened this issue Oct 26, 2016 · 7 comments
Labels: feature (a feature request or enhancement)

Comments

@davharris

This looks like a great package. It's saving me and my collaborators a lot of unnecessary computation time.

I was wondering about how the package would perform if a memoized function were running in parallel on several threads, especially with caches stored on the filesystem. Given that the hashes are deterministic, it doesn't seem like there would be a problem, but I didn't see anything specifically about it in the documentation, so I thought it would be good to ask.
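For concreteness, the kind of setup I have in mind looks roughly like this (just a sketch; slow_fn, the cache directory, and the worker count are placeholders, and cache_filesystem() is assumed as the on-disk backend):

library(parallel)

# A memoised function backed by an on-disk cache, called from several
# worker processes that all point at the same cache directory.
cl <- makeCluster(4)
clusterEvalQ(cl, {
  library(memoise)
  slow_fn <- memoise(function(x) { Sys.sleep(1); x^2 },
                     cache = cache_filesystem("~/slow_fn_cache"))
})
res <- parLapply(cl, 1:8, function(x) slow_fn(x))
stopCluster(cl)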

Thanks in advance!

@jimhester
Member

There is currently no support for this. In particular, two processes could write to the same file simultaneously and produce a corrupted file, e.g. if both processes called the function with the same arguments at the same time. To avoid this you would need some sort of file locking, maybe with the flock package, although that package is not on CRAN and would need to be tested on Windows. This would also likely need its own cache type, since file locking is only necessary for multi-process code.

@davharris
Author

Thanks for the quick response!

@chochkov

chochkov commented Mar 29, 2017

A possible implementation would be to maintain a cache per worker, using the process ID as a prefix for the cache filename. That way repeated calls within a worker would indeed be sped up.

My workflow mostly uses parLapply, however, which means there is nothing to memoise inside the workers themselves. So it already helps a lot to wrap the whole parallel computation in a memoised function, roughly like this:

library(parallel)
library(memoise)

# Memoise the whole parallel computation; the cache key is the input list.
fn <- function(objects) {
  cl <- makeCluster(detectCores(), outfile = '')
  tryCatch(
    parLapply(cl, objects, my.slow.computation),
    finally = stopCluster(cl)
  )
}
fn <- memoise(fn)
fn(objects)

Perhaps that might help in other workflows too.
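The per-worker cache idea above could look roughly like this (an untested sketch; it assumes memoise::cache_filesystem() and keys the cache directory by Sys.getpid(), with my.slow.computation standing in for the user's function):

library(memoise)

# Give each process its own on-disk cache, keyed by its PID,
# so parallel workers never write to the same cache files.
my_cache_dir <- file.path(tempdir(), paste0("memo_cache_", Sys.getpid()))
slow_fn <- memoise(my.slow.computation,
                   cache = cache_filesystem(my_cache_dir))
# Run this once per worker (e.g. inside clusterEvalQ) so repeated calls
# within that worker hit its own cache.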

@hadley
Member

hadley commented Aug 6, 2017

We could use @gaborcsardi's new file locking package

@privefl

privefl commented Aug 7, 2017

The package flock is on CRAN and it works really well.

@npatwa

npatwa commented Jan 2, 2018

Just to add to the ideas already given: I acquired a lock on the cache file just before the return statement inside my memoised function, and released the lock immediately after the call to the memoised function in the calling environment.
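Roughly along these lines (sketch only; filelock is used here as one possible locking backend, and the lock path and example function are placeholders):

library(memoise)
library(filelock)

lockfile <- file.path(tempdir(), "memoise.lock")   # placeholder lock file
lck <- NULL

slow_fn <- memoise(function(x) {
  res <- x^2                        # the real work goes here
  lck <<- filelock::lock(lockfile)  # acquire the lock just before returning,
  res                               # so the cache write happens under it
}, cache = cache_filesystem(file.path(tempdir(), "memoise_cache")))

y <- slow_fn(3)
if (!is.null(lck)) {                # on a cache hit the body never ran, so no lock
  filelock::unlock(lck)             # release right after the memoised call
  lck <- NULL
}

One caveat with this placement: nothing is locked on a cache hit, so a reader can still race with another process's in-progress write.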

@drag05

drag05 commented Mar 20, 2024

My understanding is that flock uses disk caching, while the memoise function itself has caching options, so it can be set to write to disk from any (parallel) process.

Is there a way to create new subprocesses inside parallel processes that can be used by memoise only?

From the flock documentation it is still unclear to me what "process synchronization" refers to, and why parallel processes would need "synchronization".

Thank you!
