
usage with multiple threads? #29

Open
davharris opened this issue Oct 26, 2016 · 7 comments
Labels: feature (a feature request or enhancement)

Comments

@davharris

This looks like a great package. It's saving me and my collaborators a lot of unnecessary computation time.

I was wondering about how the package would perform if a memoized function were running in parallel on several threads, especially with caches stored on the filesystem. Given that the hashes are deterministic, it doesn't seem like there would be a problem, but I didn't see anything specifically about it in the documentation, so I thought it would be good to ask.
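For concreteness, the kind of setup I have in mind looks roughly like this (just a sketch; slow_fn, the cache directory, and the worker count are placeholders, and cache_filesystem() is assumed as the on-disk backend):

library(parallel)

# A memoised function backed by an on-disk cache, called from several
# worker processes that all point at the same cache directory.
cl <- makeCluster(4)
clusterEvalQ(cl, {
  library(memoise)
  slow_fn <- memoise(function(x) { Sys.sleep(1); x^2 },
                     cache = cache_filesystem("~/slow_fn_cache"))
})
res <- parLapply(cl, 1:8, function(x) slow_fn(x))
stopCluster(cl)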

Thanks in advance!

@jimhester
Member

There is currently no support for this. In particular, two processes could write to the same file simultaneously and produce a corrupted file, e.g. if both processes called the function with the same arguments at the same time. To avoid this you would need some sort of file locking, maybe with the flock package, although that package is not on CRAN and would need to be tested on Windows. This would also likely need its own cache type, since file locking is only necessary for multi-process code.

@davharris
Author

Thanks for the quick response!

@chochkov

chochkov commented Mar 29, 2017

A possible implementation would be to maintain a cache per worker, using the process ID as a prefix for the cache filename. That way repeated calls within a worker would indeed be sped up.

My workflow mostly uses parLapply, however, which means there is nothing to memoise inside the workers themselves. So it already helps a lot to wrap the whole parallel computation in a memoised function, roughly like this:

library(parallel)
library(memoise)

# Memoise the whole parallel computation; the cache key is the input list.
fn <- function(objects) {
  cl <- makeCluster(detectCores(), outfile = '')
  tryCatch(
    parLapply(cl, objects, my.slow.computation),
    finally = stopCluster(cl)
  )
}
fn <- memoise(fn)
fn(objects)

Perhaps that might help in other workflows too.
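The per-worker cache idea above could look roughly like this (an untested sketch; it assumes memoise::cache_filesystem() and keys the cache directory by Sys.getpid(), with my.slow.computation standing in for the user's function):

library(memoise)

# Give each process its own on-disk cache, keyed by its PID,
# so parallel workers never write to the same cache files.
my_cache_dir <- file.path(tempdir(), paste0("memo_cache_", Sys.getpid()))
slow_fn <- memoise(my.slow.computation,
                   cache = cache_filesystem(my_cache_dir))
# Run this once per worker (e.g. inside clusterEvalQ) so repeated calls
# within that worker hit its own cache.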

@hadley
Member

hadley commented Aug 6, 2017

We could use @gaborcsardi's new file locking package

@privefl

privefl commented Aug 7, 2017

The package flock is on CRAN and it works really well.

@npatwa

npatwa commented Jan 2, 2018

Just to add to the ideas already given: I acquired a lock on the cache file just before the return statement inside my memoised function, and released the lock immediately after the call to the memoised function in the calling environment.
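Roughly along these lines (sketch only; filelock is used here as one possible locking backend, and the lock path and example function are placeholders):

library(memoise)
library(filelock)

lockfile <- file.path(tempdir(), "memoise.lock")   # placeholder lock file
lck <- NULL

slow_fn <- memoise(function(x) {
  res <- x^2                        # the real work goes here
  lck <<- filelock::lock(lockfile)  # acquire the lock just before returning,
  res                               # so the cache write happens under it
}, cache = cache_filesystem(file.path(tempdir(), "memoise_cache")))

y <- slow_fn(3)
if (!is.null(lck)) {                # on a cache hit the body never ran, so no lock
  filelock::unlock(lck)             # release right after the memoised call
  lck <- NULL
}

One caveat with this placement: nothing is locked on a cache hit, so a reader can still race with another process's in-progress write.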

@drag05

drag05 commented Mar 20, 2024

My understanding is that flock uses disk caching, while the memoise function itself has caching options, so it can be set to write to disk from any (parallel) process.

Is there a way to create new subprocesses inside parallel processes that can be used by memoise only?

From the flock documentation it is still unclear to me what "process synchronization" refers to, and why parallel processes would need "synchronization".

Thank you!
