
Uh oh!
There was an error while loading. Please reload this page.
-
Reposted from this Stack Overflow question at Henrik's request.

Recently I've been playing with parallel processing in R using future (and future.apply and furrr), which has mostly been great, but I've stumbled onto something I can't explain. It's possible that this is a bug somewhere, but it may also be sloppy coding on my part. If anyone can explain this behavior, it would be much appreciated.

The setup
I'm running simulations on different subgroups of my data. For each group, I want to run the simulation n times and then calculate some summary stats on the results. Here is some example code to reproduce my basic setup and demonstrate the issue I'm seeing:

The outer loop is for testing two parallelization strategies: whether to parallelize over the subsets of data or over the 100 simulations.
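As a rough illustration, and not the author's own code, here is a minimal sketch of the kind of loop described, assuming hypothetical data (mtcars$mpg split by cyl), a hypothetical simulate_once() bootstrap function, 100 simulations per group and two workers per parallel level. It shows how the two-level plan(list(...)) decides whether the outer future_lapply() over subsets or the inner future_sapply() over simulations runs in parallel.

```r
library(future)
library(future.apply)

# Hypothetical stand-ins for the real data and simulation:
# mtcars$mpg split into three cylinder groups, and a bootstrap mean.
subsets <- split(mtcars$mpg, mtcars$cyl)
n_sims <- 100
simulate_once <- function(x) mean(sample(x, replace = TRUE))

# Run n_sims simulations on one subset and summarize them
run_group <- function(x) {
  sims <- future_sapply(seq_len(n_sims), function(i) simulate_once(x),
                        future.seed = TRUE)
  c(mean = mean(sims), sd = sd(sims))
}

# Two-level plans: the first element applies to the outer
# future_lapply(), the second to the inner future_sapply().
strategies <- list(
  over_subsets = list(tweak(multisession, workers = 2), sequential),
  over_sims    = list(sequential, tweak(multisession, workers = 2))
)

results <- list()
for (name in names(strategies)) {
  plan(strategies[[name]])
  res <- future_lapply(subsets, run_group, future.seed = TRUE)
  results[[name]] <- do.call(rbind, res)  # one row per subset
}
plan(sequential)  # shut down the multisession workers
print(results)
```

Only the order of the two plan() elements differs between the strategies; the loop body is identical, which is what makes the memory comparison in the post a like-for-like one.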
Some caveats

plan(multicore) might be better here (though I'm not sure it would), but I'm more interested in figuring out what's happening with plan(multisession).

The results
I ran this on an 8-vCPU Linux EC2 instance (I can give more specs if people need them) and created the following plot from the results (plotting code at the bottom for reproducibility):
First off, plan(list(multisession, sequential)) is faster (as expected, see caveat above), but what confuses me is the memory profile. The total system memory usage stays fairly constant for plan(list(multisession, sequential)), which I would expect, because I assumed the res object is overwritten each time through the loop.

However, the memory usage for plan(list(sequential, multisession)) grows steadily as the program runs. It appears that each time through the loop the res object is created and then hangs around in limbo somewhere, taking up memory. In my real example this got large enough to fill my entire system memory (32 GB) and kill the process about halfway through.

Plot twist: it only happens when nested
And here's the part that really has me confused! When I change the outer future_lapply to a regular lapply and set plan(multisession), I don't see it. From my reading of the "Future: Topologies" vignette, this should be the same as plan(list(sequential, multisession)), but the plot doesn't show the memory growing at all (in fact, it's almost identical to plan(list(multisession, sequential)) in the plot above).

Note on other options
I actually first found this with furrr::future_map_dfr(), but to make sure it wasn't a bug in furrr, I tried it with future.apply::future_lapply() and got the results shown. I also tried to code this up with just future::future() and got very different results, quite possibly because what I coded up wasn't actually equivalent. I don't have much experience using futures directly, without the abstraction layer provided by either furrr or future.apply.

Again, any insight on this is much appreciated.
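For comparison with the future::future() attempt mentioned above, here is a minimal sketch of what the inner simulation loop might look like with explicit futures, assuming hypothetical data (mtcars$mpg), a hypothetical simulate_once() function, 100 simulations and two multisession workers. It is not necessarily equivalent to what future_lapply() does internally: by default future_lapply() (future.scheduling = 1) splits the elements into roughly one chunk per worker, whereas this sketch creates one future per simulation.

```r
library(future)

plan(tweak(multisession, workers = 2))

# Hypothetical data and simulation: a bootstrap mean of one subset
x <- mtcars$mpg
simulate_once <- function(x) mean(sample(x, replace = TRUE))

# One future per simulation. Creating a future blocks while all workers
# are busy, and seed = TRUE gives each future a parallel-safe RNG seed.
fs <- lapply(seq_len(100), function(i) future(simulate_once(x), seed = TRUE))

# Collect results in order; value() waits for each future to resolve
sims <- vapply(fs, value, numeric(1))
print(summary(sims))

plan(sequential)  # shut down the workers
```

Because each element becomes its own future here, 100 futures are created and collected instead of about two chunks, so timings and memory use may not be directly comparable to the future_lapply() runs.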
Plotting code