are the replicates "independent"? #4
TL;DR: in all my tests on a single computer, parallelizing the replicates actually slows things down.

The answer is... "sort of" lol. You can run the replicates in parallel after you run the first Aitchison model fitting code on the original data, since that fit is used as input to the replicates. Once you have it, the replicates can run in parallel. In fact, this is what the DivNet R package does. However, on every computer I've tested, it actually slows down the overall computation.

I did try parallelizing the replicate code of divnet-rs with rayon (if you're not a Rust coder, that's a work-stealing, data-parallelism library), and got worse performance in general with the parallel code. There is a lot of BLAS/LAPACKE as well as other vector operations going on that seem to saturate the memory bandwidth of a single computer (a possible explanation, at least).

Now, what I didn't try, and what I think probably would help, is to do the first model fitting and write that to disk, then provide a mini-app that just does the replicates. Then you could, for example, submit each replicate as its own job on a compute cluster. As long as they didn't all go to the same node, you should avoid the memory-bandwidth problem (assuming nothing else on that node is taking it!).

Which I think is more or less what you're getting at, right?
Yes! The latter - running the replicates as separate single-threaded jobs on an HPC - is what I was asking about. So it wouldn't work without a little tinkering to get the model fitting to run separately from the bootstrap. I hadn't heard of Rust until using divnet-rs, so I probably won't tinker on my own, but you have satisfied my curiosity ;). Thanks!
Hey, I thought this was an interesting idea, so I threw together a (partial) solution during my lunch break :) You can check it out in the separate-bootstraps branch. You will need to clone this repository and then pull that branch specifically.
You know in the divnet-rs book it has the following instruction for installation? Just change the second line so that you're building the separate-bootstraps branch.

That will build two additional binaries, `fit_model` and `bootstrap`. You use those sort of like the original `divnet-rs` binary. You can recreate the full run like this:

```shell
DN_MODEL_DAT=dn_model DN_MODEL_CSV=dn_model.csv ./target/release/fit_model ./test_files/small/config_small.toml

# Each of these could be run on separate nodes, or as separate scheduled
# jobs (e.g., Torque, Slurm, etc.).
#
# You must change the replicate, seed, and outfiles by hand!
#
# DN_MODEL_DAT must be the same for these as for the fit_model run.
DN_MODEL_DAT=dn_model DN_REPLICATE=1 DN_SEED=1 DN_BOOT_CSV=dn_boot_1.csv ./target/release/bootstrap ./test_files/small/config_small.toml
DN_MODEL_DAT=dn_model DN_REPLICATE=2 DN_SEED=2 DN_BOOT_CSV=dn_boot_2.csv ./target/release/bootstrap ./test_files/small/config_small.toml
DN_MODEL_DAT=dn_model DN_REPLICATE=3 DN_SEED=3 DN_BOOT_CSV=dn_boot_3.csv ./target/release/bootstrap ./test_files/small/config_small.toml

# Concatenate the outfiles
cat dn_model.csv dn_boot_*.csv > dn_out.csv

# Clean up
rm dn_model dn_model.csv dn_boot_*.csv
```

Now the cool thing is that the three (or however many replicates you want) `bootstrap` runs are independent of one another. I haven't tested it to see if you get a nice speedup or not on a cluster, but if you would be willing to try it out, I would be very interested in the results!
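To make the "separate scheduled jobs" idea concrete, here is a minimal sketch (not from the original comment) that writes one tiny job script per replicate, each of which could then be submitted on its own (e.g., `sbatch job_1.sh`). The replicate count, config path, and `job_N.sh` naming are my own assumptions, not part of divnet-rs.

```shell
#!/usr/bin/env bash
# Sketch: generate one job script per bootstrap replicate so each can be
# submitted independently to a scheduler. NREPS, CONFIG, and the job_N.sh
# names are assumptions for illustration only.
set -euo pipefail

NREPS=3
CONFIG=./test_files/small/config_small.toml

for i in $(seq 1 "$NREPS"); do
  cat > "job_${i}.sh" <<EOF
#!/usr/bin/env bash
# DN_MODEL_DAT must match the earlier fit_model run.
DN_MODEL_DAT=dn_model DN_REPLICATE=${i} DN_SEED=${i} DN_BOOT_CSV=dn_boot_${i}.csv ./target/release/bootstrap ${CONFIG}
EOF
  chmod +x "job_${i}.sh"
done
```

After the jobs finish, the `cat dn_model.csv dn_boot_*.csv` step above would work unchanged, since each job writes its own `dn_boot_N.csv`.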
I tested with a modest dataset I had on hand. Running 5 replicates with the 'careful' tuning parameters using plain `divnet-rs` took […]. Running […]. Instead of parallelizing the […]. SLURM also has an option […], with the user time being exactly the amount of time running serially minus the […]. Memory usage for […].
This is great stuff @pooranis, thank you!!! I will have to give it a closer look and see how we can incorporate it. If not directly in the code, then at least in the documentation! 🚀
This is now in the main branch (as of e9cfe24). I still need to add to the docs and clean it up a bit, but it's fine for now. |
Is the computation of each replicate independent of the others? So, could you run divnet-rs with

replicates = 1

5 separate times and then combine the outputs (taking care to fix up the headers, replicates column, etc.)? I am curious if it is possible to parallelize this way. Thanks!
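The combine step described here (merging several `replicates = 1` outputs while fixing up headers and replicate ids) might look roughly like this sketch. The file names, columns, and values are invented for illustration; divnet-rs's real CSV layout may differ.

```shell
#!/usr/bin/env bash
# Sketch of merging several single-replicate runs: keep the header from
# the first file only, then renumber the replicate column. All file names
# and columns here are made up for the example.
set -euo pipefail

printf 'sample,replicate,shannon\nA,0,1.10\n' > run_1.csv
printf 'sample,replicate,shannon\nA,0,1.25\n' > run_2.csv

head -n 1 run_1.csv > combined.csv          # header once
for f in run_1.csv run_2.csv; do
  tail -n +2 "$f" >> combined.csv           # data rows only
done

# Give each run a distinct replicate id (second field), since every
# single-replicate run reports the same id on its own.
awk -F, 'NR==1 {print; next} {$2=++n; print}' OFS=, combined.csv > renumbered.csv
```

This is the "little tinkering" the thread mentions; the `fit_model`/`bootstrap` split added later avoids needing it at all.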