How to tell if the parallel computation is actually turned on #249

Open
lirui0321 opened this issue Jul 23, 2023 · 7 comments
@lirui0321

Hello, I note that the BayesianTools package requires the likelihood function to be vectorized for parallel computation. However, it seems that the package will run anyway with the option parallel = T, even if the function is not vectorized. When I check the CPU usage, I find that R is not using much CPU, so I suspect parallelization is probably not turned on. Is there an easy way to check whether parallelization is applied? Or could the package return a warning if parallelization is not turned on for whatever reason? Thank you!

Rui

@florianhartig
Owner

Hello Rui,

the likelihood is automatically vectorised in createBayesianSetup, so if you set parallel = T, your likelihood should be automatically parallelised. You should see several R sessions open on your system in this case.

If your likelihood is very cheap to compute, this will not show up as much additional CPU load, because most of the time the CPUs are idle and time is spent communicating within the socket cluster. In practice, I have found that likelihood parallelisation makes sense for likelihoods with roughly > 50 ms evaluation time. For faster likelihoods, it makes more sense to parallelise the MCMC chains.
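
For illustration, here is a minimal sketch with a toy likelihood (just a stand-in for your own, more expensive function):

```r
library(BayesianTools)

# toy likelihood, standing in for an expensive model
ll <- function(x) sum(dnorm(x, log = TRUE))

# parallel = 2 creates a socket cluster with 2 worker R sessions
setup <- createBayesianSetup(likelihood = ll,
                             lower = c(-10, -10), upper = c(10, 10),
                             parallel = 2)

# the wrapped likelihood accepts a matrix with one parameter vector per
# row; evaluation of the rows is distributed across the workers
setup$posterior$density(rbind(c(0, 0), c(1, 1)))
```

If the cluster was created, you should see the extra R sessions in your task manager while this runs.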

Best
Florian

@lirui0321
Author

Hi Florian,

Many thanks for helping me address my question!

It takes ~1 min to compute my likelihood function, in which ODEs are solved multiple times against different data sets. I started an MCMC run with 10000 iterations 2 days ago, when I posted this question, and it had just finished 2000 iterations this morning. The package returned the message "parallel function execution created with 39 cores", and, as you mentioned, R opened multiple sessions after that. However, the total CPU usage is less than 10%, so it seems that the computation has not taken advantage of parallelization. I am wondering if I coded the likelihood function wrong, such that the computation somehow has to be done sequentially. I remember that when I use the package "DEoptim" for parallel computation, it requires, as an argument, a list of names of the packages and functions used in my likelihood calculation. Does BayesianTools have a similar requirement?

Rui

@florianhartig
Owner

Hello Rui,

you can control package export by hand, but by default in BT your entire environment (data + packages) is exported, so either be careful what you have in your environment or control the export by hand.
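
If you prefer to control the export by hand, a sketch along these lines should work (see the parallelOptions argument in ?createBayesianSetup; myLikelihood, myData and deSolve here are placeholders for your own objects):

```r
setup <- createBayesianSetup(likelihood = myLikelihood,
                             lower = lower, upper = upper,
                             parallel = TRUE,
                             parallelOptions = list(
                               variables = list("myData"),  # objects the likelihood needs
                               packages  = list("deSolve"), # packages loaded on each worker
                               dlls      = NULL))           # compiled code, if any
```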

What algorithm are you using? Note that parallelisation can only help up to the number of internal chains in your algorithm - so if you run a DEzs with 3 internal chains, having 39 cores doesn't help; it will still only use 3 cores at a time.
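
For example, a sketch with the default settings (myLikelihood, lower and upper are placeholders):

```r
# one DEzs run with its default 3 internal chains; parallel = 3 matches
# the internal chain count, so additional cores would sit idle in this run
setup <- createBayesianSetup(myLikelihood, lower = lower, upper = upper,
                             parallel = 3)
out <- runMCMC(setup, sampler = "DEzs", settings = list(iterations = 10000))
```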

If you have a computer with 40 cores, I think the best use is to run three independent MCMC chains (this has to be done by hand) and then set up each DEzs with parallel chains to make the best use of your hardware.

Best
F

@lirui0321
Author

Hi Florian,

Please forgive my naive MCMC questions. I am using a Metropolis / AM sampler. Does this mean that I can at best use 3 cores if I only have 3 MCMC chains? And if I switch to the DEzs sampler, I can maximize the use of the CPU by increasing the number of internal chains of the DEzs algorithm. Is this a correct understanding?

Could you provide an example of how to set the number of internal chains vs. independent MCMC chains? I believe that I can change the number of MCMC chains by specifying "nrChains" in the runMCMC settings list. How do I change the number of DEzs internal chains?

In addition, is there any general guidance about when we should use which sampling algorithm?

Thank you!
Rui

@florianhartig
Owner

Hi Rui,

MCMCs are usually not parallelizable, because the next step depends on the previous step.

There are only a few specific things that can be parallelised: e.g., you can do parallel proposals in all samplers that apply rejection (basically all samplers in BT, but this is not implemented in BT), or you can calculate the chains of population MCMCs such as DEzs in parallel.

If you want to use a large number of cores, you should probably go for an SMC, see our recent paper: Speich, M., Dormann, C. F., & Hartig, F. (2021). Sequential Monte-Carlo algorithms for Bayesian model calibration - A review and method comparison. Ecological Modelling, 455, 109608. https://doi.org/10.1016/j.ecolmodel.2021.109608. The code for this is in a branch of the BT GitHub repo; I haven't yet managed to merge it into the main branch.

As a default for most users with complicated models and runtime problems, I would recommend to:

  1. Use DEzs and possibly increase the number of internal chains (this is set by the z-matrix, see the help of DEzs)
  2. Turn on parallelisation
  3. If you want to run several independent MCMC chains for convergence checks (recommended), run these in parallel as well (see the sketch after this list and https://cran.r-project.org/web/packages/BayesianTools/vignettes/InterfacingAModel.html#parallelization)
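
A sketch of point 3, following the vignette linked above (again, myLikelihood, lower and upper are placeholders for your own objects):

```r
library(parallel)
library(BayesianTools)

cl <- makeCluster(3)
clusterEvalQ(cl, library(BayesianTools))
clusterExport(cl, c("myLikelihood", "lower", "upper"))

# three independent DEzs runs, one per worker
runs <- parLapply(cl, 1:3, function(i) {
  setup <- createBayesianSetup(myLikelihood, lower = lower, upper = upper)
  runMCMC(setup, sampler = "DEzs", settings = list(iterations = 10000))
})
stopCluster(cl)

# combine the runs for convergence diagnostics
res <- createMcmcSamplerList(runs)
gelmanDiagnostics(res)
```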

There is an open issue #181 to improve the documentation on parallelisation, and I'll take this as a nudge to bump it up the priority list.

@lirui0321
Author

Many thanks for the detailed explanation, Florian!

Out of curiosity, is there a plan to include additional "popular" samplers in BayesianTools, for example the Gibbs and NUTS samplers offered in BUGS and Stan? In your opinion, what are the advantages and disadvantages of these samplers over the SMC and DE samplers you've included in BayesianTools?

@florianhartig
Owner

No, currently my idea is that BT will only include "black-box samplers" that require neither derivatives nor the structure of the likelihood. For Gibbs or NUTS, you would need a metalanguage such as in JAGS or Stan that allows the sampler to understand the mathematical structure of the likelihood.
