Hi again,
In the iter_parallel_chains function of beat/sampler/base.py:476-482:
if chunksize is None:
    if draws < 10:
        chunksize = int(np.ceil(float(n_chains) / n_jobs))
    elif draws > 10 and tps < 0.5:
        chunksize = int(np.ceil(float(n_chains) / n_jobs))
    else:
        chunksize = n_jobs
the tps seems to depend on the hardware (I have installed libamdm), and if we set a bigger n_jobs, the chunksize will also be bigger in the case tps > 0.5, draws > 10 and stage > 0.
Referring to https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool.map, a bigger chunksize leads to a smaller chunk count. When n_jobs > chunk count, a bigger n_jobs decreases the degree of parallelism, which means the calculation takes longer, as in the sketch below.
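For illustration, a minimal standalone sketch of that chunk-count arithmetic (the numbers n_chains = 100 and n_jobs = 20 are hypothetical, not taken from beat):

import math

# Hypothetical numbers chosen only to illustrate the arithmetic.
n_chains = 100
n_jobs = 20

chunksize = n_jobs                          # the draws > 10, tps > 0.5 branch
n_chunks = math.ceil(n_chains / chunksize)  # ceil(100 / 20) = 5
print(n_chunks)  # 5 chunks -> only 5 of the 20 workers receive any work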
Is that correct? And can I set an arbitrary chunksize manually in a script?
Thank you!
Cool that you are still around ;).
You are right. The intention behind that is: if your forward model takes a long time, you rather want a small chunksize, i.e. have the work distributed in smaller chunks to more workers. Otherwise it often happens that a single worker is left with a big chunk of work that all the other workers have to wait on before entering the next stage.
Vice versa, if you have a fast forward model you want a big chunksize, because initialising the workers then takes longer than the sampling itself.
Is that understandable? I couldn't completely understand what your problem with that setup is. For now you cannot define chunksize in the config file, but if it would help you, we can surely add that; it is not a big deal.
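In the meantime, a minimal standalone timing sketch of that tradeoff, using plain multiprocessing rather than beat (the sleep duration and the numbers n_chains = 40, n_jobs = 8 are assumptions for demonstration):

import time
from multiprocessing import Pool


def forward_model(i):
    # Stand-in for an expensive forward calculation (hypothetical).
    time.sleep(0.5)
    return i


if __name__ == "__main__":
    n_chains, n_jobs = 40, 8
    with Pool(n_jobs) as pool:
        for chunksize in (1, n_jobs):
            t0 = time.time()
            pool.map(forward_model, range(n_chains), chunksize=chunksize)
            print(f"chunksize={chunksize}: {time.time() - t0:.1f} s")

# With chunksize=1 all 8 workers stay busy (~2.5 s for this slow model);
# with chunksize=8 there are only ceil(40 / 8) = 5 chunks, so 3 workers
# sit idle and the run takes ~4 s.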
Sorry for the late fix, but I apparently didn't get the point correctly until I tried it myself with a larger number of chains.
It is fixed in the current dev branch here: #121 and should be released to master soon.