
Parallelism of calculations in pyhf à la joblib (or similar) #807

Open
kratsg opened this issue Mar 18, 2020 · 2 comments
Labels: feat/enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed / contributions welcome), question (Further information is requested), research (experimental stuff)

Comments

@kratsg
Contributor

kratsg commented Mar 18, 2020

Description

There are starting to be places in pyhf where certain calculations could be parallelized on behalf of the user (rather than requiring the user to parallelize explicitly). One example that will come up is the toy calculation added in #790, where we need a for-loop that computes the test statistic for each toy.

This cannot easily be batched or vectorized, because a statistical fit is performed for each toy (and the number of iterations is not necessarily the same for each toy). There may be other good examples in the code base in the future where we will want this parallelism.

Is your feature request related to a problem? Please describe.

No.

Describe the solution you'd like

Perhaps something like pip install pyhf[toytools] or pyhf[toys-joblib] or pyhf[toys-dask].
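Whichever extra is chosen, the dependency would presumably need to be optional so that pyhf still works without it installed. A minimal sketch of such an import guard (the helper name map_maybe_parallel is made up for illustration and is not pyhf API):

```python
# Guard the optional joblib dependency so the package still imports without it.
try:
    from joblib import Parallel, delayed

    HAS_JOBLIB = True
except ImportError:
    HAS_JOBLIB = False


def map_maybe_parallel(func, iterable, n_jobs=1):
    """Apply func over iterable, in parallel if joblib is available.

    Falls back to a plain serial loop when joblib is missing or n_jobs == 1.
    """
    if HAS_JOBLIB and n_jobs != 1:
        return Parallel(n_jobs=n_jobs)(delayed(func)(item) for item in iterable)
    return [func(item) for item in iterable]
```

With this shape, the parallel code path only activates when the extra is installed and the user asks for it, and the serial path stays the default.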

Describe alternatives you've considered

Nothing concrete yet; I haven't thought about it hard enough.

Relevant Issues and Pull Requests

Additional context

Nope.

@kratsg added the feat/enhancement, help wanted, good first issue, question, and research labels Mar 18, 2020
@kanishk16
Contributor

I believe dask would be a nice option for a variety of reasons, primarily scaling up to larger data in the future. Before forming an opinion, though, I wanted to hear about any personal experiences with limitations of dask relative to joblib.
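For reference, the dask version of this kind of fan-out is just dask.delayed wrapped around each call. A minimal sketch, with toy_teststat as a hypothetical stand-in for the real per-toy computation and a serial fallback when dask is not installed:

```python
# dask is treated as optional here, mirroring the proposed pyhf[toys-dask] extra.
try:
    import dask
except ImportError:
    dask = None


def toy_teststat(sample):
    # Stand-in for the real per-toy test statistic, which in pyhf
    # would run a full statistical fit per pseudo-experiment.
    return sum(sample)


def compute_all(samples):
    """Evaluate toy_teststat over all samples, via dask when available."""
    if dask is not None:
        # Build one lazy task per toy, then execute them together;
        # dask.compute returns the results in input order.
        tasks = [dask.delayed(toy_teststat)(sample) for sample in samples]
        return list(dask.compute(*tasks))
    return [toy_teststat(sample) for sample in samples]
```

The same structure would later let the scheduler be swapped (threads, processes, or a distributed cluster) without touching the loop body.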

@matthewfeickert
Member

matthewfeickert commented Sep 27, 2021

@kratsg For those who don't know, like me, what's the advantage of using concurrent.futures, as is currently done in the draft of PR #1158, over just using joblib (beyond concurrent.futures being built into the language)?

So replacing

signal_teststat = []
for sample in tqdm.tqdm(signal_sample, **tqdm_options, desc='Signal-like'):
    signal_teststat.append(
        teststat_func(
            poi_test,
            sample,
            self.pdf,
            self.init_pars,
            self.par_bounds,
            self.fixed_params,
        )
    )

with something like

from joblib import Parallel, delayed

...

        # n_jobs is set as kwarg
        signal_teststat = Parallel(n_jobs=n_jobs)(
            delayed(teststat_func)(
                poi_test,
                sample,
                self.pdf,
                self.init_pars,
                self.par_bounds,
                self.fixed_params,
            )
            for sample in tqdm.tqdm(signal_sample, **tqdm_options, desc='Signal-like')
        )

(and corresponding code for bkg_teststat), with the default "loky" backend, I was seeing rates of over 500 toys/second on branches that have PR #1610 implemented.
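For comparison, the concurrent.futures version of the same fan-out is roughly the following stdlib-only sketch (toy_teststat is a made-up stand-in for the real test-statistic function; a ThreadPoolExecutor is used just to keep the example self-contained, whereas CPU-bound fits would more likely use a ProcessPoolExecutor, which additionally requires the work function to be picklable):

```python
from concurrent.futures import ThreadPoolExecutor


def toy_teststat(sample):
    # Stand-in for the per-toy test statistic; in pyhf this would
    # perform a full fit for each pseudo-experiment.
    return sum(sample)


def compute_teststats(samples, max_workers=4):
    """Evaluate toy_teststat over all samples with a worker pool."""
    # executor.map preserves input order, matching the serial loop.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(toy_teststat, samples))
```

The main trade-off versus joblib is that concurrent.futures ships with the language but leaves batching, backend selection, and memoization to the caller, while joblib provides those (plus the "loky" process backend) out of the box.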
