
ENH: use joblib instead of multiprocessing #227

Merged
merged 7 commits on Jul 11, 2021

Conversation

@swkeemink (Member) commented Jul 9, 2021

Multiprocessing was giving several issues (such as #147). This PR changes it to using joblib instead.

As a bonus, the code is now much more readable, and the same code is used for Python 2 and 3.

For now it's a draft request as there currently is a bug (known from joblib for a while) that stdout from jupyter notebooks only show up in the terminal, not the notebooks themselves. (See ipython/ipykernel#402 and jupyter/notebook#700).

It can be fixed by using a different backend, but the whole point is to use the standard joblib backend which fixes our issues.

@swkeemink (Member, Author)

The threading backend seems to both fix our issues, and give proper stdout, so using that now.
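
For reference, a minimal sketch of what the switch looks like in spirit (illustrative only, not the actual FISSA code; `separate_trace` and the input list are hypothetical stand-ins):

```python
from joblib import Parallel, delayed


def separate_trace(trace):
    # Hypothetical per-item work function standing in for the real separation step.
    return trace * 2


traces = [1.0, 2.0, 3.0, 4.0]

# Old approach (multiprocessing):
#     pool = multiprocessing.Pool(ncores)
#     results = pool.map(separate_trace, traces)

# New approach (joblib with the threading backend, so print() output
# still reaches the notebook instead of only the terminal):
results = Parallel(n_jobs=4, backend="threading")(
    delayed(separate_trace)(t) for t in traces
)
print(results)  # [2.0, 4.0, 6.0, 8.0]
```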

@swkeemink swkeemink marked this pull request as ready for review July 9, 2021 16:54
@swkeemink swkeemink requested a review from scottclowe July 9, 2021 16:55
@swkeemink (Member, Author)

@scottclowe, as this is a rather heavy update to how we handle multiprocessing, could you have a good look and merge if you're okay with it? In particular, please check how it interacts with the changes you made before.

codecov bot commented Jul 9, 2021

Codecov Report

Merging #227 (ace5404) into master (bcae081) will decrease coverage by 0.12%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #227      +/-   ##
==========================================
- Coverage   93.18%   93.05%   -0.13%     
==========================================
  Files           8        8              
  Lines         939      922      -17     
  Branches      197      196       -1     
==========================================
- Hits          875      858      -17     
  Misses         33       33              
  Partials       31       31              
Flag Coverage Δ
nbsmoke 65.29% <77.77%> (-0.74%) ⬇️
unittests 92.73% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
fissa/core.py 95.12% <100.00%> (-0.25%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bcae081...ace5404.

@swkeemink (Member, Author) commented Jul 9, 2021

Ah, I think the usage of None is not consistent with multiprocessing, and we currently only use one core by default:
https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html

The maximum number of concurrently running jobs, such as the number of Python worker processes when backend=”multiprocessing” or the size of the thread-pool when backend=”threading”. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. None is a marker for ‘unset’ that will be interpreted as n_jobs=1 (sequential execution) unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.

Additionally, supposedly no multiprocessing code is used at all if n_jobs=1, meaning we can further simplify the code.
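
To make the quoted semantics concrete, here is a small self-contained illustration (my own sketch, not project code):

```python
from joblib import Parallel, delayed


def square(x):
    return x ** 2


data = range(8)

# n_jobs=-1: use all CPUs.
Parallel(n_jobs=-1)(delayed(square)(x) for x in data)

# n_jobs=-2: all CPUs but one, since n_cpus + 1 + n_jobs workers are used.
Parallel(n_jobs=-2)(delayed(square)(x) for x in data)

# n_jobs=1: no parallel computing code is used at all (handy for debugging).
Parallel(n_jobs=1)(delayed(square)(x) for x in data)

# n_jobs=None: interpreted as 1 (sequential) unless a parallel_backend
# context manager sets another value -- unlike multiprocessing.Pool(None),
# which starts one worker per CPU.
Parallel(n_jobs=None)(delayed(square)(x) for x in data)
```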

Handle ncores=None as one-per-CPU thread like with
multiprocessing.Pool for backward compatibility.

Handle ncores<0 correctly as per joblib.Parallel n_jobs<0, i.e.
-1 = all cores
-2 = 1 less than all cores
etc

@scottclowe (Member)

> Ah, I think the usage of None is not consistent with multiprocessing, and we currently only use one core by default:
> https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html

I changed our code to map None to -1 for backward compatibility. Users can specify 0 or 1 to use only one core, and -2 etc. is handled correctly and passed through to joblib, resulting in all-but-one core being utilised.

> Additionally, supposedly no multiprocessing code is used at all if n_jobs=1, meaning we can further simplify the code.

I'd rather avoid going through joblib at all when ncores=1. This makes for a simpler error traceback and easier debugging, and allows the user to bypass joblib entirely if it is causing issues for them.
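
Roughly, the dispatch logic described above could be sketched like this (the helper name `_process_all` is hypothetical, and this is not the actual `fissa.core` implementation):

```python
from joblib import Parallel, delayed


def _process_all(func, items, ncores=None):
    """Hedged sketch of the ncores-to-n_jobs mapping discussed above."""
    # Backward compatibility: ncores=None means "use every core", as with
    # multiprocessing.Pool(processes=None), so map it to joblib's -1.
    if ncores is None:
        n_jobs = -1
    else:
        n_jobs = ncores

    # Bypass joblib entirely for single-core runs: simpler tracebacks and
    # easier debugging if joblib itself is causing problems.
    if n_jobs in (0, 1):
        return [func(item) for item in items]

    # Negative values (-2, -3, ...) are passed straight through to joblib,
    # which interprets them as n_cpus + 1 + n_jobs workers.
    return Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(func)(item) for item in items
    )
```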

@scottclowe (Member) left a comment

Approved, with changes I pushed.
