[ROI] Per-ROI parallelization introduces dask overhead (with respect to per-chunk parallelization) #26
Comments
Closing this issue would require a comparison of run times for the per-chunk and per-FOV illumination correction tasks on the same dataset.
Cool, very visual way to understand this, thanks @tcompa! Can we run e.g. the illumination correction for the 10-well, 5x5 example in the old and new setup? If they remain within a ~10% run-time range, I don't think we'd need to worry much about this. Also, this explains why it's safe to run multi-ROI in parallel within a single job, I'd guess :)
The current discussion (see #27) is rather about memory, while the run times of illumination correction are under control (in one per-ROI version they are even better than the old per-chunk ones). I think we can close this issue as soon as we are happy with #27 (because otherwise subsequent changes would require re-testing the run times).
TL;DR
Working on an array through ROI indices (rather than chunks) produces more complex dask graphs and has a (small?) time overhead. This is not surprising, given the increased flexibility we aim for.
With per-ROI parallelization, the elements of a dask array (identified by their indices) are populated with values obtained from a delayed function (e.g. illumination correction, or labeling). The parallelization based on this loop of assignments through indices is more complex than the one provided by `map_blocks`, which acts directly on chunks (see the sketch below).

As an example, here are two dask graphs for the OLD (per-chunk) and NEW (per-ROI) illumination-correction tasks (with `overwrite=False`, but that doesn't matter), acting on an artificial array with shape `(2, 2, 4320, 2560)`. The on-disk array has 8 chunks (2 channels, 2 Z planes and 2 images), and indeed we notice 8 branches in the OLD graph. The NEW graph also has 8 branches (one per ROI), but with a more complex structure.
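For illustration, here is a minimal, self-contained sketch of the two schemes (not the actual task code: `correct` is a hypothetical stand-in for the per-block correction function, and the ROIs are chosen to coincide with the 8 chunks):

```python
import numpy as np
import dask.array as da
from dask import delayed


def correct(img: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the per-block illumination correction.
    return img * 1.0


# Artificial array as in the example above: 8 on-disk chunks
# (2 channels x 2 Z planes x 2 images stacked along Y).
data = da.zeros((2, 2, 4320, 2560), chunks=(1, 1, 2160, 2560), dtype=np.float32)

# OLD (per-chunk): map_blocks applies the function to each chunk
# directly, yielding one simple branch per chunk in the task graph.
result_old = data.map_blocks(correct)

# NEW (per-ROI): loop over ROI index ranges and assign delayed results
# into the array (requires a dask version with __setitem__ support).
# Here the 8 ROIs coincide with the chunks, but in general they need not.
result_new = data.copy()
for c in range(2):
    for z in range(2):
        for i in range(2):
            region = (
                slice(c, c + 1),
                slice(z, z + 1),
                slice(2160 * i, 2160 * (i + 1)),
                slice(0, 2560),
            )
            shape = tuple(s.stop - s.start for s in region)
            result_new[region] = da.from_delayed(
                delayed(correct)(data[region]), shape=shape, dtype=data.dtype
            )
```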
Timing these tests shows a small additional overhead in the ROI-based version (about 0.5 s, on a total runtime of about 5 s). This is not something we can fully get rid of, since we switched from the natural (per-chunk) parallelization scheme to an arbitrary one. Still, we should check that it remains under control for an example at scale.
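Assuming the `result_old` / `result_new` arrays from the sketch above, one way to see this overhead is to compare graph sizes and wall-clock times (the ~0.5 s figure will of course vary with hardware and scheduler):

```python
import time

# Graph size: the per-ROI version carries extra slicing/assignment
# tasks on top of the correction tasks themselves.
print("OLD graph tasks:", len(result_old.__dask_graph__()))
print("NEW graph tasks:", len(result_new.__dask_graph__()))

# Rough wall-clock comparison.
for label, arr in (("OLD", result_old), ("NEW", result_new)):
    t0 = time.perf_counter()
    arr.compute()
    print(f"{label}: {time.perf_counter() - t0:.2f} s")

# Graphs like the ones below can be rendered (with graphviz installed) via:
#   result_old.visualize("old.png")
#   result_new.visualize("new.png")
```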
OLD (per-chunk): [dask task-graph image]
NEW (per-ROI): [dask task-graph image]