[WIP] First version of check_parallel function #3133
Conversation
Hello @emmanuelle! Thanks for updating the PR.
Comment last updated on September 01, 2018 at 14:53 UTC
Codecov Report
@@            Coverage Diff             @@
##           master    #3133      +/-   ##
==========================================
- Coverage   86.81%   86.04%    -0.77%
==========================================
  Files         339      338        -1
  Lines       27385    27419       +34
==========================================
- Hits        23773    23594      -179
- Misses       3612     3825      +213
==========================================
Continue to review full report at Codecov.
Very cool; and I like that you can either pass in test data, or just the shape/dtype if that's what you have. I am not too concerned about execution cost, because this is the kind of calibration you will do once, and then use those parameters to optimize a pipeline; likely not for once-off experiments.
def check_parallel(function, im=None, shape=(1000, 1000),
                   dtype_input=np.uint8, depth_max=10,
                   full_output=False, verbose=True,
                   extra_arguments=(), extra_keywords={}):
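For illustration only, here is a minimal sketch of the idea behind such a check. This is not the PR's implementation: the call to skimage's `apply_parallel` is replaced by a plain call so the snippet is self-contained, and the depth loop only shows where per-depth chunked timings would be collected.

```python
import time
import numpy as np

def check_parallel_sketch(function, image=None, shape=(1000, 1000),
                          dtype_input=np.uint8, depth_max=10):
    """Hypothetical sketch: time plain vs. chunked execution per depth.

    Returns a dict mapping depth -> (chunked_time, plain_time) for the
    depths whose result matches the unchunked reference.
    """
    if image is None:
        # Build a random test image from shape/dtype when none is given.
        rng = np.random.default_rng(0)
        image = rng.integers(0, 255, size=shape).astype(dtype_input)

    t0 = time.perf_counter()
    reference = np.asarray(function(image), dtype=float)
    plain_time = time.perf_counter() - t0

    timings = {}
    for depth in range(depth_max + 1):
        t0 = time.perf_counter()
        # Stand-in for skimage.util.apply_parallel(function, image, depth=depth);
        # a direct call keeps the sketch dependency-free.
        result = np.asarray(function(image), dtype=float)
        chunked_time = time.perf_counter() - t0
        # A depth is "safe" only if the chunked result matches the reference.
        if result.shape == reference.shape and np.allclose(result, reference):
            timings[depth] = (chunked_time, plain_time)
    return timings
```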
We normalised a while back on image= instead of im=. ;)
@emmanuelle this is super great! ping @jakirkham. I agree with @stefanv, the ability to provide an input image or some parameters for a random image is great. And I think keep it optional, and use good documentation to suggest that users use their own images. I think up to ~100MB is a reasonable amount of RAM to use for this. So that's a cap of up to 4K x 4K x double for the image (or 4K x 2K if you want to allow for the output). So yeah, I would increase the default size. Thanks!!!
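The ~100 MB budget above can be sanity-checked with quick arithmetic (the 4K figures come from the comment; the snippet just verifies the sizes):

```python
import numpy as np

# A 4000 x 4000 image of float64 ("double") values:
image_mb = 4000 * 4000 * np.dtype(np.float64).itemsize / 1e6
print(image_mb)  # 128.0 -- in the ballpark of the ~100 MB budget

# Allowing for a same-size output halves one dimension (4K x 2K):
with_output_mb = 2 * 4000 * 2000 * np.dtype(np.float64).itemsize / 1e6
print(with_output_mb)  # 128.0 for input + output combined
```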
Here are the results from benchmarking all functions for which check_parallel returns a finite depth (warning: there are still some bugs, and there are still functions in the list which should not be used with chunking). I ran the computation on my laptop (4 logical cores), on a 4000x4000 uint8 image.
And here is the same table on the BIDS machine which has a lot of cores (32). Thank you @yuvipanda!
The script for running this benchmark is https://github.com/emmanuelle/skimage-sprint/blob/master/benchmark_dask_simplified.py (it uses a pickled list of suitable functions that should be downloaded as well). Unfortunately the code only runs with this branch.
As requested on gitter by @stsievert, here is the benchmarking result using joblib (on my laptop, 4 logical cores). And the script is https://github.com/emmanuelle/skimage-sprint/blob/master/benchmark_joblib.py
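The linked script is the authoritative version; as a rough illustration of the approach, the sketch below times a plain call against applying the function to row-chunks in parallel. It uses the standard library's `ThreadPoolExecutor` in place of joblib's `Parallel`/`delayed` so it stays dependency-free (a real benchmark would typically use processes), and it ignores the overlap (depth) handling that `apply_parallel` performs.

```python
import time
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def benchmark_chunks(function, image, n_chunks=4):
    """Time `function` on the whole image vs. on row-chunks in parallel.

    Hypothetical stand-in for the linked joblib script.
    """
    t0 = time.perf_counter()
    function(image)
    serial = time.perf_counter() - t0

    chunks = np.array_split(image, n_chunks, axis=0)
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(function, chunks))
    parallel = time.perf_counter() - t0

    np.concatenate(results, axis=0)  # reassemble, as apply_parallel would
    return serial, parallel
```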
Thanks for doing this @stsievert! If you have time, can I suggest that we plot both values on the same plot, perhaps ordering the labels on the x axis by one of the y-values? I suspect that this will make it easier to identify changes.
@mrocklin here you go: I chose to order the times by the joblib execution time because (I presume) we're trying to improve dask. Let me know if you want it the other way.
Thanks for the ping, @jni. Definitely interesting. Thanks for the nice profiling data, @emmanuelle. Also that graph is great, @stsievert. This may have already been answered, but I was curious whether this was using the Kubernetes-backed Distributed Cluster or a different backend? If it was one of the Distributed Cluster backends, there is a bit of scheduling overhead and transmission overhead, which is probably not needed for this. As a general note, we may consider picking a specific scheduler to use in …
@stsievert I generally agree that log scale is the right way to show ratios, but not in this specific case, because it is not symmetric: speedup is bounded by the number of processors (4), while slowdown can be infinite, and we are much more interested in the now-squashed 1-4 range than in the 0-1 range.
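The asymmetry can be made concrete in a couple of lines (the 4-core bound comes from the benchmark setup above):

```python
import math

# Timing ratios (serial time / parallel time) on a 4-core machine:
# values above 1 are speedups (bounded by 4), below 1 are slowdowns
# (unbounded toward 0).
speedup_band = 4.0 - 1.0            # linear width of the attainable 1-4 range
slowdown_band = 1.0 - 0.0           # linear width of the whole 0-1 range
print(speedup_band, slowdown_band)  # 3.0 1.0 -- speedups get 3x the space

# On a log axis the same 1-4 band shrinks to log(4) ~ 1.39 units, while
# the 0-1 side stretches without bound, squashing the region of interest.
log_speedup_band = math.log(4.0) - math.log(1.0)
print(round(log_speedup_band, 2))   # 1.39
```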
@jakirkham this uses stock standard …
@jni good point. Physical constraints matter, especially if we care about values close to them. Either way, I'm glad to see both ends of the spectrum (when speedup is close to the physical limit with the linear scale, and when slowdown is large with the log scale).
I like this. 😆
So these end up being different things actually. What …
😠
As requested in #3128, this is a function that checks whether or not it is safe to use chunking with apply_parallel.
Several caveats remain, on which I'd like your thoughts:
- the function provides timing about potential acceleration using chunking, but for moderately sized arrays (say 1000x1000) the overhead of dask is too large to observe an acceleration. However, choosing a large value for the default size can be problematic.
- At the moment the function tries to make things easy for the user by requiring only the function name as an argument. However, the default dtype is not suited to all functions. Should we force the user to provide a test image?