[WIP] First version of check_parallel function #3133

Open
wants to merge 6 commits into
base: main

Conversation

emmanuelle
Member

As requested in #3128, this is a function which checks whether it is safe or not to use chunking with apply_parallel.

Several caveats remain, on which I'd like your thoughts:

  • The function reports timings to estimate the potential speedup from chunking, but for moderately sized arrays (say 1000x1000) the overhead of dask is too large for any speedup to be observed. However, choosing a large default size can be problematic.

  • At the moment the function tries to make things easy for the user by having the function name as the only required argument. However, the default dtype is not suitable for all functions. Should we force the user to provide a test image?
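For readers who want the idea in miniature: the sketch below is not the code in this PR, only a pure-Python illustration on a 1-D signal (no dask, no images). It looks for the smallest overlap depth at which a chunked computation reproduces the whole-array result. The helpers `moving_average`, `running_total`, `apply_chunked`, and `minimal_safe_depth` are hypothetical names introduced for illustration.

```python
def moving_average(signal, radius=2):
    """Local filter: mean over +/- radius samples, truncated at the edges."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def running_total(signal):
    """Global operation: each output depends on every earlier sample."""
    out, total = [], 0.0
    for value in signal:
        total += value
        out.append(total)
    return out


def apply_chunked(function, signal, chunk_size, depth):
    """Apply function chunk by chunk, padding each chunk with `depth` neighbours."""
    out = []
    for start in range(0, len(signal), chunk_size):
        stop = min(start + chunk_size, len(signal))
        lo = max(0, start - depth)
        hi = min(len(signal), stop + depth)
        padded = function(signal[lo:hi])
        # Drop the overlap so that only the chunk's own samples are kept.
        out.extend(padded[start - lo: start - lo + (stop - start)])
    return out


def minimal_safe_depth(function, signal, chunk_size, depth_max=10):
    """Smallest depth for which chunked and whole results agree, else None."""
    reference = function(signal)
    for depth in range(depth_max + 1):
        chunked = apply_chunked(function, signal, chunk_size, depth)
        if all(abs(a - b) < 1e-12 for a, b in zip(reference, chunked)):
            return depth
    return None


# Arbitrary deterministic test signal (an assumption, standing in for an image).
signal = [float((7 * i) % 11) for i in range(40)]
print(minimal_safe_depth(moving_average, signal, chunk_size=8))
print(minimal_safe_depth(running_total, signal, chunk_size=8))
```

A local filter yields a finite depth equal to its footprint radius, while an operation with global dependencies yields `None`, which is the "not safe for chunking" outcome.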

@pep8speaks

pep8speaks commented May 31, 2018

Hello @emmanuelle! Thanks for updating the PR.

Line 152:25: E127 continuation line over-indented for visual indent
Line 153:25: E127 continuation line over-indented for visual indent
Line 153:51: E251 unexpected spaces around keyword / parameter equals
Line 153:53: E251 unexpected spaces around keyword / parameter equals
Line 154:25: E127 continuation line over-indented for visual indent
Line 193:37: E261 at least two spaces before inline comment
Line 200:50: E261 at least two spaces before inline comment
Line 200:67: W291 trailing whitespace
Line 213:46: E261 at least two spaces before inline comment
Line 217:41: E127 continuation line over-indented for visual indent
Line 218:41: E127 continuation line over-indented for visual indent
Line 219:41: E127 continuation line over-indented for visual indent
Line 227:5: E303 too many blank lines (2)
Line 229:1: E122 continuation line missing indentation or outdented
Line 242:13: E128 continuation line under-indented for visual indent
Line 242:14: E225 missing whitespace around operator
Line 243:33: E127 continuation line over-indented for visual indent
Line 244:10: E124 closing bracket does not match visual indentation

Line 8:41: E126 continuation line over-indented for hanging indent
Line 115:38: E127 continuation line over-indented for visual indent
Line 115:61: E231 missing whitespace after ':'
Line 118:38: E127 continuation line over-indented for visual indent
Line 118:76: E231 missing whitespace after ':'
Line 122:38: E127 continuation line over-indented for visual indent
Line 122:76: E231 missing whitespace after ':'

Comment last updated on September 01, 2018 at 14:53 Hours UTC

@codecov-io

codecov-io commented May 31, 2018

Codecov Report

Merging #3133 into master will decrease coverage by 0.76%.
The diff coverage is 85.71%.


@@            Coverage Diff             @@
##           master    #3133      +/-   ##
==========================================
- Coverage   86.81%   86.04%   -0.77%     
==========================================
  Files         339      338       -1     
  Lines       27385    27419      +34     
==========================================
- Hits        23773    23594     -179     
- Misses       3612     3825     +213
Impacted Files Coverage Δ
skimage/util/tests/test_apply_parallel.py 100% <100%> (ø) ⬆️
skimage/util/__init__.py 100% <100%> (ø) ⬆️
skimage/util/apply_parallel.py 83.33% <80.48%> (-3.21%) ⬇️
skimage/io/_plugins/gdal_plugin.py 0% <0%> (-62.5%) ⬇️
skimage/exposure/setup.py 0% <0%> (-40%) ⬇️
skimage/external/setup.py 0% <0%> (-40%) ⬇️
skimage/draw/setup.py 0% <0%> (-38.47%) ⬇️
skimage/future/graph/setup.py 0% <0%> (-38.47%) ⬇️
skimage/io/setup.py 0% <0%> (-33.34%) ⬇️
skimage/segmentation/setup.py 0% <0%> (-31.25%) ⬇️
... and 78 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f03a2c...053e211. Read the comment docs.

@stefanv
Member

stefanv commented May 31, 2018

Very cool; and I like that you can either pass in test data, or just the shape/dtype if that's what you have. I am not too concerned about execution cost, because this is the kind of calibration you will do once, and then use those parameters to optimize a pipeline—likely not for once-off experiments.

def check_parallel(function, im=None, shape=(1000, 1000),
dtype_input=np.uint8, depth_max=10,
full_output=False, verbose = True,
extra_arguments=(), extra_keywords={}):
Member

We normalised a while back on image= instead of im=. ;)

@jni
Member

jni commented Jun 1, 2018

@emmanuelle this is super great! ping @jakirkham

I agree with @stefanv, the ability to provide an input image or some parameters for a random image is great. I think we should keep the image optional, and use good documentation to encourage users to pass their own images.

I think up to ~100MB is a reasonable amount of RAM to use for this. So that's a cap of up to 4K x 4K x double for the image. (or 4K x 2K if you want to allow for the output.) So yeah, I would increase the default size.
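To make the arithmetic above concrete, here is a small sketch. It assumes "4K" means 4000 pixels and a decimal megabyte (10**6 bytes); `image_nbytes` and `largest_square_side` are hypothetical helpers introduced for illustration.

```python
import math

MB = 10**6          # decimal megabyte (assumption)
DOUBLE = 8          # bytes per float64 value


def image_nbytes(shape, itemsize):
    """Memory footprint in bytes of a dense array with this shape and item size."""
    nbytes = itemsize
    for dim in shape:
        nbytes *= dim
    return nbytes


def largest_square_side(budget_bytes, itemsize):
    """Side length of the largest square image that fits within the budget."""
    return math.isqrt(budget_bytes // itemsize)


for shape in [(4000, 4000), (4000, 2000)]:
    single = image_nbytes(shape, DOUBLE)
    # Input plus an output of the same shape and dtype doubles the footprint.
    print(shape, single / MB, "MB alone,", 2 * single / MB, "MB with output")

print("largest square under 100 MB:", largest_square_side(100 * MB, DOUBLE))
```

Under these assumptions a 4000 x 4000 double image takes 128 MB, and a 4000 x 2000 one takes 64 MB, or 128 MB once an equally sized output is counted, so both are in the neighbourhood of the suggested cap rather than strictly under it.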

Thanks!!!

@emmanuelle
Member Author

Here are the results from benchmarking all functions for which check_parallel returns a finite depth (warning: there are still some bugs, and the list still contains functions which should not be used with chunking). I ran the computation on my laptop (4 logical cores), on a 4000x4000 uint8 image.

function | execution time | apply_parallel time | ratio (execution time / apply_parallel time)
rgb2grey 6.795999070163816e-06 0.02313908899668604 0.00029370210171788234
img_as_ubyte 1.4875004126224667e-05 0.02277622200199403 0.0006530935694656637
img_as_bool 0.001723583998682443 0.025542005001625512 0.06748037198226031
invert 0.0017323210049653426 0.02439795700047398 0.07100270751898156
rgb2gray 0.0031297489986172877 0.02285741499508731 0.13692488845698245
img_as_uint 0.003218386002117768 0.030702772994118277 0.10482395198421697
img_as_float32 0.010558027999650221 0.0460202069953084 0.22942156693746674
lab2lch 0.03773539699614048 0.07973315999697661 0.4732710580838808
img_as_float 0.037941883005260024 0.0738341539981775 0.513879836778472
img_as_float64 0.03939618499862263 0.07474972599447938 0.5270411961313708
lch2lab 0.04058516000077361 0.07914782800071407 0.5127766740536134
img_as_int 0.04102924699691357 0.04055746000085492 1.0116325577599956
adjust_gamma 0.08516024999698857 0.0649725590046728 1.3107110340361363
binary_erosion 0.10045521600113716 0.07805385100073181 1.286998843916046
remove_small_objects 0.10158984699955909 0.06649240999831818 1.5278412528907983
rescale_intensity 0.12709428600646788 0.07174670899985358 1.7714301851354222
dilation 0.1375568099974771 0.09433680899383035 1.4581456746801067
binary_dilation 0.13767812900186982 0.07602557800419163 1.8109448506169727
erosion 0.15314415600005304 0.09429431799799204 1.624107997719589
roberts_neg_diag 0.16481931599992095 0.1254620129984687 1.3136989600344815
laplace 0.1762534700028482 0.13204679999762448 1.33478032035626
roberts_pos_diag 0.17926109700056259 0.11881794699729653 1.5087038745471786
scharr_h 0.18170992199884495 0.1356080289988313 1.3399643320559653
sobel_h 0.18235799099784344 0.13196586500271223 1.3818572779718112
prewitt_h 0.18253791199822444 0.13301128299644915 1.3723490811159038
sobel_v 0.18282264300069073 0.1400067049980862 1.305813482312792
scharr_v 0.1830498479976086 0.13262762999511324 1.3801788360717386
prewitt_v 0.18566674700559815 0.1374973019992467 1.3503301105254804
remove_small_holes 0.19366607499978272 0.10923638999520335 1.7729080483919941
adjust_log 0.22139144600077998 0.10653375699621392 2.0781342200167403
binary_opening 0.23167673999705585 0.11655931299901567 1.987629594205075
clear_border 0.23671268400357803 0.16786381600104505 1.410147163591851
binary_closing 0.2503297160001239 0.14615476899780333 1.7127714525954763
opening 0.28756442300073104 0.16014558199822204 1.7956438099174263
black_tophat 0.2906419960054336 0.16512953200435732 1.7600849010930666
closing 0.29194321599788964 0.1672891980051645 1.7451408667096178
white_tophat 0.29321027800324373 0.16067294599633897 1.8248889144655673
find_boundaries 0.3017403129997547 0.15694394699676195 1.9225992386057433
median 0.3587525800030562 0.20159026100009214 1.7796126569968191
gaussian 0.37582986199413426 0.218739497999195 1.7181618566003902
corner_fast 0.3770911920000799 0.25288339099643053 1.491166305996753
roberts 0.4041518250014633 0.21760716299468186 1.8572542348311414
adjust_sigmoid 0.4110622699954547 0.18257709799945587 2.2514448663034385
scharr 0.44275888199626934 0.2309465989965247 1.9171483101291829
prewitt 0.4431294270034414 0.24439613699360052 1.813160520679771
sobel 0.4446095419989433 0.24177089899603743 1.8389704627198757
unsharp_mask 0.4799612399947364 0.29921119599748636 1.604088504758921
corner_moravec 0.6229208379954798 0.3554431709999335 1.7525187957418835
hessian_matrix_det 0.702788894996047 0.3339849679978215 2.1042530722539317
corner_kitchen_rosenfeld 1.20726042699971 0.4280286119974335 2.820513379621802
threshold_niblack 1.448343397998542 0.6386662589939078 2.2677625091391556
threshold_sauvola 1.476177006996295 0.6246605380001711 2.3631667396858846
convex_hull_image 1.4820463560026838 1.0993277939996915 1.3481387117581598
shape_index 1.5843946199966013 0.744873938994715 2.1270641071627594
corner_shi_tomasi 1.9268143989975215 0.8156647319992771 2.3622627329671393
corner_harris 1.9363305189981475 0.7972256700013531 2.4288361399537735
canny 2.137716088000161 0.9864898750020075 2.1669924265525897
local_maxima 2.7152849010017235 1.2502162119999412 2.171852256385435
local_minima 3.0668977460009046 1.4749290379986633 2.0793527464632398
denoise_nl_means 59.73091666000255 52.88144808100333 1.1295249813981505
convex_hull_object 62.728342925001925 16.531447746994672 3.794485751340538
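The ratio column is consistent with the plain execution time divided by the apply_parallel time, so values above 1 mean chunking helped. The sketch below shows one way such a ratio could be measured. It is not the benchmark script used here; `best_time`, `speedup_ratio`, `sum_squares`, and `sum_squares_chunked` are hypothetical stand-ins, and a thread pool replaces dask.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def best_time(function, *args, repeats=3):
    """Best wall-clock time over a few repeats, to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        function(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup_ratio(serial, parallel, *args, repeats=3):
    """Return (serial time, parallel time, serial / parallel)."""
    t_serial = best_time(serial, *args, repeats=repeats)
    t_parallel = best_time(parallel, *args, repeats=repeats)
    return t_serial, t_parallel, t_serial / t_parallel


def sum_squares(values):
    return sum(v * v for v in values)


def sum_squares_chunked(values, n_chunks=4):
    """Same result, computed on chunks in a thread pool (assumes non-empty input)."""
    size = -(-len(values) // n_chunks)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(sum_squares, chunks))


data = list(range(10000))
t_serial, t_parallel, ratio = speedup_ratio(sum_squares, sum_squares_chunked, data)
print(f"serial {t_serial:.6f}  chunked {t_parallel:.6f}  ratio {ratio:.3f}")
```

Because this stand-in is pure Python and bound by the GIL, its ratio will likely be below 1, much like the cheapest functions at the top of the table, where the chunking overhead dominates.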

@emmanuelle
Member Author

And here is the same table on the BIDS machine which has a lot of cores (32). Thank you @yuvipanda!

function | execution time | apply_parallel time | ratio (execution time / apply_parallel time)
rgb2grey 1.7673010006546974e-05 0.10679058998357505 0.00016549220309828024
img_as_ubyte 3.270001616328955e-05 0.11863284301944077 0.0002756404999746224
img_as_bool 0.004286767973098904 0.1267053359770216 0.033832576505509776
invert 0.00430565100396052 0.12000373599585146 0.03587930799178925
rgb2gray 0.005187998991459608 0.11336254695197567 0.04576466505871137
img_as_uint 0.007858634984586388 0.20586912700673565 0.03817296502320752
img_as_float32 0.018745667999610305 0.15442456101300195 0.12139045678123697
img_as_float64 0.07576712605077773 0.20858099003089592 0.36325039036181955
lab2lch 0.07780440000351518 0.25112582702422515 0.30982237440679367
img_as_float 0.07796673703705892 0.21874569699866697 0.35642638052685466
img_as_int 0.07810256304219365 0.13368380401516333 0.5842335473438108
lch2lab 0.0850444280076772 0.26435764401685447 0.32170217102651694
binary_erosion 0.169429212983232 0.17702125798678026 0.9571122412647456
dilation 0.16960711602587253 0.1806675589759834 0.9387801384332581
erosion 0.17058854800416157 0.23412152600940317 0.728632479515404
remove_small_objects 0.18163854599697515 0.14504026103531942 1.2523319021933057
adjust_gamma 0.1911868829629384 0.13413403800223023 1.4253420370432712
binary_dilation 0.20767842198256403 0.13307620404521003 1.5605977302449157
rescale_intensity 0.25519531301688403 0.1569065639632754 1.626415788931644
roberts_pos_diag 0.2806987669900991 0.27032114297617227 1.0383899827430132
roberts_neg_diag 0.28662895900197327 0.2745221650111489 1.0441013350974144
laplace 0.33220082201296464 0.2921829710248858 1.136961612949957
remove_small_holes 0.3370305789867416 0.17217175994301215 1.9575253171501341
scharr_h 0.3381672860123217 0.2701092420029454 1.2519648846692708
white_tophat 0.33917671896051615 0.18838452798081562 1.8004489147593727
prewitt_v 0.3413911889656447 0.27485801302827895 1.242063803068097
sobel_v 0.3433059789822437 0.2772212159470655 1.2383827760419204
prewitt_h 0.3454915700131096 0.28438065299997106 1.214891260599028
opening 0.3472524250391871 0.17820649000350386 1.948595839760715
scharr_v 0.3499056010041386 0.26127566100331023 1.3392200393273732
closing 0.35062394302804023 0.18931903300108388 1.852026906486644
sobel_h 0.3514603739604354 0.26734727702569216 1.314621109556541
black_tophat 0.35881027998402715 0.19527941901469603 1.8374198458518785
find_boundaries 0.35945396998431534 0.1424805789720267 2.5228278308364196
binary_closing 0.37758919200859964 0.1425591010483913 2.6486502035420942
binary_opening 0.39165404299274087 0.15158400795189664 2.5837424955607955
adjust_log 0.4135783999809064 0.143463762011379 2.882807436404065
clear_border 0.4145890310173854 0.40837364399340004 1.0152198534734183
median 0.566598167992197 0.2051065200357698 2.762458101738474
corner_fast 0.686816839966923 0.32152095896890387 2.1361495131437107
adjust_sigmoid 0.7161388889653608 0.17286112596048042 4.142856787413757
roberts 0.7450532860239036 0.32393049803795293 2.30004056591365
gaussian 0.7618921610410325 0.32063930097501725 2.3761658621517383
scharr 0.8656917969929054 0.3178924649837427 2.7232221343691734
sobel 0.870451889000833 0.31081210100091994 2.8005727132170337
prewitt 0.8853721090126783 0.4003270570538007 2.2116219561289645
unsharp_mask 0.9583755559870042 0.33909888903144747 2.826242099242401
corner_moravec 1.0726075359852985 0.3013310689711943 3.5595650314034963
hessian_matrix_det 1.340665255964268 0.30463958700420335 4.400824164542246
corner_kitchen_rosenfeld 2.336976054997649 0.3602462050039321 6.487163563519402
threshold_sauvola 3.137220091011841 0.48640516900923103 6.449808289254226
threshold_niblack 3.176773699000478 0.551758271001745 5.757546131991614
convex_hull_image 3.235574327001814 1.2945832590339705 2.4993172933629837
shape_index 3.2708308119908907 0.6570095879724249 4.9783608517569595
corner_shi_tomasi 3.9265535849845037 0.5227456839638762 7.51140316493908
corner_harris 3.947809635021258 0.4742132850224152 8.32496633837378
canny 4.350087594008073 0.5505950530059636 7.9007022861154494
local_maxima 4.9551595960510895 0.632653490989469 7.8323437183618925
local_minima 5.456921778037213 0.9954854000243358 5.481669322225933
convex_hull_object 85.83817112696124 3.5882046470069326 23.922317585359117
denoise_nl_means 86.74374555196846 73.57202370197047 1.1790316643097207

@emmanuelle
Member Author

The script for running this benchmark is https://github.com/emmanuelle/skimage-sprint/blob/master/benchmark_dask_simplified.py (it uses a pickled list of suitable functions that should be downloaded as well). Unfortunately the code only runs with this branch.

@emmanuelle
Member Author

As requested on gitter by @stsievert, here is the benchmarking result using joblib (on my laptop, 4 logical cores).

And the script is https://github.com/emmanuelle/skimage-sprint/blob/master/benchmark_joblib.py

function | execution time | apply_parallel_joblib time | ratio (execution time / apply_parallel_joblib time)
rgb2grey 2.8601003577932715e-05 0.10664410899335053 0.0002681911251161191
img_as_ubyte 7.368299702648073e-05 0.108649541005434 0.000678171268324047
img_as_bool 0.0019251750054536387 0.10630788900016341 0.018109427471094636
rgb2gray 0.0038023719971533865 0.12389117199927568 0.030691226306064943
invert 0.004735723996418528 0.10783193500537891 0.04391763902022438
img_as_float32 0.010636519997206051 0.1585353539994685 0.06709241647917669
img_as_uint 0.011760600995330606 0.1110726760016405 0.10588203524651649
lab2lch 0.03869172299891943 0.18857225999818183 0.2051824748735179
img_as_int 0.040607943999930285 0.11057052000251133 0.36725832526615576
img_as_float64 0.04104137499962235 0.18343081200146116 0.22374308084780992
lch2lab 0.0425024090000079 0.18337382900062948 0.23178012495917288
img_as_float 0.07575539399840636 0.1854717599999276 0.40844705414148186
remove_small_objects 0.10303790099715116 0.10320194999803789 0.9984104079342507
binary_erosion 0.11981670200475492 0.10554432299977634 1.1352264015659925
adjust_gamma 0.12384047999512404 0.10864690800372045 1.1398435746637474
rescale_intensity 0.1470158470037859 0.12612305199581897 1.1656540551259378
binary_dilation 0.1540738500043517 0.10375860599742737 1.4849259829895158
roberts_pos_diag 0.15496917699783808 0.17878389399993466 0.866796071674637
dilation 0.16781969700241461 0.10365044400532497 1.619092890656532
roberts_neg_diag 0.16798673599987524 0.15470642999571282 1.0858419782844866
erosion 0.17278665799676673 0.10387283199816011 1.6634441814374261
scharr_v 0.18410904899792513 0.16962111899920274 1.0854134796669423
sobel_v 0.18546193800284527 0.16278687799785985 1.1392929226475093
prewitt_h 0.18548044199997094 0.1768914299973403 1.048555274852828
sobel_h 0.18585085999802686 0.16294151300098747 1.140598590102085
prewitt_v 0.18666317899624119 0.15230017399881035 1.225626826911565
scharr_h 0.18714152799657313 0.1718479129995103 1.0889950580726948
laplace 0.1956078540024464 0.15218282499699853 1.2853477651390972
clear_border 0.21665382399805821 0.23025971699826187 0.9409106674082017
adjust_log 0.21742394600005355 0.11339736499940045 1.917363300295409
remove_small_holes 0.2337169870006619 0.10295950799627462 2.269989353573042
binary_closing 0.23902167000051122 0.20475373000226682 1.1673617374289837
binary_opening 0.25504286700015655 0.10237522500392515 2.4912557407358857
black_tophat 0.29312533699703636 0.20291492700198432 1.4445725670747223
white_tophat 0.30555116600589827 0.20274250699731056 1.5070898083052287
closing 0.3138430100007099 0.2054968619995634 1.5272399147456408
find_boundaries 0.32082344100490445 0.2030536870006472 1.5799931818223123
opening 0.3228844190016389 0.20575693499995396 1.569251694975487
corner_fast 0.38077447900286643 0.2508769920023042 1.5177736147257745
roberts 0.389638331005699 0.26081353100016713 1.4939344960804557
gaussian 0.39156161999562755 0.2626253260023077 1.4909514857382296
adjust_sigmoid 0.41952994799794396 0.20869075199880172 2.0102948692252203
sobel 0.44896193299791776 0.30320661299629137 1.4807128662573372
prewitt 0.47154405099718133 0.3077437710016966 1.532261886121431
scharr 0.4736660779963131 0.29567932699865196 1.601958726044018
unsharp_mask 0.47418316500261426 0.2528168700009701 1.8755993814842924
hessian_matrix_det 0.7174537889950443 0.4268443010005285 1.6808325361573844
corner_moravec 0.7200076110020746 0.4544497099996079 1.58435046861994
convex_hull_image 0.865778074003174 1.1355058249973808 0.7624602665558065
corner_kitchen_rosenfeld 1.2061014720020466 0.4913526410018676 2.4546555189828774
threshold_sauvola 1.4609586049991776 0.710760037 2.0554878284457874
threshold_niblack 1.4650582980029867 0.6516576360008912 2.2482024564214314
shape_index 1.5980594939974253 0.7237486700032605 2.2080309922921533
corner_harris 1.9453036669947323 0.7338825779952458 2.650701522726888
corner_shi_tomasi 1.9516044560004957 0.736441533001198 2.6500467023460517
canny 2.166983764000179 0.9125663439990603 2.374604080295131
local_maxima 2.7366018219981925 1.3204747209965717 2.0724378728983615
local_minima 3.0712037720004446 1.5983767919969978 1.9214516798403396
denoise_nl_means 59.657378112002334 53.57151638199866 1.1136025660839548
convex_hull_object 60.07834142999491 16.60287198900187 3.618551144030515

@stsievert
Contributor

Here are some graphs:

[Figures: dask, joblib]

Note I deleted the median function, and the functions are not in the same order for the two plots.

@mrocklin

mrocklin commented Jun 1, 2018

Thanks for doing this @stsievert ! If you have time can I suggest that we plot both values on the same plot, perhaps ordering the labels on the x axis by one of the y-values? I suspect that this will make it easier to identify changes.

@stsievert
Contributor

@mrocklin here you go:

[Figure: joblib]

I chose to order the times by the joblib execution time because (I presume) we're trying to improve dask. Let me know if you want it the other way.
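For anyone wanting to reproduce the ordering without the plotting code, here is a rough sketch. It parses a few rows copied from the laptop dask and joblib tables earlier in this thread, joins them by function name, and orders them by the joblib timing. Reading "joblib execution time" as the `apply_parallel_joblib time` column is an assumption, and `parse_rows` is a hypothetical helper.

```python
# A handful of rows copied from the two laptop tables in this thread.
DASK_ROWS = """\
rgb2gray 0.0031297489986172877 0.02285741499508731 0.13692488845698245
convex_hull_image 1.4820463560026838 1.0993277939996915 1.3481387117581598
gaussian 0.37582986199413426 0.218739497999195 1.7181618566003902
canny 2.137716088000161 0.9864898750020075 2.1669924265525897
"""
JOBLIB_ROWS = """\
rgb2gray 0.0038023719971533865 0.12389117199927568 0.030691226306064943
gaussian 0.39156161999562755 0.2626253260023077 1.4909514857382296
convex_hull_image 0.865778074003174 1.1355058249973808 0.7624602665558065
canny 2.166983764000179 0.9125663439990603 2.374604080295131
"""


def parse_rows(text):
    """Map function name -> (serial time, parallel time, ratio)."""
    rows = {}
    for line in text.splitlines():
        name, serial, parallel, ratio = line.split()
        rows[name] = (float(serial), float(parallel), float(ratio))
    return rows


dask_rows = parse_rows(DASK_ROWS)
joblib_rows = parse_rows(JOBLIB_ROWS)

# Keep functions present in both tables, ordered by the joblib parallel time.
ordered = sorted(set(dask_rows) & set(joblib_rows),
                 key=lambda name: joblib_rows[name][1])

for name in ordered:
    print(f"{name:20s} dask ratio {dask_rows[name][2]:.2f}  "
          f"joblib ratio {joblib_rows[name][2]:.2f}")
```

The `ordered` list can then serve as the shared x-axis labels for both series in a single plot.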

@jakirkham
Contributor

Thanks for the ping, @jni. Definitely interesting. Thanks for the nice profiling data, @emmanuelle. Also that graph is great, @stsievert.

This may have already been answered, but I was curious whether this was using the Kubernetes-backed Distributed Cluster or a different backend? If it was one of the Distributed Cluster backends, there is a bit of scheduling overhead and transmission overhead, which is probably not needed for this apply_parallel use case (i.e. working on NumPy arrays in memory).

As a general note, we may consider picking a specific scheduler to use in apply_parallel based on the data type provided. This is useful in the event the user has a different Distributed scheduler already running (e.g. Cluster based) that would be suboptimal for the given workload. Specifically when a NumPy array is provided, we likely want a single machine, single process scheduler temporarily.

@stsievert
Contributor

Here's a better visualization of the same chart, which shows the scaling at short execution times more clearly:

[Figure: joblib-log]

I think a log y-axis is the right way to visualize this graph of ratios, because logs respect division. They show large and small values equally, because log(x) == -1 * log(1/x).
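A quick numeric check of that identity, as a sketch using only the standard library (`log_distance_from_one` is a hypothetical helper name): a 4x speedup and a 4x slowdown sit at the same distance from 1 on a log axis, but not on a linear one.

```python
import math


def log_distance_from_one(ratio):
    """Distance of a ratio from 1 on a log2 axis."""
    return abs(math.log2(ratio))


for speedup in (2.0, 4.0):
    slowdown = 1.0 / speedup
    print(
        f"ratio {speedup} vs {slowdown}: "
        f"log distance {log_distance_from_one(speedup)} vs "
        f"{log_distance_from_one(slowdown)}, "
        f"linear distance {speedup - 1.0} vs {1.0 - slowdown}"
    )
```

On the linear axis the 4x speedup lies 3 units from 1 while the 4x slowdown lies only 0.75 away, which is why slowdowns look compressed there.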

@jni
Member

jni commented Jun 2, 2018

@stsievert I generally agree that log scale is the right way to show ratios, but not in this specific case, because it is not symmetric: speedup is bounded by the number of processors (4), while slowdown can be infinite, and we are much more interested in the now-squashed 1-4 range than in the 0-1 range.

@jni
Member

jni commented Jun 2, 2018

@jakirkham this uses stock standard apply_parallel (not your New and Improved John's Apply Parallel ™️), which means the threaded backend.

@stsievert
Contributor

@jni good point. Physical constraints matter, especially if we care about values close to them. Either way, I'm glad to see both ends of the spectrum (when speedup is close to physical limit with the linear scale, and when slowdown is large with the log scale).

@jakirkham
Contributor

...this uses stock standard apply_parallel (not your New and Improved John's Apply Parallel ™️)...

I like this. 😆

...which means the threaded backend

So these end up being different things actually. What compute does is context dependent. So if a Distributed Client is already running and we don't specify get, it will use the Distributed backend to submit the computation.

@jni
Member

jni commented Jun 2, 2018

What compute does is context dependent. So if a Distributed Client is already running and we don't specify get, it will use the Distributed backend to submit the computation.

😠

Base automatically changed from master to main February 18, 2021 18:23
10 participants