Speedup _inpaint_biharmonic_single_channel #3489
Conversation
… accordingly) in a parallel manner
Hello @GalAvineri! Thanks for submitting the PR.
@GalAvineri thanks for your PR! At the moment it seems that tests are not passing. Can you execute …
```python
# Create biharmonic coefficients ndarray
neigh_coef = np.zeros(b_hi - b_lo)
neigh_coef[tuple(mask_pt_idx - b_lo)] = 1
neigh_coef = laplace(laplace(neigh_coef))
```
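As a standalone illustration of what this snippet computes: applying `scipy.ndimage.laplace` twice to a unit impulse yields the discrete biharmonic (bilaplacian) stencil. The window size and indices below are illustrative, not taken from the PR:

```python
import numpy as np
from scipy.ndimage import laplace

# Unit impulse at the centre of a 5x5 window
impulse = np.zeros((5, 5))
impulse[2, 2] = 1

# Applying the discrete Laplacian twice gives the 13-point biharmonic stencil
coef = laplace(laplace(impulse))

print(coef[2, 2])  # centre weight of the 2-D biharmonic operator: 20.0
```

The centre weight 20, edge weights -8, and diagonal weights 2 are the standard 2-D bilaplacian coefficients; the real code builds this stencil once per masked point over its local window.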
I think the call to `laplace` is the only one that is parallelizable. The others are all Python and thus will get held up in the GIL.
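To make the distinction concrete, here is a hedged toy sketch (the helper `stencil_for` and the point list are hypothetical, not from the PR): the Python-level bookkeeping per masked point gives identical results whether run serially or across threads, so correctness is not the issue; the GIL only limits how much *speedup* threads can extract from pure-Python steps, while C-level calls such as `laplace` may release it:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def stencil_for(point, shape):
    # Python-level bookkeeping (GIL-bound): local window bounds per masked point
    lo = np.maximum(np.array(point) - 2, 0)
    hi = np.minimum(np.array(point) + 3, shape)
    return lo, hi

points = [(3, 4), (10, 2), (7, 7)]
shape = np.array([16, 16])

serial = [stencil_for(p, shape) for p in points]
with ThreadPoolExecutor(max_workers=4) as ex:
    threaded = list(ex.map(lambda p: stencil_for(p, shape), points))

# Same results either way; the speedup question is separate from correctness.
```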
It seems that each masked point has some variables that are unique to it: `b_lo`, `b_hi`, `neigh_coef`, and the variables `it_inner`, `coefs`, `tmp_pt_idx` and `tmp_pt_is`.
None of these variables are shared between the processes, so computing them for each point can be parallelized, in my opinion, as the GIL will not intervene there. Do you agree with me on this point?
I do agree that the GIL might intervene with the modifications of the `matrix_unknown` and `matrix_known` variables.
Regarding this, I can suggest the following:
each process will produce a list of changes that should be made to these matrices, and the main thread will apply all of them.
This way the computation of which modifications should be made is done in parallel, while the modifications themselves are done serially. What do you say?
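The scheme suggested above can be sketched roughly as follows. Everything here is hypothetical (the `updates_for` worker and the matrix shape are stand-ins, and `ThreadPool` is used instead of a process `Pool` to keep the sketch cheap): workers return lists of `(row, col, value)` edits, and only the main thread mutates the matrix:

```python
import numpy as np
from multiprocessing.pool import ThreadPool

def updates_for(pt):
    # Hypothetical per-point work: compute the matrix edits
    # for one masked point instead of applying them in place
    row = pt
    return [(row, row, 4.0), (row, (row + 1) % 8, -1.0)]

matrix = np.zeros((8, 8))
with ThreadPool(2) as pool:
    all_updates = pool.map(updates_for, range(8))

# Apply the edits serially in the main thread, avoiding concurrent writes
for updates in all_updates:
    for r, c, v in updates:
        matrix[r, c] += v
```

With a real process `Pool`, the returned edit lists would be pickled back to the parent, which adds its own overhead; that cost would need benchmarking too.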
@GalAvineri, @emmanuelle how much overhead does …
@emmanuelle I will happily do so once I find some spare time! :)
Hi Gal, please disregard my comment about the GIL. I thought I read "threads", but this is probably starting up different Python interpreters, each having their own GIL. I would start by defining a benchmark for your particular use case. It could be a simple one using `%timeit`, but we can work on integrating it with ASV once it is more fleshed out. You might find that you get very different speedups on different workloads.
The two codepaths would be: one starts the pool, the other just calls the function directly without starting a pool of 1.
I'm still not entirely sure how you would have these two code paths together, as mentioned here:
In case you are asking whether the overhead of creating a pool is worthwhile, I would say it depends on the amount of time that could be parallelized later. Did I answer your question? If not, can you try to elaborate more? :)
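One way the two code paths could coexist is behind a single dispatcher, sketched below. This is only an illustration (the function name and `n_jobs` parameter are hypothetical, and `ThreadPool` stands in for a process `Pool`): the serial branch pays no pool start-up cost at all, while the parallel branch is only taken when explicitly requested:

```python
from multiprocessing.pool import ThreadPool

def process_points(points, worker, n_jobs=1):
    """Hypothetical dispatcher: serial fast path for n_jobs == 1,
    pool start-up only when parallelism is actually requested."""
    if n_jobs == 1:
        return [worker(p) for p in points]  # no pool start-up cost
    with ThreadPool(n_jobs) as pool:
        return pool.map(worker, points)

# Both paths give identical results:
assert process_points(range(5), lambda x: x * x) == \
       process_points(range(5), lambda x: x * x, n_jobs=2)
```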
I agree, although you might be correct about the modifications of the known and unknown matrices.
I'd love to do so once I find time for it :)
@GalAvineri I'm always pretty excited about performance PRs. I'm curious to see what will come from this experiment of yours! Looking forward to seeing your modifications when you have time to make them. You asked for thoughts on your implementation. As it stands, even when …

Here is a simple benchmark showing how costly starting a `Pool` is:

```python
In [1]: from multiprocessing import Pool

In [2]: %timeit -n 10 -r 10 Pool(1)
5.61 ms ± 1.67 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

In [3]: %timeit -n 10 -r 10 pass
78.5 ns ± 41.5 ns per loop (mean ± std. dev. of 10 runs, 10 loops each)
```

This benchmark indicates that starting a `Pool` of 1 worker takes on the order of 5 ms. That isn't really negligible compared to the ~100 ns no-op. This is the kind of preliminary benchmark that shows us the use case you are considering. Maybe more importantly than proving the performance speedup is proving the correctness, as @emmanuelle pointed out.
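The same comparison can be reproduced outside IPython with the stdlib `timeit` module. As a cautious sketch, the version below measures a `ThreadPool` rather than a process `Pool`, so it stays cheap and portable; a real process `Pool` costs considerably more (milliseconds rather than microseconds, as the `%timeit` numbers above show):

```python
import timeit
from multiprocessing.pool import ThreadPool

def start_pool():
    # Create and tear down a 1-worker pool; this is the fixed
    # overhead a parallel code path would pay on every call
    with ThreadPool(1):
        pass

pool_cost = timeit.timeit(start_pool, number=20) / 20
noop_cost = timeit.timeit(lambda: None, number=20) / 20

print(f"pool: {pool_cost:.2e} s/call, noop: {noop_cost:.2e} s/call")
```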
Thank you for the great elaboration!
Hi @GalAvineri,

As already mentioned by the other dev team members, it would be very natural (and also very convenient for the reviewers) to have PRs such as yours supplemented with benchmarking info. A set of numbers for comparison (running time on a small / medium / large image with a small / medium / large inpainting domain) would set a very solid ground for a constructive discussion. Optionally, you may consider adding an ASV (AirSpeed Velocity) file, which implements an interface for automatic benchmarking of the function in our infrastructure. For a multiprocessing implementation, you may consider other PRs, e.g. #3233.

Now for the PR itself. I think (as mentioned in #2008 (comment)) that the suggested change is, indeed, one of the ways of accelerating the function. From a brief profiling, it seems that a great amount of time comes from the …
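For reference, an ASV benchmark file is ordinary Python: ASV discovers classes with `time_*` methods and times them per parameter combination. The sketch below is hypothetical (class name, parameters, and the stand-in workload are all assumptions; a real benchmark would call `skimage.restoration.inpaint_biharmonic` in `time_inpaint`):

```python
import numpy as np

class InpaintBiharmonicSuite:
    """Hypothetical ASV-style benchmark: ASV times each time_* method
    once per combination of the declared parameters."""

    params = [[32, 128], [0.01, 0.25]]       # image size, mask fraction
    param_names = ["size", "mask_fraction"]

    def setup(self, size, mask_fraction):
        rng = np.random.default_rng(0)
        self.image = rng.random((size, size))
        self.mask = rng.random((size, size)) < mask_fraction

    def time_inpaint(self, size, mask_fraction):
        # Stand-in workload; the real benchmark would run
        # skimage.restoration.inpaint_biharmonic(self.image, self.mask)
        np.where(self.mask, 0.0, self.image)
```

Such a file would give exactly the small/medium/large comparison grid suggested above, tracked automatically over time.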
🤣 What kind of science are you doing?
Superseded by #5240. |
Description
This is a speedup for the inpaint method, specifically for the `_inpaint_biharmonic_single_channel` function, in the part where the masked points are iterated over.
I've added a bit of speedup in two steps, each in a separate commit.
Checklist
I'm new to contributing to open source projects, so before I start working on the checklist I would like to hear your thoughts about it :)
By the way, as I'm new to contributing, I'd love to hear your advice on my attempt to contribute and I'd appreciate your help in completing this feature enhancement :)
References
This is an enhancement regarding issue #2008.