ENH: improve discrepancy performance #13576
Conversation
Thank you for this PR! Some quick notes: do not include the C files, as these are Cython-generated and machine dependent. To answer your question, the path of the Cython file is correct. I expected a small speed-up for C2, as this was already vectorized in a proper way. Good that you compared to OT, as it's C++ on their side 👍. I hope for a better speed-up for the rest, as L2-star is particularly slow right now.
Force-pushed from 441d711 to 63a7e57 (Compare)
Thank you! Indeed, I was wondering what to do with the C file. I removed it and force-pushed my change, since there is no review yet on my code.
I am going to test L2-star right away to give more insight on the performance.
To reproduce:

```python
import timeit

import numpy as np

from discrepancy_scipy import discrepancy_scipy
from discrepancy import discrepancy

np.random.seed(0)

TEST_VALUES = [
    (100, 2),
    (100, 10),
    (1000, 10),
    (1000, 100),
]
COL = [
    "old scipy time", "cython time", "scipy / cython",
]

print("Beginning")
print("||" + "|".join(f"`{colname}`" for colname in COL) + "|")
print("|" + "---|" * (len(COL) + 1))

for n, d in TEST_VALUES:
    sample = np.random.random_sample((n, d))
    scipy_time = np.array(timeit.repeat(
        "discrepancy_scipy(sample, method='L2-star')",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy_scipy, sample",
    ))
    cython_time = np.array(timeit.repeat(
        "discrepancy(sample, method='L2-star')",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy, sample",
    ))
    print(f"|time for `n={n}` & `d={d}`", end="|")
    print(f"{scipy_time.mean():.4f}s", end="|")
    print(f"{cython_time.mean():.4f}s", end="|")
    print(f"{scipy_time.mean() / cython_time.mean():.4f}", end="|")
    print()
```

I removed the OpenTURNS comparison (and the leftover `ot.Sample` line), since I don't know which functions to use for comparing.
scipy/stats/_discrepancy.pyx
Outdated
```cython
for i in range(n):
    for j in range(n):
        for k in range(d):
            prod *= (
                1 + 0.5 * fabs(sample_view[i, k] - 0.5)
                + 0.5 * fabs(sample_view[j, k] - 0.5)
                - 0.5 * fabs(sample_view[i, k] - sample_view[j, k])
            )
        disc2 += prod
        prod = 1
```
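The quoted kernel can be cross-checked against a vectorized NumPy formulation of the same double sum. This is a sketch for illustration, not SciPy's actual code; the function names are made up. The broadcast version trades memory (an `(n, n, d)` intermediate) for speed:

```python
import numpy as np

def disc2_loop(sample):
    # Naive triple loop mirroring the quoted Cython kernel.
    n, d = sample.shape
    disc2 = 0.0
    for i in range(n):
        for j in range(n):
            prod = 1.0
            for k in range(d):
                prod *= (1 + 0.5 * abs(sample[i, k] - 0.5)
                           + 0.5 * abs(sample[j, k] - 0.5)
                           - 0.5 * abs(sample[i, k] - sample[j, k]))
            disc2 += prod
    return disc2

def disc2_vectorized(sample):
    # Same double sum via broadcasting; builds an (n, n, d) intermediate.
    s = sample[:, None, :]   # shape (n, 1, d)
    t = sample[None, :, :]   # shape (1, n, d)
    terms = (1 + 0.5 * np.abs(s - 0.5) + 0.5 * np.abs(t - 0.5)
               - 0.5 * np.abs(s - t))
    return terms.prod(axis=-1).sum()
```

Both should agree to floating-point precision on any sample in the unit hypercube.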
As this loop is quite long, did you try `prange` from `cython.parallel`? In that case, do not forget the OpenMP flag at compilation.
Note: OpenMP is not used in SciPy, see this thread. However, testing it can still give useful information on whether parallelism improves performance.
Thank you @jb-leger. I have some issues with `-fopenmp`, but Cython seems to compile nonetheless with `prange`. The results are quite unbelievable for the centered discrepancy (I only tested this one, since OpenMP is not used in SciPy anyway). Here are the results:
| | `old scipy time` | `cython time` | `scipy / cython` |
|---|---|---|---|
| time for `n=100` & `d=2` | 0.0158s | 0.0022s | 7.1422 |
| time for `n=100` & `d=10` | 0.0589s | 0.0018s | 33.4766 |
| time for `n=1000` & `d=10` | 3.8109s | 0.0065s | 584.6393 |
EDIT: But the tests are not passing and the function is returning NaN values.
OK, this can be parallelized. With a 10000×100 sample (the same in the two tests), without parallelism:

```
In [4]: %time stats.qmc.discrepancy(sample)
CPU times: user 18.6 s, sys: 0 ns, total: 18.6 s
Wall time: 18.6 s
Out[4]: 489346.18589523673
```

With parallelism:

```
In [13]: %time _discrepancy.discrepancy(sample)
CPU times: user 33.1 s, sys: 44.7 ms, total: 33.2 s
Wall time: 4.25 s
Out[13]: 489346.1858951062
```
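Quick arithmetic on the wall times quoted above: the parallel run implies roughly a 4.4× speedup on that machine (at the cost of nearly doubled total CPU time):

```python
# Wall times from the two %time runs above
wall_serial = 18.6    # seconds, without parallelism
wall_parallel = 4.25  # seconds, with parallelism
speedup = wall_serial / wall_parallel
print(round(speedup, 2))  # → 4.38
```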
I will have a look at how to do that directly with `pthread`. @V0lantis, I will send you the results (and code) of my tests.
Just for your information: when you use `prange`, have a look at the generated C code. You will see a `#pragma omp parallel`; you must check that a reduction is recognized for the sum operator. You must also check the initialization of private (thread-dependent) variables; in your particular case, `prod` must be initialized beforehand.
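A minimal sketch of what that could look like for the kernel quoted earlier (this is an illustration of the pattern being discussed, not the actual SciPy code; it must be compiled with the OpenMP flag, e.g. `-fopenmp`). The `disc2 += prod` in-place update lets Cython emit an OpenMP sum reduction, while assigning `prod` inside the loop body keeps it thread-private:

```cython
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange
from libc.math cimport fabs

def centered_disc_term(double[:, ::1] sample_view):
    cdef:
        Py_ssize_t n = sample_view.shape[0]
        Py_ssize_t d = sample_view.shape[1]
        Py_ssize_t i, j, k
        double prod, disc2 = 0.0

    for i in prange(n, nogil=True):   # disc2 becomes an OpenMP sum reduction
        for j in range(n):
            prod = 1.0                # assigned in the loop: thread-private
            for k in range(d):
                prod = prod * (
                    1 + 0.5 * fabs(sample_view[i, k] - 0.5)
                    + 0.5 * fabs(sample_view[j, k] - 0.5)
                    - 0.5 * fabs(sample_view[i, k] - sample_view[j, k])
                )
            disc2 += prod
    return disc2
```

Note that `prod` uses plain assignment (`prod = prod * ...`) rather than `prod *= ...`; inside a `prange` block, an in-place operator would make Cython treat `prod` as a reduction variable, which is not what is wanted here.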
isn't it already the case? (line 158 or line 149)
scipy/stats/_discrepancy.pyx
Outdated
```python
raise ValueError('{} is not a valid method. Options are '
                 'CD, WD, MD, L2-star.'.format(method))
```
Maybe you should use `{!r}` to display `method` as it is (with quotes when it is a string).
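For illustration, the `!r` conversion runs the value through `repr()`, so a string argument keeps its quotes in the error message (the method name here is made up):

```python
method = "CD2"  # hypothetical invalid method name
plain = '{} is not a valid method.'.format(method)
with_repr = '{!r} is not a valid method.'.format(method)
print(plain)      # → CD2 is not a valid method.
print(with_repr)  # → 'CD2' is not a valid method.
```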
Thanks, added it in b1474c3.
For context, I work with @V0lantis (some kind of light supervision). For testing, I implemented a computation similar to the discrepancy (it is not the discrepancy itself), and I tried to parallelize it.
Results:
Then (questions are for SciPy contributors and maintainers):
P.S.: It was the first time I mixed Cython and pthreads; it was fun.
scipy/stats/_discrepancy.pyx
Outdated
```cython
)


def update_discrepancy(x_new, sample, double initial_disc):
```
Same here, I would put this back in `_qmc`.
Since I am calling `cdef` functions inside the `update_discrepancy` and `discrepancy` functions, I cannot put them in `_qmc`. Take a look at this link. Apparently, `cdef` is much faster than `cpdef`. That's why I am leaving this here. Maybe some review from a more experienced Cython developer would be useful here.
Sure, `cdef` is faster than `cpdef`: with `cpdef` you basically have two versions, a Python wrapper and the C one, so there is some overhead to handle that logic. But as the function is complex, what really matters in the end is not the way we call it, as most of the CPU time is spent inside the function itself.
You can leave this as is for now, and we can see in the end what others think.
A quick note about comparing results with OpenTURNS. You can do it, just be aware that there is a power-of-2 difference between our results: for the centered discrepancy, we return c2 and they return c.
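Concretely, under that convention a comparison would need a square root on SciPy's value. The numbers below are made up purely for illustration:

```python
import math

scipy_value = 0.0625    # hypothetical c2, as scipy would return it
openturns_value = 0.25  # the matching c that OpenTURNS would report
assert math.isclose(math.sqrt(scipy_value), openturns_value)
```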
To complete my previous post, I tried the 4th solution. The results summarized:
Therefore, C++ threads and pthreads are equivalent, but code with C++ threads is
FYI, since #8306 there is an experiment in SciPy with Pythran.
scipy/stats/_discrepancy.pyx
Outdated
```python
if not (np.all(x_new >= 0) and np.all(x_new <= 1)):
    raise ValueError('x_new is not in unit hypercube')

if x_new.shape[0] != sample.shape[1]:
```
Good addition 👍
Yes, compiling Cython in C++ mode should be fine I believe, although we have no examples of that in SciPy yet AFAIK. We do have C++ with Cython bindings - see for example. Do consider if you can fit in the
That'd be nice to see as well indeed, although there may be an issue with parallelism. For now we use Pythran as an optional dependency (enabled via
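For reference, a Pythran kernel is plain Python/NumPy plus an export comment, so it runs unchanged under CPython and only gets compiled natively when Pythran is available. This is an illustrative sketch with an assumed function name, not the code under discussion:

```python
# pythran export centered_disc_term(float64[:, :])
import numpy as np

def centered_disc_term(sample):
    # Same kind of double-sum kernel as the discrepancy loops above;
    # Pythran compiles these explicit loops to native code.
    n, d = sample.shape
    disc2 = 0.0
    for i in range(n):
        for j in range(n):
            prod = 1.0
            for k in range(d):
                prod *= (1 + 0.5 * abs(sample[i, k] - 0.5)
                           + 0.5 * abs(sample[j, k] - 0.5)
                           - 0.5 * abs(sample[i, k] - sample[j, k]))
            disc2 += prod
    return disc2
```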
Following a test with Pythran, here are the results, thanks to @jb-leger:
For the memory, SciPy's version consumes 300MiB, Pythran's version 700KiB, and the Cython version is the best one: 30 kiB.
Could you share the code @V0lantis? I'm wondering what the difference is between plain Cython or Pythran, and the
There's probably a typo in there (MiB vs. KiB). Pythran also shouldn't use 25x more memory than Cython, I'd think.
For the memory, no typo.
For the code, this is only an adaptation of the SciPy code (except the initialization of). It is perfectly understandable for the memory:
Edit: Memory usages are given for
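Rough arithmetic shows why a vectorized version can dominate memory use, assuming it materializes one float64 intermediate of shape `(n, n, d)` (the exact shapes and sample size in the code under discussion may differ):

```python
# Hypothetical sample size for illustration
n, d = 1000, 100
elems = n * n * d            # one (n, n, d) float64 broadcast intermediate
mib = elems * 8 / 2**20      # bytes -> MiB
print(f"{mib:.0f} MiB")      # → 763 MiB
```

A streaming loop (Cython or compiled Pythran loops) touches only a handful of scalars at a time, which is why its footprint stays in the KiB range.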
I can confirm Pythran's speed is not satisfying on that one. I'll investigate that performance bug, sounds interesting!
This has a decent impact on performance of scipy/scipy#13576
@jb-leger can you check the pythran version performance from that pythran branch? serge-sans-paille/pythran#1727
@serge-sans-paille, quite the same results:

@jb-leger: how strange, I get a significant speedup between the two revisions; here is my benchmark:
We (@serge-sans-paille and I) were not benchmarking the same method. I rewrote the benchmark to consider all the methods. Note: times are now given for one function call. Here is the script, and the results follow. There is a significant improvement for `method='CD'`.

`method='WD'`

`method='MD'`

`method='L2-star'`
I am not sure why it is not complaining for
@tupui, for
This is not (I think) a very good thing. (But it is private code, therefore we can also do that.)
Indeed, thanks, I forgot we did that! I agree that it should not be untyped and skipped. Actually, I just noticed this was done in gh-13613, where all the mypy issues got fixed. So it was not intended as definitive, but more as a start on the problem.
Thank you @jb-leger! I merged your branch into this one, but it still doesn't resolve the mypy error. I pushed it anyway to gather more help and see if others can spot where it comes from. I tried to debug it, but it didn't work. I have never used mypy before, so maybe someone more experienced with it could help me here :-)
I had a look at #11739 (which adds library stubs for a pyx module). And it seems we need to declare the pyi files in the
@V0lantis, I submitted a new PR to your branch for that.
add *.pyi stubs in scipy.stats
Thank you @jb-leger, it works!

```
python -u runtests.py --mypy
Building, see build.log...
Build OK (0:00:11.045731 elapsed)
Success: no issues found in 678 source files
```
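For reference, a `.pyi` stub for a compiled module contains only signatures with `...` bodies. A minimal hypothetical sketch (names and signatures assumed for illustration, not the actual SciPy stub file):

```python
# _qmc_cy.pyi -- hypothetical stub sketch for a compiled .pyx module
import numpy as np

def discrepancy(sample: np.ndarray, iterative: bool = ...,
                method: str = ...) -> float: ...
def update_discrepancy(x_new: np.ndarray, sample: np.ndarray,
                       initial_disc: float) -> float: ...
```

mypy reads the stub instead of trying to analyze the compiled extension, which is why declaring the `.pyi` files resolves the error.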
Really close to being ready IMO. I just have minor PEP 8 suggestions and two missing tests for input validation. Also, there is one comment missing about the thread logic in `_qmc_cy.pyx`.
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
Should be ready once the PEP 8 suggestions are applied and the comment about threads is added.
Note: please use the batch mode if you accept multiple suggestions, to have a single commit for this.
Mostly PEP8 modifications Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
LGTM, thanks @V0lantis @jb-leger. @rgommers, as you followed this, do you have any more comments? If we merge #13631 first, it might be possible to have inline type hints here as well.
I propose to merge this on Friday if there are no further discussions and we keep the type hints separate.
@V0lantis are you planning to have a look at this or shall I?
Thank you for asking @tupui, I'll try to have a look tonight :-)
- `update_discrepancy` though
- `.. versionadded:: X.Y.Z`. Not sure what it means
- `setup.py`? I think so since I am able to run
- the tests are passing
Reference issue
See gh-13474.
The new code is written in Cython in a new file `_discrepancy.pyx`. Since I don't know so much about how it works with SciPy, should I send a mail to the dev mailing list to ask where to put my code?
What does this implement/fix?
This enhancement improves the performance of `scipy.stats.qmc.discrepancy` and `scipy.stats.qmc.update_discrepancy`. See additional information for performance details.
In short, the Cython code achieves, on average, a 4× performance improvement over the SciPy code. It also has performance similar to the openturns library.
Additional information
To test the performance, I ran the following script:
The results are as follows:

| | `old scipy time` | `cython time` | `openturns time` | `scipy / cython` | `openturns / cython` |
|---|---|---|---|---|---|
| `n=100` & `d=2` | | | | | |
| `n=100` & `d=10` | | | | | |
| `n=1000` & `d=10` | | | | | |
| `n=1000` & `d=100` | | | | | |
Also, I have written some benchmarks.
Unfortunately, due to some building issues, I haven't been able to compare with the previous version for this particular benchmark. I will open an issue following this PR to get some more information.