
ENH: improve discrepancy performance #13576

Merged · 29 commits · merged into scipy:master on Apr 3, 2021

Conversation

@V0lantis (Contributor) commented Feb 17, 2021

  • Did you check that the code can be distributed under a BSD license?
  • Are there unit tests with good code coverage?
  • Do all unit tests pass locally?
  • Do all public functions have docstrings including examples?
  • Does the documentation render correctly? Not sure at first, because I had a build issue. EDIT: yes, it renders correctly.
  • Is the code style correct? I don't know whether there is a style checker for Cython code.
  • Are there benchmarks? There weren't, but I added some for discrepancy. I didn't add any for
    update_discrepancy, though.
  • Is the commit message formatted correctly? I think so.
  • Is the docstring of the new functionality tagged with
    .. versionadded:: X.Y.Z? Not sure what it means.
  • If compiled code is added, is it integrated correctly via setup.py? I think so, since the tests pass when I run them.

Reference issue

See gh-13474.
The new code is written in Cython, in a new file _discrepancy.pyx. Since I don't know the SciPy codebase very well yet, should I send a mail to the dev mailing list to ask where to put my code?

What does this implement/fix?

This enhancement improves the performance of scipy.stats.qmc.discrepancy and scipy.stats.qmc.update_discrepancy.
See the additional information below for measurements.
In short, the Cython code achieves, on average, a 4x performance improvement over the previous SciPy code, and its performance is similar to that of the openturns library.

Additional information

To test the performance, I ran the following script:

import timeit

import numpy as np
import openturns as ot

from discrepancy_scipy import discrepancy_scipy
from discrepancy import discrepancy

np.random.seed(0)
TEST_VALUES = [
    (100, 2),
    (100, 10),
    (1000, 10),
    (1000, 100),
]
COL = [
    "old scipy time", "cython time", "openturns time", "scipy / cython",
    "openturns / cython"
]
print("Beginning")
print("||" + "|".join(f"`{colname}`" for colname in COL) + "|")
print("|" + "---|" * (len(COL) + 1))
for n, d in TEST_VALUES:
    sample = np.random.random_sample((n, d))
    ot_sample = ot.Sample(sample)

    scipy_time = np.array(timeit.repeat(
        "discrepancy_scipy(sample)",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy_scipy, " "sample",
    ))
    cython_time = np.array(timeit.repeat(
        "discrepancy(sample)",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy, " "sample",
    ))
    ot_time = np.array(timeit.repeat(
        "ot.SpaceFillingC2().evaluate(ot_sample)",
        number=50,
        repeat=10,
        setup="from __main__ import ot_sample,ot",
    ))

    print(f"|time for `n={n}` & `d={d}`", end="|")
    print(f"{scipy_time.mean():.4f}s", end="|")
    print(f"{cython_time.mean():.4f}s", end="|")
    print(f"{ot_time.mean():.4f}s", end="|")
    print(f"{scipy_time.mean() / cython_time.mean():.4f}", end="|")
    print(f"{ot_time.mean() / cython_time.mean():.4f}", end="|")
    print()

The results are as follows:

| | `old scipy time` | `cython time` | `openturns time` | `scipy / cython` | `openturns / cython` |
|---|---|---|---|---|---|
| time for `n=100` & `d=2` | 0.0117 | 0.0027 | 0.0024 | 4.2846 | 0.8891 |
| time for `n=100` & `d=10` | 0.0446 | 0.0094 | 0.0099 | 4.7471 | 1.0515 |
| time for `n=1000` & `d=10` | 3.5753 | 0.9172 | 1.0067 | 3.8981 | 1.0976 |
| time for `n=1000` & `d=100` | 36.3270 | 9.0041 | 9.7423 | 4.0345 | 1.0820 |

Also, I have written some benchmarks:
(screenshot of the benchmark results)
Unfortunately, due to some build issues, I haven't been able to compare against the previous version for this particular benchmark. I will open an issue following this PR to gather more information.
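For reference, the quantity being optimized can be sketched in plain NumPy. This is only a sketch, assuming Hickernell's centered L2-discrepancy (CD²) definition; `cd2_loop` and `cd2_vectorized` are illustrative names, not the functions in this PR. The loop form mirrors the structure of the Cython kernel, while the vectorized form mirrors the old n×n broadcasting approach:

```python
import numpy as np

def cd2_loop(sample):
    """Centered L2-discrepancy CD^2 via explicit loops (Cython-kernel style)."""
    n, d = sample.shape
    abs_ = np.abs(sample - 0.5)
    disc1 = np.sum(np.prod(1 + 0.5 * abs_ - 0.5 * abs_ ** 2, axis=1))
    disc2 = 0.0
    for i in range(n):
        for j in range(n):
            prod = 1.0
            for k in range(d):
                prod *= (1 + 0.5 * abs_[i, k] + 0.5 * abs_[j, k]
                         - 0.5 * abs(sample[i, k] - sample[j, k]))
            disc2 += prod
    return (13 / 12) ** d - 2 / n * disc1 + disc2 / n ** 2

def cd2_vectorized(sample):
    """Same quantity via the n x n broadcasting the old scipy code used."""
    n, d = sample.shape
    abs_ = np.abs(sample - 0.5)
    disc1 = np.sum(np.prod(1 + 0.5 * abs_ - 0.5 * abs_ ** 2, axis=1))
    # (n, n, d) intermediate: this is where the memory cost comes from
    prod = np.prod(1 + 0.5 * abs_[:, None, :] + 0.5 * abs_[None, :, :]
                   - 0.5 * np.abs(sample[:, None, :] - sample[None, :, :]),
                   axis=2)
    return (13 / 12) ** d - 2 / n * disc1 + prod.sum() / n ** 2

rng = np.random.default_rng(0)
x = rng.random((50, 3))
assert np.isclose(cd2_loop(x), cd2_vectorized(x))
```

Both forms compute the same scalar; the trade-off benchmarked in this PR is loop speed (Cython) versus intermediate-array memory (vectorized NumPy).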

@tupui added the labels Cython (Issues with the internal Cython code base), enhancement (A new feature or improvement) and scipy.stats on Feb 17, 2021
@tupui (Member) commented Feb 17, 2021

Thank you for this PR!

Some quick notes: do not include the C files, as these are Cython-generated and machine dependent. To answer your question: the path of the Cython file is correct.

I expected only a small speed-up for C2, as this was already vectorized in a proper way. Good that you compared to OT, as it's C++ on their side 👍. I hope for a better speed-up for the rest, as L2-star is particularly slow right now.

@V0lantis V0lantis force-pushed the enh_improve_discrepancy_perf branch 2 times, most recently from 441d711 to 63a7e57 Compare February 17, 2021 14:59
@V0lantis (Contributor, Author) commented Feb 17, 2021

> Some quick notes: do not include the c files as these are cython generated and machine dependent. To answer your question, the path of the cython file is correct.

Thank you! Indeed, I was wondering what to do with the C file. I removed it and force-pushed my change, since there is no review on my code yet.

> I expected a small speed up for C2 as this was already vectorized in a proper way. Good that you compared to OT as it's C++ on their side 👍. I hope for better speed up for the rest as L2-star is particularly slow right now.

I am going to test L2-star right away, then, to give more insight on the performance.
EDIT:
With `method="L2-star"`:

| | `old scipy time` | `cython time` | `scipy / cython` |
|---|---|---|---|
| time for `n=100` & `d=2` | 0.0189s | 0.0016s | 11.5044 |
| time for `n=100` & `d=10` | 0.0319s | 0.0046s | 6.9691 |
| time for `n=1000` & `d=10` | 4.4023s | 0.3395s | 12.9677 |
| time for `n=1000` & `d=100` | 64.5284s | 6.0636s | 10.6418 |

To reproduce:

import timeit

import numpy as np

from discrepancy_scipy import discrepancy_scipy
from discrepancy import discrepancy

np.random.seed(0)
TEST_VALUES = [
    (100, 2),
    (100, 10),
    (1000, 10),
    (1000, 100),
]
COL = [
    "old scipy time", "cython time", "scipy / cython",
]
print("Beginning")
print("||" + "|".join(f"`{colname}`" for colname in COL) + "|")
print("|" + "---|" * (len(COL) + 1))
for n, d in TEST_VALUES:
    sample = np.random.random_sample((n, d))

    scipy_time = np.array(timeit.repeat(
        "discrepancy_scipy(sample, method='L2-star')",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy_scipy, " "sample",
    ))
    cython_time = np.array(timeit.repeat(
        "discrepancy(sample, method='L2-star')",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy, " "sample",
    ))

    print(f"|time for `n={n}` & `d={d}`", end="|")
    print(f"{scipy_time.mean():.4f}s", end="|")
    print(f"{cython_time.mean():.4f}s", end="|")
    print(f"{scipy_time.mean() / cython_time.mean():.4f}", end="|")
    print()

I removed openturns from this script, since I don't know which of its functions to use for the comparison.

Comment on lines 163 to 172:

    for i in range(n):
        for j in range(n):
            for k in range(d):
                prod *= (
                    1 + 0.5 * fabs(sample_view[i, k] - 0.5)
                    + 0.5 * fabs(sample_view[j, k] - 0.5)
                    - 0.5 * fabs(sample_view[i, k] - sample_view[j, k])
                )
            disc2 += prod
            prod = 1
Contributor:

As this loop is quite long, did you try with prange from cython.parallel? In this case, do not forget the openmp flag in compilation.

Contributor:

Note: OpenMP is not used in SciPy, see this thread. However, testing it can still provide useful information on whether parallelism improves performance.

@V0lantis (Contributor, Author) commented Feb 17, 2021:

Thank you @jb-leger. I have some issues with -fopenmp, but Cython seems to compile nonetheless with prange. The results are quite unbelievable for the centered discrepancy (I only tested this one, since OpenMP is not used in SciPy anyway). Here are the results:

| | `old scipy time` | `cython time` | `scipy / cython` |
|---|---|---|---|
| time for `n=100` & `d=2` | 0.0158s | 0.0022s | 7.1422 |
| time for `n=100` & `d=10` | 0.0589s | 0.0018s | 33.4766 |
| time for `n=1000` & `d=10` | 3.8109s | 0.0065s | 584.6393 |

EDIT:
But the tests are not passing, and the function returns NaN values.

Contributor:

OK, this can be parallelized. With a 10000×100 sample (the same in the two tests), without parallelism:

In [4]: %time stats.qmc.discrepancy(sample)
CPU times: user 18.6 s, sys: 0 ns, total: 18.6 s
Wall time: 18.6 s
Out[4]: 489346.18589523673

With parallelism:

In [13]: %time _discrepancy.discrepancy(sample)
CPU times: user 33.1 s, sys: 44.7 ms, total: 33.2 s
Wall time: 4.25 s
Out[13]: 489346.1858951062

Contributor:

I will have a look at how to do that directly with pthread. @V0lantis, I will send you the results (and code) of my tests.

Contributor:

Just for your information: when you use prange, have a look at the generated C code. You will see a #pragma omp parallel; you must check that a reduction is recognized with the sum operator. You must also check the initialization of private (thread-dependent) variables; in your particular case, the prod must be initialized beforehand.

@V0lantis (Contributor, Author):

Isn't it already the case? (line 158 or line 149)
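The reduction and private-variable caveats discussed above can be checked at the Python level: split the outer i-loop across workers, keep `prod` local to each (i, j) pair, and reduce the per-worker partial sums at the end. This is only a correctness sketch of the decomposition (the GIL prevents any speed-up in pure Python); the function and variable names are illustrative, not the PR's code:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_double_sum(sample, rows):
    """One worker's share of the i-loop. `prod` plays the role of the
    OpenMP 'private' variable (re-initialized per (i, j) pair) and
    `disc2` is the per-worker partial that is reduced at the end."""
    n, d = sample.shape
    disc2 = 0.0
    for i in rows:
        for j in range(n):
            prod = 1.0  # must be re-initialized for every (i, j)
            for k in range(d):
                prod *= (1 + 0.5 * abs(sample[i, k] - 0.5)
                         + 0.5 * abs(sample[j, k] - 0.5)
                         - 0.5 * abs(sample[i, k] - sample[j, k]))
            disc2 += prod
    return disc2

rng = np.random.default_rng(1)
x = rng.random((40, 4))
serial = partial_double_sum(x, range(40))
# split the i-range into two chunks and reduce with sum()
chunks = [range(0, 20), range(20, 40)]
with ThreadPoolExecutor(max_workers=2) as ex:
    parallel = sum(ex.map(lambda r: partial_double_sum(x, r), chunks))
assert np.isclose(serial, parallel)
```

If `prod` were initialized once per worker instead of per pair, the partial sums would be wrong, which is consistent with the NaN results reported above when the reduction was not set up correctly.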

Comment on lines 139 to 177:

    raise ValueError('{} is not a valid method. Options are '
                     'CD, WD, MD, L2-star.'.format(method))
Contributor:

Maybe you should use {!r} to display method as it is (with quotes when it is a string).

@V0lantis (Contributor, Author):

Thanks, added in b1474c3.
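For readers unfamiliar with the suggestion: `{!r}` formats the argument with `repr()`, so a string keeps its quotes in the error message, which makes typos easier to spot. A minimal illustration:

```python
# {} uses str(), {!r} uses repr(); for strings, repr() keeps the quotes.
method = "L3-star"  # a hypothetical invalid value
plain = "{} is not a valid method.".format(method)
reprd = "{!r} is not a valid method.".format(method)
assert plain == "L3-star is not a valid method."
assert reprd == "'L3-star' is not a valid method."
```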

@jb-leger (Contributor) commented Feb 17, 2021

For context: I work with @V0lantis (some kind of light supervision).

For testing, I implemented a computation similar to the discrepancy (it is not the discrepancy itself) and tried to parallelize it.

  • test1.pyx: the reference
  • test2.pyx: the parallelized version with OpenMP. This is not allowed in SciPy.
  • test3.pyx: the parallelized version with pthreads. This is quick (and dirty) code. The number of threads is an argument.

Results:

Python 3.9.1 (default, Dec  8 2020, 07:51:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np
   ...: import test1
   ...: import test2
   ...: import test3
   ...: M = np.random.uniform(size=(10000,100))

In [2]: test1.f(M)
Out[2]: 65268094.34825731

In [3]: test2.f(M)
Out[3]: 65268094.350631215

In [4]: test3.f(M,0)
Out[4]: 65268094.34825731

In [5]: test3.f(M,4)
Out[5]: 65268094.35037216

In [6]: test3.f(M,8)
Out[6]: 65268094.350631215

In [7]: %timeit test1.f(M)
8.86 s ± 92.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit test2.f(M)
2.09 s ± 347 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [9]: %timeit test3.f(M,0)
7.94 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: %timeit test3.f(M,4)
2.59 s ± 231 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit test3.f(M,8)
1.79 s ± 197 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Then (questions for SciPy contributors and maintainers):

  • Is the usage of pthread (or similar) allowed in Cython code for SciPy?
  • If allowed, what is the best choice: the pthread C version (as in my test), or the thread library from the C++11 standard library?
  • If allowed, do we have to provide an alternative version when pthread is not available at compile time? (There seems to be a check like that in scipy/fft/_pocketfft/setup.py.)
  • If allowed, should the number of threads be exposed as an argument to the user? I think yes, because when I use multiprocessing I want to be able to disable threading.

P.S.: It was the first time I mixed Cython and pthread; it was fun.

@tupui tupui mentioned this pull request Feb 18, 2021
Comment on:

    def update_discrepancy(x_new, sample, double initial_disc):
Member:

Same here, I would put this back in _qmc.

@V0lantis (Contributor, Author) commented Feb 18, 2021:

Since I am calling cdef functions inside update_discrepancy and discrepancy, I cannot put these functions in _qmc. Take a look at this link: apparently, cdef is much faster than cpdef. That's why I am leaving this here. Maybe some reviews from a more experienced Cython developer would be useful here.

Member:

Sure, cdef is faster than cpdef: with cpdef you basically have two versions, a Python wrapper and the C one, so there is some overhead to handle that logic. But as the function is complex, what really matters in the end is not the way we call it, since most of the CPU time is spent inside the function itself.

You can leave this like that for now, and we can see in the end what other think.

@tupui (Member) commented Feb 18, 2021

A quick note about comparing results with OpenTURNS. You can do it; just be aware that there is a power-of-2 difference between our results: for the centered discrepancy, we return c² and they return c.

@V0lantis (Contributor, Author):

> For testing, I implemented a computation similar to the discrepancy and tried to parallelize it (test1.pyx as reference, test2.pyx with OpenMP, test3.pyx with pthreads); the benchmark results and the questions for SciPy maintainers are quoted in full above.

@rgommers @mdhaber Do you have some opinions on this?

@jb-leger (Contributor) commented Feb 18, 2021

To complete my previous post, I tried a fourth solution: using std::thread from C++. This is very clean compared to pthread: no pointers, and memory views can be passed. It is better than pthread (if C++ with Cython is allowed). The file for testing is test4.pyx.

The results, summarized:

| Implementation | nthreads | time (s) |
|---|---|---|
| test1.pyx | | 8.28 |
| test2.pyx (openmp) | 8 | 2.18 |
| test3.pyx (pthread) | 0 (no threads) | 7.60 |
| test3.pyx (pthread) | 4 | 2.21 |
| test3.pyx (pthread) | 8 | 1.42 |
| test4.pyx (C++ thread) | 0 (no threads) | 8.28 |
| test4.pyx (C++ thread) | 4 | 2.54 |
| test4.pyx (C++ thread) | 8 | 1.58 |

Therefore, C++ thread and pthread are equivalent in speed, but the C++ thread code is more maintainable than the pthread one.

@tupui (Member) commented Feb 18, 2021

FYI, since #8306 there is an experiment in SciPy with Pythran.

    if not (np.all(x_new >= 0) and np.all(x_new <= 1)):
        raise ValueError('x_new is not in unit hypercube')

    if x_new.shape[0] != sample.shape[1]:
Member:

Good addition 👍

@rgommers (Member):

> To complete my previous post, I tried the 4th solution, use of thread in C++. This is very clean comparing to pthread, no pointers, memory views can be passed, it is better than pthread (if C++ with cython allowed). The file for testing is test4.pyx.

Yes, compiling Cython in C++ mode should be fine, I believe, although we have no examples of that in SciPy yet AFAIK. We do have C++ with Cython bindings; see for example cKDTree, which is threaded C++ code. test4.pyx looks quite clean.

Do consider if you can fit in the workers= keyword (see spatial.cKDTree or optimize.differential_evolution for examples). If you use auto-parallelization, then nested parallelism can lead to oversubscription. So we make it user-controllable everywhere, just like scikit-learn does with n_jobs. The only exception is usage of BLAS functions; both MKL and OpenBLAS use OpenMP, the threading behavior of which is a little annoying to control.
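The `workers=` convention mentioned above can be sketched as follows. This is a hypothetical stand-in, not scipy's implementation: `discrepancy_like` is an invented name, the kernel is a placeholder row-product sum rather than the real discrepancy, and only the dispatch convention (serial by default, `workers=-1` meaning all CPUs) follows the pattern described:

```python
import os
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def discrepancy_like(sample, workers=1):
    """Sketch of the workers= convention: workers=1 stays serial,
    workers=-1 uses all available CPUs, other values set the pool size.
    The per-row kernel here is a toy stand-in for the real computation."""
    if workers == -1:
        workers = os.cpu_count() or 1
    if workers < 1:
        raise ValueError("workers must be -1 or >= 1")
    rows = [sample[i] for i in range(sample.shape[0])]
    if workers == 1:
        # serial path: no thread pool, no oversubscription risk
        return sum(float(np.prod(1 + r)) for r in rows)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(lambda r: float(np.prod(1 + r)), rows))

rng = np.random.default_rng(2)
x = rng.random((16, 3))
assert np.isclose(discrepancy_like(x, workers=1), discrepancy_like(x, workers=2))
```

Keeping the default serial, as suggested, lets callers who already parallelize at a higher level (e.g. with multiprocessing) avoid nested parallelism.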

@rgommers (Member):

> FYI, since #8306 there is an experiment in SciPy with Pythran.

That'd be nice to see as well indeed, although there may be an issue with parallelism. For now we use Pythran as an optional dependency (enabled via export SCIPY_USE_PYTHRAN=1). And IIRC Pythran can use OpenMP automatically with a build flag, but given that it's a pure Python to C++ transpiler, there's no good way to use C++ threading like done here. And we don't allow OpenMP in our official binaries, so the parallel Pythran would only be useful for people building from source. (Cc @serge-sans-paille)
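For readers unfamiliar with Pythran: a kernel is annotated with an export comment, so the very same file runs unchanged as plain Python when Pythran is not used. A minimal sketch (`disc_kernel` is a hypothetical name, and the kernel computes only the first discrepancy term, not the full quantity):

```python
import numpy as np

# pythran export disc_kernel(float64[:, :])
def disc_kernel(sample):
    """Toy Pythran-style kernel: the export line above is a plain comment
    to CPython, but tells the Pythran compiler what to export."""
    n, d = sample.shape
    abs_ = np.abs(sample - 0.5)
    disc1 = np.sum(np.prod(1 + 0.5 * abs_ - 0.5 * abs_ ** 2, axis=1))
    return float(disc1)

x = np.random.default_rng(3).random((10, 2))
assert disc_kernel(x) > 0
```

This dual nature is what makes Pythran workable as an optional dependency: without `SCIPY_USE_PYTHRAN=1` the pure-Python version runs, with it the transpiled C++ extension is used.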

@rgommers (Member):

This is a cool PR by the way, thanks @V0lantis and @jb-leger.

@V0lantis (Contributor, Author) commented Feb 21, 2021

Following a test with Pythran, here are the results, thanks to @jb-leger:

| | `old scipy time` | `cython time` | `pythran` | `scipy/cython` | `scipy/pythran` |
|---|---|---|---|---|---|
| time for `n=100` & `d=2` | 0.0145s | 0.0015s | 0.0020s | 9.6105 | 7.2988 |
| time for `n=100` & `d=10` | 0.0382s | 0.0036s | 0.0045s | 10.7045 | 8.4927 |
| time for `n=1000` & `d=10` | 4.7000s | 0.2964s | 0.4655s | 15.8585 | 10.0961 |
| time for `n=1000` & `d=20` | 8.9429s | 0.6584s | 0.8420s | 13.5831 | 10.6208 |

Regarding memory, scipy's version consumes 300 MiB, Pythran's version 700 KiB, and the Cython version is the best one: 30 KiB.
I am going to implement the threaded version in C++ now, and let the user choose how many workers they want, as suggested by @rgommers.

@rgommers (Member):

Could you share the code, @V0lantis? I'm wondering what the difference is between plain Cython or Pythran and the scipy/ versions.

> Regarding memory, scipy's version consumes 300 MiB, Pythran's version 700 KiB, and the Cython version is the best one: 30 KiB.

There's probably a typo in there (MiB vs. KiB). Pythran also shouldn't use roughly 25x more memory than Cython, I'd think.

@jb-leger (Contributor) commented Feb 21, 2021

For the memory, no typo.

In [12]: tracemalloc.start()
    ...: b = discrepancy_scipy(sample, method='L2-star')
    ...: size = tracemalloc.get_traced_memory()[1]
    ...: tracemalloc.stop()
    ...: print(b, size)
    ...: del b
4.066986839987133e-05 320001619

In [13]: tracemalloc.start()
    ...: b = discrepancy(sample, method='L2-star')
    ...: size = tracemalloc.get_traced_memory()[1]
    ...: tracemalloc.stop()
    ...: print(b, size)
    ...: del b
4.0669868399871326e-05 29461

In [15]: tracemalloc.start()
    ...: b = discrepancy_pythran(sample, False, 'L2-star')
    ...: size = tracemalloc.get_traced_memory()[1]
    ...: tracemalloc.stop()
    ...: print(b, size)
    ...: del b
4.591988087619747e-05 649837

As for the code, it is only an adaptation of the scipy code (except the initialization of prod_arr, which is done with np.ones because Pythran doesn't allow type changes). The code is here. Maybe @serge-sans-paille could propose some changes.

The memory figures are perfectly understandable:

  • the Python version is highly vectorized and manipulates n×n matrices, therefore a large amount of memory is needed;
  • the Pythran version manipulates the same matrices, but (I think) due to lazy evaluation these matrices are not all materialized in memory at the same time;
  • the Cython version is not vectorized and doesn't manipulate these matrices; only the n×d matrix is used.

Edit: memory usages are given for n, d = 1000, 20.
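The memory argument above can be checked with tracemalloc on a toy pairwise kernel: the vectorized form materializes an n×n intermediate, while the loop form only holds O(n) temporaries. This is an illustrative stand-in, not the discrepancy itself, and the function names are invented:

```python
import tracemalloc
import numpy as np

def pairwise_vectorized(x):
    # materializes an n x n intermediate, like the old vectorized scipy code
    return float(np.abs(x[:, None] - x[None, :]).sum())

def pairwise_loop(x):
    # only O(n)-sized temporaries per iteration, like the Cython kernel
    total = 0.0
    for xi in x:
        total += float(np.abs(x - xi).sum())
    return total

x = np.random.default_rng(4).random(500)

tracemalloc.start()
vec_val = pairwise_vectorized(x)
peak_vec = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

tracemalloc.start()
loop_val = pairwise_loop(x)
peak_loop = tracemalloc.get_traced_memory()[1]
tracemalloc.stop()

assert np.isclose(vec_val, loop_val)
assert peak_vec > peak_loop  # the n x n intermediate dominates the peak
```

With n = 500 the vectorized peak is on the order of a few MiB (a 500×500 float64 array is 2 MiB) while the loop peak stays in the KiB range, mirroring the 300 MiB vs 30 KiB figures reported above.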

@serge-sans-paille (Contributor):

I can confirm Pythran's speed is not satisfying on that one. I'll investigate that performance bug; sounds interesting!

@serge-sans-paille (Contributor):

@jb-leger, can you check the Pythran version's performance with that Pythran branch? serge-sans-paille/pythran#1727

@jb-leger (Contributor):

@serge-sans-paille, quite the same results:

old scipy time cython time pythran pythran1727 scipy/cython scipy/pythran scipy/pythran1727
time for n=100 & d=2 0.0177s 0.0021s 0.0024s 0.0025s 8.5983 7.2864 7.0078
time for n=100 & d=10 0.0514s 0.0040s 0.0056s 0.0056s 12.9942 9.1316 9.1726
time for n=1000 & d=10 5.3696s 0.3088s 0.5759s 0.5918s 17.3913 9.3232 9.0727
time for n=1000 & d=20 10.1400s 0.7628s 1.0439s 1.0481s 13.2929 9.7134 9.6745

@serge-sans-paille (Contributor):

@jb-leger: how strange, I get a significant speedup between the two revisions. Here is my benchmark:

python -m timeit -s 'import numpy as np; M = np.random.uniform(size=(10000,100)); from discrepancy_pythran import discrepancy_scipy' 'discrepancy_scipy(M, False, "CD")'

@jb-leger (Contributor) commented Feb 23, 2021

We (@serge-sans-paille and I) were not benchmarking the same method. I rewrote the benchmark to consider all the methods. Note: times are now given for one function call. Here is the script. The results follow.

There is a significant improvement for method="CD" and method="MD", and no improvement for method="WD" and method="L2-star".

method='CD'

| | `old scipy time` | `cython time` | `pythran` | `pythran1727` | `scipy/cython` | `scipy/pythran` | `scipy/pythran1727` |
|---|---|---|---|---|---|---|---|
| time for `n=100` & `d=2` | 128.5µs | 52.5µs | 84.7µs | 61.5µs | 2.4465 | 1.5183 | 2.0904 |
| time for `n=100` & `d=10` | 530.3µs | 196.3µs | 350.1µs | 257.6µs | 2.7022 | 1.5148 | 2.0583 |
| time for `n=1000` & `d=10` | 73.0ms | 13.0ms | 24.8ms | 19.4ms | 5.6355 | 2.9410 | 3.7692 |
| time for `n=1000` & `d=20` | 141.2ms | 29.8ms | 48.0ms | 38.5ms | 4.7362 | 2.9393 | 3.6691 |

method='WD'

| | `old scipy time` | `cython time` | `pythran` | `pythran1727` | `scipy/cython` | `scipy/pythran` | `scipy/pythran1727` |
|---|---|---|---|---|---|---|---|
| time for `n=100` & `d=2` | 65.4µs | 32.0µs | 39.0µs | 37.1µs | 2.0403 | 1.6744 | 1.7639 |
| time for `n=100` & `d=10` | 292.5µs | 92.7µs | 147.6µs | 135.4µs | 3.1570 | 1.9818 | 2.1603 |
| time for `n=1000` & `d=10` | 69.4ms | 8.2ms | 23.1ms | 23.2ms | 8.4492 | 3.0057 | 2.9838 |
| time for `n=1000` & `d=20` | 151.0ms | 20.5ms | 41.6ms | 41.4ms | 7.3774 | 3.6280 | 3.6461 |

method='MD'

| | `old scipy time` | `cython time` | `pythran` | `pythran1727` | `scipy/cython` | `scipy/pythran` | `scipy/pythran1727` |
|---|---|---|---|---|---|---|---|
| time for `n=100` & `d=2` | 150.0µs | 46.5µs | 85.5µs | 60.5µs | 3.2255 | 1.7546 | 2.4794 |
| time for `n=100` & `d=10` | 657.3µs | 176.0µs | 376.8µs | 250.8µs | 3.7354 | 1.7445 | 2.6203 |
| time for `n=1000` & `d=10` | 103.3ms | 17.1ms | 29.7ms | 25.9ms | 6.0380 | 3.4790 | 3.9823 |
| time for `n=1000` & `d=20` | 211.5ms | 34.2ms | 58.7ms | 48.8ms | 6.1881 | 3.6031 | 4.3303 |

method='L2-star'

| | `old scipy time` | `cython time` | `pythran` | `pythran1727` | `scipy/cython` | `scipy/pythran` | `scipy/pythran1727` |
|---|---|---|---|---|---|---|---|
| time for `n=100` & `d=2` | 284.9µs | 29.6µs | 38.4µs | 39.6µs | 9.6365 | 7.4095 | 7.1937 |
| time for `n=100` & `d=10` | 753.0µs | 70.5µs | 88.2µs | 87.7µs | 10.6809 | 8.5407 | 8.5839 |
| time for `n=1000` & `d=10` | 90.9ms | 5.8ms | 9.1ms | 9.0ms | 15.6317 | 10.0278 | 10.0509 |
| time for `n=1000` & `d=20` | 175.5ms | 13.0ms | 17.0ms | 16.1ms | 13.4838 | 10.3211 | 10.8885 |

@jb-leger (Contributor):

@V0lantis, I opened a PR to your branch adding a stub. It is quick and dirty; feel free to close it. It is only a draft of a stub (but it should do the job).

@tupui (Member) commented Mar 23, 2021

I am not sure why it is not complaining for sobol.pyx, though. To me it looks similar.

@jb-leger (Contributor):

@tupui, for _sobol, you have this in mypy.ini:

    [mypy-scipy.stats._sobol]
    ignore_missing_imports = True

This is not (I think) a very good thing. (But it is private code, so we can also do that.)
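The alternative to the mypy.ini skip, discussed below, is a stub file: a `.pyi` next to the compiled module, containing only typed signatures with `...` bodies. A sketch of what such a stub can look like (the names and signatures here are hypothetical; a real stub must mirror whatever the `.pyx` actually exports):

```python
# illustrative _qmc_cy.pyi-style stub; signatures are invented for the example
import numpy as np

def _cy_wrapper_centered_discrepancy(
    sample: np.ndarray, iterative: bool, workers: int
) -> float: ...

def _cy_wrapper_update_discrepancy(
    x_new: np.ndarray, sample: np.ndarray, initial_disc: float
) -> float: ...
```

With the stub in place (and declared in setup.py, as noted further down), mypy can type-check callers of the extension module without needing `ignore_missing_imports`.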

@tupui (Member) commented Mar 24, 2021

> @tupui, for _sobol, you have this in mypy.ini:
>
>     [mypy-scipy.stats._sobol]
>     ignore_missing_imports = True
>
> This is not (I think) a very good thing. (But it is private code, therefore, we can also do that).

Indeed, thanks, I forgot we did that! I agree that it should not be untyped and skipped.

Actually, I just noticed this was done when all the mypy issues in gh-13613 got fixed. So it was not intended as definitive, but more as a way to start working on the problem.

@V0lantis (Contributor, Author):

> @V0lantis, I did a PR to your branch for adding a stub. This is quick and dirty. Feel free to close this PR. It is only a draft of a stub (but this should do the job).

Thank you @jb-leger! I merged your branch into this one, but it still doesn't resolve the mypy error. I pushed it anyway to gather more help and see whether someone else can spot where it comes from. I tried to debug it, but without success. I have never used mypy before, so maybe someone more experienced with it could help me :-)

@jb-leger (Contributor):

I had a look at #11739 (which adds a library stub for a pyx module), and it seems we need to declare the pyi files in setup.py.

@V0lantis, I submitted a new PR to your branch for that.

@V0lantis (Contributor, Author):

> I had a look at #11739 (which adds a library stub for a pyx module), and it seems we need to declare the pyi files in setup.py.
>
> @V0lantis, I submitted a new PR to your branch for that.

Thank you @jb-leger, it works!

 python -u runtests.py --mypy
Building, see build.log...
Build OK (0:00:11.045731 elapsed)
Success: no issues found in 678 source files

@tupui (Member) left a comment:

Really close to being ready, IMO. I just have minor PEP 8 suggestions and two missing tests for input validation. Also, there is one comment missing about the thread logic in _qmc_cy.pyx.

V0lantis and others added 7 commits March 27, 2021 15:24
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
@tupui (Member) left a comment:

Should be ready once the PEP 8 suggestions and the comment about threads are addressed.

Note: please use batch mode if you accept multiple suggestions, so that they land as a single commit.

Mostly PEP8 modifications

Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
@tupui (Member) left a comment:

LGTM, thanks @V0lantis @jb-leger. @rgommers, as you have been following along, do you have any more comments? If we merged #13631 first, it might be possible to have inline type hints here as well.

I propose to merge this on Friday if there is no further discussion and we agree to keep the type hints separate.

@tupui tupui merged commit 327dc6f into scipy:master Apr 3, 2021
@tylerjereddy tylerjereddy added this to the 1.7.0 milestone Apr 3, 2021
@rgommers (Member) commented Apr 5, 2021

Nice to have this merged, thanks @V0lantis, @jb-leger and everyone else!

> If we would merge #13631 before, it might be possible to have inline type hints here as well.

A Sphinx version update, perhaps; the autodoc_typehints functionality is pretty recent.

@tupui (Member) commented Apr 7, 2021

@V0lantis, are you planning to have a look at this, or shall I?

@V0lantis (Contributor, Author) commented Apr 7, 2021

Thank you for asking @tupui, I'll try to have a look tonight :-)

@V0lantis V0lantis deleted the enh_improve_discrepancy_perf branch April 10, 2021 09:26