ENH: improve discrepancy performance #13576
Conversation
Thank you for this PR! Some quick notes: do not include the C files, as these are Cython-generated and machine dependent. To answer your question, the path of the Cython file is correct. I expected a small speed-up for C2, as this was already vectorized in a proper way. Good that you compared to OT, as it's C++ on their side 👍. I hope for a better speed-up for the rest, as L2-star is particularly slow right now.
Force-pushed from 441d711 to 63a7e57 (Compare)
Thank you! Indeed, I was wondering what to do with the C file. I removed it and force-pushed my change, since there is no review yet on my code.
I am going to test L2-star right away to give more insight on the performance.
To reproduce:

```python
import timeit

import numpy as np

from discrepancy_scipy import discrepancy_scipy
from discrepancy import discrepancy

np.random.seed(0)

TEST_VALUES = [
    (100, 2),
    (100, 10),
    (1000, 10),
    (1000, 100),
]
COL = [
    "old scipy time", "cython time", "scipy / cython",
]

print("Beginning")
print("||" + "|".join(f"`{colname}`" for colname in COL) + "|")
print("|" + "---|" * (len(COL) + 1))

for n, d in TEST_VALUES:
    sample = np.random.random_sample((n, d))
    scipy_time = np.array(timeit.repeat(
        "discrepancy_scipy(sample, method='L2-star')",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy_scipy, sample",
    ))
    cython_time = np.array(timeit.repeat(
        "discrepancy(sample, method='L2-star')",
        number=50,
        repeat=10,
        setup="from __main__ import discrepancy, sample",
    ))
    print(f"|time for `n={n}` & `d={d}`", end="|")
    print(f"{scipy_time.mean():.4f}s", end="|")
    print(f"{cython_time.mean():.4f}s", end="|")
    print(f"{scipy_time.mean() / cython_time.mean():.4f}", end="|")
    print()
```

I removed the OpenTURNS comparison (and the leftover `ot.Sample` line), since I don't know which functions to use for comparing.
scipy/stats/_discrepancy.pyx
Outdated
```cython
for i in range(n):
    for j in range(n):
        for k in range(d):
            prod *= (
                1 + 0.5 * fabs(sample_view[i, k] - 0.5)
                + 0.5 * fabs(sample_view[j, k] - 0.5)
                - 0.5 * fabs(sample_view[i, k] - sample_view[j, k])
            )
        disc2 += prod
        prod = 1
```
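The quoted kernel can be cross-checked against a vectorized NumPy formulation of the same double sum. This is a sketch for illustration, not SciPy's actual code; the function names are made up. The broadcast version trades memory (an `(n, n, d)` intermediate) for speed:

```python
import numpy as np

def disc2_loop(sample):
    # Naive triple loop mirroring the quoted Cython kernel.
    n, d = sample.shape
    disc2 = 0.0
    for i in range(n):
        for j in range(n):
            prod = 1.0
            for k in range(d):
                prod *= (1 + 0.5 * abs(sample[i, k] - 0.5)
                           + 0.5 * abs(sample[j, k] - 0.5)
                           - 0.5 * abs(sample[i, k] - sample[j, k]))
            disc2 += prod
    return disc2

def disc2_vectorized(sample):
    # Same double sum via broadcasting; builds an (n, n, d) intermediate.
    s = sample[:, None, :]   # shape (n, 1, d)
    t = sample[None, :, :]   # shape (1, n, d)
    terms = (1 + 0.5 * np.abs(s - 0.5) + 0.5 * np.abs(t - 0.5)
               - 0.5 * np.abs(s - t))
    return terms.prod(axis=-1).sum()
```

Both should agree to floating-point precision on any sample in the unit hypercube.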
As this loop is quite long, did you try `prange` from `cython.parallel`? In that case, do not forget the OpenMP flag at compilation.
Note: OpenMP is not used in SciPy, see this thread. However, testing it can still give useful information on whether parallelism improves performance.
Thank you @jb-leger. I have some issues with `-fopenmp`, but Cython seems to compile nonetheless with `prange`. The results are quite unbelievable for the centered discrepancy (I only tested this one, since OpenMP is not used in SciPy anyway). Here are the results:
| | `old scipy time` | `cython time` | `scipy / cython` |
|---|---|---|---|
| time for `n=100` & `d=2` | 0.0158s | 0.0022s | 7.1422 |
| time for `n=100` & `d=10` | 0.0589s | 0.0018s | 33.4766 |
| time for `n=1000` & `d=10` | 3.8109s | 0.0065s | 584.6393 |
EDIT: But the tests are not passing and the function is returning NaN values.
OK, this can be parallelized. With a 10000×100 sample (the same in the two tests), without parallelism:

```
In [4]: %time stats.qmc.discrepancy(sample)
CPU times: user 18.6 s, sys: 0 ns, total: 18.6 s
Wall time: 18.6 s
Out[4]: 489346.18589523673
```

With parallelism:

```
In [13]: %time _discrepancy.discrepancy(sample)
CPU times: user 33.1 s, sys: 44.7 ms, total: 33.2 s
Wall time: 4.25 s
Out[13]: 489346.1858951062
```
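Quick arithmetic on the wall times quoted above: the parallel run implies roughly a 4.4× speedup on that machine (at the cost of nearly doubled total CPU time):

```python
# Wall times from the two %time runs above
wall_serial = 18.6    # seconds, without parallelism
wall_parallel = 4.25  # seconds, with parallelism
speedup = wall_serial / wall_parallel
print(round(speedup, 2))  # → 4.38
```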
I will have a look at how to do that directly with `pthread`. @V0lantis, I will send you the results (and code) of my tests.
Just for your information: when you use `prange`, have a look at the generated C code. You will see a `#pragma omp parallel`; you must check that a reduction is recognized for the sum operator. You must also check the initialization of private (thread-dependent) variables; in your particular case, `prod` must be initialized beforehand.
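A minimal sketch of what that could look like for the kernel quoted earlier (this is an illustration of the pattern being discussed, not the actual SciPy code; it must be compiled with the OpenMP flag, e.g. `-fopenmp`). The `disc2 += prod` in-place update lets Cython emit an OpenMP sum reduction, while assigning `prod` inside the loop body keeps it thread-private:

```cython
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange
from libc.math cimport fabs

def centered_disc_term(double[:, ::1] sample_view):
    cdef:
        Py_ssize_t n = sample_view.shape[0]
        Py_ssize_t d = sample_view.shape[1]
        Py_ssize_t i, j, k
        double prod, disc2 = 0.0

    for i in prange(n, nogil=True):   # disc2 becomes an OpenMP sum reduction
        for j in range(n):
            prod = 1.0                # assigned in the loop: thread-private
            for k in range(d):
                prod = prod * (
                    1 + 0.5 * fabs(sample_view[i, k] - 0.5)
                    + 0.5 * fabs(sample_view[j, k] - 0.5)
                    - 0.5 * fabs(sample_view[i, k] - sample_view[j, k])
                )
            disc2 += prod
    return disc2
```

Note that `prod` uses plain assignment (`prod = prod * ...`) rather than `prod *= ...`; inside a `prange` block, an in-place operator would make Cython treat `prod` as a reduction variable, which is not what is wanted here.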
isn't it already the case? (line 158 or line 149)
scipy/stats/_discrepancy.pyx
Outdated
```python
raise ValueError('{} is not a valid method. Options are '
                 'CD, WD, MD, L2-star.'.format(method))
```
Maybe you should use `{!r}` to display `method` as it is (with quotes when it is a string).
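For illustration, the `!r` conversion runs the value through `repr()`, so a string argument keeps its quotes in the error message (the method name here is made up):

```python
method = "CD2"  # hypothetical invalid method name
plain = '{} is not a valid method.'.format(method)
with_repr = '{!r} is not a valid method.'.format(method)
print(plain)      # → CD2 is not a valid method.
print(with_repr)  # → 'CD2' is not a valid method.
```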
Thanks, added it in b1474c3.
For context, I work with @V0lantis (some kind of light supervision). For testing, I implemented a computation similar to the discrepancy (it is not the discrepancy itself), and I tried to parallelize it.
Results:
Then (questions are for SciPy contributors and maintainers):
P.S.: It was the first time I mixed Cython and pthreads; it was fun.
scipy/stats/_discrepancy.pyx
Outdated
```cython
)


def update_discrepancy(x_new, sample, double initial_disc):
```
Same here, I would put this back in `_qmc`.
Since I am calling `cdef` functions inside the `update_discrepancy` and `discrepancy` functions, I cannot put them in `_qmc`. Take a look at this link. Apparently, `cdef` is much faster than `cpdef`. That's why I am leaving this here. Maybe some review from a more experienced Cython developer would be useful here.
Sure, `cdef` is faster than `cpdef`: with `cpdef` you basically have two versions, a Python wrapper and the C one, so there is some overhead to handle that logic. But as the function is complex, what really matters in the end is not the way we call it, as most of the CPU time is spent inside the function itself.
You can leave this as is for now, and we can see in the end what others think.
A quick note about comparing results with OpenTURNS. You can do it, just be aware that there is a power-of-2 difference between our results: for the centered discrepancy, we return c2 and they return c.
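Concretely, under that convention a comparison would need a square root on SciPy's value. The numbers below are made up purely for illustration:

```python
import math

scipy_value = 0.0625    # hypothetical c2, as scipy would return it
openturns_value = 0.25  # the matching c that OpenTURNS would report
assert math.isclose(math.sqrt(scipy_value), openturns_value)
```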
To complete my previous post, I tried the 4th solution. The results summarized:
Therefore, C++ threads and pthreads are equivalent, but code with C++ threads is
FYI, since #8306 there is an experiment in SciPy with Pythran.
scipy/stats/_discrepancy.pyx
Outdated
```python
if not (np.all(x_new >= 0) and np.all(x_new <= 1)):
    raise ValueError('x_new is not in unit hypercube')

if x_new.shape[0] != sample.shape[1]:
```
Good addition 👍
Yes, compiling Cython in C++ mode should be fine I believe, although we have no examples of that in SciPy yet AFAIK. We do have C++ with Cython bindings - see for example. Do consider if you can fit in the
That'd be nice to see as well indeed, although there may be an issue with parallelism. For now we use Pythran as an optional dependency (enabled via
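For reference, a Pythran kernel is plain Python/NumPy plus an export comment, so it runs unchanged under CPython and only gets compiled natively when Pythran is available. This is an illustrative sketch with an assumed function name, not the code under discussion:

```python
# pythran export centered_disc_term(float64[:, :])
import numpy as np

def centered_disc_term(sample):
    # Same kind of double-sum kernel as the discrepancy loops above;
    # Pythran compiles these explicit loops to native code.
    n, d = sample.shape
    disc2 = 0.0
    for i in range(n):
        for j in range(n):
            prod = 1.0
            for k in range(d):
                prod *= (1 + 0.5 * abs(sample[i, k] - 0.5)
                           + 0.5 * abs(sample[j, k] - 0.5)
                           - 0.5 * abs(sample[i, k] - sample[j, k]))
            disc2 += prod
    return disc2
```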
Following a test with Pythran, here are the results, thanks to @jb-leger:
For the memory, SciPy's version consumes 300MiB, Pythran's version 700KiB, and the Cython version is the best one: 30 kiB.
Could you share the code @V0lantis? I'm wondering what the difference is between plain Cython or Pythran, and the
There's probably a typo in there (MiB vs. KiB). Pythran also shouldn't use 25x more memory than Cython, I'd think.
For the memory, no typo.
For the code, this is only an adaptation of the SciPy code (except the initialization of). It is perfectly understandable for the memory:
Edit: Memory usages are given for
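Rough arithmetic shows why a vectorized version can dominate memory use, assuming it materializes one float64 intermediate of shape `(n, n, d)` (the exact shapes and sample size in the code under discussion may differ):

```python
# Hypothetical sample size for illustration
n, d = 1000, 100
elems = n * n * d            # one (n, n, d) float64 broadcast intermediate
mib = elems * 8 / 2**20      # bytes -> MiB
print(f"{mib:.0f} MiB")      # → 763 MiB
```

A streaming loop (Cython or compiled Pythran loops) touches only a handful of scalars at a time, which is why its footprint stays in the KiB range.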
I can confirm Pythran's speed is not satisfying on that one. I'll investigate that performance bug, sounds interesting!
This has a decent impact on performance of scipy/scipy#13576
@jb-leger can you check the pythran version performance from that pythran branch? serge-sans-paille/pythran#1727
@serge-sans-paille, quite the same results:

@jb-leger: how strange, I get a significant speedup between the two revisions; here is my benchmark:
We (@serge-sans-paille and I) were not benchmarking the same method. I rewrote the benchmark to consider all the methods. Note: times are now given for one function call. Here is the script, and the results follow. There is a significant improvement for `method='CD'`.

`method='WD'`

`method='MD'`

`method='L2-star'`
I am not sure why it is not complaining for
@tupui, for
This is not (I think) a very good thing. (But it is private code, therefore we can also do that.)
Indeed, thanks, I forgot we did that! I agree that it should not be untyped and skipped. Actually, I just noticed this was done in gh-13613, where all the mypy issues got fixed. So it was not intended as definitive, but more as a start on the problem.
Thank you @jb-leger! I merged your branch into this one, but it still doesn't resolve the mypy error. I pushed it anyway to gather more help and see if others can spot where it comes from. I tried to debug it, but it didn't work. I have never used mypy before, so maybe someone more experienced with it could help me here :-)
I had a look at #11739 (which adds library stubs for a pyx module). And it seems we need to declare the pyi files in the
@V0lantis, I submitted a new PR to your branch for that.
add *.pyi stubs in scipy.stats
Thank you @jb-leger, it works!

```
python -u runtests.py --mypy
Building, see build.log...
Build OK (0:00:11.045731 elapsed)
Success: no issues found in 678 source files
```
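For reference, a `.pyi` stub for a compiled module contains only signatures with `...` bodies. A minimal hypothetical sketch (names and signatures assumed for illustration, not the actual SciPy stub file):

```python
# _qmc_cy.pyi -- hypothetical stub sketch for a compiled .pyx module
import numpy as np

def discrepancy(sample: np.ndarray, iterative: bool = ...,
                method: str = ...) -> float: ...
def update_discrepancy(x_new: np.ndarray, sample: np.ndarray,
                       initial_disc: float) -> float: ...
```

mypy reads the stub instead of trying to analyze the compiled extension, which is why declaring the `.pyi` files resolves the error.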
Really close to being ready IMO. I just have minor PEP 8 suggestions and two missing tests for input validation. Also, there is one comment missing about the thread logic in `_qmc_cy.pyx`.
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
Should be ready once the PEP 8 suggestions are applied and the comment about threads is added.
Note: please use the batch mode if you accept multiple suggestions, to have a single commit for this.
Mostly PEP8 modifications Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
LGTM, thanks @V0lantis @jb-leger. @rgommers, as you followed this, do you have any more comments? If we merge #13631 first, it might be possible to have inline type hints here as well.
I propose to merge this on Friday if there are no further discussions and we keep the type hints separate.
@V0lantis are you planning to have a look at this or shall I?
Thank you for asking @tupui, I'll try to have a look tonight :-)
- `update_discrepancy` though
- `.. versionadded:: X.Y.Z`. Not sure what it means
- `setup.py`? I think so since I am able to run
- the tests are passing
Reference issue
See gh-13474.
The new code is written in Cython in a new file `_discrepancy.pyx`. Since I don't know so much about how it works with SciPy, should I send a mail to the dev mailing list to ask where to put my code?
What does this implement/fix?
This enhancement improves the performance of `scipy.stats.qmc.discrepancy` and `scipy.stats.qmc.update_discrepancy`. See additional information for performance details.
In short, the Cython code achieves, on average, a 4× performance improvement over the SciPy code. It also has performance similar to the openturns library.
Additional information
To test the performance, I ran the following script:
The results are as follows:

| | `old scipy time` | `cython time` | `openturns time` | `scipy / cython` | `openturns / cython` |
|---|---|---|---|---|---|
| `n=100` & `d=2` | | | | | |
| `n=100` & `d=10` | | | | | |
| `n=1000` & `d=10` | | | | | |
| `n=1000` & `d=100` | | | | | |
Also, I have written some benchmarks.
Unfortunately, due to some building issues, I haven't been able to compare with the previous version for this particular benchmark. I will open an issue following this PR to get some more information.