
ENH: small speedup for stats.gaussian_kde #8558

Merged (2 commits into scipy:master) on Oct 31, 2018

Conversation

GaelVaroquaux
Member

A 20% speedup in gaussian_kde.evaluate when the number of points is large (whether it is in the dataset, or the points on which the KDE is evaluated).

The speedup isn't huge, so I won't be offended if this is not merged :).
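The diff itself isn't shown in this thread, but the discussion below centers on the Cholesky factorization of the inverse covariance. As a sketch of the general technique (function and variable names here are mine, not the actual patch): whitening both the dataset and the evaluation points with the Cholesky factor turns the Gaussian exponent into a plain squared Euclidean distance, which vectorizes well.

```python
import numpy as np
from scipy import linalg

def kde_evaluate_whitened(dataset, points, inv_cov, norm_factor):
    """Sketch: evaluate a Gaussian KDE sum via whitened coordinates.

    dataset: (d, n) training samples; points: (d, m) query points;
    inv_cov: (d, d) inverse bandwidth covariance; norm_factor: scalar.
    """
    # Upper-triangular W with W.T @ W == inv_cov, so the quadratic form
    # x.T @ inv_cov @ x becomes ||W x||^2.
    whitening = linalg.cholesky(inv_cov)
    scaled_dataset = whitening @ dataset   # (d, n)
    scaled_points = whitening @ points     # (d, m)
    # Pairwise squared distances between every (sample, point) pair.
    diff = scaled_dataset[:, :, None] - scaled_points[:, None, :]
    energy = 0.5 * np.sum(diff * diff, axis=0)  # (n, m)
    return np.sum(np.exp(-energy), axis=0) / norm_factor
```

The payoff is that the inner loop becomes two matrix products plus elementwise work, instead of forming a quadratic form per pair.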

To benchmark, I adapted the example in the docstring:

import numpy as np
np.random.seed(42)

from scipy import stats
def measure(n):
    "Measurement model, return two coupled measurements."
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1 + m2, m1 - m2


m1, m2 = measure(2000)
xmin = m1.min()
xmax = m1.max()
ymin = m2.min()
ymax = m2.max()


X, Y = np.mgrid[xmin:xmax:200j, ymin:ymax:200j]
positions = np.vstack([X.ravel(), Y.ravel()])
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values)

Timing can then be done (in IPython):
%timeit kernel(positions)

With master, I get 2.17s, with this branch 1.73s.
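Outside IPython, the same measurement can be made with the standard timeit module. A self-contained sketch (smaller sizes than the benchmark above, to keep it quick; the setup mirrors the docstring example):

```python
import timeit

import numpy as np
from scipy import stats

np.random.seed(42)
m1 = np.random.normal(size=200)
m2 = np.random.normal(scale=0.5, size=200)
values = np.vstack([m1 + m2, m1 - m2])
kernel = stats.gaussian_kde(values)

# 50x50 evaluation grid -> 2500 query points
X, Y = np.mgrid[-3:3:50j, -3:3:50j]
positions = np.vstack([X.ravel(), Y.ravel()])

# Best of 3 runs, roughly analogous to what %timeit reports
t = min(timeit.repeat(lambda: kernel(positions), number=1, repeat=3))
print(f"evaluate: {t:.4f} s")
```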

@GaelVaroquaux
Member Author

The AppVeyor failures are all related to ConvexHull, so I believe they are unrelated to this change.

@tylerjereddy tylerjereddy added scipy.stats enhancement A new feature or improvement labels Mar 15, 2018
@rgommers
Member

Thanks Gael. Can you put the benchmark in https://github.com/scipy/scipy/blob/master/benchmarks/benchmarks/stats.py? Then we will notice any regressions.

@GaelVaroquaux
Member Author

I've added a couple of benchmarks, in the small n regime, and in the large n regime.

I must confess that I haven't run them (I think the README isn't up to date).
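The asv output below shows two benchmark methods. An illustrative version of what such a class in benchmarks/benchmarks/stats.py might look like (the class layout is mine; only the method names come from the output below):

```python
import numpy as np
from scipy import stats

class GaussianKDE:
    """Illustrative asv-style benchmark for gaussian_kde.evaluate."""

    def setup(self):
        # Same coupled-measurement setup as the docstring example
        rng = np.random.RandomState(42)
        n = 2000
        m1 = rng.normal(size=n)
        m2 = rng.normal(scale=0.5, size=n)
        values = np.vstack([m1 + m2, m1 - m2])
        self.kernel = stats.gaussian_kde(values)
        X, Y = np.mgrid[-4:4:200j, -4:4:200j]
        self.positions = np.vstack([X.ravel(), Y.ravel()])  # (2, 40000)

    def time_gaussian_kde_evaluate_many_points(self):
        self.kernel(self.positions)

    def time_gaussian_kde_evaluate_few_points(self):
        self.kernel(self.positions[:, :10])
```

asv calls setup() before timing each `time_*` method, so the dataset construction is excluded from the measurement.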

@rgommers
Member

Thanks. The bench-compare can be troublesome sometimes, but running the benchmarks for your installed SciPy version should work out of the box:

$ python runtests.py -n --bench stats.GaussianKDE
Running benchmarks for Scipy version 1.1.0.dev0+0893e58 at /home/rgommers/code/scipy/scipy/__init__.py
· virtualenv package not installed
· Discovering benchmarks
· Running 2 total benchmarks (1 commits * 1 environments * 2 benchmarks)
[  0.00%] ·· Building for existing-py_home_rgommers_anaconda3_bin_python
[  0.00%] ·· Benchmarking existing-py_home_rgommers_anaconda3_bin_python
[ 50.00%] ··· Running ...s.GaussianKDE.time_gaussian_kde_evaluate_few_points    10.72ms
[100.00%] ··· Running ....GaussianKDE.time_gaussian_kde_evaluate_many_points     23.45s

@GaelVaroquaux
Member Author

GaelVaroquaux commented Mar 16, 2018 via email

@ilayn
Member

ilayn commented Mar 16, 2018

It sits at the root of the scipy repository

@pv
Member

pv commented Mar 16, 2018 via email

@GaelVaroquaux
Member Author

GaelVaroquaux commented Mar 16, 2018 via email

@rgommers
Member

Benchmarks look good:

$ python runtests.py --bench-compare master stats
...
-  476.30μs   439.98μs      0.92  stats.GaussianKDE.time_gaussian_kde_evaluate_few_points
-     1.68s      1.40s      0.84  stats.GaussianKDE.time_gaussian_kde_evaluate_many_points

Numerically, the old and new implementations are not identical; typical differences are around rtol=1e-13. Given that our existing tests need decimal=12 to pass in a number of places, this should not be an issue. So +1 for merging from me.
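The kind of check behind that statement can be expressed with numpy's testing helpers (a sketch with made-up arrays, not the actual test suite):

```python
import numpy as np
from numpy.testing import assert_allclose, assert_almost_equal

old = np.array([0.123456789012345, 0.987654321098765])
new = old * (1 + 1e-13)  # stand-in for tiny reordering differences

# Relative tolerance check: differences of ~1e-13 pass at rtol=1e-12
assert_allclose(new, old, rtol=1e-12)
# Absolute check in the style of scipy's decimal=12 tests
assert_almost_equal(new, old, decimal=12)
```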

@rgommers rgommers added this to the 1.1.0 milestone Mar 17, 2018
@josef-pkt
Member

I initially thought it's fine as is, but reading it again, I think the Cholesky factor should be stored as a private attribute. The immediate advantage is that repeated calls don't have to redo the factorization.

Beyond this PR, there might be some savings in other places, like rvs, from using the stored Cholesky factor of cov_inv.

(Aside: the current behavior assumes cov is non-singular; an alternative to inv and cholesky would be SVD, which would extend to (numerically) singular cov. But I have never looked at KDE with singular cov, so I have no idea about those use cases.)
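A minimal sketch of that caching idea (the class and attribute names here, like _whitening, are hypothetical and not scipy API):

```python
import numpy as np
from scipy import linalg

class KDECholeskyCache:
    """Sketch: compute the Cholesky factor once and reuse it."""

    def __init__(self, inv_cov):
        self.inv_cov = inv_cov
        self._whitening = None  # hypothetical private cache

    @property
    def whitening(self):
        # Lazily factor inv_cov on first access; later accesses reuse it.
        if self._whitening is None:
            self._whitening = linalg.cholesky(self.inv_cov)
        return self._whitening
```

Repeated evaluate (or rvs) calls would then read the cached factor instead of refactoring the matrix each time.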

@GaelVaroquaux
Member Author

OK, @josef-pkt. This entails a bit more work, but if done properly, I agree it should give additional benefits. I'll try to find time to work on it this week.

@rgommers rgommers removed this from the 1.1.0 milestone Mar 31, 2018
@ev-br
Member

ev-br commented Apr 7, 2018

How about we merge it as is, and work on storing the result of Cholesky factorization in a follow-up PR? Otherwise this is in danger of sliding under everyone's radar, I'm afraid.

@rgommers rgommers added this to the 1.2.0 milestone Sep 2, 2018
@rgommers
Member

How about we merge it as is, and work on storing the result of Cholesky factorization in a follow-up PR? Otherwise this is in danger of sliding under everyone's radar, I'm afraid.

Sounds good to me. I'll look at doing that before the 1.2.x branching on 5 Nov unless there are more comments.

A 20% speedup in gaussian_kde.evaluate when the number of points is
large (whether it is in the dataset, or the points on which the KDE is
evaluated).
@rgommers
Member

Rebased and all green, so merging. Thanks @GaelVaroquaux, all!

@rgommers rgommers merged commit 701b9ac into scipy:master Oct 31, 2018
@GaelVaroquaux
Member Author

Thanks @rgommers for making sure this got in. My apologies for not being able to do more; I've had a tab open in my browser for months, but that was not sufficient :).

@rgommers
Member

We're using the same system :) After 20 tabs it stops working...

@piotrjurkiewicz

piotrjurkiewicz commented Nov 6, 2018

I observe a 100% speedup.

Before:

real    2m12.681s
user    22m33.524s
sys     46m15.834s

real    2m13.372s
user    22m41.626s
sys     46m24.667s

After:

real    1m18.318s
user    1m20.406s
sys     0m6.543s

real    1m17.703s
user    1m19.828s
sys     0m6.512s

Interestingly, before this commit all CPUs were being used by the computation (and mostly thrashed by the kernel); take a look at the user and sys times. After this change, only one thread is used.

I did not check out the whole git repository, just downloaded the current version of this file, so the rest of SciPy is 1.1.0, with NumPy 1.15.4.
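The user/sys times above suggest BLAS-level multithreading was thrashing across cores on the old code path. One way to test that hypothesis is to pin the thread pools before numpy (and its BLAS) is imported; OMP_NUM_THREADS and MKL_NUM_THREADS are standard OpenMP/MKL knobs, though which one applies depends on the BLAS build:

```python
import os

# Must be set before numpy's BLAS is loaded to take effect.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
from scipy import stats

values = np.random.RandomState(0).normal(size=(2, 500))
kernel = stats.gaussian_kde(values)
print(kernel(values[:, :5]))
```

If the pinned run matches the post-change wall time, the slowdown was contention in the threaded matrix routines rather than extra arithmetic.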
