Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pythran version of scipy.optimize._group_columns #13336

Merged
merged 1 commit into from
Jan 23, 2021

Conversation

serge-sans-paille
Copy link
Contributor

No description provided.

@serge-sans-paille serge-sans-paille force-pushed the feature/pythran-group branch 9 times, most recently from 2762c33 to afd55c0 Compare January 6, 2021 05:50
@serge-sans-paille
Copy link
Contributor Author

serge-sans-paille commented Jan 6, 2021

@rgommers this one looks good. The CI issue seems unrelated. My local benchmarks give interesting speedup

$ python -m timeit -s 'from numpy import array; n = 200; m = n - 12; x = array(range(n)); y = array(range(12, 12 +n)); xy = array(range(n*n)).reshape((n,n)); from _group_columns import group_sparse as gs, group_dense as gd;' 'gd(n, m, xy)'

pythran: 10 loops, best of 3: 115 usec per loop
cython: 10 loops, best of 3: 79.2 msec per loop

@serge-sans-paille
Copy link
Contributor Author

@rgommers gentle ping ;-)

@rgommers rgommers added enhancement A new feature or improvement scipy.optimize labels Jan 13, 2021
@rgommers
Copy link
Member

Thanks for the ping, and sorry for the delay @serge-sans-paille. I'm kind of distracted by a proposal deadline until the 19th.

That's a massive speedup, guess there's a serious problem in the Cython code somehow. The code looks like a correct line-by-line translation, it's not clear to me why there should be a ~700x performance difference.

@rgommers
Copy link
Member

I can't reproduce that. I get the same result for Pythran:

>>> n = 200
>>> m = n - 12
>>> xy = np.arange(n**2).reshape((n, n))
>>> n = 200
>>> m = n - 12
>>> x = np.arange(200)
>>> y = np.arange(12, 12 + n)
>>> xy = np.arange(n**2).reshape((n, n))
>>> from scipy.optimize._group_columns import group_dense
>>> %timeit group_dense(n, m, xy)
115 µs ± 444 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Cython code gives an exception for that, it only works with int32 xy:

>>> n = 200
>>> m = n - 12
>>> x = np.arange(200)
>>> y = np.arange(12, 12 + n)
>>> xy = np.arange(n**2).reshape((n, n)).astype(np.int32)
>>> from scipy.optimize._group_columns import group_dense
>>> %timeit group_dense(n, m, xy)
135 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Then if I change Pythran to int32:

>>> xy = np.arange(n**2).reshape((n, n)).astype(np.int32)
>>> %timeit group_dense(n, m, xy)
75.6 µs ± 429 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So the performance gain is close to 2x. Looks like you swapped the numbers and wrote msec instead of usec?

@serge-sans-paille
Copy link
Contributor Author

Oh, got it, I was using Python Int in my setup, which probably caused my Cython measurement to be meaningless. I've updated the Pythran export line to only accept C int, but it fails on windows I need to have both intc and int. I guess a speedup of x2 is already good ;-)

@serge-sans-paille
Copy link
Contributor Author

@rgommers is there anything I should do for that one?

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. This function is used only in least_squares it looks like. I did some quick timings on the first example in its docstring (Rosenbrock function with trf method), and the speedup is ~8%.

In it goes, thanks @serge-sans-paille

@rgommers rgommers merged commit 764f104 into scipy:master Jan 23, 2021
@rgommers rgommers added this to the 1.7.0 milestone Jan 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.optimize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants