
Code speedup due to faster einsum replacement #23

Merged 1 commit on Sep 9, 2023

Conversation

winedarksea
Contributor

I use your simdkalman in my AutoTS project; thanks for the work, it is excellent code.
While running the scalene profiler on my code base, it identified a possible performance improvement and suggested a speedup. The credit for this really goes to GPT-4.

I've tested the replacement: it is faster and functionally identical for my use case. I haven't tested it across your entire code base's range of inputs and outputs, though, so there may be some hidden catch that I am unaware of.

Simple speed comparison:

%timeit np.einsum("...ij,...kj->...ik", A, B)
31 ms ± 967 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.matmul(A, np.swapaxes(B, -1, -2))
7.61 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It's hard to quantify the bigger picture, but in my use case the change makes the function about 10% faster overall.
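For reference, the two expressions compute the same batched A @ B^T contraction, so swapping one for the other should not change results beyond floating-point rounding. A minimal equivalence check (the array shapes here are hypothetical; the PR does not state which sizes were timed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 8, 8))
B = rng.standard_normal((1000, 8, 8))

# Original form: batched contraction over the last axis of both operands.
ref = np.einsum("...ij,...kj->...ik", A, B)

# Proposed replacement: the same contraction as an explicit batched matmul
# against B with its last two axes transposed.
fast = np.matmul(A, np.swapaxes(B, -1, -2))

print(np.allclose(ref, fast))  # → True
```

Timing the two forms (e.g. with `%timeit` as above) on the actual array shapes used in the model is the most reliable way to confirm the speedup on a given machine.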

@oseiskar
Owner

oseiskar commented Sep 3, 2023

Thank you! This seems like a great performance tweak. I see no immediate reason to prefer einsum over swapaxes here. The original design goal was simply to ensure that all relevant methods use vectorized NumPy operations; I did not explore very thoroughly which NumPy operations would be fastest once the first working one was found.

Speculating about the performance: it can be critical which BLAS/LAPACK operation this eventually translates to under the hood of NumPy, and if the answer is "none", that is bad too. The swapaxes version probably translates directly to a sequence of very efficient matrix multiplications where the LHS is stored in row-major and the RHS in column-major order, while einsum could fall back to a generic C loop.
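One detail supporting this: `np.swapaxes` returns a strided view rather than a copy, so the transpose itself is free, and `matmul` receives the original buffer with swapped strides, which a GEMM-style kernel can consume directly. A small sketch (shapes chosen arbitrarily for illustration):

```python
import numpy as np

B = np.zeros((1000, 8, 8))
Bt = np.swapaxes(B, -1, -2)

# The swap is a zero-copy view over B's buffer...
print(Bt.base is B)  # → True

# ...whose last two axes are now stored "column-major" relative to B,
# i.e. the view is no longer C-contiguous.
print(Bt.flags['C_CONTIGUOUS'])  # → False
```

So no data movement happens before the multiply; whether einsum ends up on an equally fast path for this particular subscript pattern is the open question the timing above answers empirically.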

I will run some tests and merge this if there are no issues. It may take a week or two since I'm currently a bit swamped.

@oseiskar oseiskar merged commit 9b75ad8 into oseiskar:master Sep 9, 2023
@oseiskar
Owner

oseiskar commented Sep 9, 2023

Worked fine. Merged and released in 1.0.3. Thanks again!
