ENH: Up-to 200x Faster SIMD-Accelerated Distance Functions #19454
Comments
Sounds exciting! Which SciPy version did you use in your benchmarks? Many of SciPy's distance metrics were recently reimplemented in C++ (version 1.11.0).
Thanks for this proposal @ashvardanian, this is quite interesting. An optional dependency indeed seems like a good idea to me, since there's a lot of interest in making these distance functions faster (also Cc @Micky774 who worked on accelerating distance functions in scikit-learn). My main question right now is about dtype support - from the README it seems that |
@rgommers, valid point! I wasn't originally expecting any gains for double-precision functions, but it turns out they are quite significant. @dschmitz89, I compared against the most recent version available at the time, which is now 1.11.4. I'm attaching the results of my latest commit, benchmarked on a Sapphire Rapids CPU.
Benchmarking SimSIMD vs. SciPy
Between 2 Vectors, Batch Size: 1
Between 2 Vectors, Batch Size: 1,000
Porting to SVE-powered Graviton 3 chips also yields good results.
Benchmarking SimSIMD vs. SciPy
Between 2 Vectors, Batch Size: 1
Between 2 Vectors, Batch Size: 1,000
@rgommers and @dschmitz89, hi 👋 Small update: with SimSIMD v4 and newer, return values are now also 64-bit floats, similar to SciPy, and more input types are supported in dot-products - including I've also changed the dynamic dispatch strategy. Aside from the serial and Arm backends, on x86 I now differentiate Haswell, Skylake, Ice Lake, and Sapphire Rapids. The older CPUs got a noticeable speed bump in some workloads, especially the bit-level Hamming and Jaccard distances. I am also now covering a broader build matrix, including all Python versions supported by PyPI (105 in total). That's more than NumPy (35) and SciPy (24) combined, so it should be easy to integrate if more performance is needed 🤗
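For readers unfamiliar with the "bit-level" distances mentioned above, here is a minimal pure-Python sketch of Hamming and Jaccard distances over bit-packed vectors. It is only an illustration of the definitions, not SimSIMD's implementation: the function names `popcount`, `hamming_bits`, and `jaccard_bits` are hypothetical, and returning 0.0 when both inputs are all-zero is a convention assumed here.

```python
def popcount(x: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length packed vectors differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return popcount(int.from_bytes(a, "big") ^ int.from_bytes(b, "big"))


def jaccard_bits(a: bytes, b: bytes) -> float:
    """Jaccard distance: 1 - |intersection| / |union| over the set bits."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    x = int.from_bytes(a, "big")
    y = int.from_bytes(b, "big")
    union = popcount(x | y)
    if union == 0:
        # Assumed convention: two empty sets are treated as identical.
        return 0.0
    return 1.0 - popcount(x & y) / union


a = bytes([0b11110000])
b = bytes([0b10101010])
print(hamming_bits(a, b))   # XOR is 0b01011010, which has 4 set bits
print(jaccard_bits(a, b))   # 2 shared bits out of 6 in the union
```

A SIMD backend would compute the same XOR/AND/OR plus popcount steps over wide registers rather than Python integers, which is where CPU-specific dispatch can pay off.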
Introduction:
SciPy's spatial distance computations are fundamental to various scientific and data science tasks. To accelerate these computations, it is imperative to ensure that the underlying math operations are optimized for modern hardware.
Background:
Currently, SciPy relies on NumPy for math operations, which in turn depends on the underlying BLAS implementation. However, these BLAS libraries might not be fully optimized for the latest hardware, which can limit performance.
The SimSIMD Solution:
I've developed a low-level library called SimSIMD, which provides accelerated implementations of commonly used distance functions. Notably, SimSIMD is already in use by projects like USearch and ClickHouse, and is an optional backend in LangChain. The library boasts specialized backends for:
These cover most CPUs produced in the past decade, offering potential speed-ups.
Evidence:
While SimSIMD is not a complete replacement for SciPy's API, it focuses on the most used distance functions. Further functions can be incorporated based on the community's feedback.
Proposal:
Given the potential benefits, I propose to consider integrating SimSIMD as an optional backend in SciPy, as suggested on the SciPy Slack. This would offer users an optimized pathway for spatial distance computations on modern hardware platforms.
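To make the "optional backend" idea concrete, here is a minimal sketch of how a caller might prefer SimSIMD when it is installed and fall back otherwise. This is not how SciPy is structured internally; `select_backend` and `cosine_pure` are hypothetical names, and the sketch assumes that `simsimd.cosine` accepts two NumPy arrays and returns a scalar for a single pair of vectors.

```python
import math


def cosine_pure(a, b):
    """Plain-Python cosine distance: 1 - (a . b) / (|a| * |b|).

    Raises ZeroDivisionError if either vector has zero norm.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_backend():
    """Return (name, function) for the fastest available cosine distance."""
    try:
        import numpy as np
        import simsimd  # optional accelerated backend

        def cosine_simsimd(a, b):
            # Assumption: simsimd.cosine takes two buffers of the same dtype.
            u = np.asarray(a, dtype=np.float64)
            v = np.asarray(b, dtype=np.float64)
            return float(simsimd.cosine(u, v))

        return "simsimd", cosine_simsimd
    except ImportError:
        pass

    try:
        from scipy.spatial import distance

        return "scipy", lambda a, b: float(distance.cosine(a, b))
    except ImportError:
        pass

    # Last resort: no third-party packages are available.
    return "python", cosine_pure


backend_name, cosine_distance = select_backend()
print(backend_name, cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Resolving the backend once at import time keeps the per-call overhead to a single function call, which matters for the batch-size-1 case where dispatch cost can dominate.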