Half-precision float vector metrics #4122
…r vector distance metrics.
I don't feel too confident about including ASM files in the project as-is.
@TheQuantumFractal could you please elaborate on why you decided to do it like this instead of directly linking C?
FYI, in the quantization repo we have an example: https://github.com/qdrant/quantization/tree/master/quantization/cpp
If you think ASM is strictly necessary, could you please include instructions on how to generate it from C?
Yes, there isn't really a reason to include asm files. Directly linking C is a better solution. I can just set up linking for neon in qdrant/lib/segment then?
That would help, thanks!
@TheQuantumFractal you can find an example of how to integrate C code here:
https://github.com/qdrant/quantization/blob/master/quantization/build.rs
Please use this example, because it already solves the cross-compilation issues (for instance, building a binary for an ARM target on an x64 host).
Added build.rs to link the C file.
Also, it would be nice to add f16 scoring benchmarks. You can add them here, where byte scoring is defined:
https://github.com/qdrant/qdrant/blob/dev/lib/segment/benches/metrics.rs
#[cfg(target_arch = "x86_64")]
{
    if is_x86_feature_detected!("avx")
        && is_x86_feature_detected!("fma")
Check if f16c is supported
Do you have a suggestion of how to make this check?
is_x86_feature_detected!("f16c")
Resolved these.
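A minimal sketch of the runtime check discussed above, assuming a dispatcher that chooses between an AVX path, an SSE path, and a scalar fallback. The function name `select_f16_path`, its parameters, and the returned labels are hypothetical and only illustrate where the `f16c` check could sit next to the existing `avx`/`fma` and `sse` checks; this is not the code from the PR.

```rust
/// Names the code path a half-precision metric could take on this machine.
/// Hypothetical helper for illustration; the labels are not from the PR.
pub fn select_f16_path(len: usize, min_dim_size_simd: usize) -> &'static str {
    // Short vectors are not worth the SIMD setup cost.
    if len < min_dim_size_simd {
        return "scalar";
    }

    #[cfg(target_arch = "x86_64")]
    {
        // F16C is required for the f16 -> f32 conversion, so it is
        // checked together with the arithmetic features.
        if is_x86_feature_detected!("avx")
            && is_x86_feature_detected!("fma")
            && is_x86_feature_detected!("f16c")
        {
            return "avx+fma+f16c";
        }
    }

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse") && is_x86_feature_detected!("f16c") {
            return "sse+f16c";
        }
    }

    // Any other architecture, or an x86 CPU without F16C.
    "scalar"
}
```

Calling `select_f16_path(v1.len(), MIN_DIM_SIZE_SIMD)` would then return `"scalar"` on a CPU without F16C instead of reaching an instruction the CPU cannot execute.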
    }
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
Check if f16c is supported
fn similarity(v1: &[VectorElementTypeHalf], v2: &[VectorElementTypeHalf]) -> ScoreType {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx")
Check if f16c is supported

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
{
    if is_x86_feature_detected!("sse") && v1.len() >= MIN_DIM_SIZE_SIMD {
Check if f16c is supported
fn similarity(v1: &[VectorElementTypeHalf], v2: &[VectorElementTypeHalf]) -> ScoreType {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx")
Check if f16c is supported
float16x8_t sum2 = vdupq_n_f16(0.0f);
float16x8_t sum3 = vdupq_n_f16(0.0f);
float16x8_t sum4 = vdupq_n_f16(0.0f);
uint32_t i = 0;
Why do you want to define the iterator here instead of in for (int i, ...)?
Moved the iterator into the for loop.
float16x8_t sub3 = vdupq_n_f16(0.0f);
float16x8_t sum4 = vdupq_n_f16(0.0f);
float16x8_t sub4 = vdupq_n_f16(0.0f);
uint32_t i = 0;
Why not inside the for definition?
float16x8_t sum2 = vdupq_n_f16(0.0f);
float16x8_t sum3 = vdupq_n_f16(0.0f);
float16x8_t sum4 = vdupq_n_f16(0.0f);
uint32_t i = 0;
Why not inside the for definition?
float32_t tmp = 0.0f;
for (i=0; i < (blockSize % 32); i++) {
    tmp = (*pSrcA - *pSrcB);
    manhattanDistance += tmp > 0 ? tmp : -tmp;
Why not abs instead?
Using the ARM f16 abs operation now.
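The blocked loop with a scalar tail and an explicit abs can be sketched in Rust as follows. This is only an illustration, not the Neon C code from the PR: it assumes f32 inputs instead of f16, a block size of 32 to mirror `blockSize % 32`, and four partial sums to mirror `sum1` to `sum4`. The name `manhattan_distance` is hypothetical.

```rust
/// Manhattan (L1) distance with a blocked main loop and a scalar tail.
/// Hypothetical sketch: f32 inputs stand in for the f16 data of the PR.
pub fn manhattan_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    const BLOCK: usize = 32;

    let a_chunks = a.chunks_exact(BLOCK);
    let b_chunks = b.chunks_exact(BLOCK);
    // The last `len % BLOCK` elements are handled by the scalar tail.
    let (a_tail, b_tail) = (a_chunks.remainder(), b_chunks.remainder());

    // Four independent partial sums, as in the reviewed snippet.
    let mut acc = [0.0f32; 4];
    for (ca, cb) in a_chunks.zip(b_chunks) {
        for (i, (x, y)) in ca.iter().zip(cb).enumerate() {
            acc[i % 4] += (x - y).abs();
        }
    }

    let mut sum: f32 = acc.iter().sum();
    // Scalar tail: `abs` replaces the `tmp > 0 ? tmp : -tmp` ternary.
    for (x, y) in a_tail.iter().zip(b_tail) {
        sum += (x - y).abs();
    }
    sum
}
```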
#[target_feature(enable = "avx")]
#[target_feature(enable = "fma")]
#[target_feature(enable = "f16c")]
pub(crate) unsafe fn euclid_similarity_avx(
One comment that applies to all the names: euclid_similarity_avx already exists. Please rename euclid_similarity_avx to avx_euclid_similarity_half, following the naming used for the byte type:
https://github.com/qdrant/qdrant/blob/dev/lib/segment/src/spaces/metric_uint/avx2/euclid.rs#L9
Please do this for all SIMD functions.
Done
@TheQuantumFractal please rebase onto the latest dev, CI is red.
Hey @TheQuantumFractal, thanks a lot for the contribution! We will take it from here and finish the integration as a separate PR.
Sounds good! Happy to help :)
* Adding half-precision floating point SIMD-optimized implementation for vector distance metrics.
* Primitives adjustment
* Remove ds store
* Load assembly only for neon
* Fixing linter errors
* Adding float16 type
* Addressing f16 comments
* Refactoring and adding benchmarks
* Renaming simd functions
* Cleaning openapi
* Merging in changes to dev
* Fixing linter error
* fix clippy
* disable float16 feature in API

---------

Co-authored-by: generall <andrey@vasnetsov.com>
All Submissions:
… dev branch. Did you create your branch from dev?
New Feature Submissions:
… cargo +nightly fmt --all command prior to submission?
… cargo clippy --all --all-features command?
Changes to Core Features:
I built out a SIMD implementation, with tests, for Neon, AVX2, and SSE2, covering the euclidean, manhattan, and dot similarity metrics. Note that float16 SIMD operations are not supported on most ISAs: ARM32/64 processors can handle them, and AVX512 recently announced some hardware support, but most machines on AVX2 and SSE2 do not support them. F16C is an x86 instruction set extension, available on most modern x86 machines, that converts between half- and single-precision floating point formats. So to run the metrics on AVX2 or SSE2, f16 vectors need to be converted to f32 and then processed with f32 SIMD, and my implementations do exactly that. I also wrote a separate C / assembly file that enables Neon f16 SIMD operations, since Rust does not currently support ARM f16 SIMD operations.
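The convert-then-accumulate approach described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: half-precision values are stored as raw `u16` bit patterns (the PR uses `VectorElementTypeHalf`), and the names `f16_bits_to_f32`, `dot_f16_scalar`, `dot_f16_avx`, and `dot_f16` are hypothetical. The intrinsics `_mm256_cvtph_ps` and `_mm256_fmadd_ps` are the standard F16C and FMA intrinsics from `std::arch::x86_64`.

```rust
/// Software f16 -> f32 conversion of raw IEEE 754 binary16 bits (fallback).
pub fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let man = (h & 0x03ff) as u32;
    match (exp, man) {
        (0, 0) => f32::from_bits(sign), // signed zero
        (0, _) => {
            // Subnormal: value = man * 2^-24
            let v = man as f32 * (1.0 / 16_777_216.0);
            if sign != 0 { -v } else { v }
        }
        (0x1f, _) => f32::from_bits(sign | 0x7f80_0000 | (man << 13)), // inf / NaN
        // Normal number: rebias the exponent from 15 to 127.
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (man << 13)),
    }
}

/// Scalar dot product: convert each element to f32, then multiply and add.
pub fn dot_f16_scalar(a: &[u16], b: &[u16]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| f16_bits_to_f32(x) * f16_bits_to_f32(y))
        .sum()
}

/// AVX path: F16C widens 8 halves to 8 floats, FMA accumulates in f32.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
#[target_feature(enable = "fma")]
#[target_feature(enable = "f16c")]
unsafe fn dot_f16_avx(a: &[u16], b: &[u16]) -> f32 {
    use std::arch::x86_64::*;
    let n = a.len() / 8 * 8; // largest multiple of 8 that fits
    unsafe {
        let mut acc = _mm256_setzero_ps();
        let mut i = 0;
        while i < n {
            let ha = _mm_loadu_si128(a.as_ptr().add(i) as *const __m128i);
            let hb = _mm_loadu_si128(b.as_ptr().add(i) as *const __m128i);
            acc = _mm256_fmadd_ps(_mm256_cvtph_ps(ha), _mm256_cvtph_ps(hb), acc);
            i += 8;
        }
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        // Horizontal sum of the lanes plus the scalar tail.
        lanes.iter().sum::<f32>() + dot_f16_scalar(&a[n..], &b[n..])
    }
}

/// Picks the AVX path only when the CPU reports avx, fma, and f16c.
pub fn dot_f16(a: &[u16], b: &[u16]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx")
            && is_x86_feature_detected!("fma")
            && is_x86_feature_detected!("f16c")
        {
            return unsafe { dot_f16_avx(a, b) };
        }
    }
    dot_f16_scalar(a, b)
}
```

Accumulating in f32 rather than f16 is what makes this workable on AVX2 and SSE2 hardware: only the conversion step needs F16C, and the arithmetic reuses ordinary f32 SIMD.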
The AVX2 / SSE2 SIMD was tested on an Intel(R) Xeon(R) CPU, while the Neon SIMD was tested on an Apple M1 Pro.
As for cosine similarity, the current preprocessing step accepts float32 DenseVectors and simply normalizes them. float16 vectors can be normalized in the same way, by computing the dot product of the vector with itself using the dot similarity SIMD implementation. After preprocessing, the metric itself uses the same SIMD dot similarity implementation.
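A minimal sketch of that idea, assuming plain f32 slices in place of float16 vectors and a scalar `dot` in place of the SIMD dot similarity. The names `dot`, `normalize`, and `cosine` are hypothetical and do not come from the qdrant code base.

```rust
/// Dot product; stands in for the SIMD dot similarity implementation.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Preprocessing step: scale the vector to unit length, with the squared
/// norm computed as the dot product of the vector with itself.
pub fn normalize(v: &[f32]) -> Vec<f32> {
    let norm_sq = dot(v, v);
    if norm_sq <= f32::EPSILON {
        // A zero vector cannot be normalized; return it unchanged.
        return v.to_vec();
    }
    let inv_norm = 1.0 / norm_sq.sqrt();
    v.iter().map(|x| x * inv_norm).collect()
}

/// After preprocessing, cosine similarity is just the dot product.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    dot(&normalize(a), &normalize(b))
}
```

In a real index the normalization would run once at insertion time, so each query comparison costs only one dot product.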
/claim #4110