-
Notifications
You must be signed in to change notification settings - Fork 24.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make cosine similarity faster by storing magnitude and normalizing vectors #99445
Make cosine similarity faster by storing magnitude and normalizing vectors #99445
Conversation
Hi @benwtrent, I've created a changelog YAML for you. |
Pinging @elastic/es-search (Team:Search) |
Did a comparison of this vs. regular Its weird how on search its continuously slightly slower (ran multiple times to confirm). Also the 99.9+% latency on index append is crazy! All these vectors are already normalized. So, I am not 100% sure what could be causing this issue. The comparison should be pretty near 1-to-1 when using this PR with
|
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its weird how on search its continuously slightly slower
On which task in particular are you seeing a slowdown? From a quick look performance looks similar on the baseline and candidate?
if (magnitudeIn == null) { | ||
return false; | ||
} | ||
int currentDoc = magnitudeIn.docID(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we used advanceExact()
we could simplify this logic to just do something like
if (magnitudeIn.advanceExact(docId)) {
magnitude = Float.intBitsToFloat((int) magnitudeIn.longValue());
} else {
magnitude = 1f;
}
In bencharks I ran a few years ago, advanceExact()
performed noticeably faster than advance
because it just needs to check if a doc exists, not compute the next doc.
@jpountz my main concern is Here is another run, baseline is
|
…sticsearch into feature/make-cosine-faster
Thinking about next steps out loud: if we do that then we could deprecate |
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/index/mapper/vectors/NormalizedCosineFloatVectorValues.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not competent to review all the bits that this PR touches but the change makes sense to me.
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/index/mapper/vectors/NormalizedCosineFloatVectorValues.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/elasticsearch/index/mapper/vectors/NormalizedCosineFloatVectorValues.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benwtrent Thanks Ben, great work!
PR elastic#99445 introduced automatic normalization of dense vectors with cosine similarity. This adds a note about this in the documentation. Relates to elastic#99445
PR elastic#99445 introduced automatic normalization of dense vectors with cosine similarity. This adds a note about this in the documentation. Relates to elastic#99445
PR elastic#99445 introduced automatic normalization of dense vectors with cosine similarity. This adds a note about this in the documentation. Relates to elastic#99445
PR elastic#99445 introduced automatic normalization of dense vectors with cosine similarity. This adds a note about this in the documentation. Relates to elastic#99445
cosine
is our default similarity and should provide a good experience on speed.dot_product
is faster thancosine
as it doesn't require calculating vector magnitudes in the similarity comparison loop. Instead, it can assume vectors have a length of1
and use an optimizeddot_product
calculation.However,
cosine
as it exists today accepts vectors of any magnitude and cannot take advantage of this.This commit addresses this by:
cosine
!= 1
).dot_product
Lucene calculationcosine
fields