Add byte quantization for float vectors in HNSW #102093
Conversation
Force-pushed from a950a4f to 2176ee2 (compare).
Pinging @elastic/es-search (Team:Search)
This looks awesome!!
As of 8.12 the default <<dense-vector-element-type,`element_type`>> is `float`. But this can be
automatically quantized during index time through the <<dense-vector-quantization,`quantization`>>. Quantization will
reduce the required memory by 4x, but it will also reduce the precision of the vectors. For `float` vectors with
Can we link to any information (blog maybe?) here that can explain the space/precision tradeoffs?
server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java (outdated comment; resolved)
similarity: l2_norm
index_options:
  type: hnsw
  quantization_options:
Do we want to add any tests validating errors if we try to create quantized indices with unsupported values?
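For illustration, one such test could assert that an unsupported combination is rejected at mapping time. A hypothetical mapping like the following (index and field names are made up; quantization is only meant for `float` vectors, so pairing it with `element_type: byte` would presumably fail with a mapper parsing error):

PUT bad-quantization
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "element_type": "byte",
        "dims": 2,
        "index": true,
        "index_options": {
          "type": "hnsw",
          "quantization_options": {
            "type": "byte"
          }
        }
      }
    }
  }
}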
…add-int8-quantization
That's great @benwtrent!
I left some minor comments.
We'll also need to update the tune-knn-search docs, but that can be done in a follow-up.
[discrete]
=== Reduce vector memory footprint

As of 8.12 the default <<dense-vector-element-type,`element_type`>> is `float`. But this can be
The docs are per-version, so I'm not sure it's worth mentioning 8.12?
DOH! for sure, docs are already versioned!
in the index. This allows you to use the original `float` vectors for re-scoring, but the `byte` vectors for
indexing.

To use quantization, you must provide a `quantization_params` object in the `dense_vector` mapping.
It should refer to `quantization_options`?
yep!
"element_type": "float", | ||
"dims": 2, | ||
"index": true, | ||
"quantization_params": { |
same here?
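For clarity, that hunk would presumably become (a sketch of the key rename only):

"element_type": "float",
"dims": 2,
"index": true,
"quantization_options": {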
 if (mNode == null) {
-    throw new MapperParsingException("[index_options] of type [hnsw] requires field [m] to be configured");
+    mNode = Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN;
We probably need a test that configures `m` and not `ef_construction`, since it was not possible before this change?
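For illustration, such a test could index with a mapping along these lines (hypothetical index and field names; `ef_construction` deliberately omitted so it falls back to the new default):

PUT m-only
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 2,
        "index": true,
        "similarity": "l2_norm",
        "index_options": {
          "type": "hnsw",
          "m": 32
        }
      }
    }
  }
}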
Did some rally tests (this is without force-merging). [benchmark results omitted from this capture] You can see how script score over all the vectors hits many page faults as the data doesn't fit in memory. Also note all numbers reflect going from California (where the test cluster was located) to the east coast (where my rally machine is).
LGTM!
 @@ -67,6 +67,7 @@ public CartesianShapeValue() {
         super(CoordinateEncoder.CARTESIAN, CartesianPoint::new);
     }

+    @SuppressWarnings("this-escape")
looks unrelated?
 @@ -70,6 +70,7 @@ public GeoShapeValue() {
         this.tile2DVisitor = new Tile2DVisitor();
     }

+    @SuppressWarnings("this-escape")
same here?
Again, this is something that I fixed in main but is breaking my local build. For sanity, I fixed it here. It will be a no-op once main is merged in here.
},
"fields": [ "title" ],
"rescore": {
  "window_size": 10,
I think the `window_size` should be set to 15 to match the intent with the `k` value above?
It honestly doesn't matter. I would think you would get a larger k, and then rerank some subset of those.
 @@ -31,13 +31,15 @@ public class SimulateIndexResponse extends IndexResponse {
     private final BytesReference source;
     private final XContentType sourceXContentType;

+    @SuppressWarnings("this-escape")
Unrelated?
It will go away when we merge main, but it is breaking my local testing.
Very nice work!
Curiosity question: is there anything in the way of making this the default?
What would the API look like to disable it? If the user provides any HNSW params and doesn't specify quantization, does this mean we should default?
Ideally defaults would reflect the knn search tuning guide, so my current thinking is to have something like:

Alternatively, we could have a separate

PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "hnsw",
          "quantization": {
            "enabled": true,
            "type": "byte"
          }
        }
      }
    }
  }
}
I wonder if specialising
We already have parameters at this level such as
This is a great improvement.
I left some nit-picky comments on the docs, mostly to do with my own confusion: this is not an easy concept to grasp, and I believe a tightening of the wording will help folks with it.
What I struggle with is the use of `byte` while there is already a `byte` `element_type`. How do we want users to think about this: quantization is just an implementation detail when using the `float` `element_type` that improves runtime memory footprint, OR quantization is somehow coercing floats to bytes, and the user now needs to think of bytes? I would think the former, which kinda relates to other comments in this thread; maybe refer to the quantization as `int8` or a 1-byte integer value? Then users of `float` do not need to think of the `byte` `element_type`.
// TEST[s/"num_candidates": 100/"num_candidates": 3/]

Since the original `float` vectors are still retained in the index, you can optionally use them for re-scoring. This will
do the heavy query against the indexed vectors, and then you can get the absolute nearest neighbors by re-scoring.
"heavy query" implies no memory footprint improvements, right? The original float vectors are loaded.
Docs here are unclear, obviously. I mean to say that the expensive query is done via approximate search against the smaller footprint.
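For illustration, re-scoring quantized kNN hits against the original `float` vectors might look roughly like this (a sketch with hypothetical index/field names and values; the exact snippet from the docs is not reproduced in this capture):

POST my-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [ 0.5, 10 ],
    "k": 15,
    "num_candidates": 100
  },
  "fields": [ "title" ],
  "rescore": {
    "window_size": 15,
    "query": {
      "rescore_query": {
        "script_score": {
          "query": { "match_all": {} },
          "script": {
            "source": "1.0 / (1.0 + l2norm(params.query_vector, 'my_vector'))",
            "params": { "query_vector": [ 0.5, 10 ] }
          }
        }
      }
    }
  }
}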
I would not be comfortable making it the default right away. I really want more testing before we do this. But, we should indeed design the API in a way that makes this possible. For @jpountz 's option,
For @jimczi 's option, where we add a new index type, that could also work. For setting confidence interval, it would be
For things that don't support that parameter, we would throw. What do you think @jpountz ^
For things in the future, @ChrisHegarty 's comment about
Yes, I should update the quantization value name from
The question that this raises to me is whether "flat" should be considered as a special index type, or as a lack of index. I had initially assumed that flat storage would be enabled by setting
We need to support quantization over various indexing methodologies, including "flat". I think using "index: false" as a flat index is a mistake. It should only be useful for scripting. It is really weird to have "index: false" and then configure what you are "not" indexing 🤦. I think we need a new "flat" index kind that requires the similarity to be configured, and allows quantization. Admittedly, just "flat" isn't much better than "index: false", but it will clean up the query API really nicely (as we know the similarity that's configured). Another option is to have quantization live as a top-level thing. But all this configuration is getting unwieldy. I am starting to like @jimczi's suggestion more and more.
(Optional, object)
An optional section that configures quantization. Quantization is used to reduce the memory footprint of the
index. Only `byte` quantization is currently supported and can only be configured if
Do you think we should add a sentence about the disk usage increase with `quantization_options`?
@mayya-sharipova what do you think of @jimczi's suggested API? Adding a new "int8_hnsw" index type.
So if I read your suggestion correctly, we'd treat
Awesome, we are in agreement then. Moving to use `int8_hnsw`. I will update the PR when I can :D.
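For illustration, the agreed-upon mapping shape would presumably look something like this (a sketch only; the index name is hypothetical and the final parameter set was still being settled in this thread):

PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}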
+1 on @jimczi's proposal. +1 also to add

"similarity": "l2_norm",
"index_options": {
  "type": "flat"
}
LGTM, thanks for the iteration
Thanks @benwtrent, great work. New changes to `index_options` look good as well.
LGTM
Merged commit f00364a into elastic:lucene_snapshot.
Adds new `quantization_options` to `dense_vector`. This allows for vectors to be automatically quantized to `byte` when indexed.

Example: (the snippet was not captured in this copy; see the sketch below)

When querying, the query vector is automatically quantized and used when querying the HNSW graph. This reduces the memory required to only 25% of what was previously required for `float` vectors, at a slight loss of accuracy.

This is currently only available when `index: true` and when using `hnsw`.
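A sketch of what such a mapping might look like, assembled from the fragments quoted in the review threads above (the index name is hypothetical, and the exact option layout was still under discussion in this PR):

PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "element_type": "float",
        "dims": 2,
        "index": true,
        "index_options": {
          "type": "hnsw",
          "quantization_options": {
            "type": "byte"
          }
        }
      }
    }
  }
}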