Add byte quantization for float vectors in HNSW #102093

Member

@benwtrent benwtrent commented Nov 13, 2023

Adds new quantization_options to dense_vector. This allows for vectors to be automatically quantized to byte when indexed.

Example:

PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}

When querying, the query vector is automatically quantized and used when querying the HNSW graph. This reduces the memory required to only 25% of what was previously required for float vectors at a slight loss of accuracy.

This is currently only available when index: true and when using hnsw
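
To make the 25% figure concrete, here is a minimal sketch of scalar quantization from float to one-byte values. It is not the Lucene implementation: the function names, the fixed `[lo, hi]` range and the `0..127` bucket range are assumptions chosen for illustration only.

```python
import struct

def quantize(vector, lo, hi, levels=127):
    """Clamp each float to [lo, hi] and map it to an integer bucket in 0..levels."""
    scale = levels / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vector]

def dequantize(codes, lo, hi, levels=127):
    """Map integer buckets back to approximate float values."""
    step = (hi - lo) / levels
    return [lo + c * step for c in codes]

vec = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes = quantize(vec, lo=-1.0, hi=1.0)
approx = dequantize(codes, lo=-1.0, hi=1.0)

# Storage comparison: 4 bytes per float32 value versus 1 byte per quantized value.
float_size = len(struct.pack(f"{len(vec)}f", *vec))
int8_size = len(bytes(codes))
print(codes, float_size, int8_size)
```

Each quantized value occupies one byte instead of four, which is where the 4x reduction comes from. The price is a rounding error of at most half a bucket width per dimension.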

@benwtrent benwtrent added >feature cloud-deploy Publish cloud docker image for Cloud-First-Testing :Search/Vectors Vector search v8.12.0 labels Nov 13, 2023
@benwtrent benwtrent marked this pull request as ready for review November 14, 2023 14:37
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Nov 14, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

Member

@kderusso kderusso left a comment

This looks awesome!!


As of 8.12 the default <<dense-vector-element-type,`element_type`>> is `float`. But this can be
automatically quantized during index time through the <<dense-vector-quantization,`quantization`>>. Quantization will
reduce the required memory by 4x, but it will also reduce the precision of the vectors. For `float` vectors with
Member

Can we link to any information (blog maybe?) here that can explain the space/precision tradeoffs?

similarity: l2_norm
index_options:
type: hnsw
quantization_options:
Member

Do we want to add any tests validating errors if we try to create quantized indices with unsupported values?

Contributor

@jimczi jimczi left a comment

That's great @benwtrent !
I left some minor comments.
We'll also need to update the tune-knn-search docs but that can be done in a follow up.

[discrete]
=== Reduce vector memory foot-print

As of 8.12 the default <<dense-vector-element-type,`element_type`>> is `float`. But this can be
Contributor

The doc is per version so not sure if it's worth mentioning 8.12?

Member Author

DOH! for sure, docs are already versioned!

in the index. This allows you to use the original `float` vectors for re-scoring, but the `byte` vectors for
indexing.

To use quantization, you must provide a `quantization_params` object in the `dense_vector` mapping.
Contributor

It should refer to quantization_options?

Member Author

yep!

"element_type": "float",
"dims": 2,
"index": true,
"quantization_params": {
Contributor

same here?

if (mNode == null) {
-    throw new MapperParsingException("[index_options] of type [hnsw] requires field [m] to be configured");
+    mNode = Lucene99HnswVectorsFormat.DEFAULT_MAX_CONN;
Contributor

We probably need a test that configures m and not ef_construction since it was not possible before this change?

@benwtrent
Member Author

Did some Rally tests (without force-merging) on the so_vector dataset. It has 2M float32 vectors of 768 dimensions, requiring about 6240000000 bytes (~6GB) of off-heap memory if using float. Quantized to byte, it only requires about 1.5GB.

You can see how script score over all the vectors hits many page faults as the data doesn't fit in memory.

Also note all numbers reflect going from California (where the test cluster was located) to east coast (where my rally machine is).
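
As a rough check of the memory figures above, the following sketch computes raw vector storage only. It assumes exactly 2,000,000 vectors (the byte count quoted above suggests the real dataset is slightly larger) and ignores the HNSW graph and any per-vector metadata; the function name is illustrative.

```python
def vector_bytes(num_vectors, dims, bytes_per_value):
    """Raw storage for the vector values alone, without graph or metadata overhead."""
    return num_vectors * dims * bytes_per_value

NUM_VECTORS = 2_000_000  # assumption: round figure for the so_vector dataset
DIMS = 768

float_bytes = vector_bytes(NUM_VECTORS, DIMS, 4)  # float32: 4 bytes per dimension
int8_bytes = vector_bytes(NUM_VECTORS, DIMS, 1)   # quantized: 1 byte per dimension

print(f"float32: {float_bytes / 1e9:.2f} GB, int8: {int8_bytes / 1e9:.2f} GB")
```

With these assumptions the float vectors need about 6.1GB and the quantized ones about 1.5GB, in line with the numbers reported above.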

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
|                                                         Metric |                                Task |         Value |   Unit |
|---------------------------------------------------------------:|------------------------------------:|--------------:|-------:|
|                                                 Min Throughput |          knn-search-10-50-match-all |  10.06        |  ops/s |
|                                                Mean Throughput |          knn-search-10-50-match-all |  10.63        |  ops/s |
|                                              Median Throughput |          knn-search-10-50-match-all |  10.67        |  ops/s |
|                                                 Max Throughput |          knn-search-10-50-match-all |  11.07        |  ops/s |
|                                        50th percentile latency |          knn-search-10-50-match-all |  78.8566      |     ms |
|                                        90th percentile latency |          knn-search-10-50-match-all |  81.0507      |     ms |
|                                        99th percentile latency |          knn-search-10-50-match-all |  87.6091      |     ms |
|                                       100th percentile latency |          knn-search-10-50-match-all |  87.7286      |     ms |
|                                   50th percentile service time |          knn-search-10-50-match-all |  78.8566      |     ms |
|                                   90th percentile service time |          knn-search-10-50-match-all |  81.0507      |     ms |
|                                   99th percentile service time |          knn-search-10-50-match-all |  87.6091      |     ms |
|                                  100th percentile service time |          knn-search-10-50-match-all |  87.7286      |     ms |
|                                                     error rate |          knn-search-10-50-match-all |   0           |      % |
|                                                 Min Throughput |        script-score-query-match-all |   1.66        |  ops/s |
|                                                Mean Throughput |        script-score-query-match-all |   1.68        |  ops/s |
|                                              Median Throughput |        script-score-query-match-all |   1.68        |  ops/s |
|                                                 Max Throughput |        script-score-query-match-all |   1.69        |  ops/s |
|                                        50th percentile latency |        script-score-query-match-all | 573.453       |     ms |
|                                        90th percentile latency |        script-score-query-match-all | 587.929       |     ms |
|                                        99th percentile latency |        script-score-query-match-all | 618.025       |     ms |
|                                       100th percentile latency |        script-score-query-match-all | 653.512       |     ms |
|                                   50th percentile service time |        script-score-query-match-all | 573.453       |     ms |
|                                   90th percentile service time |        script-score-query-match-all | 587.929       |     ms |
|                                   99th percentile service time |        script-score-query-match-all | 618.025       |     ms |
|                                  100th percentile service time |        script-score-query-match-all | 653.512       |     ms |
|                                                     error rate |        script-score-query-match-all |   0           |      % |
|                                                 Min Throughput |   knn-search-10-50-acceptedAnswerId |  10.27        |  ops/s |
|                                                Mean Throughput |   knn-search-10-50-acceptedAnswerId |  10.41        |  ops/s |
|                                              Median Throughput |   knn-search-10-50-acceptedAnswerId |  10.42        |  ops/s |
|                                                 Max Throughput |   knn-search-10-50-acceptedAnswerId |  10.52        |  ops/s |
|                                        50th percentile latency |   knn-search-10-50-acceptedAnswerId |  91.0864      |     ms |
|                                        90th percentile latency |   knn-search-10-50-acceptedAnswerId |  95.1255      |     ms |
|                                        99th percentile latency |   knn-search-10-50-acceptedAnswerId | 102.892       |     ms |
|                                       100th percentile latency |   knn-search-10-50-acceptedAnswerId | 105.84        |     ms |
|                                   50th percentile service time |   knn-search-10-50-acceptedAnswerId |  91.0864      |     ms |
|                                   90th percentile service time |   knn-search-10-50-acceptedAnswerId |  95.1255      |     ms |
|                                   99th percentile service time |   knn-search-10-50-acceptedAnswerId | 102.892       |     ms |
|                                  100th percentile service time |   knn-search-10-50-acceptedAnswerId | 105.84        |     ms |
|                                                     error rate |   knn-search-10-50-acceptedAnswerId |   0           |      % |
|                                                 Min Throughput | script-score-query-acceptedAnswerId |   1.94        |  ops/s |
|                                                Mean Throughput | script-score-query-acceptedAnswerId |   1.95        |  ops/s |
|                                              Median Throughput | script-score-query-acceptedAnswerId |   1.95        |  ops/s |
|                                                 Max Throughput | script-score-query-acceptedAnswerId |   1.95        |  ops/s |
|                                        50th percentile latency | script-score-query-acceptedAnswerId | 506.142       |     ms |
|                                        90th percentile latency | script-score-query-acceptedAnswerId | 521.093       |     ms |
|                                        99th percentile latency | script-score-query-acceptedAnswerId | 544.209       |     ms |
|                                       100th percentile latency | script-score-query-acceptedAnswerId | 552.22        |     ms |
|                                   50th percentile service time | script-score-query-acceptedAnswerId | 506.142       |     ms |
|                                   90th percentile service time | script-score-query-acceptedAnswerId | 521.093       |     ms |
|                                   99th percentile service time | script-score-query-acceptedAnswerId | 544.209       |     ms |
|                                  100th percentile service time | script-score-query-acceptedAnswerId | 552.22        |     ms |
|                                                     error rate | script-score-query-acceptedAnswerId |   0           |      % |
|                                                 Min Throughput |               knn-search-10-50-java |  10.48        |  ops/s |
|                                                Mean Throughput |               knn-search-10-50-java |  10.61        |  ops/s |
|                                              Median Throughput |               knn-search-10-50-java |  10.61        |  ops/s |
|                                                 Max Throughput |               knn-search-10-50-java |  10.72        |  ops/s |
|                                        50th percentile latency |               knn-search-10-50-java |  89.0557      |     ms |
|                                        90th percentile latency |               knn-search-10-50-java |  94.2473      |     ms |
|                                        99th percentile latency |               knn-search-10-50-java | 188.35        |     ms |
|                                       100th percentile latency |               knn-search-10-50-java | 227.795       |     ms |
|                                   50th percentile service time |               knn-search-10-50-java |  89.0557      |     ms |
|                                   90th percentile service time |               knn-search-10-50-java |  94.2473      |     ms |
|                                   99th percentile service time |               knn-search-10-50-java | 188.35        |     ms |
|                                  100th percentile service time |               knn-search-10-50-java | 227.795       |     ms |
|                                                     error rate |               knn-search-10-50-java |   0           |      % |
|                                                 Min Throughput |             script-score-query-java |   7.72        |  ops/s |
|                                                Mean Throughput |             script-score-query-java |   7.76        |  ops/s |
|                                              Median Throughput |             script-score-query-java |   7.76        |  ops/s |
|                                                 Max Throughput |             script-score-query-java |   7.78        |  ops/s |
|                                        50th percentile latency |             script-score-query-java | 125.497       |     ms |
|                                        90th percentile latency |             script-score-query-java | 132.617       |     ms |
|                                        99th percentile latency |             script-score-query-java | 138.362       |     ms |
|                                       100th percentile latency |             script-score-query-java | 139.082       |     ms |
|                                   50th percentile service time |             script-score-query-java | 125.497       |     ms |
|                                   90th percentile service time |             script-score-query-java | 132.617       |     ms |
|                                   99th percentile service time |             script-score-query-java | 138.362       |     ms |
|                                  100th percentile service time |             script-score-query-java | 139.082       |     ms |
|                                                     error rate |             script-score-query-java |   0           |      % |
|                                                 Min Throughput |                knn-search-10-50-css |  10.66        |  ops/s |
|                                                Mean Throughput |                knn-search-10-50-css |  10.8         |  ops/s |
|                                              Median Throughput |                knn-search-10-50-css |  10.82        |  ops/s |
|                                                 Max Throughput |                knn-search-10-50-css |  10.89        |  ops/s |
|                                        50th percentile latency |                knn-search-10-50-css |  88.3835      |     ms |
|                                        90th percentile latency |                knn-search-10-50-css |  90.6141      |     ms |
|                                        99th percentile latency |                knn-search-10-50-css |  95.0067      |     ms |
|                                       100th percentile latency |                knn-search-10-50-css | 100.368       |     ms |
|                                   50th percentile service time |                knn-search-10-50-css |  88.3835      |     ms |
|                                   90th percentile service time |                knn-search-10-50-css |  90.6141      |     ms |
|                                   99th percentile service time |                knn-search-10-50-css |  95.0067      |     ms |
|                                  100th percentile service time |                knn-search-10-50-css | 100.368       |     ms |
|                                                     error rate |                knn-search-10-50-css |   0           |      % |
|                                                 Min Throughput |              script-score-query-css |   5.29        |  ops/s |
|                                                Mean Throughput |              script-score-query-css |   6.23        |  ops/s |
|                                              Median Throughput |              script-score-query-css |   6.29        |  ops/s |
|                                                 Max Throughput |              script-score-query-css |   7           |  ops/s |
|                                        50th percentile latency |              script-score-query-css |  87.5079      |     ms |
|                                        90th percentile latency |              script-score-query-css |  90.4777      |     ms |
|                                        99th percentile latency |              script-score-query-css |  93.7029      |     ms |
|                                       100th percentile latency |              script-score-query-css |  94.0575      |     ms |
|                                   50th percentile service time |              script-score-query-css |  87.5079      |     ms |
|                                   90th percentile service time |              script-score-query-css |  90.4777      |     ms |
|                                   99th percentile service time |              script-score-query-css |  93.7029      |     ms |
|                                  100th percentile service time |              script-score-query-css |  94.0575      |     ms |
|                                                     error rate |              script-score-query-css |   0           |      % |
|                                                 Min Throughput |        knn-search-10-50-concurrency |  13.14        |  ops/s |
|                                                Mean Throughput |        knn-search-10-50-concurrency |  13.36        |  ops/s |
|                                              Median Throughput |        knn-search-10-50-concurrency |  13.38        |  ops/s |
|                                                 Max Throughput |        knn-search-10-50-concurrency |  13.52        |  ops/s |
|                                        50th percentile latency |        knn-search-10-50-concurrency |  70.4345      |     ms |
|                                        90th percentile latency |        knn-search-10-50-concurrency |  71.4835      |     ms |
|                                        99th percentile latency |        knn-search-10-50-concurrency |  72.8865      |     ms |
|                                       100th percentile latency |        knn-search-10-50-concurrency |  73.7157      |     ms |
|                                   50th percentile service time |        knn-search-10-50-concurrency |  70.4345      |     ms |
|                                   90th percentile service time |        knn-search-10-50-concurrency |  71.4835      |     ms |
|                                   99th percentile service time |        knn-search-10-50-concurrency |  72.8865      |     ms |
|                                  100th percentile service time |        knn-search-10-50-concurrency |  73.7157      |     ms |
|                                                     error rate |        knn-search-10-50-concurrency |   0           |      % |
|                                                 Min Throughput |      script-score-query-concurrency |  13.01        |  ops/s |
|                                                Mean Throughput |      script-score-query-concurrency |  13.25        |  ops/s |
|                                              Median Throughput |      script-score-query-concurrency |  13.28        |  ops/s |
|                                                 Max Throughput |      script-score-query-concurrency |  13.44        |  ops/s |
|                                        50th percentile latency |      script-score-query-concurrency |  70.6921      |     ms |
|                                        90th percentile latency |      script-score-query-concurrency |  71.8873      |     ms |
|                                        99th percentile latency |      script-score-query-concurrency |  73.6924      |     ms |
|                                       100th percentile latency |      script-score-query-concurrency |  74.5011      |     ms |
|                                   50th percentile service time |      script-score-query-concurrency |  70.6921      |     ms |
|                                   90th percentile service time |      script-score-query-concurrency |  71.8873      |     ms |
|                                   99th percentile service time |      script-score-query-concurrency |  73.6924      |     ms |
|                                  100th percentile service time |      script-score-query-concurrency |  74.5011      |     ms |
|                                                     error rate |      script-score-query-concurrency |   0           |      % |

Contributor

@jimczi jimczi left a comment

LGTM!

@@ -67,6 +67,7 @@ public CartesianShapeValue() {
super(CoordinateEncoder.CARTESIAN, CartesianPoint::new);
}

@SuppressWarnings("this-escape")
Contributor

looks unrelated?

@@ -70,6 +70,7 @@ public GeoShapeValue() {
this.tile2DVisitor = new Tile2DVisitor();
}

@SuppressWarnings("this-escape")
Contributor

same here?

Member Author

Again, this is something that I fixed in main but is breaking my local build. For sanity, I fixed it here. It will be a no-op once main is merged here.

},
"fields": [ "title" ],
"rescore": {
"window_size": 10,
Contributor

I think the window_size should be set to 15 to match the intent with the k value above?

Member Author

It honestly doesn't matter. I would think you would get a larger k, and then rerank some sub-set of those.

@@ -31,13 +31,15 @@ public class SimulateIndexResponse extends IndexResponse {
private final BytesReference source;
private final XContentType sourceXContentType;

@SuppressWarnings("this-escape")
Contributor

Unrelated?

Member Author

It will go away when we merge main, but it is breaking my local testing.

Member

@kderusso kderusso left a comment

Very nice work!

@jpountz
Contributor

jpountz commented Nov 21, 2023

Curiosity question: is there anything that is in the way of making this the default?

@benwtrent
Member Author

is there anything that is in the way of making this the default?

What would the API look like to disable it? If the user provides any HNSW params and doesn't specify quantization, does this mean we should default?

@jpountz
Contributor

jpountz commented Nov 21, 2023

What would the API look like to disable it? If the user provides any HNSW params and doesn't specify quantization, does this mean we should default?

Ideally defaults would reflect the knn search tuning guide, so my current thinking is to have something like:

  • quantization enabled automatically when dim >= 384
  • have a quantization_options.type = "none" or something along these lines that can be used to disable quantization explicitly
  • indeed quantization would remain enabled if a user overrides HNSW options without touching quantization_options.type
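
The three rules above can be sketched as a small decision function. This is only an illustration of the proposal, not code from the PR; the function name and the `hnsw_options` argument are hypothetical.

```python
def quantization_enabled(dims, quantization_type=None, hnsw_options=None):
    """Decide whether quantization applies under the proposed defaults.

    quantization_type mirrors the proposed quantization_options.type:
    None means the user did not set it, "none" disables it explicitly.
    hnsw_options is accepted but deliberately ignored: overriding HNSW
    parameters alone should not change the quantization decision.
    """
    if quantization_type == "none":
        return False           # explicit opt-out
    if quantization_type is not None:
        return True            # explicit opt-in, e.g. "byte"
    return dims >= 384         # automatic default based on dimension count

print(quantization_enabled(768))                            # large vectors
print(quantization_enabled(768, quantization_type="none"))  # explicit opt-out
print(quantization_enabled(768, hnsw_options={"m": 32}))    # HNSW override only
```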

@jpountz
Contributor

jpountz commented Nov 22, 2023

Alternatively, we could have a separate enabled flag on quantization if type: none looks too ugly:

PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "hnsw",
          "quantization": {
            "enabled": true,
            "type": "byte"
          }
        }
      }
    }
  }
}

@jimczi
Contributor

jimczi commented Nov 22, 2023

I wonder if specialising index_options.type would leave more room for better default in the future.
Ideally the index_options.type default should be auto and we can make the decision internally.
What about introducing a new type, something like int8_hnsw? That would allow us to force quantisation through the type and to change the default when the type is not provided.

PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}

We already have parameters at this level such as ef_construction which are tailored to the hnsw type so adding confidence_interval for the new type appears to be consistent. The determination of available parameters should be based on the index_options.type.

Contributor

@ChrisHegarty ChrisHegarty left a comment

This is a great improvement.

I left some nitpicky comments on the docs, mostly to do with my own confusion. This is not an easy concept to grasp, and I believe a tightening of the wording will help folks with it.

What I struggle with is the use of byte while there is already a byte element_type. How do we want users to think about this: quantization is just an implementation detail when using float element_type that improves runtime memory footprint OR quantization is somehow coercing floats to bytes, and the user now needs to think of bytes. I would think the former, which kinda relates to other comments in this thread; maybe refer to the quantization as int8 or 1-byte integer value? Then users of float do not need to think of the byte element_type.

// TEST[s/"num_candidates": 100/"num_candidates": 3/]

Since the original `float` vectors are still retained in the index, you can optionally use them for re-scoring. This will
do the heavy query against the indexed vectors, and then you can get the absolute nearest neighbors by re-scoring.
Contributor

"heavy query" implies no memory footprint improvements, right? The original float vectors are loaded.

Member Author

The docs here are obviously unclear. I meant to say that the expensive query is done via approximate search against the smaller footprint.
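
A minimal sketch of that two-phase flow, under loud assumptions: a brute-force scan over crudely quantized vectors stands in for the HNSW search, `round(x * 4)` stands in for the real quantizer, and all names are hypothetical. Only the shape of the idea (cheap candidate pass, then exact re-scoring of a window with the retained floats) is taken from the discussion.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def crude_quantize(vector):
    """Stand-in quantizer (assumption): keeps very little precision on purpose."""
    return [round(x * 4) for x in vector]

def search_then_rescore(query, float_vectors, k, window_size):
    # Phase 1: candidate pass over the small quantized vectors.
    # In Elasticsearch this would be the approximate HNSW search.
    q_query = crude_quantize(query)
    q_vectors = [crude_quantize(v) for v in float_vectors]
    candidates = sorted(
        range(len(q_vectors)),
        key=lambda i: dot(q_query, q_vectors[i]),
        reverse=True,
    )[:k]
    # Phase 2: exact re-scoring of the first window_size candidates
    # using the original float vectors.
    window = sorted(
        candidates[:window_size],
        key=lambda i: dot(query, float_vectors[i]),
        reverse=True,
    )
    return window + candidates[window_size:]

vectors = [[0.9, 0.0], [1.0, 0.1], [0.0, 1.0], [0.7, 0.7]]
result = search_then_rescore([1.0, 0.2], vectors, k=3, window_size=3)
print(result)
```

In this toy data the first two vectors quantize to the same codes, so only the float re-scoring can tell that the second one is the closer match.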

@benwtrent
Member Author

@jpountz @jimczi

is there anything that is in the way of making this the default?

I would not be comfortable making it the default right away. I really want more testing before we do this. But, we should indeed design the API in a way that makes this possible.

For @jpountz 's option,

{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "hnsw",
          "quantization": true, // like "index"
          "quantization_options": { // like "index_options"
            "type": "byte",
            "confidence_interval": 0.9
          }
        }
      }
    }
  }
}

For @jimczi's option, where we add a new index type, that could also work.

For setting confidence interval, it would be

{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw",
          "confidence_interval": 0.9
        }
      }
    }
  }
}

For things that don't support that parameter, we would throw.
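
A sketch of what "we would throw" could look like: each index type declares the parameters it accepts and anything else is rejected. The table contents, function name and error messages here are illustrative assumptions, not the mapper code from this PR.

```python
# Hypothetical table of parameters accepted per index_options.type.
ALLOWED_PARAMS = {
    "hnsw": {"type", "m", "ef_construction"},
    "int8_hnsw": {"type", "m", "ef_construction", "confidence_interval"},
}

def validate_index_options(options):
    """Reject unknown index types and parameters the chosen type does not support."""
    index_type = options.get("type")
    if index_type not in ALLOWED_PARAMS:
        raise ValueError(f"unknown index_options type [{index_type}]")
    unsupported = sorted(set(options) - ALLOWED_PARAMS[index_type])
    if unsupported:
        raise ValueError(
            f"[index_options] of type [{index_type}] does not support {unsupported}"
        )
    return options

validate_index_options({"type": "int8_hnsw", "confidence_interval": 0.9})  # accepted

try:
    validate_index_options({"type": "hnsw", "confidence_interval": 0.9})
except ValueError as err:
    print(err)  # plain hnsw has no confidence_interval
```

Keying the allowed parameters on the type keeps future variants such as int4_hnsw or int8_flat to a single new table entry.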

What do you think @jpountz ^ For things in the future pq int4, it would be pq_hnsw or int4_hnsw and for flat it would be int8_flat, etc.

@ChrisHegarty 's comment about byte vs int8.

Yes, I should update the quantization value name from byte to int8 to prevent term overloading and confusion.

@jpountz
Contributor

jpountz commented Nov 27, 2023

What do you think @jpountz ^ For things in the future pq int4, it would be pq_hnsw or int4_hnsw and for flat it would be int8_flat, etc.

The question that this raises to me is whether "flat" should be considered as a special index type, or as a lack of index. I had initially assumed that flat storage would be enabled by setting index: false because it seems more consistent with other fields we support as flat storage feels like doc values, but on the other hand other vector search libraries do seem to consider flat storage as a form of index. If we want to only enable flat storage through index: false then it would be awkward to configure quantization as part of index options? And if flat is a special form of indexing, then what does index: false do?

@benwtrent
Member Author

We need to support quantization over various indexing methodologies, including "flat".

I think using "index: false" as a flat index is a mistake. It should only be useful for scripting.

It is really weird to have "index: false" and then configure what you are "not" indexing 🤦.

I think we need a new "flat" index kind that requires the similarity to be configured, and allows quantization. Admittedly, just "flat" isn't much better than "index: false", but it will clean up the query API really nicely (as we know the similarity that's configured).

Another option is to have quantization live as a top level thing. But all this configuration is getting unwieldy.

I am starting to like @jimczi's suggestion more and more.

(Optional, object)
An optional section that configures the quantization configuration. The
quantization configuration is used to reduce the memory footprint of the
index. Only `byte` quantization is currently supported and can only be configured if
Contributor

Do you think we should add a sentence about disk usage increase with quantization_options?

@benwtrent
Member Author

@mayya-sharipova what do you think of @jimczi's suggested API? Adding a new "int8_hnsw" index type.

@jpountz
Contributor

jpountz commented Nov 28, 2023

So if I read your suggestion correctly, we'd treat flat as a form of index and parse index: false as meaning "flat index without quantization" for compatibility? That would work for me. I like that the suggested API from @jimczi is less verbose, much nicer to read.

@benwtrent
Member Author

Awesome, we are in agreement then.

Moving to use int8_hnsw instead of all these quantization options/params. It will accept an optional confidence_interval parameter (similar to m and ef_construction).

I will update the PR when I can :D.

@mayya-sharipova
Contributor

mayya-sharipova commented Nov 28, 2023

+1 on @jimczi's proposal.

+1 also to adding flat as an index_options type (maybe in another PR):

"similarity": "l2_norm",
"index_options": {
  "type": "flat"
}

Contributor

@jimczi jimczi left a comment

LGTM, thanks for the iteration

Contributor

@mayya-sharipova mayya-sharipova left a comment

Thanks @benwtrent, great work.

New changes to index_options look good as well.

Contributor

@ChrisHegarty ChrisHegarty left a comment

LGTM

@benwtrent benwtrent added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Nov 29, 2023
@elasticsearchmachine elasticsearchmachine merged commit f00364a into elastic:lucene_snapshot Nov 29, 2023
15 checks passed
@benwtrent benwtrent deleted the feature/add-int8-quantization branch November 29, 2023 17:30
benwtrent added a commit to ChrisHegarty/elasticsearch that referenced this pull request Nov 30, 2023