
Feature/speed up binary vector decoding #96716

Conversation

benwtrent
Member

Encoding floats in little endian format provides much faster decoding.

This commit makes all indices created on 8.9.0 or later store binary vectors in little-endian order.

closes: #96710
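As a rough illustration of why the little-endian layout decodes faster (a minimal sketch, not the exact code from this PR): a little-endian `byte[]` can be decoded in one bulk copy through a `FloatBuffer` view, instead of one `getFloat` call per dimension. The class and method names below are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class LittleEndianDecodeSketch {
    // Encode floats as little-endian bytes, as new 8.9.0+ indices do.
    static byte[] encodeLE(float[] vector) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[vector.length * Float.BYTES])
            .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    // Bulk decode: a single FloatBuffer view copies all floats at once.
    static float[] decodeLE(byte[] bytes, int dims) {
        float[] vector = new float[dims];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(vector);
        return vector;
    }

    public static void main(String[] args) {
        float[] original = {1.5f, -2.25f, 3.75f};
        float[] decoded = decodeLE(encodeLE(original), original.length);
        System.out.println(Arrays.equals(original, decoded)); // prints "true"
    }
}
```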

@benwtrent benwtrent added >enhancement :Search/Search Search-related issues that do not fall into other categories v8.9.0 labels Jun 8, 2023
@elasticsearchmachine
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jun 8, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@benwtrent
Member Author

Baseline is current main (with Panama Vector Module)
Contender is this change (again, with Panama Vector Module)

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                        Metric |                 Task |        Baseline |       Contender |       Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------:|----------------:|----------------:|-----------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                      |     1.5163      |     1.53303     |    0.01673 |    min |   +1.10% |
|             Min cumulative indexing time across primary shard |                      |     0.758       |     0.765483    |    0.00748 |    min |   +0.99% |
|          Median cumulative indexing time across primary shard |                      |     0.75815     |     0.766517    |    0.00837 |    min |   +1.10% |
|             Max cumulative indexing time across primary shard |                      |     0.7583      |     0.76755     |    0.00925 |    min |   +1.22% |
|           Cumulative indexing throttle time of primary shards |                      |     0           |     0           |    0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                      |     0           |     0           |    0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                      |     0           |     0           |    0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                      |     0           |     0           |    0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                      |     0.86665     |     0.871267    |    0.00462 |    min |   +0.53% |
|                      Cumulative merge count of primary shards |                      |     4           |     4           |    0       |        |    0.00% |
|                Min cumulative merge time across primary shard |                      |     0.423467    |     0.42825     |    0.00478 |    min |   +1.13% |
|             Median cumulative merge time across primary shard |                      |     0.433325    |     0.435633    |    0.00231 |    min |   +0.53% |
|                Max cumulative merge time across primary shard |                      |     0.443183    |     0.443017    |   -0.00017 |    min |   -0.04% |
|              Cumulative merge throttle time of primary shards |                      |     0.512683    |     0.527467    |    0.01478 |    min |   +2.88% |
|       Min cumulative merge throttle time across primary shard |                      |     0.256167    |     0.2612      |    0.00503 |    min |   +1.96% |
|    Median cumulative merge throttle time across primary shard |                      |     0.256342    |     0.263733    |    0.00739 |    min |   +2.88% |
|       Max cumulative merge throttle time across primary shard |                      |     0.256517    |     0.266267    |    0.00975 |    min |   +3.80% |
|                     Cumulative refresh time of primary shards |                      |     0.112983    |     0.1234      |    0.01042 |    min |   +9.22% |
|                    Cumulative refresh count of primary shards |                      |    46           |    46           |    0       |        |    0.00% |
|              Min cumulative refresh time across primary shard |                      |     0.0534667   |     0.0589833   |    0.00552 |    min |  +10.32% |
|           Median cumulative refresh time across primary shard |                      |     0.0564917   |     0.0617      |    0.00521 |    min |   +9.22% |
|              Max cumulative refresh time across primary shard |                      |     0.0595167   |     0.0644167   |    0.0049  |    min |   +8.23% |
|                       Cumulative flush time of primary shards |                      |     0.09235     |     0.0870667   |   -0.00528 |    min |   -5.72% |
|                      Cumulative flush count of primary shards |                      |     2           |     2           |    0       |        |    0.00% |
|                Min cumulative flush time across primary shard |                      |     0.0459      |     0.0435167   |   -0.00238 |    min |   -5.19% |
|             Median cumulative flush time across primary shard |                      |     0.046175    |     0.0435333   |   -0.00264 |    min |   -5.72% |
|                Max cumulative flush time across primary shard |                      |     0.04645     |     0.04355     |   -0.0029  |    min |   -6.24% |
|                                       Total Young Gen GC time |                      |     1.262       |     1.028       |   -0.234   |      s |  -18.54% |
|                                      Total Young Gen GC count |                      |    43           |    42           |   -1       |        |   -2.33% |
|                                         Total Old Gen GC time |                      |     0           |     0           |    0       |      s |    0.00% |
|                                        Total Old Gen GC count |                      |     0           |     0           |    0       |        |    0.00% |
|                                                    Store size |                      |     2.0004      |     2.02266     |    0.02226 |     GB |   +1.11% |
|                                                 Translog size |                      |     1.02445e-07 |     1.02445e-07 |    0       |     GB |    0.00% |
|                                        Heap used for segments |                      |     0           |     0           |    0       |     MB |    0.00% |
|                                      Heap used for doc values |                      |     0           |     0           |    0       |     MB |    0.00% |
|                                           Heap used for terms |                      |     0           |     0           |    0       |     MB |    0.00% |
|                                           Heap used for norms |                      |     0           |     0           |    0       |     MB |    0.00% |
|                                          Heap used for points |                      |     0           |     0           |    0       |     MB |    0.00% |
|                                   Heap used for stored fields |                      |     0           |     0           |    0       |     MB |    0.00% |
|                                                 Segment count |                      |     2           |     2           |    0       |        |    0.00% |
|                                   Total Ingest Pipeline count |                      |     0           |     0           |    0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                      |     0           |     0           |    0       |     ms |    0.00% |
|                                  Total Ingest Pipeline failed |                      |     0           |     0           |    0       |        |    0.00% |
|                                                Min Throughput |         index-append | 23251           | 23174.7         |  -76.2634  | docs/s |   -0.33% |
|                                               Mean Throughput |         index-append | 23737.5         | 23651.8         |  -85.6234  | docs/s |   -0.36% |
|                                             Median Throughput |         index-append | 23726.4         | 23608.6         | -117.861   | docs/s |   -0.50% |
|                                                Max Throughput |         index-append | 24177.5         | 24110.8         |  -66.6388  | docs/s |   -0.28% |
|                                       50th percentile latency |         index-append |   169.887       |   170.214       |    0.32644 |     ms |   +0.19% |
|                                       90th percentile latency |         index-append |   202.253       |   196.648       |   -5.60429 |     ms |   -2.77% |
|                                       99th percentile latency |         index-append |   280.865       |   233.765       |  -47.0994  |     ms |  -16.77% |
|                                      100th percentile latency |         index-append |   421.361       |   492.882       |   71.5206  |     ms |  +16.97% |
|                                  50th percentile service time |         index-append |   169.887       |   170.214       |    0.32644 |     ms |   +0.19% |
|                                  90th percentile service time |         index-append |   202.253       |   196.648       |   -5.60429 |     ms |   -2.77% |
|                                  99th percentile service time |         index-append |   280.865       |   233.765       |  -47.0994  |     ms |  -16.77% |
|                                 100th percentile service time |         index-append |   421.361       |   492.882       |   71.5206  |     ms |  +16.97% |
|                                                    error rate |         index-append |     0           |     0           |    0       |      % |    0.00% |
|                                                Min Throughput |  refresh-after-index |     0.483521    |     0.48829     |    0.00477 |  ops/s |   +0.99% |
|                                               Mean Throughput |  refresh-after-index |     0.483521    |     0.48829     |    0.00477 |  ops/s |   +0.99% |
|                                             Median Throughput |  refresh-after-index |     0.483521    |     0.48829     |    0.00477 |  ops/s |   +0.99% |
|                                                Max Throughput |  refresh-after-index |     0.483521    |     0.48829     |    0.00477 |  ops/s |   +0.99% |
|                                      100th percentile latency |  refresh-after-index |  2065           |  2044.74        |  -20.2556  |     ms |   -0.98% |
|                                 100th percentile service time |  refresh-after-index |  2065           |  2044.74        |  -20.2556  |     ms |   -0.98% |
|                                                    error rate |  refresh-after-index |     0           |     0           |    0       |      % |    0.00% |
|                                                Min Throughput | refresh-after-update |    95.1046      |   255.641       |  160.537   |  ops/s | +168.80% |
|                                               Mean Throughput | refresh-after-update |    95.1046      |   255.641       |  160.537   |  ops/s | +168.80% |
|                                             Median Throughput | refresh-after-update |    95.1046      |   255.641       |  160.537   |  ops/s | +168.80% |
|                                                Max Throughput | refresh-after-update |    95.1046      |   255.641       |  160.537   |  ops/s | +168.80% |
|                                      100th percentile latency | refresh-after-update |     8.25192     |     2.64692     |   -5.605   |     ms |  -67.92% |
|                                 100th percentile service time | refresh-after-update |     8.25192     |     2.64692     |   -5.605   |     ms |  -67.92% |
|                                                    error rate | refresh-after-update |     0           |     0           |    0       |      % |    0.00% |
|                                                Min Throughput |          force-merge |     0.0622565   |     0.0611523   |   -0.0011  |  ops/s |   -1.77% |
|                                               Mean Throughput |          force-merge |     0.0622565   |     0.0611523   |   -0.0011  |  ops/s |   -1.77% |
|                                             Median Throughput |          force-merge |     0.0622565   |     0.0611523   |   -0.0011  |  ops/s |   -1.77% |
|                                                Max Throughput |          force-merge |     0.0622565   |     0.0611523   |   -0.0011  |  ops/s |   -1.77% |
|                                      100th percentile latency |          force-merge | 16060.1         | 16350.3         |  290.176   |     ms |   +1.81% |
|                                 100th percentile service time |          force-merge | 16060.1         | 16350.3         |  290.176   |     ms |   +1.81% |
|                                                    error rate |          force-merge |     0           |     0           |    0       |      % |    0.00% |
|                                                Min Throughput |   script-score-query |    11.0361      |    12.582       |    1.54594 |  ops/s |  +14.01% |
|                                               Mean Throughput |   script-score-query |    12.5429      |    14.8063      |    2.26341 |  ops/s |  +18.05% |
|                                             Median Throughput |   script-score-query |    12.6932      |    15.085       |    2.39188 |  ops/s |  +18.84% |
|                                                Max Throughput |   script-score-query |    12.9486      |    15.5165      |    2.56792 |  ops/s |  +19.83% |
|                                       50th percentile latency |   script-score-query |    74.52        |    62.3899      |  -12.1301  |     ms |  -16.28% |
|                                       90th percentile latency |   script-score-query |    76.4475      |    65.0979      |  -11.3497  |     ms |  -14.85% |
|                                       99th percentile latency |   script-score-query |    89.8932      |    66.5804      |  -23.3128  |     ms |  -25.93% |
|                                     99.9th percentile latency |   script-score-query |   132.404       |    67.2787      |  -65.1252  |     ms |  -49.19% |
|                                      100th percentile latency |   script-score-query |   140.169       |    83.8935      |  -56.2751  |     ms |  -40.15% |
|                                  50th percentile service time |   script-score-query |    74.52        |    62.3899      |  -12.1301  |     ms |  -16.28% |
|                                  90th percentile service time |   script-score-query |    76.4475      |    65.0979      |  -11.3497  |     ms |  -14.85% |
|                                  99th percentile service time |   script-score-query |    89.8932      |    66.5804      |  -23.3128  |     ms |  -25.93% |
|                                99.9th percentile service time |   script-score-query |   132.404       |    67.2787      |  -65.1252  |     ms |  -49.19% |
|                                 100th percentile service time |   script-score-query |   140.169       |    83.8935      |  -56.2751  |     ms |  -40.15% |
|                                                    error rate |   script-score-query |     0           |     0           |    0       |      % |    0.00% |

@benwtrent
Member Author

@elasticmachine update branch

@benwtrent benwtrent requested a review from jdconrad June 12, 2023 11:53
Contributor

@jdconrad jdconrad left a comment


Cool change! I had a few questions, but otherwise LGTM.

@@ -64,6 +65,8 @@
 * A {@link FieldMapper} for indexing a dense vector of floats.
 */
public class DenseVectorFieldMapper extends FieldMapper {
    public static final Version MAGNITUDE_STORED_INDEX_VERSION = Version.V_7_5_0;
    public static final Version LITTLE_ENDIAN_FLOAT_STORED_INDEX_VERSION = Version.V_8_9_0;
Contributor


Should this be the last version prior to your change's version using the new TransportVersion constants?

Member Author


@jdconrad TransportVersion is a transport/wire serialization thing; this is an index version thing. From my understanding, index versioning is handled separately. I will see what I can find.

Member Author


I asked @thecoop, and just using Version here is OK. We may need to update it to IndexVersion depending on which commits make it in first :)

@@ -890,18 +907,18 @@ private Field parseKnnVector(DocumentParserContext context) throws IOException {
private Field parseBinaryDocValuesVector(DocumentParserContext context) throws IOException {
// encode array of floats as array of integers and store into buf
// this code is here and not int the VectorEncoderDecoder so not to create extra arrays
Contributor


Not related to your change, but would you mind fixing the "not int" -> "not in" typo in that comment?

Comment on lines +78 to +86
    FloatBuffer fb = ByteBuffer.wrap(vectorBR.bytes, vectorBR.offset, vectorBR.length)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asFloatBuffer();
    fb.get(vector);
} else {
    ByteBuffer byteBuffer = ByteBuffer.wrap(vectorBR.bytes, vectorBR.offset, vectorBR.length);
    for (int dim = 0; dim < vector.length; dim++) {
        vector[dim] = byteBuffer.getFloat((dim * Float.BYTES) + vectorBR.offset);
    }
Contributor


Could .asFloatBuffer() not be used for both little and big endian?

Member Author


.asFloatBuffer() is marginally slower for BE. These implementations are the fastest I could get them.
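For context, a minimal sketch of the two big-endian decode options under discussion (per-element absolute `getFloat` vs an `asFloatBuffer` view). Both produce identical results; the difference is only performance. The class and method names are illustrative, not from the PR.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BigEndianDecodeSketch {
    // Per-element absolute reads (the path this PR keeps for pre-8.9.0 indices).
    static float[] decodePerElement(byte[] bytes, int dims) {
        float[] vector = new float[dims];
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        for (int dim = 0; dim < dims; dim++) {
            vector[dim] = buf.getFloat(dim * Float.BYTES);
        }
        return vector;
    }

    // Alternative: bulk copy through a FloatBuffer view
    // (ByteBuffer's default byte order is big-endian).
    static float[] decodeViaFloatBuffer(byte[] bytes, int dims) {
        float[] vector = new float[dims];
        ByteBuffer.wrap(bytes).asFloatBuffer().get(vector);
        return vector;
    }

    public static void main(String[] args) {
        byte[] bytes = ByteBuffer.allocate(2 * Float.BYTES).putFloat(1.0f).putFloat(-2.0f).array();
        System.out.println(Arrays.equals(decodePerElement(bytes, 2), decodeViaFloatBuffer(bytes, 2))); // prints "true"
    }
}
```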

@benwtrent benwtrent added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 13, 2023
Contributor

@ChrisHegarty ChrisHegarty left a comment


LGTM. 👍

    return indexVersion.onOrAfter(LITTLE_ENDIAN_FLOAT_STORED_INDEX_VERSION)
        ? ByteBuffer.wrap(new byte[numBytes]).order(ByteOrder.LITTLE_ENDIAN)
        : ByteBuffer.wrap(new byte[numBytes]);
}
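A standalone sketch of this version-gated buffer selection, with the index-version check replaced by a plain boolean so it can run outside Elasticsearch (the class and `makeBuffer` name are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class VersionGatedEncodeSketch {
    // Stand-in for indexVersion.onOrAfter(LITTLE_ENDIAN_FLOAT_STORED_INDEX_VERSION):
    // true for indices created on 8.9.0+, false for older indices.
    static ByteBuffer makeBuffer(int numBytes, boolean littleEndianIndexVersion) {
        return littleEndianIndexVersion
            ? ByteBuffer.wrap(new byte[numBytes]).order(ByteOrder.LITTLE_ENDIAN)
            : ByteBuffer.wrap(new byte[numBytes]); // ByteBuffer defaults to big-endian
    }

    public static void main(String[] args) {
        System.out.println(makeBuffer(8, true).order());  // prints "LITTLE_ENDIAN"
        System.out.println(makeBuffer(8, false).order()); // prints "BIG_ENDIAN"
    }
}
```

Keying the byte order on the index creation version keeps old segments readable while letting new segments use the faster layout.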
Contributor


👍

@elasticsearchmachine elasticsearchmachine merged commit 5d93a42 into elastic:main Jun 13, 2023
12 checks passed
@benwtrent benwtrent deleted the feature/speed-up-binary-vector-decoding branch June 13, 2023 13:45
HiDAl pushed a commit to HiDAl/elasticsearch that referenced this pull request Jun 14, 2023
Successfully merging this pull request may close these issues.

Improve KNN Bruteforce float decoding