Add indexing with concurrent searches #267

mayya-sharipova · 2022-04-11T15:22:11Z

A very common use case is to run indexing at the same time as searches, and this patch addresses it. After the initial indexing, we add another operation that adds more documents (duplicate documents) while running kNN searches on the index at the same time.

We first index 50% of the data. While the rest of the data is being indexed, we also run concurrent searches.
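
A minimal sketch of how such a schedule could be laid out as entries of the `schedule` array in the challenge file, assuming Rally's `parallel` element and its `completed-by` property, which ends the whole parallel block once the named task finishes. The task names match the ones in the results below; the search operation name `knn-search-10-50` and the `iterations` value are placeholder assumptions, and only one of the search tasks is shown.

```json
{
  "name": "index-append",
  "operation": {
    "operation-type": "bulk",
    "bulk-size": {{bulk_size | default(5000)}},
    "ingest-percentage": 10
  }
},
{
  "parallel": {
    "completed-by": "index-append-concurrent-with-searches",
    "tasks": [
      {
        "name": "index-append-concurrent-with-searches",
        "operation": {
          "operation-type": "bulk",
          "bulk-size": {{bulk_size | default(5000)}},
          "ingest-percentage": 10
        }
      },
      {
        "name": "knn-search-10-50-concurrent-with-indexing",
        {# hypothetical operation name, assumed to be defined in the track's operations #}
        "operation": "knn-search-10-50",
        {# placeholder: large enough to keep searching until the bulk task completes #}
        "iterations": 100000
      }
    ]
  }
}
```

Because of `completed-by`, the search tasks are cut off as soon as the concurrent bulk task finishes, so the measured searches only cover the period when indexing is actually running.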

mayya-sharipova · 2022-04-11T15:24:31Z

dense_vector/challenges/default.json

    +      "operation": {
    +        "operation-type": "bulk",
    +        "bulk-size": {{bulk_size | default(5000)}},
    +        "ingest-percentage": 10


Ideally, I wanted to express "index half of the data" via `ingest-percentage`, but I did not know how to do that.
In our nightly benchmarks we use `ingest-percentage: 20`, so for now I've set it to half of that: 10.
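
If the goal is only to keep this value tied to the nightly setting, it could be derived in the template instead of hard-coded. A sketch, assuming a hypothetical track parameter `ingest_percentage` passed the same way as `bulk_size` (for example via `--track-params`). Jinja2 renders `20 / 2` as `10.0`. This only computes the percentage; it does not control which documents the task reads.

```json
"operation": {
  "operation-type": "bulk",
  "bulk-size": {{bulk_size | default(5000)}},
  {# hypothetical track parameter; renders as 10.0 with the default of 20 #}
  "ingest-percentage": {{ (ingest_percentage | default(20)) / 2 }}
}
```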

DJRickyB · 2022-04-11T15:25:26Z

dense_vector/challenges/default.json

    +            "operation": {
    +              "operation-type": "bulk",
    +              "bulk-size": {{bulk_size | default(5000)}},
    +              "ingest-percentage": 10


I did the same thing in #195, and it turns out this ingests the same 10% as the index-append task. FYI, in case you need to write net-new docs for this challenge.
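
To make the overlap concrete, here is a deliberately simplified model, assuming (as described in the comment above) that each bulk task reads the leading `ingest-percentage` of the corpus in file order. It ignores multi-client partitioning and other details of Rally's reader; the corpus size and function names are hypothetical.

```python
def ingested_positions(total_docs: int, ingest_percentage: float) -> range:
    """Simplified model: a bulk task reads the leading `ingest_percentage`
    of the corpus, in file order. An assumption for illustration only."""
    return range(0, int(total_docs * ingest_percentage / 100))


def overlap(a: range, b: range) -> int:
    """Number of document positions read by both tasks."""
    return max(0, min(a.stop, b.stop) - max(a.start, b.start))


TOTAL_DOCS = 1_000_000  # hypothetical corpus size

first = ingested_positions(TOTAL_DOCS, 10)   # index-append
second = ingested_positions(TOTAL_DOCS, 10)  # index-append-concurrent-with-searches

# Every document read by the second task was already read by the first one.
print(overlap(first, second), len(second))  # 100000 100000
```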

@DJRickyB Thanks for your comment. I did not know that. We want to index net-new documents in this part. What's the way to tell Rally to index the next 10%?

There is not one, unfortunately. One (maybe overly simplistic) way to get some data density before you start "counting" query performance would be to take out the initial index-append task and then give every task in the parallel block the same warmup-time-period. This would give the queries a minimum data density before the measured iterations, but it would drop the in-a-vacuum indexing measurement, and you'd have to get that from running a separate challenge. I understand this approach is a bit sloppy; maybe we can chat out-of-band about how to accomplish what you need, or else enhance Rally to support what you're looking for here.
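
A sketch of that suggestion, assuming a `parallel` element can set a default `warmup-time-period` for all of its tasks, so that samples from the first 120 seconds are reported as warmup rather than measured. The initial index-append task is gone, and `ingest-percentage: 20` mirrors the nightly setting. The 120 s warmup, the search `time-period`, and the `knn-search-10-50` operation name are placeholder assumptions; the search task uses a `time-period` rather than `iterations` on the assumption that Rally rejects mixing time periods and iterations in one task.

```json
{
  "parallel": {
    {# default for every task below; placeholder value #}
    "warmup-time-period": 120,
    "completed-by": "index-append-concurrent-with-searches",
    "tasks": [
      {
        "name": "index-append-concurrent-with-searches",
        "operation": {
          "operation-type": "bulk",
          "bulk-size": {{bulk_size | default(5000)}},
          "ingest-percentage": 20
        }
      },
      {
        "name": "knn-search-10-50-concurrent-with-indexing",
        {# hypothetical operation name, assumed to be defined in the track's operations #}
        "operation": "knn-search-10-50",
        {# placeholder: long enough that the bulk task finishes first #}
        "time-period": 3600
      }
    ]
  }
}
```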

mayya-sharipova · 2022-04-11T15:27:20Z

Results from my laptop:

  
|                                                         Metric |                                         Task |            Value |   Unit |
|---------------------------------------------------------------:|---------------------------------------------:|-----------------:|-------:|
|                     Cumulative indexing time of primary shards |                                              |      2.94898     |    min |
|             Min cumulative indexing time across primary shards |                                              |      1.47418     |    min |
|          Median cumulative indexing time across primary shards |                                              |      1.47449     |    min |
|             Max cumulative indexing time across primary shards |                                              |      1.4748      |    min |
|            Cumulative indexing throttle time of primary shards |                                              |      0.0527333   |    min |
|    Min cumulative indexing throttle time across primary shards |                                              |      0           |    min |
| Median cumulative indexing throttle time across primary shards |                                              |      0.0263667   |    min |
|    Max cumulative indexing throttle time across primary shards |                                              |      0.0527333   |    min |
|                        Cumulative merge time of primary shards |                                              |     13.4356      |    min |
|                       Cumulative merge count of primary shards |                                              |      2           |        |
|                Min cumulative merge time across primary shards |                                              |      6.67385     |    min |
|             Median cumulative merge time across primary shards |                                              |      6.71778     |    min |
|                Max cumulative merge time across primary shards |                                              |      6.76172     |    min |
|               Cumulative merge throttle time of primary shards |                                              |      0           |    min |
|       Min cumulative merge throttle time across primary shards |                                              |      0           |    min |
|    Median cumulative merge throttle time across primary shards |                                              |      0           |    min |
|       Max cumulative merge throttle time across primary shards |                                              |      0           |    min |
|                      Cumulative refresh time of primary shards |                                              |     11.2407      |    min |
|                     Cumulative refresh count of primary shards |                                              |     29           |        |
|              Min cumulative refresh time across primary shards |                                              |      5.29037     |    min |
|           Median cumulative refresh time across primary shards |                                              |      5.62037     |    min |
|              Max cumulative refresh time across primary shards |                                              |      5.95037     |    min |
|                        Cumulative flush time of primary shards |                                              |     12.6752      |    min |
|                       Cumulative flush count of primary shards |                                              |      6           |        |
|                Min cumulative flush time across primary shards |                                              |      6.18308     |    min |
|             Median cumulative flush time across primary shards |                                              |      6.33758     |    min |
|                Max cumulative flush time across primary shards |                                              |      6.49207     |    min |
|                                        Total Young Gen GC time |                                              |      0.77        |      s |
|                                       Total Young Gen GC count |                                              |     56           |        |
|                                          Total Old Gen GC time |                                              |      0           |      s |
|                                         Total Old Gen GC count |                                              |      0           |        |
|                                                     Store size |                                              |      0.995951    |     GB |
|                                                  Translog size |                                              |      1.02445e-07 |     GB |
|                                         Heap used for segments |                                              |      0           |     MB |
|                                       Heap used for doc values |                                              |      0           |     MB |
|                                            Heap used for terms |                                              |      0           |     MB |
|                                            Heap used for norms |                                              |      0           |     MB |
|                                           Heap used for points |                                              |      0           |     MB |
|                                    Heap used for stored fields |                                              |      0           |     MB |
|                                                  Segment count |                                              |      2           |        |
|                                    Total Ingest Pipeline count |                                              |      0           |        |
|                                     Total Ingest Pipeline time |                                              |      0           |      s |
|                                   Total Ingest Pipeline failed |                                              |      0           |        |
|                                                 Min Throughput |                                 index-append |  14125.4         | docs/s |
|                                                Mean Throughput |                                 index-append |  14611.7         | docs/s |
|                                              Median Throughput |                                 index-append |  14675.5         | docs/s |
|                                                 Max Throughput |                                 index-append |  14858.1         | docs/s |
|                                        50th percentile latency |                                 index-append |    273.153       |     ms |
|                                        90th percentile latency |                                 index-append |    383.032       |     ms |
|                                       100th percentile latency |                                 index-append |    625.663       |     ms |
|                                   50th percentile service time |                                 index-append |    273.153       |     ms |
|                                   90th percentile service time |                                 index-append |    383.032       |     ms |
|                                  100th percentile service time |                                 index-append |    625.663       |     ms |
|                                                     error rate |                                 index-append |      0           |      % |
|                                                 Min Throughput |        index-append-concurrent-with-searches |  13538.7         | docs/s |
|                                                Mean Throughput |        index-append-concurrent-with-searches |  13647.9         | docs/s |
|                                              Median Throughput |        index-append-concurrent-with-searches |  13636           | docs/s |
|                                                 Max Throughput |        index-append-concurrent-with-searches |  13873.4         | docs/s |
|                                        50th percentile latency |        index-append-concurrent-with-searches |    279.978       |     ms |
|                                        90th percentile latency |        index-append-concurrent-with-searches |    604.145       |     ms |
|                                       100th percentile latency |        index-append-concurrent-with-searches |    922.579       |     ms |
|                                   50th percentile service time |        index-append-concurrent-with-searches |    279.978       |     ms |
|                                   90th percentile service time |        index-append-concurrent-with-searches |    604.145       |     ms |
|                                  100th percentile service time |        index-append-concurrent-with-searches |    922.579       |     ms |
|                                                     error rate |        index-append-concurrent-with-searches |      0           |      % |
|                                                 Min Throughput |    knn-search-10-50-concurrent-with-indexing |      0.17        |  ops/s |
|                                                Mean Throughput |    knn-search-10-50-concurrent-with-indexing |      0.24        |  ops/s |
|                                              Median Throughput |    knn-search-10-50-concurrent-with-indexing |      0.24        |  ops/s |
|                                                 Max Throughput |    knn-search-10-50-concurrent-with-indexing |      0.31        |  ops/s |
|                                        50th percentile latency |    knn-search-10-50-concurrent-with-indexing |     12.8949      |     ms |
|                                        90th percentile latency |    knn-search-10-50-concurrent-with-indexing |     29.3222      |     ms |
|                                        99th percentile latency |    knn-search-10-50-concurrent-with-indexing |     45.171       |     ms |
|                                       100th percentile latency |    knn-search-10-50-concurrent-with-indexing |     45.7837      |     ms |
|                                   50th percentile service time |    knn-search-10-50-concurrent-with-indexing |     12.8949      |     ms |
|                                   90th percentile service time |    knn-search-10-50-concurrent-with-indexing |     29.3222      |     ms |
|                                   99th percentile service time |    knn-search-10-50-concurrent-with-indexing |     45.171       |     ms |
|                                  100th percentile service time |    knn-search-10-50-concurrent-with-indexing |     45.7837      |     ms |
|                                                     error rate |    knn-search-10-50-concurrent-with-indexing |      0           |      % |
|                                                 Min Throughput |   knn-search-10-100-concurrent-with-indexing |      0.17        |  ops/s |
|                                                Mean Throughput |   knn-search-10-100-concurrent-with-indexing |      0.24        |  ops/s |
|                                              Median Throughput |   knn-search-10-100-concurrent-with-indexing |      0.24        |  ops/s |
|                                                 Max Throughput |   knn-search-10-100-concurrent-with-indexing |      0.31        |  ops/s |
|                                        50th percentile latency |   knn-search-10-100-concurrent-with-indexing |     12.9775      |     ms |
|                                        90th percentile latency |   knn-search-10-100-concurrent-with-indexing |     25.315       |     ms |
|                                        99th percentile latency |   knn-search-10-100-concurrent-with-indexing |     38.458       |     ms |
|                                       100th percentile latency |   knn-search-10-100-concurrent-with-indexing |     45.6029      |     ms |
|                                   50th percentile service time |   knn-search-10-100-concurrent-with-indexing |     12.9775      |     ms |
|                                   90th percentile service time |   knn-search-10-100-concurrent-with-indexing |     25.315       |     ms |
|                                   99th percentile service time |   knn-search-10-100-concurrent-with-indexing |     38.458       |     ms |
|                                  100th percentile service time |   knn-search-10-100-concurrent-with-indexing |     45.6029      |     ms |
|                                                     error rate |   knn-search-10-100-concurrent-with-indexing |      0           |      % |
|                                                 Min Throughput | knn-search-100-1000-concurrent-with-indexing |      0.14        |  ops/s |
|                                                Mean Throughput | knn-search-100-1000-concurrent-with-indexing |      0.23        |  ops/s |
|                                              Median Throughput | knn-search-100-1000-concurrent-with-indexing |      0.23        |  ops/s |
|                                                 Max Throughput | knn-search-100-1000-concurrent-with-indexing |      0.33        |  ops/s |
|                                        50th percentile latency | knn-search-100-1000-concurrent-with-indexing |     23.4112      |     ms |
|                                        90th percentile latency | knn-search-100-1000-concurrent-with-indexing |     31.9343      |     ms |
|                                        99th percentile latency | knn-search-100-1000-concurrent-with-indexing |     53.8614      |     ms |
|                                       100th percentile latency | knn-search-100-1000-concurrent-with-indexing |     57.3495      |     ms |
|                                   50th percentile service time | knn-search-100-1000-concurrent-with-indexing |     23.4112      |     ms |
|                                   90th percentile service time | knn-search-100-1000-concurrent-with-indexing |     31.9343      |     ms |
|                                   99th percentile service time | knn-search-100-1000-concurrent-with-indexing |     53.8614      |     ms |
|                                  100th percentile service time | knn-search-100-1000-concurrent-with-indexing |     57.3495      |     ms |
|                                                     error rate | knn-search-100-1000-concurrent-with-indexing |      0           |      % |
|                                                 Min Throughput |          knn-search-10-50-before-force-merge |    285.75        |  ops/s |
|                                                Mean Throughput |          knn-search-10-50-before-force-merge |    285.75        |  ops/s |
|                                              Median Throughput |          knn-search-10-50-before-force-merge |    285.75        |  ops/s |
|                                                 Max Throughput |          knn-search-10-50-before-force-merge |    285.75        |  ops/s |
|                                        50th percentile latency |          knn-search-10-50-before-force-merge |      2.62398     |     ms |
|                                        90th percentile latency |          knn-search-10-50-before-force-merge |      3.49838     |     ms |
|                                        99th percentile latency |          knn-search-10-50-before-force-merge |      4.83488     |     ms |
|                                       100th percentile latency |          knn-search-10-50-before-force-merge |      4.84396     |     ms |
|                                   50th percentile service time |          knn-search-10-50-before-force-merge |      2.62398     |     ms |
|                                   90th percentile service time |          knn-search-10-50-before-force-merge |      3.49838     |     ms |
|                                   99th percentile service time |          knn-search-10-50-before-force-merge |      4.83488     |     ms |
|                                  100th percentile service time |          knn-search-10-50-before-force-merge |      4.84396     |     ms |
|                                                     error rate |          knn-search-10-50-before-force-merge |      0           |      % |
|                                                 Min Throughput |         knn-search-10-100-before-force-merge |    247.8         |  ops/s |
|                                                Mean Throughput |         knn-search-10-100-before-force-merge |    247.8         |  ops/s |
|                                              Median Throughput |         knn-search-10-100-before-force-merge |    247.8         |  ops/s |
|                                                 Max Throughput |         knn-search-10-100-before-force-merge |    247.8         |  ops/s |
|                                        50th percentile latency |         knn-search-10-100-before-force-merge |      3.29792     |     ms |
|                                        90th percentile latency |         knn-search-10-100-before-force-merge |      4.2637      |     ms |
|                                        99th percentile latency |         knn-search-10-100-before-force-merge |      6.67976     |     ms |
|                                       100th percentile latency |         knn-search-10-100-before-force-merge |      6.83308     |     ms |
|                                   50th percentile service time |         knn-search-10-100-before-force-merge |      3.29792     |     ms |
|                                   90th percentile service time |         knn-search-10-100-before-force-merge |      4.2637      |     ms |
|                                   99th percentile service time |         knn-search-10-100-before-force-merge |      6.67976     |     ms |
|                                  100th percentile service time |         knn-search-10-100-before-force-merge |      6.83308     |     ms |
|                                                     error rate |         knn-search-10-100-before-force-merge |      0           |      % |
|                                                 Min Throughput |       knn-search-100-1000-before-force-merge |     43.59        |  ops/s |
|                                                Mean Throughput |       knn-search-100-1000-before-force-merge |     43.68        |  ops/s |
|                                              Median Throughput |       knn-search-100-1000-before-force-merge |     43.68        |  ops/s |
|                                                 Max Throughput |       knn-search-100-1000-before-force-merge |     43.78        |  ops/s |
|                                        50th percentile latency |       knn-search-100-1000-before-force-merge |     22.0119      |     ms |
|                                        90th percentile latency |       knn-search-100-1000-before-force-merge |     23.5894      |     ms |
|                                        99th percentile latency |       knn-search-100-1000-before-force-merge |     24.9529      |     ms |
|                                       100th percentile latency |       knn-search-100-1000-before-force-merge |     25.3105      |     ms |
|                                   50th percentile service time |       knn-search-100-1000-before-force-merge |     22.0119      |     ms |
|                                   90th percentile service time |       knn-search-100-1000-before-force-merge |     23.5894      |     ms |
|                                   99th percentile service time |       knn-search-100-1000-before-force-merge |     24.9529      |     ms |
|                                  100th percentile service time |       knn-search-100-1000-before-force-merge |     25.3105      |     ms |
|                                                     error rate |       knn-search-100-1000-before-force-merge |      0           |      % |
|                                                 Min Throughput |        script-score-query-before-force-merge |      5.5         |  ops/s |
|                                                Mean Throughput |        script-score-query-before-force-merge |      5.6         |  ops/s |
|                                              Median Throughput |        script-score-query-before-force-merge |      5.61        |  ops/s |
|                                                 Max Throughput |        script-score-query-before-force-merge |      5.67        |  ops/s |
|                                        50th percentile latency |        script-score-query-before-force-merge |    169.724       |     ms |
|                                        90th percentile latency |        script-score-query-before-force-merge |    170.824       |     ms |
|                                        99th percentile latency |        script-score-query-before-force-merge |    175.088       |     ms |
|                                       100th percentile latency |        script-score-query-before-force-merge |    183.441       |     ms |
|                                   50th percentile service time |        script-score-query-before-force-merge |    169.724       |     ms |
|                                   90th percentile service time |        script-score-query-before-force-merge |    170.824       |     ms |
|                                   99th percentile service time |        script-score-query-before-force-merge |    175.088       |     ms |
|                                  100th percentile service time |        script-score-query-before-force-merge |    183.441       |     ms |
|                                                     error rate |        script-score-query-before-force-merge |      0           |      % |
|                                                 Min Throughput |                                  force-merge |      0           |  ops/s |
|                                                Mean Throughput |                                  force-merge |      0           |  ops/s |
|                                              Median Throughput |                                  force-merge |      0           |  ops/s |
|                                                 Max Throughput |                                  force-merge |      0           |  ops/s |
|                                       100th percentile latency |                                  force-merge | 806615           |     ms |
|                                  100th percentile service time |                                  force-merge | 806615           |     ms |
|                                                     error rate |                                  force-merge |      0           |      % |
|                                                 Min Throughput |                             knn-search-10-50 |    213.35        |  ops/s |
|                                                Mean Throughput |                             knn-search-10-50 |    213.35        |  ops/s |
|                                              Median Throughput |                             knn-search-10-50 |    213.35        |  ops/s |
|                                                 Max Throughput |                             knn-search-10-50 |    213.35        |  ops/s |
|                                        50th percentile latency |                             knn-search-10-50 |      2.09358     |     ms |
|                                        90th percentile latency |                             knn-search-10-50 |      5.94023     |     ms |
|                                        99th percentile latency |                             knn-search-10-50 |      8.52969     |     ms |
|                                       100th percentile latency |                             knn-search-10-50 |      8.84533     |     ms |
|                                   50th percentile service time |                             knn-search-10-50 |      2.09358     |     ms |
|                                   90th percentile service time |                             knn-search-10-50 |      5.94023     |     ms |
|                                   99th percentile service time |                             knn-search-10-50 |      8.52969     |     ms |
|                                  100th percentile service time |                             knn-search-10-50 |      8.84533     |     ms |
|                                                     error rate |                             knn-search-10-50 |      0           |      % |
|                                                 Min Throughput |                            knn-search-10-100 |    402.07        |  ops/s |
|                                                Mean Throughput |                            knn-search-10-100 |    402.07        |  ops/s |
|                                              Median Throughput |                            knn-search-10-100 |    402.07        |  ops/s |
|                                                 Max Throughput |                            knn-search-10-100 |    402.07        |  ops/s |
|                                        50th percentile latency |                            knn-search-10-100 |      1.73748     |     ms |
|                                        90th percentile latency |                            knn-search-10-100 |      2.76045     |     ms |
|                                        99th percentile latency |                            knn-search-10-100 |      4.35762     |     ms |
|                                       100th percentile latency |                            knn-search-10-100 |      4.81538     |     ms |
|                                   50th percentile service time |                            knn-search-10-100 |      1.73748     |     ms |
|                                   90th percentile service time |                            knn-search-10-100 |      2.76045     |     ms |
|                                   99th percentile service time |                            knn-search-10-100 |      4.35762     |     ms |
|                                  100th percentile service time |                            knn-search-10-100 |      4.81538     |     ms |
|                                                     error rate |                            knn-search-10-100 |      0           |      % |
|                                                 Min Throughput |                          knn-search-100-1000 |    129.23        |  ops/s |
|                                                Mean Throughput |                          knn-search-100-1000 |    129.23        |  ops/s |
|                                              Median Throughput |                          knn-search-100-1000 |    129.23        |  ops/s |
|                                                 Max Throughput |                          knn-search-100-1000 |    129.23        |  ops/s |
|                                        50th percentile latency |                          knn-search-100-1000 |      5.04306     |     ms |
|                                        90th percentile latency |                          knn-search-100-1000 |      5.96235     |     ms |
|                                        99th percentile latency |                          knn-search-100-1000 |      9.07404     |     ms |
|                                       100th percentile latency |                          knn-search-100-1000 |      9.10671     |     ms |
|                                   50th percentile service time |                          knn-search-100-1000 |      5.04306     |     ms |
|                                   90th percentile service time |                          knn-search-100-1000 |      5.96235     |     ms |
|                                   99th percentile service time |                          knn-search-100-1000 |      9.07404     |     ms |
|                                  100th percentile service time |                          knn-search-100-1000 |      9.10671     |     ms |
|                                                     error rate |                          knn-search-100-1000 |      0           |      % |
|                                                 Min Throughput |                           script-score-query |      5.81        |  ops/s |
|                                                Mean Throughput |                           script-score-query |      5.83        |  ops/s |
|                                              Median Throughput |                           script-score-query |      5.83        |  ops/s |
|                                                 Max Throughput |                           script-score-query |      5.85        |  ops/s |
|                                        50th percentile latency |                           script-score-query |    168.784       |     ms |
|                                        90th percentile latency |                           script-score-query |    170.536       |     ms |
|                                        99th percentile latency |                           script-score-query |    174.288       |     ms |
|                                       100th percentile latency |                           script-score-query |    174.321       |     ms |
|                                   50th percentile service time |                           script-score-query |    168.784       |     ms |
|                                   90th percentile service time |                           script-score-query |    170.536       |     ms |
|                                   99th percentile service time |                           script-score-query |    174.288       |     ms |
|                                  100th percentile service time |                           script-score-query |    174.321       |     ms |
|                                                     error rate |                           script-score-query |      0           |      % |


----------------------------------
[INFO] SUCCESS (took 3110 seconds)
----------------------------------

mayya-sharipova · 2022-04-11T15:29:10Z

Some notable observations:

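Indexing with concurrent searches is roughly 1.5 times slower (90th and 100th percentile service times):

|                        Metric |                                         Task |           Value |   Unit |
|------------------------------:|---------------------------------------------:|----------------:|-------:|
|  90th percentile service time |                                 index-append |    383.032       |     ms |
| 100th percentile service time |                                 index-append |    625.663       |     ms |
|  90th percentile service time |        index-append-concurrent-with-searches |    604.145       |     ms |
| 100th percentile service time |        index-append-concurrent-with-searches |    922.579       |     ms |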

Searches with concurrent indexing are 5-10 times slower than the same searches on the force-merged index:

|                        Metric |                                         Task |           Value |   Unit |
|------------------------------:|---------------------------------------------:|----------------:|-------:|
|  90th percentile service time |    knn-search-10-50-concurrent-with-indexing |     29.3222      |     ms |
| 100th percentile service time |    knn-search-10-50-concurrent-with-indexing |     45.7837      |     ms |
|  90th percentile service time |   knn-search-10-100-concurrent-with-indexing |     25.315       |     ms |
| 100th percentile service time |   knn-search-10-100-concurrent-with-indexing |     45.6029      |     ms |
|  90th percentile service time | knn-search-100-1000-concurrent-with-indexing |     31.9343      |     ms |
| 100th percentile service time | knn-search-100-1000-concurrent-with-indexing |     57.3495      |     ms |

---

For comparison, the same searches run without concurrent indexing, before force merge:

|                        Metric |                                         Task |           Value |   Unit |
|------------------------------:|---------------------------------------------:|----------------:|-------:|
|  90th percentile service time |          knn-search-10-50-before-force-merge |      3.49838     |     ms |
| 100th percentile service time |          knn-search-10-50-before-force-merge |      4.84396     |     ms |
|  90th percentile service time |         knn-search-10-100-before-force-merge |      4.2637      |     ms |
| 100th percentile service time |         knn-search-10-100-before-force-merge |      6.83308     |     ms |
|  90th percentile service time |       knn-search-100-1000-before-force-merge |     23.5894      |     ms |
| 100th percentile service time |       knn-search-100-1000-before-force-merge |     25.3105      |     ms |

and after force merge:

|                        Metric |                                         Task |           Value |   Unit |
|------------------------------:|---------------------------------------------:|----------------:|-------:|
|  90th percentile service time |                             knn-search-10-50 |      5.94023     |     ms |
| 100th percentile service time |                             knn-search-10-50 |      8.84533     |     ms |
|  90th percentile service time |                            knn-search-10-100 |      2.76045     |     ms |
| 100th percentile service time |                            knn-search-10-100 |      4.81538     |     ms |
|  90th percentile service time |                          knn-search-100-1000 |      5.96235     |     ms |
| 100th percentile service time |                          knn-search-100-1000 |      9.10671     |     ms |
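The slowdown factors quoted above can be rechecked directly from the service times in these tables. The Python sketch below does only that arithmetic; treating the force-merged `knn-search-*` tasks as the search baseline is an assumption about which comparison the 5-10x figure refers to.

```python
# Service times in ms, copied from the tables above: (90th pct, 100th pct).
indexing = {
    "index-append": (383.032, 625.663),
    "index-append-concurrent-with-searches": (604.145, 922.579),
}

searches_concurrent = {
    "knn-search-10-50": (29.3222, 45.7837),
    "knn-search-10-100": (25.315, 45.6029),
    "knn-search-100-1000": (31.9343, 57.3495),
}

# Assumed baseline: the same searches on the force-merged index, no indexing.
searches_baseline = {
    "knn-search-10-50": (5.94023, 8.84533),
    "knn-search-10-100": (2.76045, 4.81538),
    "knn-search-100-1000": (5.96235, 9.10671),
}


def slowdown(slow, fast):
    """Element-wise ratio of two (p90, p100) service-time tuples."""
    return tuple(s / f for s, f in zip(slow, fast))


index_ratio = slowdown(
    indexing["index-append-concurrent-with-searches"], indexing["index-append"]
)
search_ratios = {
    name: slowdown(searches_concurrent[name], searches_baseline[name])
    for name in searches_concurrent
}

if __name__ == "__main__":
    print("indexing slowdown p90 / p100: %.2f / %.2f" % index_ratio)
    for name, (p90, p100) in search_ratios.items():
        print(f"{name}: p90 {p90:.1f}x, p100 {p100:.1f}x")
```

With these numbers indexing comes out about 1.5x slower (1.58 at p90, 1.47 at p100), and the searches between about 4.9x and 9.5x slower.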

mayya-sharipova · 2022-04-20T14:22:41Z

I experimented with creating another challenge that uses warmup-time-period, but I found it very difficult to choose a good value for it.

With "warmup-time-period": 100, too few documents were indexed during the measured period (100K).
With "warmup-time-period": 200, I got the message: [WARNING] No throughput metrics available for [index-append]. Likely cause: The benchmark ended already during warmup.

warmup-time-period also looks very fragile and machine-dependent, so I don't think it would be a workable option for us.
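Why a fixed warmup-time-period is fragile can be illustrated with a toy model of how many documents are left to measure once the warmup ends. This Python sketch assumes constant throughput and a fixed corpus; the function name, the 1M-document corpus, and the two throughputs are hypothetical values, not measurements from this thread.

```python
def docs_measured_after_warmup(total_docs, docs_per_second, warmup_seconds):
    """Documents still to be indexed (and measured) after the warmup ends.

    Toy model (assumption): throughput is constant from the start and the
    bulk task stops as soon as the corpus is exhausted.
    """
    indexed_during_warmup = docs_per_second * warmup_seconds
    return max(0, total_docs - indexed_during_warmup)


# Hypothetical corpus size, for illustration only.
TOTAL_DOCS = 1_000_000

if __name__ == "__main__":
    for docs_per_second in (4_000, 8_000):  # e.g. a slower and a faster machine
        for warmup in (100, 200):
            left = docs_measured_after_warmup(TOTAL_DOCS, docs_per_second, warmup)
            print(f"{docs_per_second} docs/s, warmup {warmup} s -> {left} docs measured")
```

With these made-up numbers a 100 s warmup leaves 600K or 200K documents to measure depending on the machine, and a 200 s warmup leaves 200K or none at all, which mirrors the two failure modes described above.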

concurrent-index-and-search challenge

{
  "name": "concurrent-index-and-search",
  "description": "incremental indexing of vectors with concurrent searches",
  "default": false,
  "schedule": [
    {
      "operation": {
        "operation-type": "delete-index"
      }
    },
    {
      "operation": {
        "operation-type": "create-index"
      }
    },
    {
      "operation": {
        "operation-type": "cluster-health",
        "request-params": {
          "wait_for_status": "green"
        },
        "retry-until-success": true
      }
    },
    {
      "parallel": {
        "warmup-time-period": 100,
        "completed-by": "index-append",
        "tasks": [
          {
            "name": "index-append",
            "operation": {
              "operation-type": "bulk",
              "bulk-size": 5000,
              "ingest-percentage": 20
            },
            "clients": 1
          },
          {
            "name": "knn-search-10-50-concurrent-with-indexing",
            "operation": "knn-search-10-50",
            "clients": 1
          },
          {
            "name": "knn-search-10-100-concurrent-with-indexing",
            "operation": "knn-search-10-100",
            "clients": 1
          },
          {
            "name": "knn-search-100-1000-concurrent-with-indexing",
            "operation": "knn-search-100-1000",
            "clients": 1
          }
        ]
      }  
    },
    {
      "name": "wait-until-merges-finish-after-index",
      "operation": {
        "operation-type": "index-stats",
        "index": "_all",
        "condition": {
          "path": "_all.total.merges.current",
          "expected-value": 0
        },
        "retry-until-success": true,
        "include-in-reporting": true
      }
    }      
  ]
}
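A quick consistency check over a parallel element like the one above can catch a `completed-by` that does not name one of the block's tasks, and shows which tasks get cut off when the completing task finishes. The helper `check_parallel_block` below is a hypothetical sketch, not part of Rally, and the embedded JSON is a trimmed copy of the parallel element above.

```python
import json

# Trimmed copy of the parallel element from the challenge above.
CHALLENGE_PARALLEL = """
{
  "parallel": {
    "warmup-time-period": 100,
    "completed-by": "index-append",
    "tasks": [
      {"name": "index-append", "operation": {"operation-type": "bulk"}},
      {"name": "knn-search-10-50-concurrent-with-indexing", "operation": "knn-search-10-50"},
      {"name": "knn-search-10-100-concurrent-with-indexing", "operation": "knn-search-10-100"},
      {"name": "knn-search-100-1000-concurrent-with-indexing", "operation": "knn-search-100-1000"}
    ]
  }
}
"""


def check_parallel_block(element):
    """Return (completing task, tasks cut off when it finishes).

    Raises ValueError on duplicate task names or an unknown completed-by.
    """
    block = element["parallel"]
    names = [task["name"] for task in block["tasks"]]
    if len(names) != len(set(names)):
        raise ValueError("duplicate task names in parallel block")
    completed_by = block.get("completed-by")
    if completed_by is None:
        # No completed-by: the block runs until every task has finished.
        return None, []
    if completed_by not in names:
        raise ValueError(f"completed-by refers to unknown task {completed_by!r}")
    return completed_by, [name for name in names if name != completed_by]


if __name__ == "__main__":
    done_by, cut_off = check_parallel_block(json.loads(CHALLENGE_PARALLEL))
    print(done_by, cut_off)
```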

I also tried using the same number of warmup-iterations for every task in the parallel group, but that doesn't work either, because search operations finish much faster than indexing operations. Setting a different number of warmup-iterations for each task in the parallel group could help, but finding the right numbers would also be very difficult.
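The mismatch can be made concrete with a rough estimate of how long a given number of warmup iterations takes per task. This Python sketch assumes each client issues requests back to back (no target throughput), so one iteration lasts about one service time; the helper names are hypothetical, and the service times are the medians reported later in this thread, which themselves shift with concurrency and hardware.

```python
import math

# Median service times (ms) from the later run in this thread, used as inputs.
SERVICE_TIME_MS = {
    "index-append": 271.098,
    "knn-search-100-1000-concurrent-with-indexing": 27.9522,
}


def warmup_seconds(iterations, service_time_ms):
    """Approximate wall-clock length of a warmup of `iterations` requests."""
    return iterations * service_time_ms / 1000.0


def warmup_iterations_for(target_warmup_seconds, service_time_ms):
    """Rough per-client iteration count to warm up for about the target time."""
    return math.ceil(target_warmup_seconds * 1000.0 / service_time_ms)


if __name__ == "__main__":
    for task, ms in SERVICE_TIME_MS.items():
        print(
            f"{task}: 100 iterations ~ {warmup_seconds(100, ms):.1f} s, "
            f"30 s ~ {warmup_iterations_for(30, ms)} iterations"
        )
```

The same 100 iterations last about 27 s for bulk indexing but under 3 s for the search, and any per-task count derived this way is only as stable as the service times it is based on.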

I think sticking with the original plan of ingesting the same documents again in the parallel section is our best strategy. Later, once the performance team changes Rally to allow ingesting a different set of documents, we can update our challenge.

DJRickyB

I've left comments that are mostly concerns about variability in the test as designed. One way to control it better would be to put each of the three queries in its own parallel block, completed-by the query task, with no ingest-percentage bound on the indexing task. Spelling it out this way is verbose, but I think the results will potentially be more meaningful, unless you are also targeting resource contention on the search side and we do not fear variability in the results.

I'm also happy to accept the test as it currently is and then revise it based on what bubbles up in the nightly benchmark runs, but I'm throwing these thoughts out in case they help with testing what you intend to test.
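The per-query layout suggested above is verbose to write by hand but easy to generate. This Python sketch builds one parallel element per query, each completed by its query task, with a bulk task that has no ingest-percentage. The `index-append-during-<query>` task names are hypothetical (made unique per block on the assumption that task names should not repeat within a challenge), and the `warmup-iterations`/`iterations` defaults are placeholders, not values from this track.

```python
import json

QUERY_OPERATIONS = ["knn-search-10-50", "knn-search-10-100", "knn-search-100-1000"]


def per_query_parallel_blocks(query_ops, bulk_size=5000, clients=1,
                              warmup_iterations=100, iterations=1000):
    """Build one parallel element per query, each completed by its query task.

    The bulk task has no "ingest-percentage", so within each block it is
    bounded only by the query task finishing (or by the corpus running out).
    """
    blocks = []
    for op in query_ops:
        query_task = f"{op}-concurrent-with-indexing"
        blocks.append({
            "parallel": {
                "completed-by": query_task,
                "tasks": [
                    {
                        # Hypothetical name, unique per block.
                        "name": f"index-append-during-{op}",
                        "operation": {
                            "operation-type": "bulk",
                            "bulk-size": bulk_size,
                        },
                        "clients": clients,
                    },
                    {
                        "name": query_task,
                        "operation": op,
                        "clients": clients,
                        # Placeholder iteration counts.
                        "warmup-iterations": warmup_iterations,
                        "iterations": iterations,
                    },
                ],
            }
        })
    return blocks


if __name__ == "__main__":
    print(json.dumps(per_query_parallel_blocks(QUERY_OPERATIONS), indent=2))
```

The generated elements would go into the challenge's "schedule" list in place of the single shared parallel block.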

DJRickyB · 2022-04-20T14:30:19Z

dense_vector/challenges/default.json

+      "parallel": {
+        "tasks": [
+          {
+            "name": "index-append-concurrent-with-searches",


Suggested change

"name": "index-append-concurrent-with-searches",

"name": "index-update-concurrent-with-searches",

DJRickyB · 2022-04-20T14:53:55Z

dense_vector/challenges/default.json

+            "clients": {{bulk_indexing_clients | default(1)}}
+          },
+          {
+            "name": "knn-search-10-50-concurrent-with-indexing",


Note that this will run concurrently not just with indexing but also with your other searches, possibly causing queuing on the target's shards and giving us non-deterministic results. Is that OK?

DJRickyB · 2022-04-20T15:01:04Z

dense_vector/challenges/default.json

      "warmup-time-period": {{ bulk_warmup | default(40) | int }},
      "clients": {{bulk_indexing_clients | default(1)}}
    },
+    {
+      "parallel": {


Without a completed-by here, the block will execute until all of its tasks have completed. Note, however, that if the indexing task completes before the querying tasks (I'm not sure how likely this is), some query executions will affect your results without actually running in parallel with indexing.
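The effect described above can be seen in a simplified timeline model. This Python sketch is an assumption, not Rally's scheduler: all tasks in the parallel block start at t=0 and run for fixed durations, and with completed-by pointing at the indexing task the remaining tasks stop when indexing finishes. The durations are hypothetical.

```python
def block_seconds(index_seconds, query_seconds, completed_by_index):
    """Length of the parallel block under the simplified model."""
    if completed_by_index:
        return index_seconds
    return max([index_seconds, *query_seconds.values()])


def overlap_fractions(index_seconds, query_seconds, completed_by_index):
    """Fraction of each query task's measured runtime that overlaps indexing."""
    result = {}
    for name, duration in query_seconds.items():
        # With completed-by on indexing, queries are stopped when it finishes.
        runs_for = min(duration, index_seconds) if completed_by_index else duration
        result[name] = min(runs_for, index_seconds) / runs_for
    return result


# Hypothetical durations in seconds.
QUERIES = {
    "knn-search-10-50-concurrent-with-indexing": 300,
    "knn-search-100-1000-concurrent-with-indexing": 900,
}

if __name__ == "__main__":
    for flag in (False, True):
        print(flag, block_seconds(600, QUERIES, flag),
              overlap_fractions(600, QUERIES, flag))
```

Without completed-by, the 900 s query task spends a third of its measured runtime after indexing has already finished; with completed-by on the indexing task, every measured query overlaps indexing.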

Add an operation that updates the existing documents while running knn searches on them.

mayya-sharipova · 2022-05-10T19:09:08Z

@DJRickyB Sorry for the late reply; I was away for part of this time and busy with other tasks, but I would like to continue working on this.

I've added a second commit that addresses some of your feedback:

- renamed index-append-concurrent-with-searches to index-update-concurrent-with-searches
- used only a single type of search request in the parallel block (this gives us enough data)
- added "completed-by": "index-update-concurrent-with-searches" to the parallel section

> concerns about variability in the test as designed. One way to control more would be to separate each of the three queries to their own parallel block, completed-by the query task and unbounded on the indexing tasks' ingest-percentage. It is verbose to spell it out this way but I think the results will potentially be more meaningful unless you are also targeting resource contention on the search side, and we do not fear variability in the results.

Using a single type of query is enough for us for now.
We expect indexing operations to take much longer than searches, and for the concurrent test to be meaningful we want the searches to see newly indexed/updated data.

mayya-sharipova · 2022-05-10T19:10:55Z

Here are the rally benchmarking results for the latest commit:

concurrent-index-and-search challenge

|                                                         Metric |                                         Task |           Value |   Unit |
|---------------------------------------------------------------:|---------------------------------------------:|----------------:|-------:|
|                     Cumulative indexing time of primary shards |                                              |     2.75132     |    min |
|             Min cumulative indexing time across primary shards |                                              |     1.36882     |    min |
|          Median cumulative indexing time across primary shards |                                              |     1.37566     |    min |
|             Max cumulative indexing time across primary shards |                                              |     1.3825      |    min |
|            Cumulative indexing throttle time of primary shards |                                              |     0           |    min |
|    Min cumulative indexing throttle time across primary shards |                                              |     0           |    min |
| Median cumulative indexing throttle time across primary shards |                                              |     0           |    min |
|    Max cumulative indexing throttle time across primary shards |                                              |     0           |    min |
|                        Cumulative merge time of primary shards |                                              |    18.5745      |    min |
|                       Cumulative merge count of primary shards |                                              |     2           |        |
|                Min cumulative merge time across primary shards |                                              |     9.27853     |    min |
|             Median cumulative merge time across primary shards |                                              |     9.28723     |    min |
|                Max cumulative merge time across primary shards |                                              |     9.29592     |    min |
|               Cumulative merge throttle time of primary shards |                                              |     0           |    min |
|       Min cumulative merge throttle time across primary shards |                                              |     0           |    min |
|    Median cumulative merge throttle time across primary shards |                                              |     0           |    min |
|       Max cumulative merge throttle time across primary shards |                                              |     0           |    min |
|                      Cumulative refresh time of primary shards |                                              |    13.4259      |    min |
|                     Cumulative refresh count of primary shards |                                              |    37           |        |
|              Min cumulative refresh time across primary shards |                                              |     6.5072      |    min |
|           Median cumulative refresh time across primary shards |                                              |     6.71294     |    min |
|              Max cumulative refresh time across primary shards |                                              |     6.91868     |    min |
|                        Cumulative flush time of primary shards |                                              |    14.3902      |    min |
|                       Cumulative flush count of primary shards |                                              |    10           |        |
|                Min cumulative flush time across primary shards |                                              |     6.84495     |    min |
|             Median cumulative flush time across primary shards |                                              |     7.19508     |    min |
|                Max cumulative flush time across primary shards |                                              |     7.54522     |    min |
|                                        Total Young Gen GC time |                                              |     1.005       |      s |
|                                       Total Young Gen GC count |                                              |    67           |        |
|                                          Total Old Gen GC time |                                              |     0           |      s |
|                                         Total Old Gen GC count |                                              |     0           |        |
|                                                     Store size |                                              |     1.24487     |     GB |
|                                                  Translog size |                                              |     1.02445e-07 |     GB |
|                                         Heap used for segments |                                              |     0           |     MB |
|                                       Heap used for doc values |                                              |     0           |     MB |
|                                            Heap used for terms |                                              |     0           |     MB |
|                                            Heap used for norms |                                              |     0           |     MB |
|                                           Heap used for points |                                              |     0           |     MB |
|                                    Heap used for stored fields |                                              |     0           |     MB |
|                                                  Segment count |                                              |     2           |        |
|                                    Total Ingest Pipeline count |                                              |     0           |        |
|                                     Total Ingest Pipeline time |                                              |     0           |      s |
|                                   Total Ingest Pipeline failed |                                              |     0           |        |
|                                                 Min Throughput |                                 index-append | 15188.1         | docs/s |
|                                                Mean Throughput |                                 index-append | 15978.4         | docs/s |
|                                              Median Throughput |                                 index-append | 16098.2         | docs/s |
|                                                 Max Throughput |                                 index-append | 16333.7         | docs/s |
|                                        50th percentile latency |                                 index-append |   271.098       |     ms |
|                                        90th percentile latency |                                 index-append |   291.438       |     ms |
|                                        99th percentile latency |                                 index-append |   451.431       |     ms |
|                                       100th percentile latency |                                 index-append |   525.137       |     ms |
|                                   50th percentile service time |                                 index-append |   271.098       |     ms |
|                                   90th percentile service time |                                 index-append |   291.438       |     ms |
|                                   99th percentile service time |                                 index-append |   451.431       |     ms |
|                                  100th percentile service time |                                 index-append |   525.137       |     ms |
|                                                     error rate |                                 index-append |     0           |      % |
|                                                 Min Throughput |        index-update-concurrent-with-searches | 16110.3         | docs/s |
|                                                Mean Throughput |        index-update-concurrent-with-searches | 16535.5         | docs/s |
|                                              Median Throughput |        index-update-concurrent-with-searches | 16556.2         | docs/s |
|                                                 Max Throughput |        index-update-concurrent-with-searches | 16868.9         | docs/s |
|                                        50th percentile latency |        index-update-concurrent-with-searches |   269.637       |     ms |
|                                        90th percentile latency |        index-update-concurrent-with-searches |   326.219       |     ms |
|                                       100th percentile latency |        index-update-concurrent-with-searches |   452.553       |     ms |
|                                   50th percentile service time |        index-update-concurrent-with-searches |   269.637       |     ms |
|                                   90th percentile service time |        index-update-concurrent-with-searches |   326.219       |     ms |
|                                  100th percentile service time |        index-update-concurrent-with-searches |   452.553       |     ms |
|                                                     error rate |        index-update-concurrent-with-searches |     0           |      % |
|                                                 Min Throughput | knn-search-100-1000-concurrent-with-indexing |    29.67        |  ops/s |
|                                                Mean Throughput | knn-search-100-1000-concurrent-with-indexing |    31.7         |  ops/s |
|                                              Median Throughput | knn-search-100-1000-concurrent-with-indexing |    31.91        |  ops/s |
|                                                 Max Throughput | knn-search-100-1000-concurrent-with-indexing |    32.98        |  ops/s |
|                                        50th percentile latency | knn-search-100-1000-concurrent-with-indexing |    27.9522      |     ms |
|                                        90th percentile latency | knn-search-100-1000-concurrent-with-indexing |    29.946       |     ms |
|                                        99th percentile latency | knn-search-100-1000-concurrent-with-indexing |    39.0829      |     ms |
|                                       100th percentile latency | knn-search-100-1000-concurrent-with-indexing |    49.5132      |     ms |
|                                   50th percentile service time | knn-search-100-1000-concurrent-with-indexing |    27.9522      |     ms |
|                                   90th percentile service time | knn-search-100-1000-concurrent-with-indexing |    29.946       |     ms |
|                                   99th percentile service time | knn-search-100-1000-concurrent-with-indexing |    39.0829      |     ms |
|                                  100th percentile service time | knn-search-100-1000-concurrent-with-indexing |    49.5132      |     ms |
|                                                     error rate | knn-search-100-1000-concurrent-with-indexing |     0           |      % |
|                                                 Min Throughput |          knn-search-10-50-before-force-merge |   198.13        |  ops/s |
|                                                Mean Throughput |          knn-search-10-50-before-force-merge |   198.13        |  ops/s |
|                                              Median Throughput |          knn-search-10-50-before-force-merge |   198.13        |  ops/s |
|                                                 Max Throughput |          knn-search-10-50-before-force-merge |   198.13        |  ops/s |
|                                        50th percentile latency |          knn-search-10-50-before-force-merge |     4.21267     |     ms |
|                                        90th percentile latency |          knn-search-10-50-before-force-merge |     5.21381     |     ms |
|                                        99th percentile latency |          knn-search-10-50-before-force-merge |     6.8245      |     ms |
|                                       100th percentile latency |          knn-search-10-50-before-force-merge |     7.42229     |     ms |
|                                   50th percentile service time |          knn-search-10-50-before-force-merge |     4.21267     |     ms |
|                                   90th percentile service time |          knn-search-10-50-before-force-merge |     5.21381     |     ms |
|                                   99th percentile service time |          knn-search-10-50-before-force-merge |     6.8245      |     ms |
|                                  100th percentile service time |          knn-search-10-50-before-force-merge |     7.42229     |     ms |
|                                                     error rate |          knn-search-10-50-before-force-merge |     0           |      % |
|                                                 Min Throughput |         knn-search-10-100-before-force-merge |   177.64        |  ops/s |
|                                                Mean Throughput |         knn-search-10-100-before-force-merge |   177.64        |  ops/s |
|                                              Median Throughput |         knn-search-10-100-before-force-merge |   177.64        |  ops/s |
|                                                 Max Throughput |         knn-search-10-100-before-force-merge |   177.64        |  ops/s |
|                                        50th percentile latency |         knn-search-10-100-before-force-merge |     4.29077     |     ms |
|                                        90th percentile latency |         knn-search-10-100-before-force-merge |     5.67705     |     ms |
|                                        99th percentile latency |         knn-search-10-100-before-force-merge |     7.0342      |     ms |
|                                       100th percentile latency |         knn-search-10-100-before-force-merge |     7.43758     |     ms |
|                                   50th percentile service time |         knn-search-10-100-before-force-merge |     4.29077     |     ms |
|                                   90th percentile service time |         knn-search-10-100-before-force-merge |     5.67705     |     ms |
|                                   99th percentile service time |         knn-search-10-100-before-force-merge |     7.0342      |     ms |
|                                  100th percentile service time |         knn-search-10-100-before-force-merge |     7.43758     |     ms |
|                                                     error rate |         knn-search-10-100-before-force-merge |     0           |      % |
|                                                 Min Throughput |       knn-search-100-1000-before-force-merge |    26.17        |  ops/s |
|                                                Mean Throughput |       knn-search-100-1000-before-force-merge |    27.19        |  ops/s |
|                                              Median Throughput |       knn-search-100-1000-before-force-merge |    27.32        |  ops/s |
|                                                 Max Throughput |       knn-search-100-1000-before-force-merge |    27.94        |  ops/s |
|                                        50th percentile latency |       knn-search-100-1000-before-force-merge |    33.8932      |     ms |
|                                        90th percentile latency |       knn-search-100-1000-before-force-merge |    35.1241      |     ms |
|                                        99th percentile latency |       knn-search-100-1000-before-force-merge |    36.9389      |     ms |
|                                       100th percentile latency |       knn-search-100-1000-before-force-merge |    38.2534      |     ms |
|                                   50th percentile service time |       knn-search-100-1000-before-force-merge |    33.8932      |     ms |
|                                   90th percentile service time |       knn-search-100-1000-before-force-merge |    35.1241      |     ms |
|                                   99th percentile service time |       knn-search-100-1000-before-force-merge |    36.9389      |     ms |
|                                  100th percentile service time |       knn-search-100-1000-before-force-merge |    38.2534      |     ms |
|                                                     error rate |       knn-search-100-1000-before-force-merge |     0           |      % |
|                                                 Min Throughput |        script-score-query-before-force-merge |     4.56        |  ops/s |
|                                                Mean Throughput |        script-score-query-before-force-merge |     4.63        |  ops/s |
|                                              Median Throughput |        script-score-query-before-force-merge |     4.63        |  ops/s |
|                                                 Max Throughput |        script-score-query-before-force-merge |     4.67        |  ops/s |
|                                        50th percentile latency |        script-score-query-before-force-merge |   208.236       |     ms |
|                                        90th percentile latency |        script-score-query-before-force-merge |   209.456       |     ms |
|                                        99th percentile latency |        script-score-query-before-force-merge |   210.977       |     ms |
|                                       100th percentile latency |        script-score-query-before-force-merge |   211.096       |     ms |
|                                   50th percentile service time |        script-score-query-before-force-merge |   208.236       |     ms |
|                                   90th percentile service time |        script-score-query-before-force-merge |   209.456       |     ms |
|                                   99th percentile service time |        script-score-query-before-force-merge |   210.977       |     ms |
|                                  100th percentile service time |        script-score-query-before-force-merge |   211.096       |     ms |
|                                                     error rate |        script-score-query-before-force-merge |     0           |      % |
|                                                 Min Throughput |                                  force-merge |     0           |  ops/s |
|                                                Mean Throughput |                                  force-merge |     0           |  ops/s |
|                                              Median Throughput |                                  force-merge |     0           |  ops/s |
|                                                 Max Throughput |                                  force-merge |     0           |  ops/s |
|                                       100th percentile latency |                                  force-merge |     1.11501e+06 |     ms |
|                                  100th percentile service time |                                  force-merge |     1.11501e+06 |     ms |
|                                                     error rate |                                  force-merge |     0           |      % |
|                                                 Min Throughput |                             knn-search-10-50 |   176.93        |  ops/s |
|                                                Mean Throughput |                             knn-search-10-50 |   176.93        |  ops/s |
|                                              Median Throughput |                             knn-search-10-50 |   176.93        |  ops/s |
|                                                 Max Throughput |                             knn-search-10-50 |   176.93        |  ops/s |
|                                        50th percentile latency |                             knn-search-10-50 |     1.37304     |     ms |
|                                        90th percentile latency |                             knn-search-10-50 |     2.30238     |     ms |
|                                        99th percentile latency |                             knn-search-10-50 |     3.68        |     ms |
|                                       100th percentile latency |                             knn-search-10-50 |     3.69208     |     ms |
|                                   50th percentile service time |                             knn-search-10-50 |     1.37304     |     ms |
|                                   90th percentile service time |                             knn-search-10-50 |     2.30238     |     ms |
|                                   99th percentile service time |                             knn-search-10-50 |     3.68        |     ms |
|                                  100th percentile service time |                             knn-search-10-50 |     3.69208     |     ms |
|                                                     error rate |                             knn-search-10-50 |     0           |      % |
|                                                 Min Throughput |                            knn-search-10-100 |   173.96        |  ops/s |
|                                                Mean Throughput |                            knn-search-10-100 |   173.96        |  ops/s |
|                                              Median Throughput |                            knn-search-10-100 |   173.96        |  ops/s |
|                                                 Max Throughput |                            knn-search-10-100 |   173.96        |  ops/s |
|                                        50th percentile latency |                            knn-search-10-100 |     1.86        |     ms |
|                                        90th percentile latency |                            knn-search-10-100 |     4.97013     |     ms |
|                                        99th percentile latency |                            knn-search-10-100 |     8.36762     |     ms |
|                                       100th percentile latency |                            knn-search-10-100 |     8.56546     |     ms |
|                                   50th percentile service time |                            knn-search-10-100 |     1.86        |     ms |
|                                   90th percentile service time |                            knn-search-10-100 |     4.97013     |     ms |
|                                   99th percentile service time |                            knn-search-10-100 |     8.36762     |     ms |
|                                  100th percentile service time |                            knn-search-10-100 |     8.56546     |     ms |
|                                                     error rate |                            knn-search-10-100 |     0           |      % |
|                                                 Min Throughput |                          knn-search-100-1000 |    53.83        |  ops/s |
|                                                Mean Throughput |                          knn-search-100-1000 |    53.83        |  ops/s |
|                                              Median Throughput |                          knn-search-100-1000 |    53.83        |  ops/s |
|                                                 Max Throughput |                          knn-search-100-1000 |    53.83        |  ops/s |
|                                        50th percentile latency |                          knn-search-100-1000 |     5.9989      |     ms |
|                                        90th percentile latency |                          knn-search-100-1000 |     7.60146     |     ms |
|                                        99th percentile latency |                          knn-search-100-1000 |    10.6784      |     ms |
|                                       100th percentile latency |                          knn-search-100-1000 |    10.9691      |     ms |
|                                   50th percentile service time |                          knn-search-100-1000 |     5.9989      |     ms |
|                                   90th percentile service time |                          knn-search-100-1000 |     7.60146     |     ms |
|                                   99th percentile service time |                          knn-search-100-1000 |    10.6784      |     ms |
|                                  100th percentile service time |                          knn-search-100-1000 |    10.9691      |     ms |
|                                                     error rate |                          knn-search-100-1000 |     0           |      % |
|                                                 Min Throughput |                           script-score-query |     4.6         |  ops/s |
|                                                Mean Throughput |                           script-score-query |     4.66        |  ops/s |
|                                              Median Throughput |                           script-score-query |     4.67        |  ops/s |
|                                                 Max Throughput |                           script-score-query |     4.7         |  ops/s |
|                                        50th percentile latency |                           script-score-query |   207.542       |     ms |
|                                        90th percentile latency |                           script-score-query |   208.814       |     ms |
|                                        99th percentile latency |                           script-score-query |   211.765       |     ms |
|                                       100th percentile latency |                           script-score-query |   215.326       |     ms |
|                                   50th percentile service time |                           script-score-query |   207.542       |     ms |
|                                   90th percentile service time |                           script-score-query |   208.814       |     ms |
|                                   99th percentile service time |                           script-score-query |   211.765       |     ms |
|                                  100th percentile service time |                           script-score-query |   215.326       |     ms |
|                                                     error rate |                           script-score-query |     0           |      % |

Some interesting observations:

- Indexing doesn't seem to be affected by concurrent searches.
- Searches are fastest on a single segment: 5.9-10.9 ms.
- Searches are several times slower on multiple segments: 33-38 ms.
- Concurrent indexing has some effect on searches, but not a significant one (most searches performed the same; the slowest 1% were up to 25% slower): 27.9-49.5 ms.

| 50th percentile service time | knn-search-100-1000-concurrent-with-indexing |    27.9522      |     ms |
| 90th percentile service time | knn-search-100-1000-concurrent-with-indexing |    29.946       |     ms |
| 99th percentile service time | knn-search-100-1000-concurrent-with-indexing |    39.0829      |     ms |
|100th percentile service time | knn-search-100-1000-concurrent-with-indexing |    49.5132      |     ms |

| 50th percentile service time |       knn-search-100-1000-before-force-merge |    33.8932      |     ms |
| 90th percentile service time |       knn-search-100-1000-before-force-merge |    35.1241      |     ms |
| 99th percentile service time |       knn-search-100-1000-before-force-merge |    36.9389      |     ms |
|100th percentile service time |       knn-search-100-1000-before-force-merge |    38.2534      |     ms |

| 50th percentile service time |                          knn-search-100-1000 |     5.9989      |     ms |
| 90th percentile service time |                          knn-search-100-1000 |     7.60146     |     ms |
| 99th percentile service time |                          knn-search-100-1000 |    10.6784      |     ms |
|100th percentile service time |                          knn-search-100-1000 |    10.9691      |     ms |
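To put the observations above in numbers, here is a small sketch that divides the service-time percentiles from the three tables above. The dictionary simply copies those values; the function name `ratios` and the shortened run labels are illustrative, not part of the track.

```python
# Service times in ms, copied from the three tables above.
SERVICE_TIMES_MS = {
    "concurrent-with-indexing": {50: 27.9522, 90: 29.946, 99: 39.0829, 100: 49.5132},
    "before-force-merge": {50: 33.8932, 90: 35.1241, 99: 36.9389, 100: 38.2534},
    "single-segment": {50: 5.9989, 90: 7.60146, 99: 10.6784, 100: 10.9691},
}


def ratios(numerator, denominator):
    """Per-percentile ratio numerator / denominator, rounded to two decimals."""
    num = SERVICE_TIMES_MS[numerator]
    den = SERVICE_TIMES_MS[denominator]
    return {p: round(num[p] / den[p], 2) for p in num}


if __name__ == "__main__":
    # How much slower multiple segments are than a single segment.
    print(ratios("before-force-merge", "single-segment"))
    # How concurrent indexing compares with the quiet multi-segment index.
    print(ratios("concurrent-with-indexing", "before-force-merge"))
```

With these numbers, the multi-segment searches come out at roughly 3.5-5.7x the single-segment service time. Searches running concurrently with indexing have lower medians than the before-force-merge run (about 0.82x at p50), while the single slowest request is about 1.29x.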

DJRickyB

I find the new approach simple and really interesting. I left a few suggestions; I'm approving in case you think we are good without them, otherwise I'm happy to take another look.

dense_vector/challenges/default.json

mayya-sharipova · 2022-05-11T14:34:51Z

@DJRickyB Thanks for your feedback.

@jtibshirani Julie, do you want to have a final look at the updated track to see that we are good with it?

jtibshirani

Thanks @mayya-sharipova! It looks good to me too. I just wanted to check that I understand what's happening:

- We first index 2 million docs (20% of the full dataset of 10 million).
- Then we reindex the first 500,000 docs while running knn-search-100-1000 in parallel.

Some questions:

- In the parallel section, how many iterations of the search are run?
- I guess this affects the next tasks, such as knn-search-10-50-before-force-merge, compared to before this change, because now there will be more segments. This seems fine; we just might see the benchmark results change a bit.

dense_vector/challenges/default.json

jtibshirani · 2022-05-11T23:33:26Z

One last thought: the latency for knn-search-100-1000-concurrent-with-indexing and knn-search-100-1000-before-force-merge is really similar. I wonder if it'd be more interesting if we added a wait-for-merges-to-finish before running the before-force-merge searches. That way we would see what searches look like in a "steady state" where you index all documents, have a pause, then start searching (but don't want the expense of a force merge?)

mayya-sharipova · 2022-05-16T20:39:37Z

@jtibshirani Thanks for your feedback. Answering your questions:

> I just wanted to check I understand what's happening:
>
> - We first index 2 million docs (20% of the full dataset of 10 million).
> - Then we reindex the first 500,000 docs while running knn-search-100-1000 in parallel.

Yes, that's indeed the goal. In practice, though, when this parallel section finishes only around 100,000 of these docs are available for search, as it takes some time for all the refreshes to catch up. That's why, after the parallel section, we run another refresh, refresh-after-update, to ensure that all before-force-merge searches see 2.5 million docs.
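The flow described above can be sketched as a Rally schedule. This is a hypothetical illustration, not the track's actual `default.json`: the task names come from this thread, but the `ingest-percentage` values (20 and 5, i.e. 2,000,000 and 500,000 of 10 million docs), the `iterations` bound, and the use of the parallel element's `completed-by` property to stop searches when indexing finishes are assumptions. The operation reference `knn-search-100-1000` is also assumed to be defined elsewhere in the track.

```python
import json

BULK_SIZE = 5000  # default bulk_size shown in the PR diff


def build_schedule():
    """Hypothetical Rally schedule mirroring the steps discussed in this thread."""
    return [
        {
            "name": "index-append",
            "operation": {
                "operation-type": "bulk",
                "bulk-size": BULK_SIZE,
                "ingest-percentage": 20,  # assumption: ~2,000,000 of 10 million docs
            },
        },
        {
            "parallel": {
                # Assumption: end the whole parallel block once indexing is done.
                "completed-by": "index-update-concurrent-with-searches",
                "tasks": [
                    {
                        "name": "index-update-concurrent-with-searches",
                        "operation": {
                            "operation-type": "bulk",
                            "bulk-size": BULK_SIZE,
                            # ingest-percentage always starts at the beginning of the
                            # corpus, so this re-sends the first ~500,000 docs.
                            "ingest-percentage": 5,
                        },
                    },
                    {
                        "name": "knn-search-100-1000-concurrent-with-indexing",
                        "operation": "knn-search-100-1000",  # hypothetical reference
                        "iterations": 10000,  # upper bound; completed-by stops it earlier
                    },
                ],
            }
        },
        # Make all indexed docs visible before the next measured searches.
        {"name": "refresh-after-update", "operation": {"operation-type": "refresh"}},
        {
            "name": "knn-search-100-1000-before-force-merge",
            "operation": "knn-search-100-1000",  # hypothetical reference
        },
    ]


if __name__ == "__main__":
    print(json.dumps(build_schedule(), indent=2))
```

Using `completed-by` ties the length of the search task to the indexing task; the alternative is a fixed iteration count for the searches, which is closer to the "around 1000 search operations" reported for this run.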

> In the parallel section, how many iterations of the search are run?

This section runs for around 30 seconds, with around 1000 search and 100 index operations.
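A quick consistency check of these counts, as a sketch. It assumes each "index operation" is one bulk request of the default `bulk_size` of 5000 docs, and that the `bulk.total_operations` value in the index stats later in this thread counts one operation per shard-level bulk request (the stats report 2 primary shards). Under those assumptions the numbers line up with the 2.5 million docs and the 1000 bulk operations reported there.

```python
BULK_SIZE = 5000          # default bulk_size from the PR diff
INITIAL_DOCS = 2_000_000  # first indexing phase: 20% of 10 million docs
BULK_REQUESTS = 100       # "around 100 index operations" in the parallel section
SEARCHES = 1000           # "around 1000 search" operations
DURATION_S = 30           # "around 30 secs"
PRIMARY_SHARDS = 2        # "shard_stats": {"total_count": 2} in the index stats


def parallel_section_summary():
    docs_added = BULK_REQUESTS * BULK_SIZE                        # 500,000
    total_requests = INITIAL_DOCS // BULK_SIZE + BULK_REQUESTS    # 400 + 100
    return {
        "docs_added": docs_added,
        "total_docs": INITIAL_DOCS + docs_added,                  # 2,500,000
        "search_ops_per_s": round(SEARCHES / DURATION_S, 1),
        # Assumes bulk stats count one operation per shard-level bulk request.
        "expected_shard_bulk_ops": total_requests * PRIMARY_SHARDS,
    }


if __name__ == "__main__":
    print(parallel_section_summary())
```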

> I guess this affects the next tasks knn-search-10-50-before-force-merge compared to before this change, because now there will be more segments. This seems fine, we just might see the benchmark results change a bit.

Indeed, the before-force-merge searches will now see more segments and docs. We should explain this when we report the changed benchmark results.

> I'm not sure if this matters, but curious why we don't use the default "refresh": false?

I thought using "refresh": "wait_for" would let searches see indexed documents immediately, but that turned out not to be the case: I saw no difference in the number of segments or of searchable documents between "refresh": "wait_for" and the default "refresh": false. So I've followed your suggestion and kept the default behaviour.
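For reference, `refresh` here is the Elasticsearch query parameter on write requests: `true` forces a refresh, `wait_for` makes the request wait until a refresh makes the changes visible to search before responding, and `false` (the default) returns without waiting. The sketch below only builds a `_bulk` request as a path plus NDJSON body and sends nothing. The index name `vectors` and the field `vector` are hypothetical, and it does not claim to reproduce how Rally's bulk runner forwards this parameter.

```python
import json
from urllib.parse import urlencode

# Values accepted by the Elasticsearch "refresh" query parameter.
REFRESH_VALUES = ("true", "false", "wait_for")


def build_bulk_request(index, docs, refresh="false"):
    """Return (path, ndjson_body) for a _bulk request; nothing is sent."""
    if refresh not in REFRESH_VALUES:
        raise ValueError(f"unsupported refresh value: {refresh!r}")
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    body = "\n".join(lines) + "\n"  # the bulk body must end with a newline
    path = "/_bulk?" + urlencode({"refresh": refresh})
    return path, body


if __name__ == "__main__":
    path, body = build_bulk_request("vectors", [{"vector": [0.1, 0.2]}], refresh="wait_for")
    print(path)
    print(body, end="")
```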
Addressed in cd8ea41.

> the latency for knn-search-100-1000-concurrent-with-indexing and knn-search-100-1000-before-force-merge is really similar. I wonder if it'd be more interesting if we added a wait-for-merges-to-finish before running the before-force-merge searches. That way we would see what searches look like in a "steady state" where you index all documents, have a pause, then start searching (but don't want the expense of a force merge?)

We do reach a "steady state" before the before-force-merge searches start: the refresh-after-index operation makes sure searches see all 2.5 million docs.

I also experimented with adding wait-until-merges-finish, but it made no difference, because no merges occur before the final force_merge; new updates only add new segments.

Here are the index stats after all indexing has finished, just before the searches start.

index stats

{
    "primaries":
    {
        "docs":
        {
            "count": 2500000,
            "deleted": 0
        },
        "shard_stats":
        {
            "total_count": 2
        },
        "store":
        {
            "size_in_bytes": 5303589582,
            "total_data_set_size_in_bytes": 5303589582,
            "reserved_in_bytes": 0
        },
        "indexing":
        {
            "index_total": 2500000,
            "index_time_in_millis": 173547,
            "index_current": 0,
            "index_failed": 0,
            "delete_total": 0,
            "delete_time_in_millis": 0,
            "delete_current": 0,
            "noop_update_total": 0,
            "is_throttled": false,
            "throttle_time_in_millis": 204404
        },
        "get":
        {
            "total": 0,
            "time_in_millis": 0,
            "exists_total": 0,
            "exists_time_in_millis": 0,
            "missing_total": 0,
            "missing_time_in_millis": 0,
            "current": 0
        },
        "search":
        {
            "open_contexts": 0,
            "query_total": 2634,
            "query_time_in_millis": 50926,
            "query_current": 0,
            "fetch_total": 2634,
            "fetch_time_in_millis": 4729,
            "fetch_current": 0,
            "scroll_total": 0,
            "scroll_time_in_millis": 0,
            "scroll_current": 0,
            "suggest_total": 0,
            "suggest_time_in_millis": 0,
            "suggest_current": 0
        },
        "merges":
        {
            "current": 0,
            "current_docs": 0,
            "current_size_in_bytes": 0,
            "total": 0,
            "total_time_in_millis": 0,
            "total_docs": 0,
            "total_size_in_bytes": 0,
            "total_stopped_time_in_millis": 0,
            "total_throttled_time_in_millis": 0,
            "total_auto_throttle_in_bytes": 41943040
        },
        "refresh":
        {
            "total": 28,
            "total_time_in_millis": 901478,
            "external_total": 20,
            "external_total_time_in_millis": 823759,
            "listeners": 0
        },
        "flush":
        {
            "total": 8,
            "periodic": 8,
            "total_time_in_millis": 959008
        },
        "warmer":
        {
            "current": 0,
            "total": 17,
            "total_time_in_millis": 11
        },
        "query_cache":
        {
            "memory_size_in_bytes": 0,
            "total_count": 0,
            "hit_count": 0,
            "miss_count": 0,
            "cache_size": 0,
            "cache_count": 0,
            "evictions": 0
        },
        "fielddata":
        {
            "memory_size_in_bytes": 0,
            "evictions": 0
        },
        "completion":
        {
            "size_in_bytes": 0
        },
        "segments":
        {
            "count": 17,
            "memory_in_bytes": 0,
            "terms_memory_in_bytes": 0,
            "stored_fields_memory_in_bytes": 0,
            "term_vectors_memory_in_bytes": 0,
            "norms_memory_in_bytes": 0,
            "points_memory_in_bytes": 0,
            "doc_values_memory_in_bytes": 0,
            "index_writer_memory_in_bytes": 0,
            "version_map_memory_in_bytes": 0,
            "fixed_bit_set_memory_in_bytes": 0,
            "max_unsafe_auto_id_timestamp": -1,
            "file_sizes":
            {}
        },
        "translog":
        {
            "operations": 0,
            "size_in_bytes": 110,
            "uncommitted_operations": 0,
            "uncommitted_size_in_bytes": 110,
            "earliest_last_modified_age": 91415
        },
        "request_cache":
        {
            "memory_size_in_bytes": 0,
            "evictions": 0,
            "hit_count": 0,
            "miss_count": 0
        },
        "recovery":
        {
            "current_as_source": 0,
            "current_as_target": 0,
            "throttle_time_in_millis": 0
        },
        "bulk":
        {
            "total_operations": 1000,
            "total_time_in_millis": 176705,
            "total_size_in_bytes": 5223807521,
            "avg_time_in_millis": 166,
            "avg_size_in_bytes": 5223841
        }
    }
}
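A small sketch that pulls out the fields this discussion relies on from a stats object shaped like the one above (top-level `primaries` key, as pasted). The helper name and the trimmed sample are illustrative; the sample values are copied from the stats above.

```python
def summarize_primaries(stats):
    """Pick out the fields discussed in this thread from a primaries-level stats object."""
    p = stats["primaries"]
    return {
        "docs": p["docs"]["count"],
        "deleted": p["docs"]["deleted"],
        "segments": p["segments"]["count"],
        "merges_total": p["merges"]["total"],
        "merges_current": p["merges"]["current"],
        "external_refreshes": p["refresh"]["external_total"],
    }


# Trimmed copy of the stats above.
SAMPLE = {
    "primaries": {
        "docs": {"count": 2500000, "deleted": 0},
        "segments": {"count": 17},
        "merges": {"current": 0, "total": 0},
        "refresh": {"total": 28, "external_total": 20},
    }
}

if __name__ == "__main__":
    print(summarize_primaries(SAMPLE))
```

Read together, `merges_total: 0` with 17 segments supports the point that no merges ran before the force merge, and `deleted: 0` with 2,500,000 docs suggests the re-sent 500,000 docs were added as new documents rather than overwriting the existing ones.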

Waiting for merges also doesn't change search performance:

Without waiting for merges to stabilize:
| 50th percentile service time | knn-search-100-1000-before-force-merge | 31.2539 | ms |
| 90th percentile service time | knn-search-100-1000-before-force-merge | 32.4008 | ms |
| 99th percentile service time | knn-search-100-1000-before-force-merge | 34.4963 | ms |
| 100th percentile service time | knn-search-100-1000-before-force-merge | 35.7168 | ms |

With waiting for merges to stabilize:
| 50th percentile service time | knn-search-100-1000-before-force-merge | 31.1528 | ms |
| 90th percentile service time | knn-search-100-1000-before-force-merge | 32.4756 | ms |
| 99th percentile service time | knn-search-100-1000-before-force-merge | 35.4727 | ms |
| 100th percentile service time | knn-search-100-1000-before-force-merge | 36.6767 | ms |
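The two runs can be compared percentile by percentile with a short sketch; the values are copied from the two tables above, and `pct_change` is an illustrative helper name.

```python
# Service times in ms for knn-search-100-1000-before-force-merge, from the tables above.
WITHOUT_WAIT = {50: 31.2539, 90: 32.4008, 99: 34.4963, 100: 35.7168}
WITH_WAIT = {50: 31.1528, 90: 32.4756, 99: 35.4727, 100: 36.6767}


def pct_change(before, after):
    """Per-percentile change from `before` to `after`, in percent, one decimal."""
    return {p: round(100 * (after[p] - before[p]) / before[p], 1) for p in before}


if __name__ == "__main__":
    print(pct_change(WITHOUT_WAIT, WITH_WAIT))
```

All differences stay under 3% (about -0.3% at p50 and +2.8% at p99). Whether that is within run-to-run noise would need repeated runs, which are not shown here.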

jtibshirani · 2022-05-16T21:11:06Z

That's interesting about wait_for; I also thought it would ensure the docs were visible before returning.

It looks like all merges were throttled, which maybe explains why wait-until-merges-finish doesn't do anything?

        "merges":
        {
            ...
            "total_auto_throttle_in_bytes": 41943040
        },

Anyways, seems like a great step forward to me! We can always discuss these questions or make tweaks later.

mayya-sharipova · 2022-05-17T13:15:55Z

@jtibshirani Thanks for the feedback. I will confirm with the distributed team experts what's happening with wait_for.

> It looks like all merges were throttled, which maybe explains why wait-until-merges-finish doesn't do anything?

    "merges":
    {
        ...
        "total_auto_throttle_in_bytes": 41943040
    },

This is a default value that is always displayed, even for an empty, newly created index; it defaults to 20 MB per shard.
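As a quick arithmetic check of that explanation, the reported value equals the 20 MB per-shard default multiplied by the 2 primary shards in `shard_stats.total_count`. This sketch reads "MB" as 1024 × 1024 bytes, which is an assumption about the unit that happens to reproduce the number exactly.

```python
MIB = 1024 * 1024  # assumption: "20 MB" means 20 * 2**20 bytes


def expected_auto_throttle_bytes(primary_shards, per_shard_mb=20):
    """Sum of the per-shard default across all primary shards, in bytes."""
    return primary_shards * per_shard_mb * MIB


if __name__ == "__main__":
    # 2 primary shards, per "shard_stats": {"total_count": 2}
    print(expected_auto_throttle_bytes(2))
```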

mayya-sharipova · 2022-05-17T13:18:55Z

@DJRickyB Thanks for your feedback on this PR, we are ok to merge it now whenever the timing is good for your team.

DJRickyB

LGTM again :)

Add indexing with concurrent searches

bc245f8

A very common use case is to run indexing at the same time as searches. This patch addresses this use case. We first index 50% of data. During the indexing of rest of data, we also run concurrent searches.

dliappis requested a review from inqueue April 11, 2022 15:22

dliappis added the enhancement label Apr 11, 2022

mayya-sharipova commented Apr 11, 2022

DJRickyB reviewed Apr 11, 2022

DJRickyB reviewed Apr 20, 2022

Add indexing with concurrent searches

6cc7040

Add an operation that updates the current documents and at the same time doing knn searches on them.

DJRickyB approved these changes May 10, 2022

Address Rick's feedback

5d96d31

jtibshirani reviewed May 11, 2022

Address Julie's feedback

cd8ea41

jtibshirani approved these changes May 16, 2022

DJRickyB approved these changes May 17, 2022

mayya-sharipova merged commit f7597fe into elastic:master May 17, 2022

mayya-sharipova deleted the concurrent-indexing-searches branch May 17, 2022 13:46

b-deam mentioned this pull request May 18, 2022

Remove unused track param from dense vector's README #269

Merged

	"name": "index-append-concurrent-with-searches",
	"name": "index-update-concurrent-with-searches",
