Added information about the Warmup API #393
Conversation
docs/knn/warmup.md
Outdated
    ---

    # Warmup API
    ## Overview
You can just delete this ## Overview line. We generally try to avoid stacked headers, and in this case, it's fair to assume the introduction is an overview.
docs/knn/warmup.md
Outdated
    # Warmup API
    ## Overview
    The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This can cause high latency during initial queries. To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort.
Tip here: whenever you use "this" or "that" as a demonstrative pronoun, force yourself to add a noun afterwards. The difference in clarity is huge.
- "This can cause high latency" vs. "This loading time can cause high latency"
- "To avoid this" vs. "To avoid this situation"
docs/knn/warmup.md
Outdated
    # Warmup API
    ## Overview
    The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This can cause high latency during initial queries. To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort.
Avoid future tense. So something like "users often run random queries during a warmup period to load graphs into native memory. After this warmup period, they can start their production workloads. This process is indirect and requires extra effort."
docs/knn/warmup.md
Outdated
    ## Overview
    The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This can cause high latency during initial queries. To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort.

    As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching over. This API will load all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, you will be able to start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation will have no impact on them. It only loads graphs that are not currently in memory.
... interested in searching.
This API loads...
After this process finishes, you can start searching against...
... this operation has no impact on them.
docs/knn/warmup.md
Outdated
    As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching over. This API will load all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, you will be able to start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation will have no impact on them. It only loads graphs that are not currently in memory.

    ## Usage
    This command will perform warmup on index1, index2, and index3:
This request performs a warmup on three indices:
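For reference, the request under review takes roughly this shape (the exact endpoint path is not quoted in this thread, so treat it as an assumption based on the plugin's `_opendistro/_knn` namespace):

```
GET /_opendistro/_knn/warmup/index1,index2,index3?pretty
```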
docs/knn/warmup.md
Outdated
    ```
    `total` indicates how many shards the warmup operation was performed on. `successful` indicates how many shards succeeded and `failed` indicates how many shards have failed.

    The call will not return until the warmup operation is complete or the request times out. If the request times out, the operation will still be going on in the cluster. To monitor this, use the Elasticsearch `_tasks` API.
The call does not return a response until...
... the operation still continues on the cluster...
Maybe include a sample call to the _tasks API if we don't have anything to link to.
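The simple monitoring call being suggested here is the same one quoted later in this conversation:

```
GET /_tasks
```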
docs/knn/warmup.md
Outdated
    The call will not return until the warmup operation is complete or the request times out. If the request times out, the operation will still be going on in the cluster. To monitor this, use the Elasticsearch `_tasks` API.

    Following the completion of the operation, use the k-NN `_stats` API to see what has been loaded into the graph.
Link to the Settings and statistics page.
to see what the plugin loaded into the graph.
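A minimal stats call, assuming the plugin's standard `_opendistro/_knn/stats` endpoint (the endpoint itself is not quoted in this diff), would look like:

```
GET /_opendistro/_knn/stats?pretty
```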
docs/knn/warmup.md
Outdated
    Following the completion of the operation, use the k-NN `_stats` API to see what has been loaded into the graph.

    ## Best practices
    In order for the warmup API to function properly, you need to follow a few best practices. First, you should not be running any merge operations on the indices you want to warm up. The reason for this is that, during merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. You may see the situation where the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries will still be present.
Make these commands.
"For the warmup API to function properly, follow these best practices. First, do not run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and..."
"For example, you could encounter a situation in which the warmup API loads graphs A and B..."
"In this case, the initial penalty for loading graph C is still present."
docs/knn/warmup.md
Outdated
    ## Best practices
    In order for the warmup API to function properly, you need to follow a few best practices. First, you should not be running any merge operations on the indices you want to warm up. The reason for this is that, during merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. You may see the situation where the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries will still be present.

    Second, you should first confirm that all of the graphs of interest can fit into native memory before running warmup. If they cannot all fit into memory, the cache will thrash.
"Second, confirm that all graphs you want to warm up can fit into native memory. See the knn.memory.circuit_breaker.limit statistic for guidance. High graph memory usage causes cache thrashing."
docs/knn/warmup.md
Outdated
    Second, you should first confirm that all of the graphs of interest can fit into native memory before running warmup. If they cannot all fit into memory, the cache will thrash.

    Lastly, you should not index any documents you want to load into the cache. Writing new information to segments prevents the Warmup API from loading the graphs until they are searchable, so you would have to run the Warmup API again after indexing is complete.
Lastly, do not index any documents you want to load into the cache.
prevents the warmup API
run the warmup API again after indexing finishes.
docs/knn/settings.md
Outdated
    `script_query_requests` | The number of query requests that use [the KNN script](../#custom-scoring).
    `script_query_errors` | The number of errors during script queries.

    ## Tasks
Unfortunately, I think this is just the wrong spot for this content. The _tasks API isn't specific to KNN; any content on it should just go into the Elasticsearch section. That said, documenting the entire _tasks API is a large task in and of itself, so I think the right call here is to just include the simple GET call in the other file, like this:
"
To monitor the warmup operation, call the Elasticsearch _tasks API:
GET _tasks
"
You can omit the response, revert the header changes, and we'll (you'll?) eventually document the _tasks API fully under the Elasticsearch header.
docs/knn/settings.md
Outdated
    GET /_tasks
    ```

    This sample request returns the tasks currently running on a node named `odfe-node1`.
Remove and possibly save somewhere on your system for later use.
docs/knn/warmup.md
Outdated
    `total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up.

    The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the [Elasticsearch `_tasks` API](../settings#tasks).
Remove link
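For reference, a warmup response with the fields described above might look like this (only `total`, `successful`, and `failed` come from the quoted text; the `_shards` wrapper and the counts are illustrative assumptions):

```json
{
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  }
}
```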
docs/knn/warmup.md
Outdated
    # Warmup API

    The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This loading time can cause high latency during initial queries. To avoid this situation, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
A couple more instances of future tense that I missed before:
"If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched." -> "If the plugin has not yet loaded the graphs into native memory, it loads them when it receives a search request. This loading time..."
"To avoid this situation, users will often run random queries during a warmup period." -> To avoid this situation, users often run random queries...
Issue# 367: Add information on KNN Warmup API
Description of changes: Added new page of information about the Warmup API
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.