[DOCS] add docs for async search #53675

[role="xpack"]
[testenv="basic"]
[[async-search-intro]]
== Long-running searches

{es} generally lets you quickly search across large amounts of data. In some
situations, however, a search executes on many shards, possibly against
<<frozen-indices,frozen indices>> and spanning multiple
<<modules-remote-clusters,remote clusters>>, and its results are not expected
to be returned in milliseconds. When you need to execute long-running searches,
synchronously waiting for their results is not ideal. Instead, async search lets
you submit a search request that gets executed _asynchronously_,
monitor the progress of the request, and retrieve results at a later stage.
You can also retrieve partial results as they become available, before the
search has completed.

You can submit an async search request using the <<submit-async-search,submit
async search>> API. The <<get-async-search,get async search>> API allows you to
monitor the progress of an async search request and retrieve its results. An
ongoing async search can be deleted through the <<delete-async-search,delete
async search>> API.
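
As a rough illustration of this submit, monitor, and delete lifecycle, the following Python sketch only builds the HTTP method and path for each of the three APIs; it sends nothing. The helper names `submit_request`, `get_request`, and `delete_request` are hypothetical, and the paths mirror the `POST /<index>/_async_search`, `GET /_async_search/<id>`, and `DELETE /_async_search/<id>` requests used in the API examples. IDs are assumed to be URL-safe and are not escaped.

```python
from urllib.parse import urlencode


def submit_request(index_pattern, **params):
    """Return (method, path) for a submit async search request.

    Query parameters such as wait_for_completion or keep_alive are
    passed through unchanged; nothing is validated here.
    """
    path = f"/{index_pattern}/_async_search"
    if params:
        path += "?" + urlencode(params)
    return "POST", path


def get_request(search_id, **params):
    """Return (method, path) to monitor an async search and fetch its results."""
    path = f"/_async_search/{search_id}"  # assumes the ID is already URL-safe
    if params:
        path += "?" + urlencode(params)
    return "GET", path


def delete_request(search_id):
    """Return (method, path) to delete an async search by ID."""
    return "DELETE", f"/_async_search/{search_id}"


print(submit_request("sales*", size=0, wait_for_completion=0))
# ('POST', '/sales*/_async_search?size=0&wait_for_completion=0')
```

The ID returned by the submit call is what ties the three requests together: it is the only piece of state a client needs to keep.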

[role="xpack"]
[testenv="basic"]
[[async-search]]
=== Async search

The async search API lets you asynchronously execute a
search request, monitor its progress, and retrieve partial results
as they become available.

[[submit-async-search]]
==== Submit async search API

Executes a search request asynchronously. It accepts the same
parameters and request body as the <<search-search,search API>>.

[source,console,id=submit-async-search-date-histogram-example]
--------------------------------------------------
POST /sales*/_async_search?size=0
{
  "sort" : [
    { "date" : {"order" : "asc"} }
  ],
  "aggs" : {
    "sale_date" : {
      "date_histogram" : {
        "field" : "date",
        "calendar_interval": "1d"
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]
// TEST[s/size=0/size=0&wait_for_completion=0/]

The response contains an identifier of the search being executed.
You can use this ID to later retrieve the search's final results.
The currently available search results are returned as part of the
<<search-api-response-body,`response`>> object.

[source,console-result]
--------------------------------------------------
{
  "id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=", <1>
  "version" : 0,
  "is_partial" : true, <2>
  "is_running" : true, <3>
  "start_time_in_millis" : 1583945890986,
  "expiration_time_in_millis" : 1584377890986,
  "response" : {
    "took" : 1122,
    "timed_out" : false,
    "num_reduce_phases" : 0,
    "_shards" : {
      "total" : 562, <4>
      "successful" : 3, <5>
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 157483, <6>
        "relation" : "gte"
      },
      "max_score" : null,
      "hits" : [ ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/$body.id/]
// TESTRESPONSE[s/1583945890986/$body.start_time_in_millis/]
// TESTRESPONSE[s/1584377890986/$body.expiration_time_in_millis/]
// TESTRESPONSE[s/"took" : 1122/"took": $body.response.took/]
// TESTRESPONSE[s/"total" : 562/"total": $body.response._shards.total/]
// TESTRESPONSE[s/"successful" : 3/"successful": $body.response._shards.successful/]
// TESTRESPONSE[s/"value" : 157483/"value": $body.response.hits.total.value/]

<1> Identifier of the async search that can be used to monitor its progress, retrieve its results, and/or delete it.
<2> Whether the returned search results are partial or final.
<3> Whether the search is still being executed or has completed.
<4> The total number of shards the search will be executed on.
<5> The number of shards that have successfully completed the search.
<6> The number of documents currently matching the query, counting only the shards that have already completed the search.
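
The following Python sketch reads a parsed response body (a plain `dict` with the shape shown above) and derives a few progress indicators from the fields described in the callouts. The helper `summarize` and the keys it returns are hypothetical. Shard progress is computed simply as `successful / total`, which is only a rough measure because shards can differ widely in size.

```python
def summarize(body):
    """Summarize a parsed async search response body (a dict)."""
    response = body["response"]
    shards = response["_shards"]
    hits_total = response["hits"]["total"]
    total = shards["total"]
    return {
        "id": body["id"],
        "running": body["is_running"],
        "final": not body["is_partial"],
        # Fraction of shards that have successfully completed the search.
        "shard_progress": shards["successful"] / total if total else 0.0,
        "hits": hits_total["value"],
        # "gte" means the hit count is only a lower bound so far.
        "hits_is_lower_bound": hits_total["relation"] == "gte",
    }


# Values taken from the example response above (unused fields omitted).
example = {
    "id": "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
    "version": 0,
    "is_partial": True,
    "is_running": True,
    "response": {
        "_shards": {"total": 562, "successful": 3, "skipped": 0, "failed": 0},
        "hits": {"total": {"value": 157483, "relation": "gte"}},
    },
}

print(summarize(example)["hits"])  # 157483
```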

You can block and wait for the search to complete, up to a given timeout, by
providing the `wait_for_completion` parameter, which defaults to `1` second.

The submit async search API supports the same <<search-search-api-query-params,parameters>>
as the search API, though some have different default values:

* `pre_filter_shard_size` defaults to `1`
* `batched_reduce_size` defaults to `5`
* `request_cache` defaults to `true`
* `ccs_minimize_roundtrips` defaults to `false`
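
One way to picture these overrides: the async-specific defaults listed above are applied first, and any parameter you pass explicitly takes precedence. The dictionary and the `effective_params` helper below are illustrative only and are not part of any client library; checks for disallowed values are deliberately left out.

```python
# Defaults that the submit async search API applies differently from the
# search API, as listed above.
ASYNC_SEARCH_DEFAULTS = {
    "pre_filter_shard_size": 1,
    "batched_reduce_size": 5,
    "request_cache": True,
    "ccs_minimize_roundtrips": False,
}


def effective_params(user_params):
    """Return the parameters in effect: explicit values override the defaults."""
    merged = dict(ASYNC_SEARCH_DEFAULTS)  # copy so the defaults stay untouched
    merged.update(user_params)
    return merged


print(effective_params({"batched_reduce_size": 10})["batched_reduce_size"])  # 10
```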

You can also specify how long the async search needs to be
available through the `keep_alive` parameter, which defaults to `5d` (five days).
Ongoing async searches and any saved search results are deleted after this
period.
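
As a worked check of the default: five days is 5 * 24 * 3600 * 1000 = 432,000,000 ms, and adding that to the `start_time_in_millis` of the example response above (1583945890986) gives exactly its `expiration_time_in_millis` (1584377890986). The Python helpers below reproduce that arithmetic. They understand only a small, assumed subset of time units, not the full {es} time-unit syntax, and the function names are hypothetical.

```python
# Milliseconds per unit for a small, assumed subset of time units.
UNIT_MILLIS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration_millis(value):
    """Parse a duration such as '5d' or '90s' into milliseconds."""
    # Try longer suffixes first so that '500ms' is read as milliseconds.
    for unit in sorted(UNIT_MILLIS, key=len, reverse=True):
        if value.endswith(unit):
            number = value[: -len(unit)]
            if number.isdigit():
                return int(number) * UNIT_MILLIS[unit]
    raise ValueError(f"unsupported duration: {value!r}")


def expiration_millis(start_time_in_millis, keep_alive="5d"):
    """Expiration time = start time + keep_alive."""
    return start_time_in_millis + parse_duration_millis(keep_alive)


print(expiration_millis(1583945890986))  # 1584377890986
```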

WARNING: Async search does not support scroll searches, nor search requests that
only include the `suggest` section. Cross-cluster searches can be executed using
async search, but only with `ccs_minimize_roundtrips` disabled.
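
A client might check these restrictions before submitting a request. The Python sketch below is hypothetical: it assumes the request is represented as a query-parameter dict plus a body dict, it mirrors only the three restrictions stated in the warning, and it is not meant to replace whatever checks {es} performs itself.

```python
def async_search_problems(params, body):
    """Return reasons why a request is unsuitable for async search (empty if none)."""
    problems = []
    if "scroll" in params:
        problems.append("scroll searches are not supported")
    # A body whose only section is `suggest` is a suggest-only request.
    if body and set(body) == {"suggest"}:
        problems.append("suggest-only requests are not supported")
    # Accept both booleans and the string forms used in query strings.
    if str(params.get("ccs_minimize_roundtrips", "false")).lower() == "true":
        problems.append("ccs_minimize_roundtrips must be disabled")
    return problems


print(async_search_problems({"scroll": "1m"}, {"query": {"match_all": {}}}))
# ['scroll searches are not supported']
```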

[[get-async-search]]
==== Get async search API

The get async search API retrieves the results of a previously
submitted async search request given its ID.

[source,console,id=get-async-search-date-histogram-example]
--------------------------------------------------
GET /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
--------------------------------------------------
// TEST[continued s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/\${body.id}/]

[source,console-result]
--------------------------------------------------
{
  "id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
  "version" : 2, <1>
  "is_partial" : true,
  "is_running" : true,
  "start_time_in_millis" : 1583945890986,
  "expiration_time_in_millis" : 1584377890986,
  "response" : {
    "took" : 12144,
    "timed_out" : false,
    "num_reduce_phases" : 38,
    "_shards" : {
      "total" : 562,
      "successful" : 188,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 456433,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
    },
    "aggregations" : { <2>
      "sale_date" : {
        "buckets" : []
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/$body.id/]
// TESTRESPONSE[s/"is_partial" : true/"is_partial" : false/]
// TESTRESPONSE[s/"is_running" : true/"is_running" : false/]
// TESTRESPONSE[s/1583945890986/$body.start_time_in_millis/]
// TESTRESPONSE[s/1584377890986/$body.expiration_time_in_millis/]
// TESTRESPONSE[s/"took" : 12144/"took": $body.response.took/]
// TESTRESPONSE[s/"total" : 562/"total": $body.response._shards.total/]
// TESTRESPONSE[s/"successful" : 188/"successful": $body.response._shards.successful/]
// TESTRESPONSE[s/"value" : 456433/"value": $body.response.hits.total.value/]
// TESTRESPONSE[s/"buckets" : \[\]/"buckets": $body.response.aggregations.sale_date.buckets/]
// TESTRESPONSE[s/"num_reduce_phases" : 38,//]

<1> The returned `version` is useful to identify whether the response contains
additional results compared to previously obtained responses. If the version
stays the same, no new results have become available; a higher version
number indicates that more shards have completed their execution of the query
and that their partial results are now included in the response.
<2> Partial aggregation results, coming from the shards that have already
completed the execution of the query.
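
The `version` comparison described above can be sketched as a polling loop that only processes a response when its version has increased. In this Python sketch, `fetch` is a hypothetical stand-in for whatever issues the get async search request and returns the parsed body, and `on_new_results` is a hypothetical callback. The loop stops once `is_running` is false.

```python
def poll_until_done(fetch, on_new_results):
    """Call fetch() until the search stops running; return the last response.

    on_new_results is invoked only for responses whose version is higher than
    any seen before, so unchanged responses are not processed twice. A real
    client would pause between calls or rely on wait_for_completion.
    """
    last_version = None
    while True:
        body = fetch()
        if last_version is None or body["version"] > last_version:
            last_version = body["version"]
            on_new_results(body)
        if not body["is_running"]:
            return body


# Simulated sequence of responses for a single search.
responses = iter([
    {"version": 0, "is_running": True},
    {"version": 0, "is_running": True},
    {"version": 2, "is_running": True},
    {"version": 3, "is_running": False},
])
seen = []
final = poll_until_done(lambda: next(responses), lambda b: seen.append(b["version"]))
print(seen)  # [0, 2, 3]
```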

NOTE: When results are sorted by a numeric field, shards are sorted based on
the minimum and maximum values that they hold for that field, so partial
results become available following the requested sort criteria.
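
The idea behind this note can be illustrated with a toy model: given each shard's minimum and maximum value for the sort field, visit shards in the order of the requested sort, so that the earliest partial results come from the shards most likely to hold the top hits. This is a simplified, hypothetical model for illustration, not the actual {es} shard-ordering code, and the shard names and values are made up.

```python
from typing import NamedTuple


class ShardStats(NamedTuple):
    """Hypothetical per-shard min/max of the numeric sort field."""
    name: str
    min_value: float
    max_value: float


def shard_order(shards, order="asc"):
    """Order shards so those likely to hold the first hits come first.

    Ascending sort: shards with the smallest minimum first.
    Descending sort: shards with the largest maximum first.
    """
    if order == "asc":
        return sorted(shards, key=lambda s: s.min_value)
    return sorted(shards, key=lambda s: s.max_value, reverse=True)


shards = [ShardStats("s0", 30, 40), ShardStats("s1", 0, 10), ShardStats("s2", 15, 50)]
print([s.name for s in shard_order(shards, "asc")])  # ['s1', 's2', 's0']
```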

The `wait_for_completion` parameter, which defaults to `1` second, can also be
provided when calling the get async search API, in order to wait for the search
to complete, up to the provided timeout. If the search completes before the
timeout expires, the final results are returned; otherwise, the currently
available results are returned once the timeout expires.
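
These semantics can be modeled with `threading.Event`: the caller waits up to the timeout for a completion signal, then returns either the final results or whatever partial results exist at that moment. The `SimulatedAsyncSearch` class is a hypothetical, in-process stand-in for server-side behavior, meant only to illustrate the rule stated above.

```python
import threading


class SimulatedAsyncSearch:
    """Hypothetical stand-in for a running async search (not a real client)."""

    def __init__(self):
        self._done = threading.Event()
        self._lock = threading.Lock()
        self.results = []  # results gathered so far

    def add_partial(self, hit):
        with self._lock:
            self.results.append(hit)

    def complete(self):
        self._done.set()

    def get(self, wait_for_completion=1.0):
        """Wait up to wait_for_completion seconds, then return a snapshot.

        In this model, a search that finished in time yields a final snapshot;
        otherwise the snapshot holds the partial results available when the
        timeout expired.
        """
        finished = self._done.wait(timeout=wait_for_completion)
        with self._lock:
            return {
                "is_running": not finished,
                "is_partial": not finished,
                "results": list(self.results),
            }
```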

The `keep_alive` parameter, which defaults to `5d` (five days), specifies
how long the async search should be available in the cluster. When this
period expires, the search, if still running, is cancelled. If the search is
completed, its saved results are deleted.
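
The expiry rule reduces to a small decision on the search's state. The Python function below is a toy model with a hypothetical name and return values. It treats the exact expiration instant as already expired, which is an assumption, not documented behavior.

```python
def expiry_action(now_millis, expiration_time_in_millis, is_running):
    """Decide what happens to an async search at time now_millis (toy model).

    Returns "keep" before expiration; afterwards "cancel" for a running
    search or "delete_results" for a completed one. Treating the exact
    expiration instant as expired is an assumption.
    """
    if now_millis < expiration_time_in_millis:
        return "keep"
    return "cancel" if is_running else "delete_results"


print(expiry_action(1584377890986, 1584377890986, is_running=False))  # delete_results
```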

[[delete-async-search]]
==== Delete async search API

You can use the delete async search API to manually delete an async search
by ID. If the search is still running, the search request is cancelled.
Otherwise, the saved search results are deleted.

[source,console,id=delete-async-search-date-histogram-example]
--------------------------------------------------
DELETE /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
--------------------------------------------------
// TEST[continued s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/\${body.id}/]