[DOCS] add docs for async search #53675

Merged: 11 commits merged on Mar 20, 2020.
Changes from 4 commits
@@ -314,10 +314,7 @@ class RestTestsFromSnippetsTask extends SnippetsTask {
if (path == null) {
path = '' // Catch requests to the root...
} else {
// Escape some characters that are also escaped by sense
path = path.replace('<', '%3C').replace('>', '%3E')
path = path.replace('{', '%7B').replace('}', '%7D')
path = path.replace('|', '%7C')
Member Author

I was hoping this would work but it does cause problems. I need to see if this escaping is needed, to my mind it isn't but I may be wrong. In that case I need to only escape unless curly brackets are part of an expression like ${expression}, which I don't look forward to.

Member Author

turns out this is easy to address, there were only a couple of tests where we were using unescaped | which we can manually escape instead.

}
current.println(" - do:")
if (catchPart != null) {
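The replace chain in this hunk percent-encodes a fixed set of characters in snippet request paths when doc snippets are converted into REST tests. The Python sketch below mirrors the chain as it appears in this diff view, which does not show which of these lines the commit removes; `escape_path` and `_ESCAPES` are hypothetical names, not part of the build code. It also illustrates the concern raised in the first comment: a blanket translation like this escapes `{` and `}` everywhere, including inside `${expression}` placeholders.

```python
from typing import Optional

# Characters percent-encoded by the replace chain shown in the hunk above.
# The diff view does not indicate which of these replacements the commit keeps.
_ESCAPES = {
    "<": "%3C",
    ">": "%3E",
    "{": "%7B",
    "}": "%7D",
    "|": "%7C",
}


def escape_path(path: Optional[str]) -> str:
    """Mirror the Groovy logic: map a missing path to '' and percent-encode a fixed set of characters."""
    if path is None:
        return ""  # catch requests to the root
    # str.translate applies every single-character mapping in one pass.
    return path.translate(str.maketrans(_ESCAPES))


print(escape_path("/_cat/indices/<logs-{now/d}>"))
```

Doing the mapping in a single pass is equivalent to the sequential Groovy `replace` calls here, because none of the replacement strings contain any of the characters being escaped.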
22 changes: 22 additions & 0 deletions docs/reference/async-search.asciidoc
@@ -0,0 +1,22 @@
[role="xpack"]
[testenv="basic"]
[[async-search-intro]]
== Long-running searches

{es} generally allows you to quickly search across large amounts of data. There are
situations where a search executes on many shards, possibly against
<<frozen-indices,frozen indices>> and spanning multiple
<<modules-remote-clusters,remote clusters>>, for which
results are not expected to be returned in milliseconds. When you need to
execute long-running searches, synchronously
waiting for the results to be returned is not ideal. Instead, async search lets
you submit a search request that gets executed _asynchronously_,
monitor the progress of the request, and retrieve results at a later stage.
You can also retrieve partial results as they become available, before the
search has completed.

You can submit an async search request using the <<submit-async-search,submit
async search>> API. The <<get-async-search,get async search>> API allows you to
monitor the progress of an async search request and retrieve its results. An
ongoing async search can be deleted through the <<delete-async-search,delete
async search>> API.
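The three APIs form a simple lifecycle: submit a search, poll it with get, and optionally delete it. The sketch below only builds the HTTP method and path for each call, using the endpoints that appear in the examples of this PR (`POST /<index>/_async_search`, `GET /_async_search/<id>`, `DELETE /_async_search/<id>`). The helper names are hypothetical and no HTTP client is assumed.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

Request = Tuple[str, str]  # (HTTP method, path with optional query string)


def submit_async_search(index: str, params: Optional[Dict[str, str]] = None) -> Request:
    """POST /<index>/_async_search starts a search; the body is a regular search body."""
    path = f"/{index}/_async_search"
    if params:
        path += "?" + urlencode(params)
    return ("POST", path)


def get_async_search(search_id: str, params: Optional[Dict[str, str]] = None) -> Request:
    """GET /_async_search/<id> returns the progress or results of a submitted search."""
    path = f"/_async_search/{search_id}"
    if params:
        path += "?" + urlencode(params)
    return ("GET", path)


def delete_async_search(search_id: str) -> Request:
    """DELETE /_async_search/<id> cancels a running search or deletes its saved results."""
    return ("DELETE", f"/_async_search/{search_id}")
```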
2 changes: 2 additions & 0 deletions docs/reference/index.asciidoc
@@ -26,6 +26,8 @@ include::query-dsl.asciidoc[]

include::modules/cross-cluster-search.asciidoc[]

include::async-search.asciidoc[]

include::scripting.asciidoc[]

include::mapping.asciidoc[]
27 changes: 11 additions & 16 deletions docs/reference/redirects.asciidoc
@@ -120,7 +120,7 @@ See <<native-realm-configuration>>.
[role="exclude",id="native-settings"]
==== Native realm settings

See <<ref-native-settings>>.
See <<ref-native-settings>>.

[role="exclude",id="configuring-saml-realm"]
=== Configuring a SAML realm
@@ -130,27 +130,27 @@ See <<saml-guide>>.
[role="exclude",id="saml-settings"]
==== SAML realm settings

See <<ref-saml-settings>>.
See <<ref-saml-settings>>.

[role="exclude",id="_saml_realm_signing_settings"]
==== SAML realm signing settings

See <<ref-saml-signing-settings>>.
See <<ref-saml-signing-settings>>.

[role="exclude",id="_saml_realm_encryption_settings"]
==== SAML realm encryption settings

See <<ref-saml-encryption-settings>>.
See <<ref-saml-encryption-settings>>.

[role="exclude",id="_saml_realm_ssl_settings"]
==== SAML realm SSL settings

See <<ref-saml-ssl-settings>>.
See <<ref-saml-ssl-settings>>.

[role="exclude",id="configuring-file-realm"]
=== Configuring a file realm

See <<file-realm-configuration>>.
See <<file-realm-configuration>>.

[role="exclude",id="ldap-user-search"]
=== User search mode and user DN templates mode
@@ -170,7 +170,7 @@ See <<ref-ldap-settings>>.
[role="exclude",id="ldap-ssl"]
=== Setting up SSL between Elasticsearch and LDAP

See <<tls-ldap>>.
See <<tls-ldap>>.

[role="exclude",id="configuring-kerberos-realm"]
=== Configuring a Kerberos realm
@@ -211,7 +211,7 @@ See <<ref-ad-settings>>.
[role="exclude",id="mapping-roles-ad"]
=== Mapping Active Directory users and groups to roles

See <<ad-realm-configuration>>.
See <<ad-realm-configuration>>.

[role="exclude",id="how-security-works"]
=== How security works
@@ -237,9 +237,9 @@ See the details in

This page was deleted.
[[ml-datafeed-chunking-config]]
See the details in <<ml-put-datafeed>>, <<ml-update-datafeed>>,
See the details in <<ml-put-datafeed>>, <<ml-update-datafeed>>,
[[ml-datafeed-delayed-data-check-config]]
<<ml-get-datafeed>>,
<<ml-get-datafeed>>,
[[ml-datafeed-counts]]
<<ml-get-datafeed-stats>>.

@@ -323,7 +323,7 @@ See <<snapshots-register-repository>>.
[role="exclude",id="ml-dfa-analysis-objects"]
=== Analysis configuration objects

This page was deleted.
This page was deleted.
See <<put-dfanalytics>>.

[role="exclude",id="slm-api-delete"]
@@ -375,8 +375,3 @@ See <<slm-api-stop>>.
=== How {ccs} works

See <<ccs-gateway-seed-nodes>> and <<ccs-min-roundtrips>>.

[role="exclude",id="async-search"]
=== Asynchronous search

coming::[7.x]
2 changes: 2 additions & 0 deletions docs/reference/search.asciidoc
@@ -152,6 +152,8 @@ high). This default value is `5`.

include::search/search.asciidoc[]

include::search/async-search.asciidoc[]

include::search/uri-request.asciidoc[]

include::search/request-body.asciidoc[]
200 changes: 200 additions & 0 deletions docs/reference/search/async-search.asciidoc
@@ -0,0 +1,200 @@
[role="xpack"]
[testenv="basic"]
[[async-search]]
=== Async search

The async search API lets you asynchronously execute a
search request, monitor its progress, and retrieve partial results
as they become available.

[[submit-async-search]]
==== Submit async search API

Executes a search request asynchronously. It accepts the same
parameters and request body as the <<search-search,search API>>.

[source,console,id=submit-async-search-date-histogram-example]
--------------------------------------------------
POST /sales*/_async_search?size=0
{
"sort" : [
{ "date" : {"order" : "asc"} }
],
"aggs" : {
"sale_date" : {
"date_histogram" : {
"field" : "date",
"calendar_interval": "1d"
}
}
}
}
--------------------------------------------------
// TEST[setup:sales]
// TEST[s/size=0/size=0&wait_for_completion=0/]

The response contains the ID of the search being executed.
You can use this ID later to retrieve the search's final results.
The currently available search
results are returned as part of the <<search-api-response-body,`response`>> object.

[source,console-result]
--------------------------------------------------
{
"id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=", <1>
"version" : 0,
"is_partial" : true, <2>
"is_running" : true, <3>
"start_time_in_millis" : 1583945890986,
"expiration_time_in_millis" : 1584377890986,
"response" : {
"took" : 1122,
"timed_out" : false,
"num_reduce_phases" : 0,
"_shards" : {
"total" : 562, <4>
"successful" : 3, <5>
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 157483, <6>
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/$body.id/]
// TESTRESPONSE[s/1583945890986/$body.start_time_in_millis/]
// TESTRESPONSE[s/1584377890986/$body.expiration_time_in_millis/]
// TESTRESPONSE[s/"took" : 1122/"took": $body.response.took/]
// TESTRESPONSE[s/"total" : 562/"total": $body.response._shards.total/]
// TESTRESPONSE[s/"successful" : 3/"successful": $body.response._shards.successful/]
// TESTRESPONSE[s/"value" : 157483/"value": $body.response.hits.total.value/]

<1> Identifier of the async search that can be used to monitor its progress, retrieve its results, or delete it.
<2> Whether the returned search results are partial or final.
<3> Whether the search is still running or has completed.
<4> The total number of shards the search will be executed on.
<5> How many shards have successfully completed the search.
<6> How many documents currently match the query, counting only the shards that have already completed the search.
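A client typically reads a handful of these fields to decide what to do next. The hypothetical `summarize` helper below extracts them from a parsed response. It treats `successful / total` as a rough progress indicator and ignores skipped and failed shards, which is a simplification.

```python
from typing import Any, Dict


def summarize(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the fields a client usually checks out of a parsed async search response."""
    inner = resp["response"]
    shards = inner["_shards"]
    total = inner["hits"]["total"]
    return {
        "id": resp["id"],
        "is_final": not resp["is_partial"],
        "is_running": resp["is_running"],
        # Rough progress: shards reported successful so far out of all targeted shards.
        # Skipped and failed shards are ignored here for simplicity.
        "shard_progress": shards["successful"] / shards["total"] if shards["total"] else 1.0,
        "hits": total["value"],
        # A relation of "gte" means the hit count is a lower bound, not an exact count.
        "hits_is_lower_bound": total["relation"] == "gte",
    }
```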

You can block and wait until the search is completed, up to a certain
timeout, by providing the `wait_for_completion` parameter, which defaults to
`1` second.

The submit async search API supports the same <<search-search-api-query-params,parameters>>
as the search API, though some have different default values:
Contributor

I think we can list the ones that can be changed and omit the others. batched_reduce_size, request_cache and max_concurrent_shard_requests + the specific options (wait_for_completion, keep_alive) should be enough for this API. I think we said that we want to remove pre_filter_shard_size so I'd remove it from here, same for ccs_minimize_roundtrips which sounds like it could be changed but cannot.
So to be clear, a concrete list that presents all these options and then we can add a link to the search source documentation that should be unchanged from normal search request ?

Contributor

batched_reduce_size is also important to document since that's the granularity at which partial results will be made available.

Member Author

agreed, I was planning on looking back to this once we disallow setting pre_filter_shard_size and maybe others, but I can address the docs in the meantime.

Member Author

> we can add a link to the search source documentation

there is already a link. Why document max_concurrent_shard_requests here? The default is the same as for search?

Contributor

because I thought we would have the exhaustive list of options here rather than linking to the entire search request option. We only accept a subset of it so better to be explicit ?

Member Author

the thing is that there's a ton of options that can be set to the search request. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-search-api-query-params . I thought it's better to just link to search rather than duplicate its docs, and mention the special cases or important aspects only in the async search docs. Would you rather list all of the search options also here?

Contributor

Ok for a simple link but we should at least document batched_reduce_size explicitly here for the reason exposed above. We can also add the list of options that are not accepted here (scroll, ccs_minimize_roundtrips and pre_filter_shard_size) ?

Member Author

yep that's what I had in mind. batched_reduce_size makes a lot of sense to explain, I already did, I will push shortly


* `pre_filter_shard_size` defaults to `1`
* `batched_reduce_size` defaults to `5`
* `request_cache` defaults to `true`
* `ccs_minimize_roundtrips` defaults to `false`

You can also specify how long the async search needs to be
available through the `keep_alive` parameter, which defaults to `5d` (five days).
Ongoing async searches and any saved search results are deleted after this
period.
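To see which values apply when a parameter is omitted, the defaults listed above, together with `wait_for_completion` and `keep_alive`, can be written as a lookup table. This is a hypothetical client-side illustration only: the server applies these defaults itself, and writing `1` second as `"1s"` is an assumption based on Elasticsearch time-unit notation.

```python
from typing import Any, Dict

# Defaults transcribed from the submit async search docs above.
# Hypothetical illustration: a client does not need to send these.
ASYNC_SEARCH_DEFAULTS: Dict[str, Any] = {
    "wait_for_completion": "1s",
    "keep_alive": "5d",
    "pre_filter_shard_size": 1,
    "batched_reduce_size": 5,
    "request_cache": True,
    "ccs_minimize_roundtrips": False,
}


def effective_params(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return the parameter values in effect: explicit overrides win over the defaults."""
    return {**ASYNC_SEARCH_DEFAULTS, **overrides}
```

For example, the submit example above passes `wait_for_completion=0` in its test setup, so only that value changes while `keep_alive` stays at `5d`.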

WARNING: Async search does not support scroll searches or search requests that
include only the `suggest` section. Cross-cluster searches can be executed using
async search, but only with `ccs_minimize_roundtrips` disabled.
jrodewig marked this conversation as resolved.
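The restrictions in the warning above can be checked before a request is sent. The sketch below is hypothetical: the function name and messages are invented for illustration, and the server still performs its own validation. It treats a body whose only key is `suggest` as a suggest-only request.

```python
from typing import Any, Dict, List


def async_search_problems(params: Dict[str, Any], body: Dict[str, Any]) -> List[str]:
    """Return reasons a request would fall outside what async search supports."""
    problems: List[str] = []
    if "scroll" in params:
        problems.append("scroll searches are not supported")
    if set(body) == {"suggest"}:
        # Only a suggest section, with no query, aggs, or other search sections.
        problems.append("suggest-only requests are not supported")
    if str(params.get("ccs_minimize_roundtrips", False)).lower() == "true":
        problems.append("ccs_minimize_roundtrips must be disabled")
    return problems
```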

[[get-async-search]]
==== Get async search API

The get async search API retrieves the results of a previously
submitted async search request given its ID.

[source,console,id=get-async-search-date-histogram-example]
--------------------------------------------------
GET /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
--------------------------------------------------
// TEST[continued s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/\${body.id}/]

[source,console-result]
--------------------------------------------------
{
"id" : "FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=",
"version" : 2, <1>
"is_partial" : true,
"is_running" : true,
"start_time_in_millis" : 1583945890986,
"expiration_time_in_millis" : 1584377890986,
"response" : {
"took" : 12144,
"timed_out" : false,
"num_reduce_phases" : 38,
"_shards" : {
"total" : 562,
"successful" : 188,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 456433,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : { <2>
"sale_date" : {
"buckets" : []
}
}
}
}
--------------------------------------------------
// TESTRESPONSE[s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/$body.id/]
// TESTRESPONSE[s/"is_partial" : true/"is_partial" : false/]
// TESTRESPONSE[s/"is_running" : true/"is_running" : false/]
// TESTRESPONSE[s/1583945890986/$body.start_time_in_millis/]
// TESTRESPONSE[s/1584377890986/$body.expiration_time_in_millis/]
// TESTRESPONSE[s/"took" : 12144/"took": $body.response.took/]
// TESTRESPONSE[s/"total" : 562/"total": $body.response._shards.total/]
// TESTRESPONSE[s/"successful" : 188/"successful": $body.response._shards.successful/]
// TESTRESPONSE[s/"value" : 456433/"value": $body.response.hits.total.value/]
// TESTRESPONSE[s/"buckets" : \[\]/"buckets": $body.response.aggregations.sale_date.buckets/]
// TESTRESPONSE[s/"num_reduce_phases" : 38,//]

<1> The returned `version` is useful to identify whether the response contains
additional results compared to previously obtained responses. If the version
stays the same, no new results have become available; otherwise, a higher version
number indicates that more shards have completed their execution of the query
and their partial results are also included in the response.
<2> Partial aggregation results, coming from the shards that have already
completed the execution of the query.
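The `version` field gives a cheap way to skip responses that carry nothing new. A minimal sketch, assuming responses have already been parsed into dicts; `VersionTracker` is a hypothetical name.

```python
from typing import Any, Dict, Optional


class VersionTracker:
    """Accept an async search response only if its `version` is higher than any seen before."""

    def __init__(self) -> None:
        self.last_version: Optional[int] = None

    def is_new(self, resp: Dict[str, Any]) -> bool:
        version = resp["version"]
        if self.last_version is not None and version <= self.last_version:
            return False  # unchanged version: no additional shard results since last time
        self.last_version = version
        return True
```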

Contributor

Can you add an explanation for expiration_time_in_millis, is_partial and is_running ? The last two are important since they determine how to interpret the search response (partial or not) and if the search is still running.

NOTE: When results are sorted by a numeric field, shards get sorted based on
Contributor

Maybe move this note near to the submit query in order to explain why the example uses a sort ?

the minimum and maximum values they hold for that field, so partial
results become available following the requested sort criteria.
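The note can be illustrated with a toy model in which each shard reports the minimum and maximum value it holds for the sort field. The sketch assumes that ascending sorts rank shards by their minimum and descending sorts by their maximum; it is not the Elasticsearch implementation, and `ShardRange` and `shard_visit_order` are hypothetical names.

```python
from typing import List, NamedTuple


class ShardRange(NamedTuple):
    shard: str
    min_value: float  # smallest value of the sort field on this shard
    max_value: float  # largest value of the sort field on this shard


def shard_visit_order(shards: List[ShardRange], order: str = "asc") -> List[str]:
    """Order shards so the ones likely to hold the first hits in sort order come first."""
    if order == "asc":
        ranked = sorted(shards, key=lambda s: s.min_value)
    else:
        ranked = sorted(shards, key=lambda s: s.max_value, reverse=True)
    return [s.shard for s in ranked]
```

With this ordering, the shards that finish first are the ones holding the earliest values in the requested sort, which is why early partial results already follow the sort criteria.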

You can also provide the `wait_for_completion` parameter, which defaults to `1`
second, when calling the get async search API, to wait for the search to
complete up to the provided timeout. Final results are returned if they become
available before the timeout expires; otherwise, the currently available results
are returned once the timeout expires.
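Because the get call itself waits up to `wait_for_completion`, a client loop can simply call it repeatedly until `is_running` is false. In this sketch `fetch` is a hypothetical stand-in for a get async search call; with `wait_for_completion=0` the loop would poll back-to-back, so a real client would add a delay in that case.

```python
from typing import Any, Callable, Dict, List


def poll_until_done(
    fetch: Callable[[], Dict[str, Any]], max_polls: int = 100
) -> List[Dict[str, Any]]:
    """Call `fetch` until a response reports is_running == false; return all responses seen."""
    seen: List[Dict[str, Any]] = []
    for _ in range(max_polls):
        resp = fetch()  # e.g. GET /_async_search/<id> with a wait_for_completion timeout
        seen.append(resp)
        if not resp["is_running"]:
            return seen  # search has stopped running; check is_partial before treating results as final
    raise TimeoutError(f"search still running after {max_polls} polls")
```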

The `keep_alive` parameter, which defaults to `5d` (five days), specifies
Contributor

The keep_alive for the get api is a bit different since it allows to extend/change the expiration time for an existing id. Maybe good to make the distinction here since the default for get is to keep the current expiration time.

how long the async search should be available in the cluster. When this
period expires, the search, if still running, is cancelled. If the search is
completed, its saved results are deleted.
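The sample responses above are consistent with the default: `expiration_time_in_millis - start_time_in_millis` is `1584377890986 - 1583945890986 = 432000000` ms, which is exactly five days. The sketch below converts simple time values to milliseconds and computes how long a search has left. It handles only the `ms`, `s`, `m`, `h`, and `d` suffixes and is not a full parser for Elasticsearch time units; the function names are hypothetical.

```python
from typing import Any, Dict

# Milliseconds per unit for the subset of time units handled here.
UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def time_value_to_ms(value: str) -> int:
    """Convert values like "5d" or "1s" to milliseconds (subset of units only)."""
    # Check "ms" before "s", since "100ms" also ends with "s".
    for unit in ("ms", "s", "m", "h", "d"):
        if value.endswith(unit):
            return int(value[: -len(unit)]) * UNIT_MS[unit]
    raise ValueError(f"unsupported time value: {value!r}")


def remaining_ms(resp: Dict[str, Any], now_ms: int) -> int:
    """Time left before the async search expires, never negative."""
    return max(0, resp["expiration_time_in_millis"] - now_ms)
```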

[[delete-async-search]]
==== Delete async search API

You can use the delete async search API to manually delete an async search
by ID. If the search is still running, it is cancelled. Otherwise, its saved
search results are deleted.

[source,console,id=delete-async-search-date-histogram-example]
--------------------------------------------------
DELETE /_async_search/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=
--------------------------------------------------
// TEST[continued s/FmRldE8zREVEUzA2ZVpUeGs2ejJFUFEaMkZ5QTVrSTZSaVN3WlNFVmtlWHJsdzoxMDc=/\${body.id}/]
@@ -3,7 +3,7 @@
"documentation":{
"url":"https://www.elastic.co/guide/en/elasticsearch/reference/current/async-search.html"
},
"stability":"experimental",
"stability":"stable",
"url":{
"paths":[
{
@@ -3,7 +3,7 @@
"documentation":{
"url":"https://www.elastic.co/guide/en/elasticsearch/reference/current/async-search.html"
},
"stability":"experimental",
"stability":"stable",
"url":{
"paths":[
{