Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async search #49091

Closed
6 tasks done
javanna opened this issue Nov 14, 2019 · 2 comments
Closed
6 tasks done

Async search #49091

javanna opened this issue Nov 14, 2019 · 2 comments
Assignees
Labels
Dependency:SIEM Meta :Search/Search Search-related issues that do not fall into other categories v7.7.0

Comments

@javanna
Copy link
Member

javanna commented Nov 14, 2019

This is a meta issue to monitor progress around adding support for asynchronous search.

The main goal is to launch a search request and, instead of waiting for its final response, let it run asynchronously while being able to monitor its progress. The progress of an async search includes how many operations have been performed compared to the total number of operations required for completion, as well as intermediate results that may be available before completion.

The high-level steps required for this feature are listed as follows, together with their corresponding licensing:

The resiliency of async search is besides the scope of the initial implementation. If the coordinating node goes down while an async search is running, the most recent intermediate results should be made available for retrieval, but the async search is aborted and never retried, hence it needs to be restarted manually.

@javanna javanna added :Search/Search Search-related issues that do not fall into other categories Meta labels Nov 14, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

jimczi added a commit to jimczi/elasticsearch that referenced this issue Nov 14, 2019
This change automatically pre-sort search shards on search requests that use a primary sort based on the value
of a field. When possible, the can_match phase will extract the min/max (depending on the provided sort order) values
of each shard and use it to pre-sort the shards prior to running the subsequent phases. This feature can be useful to
ensure that shards that contain recent data are executed first so that intermediate merge have more chance to contain
contiguous data (think of date_histogram for instance) but it could also be used in a follow up to early terminate sorted
top-hits queries that don't require the total hit count. The latter could significantly speed up the retrieval of the most/least
recent documents from time-based indices.
I took two shortcuts here:
* I reused the can_match phase to add the required information for the shard sort. We could instead introduce a new phase
 but it make sense to me to use the existing phase to add more informations as long as the additional ops are lightweight.
* The shard sort is done automatically if the primary search sort is based on a field. However this sorting only makes sense
if the range of values in each shard doesn't overlap (time-based indices sorted on timestamp for instance). We could add
a new option to enable/disable this behavior or even add an additional `shard_sort` criteria but I also like the fact that
users don't need to set any option to benefit from this feature.

Relates elastic#49091
jimczi added a commit that referenced this issue Nov 21, 2019
…49092)

This change automatically pre-sort search shards on search requests that use a primary sort based on the value of a field. When possible, the can_match phase will extract the min/max (depending on the provided sort order) values of each shard and use it to pre-sort the shards prior to running the subsequent phases. This feature can be useful to ensure that shards that contain recent data are executed first so that intermediate merge have more chance to contain contiguous data (think of date_histogram for instance) but it could also be used in a follow up to early terminate sorted top-hits queries that don't require the total hit count. The latter could significantly speed up the retrieval of the most/least
recent documents from time-based indices.

Relates #49091
jimczi added a commit to jimczi/elasticsearch that referenced this issue Nov 21, 2019
This commit adds a function in NodeClient that allows to track the progress
of a search request locally. Progress is tracked through a SearchProgressListener
that exposes query and fetch responses as well as partial and final reduces.
This new method can be used by modules/plugins inside a node in order to track the
progress of a local search request.

Relates elastic#49091
jimczi added a commit that referenced this issue Nov 22, 2019
…49092)

This change automatically pre-sort search shards on search requests that use a primary sort based on the value of a field. When possible, the can_match phase will extract the min/max (depending on the provided sort order) values of each shard and use it to pre-sort the shards prior to running the subsequent phases. This feature can be useful to ensure that shards that contain recent data are executed first so that intermediate merge have more chance to contain contiguous data (think of date_histogram for instance) but it could also be used in a follow up to early terminate sorted top-hits queries that don't require the total hit count. The latter could significantly speed up the retrieval of the most/least
recent documents from time-based indices.

Relates #49091
jimczi added a commit that referenced this issue Nov 28, 2019
)

This commit adds a function in NodeClient that allows to track the progress
of a search request locally. Progress is tracked through a SearchProgressListener
that exposes query and fetch responses as well as partial and final reduces.
This new method can be used by modules/plugins inside a node in order to track the
progress of a local search request.

Relates #49091
jimczi added a commit to jimczi/elasticsearch that referenced this issue Nov 28, 2019
…stic#49471)

This commit adds a function in NodeClient that allows to track the progress
of a search request locally. Progress is tracked through a SearchProgressListener
that exposes query and fetch responses as well as partial and final reduces.
This new method can be used by modules/plugins inside a node in order to track the
progress of a local search request.

Relates elastic#49091
jimczi added a commit that referenced this issue Nov 28, 2019
) (#49691)

This commit adds a function in NodeClient that allows to track the progress
of a search request locally. Progress is tracked through a SearchProgressListener
that exposes query and fetch responses as well as partial and final reduces.
This new method can be used by modules/plugins inside a node in order to track the
progress of a local search request.

Relates #49091
jimczi added a commit that referenced this issue Mar 10, 2020
…usly (#49931)

### High level view

This change introduces a new API in x-pack basic that allows to track the progress of a search.
Users can submit an asynchronous search through a new endpoint called `_async_search` that
works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time.

````
# Submit an _async_search and waits up to 100ms for a final response
GET my_index_pattern*/_async_search?wait_for_completion=100ms
{
  "aggs": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h"
    }
  }
}
````

If after 100ms the final response is not available, a `partial_response` is included in the body:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 1,
  "is_running": true,
  "is_partial": true,
  "response": {
   "_shards": {
       "total": 100,
       "successful": 5,
       "failed": 0
    },
    "total_hits": {
      "value": 1653433,
      "relation": "eq"
    },
    "aggs": {
      ...
    }
  }
}
````

The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed.
It also contains the total hits as well as partial aggregations computed from the successful shards.
To continue to monitor the progress of the search users can call the get `_async_search` API like the following:

````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms
````

That returns a new response that can contain the same partial response than the previous call if the search didn't progress, in such case the returned `version`
should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress.
Finally if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 10,
  "is_running": false,
  "response": {
     "is_partial": false,
     ...
  }
}
````

## Persistency

Asynchronous search are stored in a restricted index called `.async-search` if they survive (still running) after the initial submit. Each request has a keep alive that defaults to 5 days but this value can be changed/updated any time:
`````
GET my_index_pattern*/_async_search?wait_for_completion=100ms&keep_alive=10d
`````
The default can be changed when submitting the search, the example above raises the default value for the search to `10d`. 
`````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d
`````
The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days.
A background service that runs only on the node that holds the first primary shard of the `async-search` index is responsible for deleting the expired results. It runs every hour but the expiration is also checked by running queries (if they take longer than the keep_alive) and when getting a result.

Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed. 

## Resiliency

Asynchronous search are not persistent, if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know if the search is still running or not the response contains a field called `is_running` that indicates if the task is up or not. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. However final responses and failures are persisted in a system index that allows
to retrieve a response even if the task finishes.

````
DELETE _async_search/9N3J1m4BgyzUDzqgC15b
````

The response is also not stored if the initial submit action returns a final response. This allows to not add any overhead to queries that completes within the initial `wait_for_completion`.

## Security

The `.async-search` index is a restricted index (should be migrated to a system index in +8.0) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other tasks.

Relates #49091

Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 16, 2020
This commit adds a new AsyncSearchClient to the High Level Rest Client which
initially supporst the submitAsyncSearch in its blocking and non-blocking
flavour. Also adding client side request and response objects and parsing code
to parse the xContent output of the client side AsyncSearchResponse together
with parsing roundtrip tests and a simple roundtrip integration test.

Relates to elastic#49091
jimczi added a commit that referenced this issue Mar 16, 2020
…usly (#49931) (#53591)

This change introduces a new API in x-pack basic that allows to track the progress of a search.
Users can submit an asynchronous search through a new endpoint called `_async_search` that
works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time.

````
GET my_index_pattern*/_async_search?wait_for_completion=100ms
{
  "aggs": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h"
    }
  }
}
````

If after 100ms the final response is not available, a `partial_response` is included in the body:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 1,
  "is_running": true,
  "is_partial": true,
  "response": {
   "_shards": {
       "total": 100,
       "successful": 5,
       "failed": 0
    },
    "total_hits": {
      "value": 1653433,
      "relation": "eq"
    },
    "aggs": {
      ...
    }
  }
}
````

The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed.
It also contains the total hits as well as partial aggregations computed from the successful shards.
To continue to monitor the progress of the search users can call the get `_async_search` API like the following:

````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms
````

That returns a new response that can contain the same partial response than the previous call if the search didn't progress, in such case the returned `version`
should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress.
Finally if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 10,
  "is_running": false,
  "response": {
     "is_partial": false,
     ...
  }
}
````

Asynchronous search are stored in a restricted index called `.async-search` if they survive (still running) after the initial submit. Each request has a keep alive that defaults to 5 days but this value can be changed/updated any time:
`````
GET my_index_pattern*/_async_search?wait_for_completion=100ms&keep_alive=10d
`````
The default can be changed when submitting the search, the example above raises the default value for the search to `10d`.
`````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d
`````
The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days.
A background service that runs only on the node that holds the first primary shard of the `async-search` index is responsible for deleting the expired results. It runs every hour but the expiration is also checked by running queries (if they take longer than the keep_alive) and when getting a result.

Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed.

Asynchronous search are not persistent, if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know if the search is still running or not the response contains a field called `is_running` that indicates if the task is up or not. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. However final responses and failures are persisted in a system index that allows
to retrieve a response even if the task finishes.

````
DELETE _async_search/9N3J1m4BgyzUDzqgC15b
````

The response is also not stored if the initial submit action returns a final response. This allows to not add any overhead to queries that completes within the initial `wait_for_completion`.

The `.async-search` index is a restricted index (should be migrated to a system index in +8.0) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other tasks.

Relates #49091

Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
javanna added a commit to javanna/elasticsearch that referenced this issue Mar 17, 2020
cbuescher pushed a commit that referenced this issue Mar 19, 2020
This commit adds a new AsyncSearchClient to the High Level Rest Client which
initially supporst the submitAsyncSearch in its blocking and non-blocking
flavour. Also adding client side request and response objects and parsing code
to parse the xContent output of the client side AsyncSearchResponse together
with parsing roundtrip tests and a simple roundtrip integration test.

Relates to #49091
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 19, 2020
This commit adds the "_async_searhc" get and delete APIs to the
AsyncSearchClient in the High Level Rest Client.

Relates to elastic#49091
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 20, 2020
This commit adds a new AsyncSearchClient to the High Level Rest Client which
initially supporst the submitAsyncSearch in its blocking and non-blocking
flavour. Also adding client side request and response objects and parsing code
to parse the xContent output of the client side AsyncSearchResponse together
with parsing roundtrip tests and a simple roundtrip integration test.

Relates to elastic#49091
Backport of elastic#53592
cbuescher pushed a commit that referenced this issue Mar 20, 2020
This commit adds a new AsyncSearchClient to the High Level Rest Client which
initially supporst the submitAsyncSearch in its blocking and non-blocking
flavour. Also adding client side request and response objects and parsing code
to parse the xContent output of the client side AsyncSearchResponse together
with parsing roundtrip tests and a simple roundtrip integration test.

Relates to #49091
Backport of #53592
javanna added a commit that referenced this issue Mar 20, 2020
Relates to #49091

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
javanna added a commit that referenced this issue Mar 20, 2020
Relates to #49091

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
cbuescher pushed a commit that referenced this issue Mar 23, 2020
This commit adds the "_async_searhc" get and delete APIs to the
AsyncSearchClient in the High Level Rest Client.

Relates to #49091
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Mar 23, 2020
This commit adds the "_async_searhc" get and delete APIs to the
AsyncSearchClient in the High Level Rest Client.

Relates to elastic#49091
Backport of elastic#53828
cbuescher pushed a commit that referenced this issue Mar 23, 2020
This commit adds the "_async_searhc" get and delete APIs to the
AsyncSearchClient in the High Level Rest Client.

Relates to #49091
Backport of #53828
@bpintea bpintea added v7.8.0 and removed v7.7.0 labels Mar 25, 2020
@jimczi jimczi removed the blocker label Apr 8, 2020
@jimczi
Copy link
Contributor

jimczi commented Apr 8, 2020

All the blockers have been merge in 7.7, hence closing. We'll create new issues for improvements related to async search in the future.

@jimczi jimczi closed this as completed Apr 8, 2020
@pugnascotia pugnascotia added v7.7.0 and removed v7.8.0 labels May 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dependency:SIEM Meta :Search/Search Search-related issues that do not fall into other categories v7.7.0
Projects
None yet
Development

No branches or pull requests

6 participants