Concurrent segment search GA and API changes #6356

Merged: 16 commits, Feb 12, 2024

Conversation

kolchfa-aws
Collaborator

Closes #3662

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
./bin/opensearch
```
{% include copy.html %}

## Disabling concurrent search at the index or cluster level

After you enable the experimental feature flag, all search requests will use concurrent segment search during the query phase. To disable concurrent segment search for all indexes, set the following dynamic cluster setting:
Contributor

There's still a mention of the feature flag here

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
}
}
```
{% include copy-curl.html %}

To disable concurrent segment search for a particular index, specify the index name in the endpoint:
To enable concurrent segment search for a particular index, specify the index name in the endpoint:
Contributor

Can we mention that the index setting takes priority over the cluster setting? So if the cluster setting has concurrent search enabled but the index setting has it disabled, then it will be disabled for that index.
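
A minimal sketch of the precedence described in this comment, assuming each level either sets the value explicitly (`True`/`False`) or leaves it unset (`None`). The function name, parameter names, and the `default` fallback are hypothetical and are not part of any OpenSearch API:

```python
from typing import Optional


def effective_concurrent_search(cluster_enabled: Optional[bool],
                                index_enabled: Optional[bool],
                                default: bool = False) -> bool:
    """Resolve whether concurrent segment search applies to one index.

    Hypothetical helper: an explicit index-level value wins over the
    cluster-level value; `default` is an assumed fallback when neither is set.
    """
    if index_enabled is not None:
        return index_enabled
    if cluster_enabled is not None:
        return cluster_enabled
    return default


# The cluster enables the feature, but the index disables it,
# so the feature is disabled for that index.
print(effective_concurrent_search(cluster_enabled=True, index_enabled=False))
```

With no index-level value, the same helper falls back to the cluster-level value.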

@kolchfa-aws
Collaborator Author

Thank you, @jed326! Comments are addressed.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Collaborator

@vagimeli left a comment

LGTM

@@ -18,6 +18,14 @@ The Profile API provides timing information about the execution of individual co
The Profile API is a resource-consuming operation that adds overhead to search operations.
{: .warning}

## Concurrent segment search

Starting in OpenSearch 2.10, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. The Profile API response contains several additional fields with statistics about _slices_.
Contributor

Should this be 2.12? That's when concurrent search will be GA and these fields will be in the response by default.

Contributor

Agree. v2.10 is an experimental release.


The following table provides information about the added response fields.
The following table provides information about the response fields.
Contributor

Having only the new response fields specific to concurrent search looks incomplete. Ideally, we should either add details about all the fields in the table or explicitly say that these are the concurrent search-related fields in the response.

Collaborator Author

@sohami @ticheng-aws Fully agreed. Could you provide all the info about other fields please so I can include it here?

Contributor

@kolchfa-aws it looks like all the index stats are already documented in the node stats page here: https://opensearch.org/docs/latest/api-reference/nodes-apis/nodes-stats/#indices

Collaborator Author

@jed326 Thanks! So should I remove this section and add these to the nodes-stats table?

Contributor

I think you already added these to the node stats table in this PR. I think it's good to have the search stats separately on this page too though.

Collaborator Author

Thanks. I'll add a link to the nodes stats API so users can reference descriptions for all fields.

Comment on lines 248 to 250
|`max_slice_time_in_nanos` | Long | The maximum amount of time taken by any slice to run a query, in nanoseconds.
|`min_slice_time_in_nanos` | Long | The minimum amount of time taken by any slice to run a query, in nanoseconds.
|`avg_slice_time_in_nanos` | Long | The average amount of time taken by any slice to run a query, in nanoseconds.
Contributor

Here and below, we are talking about min/max/avg, which are concurrent search-specific fields, but the shared example response covers only the non-concurrent path. We should probably share a sample response for the concurrent search case as well.

Contributor

Comment on lines 986 to 993
`<method>` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices.
`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method.
`min_<method>`|The minimum amount of time taken by any slice to run an aggregation method.
`avg_<method>` |The average amount of time taken by any slice to run an aggregation method.
`<method>_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices.
`max_<method>_count` |The maximum number of invocations of a `<method>` on any slice.
`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice.
`avg_<method>_count` |The average number of invocations of a `<method>` on any slice.
Contributor

These are part of the breakdown within the aggregation section, not the top-level aggregation section.

Contributor

Yes, these method stats should go within "the aggregations array" section below.

`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method.
`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation.
`reduce`| Contains the time spent in the `reduce` phase.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method.
Contributor

Here we will need to add info about min/max/avg for each method.

@@ -255,7 +266,15 @@ Field | Description
`shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method.
`compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method.
`set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. For concurrent segment search, this field contains the total number of invocations of a `<method>` obtained by adding the number of method invocations for all slices.
`<method>` | For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices.
Contributor

@ticheng-aws, Feb 7, 2024

This "method" actually refers to all the fields above, such as "create_weight", "build_scorer", ..., "set_min_competitive_score". So we can remove this row and add the concurrent segment search description above.

In general, the max/min/avg slice time captures statistics across all slices for a timing type. For example, when profiling aggregations, the `max_slice_time_in_nanos` field in the `aggregations` section shows the maximum time consumed by the aggregation operation and its children across all slices.
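
To make the relationship between these fields concrete, here is a minimal sketch that derives them from per-slice start and end timestamps. The `(start_nanos, end_nanos)` input shape, the function name, and the integer averaging are assumptions made for illustration; they are not taken from the OpenSearch profiler implementation:

```python
from typing import Dict, List, Tuple


def slice_time_stats(slices: List[Tuple[int, int]]) -> Dict[str, int]:
    """Summarize per-slice timings given (start_nanos, end_nanos) pairs.

    Hypothetical helper; the input shape is assumed for illustration only.
    """
    if not slices:
        raise ValueError("at least one slice is required")
    durations = [end - start for start, end in slices]
    first_start = min(start for start, _ in slices)
    last_end = max(end for _, end in slices)
    return {
        # Elapsed time from the first slice start to the last slice end.
        "time_in_nanos": last_end - first_start,
        "max_slice_time_in_nanos": max(durations),
        "min_slice_time_in_nanos": min(durations),
        # Integer average; the rounding used by the real profiler is not assumed.
        "avg_slice_time_in_nanos": sum(durations) // len(durations),
    }


# Three overlapping slices with durations of 40, 20, and 60 nanoseconds.
stats = slice_time_stats([(0, 40), (10, 30), (20, 80)])
print(stats)
```

Because the slices overlap, the total elapsed time in this example (80) is smaller than the sum of the individual slice durations (120).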

#### Example response
#### Example response: Concurrent segment search
Contributor

`children` | If a collector has subcollectors (children), this field contains information about the subcollectors.
`max_slice_time_in_nanos` |The maximum amount of time taken by any slice, in nanoseconds.
Contributor

Should we mention that these stats fields are present only in the concurrent segment search response? They won't appear in a non-concurrent segment search response.

@hdhalter added the `3 - Tech Review` (PR: Tech review in progress) and `release-notes` (PR: Include this PR in the automated release notes) labels, Feb 7, 2024
@ticheng-aws self-requested a review, February 7, 2024 17:43
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@@ -255,7 +480,7 @@ Field | Description
`shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method.
`compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method.
`set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. For concurrent segment search, this field contains the total number of invocations of a `<method>` obtained by adding the number of method invocations for all slices. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices.
Contributor

We should remove the info below. It's not related to `<method>_count`:

For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices.

@@ -236,7 +458,10 @@ Field | Data type | Description
:--- | :--- | :---
`type` | String | The Lucene query type into which the search query was rewritten. Corresponds to the Lucene class name (which often has the same name in OpenSearch).
`description` | String | Contains a Lucene explanation of the query. Helps differentiate queries with the same type.
`time_in_nanos` | Long | The amount of time the query took to execute, in nanoseconds. In a parent query, the time is inclusive of the execution times of all the child queries.
`time_in_nanos` | Long | The total elapsed time for this query, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time).
`max_slice_time_in_nanos` | Long | The maximum amount of time taken by any slice to run a query, in nanoseconds.
Contributor

@ticheng-aws, Feb 7, 2024

The max/min/avg_slice_time_in_nanos fields in the query, collector, and aggregation sections are included only if you enable concurrent segment search. @kolchfa-aws, do you have an idea of how we can add this to the doc?
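
As an illustration of what this means for anyone reading a profile response, here is a minimal sketch of a client-side helper that treats the slice timing fields as optional. The helper name and the sample entries (including their values) are hypothetical; only the field names come from this PR:

```python
from typing import Any, Dict, Optional

# Field names as documented in this PR.
SLICE_FIELDS = (
    "max_slice_time_in_nanos",
    "min_slice_time_in_nanos",
    "avg_slice_time_in_nanos",
)


def slice_times(entry: Dict[str, Any]) -> Optional[Dict[str, int]]:
    """Return the slice timing fields of one profile entry, or None if absent."""
    if not all(field in entry for field in SLICE_FIELDS):
        return None
    return {field: entry[field] for field in SLICE_FIELDS}


# Made-up entries: one without and one with the slice timing fields.
non_concurrent = {"type": "TermQuery", "time_in_nanos": 1200}
concurrent = {
    "type": "TermQuery",
    "time_in_nanos": 900,
    "max_slice_time_in_nanos": 700,
    "min_slice_time_in_nanos": 300,
    "avg_slice_time_in_nanos": 500,
}

print(slice_times(non_concurrent))
print(slice_times(concurrent))
```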

|`max_<method>_count` |The maximum number of invocations of a `<method>` on any slice. |
|`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice. |
|`avg_<method>_count` |The average number of invocations of a `<method>` on any slice. |
`<method>` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices.
Contributor

This aggregation "method" field also refers to the aggregation breakdown fields, such as "initialize", "build_leaf_collector", ..., "reduce". So we can remove this row and add the concurrent segment search description above.

Contributor

Basically, we can remove this row.

[`breakdown`](#the-breakdown-object-1) | Object | Contains timing statistics about low-level Lucene execution.
`children` | Array of objects | If an aggregation has subaggregations (children), this field contains information about the subaggregations.
`debug` | Object | Some aggregations return a `debug` object that describes the details of the underlying execution.
`max_slice_time_in_nanos` |The maximum amount of time taken by any slice to run an aggregation, in nanoseconds.
Contributor

We need to add the Long data type for `max/min/avg_slice_time_in_nanos`.

@@ -255,7 +480,7 @@ Field | Description
`shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method.
`compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method.
`set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components. For concurrent segment search, this field contains the total number of invocations of a `<method>` obtained by adding the number of method invocations for all slices. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices.

Contributor

@ticheng-aws, Feb 7, 2024

We should add the following query breakdown information here:

|`max_<method>`	| The maximum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `max` time because the method runs at the query level rather than the slice level.	|
|`min_<method>`	| The minimum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `min` time because the method runs at the query level rather than the slice level.  |
|`avg_<method>`	| The average amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `avg` time because the method runs at the query level rather than the slice level.	|
|`max_<method>_count`	| The maximum number of invocations of a `<method>` on any slice. Breakdown stats for the `create_weight` method do not include profiled `max` count because the method runs at the query level rather than the slice level. |
|`min_<method>_count`	| The minimum number of invocations of a `<method>` on any slice. Breakdown stats for the `create_weight` method do not include profiled `min` count because the method runs at the query level rather than the slice level. |
|`avg_<method>_count`	| The average number of invocations of a `<method>` on any slice. Breakdown stats for the `create_weight` method do not include profiled `avg` count because the method runs at the query level rather than the slice level. |

These fields are included only if you enable concurrent segment search.
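
A minimal sketch of how the count fields in the rows above relate to per-slice invocation counts: the total is the sum across slices, and max/min/avg are taken over the slices. The function name, the list-of-counts input, and the integer averaging are assumptions made for illustration:

```python
from typing import Dict, List


def method_count_stats(method: str, per_slice_counts: List[int]) -> Dict[str, int]:
    """Aggregate the invocation counts of one method across slices.

    Hypothetical helper; `per_slice_counts` holds one count per slice.
    """
    if not per_slice_counts:
        raise ValueError("at least one slice is required")
    total = sum(per_slice_counts)
    return {
        # Total invocations, obtained by adding the counts for all slices.
        f"{method}_count": total,
        f"max_{method}_count": max(per_slice_counts),
        f"min_{method}_count": min(per_slice_counts),
        # Integer average; the rounding used by the real profiler is not assumed.
        f"avg_{method}_count": total // len(per_slice_counts),
    }


print(method_count_stats("advance", [3, 5, 10]))
```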

|`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice. |
|`avg_<method>_count` |The average number of invocations of a `<method>` on any slice. |
`<method>` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices.
`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method.
Contributor

These max/min/avg stats fields are included only if you enable concurrent segment search.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@@ -213,6 +221,220 @@ The response contains profiling information:
```
</details>

Contributor

@kolchfa-aws Should we move the "## Concurrent segment search" section above to here? The "#### Example request" section is not related to concurrent segment search.

Collaborator Author

I think it makes sense to leave the section on top because it's an overview. And then there are examples. I'll add a heading to the second example so we differentiate.

@@ -756,274 +995,11 @@ Field | Description
`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation.
Contributor

Add the following sentence to the end:

For concurrent segment search, the `build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).

Contributor

@ticheng-aws, Feb 7, 2024

The same information should be added to the `initialize`, `build_leaf_collector`, `collect`, `post_collection`, and `reduce` methods.

|`max_<method>_count` |The maximum number of invocations of a `<method>` on any slice. |
|`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice. |
|`avg_<method>_count` |The average number of invocations of a `<method>` on any slice. |
`<method>` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices.
Contributor

Basically, we can remove this row.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
`min_<method>`|The minimum amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
`avg_<method>` |The average amount of time taken by any slice to run an aggregation method. This field is included only if you enable concurrent segment search.
`<method>_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. This field is included only if you enable concurrent segment search.
Contributor

Remove "This field is included only if you enable concurrent segment search." This field also exists in the non-concurrent search case.

`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`collect`| Contains the time spent collecting the documents into buckets. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`reduce`| Contains the time spent in the `reduce` phase.
Contributor

Add "For concurrent segment search, the `reduce` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time)." to this field as well.

`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`collect`| Contains the time spent collecting the documents into buckets. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
Contributor

For concurrent segment search, the `post_collection` method ...

`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation.
`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`collect`| Contains the time spent collecting the documents into buckets. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
Contributor

For concurrent segment search, the `collect` method ...

`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method.
`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation.
`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
Contributor

For concurrent segment search, the `build_leaf_collector` method ...

`collect`| Contains the time spent collecting the documents into buckets.
`post_collection`| Contains the time spent running the aggregation’s `postCollection()` callback method.
`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation.
`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search,`build_aggregation` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
Contributor

For concurrent segment search, initialize method ...

Collaborator

@natebower left a comment

@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

_api-reference/index-apis/stats.md (outdated, resolved)
_api-reference/profile.md (outdated, resolved)
_api-reference/profile.md (outdated, resolved)
_api-reference/profile.md (outdated, resolved)
`build_aggregation`| Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation.
`reduce`| Contains the time spent in the `reduce` phase.
`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation. For concurrent segment search, the `initialize` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
`build_leaf_collector`| Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector to collect the given context. For concurrent segment search, the `build_leaf_collector` method contains the total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time).
Collaborator

"aggregation method" instead of "method of the aggregation"? Or "the aggregation's getLeafCollector() method" (as you've done below)?
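
For readers following the `initialize` and `build_leaf_collector` rows above, the phrase "total elapsed time across all slices" can be illustrated with a minimal sketch. It assumes hypothetical per-slice `(start, end)` timestamps in a common time unit; the function name and the sample values are illustrative only and are not taken from OpenSearch.

```python
from typing import Iterable, Tuple


def total_elapsed_across_slices(slices: Iterable[Tuple[int, int]]) -> int:
    """Return (latest slice end) - (earliest slice start).

    `slices` holds hypothetical (start, end) timestamps, one pair per slice.
    This mirrors the wording in the table rows; it is not OpenSearch code.
    """
    slices = list(slices)
    if not slices:
        return 0  # assumption: no slices means nothing was measured
    first_start = min(start for start, _ in slices)
    last_end = max(end for _, end in slices)
    return last_end - first_start


# Three overlapping slices (made-up numbers).
example_slices = [(100, 250), (120, 400), (110, 300)]
elapsed = total_elapsed_across_slices(example_slices)
summed = sum(end - start for start, end in example_slices)
```

Under this reading, the value is the wall-clock span covered by the slices rather than the sum of the per-slice durations, so overlapping slices are not counted twice.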

{% include copy.html %}

If you want to define the environment variable separately prior to running OpenSearch, run the following commands:
## Enabling concurrent search at the index or cluster level
Collaborator

Confirm that "segment" shouldn't precede "search".


## Disabling concurrent search at the index or cluster level
The index-level setting takes priority over the cluster-level setting. Thus, if the cluster setting is enabled but the index setting is disabled, then concurrent search will be disabled for that index.
Collaborator

Confirm that "segment" shouldn't precede "search".
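
The precedence rule in the quoted paragraph (the index-level setting wins over the cluster-level setting) can be sketched as follows. This is a minimal illustration, not OpenSearch's implementation; the function name is hypothetical, and falling back to the cluster value when the index setting is unset is an assumption rather than something stated in this thread.

```python
from typing import Optional


def concurrent_search_enabled(cluster_enabled: bool,
                              index_enabled: Optional[bool] = None) -> bool:
    """Resolve the effective setting for one index.

    index_enabled is None when no index-level value is set (assumption:
    the cluster-level value then applies). An explicit index-level value
    takes priority over the cluster-level value.
    """
    if index_enabled is not None:
        return index_enabled
    return cluster_enabled


# Cluster enabled, index explicitly disabled: disabled for that index.
case_from_text = concurrent_search_enabled(True, False)
```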

@@ -213,6 +221,220 @@ The response contains profiling information:
```
</details>

#### Example response: Concurrent segment search

The following is an example response for a concurrent search with three segment slices:
Collaborator

Confirm that "segment" shouldn't precede "search".

Collaborator Author

Can be either way. The "segment" is understood in this case.

Collaborator

Yes, but let's go for consistency 😄

Collaborator Author

Sure, done!

_api-reference/profile.md (outdated, resolved)
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
_api-reference/profile.md (outdated, resolved)
kolchfa-aws and others added 3 commits February 9, 2024 08:27
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws merged commit 37803ff into main Feb 12, 2024
4 checks passed
@hdhalter hdhalter added 3 - Done Issue is done/complete and removed 3 - Tech Review PR: Tech review in progress labels Feb 13, 2024
oeyh pushed a commit to oeyh/documentation-website that referenced this pull request Mar 14, 2024
* Concurrent segment search GA and API changes for 2.12

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add types

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Changed wording to enable concur seg search

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Typo

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Added link to response fields

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* More tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add tech review comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _api-reference/profile.md

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Resolve merge conflicts

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
@Naarcha-AWS Naarcha-AWS deleted the concurrent-seg-search branch March 28, 2024 23:22
Labels
3 - Done Issue is done/complete release-notes PR: Include this PR in the automated release notes v2.12.0
Development

Successfully merging this pull request may close these issues.

[DOC] Concurrent Segment Search support in OpenSearch
7 participants