Add heuristic slice_size for slice level top buckets in concurrent segment search #11585

Merged
merged 1 commit into from
Dec 12, 2023

Conversation

jed326
Collaborator

@jed326 jed326 commented Dec 12, 2023

Adds a slice_size heuristic that controls how many top buckets are kept at the slice level for terms aggregations. See #11584 for more details.
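
As a rough illustration of what such a heuristic can look like: the terms aggregation already over-fetches at the shard level by default, using `shard_size = size * 1.5 + 10`. The sketch below assumes the slice-level value is derived from `shard_size` with the same formula. That assumption is not confirmed by this conversation, and the function names are hypothetical, not the ones used in the PR.

```python
def suggest_queue_size(final_size: int) -> int:
    """Over-fetch heuristic: 1.5x the requested size plus 10.

    This mirrors the documented default used to derive shard_size
    from size in a terms aggregation.
    """
    if final_size < 1:
        raise ValueError("size must be positive")
    return int(final_size * 1.5 + 10)


def slice_size(shard_size: int) -> int:
    # ASSUMPTION: the slice level reuses the same formula, applied to
    # shard_size. The exact formula in the PR is not shown in this thread.
    return suggest_queue_size(shard_size)


if __name__ == "__main__":
    size = 10
    shard = suggest_queue_size(size)
    print(size, shard, slice_size(shard))
```

Under this assumption, each level keeps more buckets than the level above it asks for, which trades some memory and merge work for more accurate counts.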

Related Issues

Resolves #11584

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…gment search

Signed-off-by: Jay Deng <jayd0104@gmail.com>
@jed326
Collaborator Author

jed326 commented Dec 12, 2023

@reta @sohami would love to hear your thoughts on this PR as well as the problems described in #11584. Thanks!

Contributor

Compatibility status:

Checks if related components are compatible with change 67113aa

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/performance-analyzer.git]

Contributor

❌ Gradle check result for 67113aa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Collaborator

reta commented Dec 12, 2023

❌ Gradle check result for 67113aa: FAILURE

#11032
#5329

Contributor

✅ Gradle check result for 67113aa: SUCCESS

@sohami sohami added the backport 2.x Backport to 2.x branch label Dec 12, 2023
@sohami sohami merged commit 074bc6a into opensearch-project:main Dec 12, 2023
119 of 120 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 12, 2023
…gment search (#11585)

Signed-off-by: Jay Deng <jayd0104@gmail.com>
(cherry picked from commit 074bc6a)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@sohami
Collaborator

sohami commented Dec 12, 2023

@jed326 We should also make a documentation change for this to explain the slice_size heuristic in the concurrent search flow. Users can use the shard_size parameter itself to control the number of buckets considered at each slice level, and so get the correct doc count per bucket on a shard. The good thing about this is that if shard_size is chosen correctly, the default heuristic should work fine out of the box.
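
To make the point about per-bucket doc counts concrete, here is a toy simulation, not OpenSearch code. It assumes a shard is split into slices, each slice keeps only its top `slice_size` terms, and the shard then merges those partial results and keeps the top `shard_size`. All names and the sample data are hypothetical.

```python
from collections import Counter


def top_n(counts, n):
    """Keep the n largest buckets; ties are broken by term for determinism."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


def shard_top_buckets(slices, slice_size, shard_size):
    """Merge per-slice top buckets, then keep the shard's top buckets."""
    merged = Counter()
    for docs in slices:
        # Each slice only reports its own top slice_size terms.
        merged.update(top_n(Counter(docs), slice_size))
    return top_n(merged, shard_size)


# Two slices of one shard; true totals are a=6, b=8, c=9.
slices = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["c"] * 6 + ["b"] * 4 + ["a"] * 1,
]

if __name__ == "__main__":
    print(shard_top_buckets(slices, slice_size=2, shard_size=2))
    print(shard_top_buckets(slices, slice_size=3, shard_size=2))
```

With `slice_size=2`, the first slice drops `c`, so the shard reports `c` with a count of 6 instead of 9. With `slice_size=3`, nothing is dropped and the counts are exact. A larger `shard_size`, and therefore a larger derived slice size, reduces this kind of undercounting.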

jed326 added a commit to jed326/OpenSearch that referenced this pull request Dec 14, 2023
…gment search (opensearch-project#11585)

Signed-off-by: Jay Deng <jayd0104@gmail.com>
reta pushed a commit that referenced this pull request Dec 14, 2023
…gment search (#11585) (#11620)

Signed-off-by: Jay Deng <jayd0104@gmail.com>
@jed326 jed326 deleted the 11584 branch March 12, 2024 22:13
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…gment search (opensearch-project#11585)

Signed-off-by: Jay Deng <jayd0104@gmail.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
backport 2.x (Backport to 2.x branch)
enhancement (Enhancement or improvement to existing feature or request)
Search:Performance
skip-changelog
Development

Successfully merging this pull request may close these issues.

[Concurrent Segment Search] Performance regression for multi-term aggs on high cardinality data
4 participants