[META] Concurrent Searching #2587

Open
2 of 9 tasks
reta opened this issue Mar 24, 2022 · 3 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search:Aggregations

Comments

@reta
Collaborator

reta commented Mar 24, 2022

Is your feature request related to a problem? Please describe.
At least since Apache Lucene 6.x, there has been an experimental low-level API that allows parallelizing the execution of a search across segments [3]. As of the latest release, Apache Lucene 8.10.1, the API is still marked as experimental (see [1]). Community feedback on this feature has been positive so far (see [2]), so there is a good chance that, for certain kinds of indices, parallelizing the search over segments could bring performance benefits.

[1] https://lucene.apache.org/core/8_10_1/core/org/apache/lucene/search/IndexSearcher.html#search-org.apache.lucene.search.Query-org.apache.lucene.search.CollectorManager-
[2] https://engineeringblog.yelp.com/2021/09/nrtsearch-yelps-fast-scalable-and-cost-effective-search-engine.html
[3] https://blog.mikemccandless.com/2019/10/concurrent-query-execution-in-apache.html
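To illustrate the shape of the API referenced in [1]: Lucene's `IndexSearcher` can be constructed with an `Executor`, and `search(Query, CollectorManager)` then asks the manager for a fresh collector per slice of segments, searches the slices concurrently, and combines the per-slice results with `reduce`. The sketch below models only that pattern in plain Java, without a Lucene dependency. `SliceCollector`, `SliceCollectorManager`, `CountManager`, and modeling each segment as an `int[]` of field values are hypothetical simplifications, not Lucene types.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

// Hypothetical plain-Java model of concurrent search over segments (not Lucene code).
final class ConcurrentSegmentSearchSketch {

    // Stand-in for a Lucene Collector: receives the matching doc ids of one slice.
    interface SliceCollector {
        void collect(int docId);
    }

    // Mirrors the shape of Lucene's CollectorManager: one collector per slice, then reduce.
    interface SliceCollectorManager<C extends SliceCollector, T> {
        C newCollector();
        T reduce(Collection<C> collectors);
    }

    static final class CountingCollector implements SliceCollector {
        long count; // only mutated by the task that owns this collector

        @Override
        public void collect(int docId) {
            count++;
        }
    }

    static final class CountManager implements SliceCollectorManager<CountingCollector, Long> {
        @Override
        public CountingCollector newCollector() {
            return new CountingCollector();
        }

        @Override
        public Long reduce(Collection<CountingCollector> collectors) {
            long total = 0;
            for (CountingCollector c : collectors) {
                total += c.count; // merge per-slice partial results
            }
            return total;
        }
    }

    // Searches every segment on its own task with a private collector, then reduces.
    static <C extends SliceCollector, T> T search(List<int[]> segments, IntPredicate query,
            SliceCollectorManager<C, T> manager, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<C> collectors = new ArrayList<>();
        List<Future<?>> futures = new ArrayList<>();
        for (int[] segment : segments) {
            C collector = manager.newCollector(); // no shared mutable state between tasks
            collectors.add(collector);
            futures.add(executor.submit(() -> {
                for (int doc = 0; doc < segment.length; doc++) {
                    if (query.test(segment[doc])) {
                        collector.collect(doc);
                    }
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // waits for the slice; also makes its writes visible to this thread
        }
        return manager.reduce(collectors);
    }

    static long countMatches(List<int[]> segments, IntPredicate query, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            return search(segments, query, new CountManager(), executor);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("slice search failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> segments = List.of(new int[] {1, 5, 9}, new int[] {2, 6}, new int[] {7, 8, 10, 3});
        System.out.println("matching docs: " + countMatches(segments, v -> v > 5, 3)); // matching docs: 5
    }
}
```

One design difference worth noting: this sketch submits one task per segment, whereas Lucene groups segments into slices, so many small segments do not each become a separate task. That grouping is also why segment size distribution matters for how evenly the work is spread.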

Describe the solution you'd like
Support concurrent search over Apache Lucene segments.

Describe alternatives you've considered
N/A

Additional context

@msfroh
Collaborator

msfroh commented Apr 13, 2023

@reta -- out of curiosity, is there any plan to link concurrent search with a merge policy that evens-out segment sizes?

On Amazon Product Search, we used a merge policy that dynamically adjusts the max segment size (to something like min(5GB, max(1GB, totalIndexSize/5)), IIRC) combined with a merge-on-commit setting that would merge all segments smaller than some threshold (something like 100MB). Basically, "lower the ceiling and raise the floor".

(I talked about this around minute 14:00 of https://www.youtube.com/watch?v=UwclHSeE_B8. Sorry for the shameless plug. 😁 )
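The "lower the ceiling and raise the floor" idea can be sketched as two small functions. This only illustrates the arithmetic as recalled in the comment above: the 5GB cap, 1GB minimum, totalIndexSize/5 scaling, and ~100MB floor are the approximate figures quoted there, not verified settings, and the class and method names are hypothetical. In Lucene, a computed ceiling might plausibly be applied through `TieredMergePolicy#setMaxMergedSegmentMB`, but how the Amazon policy actually wired this up is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the segment-size bounds described above (not a real merge policy).
final class SegmentSizeBoundsSketch {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;

    // Approximate figures from the comment ("something like"); treat as illustrative only.
    static final long CEILING_CAP = 5 * GB;
    static final long CEILING_MIN = 1 * GB;
    static final long FLOOR = 100 * MB;

    // "Lower the ceiling": max segment size scales with index size, clamped to [1GB, 5GB].
    static long maxSegmentBytes(long totalIndexBytes) {
        return Math.min(CEILING_CAP, Math.max(CEILING_MIN, totalIndexBytes / 5));
    }

    // "Raise the floor": segments below the threshold would be merged on commit.
    static List<Long> mergeOnCommitCandidates(List<Long> segmentBytes) {
        List<Long> small = new ArrayList<>();
        for (long size : segmentBytes) {
            if (size < FLOOR) {
                small.add(size);
            }
        }
        return small;
    }

    public static void main(String[] args) {
        for (long total : new long[] {2 * GB, 10 * GB, 100 * GB}) {
            System.out.printf("index %d GB -> max segment %d MB%n", total / GB, maxSegmentBytes(total) / MB);
        }
    }
}
```

Running `main` prints a ceiling of 1024 MB for a 2 GB index (the 1GB minimum applies), 2048 MB for 10 GB (one fifth of the index), and 5120 MB for 100 GB (the 5GB cap applies). Keeping segments within such a band is what would let concurrent search split work into roughly even slices.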

@reta
Collaborator Author

reta commented Apr 13, 2023

@msfroh I think this is a great idea. I remember we discussed it at OpenSearchCon as well; I will follow up on the issue, thank you!

@hdhalter

Hi @yigithub, if anything here is related to 2.10, please create a doc issue or PR for the update. Thanks!

@yigithub yigithub added this to 2.10.0 (September 22nd, 2023) in OpenSearch Project Roadmap Sep 7, 2023
@yigithub yigithub removed this from 2.10.0 (September 22nd, 2023) in OpenSearch Project Roadmap Sep 7, 2023
@yigithub yigithub added this to 2.11.0 (November 16th, 2023) in OpenSearch Project Roadmap Sep 7, 2023
@yigithub yigithub self-assigned this Jan 22, 2024
@yigithub yigithub removed this from 2.12.0 (Release window opens Feb 6 closes Feb 20) in OpenSearch Project Roadmap Jan 29, 2024