Very large scroll search (i.e. reindex) can gradually slow down #65780

Closed
jakelandis opened this issue Dec 2, 2020 · 4 comments
Labels
>bug :Distributed/Reindex Issues relating to reindex that are not caused by issues further down :Search/Search Search-related issues that do not fall into other categories Team:Distributed Meta label for distributed team Team:Search Meta label for search team

Comments

@jakelandis
Contributor

jakelandis commented Dec 2, 2020

Version 7.7 (via this PR) added a better ability to cancel search requests. However, this was implemented by adding a cancellation-check method to a collection on the context searcher. That collection is iterated very frequently, and its size can grow unbounded. The memory footprint is not the issue; rather, the problem is the number of iterations for very long-running scroll searches, such as those used by reindex. In testing, this started to show up at around 50m documents, and the search latency kept increasing as time went on.
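To illustrate why the number of iterations, not memory, is the problem, here is a minimal, hypothetical Java model. It is not the Elasticsearch `ContextIndexSearcher` code: the names `ReusedSearcher`, `beginPhase`, and `runScroll`, and the assumption that each scroll phase registers one more cancellation check on a searcher that is reused for the whole scroll, are simplifications for illustration. Because every registered check runs on every `checkCancelled()` call, the work per call grows with the number of phases the scroll has gone through.

```java
import java.util.ArrayList;
import java.util.List;

public class UnboundedCancellationDemo {

    /** Hypothetical stand-in for a searcher that is reused across all scroll phases. */
    static final class ReusedSearcher {
        private final List<Runnable> cancellationChecks = new ArrayList<>();
        private long checksRun = 0;

        /** Assumption: each scroll phase adds one more check and nothing ever removes it. */
        void beginPhase() {
            cancellationChecks.add(() -> checksRun++);
        }

        /** Runs every registered check, so the cost is proportional to the list size. */
        void checkCancelled() {
            for (Runnable check : cancellationChecks) {
                check.run();
            }
        }

        int registeredChecks() {
            return cancellationChecks.size();
        }

        long checksRun() {
            return checksRun;
        }
    }

    /** Simulates a scroll with the given number of phases, checking cancellation once per phase. */
    static ReusedSearcher runScroll(int phases) {
        ReusedSearcher searcher = new ReusedSearcher();
        for (int i = 0; i < phases; i++) {
            searcher.beginPhase();
            searcher.checkCancelled();
        }
        return searcher;
    }

    public static void main(String[] args) {
        for (int phases : new int[] {10, 100, 1000}) {
            ReusedSearcher s = runScroll(phases);
            System.out.printf("phases=%d registered=%d totalChecks=%d%n",
                    phases, s.registeredChecks(), s.checksRun());
        }
    }
}
```

In this model a single `checkCancelled()` call costs time linear in the number of phases, and the cumulative work is quadratic (n(n+1)/2 checks for n phases). The real searcher checks for cancellation many times per phase, which would multiply this cost; the model is only meant to show why latency keeps climbing over a long reindex rather than jumping at a threshold.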

Below is a test run reindexing 180m documents that shows the increase in search latency and the decrease in search rate.

[Image: 7.9.1 reindex test run]

Hot threads will look similar to:

  2.9% (29.3ms out of 1s) cpu usage by thread 'elasticsearch[node1][search][T#93]'
     2/10 snapshots sharing following 20 elements
       app//org.elasticsearch.search.internal.ContextIndexSearcher$MutableQueryTimeout.checkCancelled(ContextIndexSearcher.java:357)
       app//org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:196)
       app//org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:185)
       app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
       app//org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:343)
       app//org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:298)
       app//org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:150)
       app//org.elasticsearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:485)
       app//org.elasticsearch.search.SearchService$$Lambda$5754/0x0000000801a8b040.get(Unknown Source)
       app//org.elasticsearch.search.SearchService$$Lambda$5270/0x0000000801a2d040.get(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)
       app//org.elasticsearch.action.ActionRunnable$$Lambda$5092/0x00000008019a2840.accept(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       app//org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.base@14.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@14.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@14.0.1/java.lang.Thread.run(Thread.java:832)

This issue is fixed as of 7.10.0 by #61062 and #46523, which now re-create the searcher on each phase, even for scroll requests. This means the collection no longer grows unbounded. The same test as above was run on 7.10.0 and showed no signs of performance degradation.
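The effect of re-creating the searcher on each phase can be sketched with the same kind of simplified model: checks registered in earlier phases are discarded along with the old searcher, so each `checkCancelled()` call iterates over a small, bounded list. `PhaseSearcher`, `Result`, and `runScroll` are hypothetical names; this is an illustration of the idea under the stated assumption of one check per phase, not the code from #61062 or #46523.

```java
import java.util.ArrayList;
import java.util.List;

public class PerPhaseSearcherDemo {

    /** Hypothetical searcher that lives for exactly one phase. */
    static final class PhaseSearcher {
        private final List<Runnable> cancellationChecks = new ArrayList<>();
        private long checksRun = 0;

        void addCancellationCheck() {
            cancellationChecks.add(() -> checksRun++);
        }

        void checkCancelled() {
            for (Runnable check : cancellationChecks) {
                check.run();
            }
        }

        int registeredChecks() {
            return cancellationChecks.size();
        }

        long checksRun() {
            return checksRun;
        }
    }

    /** Largest list size seen on any one searcher, plus total checks across the scroll. */
    static final class Result {
        final int maxRegistered;
        final long totalChecks;

        Result(int maxRegistered, long totalChecks) {
            this.maxRegistered = maxRegistered;
            this.totalChecks = totalChecks;
        }
    }

    static Result runScroll(int phases) {
        int maxRegistered = 0;
        long totalChecks = 0;
        for (int i = 0; i < phases; i++) {
            // A fresh searcher per phase: the previous phase's checks go away with the old instance.
            PhaseSearcher searcher = new PhaseSearcher();
            searcher.addCancellationCheck();
            searcher.checkCancelled();
            maxRegistered = Math.max(maxRegistered, searcher.registeredChecks());
            totalChecks += searcher.checksRun();
        }
        return new Result(maxRegistered, totalChecks);
    }

    public static void main(String[] args) {
        for (int phases : new int[] {10, 100, 1000}) {
            Result r = runScroll(phases);
            System.out.printf("phases=%d maxRegistered=%d totalChecks=%d%n",
                    phases, r.maxRegistered, r.totalChecks);
        }
    }
}
```

Here the per-call cost stays constant and the total work grows only linearly with the number of phases, which is consistent with the flat latency observed on 7.10.0.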

For 7.7 through 7.9.x, there is an easy workaround for this issue:

PUT _cluster/settings
{
  "persistent": {
    "search.low_level_cancellation" : false
  }
}

This prevents that collection from being used at all (this was also tested and fixes the issue).
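If you need to apply the workaround from code rather than from the console, the following is a minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11+). It sends the same `PUT _cluster/settings` body shown above. It assumes an unsecured node reachable at `http://localhost:9200` (overridable via the first program argument); the class name `DisableLowLevelCancellation` is hypothetical, and a secured cluster would additionally need authentication and TLS configuration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DisableLowLevelCancellation {

    // Same body as the PUT _cluster/settings request above.
    static final String BODY =
            "{\"persistent\":{\"search.low_level_cancellation\":false}}";

    /** Builds the settings request; the base URL is an assumption (local, unsecured node). */
    static HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        // Print the status code and the cluster's acknowledgement body.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Because the setting is `persistent`, it survives a full cluster restart, so it is worth revisiting after upgrading to a version that contains the fix.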

@jakelandis jakelandis added >bug :Distributed/Reindex Issues relating to reindex that are not caused by issues further down labels Dec 2, 2020
@elasticmachine elasticmachine added the Team:Distributed Meta label for distributed team label Dec 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@jakelandis
Contributor Author

Since this issue is fixed in 7.10.0, I am closing it as fixed by #61062; this issue is for documentation purposes only.

@jakelandis
Contributor Author

Related to slow reindex (but with a totally different cause): #65788

@jakelandis jakelandis added the :Search/Search Search-related issues that do not fall into other categories label Dec 2, 2020
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)
