
Wait until engine is started up when acquiring searcher #7456

Merged
merged 1 commit into from Aug 26, 2014

4 participants
@s1monw
Contributor

s1monw commented Aug 26, 2014

Today there is a small window in which a searcher can be acquired while the
engine is still starting up. If the caller is fast enough, this causes an NPE
that triggers a shard failure. This commit fixes the situation
gracefully.

Closes #7455
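The race described above can be illustrated with a minimal sketch. All names here (`SketchEngine`, `SearcherHandle`, `EngineNotStartedException`) are hypothetical, and this is not the code from the merged commit. It only assumes that startup holds a write lock and publishes the searcher manager at the end. A caller that finds the manager still unset takes the read lock, which blocks until startup releases the write lock, and then re-reads the field. If startup has not begun at all, the caller fails with a descriptive exception instead of an NPE.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical placeholder for whatever a searcher manager hands out.
final class SearcherHandle {
    final String name;
    SearcherHandle(String name) { this.name = name; }
}

// Hypothetical exception: a clear failure instead of a NullPointerException.
final class EngineNotStartedException extends IllegalStateException {
    EngineNotStartedException(String message) { super(message); }
}

final class SketchEngine {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Set at the end of start(); volatile so the fast path sees the published value.
    private volatile Supplier<SearcherHandle> searcherManager;

    // startupWork stands in for the slow part of engine startup.
    void start(Supplier<SearcherHandle> manager, Runnable startupWork) {
        lock.writeLock().lock();
        try {
            startupWork.run();
            searcherManager = manager;
        } finally {
            lock.writeLock().unlock();
        }
    }

    SearcherHandle acquireSearcher() {
        Supplier<SearcherHandle> manager = searcherManager;
        if (manager == null) {
            // start() may be running and holding the write lock; taking the read
            // lock waits for it to finish, after which we re-read the field.
            lock.readLock().lock();
            try {
                manager = searcherManager;
            } finally {
                lock.readLock().unlock();
            }
            if (manager == null) {
                // start() has not begun yet: fail clearly rather than with an NPE.
                throw new EngineNotStartedException("engine has not been started");
            }
        }
        return manager.get();
    }
}
```

Blocking on a lock, rather than spinning on the field, means waiting callers sleep instead of burning CPU, and once startup is done the fast path is a single volatile read.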

@s1monw s1monw added the review label Aug 26, 2014

@bleskes

Member

bleskes commented Aug 26, 2014

LGTM

@bleskes bleskes removed the review label Aug 26, 2014

[ENGINE] Wait until engine is started up when acquireing searcher

@s1monw s1monw force-pushed the s1monw:issues/7455 branch to 0676869 Aug 26, 2014

@s1monw s1monw merged commit 0676869 into elastic:master Aug 26, 2014

@clintongormley clintongormley changed the title [ENGINE] Wait until engine is started up when acquireing searcher [ENGINE] Wait until engine is started up when acquiring searcher Aug 26, 2014

@clintongormley clintongormley changed the title [ENGINE] Wait until engine is started up when acquiring searcher Internal: Wait until engine is started up when acquiring searcher Sep 8, 2014

@ajhalani


ajhalani commented Sep 30, 2014

We are seeing an issue with v1.3.2 that may be related to this. I would really appreciate it if someone could confirm whether it is related or not.

After a node restart, the recovery of one of the shards sometimes gets stuck indefinitely, with no updates after the trace [index.engine.internal ] [linux01.node] [myindex][0] starting engine. Running hot threads a few times, a minute apart, gives dumps like the one below:

   102.9% (514.5ms out of 500ms) cpu usage by thread 'elasticsearch[linux01.node][generic][T#5]'
     10/10 snapshots sharing following 15 elements
       java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1066)
       org.elasticsearch.index.engine.internal.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:1561)
       org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:160)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:122)
       org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225)
       org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:766)
       org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:686)
       org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:780)
       org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:250)
       org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       java.lang.Thread.run(Thread.java:722)
@clintongormley

Member

clintongormley commented Oct 14, 2014

@bleskes can you confirm that #7456 (comment) is a symptom of this (fixed) issue?

@bleskes

Member

bleskes commented Oct 14, 2014

@clintongormley this bug caused an NPE (because a variable was not set yet). The stack trace indicates that the recovery is in its translog phase, meaning the engine has already started.

@ajhalani it looks like you use a lot of delete-by-query operations. Is that true? Those are replayed when you recover the shard, which takes longer than simple indexing (and by default we refresh after each one, which is not needed during recovery; that is a potential optimization we could make here).
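The optimization mentioned above could look roughly like the following sketch. `ReplayEngine`, `OpType`, and `TranslogReplay` are hypothetical names, not Elasticsearch classes, and the sketch assumes that operations replayed from the translog do not need to be visible to searches until replay has finished. In the default path a refresh follows every delete-by-query; in the deferred path those per-operation refreshes are skipped and a single refresh runs at the end.

```java
import java.util.List;

// Hypothetical operation types for translog replay.
enum OpType { INDEX, DELETE_BY_QUERY }

final class ReplayEngine {
    private int refreshCount = 0;

    // Stand-in for an expensive searcher refresh; here it only counts calls.
    void refresh() {
        refreshCount++;
    }

    // Applying the operation to the index is elided; only the refresh policy is modeled.
    void apply(OpType op, boolean refreshAfterDeleteByQuery) {
        if (op == OpType.DELETE_BY_QUERY && refreshAfterDeleteByQuery) {
            refresh();
        }
    }

    int refreshCount() {
        return refreshCount;
    }
}

final class TranslogReplay {
    // deferRefresh=false mirrors the default behavior; true skips per-operation refreshes.
    static void replay(ReplayEngine engine, List<OpType> ops, boolean deferRefresh) {
        for (OpType op : ops) {
            engine.apply(op, !deferRefresh);
        }
        if (deferRefresh) {
            engine.refresh(); // a single refresh once all operations are replayed
        }
    }
}
```

For a translog holding three delete-by-query operations and one index operation, the default path refreshes three times while the deferred path refreshes once.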

@ajhalani


ajhalani commented Oct 14, 2014

Thanks for the response.

@bleskes - you're right. My issue doesn't seem to be related to this; it was indeed due to slow recovery from the translog. In the hot threads dumps I see frequent refreshes and flushes, and recovery was much slower than regular indexing.

@clintongormley clintongormley changed the title Internal: Wait until engine is started up when acquiring searcher Wait until engine is started up when acquiring searcher Jun 7, 2015
