[CI] AssertionError in ShardSearchStats #37185

Closed
droberts195 opened this issue Jan 7, 2019 · 3 comments

@droberts195
Contributor

commented Jan 7, 2019

A 6.6 ML test timed out in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=zulu11,nodes=virtual&&linux/38/consoleText because the thread that was initializing a scroll threw an assertion error:

ERROR   0.00s J3 | TooManyJobsIT (suite) <<< FAILURES!
   > Throwable #1: java.lang.Exception: Suite timeout exceeded (>= 1200000 msec).
   >    at __randomizedtesting.SeedInfo.seed([ACEE7CCD596311C5]:0)
   > Throwable #2: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=1882, name=elasticsearch[node_t1][search][T#3], state=RUNNABLE, group=TGRP-TooManyJobsIT]
   > Caused by: java.lang.AssertionError
   >    at __randomizedtesting.SeedInfo.seed([ACEE7CCD596311C5]:0)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.lambda$onQueryPhase$2(ShardSearchStats.java:101)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.computeStats(ShardSearchStats.java:142)
   >    at org.elasticsearch.index.search.stats.ShardSearchStats.onQueryPhase(ShardSearchStats.java:93)
   >    at org.elasticsearch.index.shard.SearchOperationListener$CompositeListener.onQueryPhase(SearchOperationListener.java:155)
   >    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:407)
   >    at org.elasticsearch.search.SearchService.access$100(SearchService.java:126)
   >    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:360)
   >    at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:356)
   >    at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1117)
   >    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:759)
   >    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   >    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
   >    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   >    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   >    at java.base/java.lang.Thread.run(Thread.java:834)

The REPRO command is this:

./gradlew :x-pack:plugin:ml:internalClusterTest \
  -Dtests.seed=ACEE7CCD596311C5 \
  -Dtests.class=org.elasticsearch.xpack.ml.integration.TooManyJobsIT \
  -Dtests.method="testSingleNode" \
  -Dtests.security.manager=true \
  -Dtests.locale=yue-Hans-CN \
  -Dtests.timezone=UCT \
  -Dcompiler.java=11 \
  -Druntime.java=11

However, it doesn't reproduce locally for me.

The scroll that was being initialized when this happened is the one set up by the following code:

SearchRequest searchRequest = new SearchRequest(index);
searchRequest.indicesOptions(MlIndicesUtils.addIgnoreUnavailable(SearchRequest.DEFAULT_INDICES_OPTIONS));
searchRequest.scroll(CONTEXT_ALIVE_DURATION);
searchRequest.source(new SearchSourceBuilder()
        .size(BATCH_SIZE)
        .query(getQuery())
        .fetchSource(shouldFetchSource())
        .sort(SortBuilders.fieldSort(ElasticsearchMappings.ES_DOC)));
SearchResponse searchResponse = client.search(searchRequest).actionGet();

@elasticmachine

commented Jan 7, 2019

@atorok atorok added the v7.0.0 label Jan 9, 2019

@s1monw s1monw assigned s1monw and unassigned s1monw Jan 10, 2019

s1monw added a commit to s1monw/elasticsearch that referenced this issue Jan 15, 2019

Ensure either success or failure path for SearchOperationListener is called

Today we have several implementations of executing SearchOperationListener
in SearchService. While most of them seem to be safe, at least one, the one that
executes scroll searches, can cause an illegal execution of SearchOperationListener
that can in turn trigger assertions in ShardSearchStats. This change
adds a SearchOperationListenerExecutor that uses try-with-resources blocks to ensure
listeners are called in a safe way.

Relates to elastic#37185
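
The pattern the commit message describes can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from the commit: `QueryStats`, `QueryListenerExecutor`, and `ListenerExecutorDemo` are hypothetical names, and the "currently running" counter is only assumed to resemble what ShardSearchStats asserts on. The idea is that the "pre" callback fires when the resource is opened, and `close()` fires exactly one of the success or failure callbacks, so the two can never both run or both be skipped.

```java
/** Hypothetical stand-in for a stats listener; NOT the real ShardSearchStats. */
final class QueryStats {
    private long current;
    private long succeeded;
    private long failed;

    void onPreQuery() {
        current++;
    }

    void onQuery() {
        finishOne();
        succeeded++;
    }

    void onFailedQuery() {
        finishOne();
        failed++;
    }

    // Every completion must match exactly one earlier onPreQuery() call.
    private void finishOne() {
        current--;
        if (current < 0) {
            throw new IllegalStateException("completion without a matching start");
        }
    }

    long current() { return current; }
    long succeeded() { return succeeded; }
    long failed() { return failed; }
}

/** Hypothetical executor: reports "pre" on open and one outcome on close. */
final class QueryListenerExecutor implements AutoCloseable {
    private final QueryStats stats;
    private boolean succeeded = false;

    QueryListenerExecutor(QueryStats stats) {
        this.stats = stats;
        stats.onPreQuery();
    }

    // Only records the outcome; the listener itself is notified in close().
    void success() {
        succeeded = true;
    }

    @Override
    public void close() {
        if (succeeded) {
            stats.onQuery();
        } else {
            stats.onFailedQuery();
        }
    }
}

final class ListenerExecutorDemo {
    static void runQuery(QueryStats stats, boolean fail) {
        try (QueryListenerExecutor executor = new QueryListenerExecutor(stats)) {
            if (fail) {
                throw new RuntimeException("simulated query failure");
            }
            executor.success();
        }
    }

    public static void main(String[] args) {
        QueryStats stats = new QueryStats();
        runQuery(stats, false);
        try {
            runQuery(stats, true);
        } catch (RuntimeException e) {
            // Expected: close() has already reported the failure path.
        }
        System.out.println("current=" + stats.current()
                + " succeeded=" + stats.succeeded()
                + " failed=" + stats.failed());

        // Hand-rolled, unbalanced calls: one start, two completions.
        stats.onPreQuery();
        stats.onQuery();
        try {
            stats.onFailedQuery();
        } catch (IllegalStateException e) {
            System.out.println("unbalanced call detected: " + e.getMessage());
        }
    }
}
```

With the executor, the counter returns to zero whether the body succeeds or throws. The hand-rolled sequence at the end of `main` shows how reporting both outcomes for a single start drives the counter negative, which is the kind of inconsistency an assertion on that counter would catch.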

@pcsanwald pcsanwald added v6.7.0 and removed v6.6.0 labels Jan 17, 2019

s1monw added a commit that referenced this issue Jan 23, 2019

Ensure either success or failure path for SearchOperationListener is called (#37467)

Today we have several implementations of executing SearchOperationListener
in SearchService. While most of them seem to be safe, at least one, the one that
executes scroll searches, can cause an illegal execution of SearchOperationListener
that can in turn trigger assertions in ShardSearchStats. This change
adds a SearchOperationListenerExecutor that uses try-with-resources blocks to ensure
listeners are called in a safe way.

Relates to #37185

@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019

@danielmitterdorfer danielmitterdorfer added v7.2.0 and removed v6.7.0 labels Feb 7, 2019

@s1monw

Contributor

commented Feb 19, 2019

We just spoke about this again and decided to close it, since we added infrastructure to prevent it in #37467.

@s1monw s1monw closed this Feb 19, 2019
