
[CI] XPackRestIT p0=indices.freeze/10_basic/Basic L74 Failure #52209

Closed
davidkyle opened this issue Feb 11, 2020 · 8 comments · Fixed by #52868
Assignees: henningandersen
Labels: :Data Management/Indices APIs (APIs to create and manage indices and templates), :Distributed/Engine (Anything around managing Lucene and the Translog in an open shard), >test-failure (Triaged test failures from CI)

Comments

@davidkyle (Member) commented on Feb 11, 2020

java.lang.AssertionError: Failure at [indices.freeze/10_basic:74]: expected [2xx] status code but api [search] returned [503 Service Unavailable] [{"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[],"stack_trace":"Failed to execute phase [query], all shards failed\r\n\tat org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:
...

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.6+multijob-windows-compatibility/os=windows-2019/53/console
https://gradle-enterprise.elastic.co/s/t66es3e2pbe6g

Does not reproduce:

./gradlew ':x-pack:plugin:integTestRunner' --tests "org.elasticsearch.xpack.test.rest.XPackRestIT" \
  -Dtests.method="test {p0=indices.freeze/10_basic/Basic}" \
  -Dtests.seed=143749813EC5AF86 \
  -Dtests.security.manager=true \
  -Dtests.locale=mt-MT \
  -Dtests.timezone=Africa/Ouagadougou \
  -Dcompiler.java=13 \
  -Dtests.rest.blacklist=getting_started/10_monitor_cluster_health/*
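For context, the failing assertion is a search step around line 74 of the indices.freeze/10_basic YAML test that expects a 2xx response but gets a 503. The following is a rough, hypothetical reconstruction of that kind of step, not copied from the repository: it searches `_all` (per the analysis further down in this issue), and the expected hit count is made up for illustration.

  # Hypothetical sketch of the failing kind of step: a search against _all that
  # asserts a successful response. It breaks if any index in the cluster, e.g. a
  # freshly created ilm-history index, still has an initializing primary.
  - do:
      search:
        index: _all
  - match: { hits.total.value: 1 }   # the real expected count is in the actual test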
@davidkyle added the >test-failure, :Data Management/ILM+SLM, and v7.6.0 labels on Feb 11, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@andreidan added the :Data Management/Indices APIs and :Distributed/Engine labels and removed the :Data Management/ILM+SLM label on Feb 12, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/Engine)

@dnhatn added the :Search/Search label and removed the :Data Management/Indices APIs label on Feb 12, 2020
@elasticmachine (Collaborator)

Pinging @elastic/es-search (:Search/Search)

@dnhatn (Member) commented on Feb 12, 2020

Relates to #51708?

@dnhatn added the :Data Management/Indices APIs label and removed the :Search/Search label on Feb 12, 2020
@jpountz (Contributor) commented on Feb 17, 2020

@dnhatn It looks like you already spent some time looking into it; I wonder whether you have any ideas where the problem lies, so that we could give this failure a single area label?

@henningandersen self-assigned this on Feb 26, 2020
@henningandersen (Contributor) commented on Feb 27, 2020

The problem here is that the ilm-history index is created in the middle of the test (not explicitly by the test but implicitly by ILM, due to some earlier tests triggering ILM).

[2020-02-18T01:29:15,055] [ilm-history-1-000001] creating index, cause [api], templates [ilm-history], shards [1]/[0], mappings [_doc]

If the test is unfortunate enough to run the search using _all while the primary is still initializing, I believe it will fail like this. In the above test run, the following happened in the previous test:

[2020-02-18T01:29:13,056] ... [.ml-notifications-000001] creating index, cause [auto(bulk api)], templates [.ml-notifications-000001], shards [1]/[1], mappings [_doc]

but the ilm-history index was not created in that test. The bulk processor used in ILMHistoryStore has a flush interval of 5 seconds. Going backwards I found:

[2020-02-18T01:29:09,771] ... [ilm-history-1-000001] creating index, cause [api], templates [ilm-history], shards [1]/[0], mappings [_doc]

It looks like it may even be the history entry for updating the history index's ILM status that causes this.

Will open a PR to disable ILM history during these tests.
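A minimal sketch of the kind of change this implies, assuming the fix uses the indices.lifecycle.history_index_enabled node setting to turn off the ILM history store for the test clusters (the actual change is in the linked PR #52868):

  # Hypothetical test-cluster setting sketch (e.g. in elasticsearch.yml or via the
  # Gradle test cluster config). With the history store disabled, no ilm-history-*
  # index is created asynchronously, so a later search against _all cannot hit an
  # initializing primary left over from a previous test.
  indices.lifecycle.history_index_enabled: false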

henningandersen added commits referencing this issue on Feb 27, 2020 (one in his fork and three in this repository, for the fix and its backports), all with the message:

The ILM history index can be delayed created from one test into the
next, which can cause issues for tests using `_all`.

Closes #52209
@ywelsch removed the v7.6.2 label on Mar 2, 2020