
[CI] CcrRollingUpgradeIT.testBiDirectionalIndexFollowing fails reproducibly in 6.7 #40677

Closed
droberts195 opened this issue Apr 1, 2019 · 6 comments
Labels: :Distributed/CCR, >test-failure, v6.7.1

Comments

@droberts195
Contributor

CcrRollingUpgradeIT.testBiDirectionalIndexFollowing is failing in the 6.7 BWC tests:

The exception is:

```
   > Throwable #1: org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:40247], URI [/follower_index6/_ccr/follow], status line [HTTP/1.1 400 Bad Request]
   > {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[node-1][127.0.0.1:42344][indices:admin/xpack/ccr/put_follow]"}],"type":"illegal_argument_exception","reason":"no index stats available for the leader index"},"status":400}
   >    at __randomizedtesting.SeedInfo.seed([3191D00E20B9D5EB:E9516F3EA88AB21D]:0)
   >    at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936)
   >    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233)
   >    at org.elasticsearch.upgrades.CcrRollingUpgradeIT.followIndex(CcrRollingUpgradeIT.java:315)
   >    at org.elasticsearch.upgrades.CcrRollingUpgradeIT.testBiDirectionalIndexFollowing(CcrRollingUpgradeIT.java:250)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   >    at java.base/java.lang.Thread.run(Thread.java:835)
   > Caused by: org.elasticsearch.client.ResponseException: method [PUT], host [http://[::1]:40247], URI [/follower_index6/_ccr/follow], status line [HTTP/1.1 400 Bad Request]
   > {"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[node-1][127.0.0.1:42344][indices:admin/xpack/ccr/put_follow]"}],"type":"illegal_argument_exception","reason":"no index stats available for the leader index"},"status":400}
   >    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:552)
   >    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:537)
   >    at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
   >    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
   >    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
   >    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326)
   >    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
   >    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
   >    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
   >    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
   >    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
   >    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
   >    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
   >    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
   >    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
   >    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
   >    ... 1 more
```

This reproduces locally every time on a CentOS 7 machine using:

```
./gradlew :x-pack:qa:rolling-upgrade-multi-cluster:v6.7.0#bwcTest -Dtests.seed=3191D00E20B9D5EB -Dtests.class=org.elasticsearch.upgrades.CcrRollingUpgradeIT
```
droberts195 added the >test-failure, :Distributed/CCR and v6.7.1 labels on Apr 1, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

martijnvg self-assigned this on Apr 1, 2019
@martijnvg
Member

martijnvg commented Apr 1, 2019

This looks like a timing issue in the test. Waiting for the leader index to become green/yellow before the put follow API call should fix this.
I will do some more digging in order to be sure about this diagnosis.
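Waiting for the leader index to reach at least yellow health can be expressed with the cluster health API's `wait_for_status` parameter. Below is a minimal sketch in Java, assuming Java 11's `java.net.http` client rather than the Elasticsearch REST client the test actually uses; the class name `LeaderHealthWait`, the base URL and the index name `leader_index` are hypothetical:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not from the Elasticsearch test suite): builds a
// cluster health request that blocks until the given index is at least yellow.
class LeaderHealthWait {

    static HttpRequest waitForYellow(String baseUrl, String leaderIndex) {
        // wait_for_status=yellow: the server waits until the index is yellow
        // or green; timeout bounds how long that server-side wait may last.
        URI uri = URI.create(baseUrl + "/_cluster/health/" + leaderIndex
                + "?wait_for_status=yellow&timeout=30s");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = waitForYellow("http://localhost:9200", "leader_index");
        // Sending is omitted; against a live cluster this request would be
        // passed to HttpClient.send(...) before issuing the put follow call.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

If the wait times out, the health response has `timed_out` set to `true`, so the test can check that field instead of racing ahead.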

@martijnvg
Member

This is a test 🐛.

The put follow call fails because the leader cluster wants to follow a follower index that was created by a previous put follow call. The problem is that wait_for_active_shards=1 isn't used, so the call returns immediately without waiting for the follower index to be created/restored. This wasn't a problem before, because we were testing against nodes older than 6.7.0 (where wait_for_active_shards did not exist and we created the follower index instead of restoring it from the leader index). Today we're testing against a 6.7.0 node and this now fails. I will fix this asap.
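Passing `wait_for_active_shards=1` makes the put follow call return only once a shard copy of the follower index is active, which for a follower bootstrapped by restore means the restore has progressed that far. A minimal sketch of building such a request, again assuming `java.net.http` instead of the test's own REST client; `PutFollowRequest`, the remote cluster alias `leader` and the index names are hypothetical, and the body uses the put follow fields `remote_cluster` and `leader_index`:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper (not the actual test code): builds a put follow request
// that waits for one active shard copy of the follower index before returning.
class PutFollowRequest {

    // Put follow request body; names are assumed to need no JSON escaping.
    static String body(String remoteCluster, String leaderIndex) {
        return "{\"remote_cluster\":\"" + remoteCluster
                + "\",\"leader_index\":\"" + leaderIndex + "\"}";
    }

    static HttpRequest build(String baseUrl, String followerIndex,
                             String remoteCluster, String leaderIndex) {
        // Without wait_for_active_shards the call may return before the
        // follower index has been restored from the leader.
        URI uri = URI.create(baseUrl + "/" + followerIndex
                + "/_ccr/follow?wait_for_active_shards=1");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body(remoteCluster, leaderIndex)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:9200", "follower_index",
                "leader", "leader_index");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body("leader", "leader_index"));
    }
}
```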

Note that in 7.0.0 and higher, we're always setting wait_for_active_shards=1.
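In the 6.7 BWC test, by contrast, the put follow call can reach nodes older than 6.7.0, which per the comment above do not have `wait_for_active_shards` on this API. One way to handle that, shown as a hypothetical sketch and not necessarily what #40681 does, is to add the parameter only when the target node is on 6.7.0 or later; `FollowParams` and the version strings are illustrative:

```java
// Hypothetical sketch: only request wait_for_active_shards from nodes that
// are on 6.7.0 or later, since older nodes lack it on put follow.
class FollowParams {

    static boolean onOrAfter(String version, int major, int minor, int patch) {
        // Drop qualifiers such as "-SNAPSHOT" before parsing major.minor.patch.
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            v[i] = Integer.parseInt(parts[i]);
        }
        if (v[0] != major) return v[0] > major;
        if (v[1] != minor) return v[1] > minor;
        return v[2] >= patch;
    }

    static String putFollowPath(String followerIndex, String nodeVersion) {
        String path = "/" + followerIndex + "/_ccr/follow";
        if (onOrAfter(nodeVersion, 6, 7, 0)) {
            path += "?wait_for_active_shards=1";
        }
        return path;
    }

    public static void main(String[] args) {
        for (String version : new String[] {"6.6.2", "6.7.0", "7.0.0"}) {
            System.out.println(version + " -> " + putFollowPath("follower_index6", version));
        }
    }
}
```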

@droberts195
Contributor Author

Thanks for confirming that it's not a concern for releasing 6.7.1, @martijnvg!

@martijnvg
Member

I've opened #40681

@martijnvg
Member

#40681 has been merged.
