CI: TribeIT#testClusterStateNodes fails with "can't update [discovery.zen.minimum_master_nodes]" #23695
Comments
Couldn't reproduce, but here's the line in any case:
@ywelsch can you take a look at this and re-assign if needed?
This exhibits a real issue (but I'm not sure yet why the test fails) with the To reproduce the issue:
What's bad about this is that the node will not apply any of the other cluster settings in that cluster state publishing round. /cc: @bleskes
It looks like there was a related failure today on the G1GC master build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+g1gc/2327/console
gradle :core:integTest -Dtests.seed=8C71214509D0E0F4 -Dtests.class=org.elasticsearch.tribe.TribeIT -Dtests.method="testClusterStateNodes" -Dtests.security.manager=true -Dtests.jvm.argline="-XX:-UseConcMarkSweepGC -XX:+UseG1GC" -Dtests.locale=es-PE -Dtests.timezone=Africa/Gaborone
…s master The purpose of this validation is to make sure that the master doesn't step down due to a change in master nodes, which also means that there is no way to revert an accidental change. Since we validate using the current cluster state (and not the one the settings come from), we have to be careful and only validate if the local node is already a master. Doing so all the time causes subtle issues. For example, a node that joins a cluster has no nodes in its current cluster state. When it receives a cluster state from the master with a dynamic minimum master nodes setting in it, we must make sure we don't reject it. Closes elastic#23695
… master (#23915) The purpose of this validation is to make sure that the master doesn't step down due to a change in master nodes, which also means that there is no way to revert an accidental change. Since we validate using the current cluster state (and not the one the settings come from), we have to be careful and only validate if the local node is already a master. Doing so all the time causes subtle issues. For example, a node that joins a cluster has no nodes in its current cluster state. When it receives a cluster state from the master with a dynamic minimum master nodes setting in it, we must make sure we don't reject it. Closes #23695
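To make the rule in the commit message concrete, here is a minimal Java sketch. It is not the code from #23915: the class name, method names, parameters, and error message are all hypothetical, and the node's view of the cluster is reduced to two values. It only illustrates the idea that the check runs when the local node is already master and is skipped otherwise, so that a joining node with an empty local cluster state does not reject the setting.

```java
public class MinMasterNodesValidation {

    /**
     * Hypothetical validator for a new minimum-master-nodes value.
     * All names here are illustrative and not taken from Elasticsearch.
     *
     * @param newValue                 the value carried by the incoming settings
     * @param localNodeIsMaster        whether this node is currently the elected master
     * @param knownMasterEligibleNodes master-eligible nodes in the LOCAL (current) cluster state
     */
    static void validate(int newValue, boolean localNodeIsMaster, int knownMasterEligibleNodes) {
        // A node that is not master (for example one that is still joining and
        // has no nodes in its current cluster state) must not reject the value.
        if (!localNodeIsMaster) {
            return;
        }
        // On the master, refuse a value that would make it step down.
        if (newValue > knownMasterEligibleNodes) {
            throw new IllegalArgumentException(
                "minimum master nodes [" + newValue + "] exceeds known master-eligible nodes ["
                    + knownMasterEligibleNodes + "]");
        }
    }

    /** Convenience wrapper: true if the value would be accepted. */
    static boolean accepts(int newValue, boolean localNodeIsMaster, int knownMasterEligibleNodes) {
        try {
            validate(newValue, localNodeIsMaster, knownMasterEligibleNodes);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Joining node: not master, empty local cluster state.
        System.out.println(accepts(2, false, 0));
        // Master with three master-eligible nodes, asked to require four.
        System.out.println(accepts(4, true, 3));
        // Master with three master-eligible nodes, asked to require two.
        System.out.println(accepts(2, true, 3));
    }
}
```

Without the `localNodeIsMaster` guard, the first case would be rejected, which matches the failure mode described above: the joining node compares the new value against a local cluster state that contains no nodes yet.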
This test failed again after @bleskes's fix was applied: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+java9-periodic/2051/console The failure is different, but I want to make sure that it has nothing to do with the original failure or the fix itself before opening a new issue.
Another failure today:
Reopening as the test is still failing
The tribe node was being shut down by the test while a publishing round (the one that adds the tribe node to a cluster) had not yet completed. That is, the node itself knows that it became part of the cluster and the test shuts it down, but another node has not applied the cluster state yet. That node then hangs while trying to connect to the node that is shutting down (connect_timeout is 30 seconds), which delays publishing for 30 seconds and subsequently trips an assertion when another tribe instance wants to join. Relates to #23695
I've looked into this failure (which is unrelated to the first one that was fixed here). What I think is happening here is that the tribe node Relevant log lines:
I've pushed 7c39507 to address this.
Build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-intake/974
There are several of these failures in the logs: