AtomicRegisterCoordinatorTests.testClusterRecoversAfterExceptionDuringSerialization failure #98606
Labels
:Distributed/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
Team:Distributed
Meta label for distributed team
>test-failure
Triaged test failures from CI
Comments
DaveCTurner
added
>test-failure
Triaged test failures from CI
:Distributed/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
labels
Aug 17, 2023
Pinging @elastic/es-distributed (Team:Distributed)
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
Aug 18, 2023
It's possible (although very unlikely) that the `GatewayService` recovers the state, then fails over to a new master with unrecovered state, and then fails back to the original master, and only then performs the reroute that resets the flags which would trigger another state recovery attempt. This leaves the cluster in an unrecovered state until the next cluster state update. This commit resets the flags at the end of the recovery update rather than waiting until after the reroute, allowing a subsequent election to retry recovery again. Closes elastic#98606
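The sequence described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not Elasticsearch code: `ToyGateway`, `recovery_in_progress`, `on_elected`, `on_reroute_done`, and `simulate` are all hypothetical names, and the model assumes a single flag that suppresses a new recovery attempt while it is set.

```python
class ToyGateway:
    """Toy stand-in for one master's recovery bookkeeping (hypothetical, not the real GatewayService)."""

    def __init__(self, reset_in_recovery_update):
        # True models the fix: clear the flag at the end of the recovery update.
        # False models the old behaviour: clear it only after the reroute.
        self.reset_in_recovery_update = reset_in_recovery_update
        self.recovery_in_progress = False  # assumed flag that blocks further attempts

    def on_elected(self, cluster):
        """Attempt state recovery; return True if an attempt actually ran."""
        if cluster["recovered"] or self.recovery_in_progress:
            return False
        self.recovery_in_progress = True
        cluster["recovered"] = True  # the recovery update itself
        if self.reset_in_recovery_update:
            self.recovery_in_progress = False
        return True

    def on_reroute_done(self):
        # The post-recovery reroute clears the flag but does not retry recovery.
        self.recovery_in_progress = False


def simulate(reset_in_recovery_update):
    cluster = {"recovered": False}
    original = ToyGateway(reset_in_recovery_update)
    original.on_elected(cluster)   # 1. original master recovers the state
    cluster["recovered"] = False   # 2. failover to a new master with unrecovered state
    original.on_elected(cluster)   # 3. failback to the original master
    original.on_reroute_done()     # 4. the delayed reroute finally runs
    return cluster["recovered"]
```

In this model, `simulate(False)` leaves the cluster unrecovered because the failback in step 3 finds the flag still set, while `simulate(True)` recovers again on failback.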
DaveCTurner
added a commit
to DaveCTurner/elasticsearch
that referenced
this issue
Aug 21, 2023
DaveCTurner
added a commit
that referenced
this issue
Aug 22, 2023
It's possible (although very unlikely) that the `GatewayService` recovers the state, then fails over to a new master with unrecovered state, and then fails back to the original master, and only then performs the reroute that resets the flags which would trigger another state recovery attempt. This leaves the cluster in an unrecovered state until the next cluster state update. This commit resets the flags at the end of the recovery update rather than waiting until after the reroute, allowing a subsequent election to retry recovery again. Closes #98606
dreamquster
pushed a commit
to dreamquster/elasticsearch
that referenced
this issue
Aug 26, 2023
The following test fails reproducibly in current `main` (af071cc):