Node replacement get stuck #291
Comments
In that state the node does not produce any logs. Deleting the pod kicked things through, but Scylla failed to start:
|
@dkropachev do you have the Scylla logs saved anywhere? The stuck node is going to be fixed in #297.
Unfortunately I did not grab them. Tomorrow I will spin up a test environment and get them.
Also, it is worth mentioning that there are many ways we can end up with a missing IP address in gossip. I think s-o should check whether the address is in gossip before forcing Scylla to replace it.
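To make the suggestion above concrete, here is a minimal sketch of such a pre-check. It is not the operator's actual code: the function names and the idea of passing in "live" and "down" endpoint lists are assumptions for illustration, and how those lists are obtained from Scylla is left out.

```python
from typing import Iterable


def address_known_to_gossip(address: str,
                            live: Iterable[str],
                            down: Iterable[str]) -> bool:
    """Return True if the address appears in either gossip endpoint list.

    `live` and `down` are hypothetical inputs: lists of IP addresses that
    gossip currently reports as up or down.
    """
    return address in set(live) | set(down)


def can_request_replace(address: str,
                        live: Iterable[str],
                        down: Iterable[str]) -> bool:
    """Only allow a replace when gossip still knows the old address."""
    return address_known_to_gossip(address, live, down)


if __name__ == "__main__":
    live = ["10.0.0.1", "10.0.0.2"]
    down = ["10.0.0.3"]
    # The dead node is still known to gossip, so a replace can be requested.
    print(can_request_replace("10.0.0.3", live, down))
    # This address is missing from gossip, so the replace should not be forced.
    print(can_request_replace("10.0.0.9", live, down))
```

With a check like this, a node whose address has already dropped out of gossip would be handled some other way instead of being started with a replace request that cannot succeed.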
I still see that in some cases node replacement does not end up with a fully functioning node. Latest example: db-cluster | https://cloudius-jenkins-test.s3.amazonaws.com/9881ad6a-d801-464f-a19f-19c1b1d26231/20201223_200155/db-cluster-9881ad6a.zip | events:
|
@dkropachev how often do you encounter this issue?
Every second run of the related Jenkins job, i.e. 1 of 12-16 replacements.
First initiate PVC deletion and then clear finalizers to unblock PVC deletion. Fixes #291
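The ordering in the commit message can be illustrated with a small in-memory model. This is only a sketch of the general Kubernetes rule that an object with finalizers is marked for deletion but removed only once its finalizer list is empty. `FakePVC`, `FakeCluster` and `release_pvc` are hypothetical names and do not come from the operator or from any Kubernetes client library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakePVC:
    """Hypothetical stand-in for a PersistentVolumeClaim object."""
    name: str
    finalizers: List[str] = field(default_factory=list)
    deletion_requested: bool = False


class FakeCluster:
    """Toy object store that mimics finalizer-gated deletion."""

    def __init__(self) -> None:
        self.pvcs: Dict[str, FakePVC] = {}

    def add(self, pvc: FakePVC) -> None:
        self.pvcs[pvc.name] = pvc

    def delete(self, name: str) -> None:
        # Deletion only marks the object; finalizers may still block removal.
        self.pvcs[name].deletion_requested = True
        self._collect(name)

    def clear_finalizers(self, name: str) -> None:
        self.pvcs[name].finalizers.clear()
        self._collect(name)

    def _collect(self, name: str) -> None:
        # Remove the object once deletion was requested and nothing blocks it.
        pvc = self.pvcs.get(name)
        if pvc is not None and pvc.deletion_requested and not pvc.finalizers:
            del self.pvcs[name]


def release_pvc(cluster: FakeCluster, name: str) -> None:
    """Initiate deletion first, then clear finalizers to unblock it."""
    cluster.delete(name)
    cluster.clear_finalizers(name)


if __name__ == "__main__":
    cluster = FakeCluster()
    cluster.add(FakePVC("data-0", finalizers=["kubernetes.io/pvc-protection"]))
    release_pvc(cluster, "data-0")
    print("data-0" in cluster.pvcs)
```

In this model, `delete` on its own leaves the claim in place while a finalizer is present; the claim disappears only after the finalizers are cleared as the second step.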
Describe the bug
Node replacement gets stuck.
The target node did not come back to life.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The node comes back to life.
Config Files and Logs
logs_and_configs.zip
Environment: