Scylla stuck during systemctl stop scylla-server.service (while running TerminateAndRemoveNodeMonkey) #8191
Comments
It seems this also happened during the weekly run. Logs:
Looks like it is stuck in the same place:
And after 15 minutes, it seems systemd timed out and scylla-server started again:
Can we get a stuck node?
One thing: systemd seems to time out after 15 minutes regardless (i.e. we might be able to run the 48h case only with
Trying it here:
@slivne @bhalevy the cluster is all yours. But again, systemd is wiping the evidence and forcefully killing the process:
@xemul please look into this.
Hm... when we shut down, we flush memtables. This is not supposed to increase commitlog usage; it is actually supposed to clear the commitlogs.
@elcallio so the cluster with the reproducer isn't needed? (I'm going to terminate it.)
If this is a hanging hints manager, it should be addressed by added53.
I have a node where it reproduced, if someone needs it: 13.49.78.133.
@elcallio can you please check whether this is the issue?
It seems that I hit this issue in 4.4.rc3-0.20210304.c2d924757 as part of longevirt-counters-multidc-test. During an ENOSPC nemesis, after node-4 was filled and hit the expected "storage I/O error", the nemesis cleared the space and restarted the service with: "sudo systemctl restart scylla-server.service".
This is the last message from scylla until:
Logs available in:
Duplicate of #8079
Installation details
Scylla version (or git commit hash): 4.5.dev-0.20210216.2f3b265da with build-id 6a7412abfaf73ef877a8e6ae22759227e89cac13 (ami-0208ab84477ace351)
Cluster size: 6 (i3.4xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0208ab84477ace351
Summary
During TerminateAndRemoveNodeMonkey, as part of the longevity-50gb-3days weekly run on master:
While calling systemctl stop scylla-server.service (on longevity-tls-50gb-3d-master-db-node-147194be-4), Scylla gets stuck, and the last log seen is from the hints_manager being stopped.
Logs