Processes stuck forever #4974
Comments
Same here with
Is there any plan to fix it?
Same with me, it's terrible and causes a slow system
Same here. It's happened to me on several versions of Rancher, so I don't think it's a new bug. I'd like to know how to manually clear them if anyone has an idea.
I am not sure whether this is a proper fix, but it did solve my problem. I turned off the Rancher server and used the following MySQL query to clear the zombie processes:

```sql
UPDATE process_instance SET exit_reason='DONE', end_time=NOW() WHERE exit_reason = 'UNKNOWN_EXCEPTION';
```

Further, for cases where there was no exit reason and the process never ended, I tried this, which worked in all cases:

```sql
UPDATE process_instance SET exit_reason='DONE', end_time=NOW() WHERE end_time IS NULL;
```

After starting the Rancher server again, the log stayed normal and the background processing never restarted for these failed processes. Caution: this may have side effects, and a few unexpected behaviors may pop up due to the manipulated data. I took the risk because the situation was bad, but a proper solution through the product would be the right one.
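Before running destructive updates like the ones above, it may be worth checking how many rows would be affected. A minimal sketch, assuming the same `process_instance` table and columns mentioned in the workaround above (run it against the Rancher server's MySQL database):

```sql
-- Sketch: count unfinished process_instance rows, grouped by exit reason,
-- so you can see what the UPDATE statements would touch before running them.
SELECT exit_reason, COUNT(*) AS stuck_count
FROM process_instance
WHERE end_time IS NULL
GROUP BY exit_reason;
```

If the counts look wrong (e.g. they include processes that are legitimately still running), narrow the `WHERE` clause before applying any `UPDATE`.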
@Modomu - Can you confirm all your hosts are connected and that you are able to reach them? There may also be Docker filesystem issues at times, which cause a container to never start properly again. Your case is different from mine.
@Modomu - I think you need the Rancher team or an expert to look into it; I am not sure about the cause.
Thanks @sujaisd |
It's probably difficult for the Rancher team to fix this, as processes can get stuck for multiple root causes. I believe they tend to work on fixing whatever issue causes a process to get stuck, rather than providing a kill switch to silence processes that are saying "hey, there is something wrong here!"
With the release of Rancher 2.0, development on v1.6 is limited to critical bug fixes and security patches.
Rancher Version: v1.0.2
Docker Version: 1.11.0
OS and where are the hosts located? (cloud, bare metal, etc): Amazon AWS with Ubuntu 14.04.4LTS (3.13.0-77-generic)
Setup Details: (single node rancher vs. HA rancher, internal DB vs. external DB) Single node rancher, with both server and node on the same machine.
Environment Type: (Cattle/Kubernetes/Swarm/Mesos) Cattle
Steps to Reproduce: When rebooting the machine, Rancher systematically fails to reconnect the Convoy volume. I must stop the service whose container requires the Convoy volume, delete the container, and restart the service to create a new container. Only then is the Convoy volume bound correctly. We're using Convoy 0.4.
Due to this issue, a process stuck forever stacks up on every reboot. Some of them have been stuck for months now.
Results:
The Rancher server logs are full of these messages:
Expected:
Get a way to kill / stop zombie processes.