salt minion task process stuck with eventpoll and in sleeping status #55710
Comments
Below is our setup. We have more than 10K servers in the Salt infrastructure, geographically spread out, with syndic masters. Each location has two syndic masters, and the minions in that location are connected to both syndic masters of the same location.
While doing the audit, I found one more hung process, and when I checked it with strace it was stuck with
When I ran strace on the threads associated with this process, I again got the same
@kk21986 Thanks for the report. Are you able to upgrade to the latest version of the 2019.2.x branch, 2019.2.2? A number of fixes went into that version, and I would be curious whether it resolves the issues you're seeing. @saltstack/team-core thoughts?
@garethgreenaway Thanks for your response! Unfortunately, upgrading to the latest version is a very big task for me, as we have more than 10K servers. The problem is that I have no clue which scenario triggers this issue; otherwise I could upgrade just a few servers and test there. The issue does not appear on all servers; it shows up on different servers at random, a few times. Another strange thing is that when I try to stop the minion service, the stuck processes are not killed and I get a TIMEOUT error instead. If you want me to check anything, I am ready to do that to find the root cause. FYI, I have kept a couple of servers in this state for troubleshooting in case you want more information, but I can't guarantee how long I can hold them like this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Thank you for updating this issue. It is no longer marked as stale.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Description of Issue
I have found, on a few servers, that a salt-minion task process (not the minion service process) got stuck with eventpoll, and while debugging I found that the process is in sleeping status. Since this process runs forever, restarting salt-minion does not work either: stopping the minion service returned TIMEOUT FAIL. Notably, this issue does not appear every time, only occasionally. Below are my findings; I hope they will be helpful for troubleshooting.
From ps
root 3346 1 0 Aug20 ? 02:36:10 /usr/local/python371/bin/python3.7 /usr/local/python371/bin/salt-minion -c /etc/salt -d
From strace
From lsof
From netstat
More details from /proc
I suspect that this process belongs to state.apply, going by the date shown for the process, since that is the only task I ran on that day.
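For anyone auditing a large fleet for the same symptom, here is a minimal sketch of how the ps and /proc checks above could be scripted. It is only an illustration, not something taken from Salt itself: the function names (`parse_status`, `find_sleeping_minions`) and the `needle` parameter are hypothetical, and it assumes a Linux `/proc` layout with `cmdline`, `status`, and `wchan` files. Note that a healthy minion also sleeps in epoll most of the time, so the output is a list of candidates to compare against start times, not proof of a hang.

```python
import os


def parse_status(text):
    """Return the (Name, State) fields from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields.get("Name", ""), fields.get("State", "")


def read_text(path):
    """Read a /proc file, returning '' if the process vanished or access is denied."""
    try:
        with open(path, errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""


def find_sleeping_minions(proc_root="/proc", needle="salt-minion"):
    """List (pid, state, wchan) for sleeping processes whose cmdline contains needle.

    `needle` is an assumption: adjust it to match how the minion is started.
    """
    results = []
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        # cmdline arguments are NUL-separated
        cmdline = read_text(os.path.join(base, "cmdline")).replace("\0", " ")
        if needle not in cmdline:
            continue
        _, state = parse_status(read_text(os.path.join(base, "status")))
        if state.startswith("S"):
            # wchan names the kernel function the task is waiting in, if exposed
            wchan = read_text(os.path.join(base, "wchan")).strip()
            results.append((int(entry), state, wchan))
    return results


if __name__ == "__main__":
    for pid, state, wchan in find_sleeping_minions():
        print(pid, state, wchan)
```

Comparing the listed PIDs with the start time from ps (for example, a task process that started on the day of a state.apply run and is still present) narrows the list down to the processes worth inspecting with strace.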
Versions Report
Salt Minion:
Let me know if you need any more details.