salt-master EventPublisher crashes every few days #36285
Comments
Additional info: the /etc/salt/master on the syndic master |
This looks like the same issue as #35503 |
@czhong111 just a couple of follow-up questions that will help us dive in and try to replicate this. Do you notice when this behavior occurs if you are doing something particular at the time? Like running a highstate for example? Just to verify are your salt-minions all 2016.3.3 as well? |
This might be a long shot, but I'm wondering if you are also running into issue #35480, where restarting the master does not kill all the master processes when you have a custom cache_dir and increased worker threads. Can you test if when you run |
Do you notice when this behavior occurs if you are doing something particular at the time? Like running a highstate for example?
When using a command like
to accept an ID on the syndic master, the command consumes CPU, and meanwhile the EventPublisher process uses far too much CPU. If I restart the salt-master service and run the same command, the command consumes CPU as normal, but the EventPublisher's CPU usage stays below 0.1%.
Just to verify are your salt-minions all 2016.3.3 as well?
No, almost all of the salt-minions are salt-2015.5.3-5 and salt-2015.5.3-4.
Can you test if when you run service salt-master stop if its killing all the process or if it leaves around some processes.
and using the ps command to check the salt-master processes shows that it indeed does not kill all the master processes.
|
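The leftover-process check described above (stop the service, then look at `ps`) can be scripted. The following is only a minimal sketch: the function names are hypothetical, it assumes a Linux `ps` that accepts `-eo pid,args`, and it matches on the string `salt-master` in the command line, which may also catch unrelated processes whose arguments mention that name.

```python
import subprocess


def leftover_master_pids(ps_output, needle="salt-master"):
    """Return PIDs from `ps -eo pid,args` text whose command line contains needle.

    Hypothetical helper for illustration; not part of Salt.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # The header line ("PID COMMAND") is skipped because "PID" is not numeric.
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def list_leftover_masters():
    """Run ps and report any salt-master processes that are still alive."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return leftover_master_pids(result.stdout)
```

Calling `list_leftover_masters()` right after `service salt-master stop` should return an empty list if the stop was clean. Any PIDs it returns are candidates for the manual kill discussed in this thread.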
Thanks for all the info! So if you stop the master process, then manually kill all the processes and start the master back up, does that help? Also, any chance you could try upgrading the minions to see if it helps the situation? I know you have a lot, so I understand this might not be feasible. |
Thanks for all the info! So if you stop the master process, then manually kill all the processes and start the master back up, does that help? Also, any chance you could try upgrading the minions to see if it helps the situation? I know you have a lot, so I understand this might not be feasible.
and the process info is as below
|
There is a known issue that looks related. It's fixed in #36024 and will be released in 2016.3.4. |
@czhong111 BTW, #36024 just removes 2 lines of code, so you could apply it manually on your master and see if it helps. |
Thanks for the reply. It did leak memory before the master crashed. I will remove the lines manually and see if it happens again. |
@czhong111 thank you! |
@czhong111 I've tried to reproduce your issue over a number of days using different combinations of states, but without success. |
Closed for lack of response. If we do get a response, we can easily re-open this. Thanks. |
Description of Issue/Question
Every few days (1 or 2 days) salt-master stops responding and we need to restart salt-master.
Setup
We have a number of salt-syndics (about 50) that connect to this salt-master. The master of masters and the syndics run a very recent version (2016.3.3). A large number of Salt minions (30k+) are connected through the syndics.
Steps to Reproduce Issue
A few hours after salt-master is restarted, the EventPublisher process frequently consumes 100% of a CPU.
The top command shows the following:
Running "strace -p 31418" shows a very large number of EAGAIN errors
and the detail is as follows
Checking the master log on the master of masters shows that auth events from all the minions (30k+) connected through the syndics are captured, like:
.....
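To put a number on the EAGAIN observation from the strace step above, the trace can be saved to a file and tallied by error name. This is a rough sketch under stated assumptions: `count_errnos` is a hypothetical helper, and it relies only on strace's usual failed-call suffix of the form `= -1 EAGAIN (Resource temporarily unavailable)`.

```python
import re
from collections import Counter

# Matches the return-value part of a failed syscall line in strace output,
# e.g. "... = -1 EAGAIN (Resource temporarily unavailable)".
ERRNO_RE = re.compile(r"=\s+-1\s+([A-Z0-9]+)\b")


def count_errnos(strace_lines):
    """Count failed syscalls per errno name in an iterable of strace lines.

    Hypothetical helper for illustration; not part of Salt or strace.
    """
    counts = Counter()
    for line in strace_lines:
        match = ERRNO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

One possible usage: capture a short trace with `strace -p 31418 -o trace.txt`, then run `count_errnos(open("trace.txt"))`. If EAGAIN accounts for most of the calls in a short window, that would be consistent with the EventPublisher spinning on a non-blocking socket, although the tally alone does not prove it.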
Versions Report
The master of masters and the syndics all run the same version.