salt-minion, memory and oom-killer with salt-master 2014.7.0 #19999

Closed
equinoxefr opened this issue Jan 23, 2015 · 16 comments

Comments

@equinoxefr (Contributor) commented Jan 23, 2015

Hi,

Since I upgraded my salt-master server (Linux CentOS 6.5) from Salt 2014.1.13 to 2014.7.0, my minions (Linux, 2014.1.13 and 2014.7.0) have been eating memory at random (10 servers in a few days, out of roughly 200 minions). Windows minions aren't affected.

A few days ago I downgraded the salt-master from 2014.7.0 to 2014.1.13 and the problem disappeared. I didn't see any useful information in the logs. What kind of information would be useful to you for debugging?

Regards,
Pierre

@oldmantaiter (Contributor) commented Jan 23, 2015

@equinoxefr What version of ZMQ is the minion using? I recently had this issue as well with ZMQ3 on EL6 and was able to solve it.
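A quick way to double-check which ZMQ and PyZMQ a minion's Python is actually loading (assuming the minion runs on the system Python where PyZMQ is installed):

python -c 'import zmq; print(zmq.zmq_version()); print(zmq.pyzmq_version())'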

@rallytime added this to the Blocked milestone Jan 23, 2015

@rallytime (Contributor) commented Jan 23, 2015

@equinoxefr Can you post the output of salt --versions-report on your master, as well as a versions report from a minion (salt <minion> test.versions_report)? We have had some tricky issues like this pop up that were resolved by updating ZMQ, as @oldmantaiter suggested.

@equinoxefr (Contributor, Author) commented Jan 26, 2015

Hi,

Now with the downgraded master:

[root@salt ~]# salt --versions-report
Salt: 2014.1.13
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
PyYAML: 3.10
PyZMQ: 14.3.1
ZMQ: 3.2.4

With the latest master:

[root@salt ~]# salt --versions-report
Salt: 2014.7.0
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
libnacl: Not Installed
PyYAML: 3.10
ioflo: Not Installed
PyZMQ: 14.3.1
RAET: Not Installed
ZMQ: 3.2.4
Mako: Not Installed

On the minion:

[root@HERACLES ~]# salt-call --versions-report
Salt: 2014.7.0
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
libnacl: Not Installed
PyYAML: 3.10
ioflo: Not Installed
PyZMQ: 14.3.1
RAET: Not Installed
ZMQ: 3.2.4
Mako: 0.3.4

I'm using CentOS 6.x with the EPEL repo.

Regards
Pierre

@rallytime (Contributor) commented Jan 26, 2015

@equinoxefr I would definitely give upgrading your ZMQ version a try. We have seen these types of issues resolved by upgrading to ZMQ 4.
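On CentOS 6 the upgrade itself is only a couple of commands once a repository shipping ZeroMQ 4 builds is enabled (the SaltStack COPR repo is one option); the package names below are an assumption and may differ per repo:

yum clean expire-cache
yum upgrade zeromq python-zmq
service salt-minion restart    # and restart salt-master on the master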

@equinoxefr (Contributor, Author) commented Jan 27, 2015

Hi,

I upgraded my master to ZMQ 4 today using the packages from http://www.itsprite.com/centos-linux-how-to-upgrade-zmq2-x-to-zmq-4-x/

[root@salt ~]# salt --versions-report
Salt: 2014.7.0
Python: 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
Jinja2: 2.2.1
M2Crypto: 0.20.2
msgpack-python: 0.1.13
msgpack-pure: Not Installed
pycrypto: 2.0.1
libnacl: Not Installed
PyYAML: 3.10
ioflo: Not Installed
PyZMQ: 14.3.1
RAET: Not Installed
ZMQ: 4.0.4
Mako: Not Installed

Stay tuned ;)

@rallytime (Contributor) commented Jan 27, 2015

Awesome! Keep us posted!

@equinoxefr (Contributor, Author) commented Feb 2, 2015

Hi,

A few days later, still no oom-killer events. It seems ZeroMQ 4 was a good workaround. It was a bit tricky to install on Red Hat 5.x, but it now works with the repositories from copr-be.cloud.fedoraproject.org.

Regards
Pierre

@rallytime (Contributor) commented Feb 2, 2015

@equinoxefr Excellent! I am glad that worked for you. And yes, it is a bit tricky with RHEL-5.x systems, but I am glad that our COPR packages worked out for you. Since it seems you've got this resolved, can this issue be closed?

@equinoxefr (Contributor, Author) commented Feb 3, 2015

If you think that Salt 2014.7.0 will only work with ZMQ 4 and no longer with ZMQ 3, you can close it ;)

@equinoxefr closed this Feb 3, 2015

@equinoxefr (Contributor, Author) commented Feb 16, 2015

Hi,

My servers and minions are all on ZeroMQ 4, but I am seeing some new oom-killer events :-(

How can I help you debug this?

Regards

@equinoxefr reopened this Feb 16, 2015

@equinoxefr (Contributor, Author) commented Feb 16, 2015

Some logs:

/var/log/messages

Feb 15 00:44:43 sisib05 kernel: salt-minion invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
Feb 15 00:44:43 sisib05 kernel: salt-minion cpuset=/ mems_allowed=0
Feb 15 00:44:43 sisib05 kernel: Pid: 21236, comm: salt-minion Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Feb 15 00:44:43 sisib05 kernel: Call Trace:
Feb 15 00:44:43 sisib05 kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
Feb 15 00:44:43 sisib05 kernel: [] ? dump_header+0x90/0x1b0
Feb 15 00:44:43 sisib05 kernel: [] ? security_real_capable_noaudit+0x3c/0x70
Feb 15 00:44:43 sisib05 kernel: [] ? oom_kill_process+0x82/0x2a0
Feb 15 00:44:43 sisib05 kernel: [] ? select_bad_process+0xe1/0x120
Feb 15 00:44:43 sisib05 kernel: [] ? out_of_memory+0x220/0x3c0
Feb 15 00:44:43 sisib05 kernel: [] ? __alloc_pages_nodemask+0x89f/0x8d0
Feb 15 00:44:43 sisib05 kernel: [] ? alloc_pages_vma+0x9a/0x150
Feb 15 00:44:43 sisib05 kernel: [] ? do_wp_page+0xfd/0x920
Feb 15 00:44:43 sisib05 kernel: [] ? swap_info_get+0x63/0xe0
Feb 15 00:44:43 sisib05 kernel: [] ? handle_pte_fault+0x2cd/0xb00
Feb 15 00:44:43 sisib05 kernel: [] ? rb_insert_color+0x9d/0x160
Feb 15 00:44:43 sisib05 kernel: [] ? handle_mm_fault+0x22a/0x300
Feb 15 00:44:43 sisib05 kernel: [] ? __do_page_fault+0x138/0x480
Feb 15 00:44:43 sisib05 kernel: [] ? sys_recvfrom+0x16b/0x180
Feb 15 00:44:43 sisib05 kernel: [] ? __switch_to+0x26e/0x320
Feb 15 00:44:43 sisib05 kernel: [] ? read_tsc+0x9/0x20
Feb 15 00:44:43 sisib05 kernel: [] ? ktime_get_ts+0xb1/0xf0
Feb 15 00:44:43 sisib05 kernel: [] ? thread_return+0x4e/0x760
Feb 15 00:44:43 sisib05 kernel: [] ? do_page_fault+0x3e/0xa0
Feb 15 00:44:43 sisib05 kernel: [] ? page_fault+0x25/0x30
Feb 15 00:44:43 sisib05 kernel: Mem-Info:
Feb 15 00:44:43 sisib05 kernel: Node 0 DMA per-cpu:
Feb 15 00:44:43 sisib05 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Feb 15 00:44:43 sisib05 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Feb 15 00:44:43 sisib05 kernel: Node 0 DMA32 per-cpu:
Feb 15 00:44:43 sisib05 kernel: CPU 0: hi: 186, btch: 31 usd: 80
Feb 15 00:44:43 sisib05 kernel: CPU 1: hi: 186, btch: 31 usd: 29
Feb 15 00:44:43 sisib05 kernel: Node 0 Normal per-cpu:
Feb 15 00:44:43 sisib05 kernel: CPU 0: hi: 186, btch: 31 usd: 30
Feb 15 00:44:43 sisib05 kernel: CPU 1: hi: 186, btch: 31 usd: 82
Feb 15 00:44:43 sisib05 kernel: active_anon:546690 inactive_anon:203567 isolated_anon:19739
Feb 15 00:44:43 sisib05 kernel: active_file:45 inactive_file:87 isolated_file:68
.....

I had ~1800 salt-minion processes stacked up when the oom-killer was invoked!

I think I will try disabling multiprocessing support on the minion.
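For reference, the process count can be checked with something like pgrep -c salt-minion, and disabling multiprocessing should only need this line in /etc/salt/minion (default path assumed; jobs then run in threads instead of forked processes), followed by a minion restart:

multiprocessing: False

service salt-minion restart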

@equinoxefr (Contributor, Author) commented Feb 16, 2015

Hi,

I found that some of my Windows minions are affected by #19350.

It seems those minions are eating threads on the master. After a while, the master stops responding.

I upgraded my Windows minions from 2014.7.0 to 2014.7.1 to see if that resolves the oom-killer issue on my Linux minions.

@equinoxefr (Contributor, Author) commented Feb 17, 2015

A few more oom-killer events last night. I saw this in the master log:

2015-02-16 20:29:23,114 [salt.client ][ERROR ] Salt request timed out. If this error persists, worker_threads may need to be increased.
2015-02-16 20:32:31,948 [salt.client ][ERROR ] Salt request timed out. If this error persists, worker_threads may need to be increased.
2015-02-16 20:35:40,124 [salt.client ][ERROR ] Salt request timed out. If this error persists, worker_threads may need to be increased.

I have this in the master config:
worker_threads: 30
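If the master really is starved for workers, raising that is only a one-line change in /etc/salt/master followed by a master restart (the value below is just an illustrative bump; it should be sized to the master's CPU and RAM):

worker_threads: 50

service salt-master restart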

@rallytime (Contributor) commented Feb 24, 2015

@equinoxefr Are you only seeing this behavior on the Windows minions, or on minions running other operating systems as well?

@UtahDave (Member) commented Feb 24, 2015

Upgrade to 2014.7.1

There's a known memory/thread leak in 2014.7.0, especially for Windows minions.
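Assuming the minions were installed from the EPEL/SaltStack repos, the upgrade itself is roughly the following (exact package names depend on how Salt was installed):

yum clean expire-cache
yum upgrade salt salt-minion
service salt-minion restart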

@ssgward commented May 19, 2015

Is this still an issue on the latest version of Salt? Can we close this if it is no longer an issue?

@ssgward added P3 and removed Medium Severity labels May 19, 2015

@cachedout closed this Dec 23, 2015
