
Memory leak in Reactor #37938

Closed

johje349 opened this issue Nov 29, 2016 · 19 comments
Assignees
Labels
Bug, Core, P2, Pending-Discussion, Reactor, severity-medium, ZD (Zendesk), ZRELEASED - 2016.3.6, ZRELEASED - 2016.11.3
Milestone

Comments

@johje349

Description of Issue/Question

Memory leak, most noticeable in the Reactor and EventPublisher processes. The Reactor process has over 1G of reserved memory. The leak seems to occur during the orchestration that follows minion creation with salt-cloud.
Total used memory after a day of uptime has gone from 2G to 10G.

Possibly due to queuing the states in the orchestration. There are between 70 and 100 minions in the setup (we do auto-scaling).

# ps -xauwww | egrep "2286|1882" | grep -v egrep
root      1882  0.1  2.0 547848 343016 ?       S    Nov28   1:28 /usr/bin/python /usr/bin/salt-master EventPublisher
root      2286  6.9  6.6 2011440 1098436 ?     Rl   Nov28  87:39 /usr/bin/python /usr/bin/salt-master Reactor

Setup

A reactor which fires on the salt/cloud/*/created event:

#  cat minion-created.sls 
startup-orchestration:
  runner.state.orchestrate:
    - mods: orchestration.minion-created
    - pillar:
        id: {{ data['name'] }}

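The master-side binding from the salt/cloud/*/created event tag to this reactor SLS is not shown; a minimal sketch of what it would look like, assuming the reactor file lives at /srv/reactor/minion-created.sls (the path and filename are assumptions):

# cat /etc/salt/master.d/reactor.conf
reactor:
  - 'salt/cloud/*/created':
    - /srv/reactor/minion-created.sls
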
The orchestration SLS:

# cat minion-created.sls 
{% set host = salt.pillar.get('id') %}

notify-slack:
  salt.function:
    - name: notify.slack
    - tgt: 'xxxxx'
    - arg:
      - "[salt-cloud] DigitalOcean droplet created: {{ host }}"

update-hosts:
  salt.state:
    - tgt: '*'
    - sls: hosts
    - queue: True

update-iptables:
  salt.state:
    - tgt: '*'
    - sls: iptables
    - queue: True

update-dns-server:
  salt.function:
    - name: cmd.run
    - tgt: 'xxxxx'
    - arg:
      - /srv/scripts/add_dns_record.sh {{ host }}

run-highstate:
  salt.state:
    - tgt: {{ host }}
    - highstate: True

{% if salt['match.pcre']("xxxxx", host) %}
update-haproxy:
  salt.state:
    - tgt: 'xxxxx*'
    - sls: haproxy
    - queue: True
{% endif %}

{% if salt['match.pcre']("xxxxx", host) %}
update-usersync-haproxy:
  salt.state:
    - tgt: 'xxxxx*'
    - sls: haproxy
    - queue: True
{% endif %}

{% if salt['match.pcre']("xxxxx", host) %}
update-usersync-apache2-ssl:
  salt.state:
    - tgt: 'xxxxx*'
    - sls: apache2-ssl 
    - queue: True
{% endif %}

Steps to Reproduce Issue

Create a few minions using salt-cloud with the above setup.
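
For reference, a minimal sketch of the kind of salt-cloud profile that would drive this setup, assuming a DigitalOcean provider entry named do; the profile name, image and size slugs are illustrative:

# cat /etc/salt/cloud.profiles.d/digitalocean.conf
do-small:
  provider: do             # assumes a matching entry in cloud.providers.d
  image: ubuntu-14-04-x64  # illustrative DigitalOcean image slug
  size: 512mb              # illustrative droplet size

Creating a minion from it (e.g. salt-cloud -p do-small web01) fires a salt/cloud/web01/created event, which the reactor above matches.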

Versions Report

Salt Version:
           Salt: 2016.3.4
 
Dependency Versions:
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: 1.5
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 0.9.1
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
   python-gnupg: Not Installed
         PyYAML: 3.10
          PyZMQ: 14.5.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
 
System Versions:
           dist: Ubuntu 14.04 trusty
        machine: x86_64
        release: 3.13.0-68-generic
         system: Linux
        version: Ubuntu 14.04 trusty
@gtmanfred
Contributor

@DmitryKuzmenko Can you take a look at this?

I was able to confirm the memory leak with just the salt-api webhook triggering highstates on minions.

@gtmanfred gtmanfred added severity-critical, Bug, Core, severity-medium, P2, Reactor, and TEAM Core labels Nov 30, 2016
@gtmanfred gtmanfred added this to the Approved milestone Nov 30, 2016
@DmitryKuzmenko
Contributor

I can't say anything without a deep analysis. Right now I'm working on another memleak issue; I'll add this one to my ToDo list.
@meggiebot could you please confirm?

@DmitryKuzmenko DmitryKuzmenko self-assigned this Dec 1, 2016
@meggiebot meggiebot modified the milestones: Nitrogen 1, Approved Dec 1, 2016
@meggiebot

@DmitryKuzmenko yes, let's make this a top priority since it is a blocker for an upcoming release.

@johje349
Author

johje349 commented Dec 6, 2016

Same behaviour on 2016.11.0 (Carbon)

@gtmanfred
Contributor

@johje349 I am having trouble replicating this to get to over 1G of memory.

Are you sure your ps command shows the process at 10G of resident memory? I believe ps reports RSS in KiB, which would mean the 1098436 in your first comment is just over 1GiB. Also, the memory percentage looks low: 6.6% would put the server at about 15-16G of RAM. Are you sure you are seeing this grow to 10x that amount?

I have a docker-compose environment simulating minions spinning up and down, but it is going to take a while today to get up to 1G.

Thanks,
Daniel

@gtmanfred gtmanfred removed ZRELEASED - 2016.11.2, ZRELEASED - 2016.3.5, and severity-critical labels Dec 6, 2016
@gtmanfred gtmanfred modified the milestones: Blocked, Nitrogen 1 Dec 6, 2016
@gtmanfred
Contributor

@johje349 I think this may be a leak in the RunnerClient.

Would you be able to convert your orchestrate script over to just a reactor sls that runs local.state.apply and local.cmd.run and see if you still see the memory leak?

Thanks
Daniel
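
A rough sketch of what such a reactor SLS could look like, with LocalClient calls in place of the runner and the targets reused from the orchestration above (the IDs, targets, and kwarg handling are illustrative; this uses the old-style reactor syntax):

# cat minion-created-local.sls
run-highstate:
  local.state.apply:
    - tgt: {{ data['name'] }}

update-hosts:
  local.state.apply:
    - tgt: '*'
    - arg:
      - hosts          # mods to apply
    - kwarg:
        queue: True

update-dns-server:
  local.cmd.run:
    - tgt: 'xxxxx'
    - arg:
      - /srv/scripts/add_dns_record.sh {{ data['name'] }}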

@johje349
Author

johje349 commented Dec 7, 2016

@gtmanfred I have 16G of RAM in the server. Currently 13G is in use, minus 6G of cache.
From top:
21356 root 20 0 1981676 1.184g 3812 S 1.7 7.6 378:23.91 /usr/bin/python

# ps -xauwww | grep 21356 | grep -v grep
root     21356 13.0  7.5 1981676 1241508 ?     Sl   Dec05 378:28 /usr/bin/python /usr/bin/salt-master Reactor

I will test your suggestions.

BR,
/JJ

@rickh563 rickh563 added the ZD label Dec 7, 2016
@rickh563

rickh563 commented Dec 7, 2016

ZD-1056

@johje349
Author

johje349 commented Dec 8, 2016

@gtmanfred Currently I have only beacon load events coming in, which trigger a reactor that runs local.state.sls. These are small events, but I can see memory usage increasing by about 4k per event. It would take quite a while to reach 1G+ with this configuration.

BR,

/JJ

@johje349
Author

I now have an uptime of 20 days, and the Reactor process has consumed more than 2G of memory:
root 2168 8.2 26.7 3055820 2188964 ? Sl Jan04 2363:14 /usr/bin/python /usr/bin/salt-master Reactor

@DmitryKuzmenko
Contributor

@johje349 I'm working on this issue. It's actually not that easy.

@DmitryKuzmenko
Contributor

@johje349 is it possible for you to apply the fix from #38951 manually and check whether it helps?

@johje349
Author

johje349 commented Jan 26, 2017

@DmitryKuzmenko change applied, let's wait and see.

@meggiebot meggiebot added the fixed-pls-verify label Jan 26, 2017
@johje349
Author

johje349 commented Feb 1, 2017

The fix does not seem to have improved the situation, unfortunately.

@DmitryKuzmenko
Contributor

@johje349 bad news. =( That means the issue I reproduced was not yours. Sorry.
Could you please share the SLS files used by your orchestration SLS?

@meggiebot meggiebot removed the fixed-pls-verify label Feb 2, 2017
@meggiebot meggiebot modified the milestones: Nitrogen 4, Nitrogen 3 Feb 6, 2017
@DmitryKuzmenko DmitryKuzmenko added the Pending-Discussion label Feb 7, 2017
@johje349
Author

johje349 commented Feb 9, 2017

@DmitryKuzmenko salt-master has been running for 4 days now and the Reactor process consumes 800M; it seems to be pretty stable around that value, though.

@DmitryKuzmenko
Contributor

@johje349 glad to hear it. Could we close this issue then?

@johje349
Author

@DmitryKuzmenko yes go ahead and close it.

@DmitryKuzmenko
Contributor

@johje349 thank you!

5 participants