
Salt Mine memory leak #31454

Closed
johje349 opened this issue Feb 24, 2016 · 23 comments
Labels: Bug (broken, incorrect, or confusing behavior), Confirmed (Salt engineer has confirmed bug/feature, often including a MCVE), Core (relates to code central or existential to Salt), P2 (Priority 2), severity-high (2nd top severity, seen by most users, causes major problems)

@johje349

I have a test state which looks like this:

{% for host, ip in salt['mine.get']('*', 'network.ip_addrs').iteritems() %} 
test-{{ host }}: 
  cmd.run: 
    - name: "echo {{ host }}" 
{% endfor %}
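
For context, here is a minimal Python sketch (not from the issue; the minion IDs and addresses are made up) of the data shape that loop iterates: `mine.get` returns a dict keyed by minion id, and for `network.ip_addrs` each value is that minion's cached return, a list of IP addresses, so `ip` in the loop is really a list.

```python
# Illustrative only: roughly what salt['mine.get']('*', 'network.ip_addrs')
# hands to the template -- a dict of minion id -> cached function return.
mine_data = {
    "minion-01": ["10.0.0.11", "127.0.0.1"],   # made-up values
    "minion-02": ["10.0.0.12", "127.0.0.1"],
}

# The Jinja loop above does the equivalent of this (iteritems() on Python 2):
for host, ip in mine_data.items():
    print("test-{0}: cmd.run echo {0}".format(host))
```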

The master has 36 minions. Each time I run the state, memory usage on the master increases by about 100 MB.

# free | grep Mem 
Mem: 4048268 2348116 1700152 396 38760 52368

# salt '*' state.apply test

# free | grep Mem
Mem: 4048268 2438388 1609880 396 38996 52816

If I run the state enough times, memory fills up and everything stops working.

root@asx-salt:~/asx-salt/salt# salt '*' state.apply test 
Segmentation fault (core dumped)
@johje349
Author

Salt version is 2015.8.7 running on Ubuntu 14.04.3.
Number of worker threads is set to 64.

@jfindlay jfindlay added this to the Under Review milestone Feb 24, 2016
@jfindlay jfindlay added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Feb 24, 2016
@jfindlay
Contributor

@johje349, thanks for reporting. I have tried to reproduce this with a single minion without success. I will try more minions later.

jmoney-main ~ master # cat /etc/salt/minion
master: localhost
log_fmt_console: '%(colorlevel)s %(colormsg)s'

mine_functions:
  test.fib:
    - 19
mine_interval: 3
jmoney-main ~ master # cat /srv/salt/test.sls
{% for host, ip in salt['mine.get']('*', 'test.fib').iteritems() %}
test-{{ host }}:
  cmd.run:
    - name: "echo {{ host }}"
{% endfor %}
jmoney-main ~ master # for i in {1..11} ; do salt jmoney-main state.apply test &> /dev/null ; free -m ; done
             total       used       free     shared    buffers     cached
Mem:          2005       1671        333         40        190        999
-/+ buffers/cache:        480       1524
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1671        333         40        190        999
-/+ buffers/cache:        481       1523
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1672        332         40        190        999
-/+ buffers/cache:        481       1523
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1672        332         40        190        999
-/+ buffers/cache:        482       1522
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1672        332         40        190       1000
-/+ buffers/cache:        482       1522
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1672        332         40        190       1000
-/+ buffers/cache:        482       1522
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1673        331         40        190       1000
-/+ buffers/cache:        482       1522
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1673        331         40        190       1000
-/+ buffers/cache:        482       1522
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1673        331         40        190       1000
-/+ buffers/cache:        482       1522
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1672        332         40        190       1000
-/+ buffers/cache:        482       1522
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1673        331         40        190       1000
-/+ buffers/cache:        482       1522
Swap:            0          0          0
jmoney-main ~ master # salt --versions
Salt Version:
           Salt: 2015.8.7

Dependency Versions:
         Jinja2: 2.7.3
       M2Crypto: 0.21.1
           Mako: 1.0.0
         PyYAML: 3.11
          PyZMQ: 14.4.0
         Python: 2.7.9 (default, Mar  1 2015, 12:57:24)
           RAET: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: Not Installed
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
        libgit2: Not Installed
        libnacl: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: Not Installed
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: Not Installed
   python-gnupg: Not Installed
          smmap: 0.8.2
        timelib: Not Installed

System Versions:
           dist: debian 8.3
        machine: x86_64
        release: 3.16.0-4-amd64
         system: debian 8.3

@jfindlay jfindlay added the cannot-reproduce cannot be replicated with info/context provided label Feb 24, 2016
@jfindlay
Contributor

@johje349, I've retried with 20 minions and the master's memory profile increases by about 4 MiB with each state run.

jmoney-main ~ master # salt-key -L
Accepted Keys:
CentOS-01-brain-so9r
CentOS-01-brain-vste
CentOS-01-curly-6wij
CentOS-01-curly-c367
CentOS-01-curly-fh6k
CentOS-01-curly-x0r7
CentOS-01-larry-0o7u
CentOS-01-larry-85rj
CentOS-01-moe-1tfn
CentOS-01-moe-6prr
CentOS-01-moe-jgvl
CentOS-01-pinky-89ui
CentOS-01-pinky-8b01
CentOS-01-pinky-izmh
CentOS-01-pinky-qhg6
jmoney-main
Denied Keys:
Unaccepted Keys:
Rejected Keys:
jmoney-main ~ master # for i in {1..11} ; do salt 'CentOS-*' state.apply test &> /dev/null ; free -m ; done
             total       used       free     shared    buffers     cached
Mem:          2005       1720        284         40        190        990
-/+ buffers/cache:        539       1465
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1725        279         40        190        990
-/+ buffers/cache:        544       1460
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1728        276         40        190        990
-/+ buffers/cache:        547       1457
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1732        272         40        190        991
-/+ buffers/cache:        550       1454
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1735        269         40        190        991
-/+ buffers/cache:        554       1450
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1739        265         40        190        991
-/+ buffers/cache:        557       1447
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1743        261         40        190        991
-/+ buffers/cache:        561       1443
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1746        258         40        190        992
-/+ buffers/cache:        564       1440
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1750        254         40        190        992
-/+ buffers/cache:        567       1437
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1754        250         40        190        992
-/+ buffers/cache:        571       1433
Swap:            0          0          0
             total       used       free     shared    buffers     cached
Mem:          2005       1757        247         40        190        992
-/+ buffers/cache:        573       1431
Swap:            0          0          0
jmoney-main ~ master # salt 'CentOS-*' test.version
CentOS-01-pinky-89ui:
    2015.8.7
CentOS-01-brain-vste:
    2015.8.7
CentOS-01-moe-1tfn:
    2015.8.7
CentOS-01-curly-c367:
    2015.8.7
CentOS-01-larry-85rj:
    2015.8.7
CentOS-01-pinky-qhg6:
    2015.8.7
CentOS-01-curly-x0r7:
    2015.8.7
CentOS-01-moe-6prr:
    2015.8.7
CentOS-01-larry-0o7u:
    2015.8.7
CentOS-01-brain-so9r:
    2015.8.7
CentOS-01-curly-6wij:
    2015.8.7
CentOS-01-pinky-8b01:
    2015.8.7
CentOS-01-pinky-izmh:
    2015.8.7
CentOS-01-moe-jgvl:
    2015.8.7
CentOS-01-curly-fh6k:
    2015.8.7

@jfindlay jfindlay removed the cannot-reproduce cannot be replicated with info/context provided label Feb 24, 2016
@jfindlay jfindlay modified the milestones: Approved, Under Review Feb 24, 2016
@jfindlay jfindlay added Bug broken, incorrect, or confusing behavior Core relates to code central or existential to Salt Confirmed Salt engineer has confirmed bug/feature - often including a MCVE severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around P2 Priority 2 and removed Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged labels Feb 24, 2016
@jfindlay
Contributor

Additionally, the memory profile remains constant several minutes later.

jmoney-main ~ master # free -m
             total       used       free     shared    buffers     cached
Mem:          2005       1798        206         40        190        996
-/+ buffers/cache:        611       1393
Swap:            0          0          0

If I stop the master, the memory use drops:

jmoney-main ~ master # free -m
             total       used       free     shared    buffers     cached
Mem:          2005       1454        550         40        190        996
-/+ buffers/cache:        267       1737
Swap:            0          0          0

@jfindlay jfindlay added severity-high 2nd top severity, seen by most users, causes major problems and removed severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around labels Feb 24, 2016
@johje349
Author

How many worker threads did you use on the master?

@jfindlay
Contributor

@johje349, I ran the experiment with the default configs, so 5.

@DmitryKuzmenko
Contributor

@johje349 what transport do you use? ZeroMQ or TCP?

@johje349
Author

johje349 commented Mar 3, 2016

Zeromq

@DmitryKuzmenko
Contributor

@johje349 thnx

@meggiebot meggiebot modified the milestones: B 0, B 1, B -1 Mar 10, 2016
@meggiebot meggiebot removed the fixed-pls-verify fix is linked, bug author to confirm fix label Jun 20, 2016
@DmitryKuzmenko
Contributor

@johje349 I've run an overnight test with a debug Python build under the Valgrind massif tool. My result: salt-master grew by 35 MB after 12 hours, which looks less like a leak and more like Valgrind bookkeeping plus Python's internal caches. I've analyzed the massif output and found nothing critical.
I'll run the test again on a non-debug Python build today to be sure.
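
As an aside, for a long-running check like this overnight test, resident memory can also be tracked without Valgrind. The sketch below is a hypothetical helper (Linux only, reading VmRSS from /proc) that logs a given PID's RSS once a minute, so a slow leak shows up as a steady upward trend:

```python
#!/usr/bin/env python
"""Sketch: log a PID's resident memory once a minute (Linux only)."""
import sys
import time


def vm_rss_kb(pid):
    """Read VmRSS (in kB) for the given PID from /proc."""
    with open("/proc/{0}/status".format(pid)) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0


pid = int(sys.argv[1])                       # e.g. the MWorker PID to watch
while True:
    print("{0} VmRSS={1} kB".format(time.strftime("%H:%M:%S"), vm_rss_kb(pid)))
    sys.stdout.flush()
    time.sleep(60)
```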

@johje349 could you please provide additional information about which of the salt-master processes eats the memory:

  1. Ensure you have setproctitle installed:
$ python -c "import setproctitle" || echo pip install setproctitle
  2. Restart salt-master and ensure each master subprocess command line is appended with the subprocess title:
$ ps a | grep salt-master
 7079 pts/2    S+     0:00 python ./scripts/salt-master ProcessManager
 7087 pts/2    S+     0:00 python ./scripts/salt-master MultiprocessingLoggingQueue
 7090 pts/2    Sl+    0:00 python ./scripts/salt-master ZeroMQPubServerChannel
 7091 pts/2    S+     0:00 python ./scripts/salt-master EventPublisher
 7094 pts/2    Sl+    0:01 python ./scripts/salt-master Reactor
 7095 pts/2    S+     0:00 python ./scripts/salt-master Maintenance
 7096 pts/2    S+     0:00 python ./scripts/salt-master ReqServer_ProcessManager
 7097 pts/2    Sl+    0:00 python ./scripts/salt-master MWorkerQueue
 7099 pts/2    Sl+    0:00 python ./scripts/salt-master MWorker-0
  3. Ensure all minions are connected:
salt --summary '*' test.ping
  4. Save the memory consumption at this point:
ps aux | grep salt-master > ps.start
  5. Run the test.
  6. Save the memory consumption at the end:
ps aux | grep salt-master > ps.end

Additionally, please provide the number of state.apply runs executed during the test.
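
Not part of the request above, but as a minimal sketch of how the two snapshots could be compared once collected (the file names match the ones used above; the column positions assume standard `ps aux` output, with the setproctitle suffix as the last field):

```python
#!/usr/bin/env python
"""Sketch: per-process RSS growth between two `ps aux | grep salt-master` files."""


def read_rss(path):
    """Return {process title: RSS in KB} parsed from one snapshot file."""
    rss = {}
    with open(path) as snapshot:
        for line in snapshot:
            if " grep " in line:
                continue                   # skip the grep process itself
            fields = line.split()
            if len(fields) < 11:
                continue                   # not a full `ps aux` line
            title = fields[-1]             # setproctitle suffix, e.g. MWorker-0
            rss[title] = int(fields[5])    # RSS column of `ps aux`, in KB
    return rss


start = read_rss("ps.start")
end = read_rss("ps.end")
for title in sorted(end):
    before = start.get(title, 0)
    print("{0:<28} {1:>8} -> {2:>8} KB  ({3:+} KB)".format(
        title, before, end[title], end[title] - before))
```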

@DmitryKuzmenko DmitryKuzmenko added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Jun 23, 2016
@DmitryKuzmenko DmitryKuzmenko modified the milestones: C 7, C 8 Jun 27, 2016
@meggiebot meggiebot modified the milestones: C 6, C 7 Jul 5, 2016
@meggiebot meggiebot modified the milestones: Blocked, C 6 Jul 27, 2016
@meggiebot meggiebot added info-needed waiting for more info and removed Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged labels Aug 8, 2016
@meggiebot

@johje349 we'd like to make additional improvements here if you can provide the information requested by Dmitry above.

@johje349
Author

Requested information provided below. I now have 66 minions and the master is configured with 24 worker threads. I run state.apply 10 times during the test.
My test state:

root@asx-salt:~# cat test/init.sls 
{% for host, ip in salt['mine.get']('*', 'network.ip_addrs').iteritems() %} 
test-{{ host }}: 
  cmd.run: 
    - name: "echo {{ host }}" 
{% endfor %}

Execution:

root@asx-salt:~# salt --summary '*' test.ping

-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 66
# of minions returned: 66
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------

root@asx-salt:~# service salt-master restart

root@asx-salt:~# free
             total       used       free     shared    buffers     cached
Mem:       4048268    2206308    1841960        432      85820     729712
-/+ buffers/cache:    1390776    2657492
Swap:            0          0          0

root@asx-salt:~# ps aux | grep salt-master > ps.start

root@asx-salt:~# for i in {1..11}; do salt '*' state.apply --summary test; done

root@asx-salt:~# free
             total       used       free     shared    buffers     cached
Mem:       4048268    2636220    1412048        432      90188     745744
-/+ buffers/cache:    1800288    2247980
Swap:            0          0          0

root@asx-salt:~# ps aux | grep salt-master > ps.end

root@asx-salt:~# cat ps.start
root     18054  0.6  0.9 239072 36604 ?        Ss   15:32   0:00 /usr/bin/python /usr/bin/salt-master ProcessManager
root     18064  0.0  0.4 147556 18064 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master MultiprocessingLoggingQueue
root     18065  0.0  0.7 316824 31912 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master ZeroMQPubServerChannel
root     18068  0.0  0.7 234900 29960 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master EventPublisher
root     18069  0.5  0.8 241504 35108 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master Maintenance
root     18124  0.6  0.8 976832 34060 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master Reactor
root     18125  0.0  0.7 239072 31444 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master ReqServer_ProcessManager
root     18126  0.0  0.8 1427012 32404 ?       Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorkerQueue
root     18161  0.6  0.9 324352 36512 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-0
root     18162  0.7  0.9 324356 36520 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-1
root     18164  0.6  0.9 324336 36508 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-2
root     18165  0.7  0.9 324356 36528 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-3
root     18166  0.7  0.9 324360 36540 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-4
root     18167  0.7  0.9 324360 36544 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-5
root     18168  0.6  0.9 324364 36548 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-6
root     18169  0.7  0.9 324368 36552 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-7
root     18170  0.6  0.9 324364 36548 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-8
root     18171  0.7  0.9 324368 36556 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-9
root     18172  0.7  0.9 324368 36560 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-10
root     18173  0.6  0.9 324368 36560 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-11
root     18174  0.6  0.9 324368 36568 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-12
root     18175  0.6  0.9 324328 36528 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-13
root     18176  0.6  0.9 324368 36564 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-14
root     18178  0.6  0.9 324356 36556 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-15
root     18179  0.6  0.9 324368 36572 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-16
root     18182  0.7  0.9 324368 36572 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-17
root     18183  0.6  0.9 324368 36580 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-18
root     18185  0.6  0.9 324368 36584 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-19
root     18188  0.6  0.9 324368 36584 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-20
root     18189  0.6  0.9 324372 36592 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-21
root     18190  0.6  0.9 324372 36592 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-22
root     18193  0.6  0.9 324376 36592 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master MWorker-23

root@asx-salt:~# cat ps.end
root     18054  0.2  0.9 239072 36604 ?        Ss   15:32   0:00 /usr/bin/python /usr/bin/salt-master ProcessManager
root     18064  0.0  0.4 147556 18064 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master MultiprocessingLoggingQueue
root     18065  0.0  0.7 316824 32100 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master ZeroMQPubServerChannel
root     18068  0.1  1.1 298516 48088 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master EventPublisher
root     18069  0.4  0.8 242100 35892 ?        S    15:32   0:01 /usr/bin/python /usr/bin/salt-master Maintenance
root     18124  1.0  0.8 978388 36428 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master Reactor
root     18125  0.0  0.7 239072 31444 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master ReqServer_ProcessManager
root     18126  0.6  0.8 1427012 35920 ?       Sl   15:32   0:02 /usr/bin/python /usr/bin/salt-master MWorkerQueue
root     18161  0.8  1.0 479356 41936 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-0
root     18162  1.0  1.0 479580 44048 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-1
root     18164  0.8  1.0 330632 42756 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-2
root     18165  0.8  1.0 330812 43076 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-3
root     18166  0.9  1.0 330888 43072 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-4
root     18167  0.9  1.0 330980 43128 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-5
root     18168  0.9  1.0 330740 42784 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-6
root     18169  0.9  1.0 331132 43448 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-7
root     18170  0.8  1.1 478444 46576 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-8
root     18171  0.8  1.0 331184 43244 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-9
root     18172  1.0  1.0 478936 42704 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-10
root     18173  1.1  1.0 331464 42632 ?        Sl   15:32   0:04 /usr/bin/python /usr/bin/salt-master MWorker-11
root     18174  1.1  1.0 331196 42496 ?        Sl   15:32   0:04 /usr/bin/python /usr/bin/salt-master MWorker-12
root     18175  0.8  1.0 330816 42108 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-13
root     18176  1.0  1.0 479008 42964 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-14
root     18178  0.9  1.0 330948 42124 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-15
root     18179  0.9  1.0 330888 42388 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-16
root     18182  0.8  1.0 479072 42144 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-17
root     18183  1.1  1.0 479312 43208 ?        Sl   15:32   0:04 /usr/bin/python /usr/bin/salt-master MWorker-18
root     18185  0.9  1.0 331116 42160 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-19
root     18188  0.8  1.0 330796 41808 ?        Sl   15:32   0:02 /usr/bin/python /usr/bin/salt-master MWorker-20
root     18189  0.8  1.0 331016 42268 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-21
root     18190  1.0  1.0 331152 42476 ?        Sl   15:32   0:03 /usr/bin/python /usr/bin/salt-master MWorker-22
root     18193  0.7  1.0 479340 42196 ?        Sl   15:32   0:02 /usr/bin/python /usr/bin/salt-master MWorker-23

@johje349
Author

The memory consumption seems fairly stable during subsequent runs, though, which is a clear improvement since the issue was reported. Both the master and the minions are running version 2016.3.2.
I ran state.apply another 50 times:

root@asx-salt:~# free
             total       used       free     shared    buffers     cached
Mem:       4048268    2920048    1128220        432     107252     760480
-/+ buffers/cache:    2052316    1995952
Swap:            0          0          0

root@asx-salt:~# ps aux | grep salt-master > ps.end
root@asx-salt:~# cat ps.end 
root     18054  0.0  0.9 239072 36604 ?        Ss   15:32   0:00 /usr/bin/python /usr/bin/salt-master ProcessManager
root     18064  0.0  0.4 147556 18064 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master MultiprocessingLoggingQueue
root     18065  0.0  0.8 316824 33004 ?        Sl   15:32   0:00 /usr/bin/python /usr/bin/salt-master ZeroMQPubServerChannel
root     18068  0.1  2.3 314148 93384 ?        S    15:32   0:03 /usr/bin/python /usr/bin/salt-master EventPublisher
root     18069  0.3  0.9 246684 40448 ?        S    15:32   0:08 /usr/bin/python /usr/bin/salt-master Maintenance
root     18124  1.0  0.9 978388 36576 ?        Sl   15:32   0:21 /usr/bin/python /usr/bin/salt-master Reactor
root     18125  0.0  0.7 239072 31444 ?        S    15:32   0:00 /usr/bin/python /usr/bin/salt-master ReqServer_ProcessManager
root     18126  0.6  0.9 1427012 36552 ?       Sl   15:32   0:14 /usr/bin/python /usr/bin/salt-master MWorkerQueue
root     18161  0.7  1.2 483564 50484 ?        Sl   15:32   0:14 /usr/bin/python /usr/bin/salt-master MWorker-0
root     18162  0.8  1.2 482828 49692 ?        Sl   15:32   0:17 /usr/bin/python /usr/bin/salt-master MWorker-1
root     18164  0.7  1.2 484888 48632 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-2
root     18165  0.7  1.2 484840 49764 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-3
root     18166  0.7  1.2 485284 49484 ?        Sl   15:32   0:16 /usr/bin/python /usr/bin/salt-master MWorker-4
root     18167  0.7  1.1 482188 45960 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-5
root     18168  0.7  1.1 482928 46976 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-6
root     18169  0.7  1.2 485580 50664 ?        Sl   15:32   0:14 /usr/bin/python /usr/bin/salt-master MWorker-7
root     18170  0.7  1.2 483132 51196 ?        Sl   15:32   0:14 /usr/bin/python /usr/bin/salt-master MWorker-8
root     18171  0.7  1.1 482964 47428 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-9
root     18172  0.7  1.3 485496 53384 ?        Sl   15:32   0:16 /usr/bin/python /usr/bin/salt-master MWorker-10
root     18173  0.8  1.1 483892 47768 ?        Sl   15:32   0:16 /usr/bin/python /usr/bin/salt-master MWorker-11
root     18174  0.7  1.1 483504 47464 ?        Sl   15:32   0:16 /usr/bin/python /usr/bin/salt-master MWorker-12
root     18175  0.8  1.2 485040 49700 ?        Sl   15:32   0:16 /usr/bin/python /usr/bin/salt-master MWorker-13
root     18176  0.7  1.3 484348 52672 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-14
root     18178  0.7  1.1 483004 46672 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-15
root     18179  0.7  1.1 483140 48044 ?        Sl   15:32   0:16 /usr/bin/python /usr/bin/salt-master MWorker-16
root     18182  0.7  1.3 486424 53292 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-17
root     18183  0.8  1.2 483904 52168 ?        Sl   15:32   0:16 /usr/bin/python /usr/bin/salt-master MWorker-18
root     18185  0.7  1.1 483892 47992 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-19
root     18188  0.7  1.2 481944 50064 ?        Sl   15:32   0:15 /usr/bin/python /usr/bin/salt-master MWorker-20
root     18189  0.7  1.1 484268 48260 ?        Sl   15:32   0:14 /usr/bin/python /usr/bin/salt-master MWorker-21
root     18190  0.8  1.1 483332 48232 ?        Sl   15:32   0:17 /usr/bin/python /usr/bin/salt-master MWorker-22
root     18193  0.7  1.2 483096 51100 ?        Sl   15:32   0:14 /usr/bin/python /usr/bin/salt-master MWorker-23

@meggiebot meggiebot modified the milestones: C 5, Blocked Aug 10, 2016
@DmitryKuzmenko
Contributor

@johje349 thank you, this makes sense. I'll continue work on this as soon as possible.

@DmitryKuzmenko
Contributor

Hopefully it's done: at least in my tests the memory doesn't leak anymore.

@meggiebot

Awesome

@meggiebot meggiebot added the fixed-pls-verify fix is linked, bug author to confirm fix label Sep 6, 2016
@DmitryKuzmenko DmitryKuzmenko removed the fixed-pls-verify fix is linked, bug author to confirm fix label Sep 9, 2016
skizunov added a commit to skizunov/salt that referenced this issue Sep 30, 2016
It has been observed that when running this command:

```
salt "*" test.ping
```

sometimes the command would return `Minion did not return. [No response]`
for some of the minions even though the minions did indeed respond
(reproduced running Windows salt-master on Python 3 using the TCP
transport).

After investigating this further, it seems that there is a race condition
where if the response via event happens before events are being listened
for, the response is lost. For instance, in
`salt.client.LocalClient.cmd_cli` which is what is invoked in the command
above, it won't start listening for events until `get_cli_event_returns`
which invokes `get_iter_returns` which invokes `get_returns_no_block`
which invokes `self.event.get_event` which will connect to the event bus
if it hasn't connected yet (which is the case the first time it hits
this code). But events may be fired anytime after `self.pub()` is
executed which occurs before this code.

We need to ensure that events are being listened for before it is
possible they return. We also want to avoid issue saltstack#31454 which is what
PR saltstack#36024 fixed but in turn caused this issue. This is the approach I
have taken to try to tackle this issue:

It doesn't seem possible to generically discern if events can be
returned by a given function that invokes `run_job` and contains an
event searching function such as `get_cli_event_returns`. So for all
such functions that could possibly need to search the event bus, we
do the following:
- Record if the event bus is currently being listened to.
- When invoking `run_job`, ensure that `listen=True` so that `self.pub()`
will ensure that the event bus is listened to before sending the payload.
- When all possible event bus activities are concluded, if the event
bus was not originally being listened to, stop listening to it. This is
designed so that issue saltstack#31454 does not reappear. We do this via a
try/finally block in all instances of such code.

Signed-off-by: Sergey Kizunov <sergey.kizunov@ni.com>
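
The commit message above describes the pattern in prose; the sketch below only illustrates that try/finally idea with hypothetical names (`event_listening`, `start_listening`, `stop_listening`, `publish`, `collect_returns`), not Salt's actual API: remember whether the event bus was already being listened to, make sure it is before publishing, and restore the original state afterwards so listeners are not accumulated (the concern behind this issue).

```python
# Illustrative sketch only -- the names below are hypothetical, not Salt's API.
# The point is the ordering: start listening *before* publishing, and restore
# the original listening state afterwards so listeners are not leaked.

class DummyClient(object):
    """Stand-in for a command client with an event-bus subscription."""

    def __init__(self):
        self.event_listening = False

    def start_listening(self):
        self.event_listening = True

    def stop_listening(self):
        self.event_listening = False

    def publish(self, tgt, fun):
        return "20161001000000000000"    # a fake job id

    def collect_returns(self, jid):
        return {"minion-01": True}       # fake returns gathered from the bus


def run_job_and_collect(client, tgt, fun):
    was_listening = client.event_listening      # remember the original state
    if not was_listening:
        client.start_listening()                # listen before publishing,
                                                # so early returns are not lost
    try:
        jid = client.publish(tgt, fun)          # events may arrive from now on
        return client.collect_returns(jid)      # roughly get_cli_event_returns()
    finally:
        if not was_listening:
            client.stop_listening()             # restore, so listeners do not
                                                # accumulate (issue #31454)


print(run_job_and_collect(DummyClient(), "*", "test.ping"))
```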
DmitryKuzmenko pushed a commit to DSRCorporation/salt that referenced this issue Oct 20, 2016