Memory leak on custom execution module scheduled every minute #32349

Closed
arthurzenika opened this issue Apr 5, 2016 · 12 comments
Labels
Bug (broken, incorrect, or confusing behavior); Core (relates to code central or existential to Salt); fixed-pls-verify (fix is linked, bug author to confirm fix); severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a work around)

@arthurzenika
Contributor

Description of Issue/Question

I get some "Exception [Errno 12] Cannot allocate memory occurred in scheduled job" errors in the log (reported to sentry).

The munin graph below shows some clear patterns of a memory leak.

I am willing to spend some time debugging this if needed (otherwise a @daily crontab restart of salt-minion will probably do the trick...)

Setup

[Image: screenshot_2016-04-05_09-31-47, munin memory graph]

Steps to Reproduce Issue

The minion has about 10 schedules such as this one:

      smokeping-for-host.logilab.fr:
        args:
        - host.logilab.fr
        enabled: true
        function: lglb_network.ping_json
        jid_include: true
        kwargs:
          normalize_url: true
        maxrunning: 1
        minutes: 1
        name: smokeping-for-host.logilab.fr
        returner: master
        seconds: 0
        splay: 15

The master returner is the one published here: #12653 (comment)

The custom module is as follows (network.ping converted to return a dict/JSON output):

# import the submodule explicitly; sanitize_host lives in salt.utils.network
import salt.utils.network

def _normalize_url(url):
    ''' 
    Normalize URL, mainly so that carbon return can put them in single bucket instead of a hierarchy
    '''
    return url.replace('.','_')

def ping_json(host, timeout=False, normalize_url=False):
    '''
    Performs a ping to a host and returns the result as json

    CLI Example:

    .. code-block:: bash

        salt '*' lglb_network.ping_json saltstack.com

    .. versionadded:: 2015.5.0

    Set the time to wait for a response in seconds.

    .. code-block:: bash

        salt '*' lglb_network.ping_json saltstack.com timeout=3
    '''
    if timeout:
        cmd = 'ping -W {0} -c 4 {1}'.format(timeout, salt.utils.network.sanitize_host(host))
    else:
        cmd = 'ping -c 4 {0}'.format(salt.utils.network.sanitize_host(host))
    ret = __salt__['cmd.run_stdout'](cmd)
    try:
        last_line = ret.splitlines()[-1]
    except IndexError:
        # probably unknown host - TODO log warning
        return {}
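    if '=' not in last_line:
        # no rtt summary line (e.g. 100% packet loss), nothing to parse
        return {}
    labels, results = last_line.split('=', 1)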
    labels = labels.replace('rtt ','').strip()
    results = results.replace(' ms','').strip().split('/')
    result_dict = {}
    for index, category in enumerate(labels.split('/')):
        result_dict[category] = results[index]
    key = salt.utils.network.sanitize_host(host)
    if normalize_url:
        key = _normalize_url(key)
    return {key : result_dict}
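The parsing step of the module above can be exercised outside Salt. The following is a minimal sketch, not the author's code: `parse_rtt_summary` is a hypothetical helper name, and the sample lines assume the summary format printed by Linux iputils `ping`.

```python
def parse_rtt_summary(line):
    """Parse a ping 'rtt min/avg/max/mdev = a/b/c/d ms' line into a dict.

    Returns an empty dict when the line carries no rtt summary.
    Values are kept as strings, as in the module above.
    """
    if '=' not in line:
        return {}
    labels, results = line.split('=', 1)
    labels = labels.replace('rtt ', '').strip().split('/')
    values = results.replace(' ms', '').strip().split('/')
    return dict(zip(labels, values))


# Assumed sample output lines (illustrative, not taken from the issue)
ok_line = 'rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms'
loss_line = '4 packets transmitted, 0 received, 100% packet loss, time 3024ms'

print(parse_rtt_summary(ok_line))
print(parse_rtt_summary(loss_line))
```

Keeping the parser separate from the `cmd.run_stdout` call makes it possible to test the edge cases (empty output, total packet loss) without a running minion.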

Versions Report

# salt-call --versions-report
Salt Version:
           Salt: 2015.8.8.2

Dependency Versions:
         Jinja2: 2.7.3
       M2Crypto: Not Installed
           Mako: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
         Python: 2.7.9 (default, Mar  1 2015, 12:57:24)
           RAET: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
           cffi: 0.8.6
       cherrypy: Not Installed
       dateutil: 2.2
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
        libgit2: Not Installed
        libnacl: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: Not Installed
   python-gnupg: Not Installed
          smmap: Not Installed
        timelib: Not Installed

System Versions:
           dist: debian 8.3 
        machine: x86_64
        release: 3.16.0-4-amd64
         system: debian 8.3 

@jfindlay jfindlay added the info-needed waiting for more info label Apr 5, 2016
@jfindlay jfindlay added this to the Blocked milestone Apr 5, 2016
@jfindlay
Contributor

jfindlay commented Apr 5, 2016

@arthurlogilab, thanks for reporting. Does the system you are reporting do anything else that might compound/complicate the leaking memory?

@arthurzenika
Contributor Author

@jfindlay yes. I can try to isolate this in a VM, but do you have any tips for debugging this on the current setup (which has other things going on)?
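On a host that has other things going on, one low-tech way to confirm which process is leaking is to sample its resident set size over time. This is a minimal sketch assuming Linux and its `/proc/<pid>/status` file; `rss_kib` and `read_rss_kib` are hypothetical helper names and are not part of Salt.

```python
def rss_kib(status_text):
    """Return the VmRSS value (in kB as reported by the kernel) or None."""
    for line in status_text.splitlines():
        if line.startswith('VmRSS:'):
            # line looks like: 'VmRSS:\t  123456 kB'
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open('/proc/{0}/status'.format(pid)) as handle:
        return rss_kib(handle.read())


# Illustrative input; real content comes from /proc/<pid>/status
sample_status = 'Name:\tsalt-minion\nVmRSS:\t  123456 kB\n'
print(rss_kib(sample_status))
```

Calling `read_rss_kib` with the salt-minion PID once a minute and logging the result gives a per-process curve; steady growth across scheduled runs would point at the minion rather than at the other services on the box.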

@jfindlay
Contributor

jfindlay commented Apr 5, 2016

@arthurlogilab, I'm not very experienced with issues like this. You may want to ask @cachedout.

@cachedout
Contributor

Likely fixed by #32373

@cachedout cachedout modified the milestones: Approved, Blocked Apr 13, 2016
@cachedout cachedout added the Bug, severity-medium, fixed-pls-verify, and Core labels and removed the info-needed label Apr 13, 2016
@cachedout
Contributor

Since I'm reasonably certain that this is fixed, I am going to close this. If it turns out not to be the case, please leave a comment and we'll re-open it. Thanks.

@arthurzenika
Contributor Author

Thanks @cachedout

@tserong
Contributor

tserong commented Jun 8, 2016

I seem to have hit this memory leak too (with a schedule running a custom execution module every 10 seconds) on salt 2015.8.7. I tried applying only the patch for #32373 to my installed 2015.8.7 minion, but it doesn't seem to have fixed the problem, so there must be something else going on...

@isbm
Contributor

isbm commented Jun 8, 2016

@tserong Does it also happen on 2015.8.10?

@tserong
Contributor

tserong commented Jun 8, 2016

> @tserong Does it also happen on 2015.8.10?

2015.8.10 works fine (no leak AFAICT)

@isbm
Contributor

isbm commented Jun 8, 2016

Then the only question is which PR already fixed that.

@tserong
Contributor

tserong commented Jun 9, 2016

Yeah. I'll spend some time with git bisect (although this is probably going to be rather tedious...)

@tserong
Contributor

tserong commented Jun 10, 2016

#32474 seems to fix it for me, when applied on top of 2015.8.7. That means the earliest tagged version that's got the fix is v2015.8.9.
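To check whether a given minion is at or past that release, the version string from `salt-call --versions-report` can be compared against 2015.8.9. This is a minimal sketch; `version_tuple` and `has_fix` are hypothetical helper names, and it only handles plain dotted version strings.

```python
import re

# Earliest tagged release reported in this thread to contain the fix
FIXED_IN = (2015, 8, 9)


def version_tuple(version):
    """Turn a version string such as '2015.8.8.2' or 'v2015.8.9' into ints."""
    return tuple(int(part) for part in re.findall(r'\d+', version))


def has_fix(version):
    """True if the first three version components are >= 2015.8.9."""
    return version_tuple(version)[:3] >= FIXED_IN


for candidate in ('2015.8.7', '2015.8.8.2', 'v2015.8.9', '2015.8.10'):
    print(candidate, has_fix(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, '2015.8.10' would sort before '2015.8.9'.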
