Salt 2016.3.0 (Boron) clean_old_jobs fails #33544

Closed
tjuup opened this issue May 26, 2016 · 8 comments
Labels
Bug (broken, incorrect, or confusing behavior); Core (relates to code central or existential to Salt); fixed-pls-verify (fix is linked, bug author to confirm fix); P1 (Priority 1); severity-critical (top severity, seen by most users, serious issues); severity-high (2nd top severity, seen by most users, causes major problems); ZRELEASED - Boron

Comments

@tjuup

tjuup commented May 26, 2016

Description of Issue/Question

Master log fills with these errors:

2016-05-26 12:41:44,647 [salt.utils.process][ERROR   ][2706] An un-handled exception from the multiprocessing process 'Maintenance-11' was caught:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/utils/process.py", line 613, in _run
    return self._original_run()
  File "/usr/lib/python2.7/dist-packages/salt/master.py", line 236, in run
    salt.daemons.masterapi.clean_old_jobs(self.opts)
  File "/usr/lib/python2.7/dist-packages/salt/daemons/masterapi.py", line 187, in clean_old_jobs
    mminion.returners[fstr]()
  File "/usr/lib/python2.7/dist-packages/salt/returners/local_cache.py", line 413, in clean_old_jobs
    shutil.rmtree(t_path)
  File "/usr/lib/python2.7/shutil.py", line 239, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '/var/cache/salt/master/jobs/a9'
2016-05-26 12:41:44,652 [salt.utils.process][INFO    ][31591] Process <class 'salt.master.Maintenance'> (2706) died with exit status None, restarting...

Setup

master config:

# Set the number of hours to keep old job information in the job cache:
keep_jobs: 6
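
For readers unfamiliar with the setting, here is a minimal sketch of how an hour-based retention value such as keep_jobs could be applied to a job. It is not Salt's actual code: the helper name is_expired is hypothetical, and the sketch assumes a job ID is a timestamp string in the layout %Y%m%d%H%M%S%f.

```python
from datetime import datetime, timedelta

# Assumption: a job ID encodes its creation time in this layout.
JID_FORMAT = "%Y%m%d%H%M%S%f"


def is_expired(jid, keep_jobs_hours, now):
    """Return True when the job named by jid is older than keep_jobs_hours.

    Hypothetical helper for illustration only, not part of Salt.
    """
    created = datetime.strptime(jid, JID_FORMAT)
    return now - created > timedelta(hours=keep_jobs_hours)
```

With keep_jobs: 6, a job created at 12:41 would count as expired when checked at 19:00 the same day, but not when checked at 18:00.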

Steps to Reproduce Issue

Update to 2016.3.0. Since then, salt.master.Maintenance tries to clean up files/dirs that do not exist in /var/cache/salt/master/jobs/.

Versions Report

root@salt:~# salt --versions-report
Salt Version:
Salt: 2016.3.0

Dependency Versions:
cffi: Not Installed
cherrypy: 3.2.2
dateutil: 1.5
gitdb: 0.5.4
gitpython: 0.3.2 RC1
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: 0.9.1
msgpack-pure: Not Installed
msgpack-python: 0.3.0
mysql-python: 1.2.3
pycparser: Not Installed
pycrypto: 2.6.1
pygit2: Not Installed
Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 14.4.0
RAET: Not Installed
smmap: 0.8.2
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.0.4

System Versions:
dist: Ubuntu 14.04 trusty
machine: x86_64
release: 3.13.0-83-generic
system: Linux
version: Ubuntu 14.04 trusty

@shawnbutts

I'm seeing the same thing.

@cachedout
Contributor

That's REALLY odd. You don't somehow have more than one Maintenance process running, do you?

@tjuup
Author

tjuup commented May 26, 2016

Good guess. I stopped the salt-master service and checked that no other salt-master process was lingering.
Then I did an ls -l of /var/cache/salt/master/jobs and saw:

<snip>
drwxr-xr-x 10 root root 4096 May 26 17:37 33
drwxr-xr-x  3 root root 4096 May 26 15:37 34
drwxr-xr-x 15 root root 4096 May 26 17:27 35
drwxr-xr-x  6 root root 4096 May 26 15:37 36

After that, the first OSError appeared in the log:

2016-05-26 18:28:28,862 [salt.utils.process][ERROR   ][24373] An un-handled exception from the multiprocessing process 'Maintenance-4' was caught:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/utils/process.py", line 613, in _run
    return self._original_run()
  File "/usr/lib/python2.7/dist-packages/salt/master.py", line 236, in run
    salt.daemons.masterapi.clean_old_jobs(self.opts)
  File "/usr/lib/python2.7/dist-packages/salt/daemons/masterapi.py", line 187, in clean_old_jobs
    mminion.returners[fstr]()
  File "/usr/lib/python2.7/dist-packages/salt/returners/local_cache.py", line 413, in clean_old_jobs
    shutil.rmtree(t_path)
  File "/usr/lib/python2.7/shutil.py", line 239, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/usr/lib/python2.7/shutil.py", line 237, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: '/var/cache/salt/master/jobs/35'
2016-05-26 18:28:28,867 [salt.utils.process][INFO    ][24360] Process <class 'salt.master.Maintenance'> (24373) died with exit status None, restarting...

Looks like 2 processes trying to do the same cleanup?

cachedout pushed a commit to cachedout/salt that referenced this issue May 26, 2016
The first time through the loop we deleted the dir and then stack
traced the second time through the loop if we hit the other conditional.

Resolves saltstack#33544
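
The failure mode described in the commit message can be reproduced in isolation. The sketch below is an illustration only, not the code from the patch: the function names, the temporary directory layout, and the check for a "jid" file are assumptions. It removes a parent directory from inside a loop over that directory's entries, so the next pass through the loop calls shutil.rmtree on a path that is already gone.

```python
import os
import shutil
import tempfile


def make_cache():
    """Build a throwaway parent dir with two entries that lack a 'jid' file."""
    parent = tempfile.mkdtemp()
    for name in ("aa", "bb"):
        os.mkdir(os.path.join(parent, name))
    return parent


def clean_buggy(parent):
    """Remove parent for every bad entry; the second removal raises OSError."""
    for name in os.listdir(parent):
        if not os.path.isfile(os.path.join(parent, name, "jid")):
            shutil.rmtree(parent)  # on the second pass, parent no longer exists


def clean_guarded(parent):
    """Remove parent once, then stop iterating over its stale listing."""
    for name in os.listdir(parent):
        if not os.path.isfile(os.path.join(parent, name, "jid")):
            shutil.rmtree(parent)
            break  # nothing left to inspect after the parent is removed
```

Calling clean_buggy(make_cache()) raises an OSError with errno 2, the same kind of error shown in the tracebacks above, while clean_guarded(make_cache()) returns cleanly. Leaving the loop is just one possible guard; the linked patch may use a different one.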
@cachedout
Contributor

Please see #33555 for a patch that resolves this issue.

cachedout added the Bug, fixed-pls-verify, Core, severity-high, and P1 labels May 26, 2016
@tjuup
Author

tjuup commented May 26, 2016

After applying the patch and restarting the salt-master the error did not show up anymore.
Thanks for your quick fix!

@cachedout
Contributor

@tjuup You're welcome. Apologies for the breakage. :]

@rvora

rvora commented Jun 3, 2016

Thanks, that patch seems to have fixed this issue.

@hgfischer
Contributor

@cachedout It seems this patch is not in the latest deb release, because I'm still having this issue. When will it be released?

7 participants