Non-root Users Unable to Review Job Status #55275

Open
jpittiglio opened this issue Nov 12, 2019 · 7 comments
Labels
Bug (broken, incorrect, or confusing behavior), doc-rework (confusing, misleading, or wrong), severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a work around)
@jpittiglio

Description of Issue

Followed the instructions to set up non-root users with the ability to run jobs, as specified at https://docs.saltstack.com/en/latest/ref/publisheracl.html

Running jobs as the non-root user completes as expected:

[ec2-user@salt ~]$ salt 'salt' test.ping
salt:
    True

Similarly, running jobs using the --async flag works as expected:

[ec2-user@salt ~]$ salt 'salt' test.ping --async

Executed command with job ID: 20191112202301098417

However, attempting to view previous job results using salt-run jobs.lookup_jid <x> or the salt.client.LocalClient.get_cli_returns function fails. Example:

[ec2-user@salt ~]$ salt-run jobs.lookup_jid 20191112193054933477
Exception occurred in runner jobs.lookup_jid: Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/salt/client/mixins.py", line 381, in low
    data['return'] = func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/salt/runners/jobs.py", line 128, in lookup_jid
    display_progress=display_progress
  File "/usr/lib/python3.7/site-packages/salt/runners/jobs.py", line 200, in list_job
    ret['Result'] = mminion.returners['{0}.get_jid'.format(returner)](jid)
  File "/usr/lib/python3.7/site-packages/salt/returners/local_cache.py", line 357, in get_jid
    with salt.utils.files.fopen(retp, 'rb') as rfh:
  File "/usr/lib/python3.7/site-packages/salt/utils/files.py", line 399, in fopen
    f_handle = open(*args, **kwargs)  # pylint: disable=resource-leakage
PermissionError: [Errno 13] Permission denied: '/var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/salt/return.p'
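
For completeness, the salt.client.LocalClient.get_cli_returns path mentioned above fails for the non-root user as well. A minimal sketch of that invocation (the jid and minion id are placeholders):

import salt.client

client = salt.client.LocalClient()
jid = '20191112202301098417'  # placeholder: a jid from an earlier --async run
# get_cli_returns is a generator that yields return data per minion
for ret in client.get_cli_returns(jid, ['salt']):
    print(ret)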

Per the linked documentation, the job's top-level cache directory shows the expected permissions:

[ec2-user@salt ~]$ ll /var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/
total 4
-rw-r--r-- 1 root root 20 Nov 12 19:30 jid
drwxr-xr-x 2 root root 22 Nov 12 19:30 salt

However, the return.p file itself is readable and writable by root only:

[ec2-user@salt ~]$ ll /var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/salt/
total 4
-rw------- 1 root root 27 Nov 12 19:30 return.p

Setup

Standard RPM installation on an AWS EC2 instance running Amazon Linux 2. Configured to allow ec2-user to run all states on all nodes as follows in /etc/salt/master:

publisher_acl:
  ec2-user:
    - .*

Executed chmod 755 /var/cache/salt /var/cache/salt/master /var/cache/salt/master/jobs /var/run/salt /var/run/salt/master as indicated in the linked documentation.

Possible Solution

The issue seems to stem from the following call in salt/utils/atomicfile.py:

atomic_rename(self._tmp_filename, self._filename)

Ultimately, return.p is first created as a temporary file, which I assume is given owner-only read+write permissions (0600). Once the temporary file is written and the context manager is done with it, the close function is invoked and the temporary file is moved (os.rename on *nix) into the correct job cache location. Because the file is moved rather than copied, the restrictive permissions are retained.
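
The underlying mechanics are easy to demonstrate outside of Salt (this is plain standard-library behavior, not Salt code): tempfile.mkstemp creates files with mode 0600 regardless of umask, and os.rename keeps that mode at the destination.

import os
import stat
import tempfile

fd, tmp_path = tempfile.mkstemp()                 # mkstemp always creates the file with mode 0600
os.write(fd, b'payload')
os.close(fd)

dest = os.path.join(tempfile.gettempdir(), 'return.p.demo')
os.rename(tmp_path, dest)                         # the move preserves the 0600 mode

print(oct(stat.S_IMODE(os.stat(dest).st_mode)))   # prints 0o600 -- unreadable by other users
os.remove(dest)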

As a temporary workaround, I modified atomicfile.py in two places. Where the module defines

atomic_rename = os.rename # pylint: disable=C0103

I use shutil.copyfile instead of os.rename. Then, after the call

atomic_rename(self._tmp_filename, self._filename)

completes, I call os.remove(self._tmp_filename) to clean up the temporary file.

I have not fully tested this to identify long-term ramifications, but wanted to highlight a possible fix for others in a similar situation. While using an external job cache would likely be a better long-term solution, the linked documentation implies that reading job results as a non-root user should work with the default local job cache.
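
Put together, the change amounts to swapping the rename-based helper for a copy-then-remove variant. This is only a sketch with an illustrative helper name, and unlike os.rename the copy step is not atomic:

import os
import shutil


def _copy_then_remove(src, dst):
    # shutil.copyfile creates dst subject to the process umask (typically 0644)
    # instead of carrying over the temporary file's 0600 mode
    shutil.copyfile(src, dst)
    # os.rename removed the source implicitly; a copy must clean it up explicitly
    os.remove(src)


atomic_rename = _copy_then_remove  # pylint: disable=C0103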

Additionally, there appear to be other non-critical issues. For example, in some scenarios the following occurs when querying a job ID even after the fix identified above:

[ec2-user@salt ~]$ salt-run jobs.lookup_jid 20191112202005349718
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[ERROR   ] prep_jid could not store a jid after 5 tries.
[ERROR   ] Could not store job cache info. Job details for this run may be unavailable.
salt:
    True

Note the information is still returned, but it appears a new job record is being created and cannot be written to the job cache. This is likely unrelated and probably needs a separate issue, but I wanted to document it here.

Versions Report

Salt Version:
           Salt: 2019.2.2
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.3
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.7.4 (default, Oct  2 2019, 19:30:55)
   python-gnupg: Not Installed
         PyYAML: 4.2
          PyZMQ: 18.1.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist:   
         locale: UTF-8
        machine: x86_64
        release: 4.14.138-114.102.amzn2.x86_64
         system: Linux
        version: Not Installed
@eliasp
Contributor

eliasp commented Dec 2, 2019

This is a problem we're seeing in several environments here as well, and IMHO it points to a general architectural issue in the SaltStack CLI tooling.
The CLI tools are a hybrid of local processing and remote job execution, so issues like this show up over and over again in various places (e.g. handling of keys through salt-key).
IMHO, the CLI tooling should keep moving towards doing no local execution at all and instead merely interact with SaltStack through the master's interface to handle jobs, which are then executed entirely by the master; the CLI should only be a thin wrapper around all this.

@Ch3LL
Contributor

Ch3LL commented Dec 18, 2019

I'm able to replicate this when the salt-master/salt-minion processes are started up as root. When I start them up as the same user it does work, but we want it to work while the salt processes run as root. Will need to get this fixed up.

@Ch3LL Ch3LL added the Bug, severity-medium, P4 (Priority 4), and team-core labels and removed the needs-triage label Dec 18, 2019
@Ch3LL Ch3LL added this to the Approved milestone Dec 18, 2019
@stale

stale bot commented Jan 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 19, 2020
@sagetherage
Contributor

not stale

@stale

stale bot commented Jan 22, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jan 22, 2020
@sagetherage sagetherage removed the P4 (Priority 4) label Jun 3, 2020
@petiepooo

Still not stale.

@sagetherage
Contributor

@petiepooo no more stalebot -- this is open and will remain so

@sagetherage sagetherage added the doc-rework label Nov 3, 2020