Non-root Users Unable to Review Job Status #55275

Open
jpittiglio opened this issue Nov 12, 2019 · 7 comments
Labels
Bug (broken, incorrect, or confusing behavior), doc-rework (confusing, misleading, or wrong), severity-medium (3rd level, incorrect or bad functionality, confusing and lacks a work around)
@jpittiglio

Description of Issue

Followed the instructions to set up non-root users with the ability to run jobs, as specified at https://docs.saltstack.com/en/latest/ref/publisheracl.html

Running jobs as the non-root user completes as expected:

[ec2-user@salt ~]$ salt 'salt' test.ping
salt:
    True

Similarly, running jobs using the --async flag works as expected:

[ec2-user@salt ~]$ salt 'salt' test.ping --async

Executed command with job ID: 20191112202301098417

However, attempting to view previous job results using salt-run jobs.lookup_jid <x> or the salt.client.LocalClient.get_cli_returns function fails. Example:

[ec2-user@salt ~]$ salt-run jobs.lookup_jid 20191112193054933477
Exception occurred in runner jobs.lookup_jid: Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/salt/client/mixins.py", line 381, in low
    data['return'] = func(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/salt/runners/jobs.py", line 128, in lookup_jid
    display_progress=display_progress
  File "/usr/lib/python3.7/site-packages/salt/runners/jobs.py", line 200, in list_job
    ret['Result'] = mminion.returners['{0}.get_jid'.format(returner)](jid)
  File "/usr/lib/python3.7/site-packages/salt/returners/local_cache.py", line 357, in get_jid
    with salt.utils.files.fopen(retp, 'rb') as rfh:
  File "/usr/lib/python3.7/site-packages/salt/utils/files.py", line 399, in fopen
    f_handle = open(*args, **kwargs)  # pylint: disable=resource-leakage
PermissionError: [Errno 13] Permission denied: '/var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/salt/return.p'
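
For completeness, the salt.client.LocalClient.get_cli_returns path mentioned above fails for the non-root user as well. A minimal sketch of that invocation (the jid and minion id are placeholders):

import salt.client

client = salt.client.LocalClient()
jid = '20191112202301098417'  # placeholder: a jid from an earlier --async run
# get_cli_returns is a generator that yields return data per minion
for ret in client.get_cli_returns(jid, ['salt']):
    print(ret)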

Per the linked documentation, the job's top-level cache directory shows the expected permissions:

[ec2-user@salt ~]$ ll /var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/
total 4
-rw-r--r-- 1 root root 20 Nov 12 19:30 jid
drwxr-xr-x 2 root root 22 Nov 12 19:30 salt

However, the return.p file itself is readable and writable by root only:

[ec2-user@salt ~]$ ll /var/cache/salt/master/jobs/00/f18031815ef2f13a28096fabced02cd5ea815a672b5a50ac58bf8730d097dd/salt/
total 4
-rw------- 1 root root 27 Nov 12 19:30 return.p

Setup

Standard RPM installation on an AWS EC2 instance running Amazon Linux 2. Configured to allow ec2-user to run all states on all nodes as follows in /etc/salt/master:

publisher_acl:
  ec2-user:
    - .*

Executed chmod 755 /var/cache/salt /var/cache/salt/master /var/cache/salt/master/jobs /var/run/salt /var/run/salt/master as indicated in the linked documentation.

Possible Solution

The issue seems to stem from the following call in salt/utils/atomicfile.py:

atomic_rename(self._tmp_filename, self._filename)

Ultimately, return.p is first created as a temporary file, which I assume is given owner-only read+write permissions (0600). Once the temporary file is written and the context manager is done with it, the close function is invoked and the temporary file is moved (os.rename on *nix) into the correct job cache location. Because the file is moved rather than copied, the restrictive permissions are retained.
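
The underlying mechanics are easy to demonstrate outside of Salt (this is plain standard-library behavior, not Salt code): tempfile.mkstemp creates files with mode 0600 regardless of umask, and os.rename keeps that mode at the destination.

import os
import stat
import tempfile

fd, tmp_path = tempfile.mkstemp()                 # mkstemp always creates the file with mode 0600
os.write(fd, b'payload')
os.close(fd)

dest = os.path.join(tempfile.gettempdir(), 'return.p.demo')
os.rename(tmp_path, dest)                         # the move preserves the 0600 mode

print(oct(stat.S_IMODE(os.stat(dest).st_mode)))   # prints 0o600 -- unreadable by other users
os.remove(dest)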

As a temporary workaround, I modified atomicfile.py in two places. Where the module defines

atomic_rename = os.rename # pylint: disable=C0103

I use shutil.copyfile instead of os.rename. Then, after the call

atomic_rename(self._tmp_filename, self._filename)

completes, I call os.remove(self._tmp_filename) to clean up the temporary file.

I have not fully tested this to identify long-term ramifications, but wanted to highlight a possible fix for others in a similar situation. While using an external job cache would likely be a better long-term solution, the linked documentation implies that reading job results as a non-root user should work with the default local job cache.
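
Put together, the change amounts to swapping the rename-based helper for a copy-then-remove variant. This is only a sketch with an illustrative helper name, and unlike os.rename the copy step is not atomic:

import os
import shutil


def _copy_then_remove(src, dst):
    # shutil.copyfile creates dst subject to the process umask (typically 0644)
    # instead of carrying over the temporary file's 0600 mode
    shutil.copyfile(src, dst)
    # os.rename removed the source implicitly; a copy must clean it up explicitly
    os.remove(src)


atomic_rename = _copy_then_remove  # pylint: disable=C0103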

Additionally, there appear to be other non-critical issues. For example, in some scenarios the following occurs when querying a job ID even after the fix identified above:

[ec2-user@salt ~]$ salt-run jobs.lookup_jid 20191112202005349718
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[WARNING ] Could not write out jid file for job 20191112202020433611. Retrying.
[ERROR   ] prep_jid could not store a jid after 5 tries.
[ERROR   ] Could not store job cache info. Job details for this run may be unavailable.
salt:
    True

Note the information is still returned, but it appears a new job record is being created and cannot be written to the job cache. This is likely unrelated and probably needs a separate issue, but I wanted to document it here.

Versions Report

Salt Version:
           Salt: 2019.2.2
 
Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.8.0
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.3
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.7.4 (default, Oct  2 2019, 19:30:55)
   python-gnupg: Not Installed
         PyYAML: 4.2
          PyZMQ: 18.1.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist:   
         locale: UTF-8
        machine: x86_64
        release: 4.14.138-114.102.amzn2.x86_64
         system: Linux
        version: Not Installed
@eliasp
Contributor

eliasp commented Dec 2, 2019

This is a problem we're seeing in several environments here as well, and IMHO it points to a general architectural issue in the SaltStack CLI tooling.
The CLI tools are a hybrid of local processing and remote job execution, so issues like this show up over and over again in various places (e.g. handling of keys through salt-key).
IMHO, the CLI tooling should keep moving towards doing no local execution at all and instead merely interact with SaltStack through the master's interface to handle jobs, which are then executed entirely by the master; the CLI should only be a thin wrapper around all this.

@Ch3LL
Contributor

Ch3LL commented Dec 18, 2019

I'm able to replicate this when the salt-master/salt-minion processes are started up as root. When I start them up as the same user it does work, but we want it to work while the salt processes run as root. Will need to get this fixed up.

@Ch3LL Ch3LL added the Bug, severity-medium, P4 (Priority 4), and team-core labels and removed the needs-triage label Dec 18, 2019
@Ch3LL Ch3LL added this to the Approved milestone Dec 18, 2019
@stale

stale bot commented Jan 19, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 19, 2020
@sagetherage
Contributor

not stale

@stale

stale bot commented Jan 22, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jan 22, 2020
@sagetherage sagetherage removed the P4 (Priority 4) label Jun 3, 2020
@petiepooo

Still not stale.

@sagetherage
Contributor

@petiepooo no more stalebot -- this is open and will remain so

@sagetherage sagetherage added the doc-rework label Nov 3, 2020