[CentOS] ppid_map() errors on hardened systems #2167

Open
sehoffmann opened this issue Nov 7, 2022 · 3 comments

Comments

@sehoffmann

Summary

  • OS: CentOS Linux release 7.9.2009 (Core)
  • Architecture: 64bit
  • Psutil version: 5.9.0
  • Python version: Python 3.10.6
  • Type: core

Description

I work on an HPC cluster where access to individual /proc entries is highly restricted. A software package I'm using (horovod) relies on psutil to find the child processes spawned by a supervisor process. Process.children() in turn uses ppid_map() to find those child processes.

Here lies the crux: ppid_map() iterates over ALL processes, including those of other users and of root. This causes a PermissionError when it tries to access information about those processes (one might say rightfully so). On a multi-user system, psutil cannot expect to have access to information about every process running on the system. Note that to solve the initial task we don't actually need that additional information; information about our own processes is good enough.

The end result is that horovod fails to terminate its children and simply ends up in a zombie state, together with its worker processes. I then have to kill those processes by hand!

Here is the full stack trace (several processes raised the same error):

Exception in thread Thread-1 (fn):
Traceback (most recent call last):
  File "[...]/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "[...]/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "[...]/lib/python3.10/site-packages/horovod/runner/util/threads.py", line 157, in fn
    func(*args)
  File "[...]/lib/python3.10/site-packages/horovod/runner/common/util/safe_shell_exec.py", line 41, in terminate_executor_shell_and_children
    for child in p.children():
  File "[...]/lib/python3.10/site-packages/psutil/__init__.py", line 278, in wrapper
    return fun(self, *args, **kwargs)
  File "[...]/lib/python3.10/site-packages/psutil/__init__.py", line 906, in children
    ppid_map = _ppid_map()
  File "[...]/lib/python3.10/site-packages/psutil/_pslinux.py", line 1640, in ppid_map
    with open_binary("%s/%s/stat" % (procfs_path, pid)) as f:
  File "[...]/lib/python3.10/site-packages/psutil/_common.py", line 711, in open_binary
    return open(fname, "rb", **kwargs)
PermissionError: [Errno 1] Operation not permitted: '/proc/1/stat'

Proposed Solution

Ignore any processes that cannot be accessed. Optionally, print a warning once to indicate that this behavior might cause issues down the line, and introduce an option to disable that warning. In my opinion this is a much better solution than failing outright, which, as described above, also causes a lot of trouble.

@sehoffmann sehoffmann added the bug label Nov 7, 2022
@github-actions github-actions bot added the linux label Nov 7, 2022
@sye98

sye98 commented Feb 16, 2023

+1. psutil is called in ipykernel/kernelbase.py and is causing excessive permission error messages for me.

@giampaolo
Owner

I agree that some APIs that return multiple values, such as Process.children(), open_files(), connections(), etc., could have a new ignore_ad=bool parameter to ignore AccessDenied internally. It should default to False, meaning AccessDenied is raised by default. This has already been proposed in another ticket.
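The `ignore_ad` parameter discussed above is not part of `Process.children()` in psutil 5.9.0. Until something like it exists, a caller such as horovod could approximate it with a user-side wrapper built only on psutil's public API (`process_iter()`, `Process.ppid()`, `AccessDenied`, `NoSuchProcess`). The name `children_ignore_ad` is hypothetical, and unlike psutil's own `children()` this sketch makes no attempt to guard against PID reuse.

```python
from collections import deque

import psutil


def children_ignore_ad(proc, recursive=False, ignore_ad=False):
    """Hypothetical children() with an ignore_ad flag.

    ignore_ad=False (the default) re-raises psutil.AccessDenied;
    ignore_ad=True skips processes whose parent PID cannot be read.
    """
    # Build {pid: ppid} from the processes we are allowed to inspect.
    ppid_map = {}
    for p in psutil.process_iter():
        try:
            ppid_map[p.pid] = p.ppid()
        except psutil.NoSuchProcess:
            continue  # the process exited while we were iterating
        except psutil.AccessDenied:
            if not ignore_ad:
                raise
    # Invert into {ppid: [child pids]} for fast lookup.
    reverse = {}
    for pid, ppid in ppid_map.items():
        reverse.setdefault(ppid, []).append(pid)
    # Breadth-first walk; only one level unless recursive=True.
    result, seen, queue = [], {proc.pid}, deque([proc.pid])
    while queue:
        parent = queue.popleft()
        for child_pid in reverse.get(parent, []):
            if child_pid in seen:
                continue
            seen.add(child_pid)
            try:
                result.append(psutil.Process(child_pid))
            except psutil.NoSuchProcess:
                continue
            if recursive:
                queue.append(child_pid)
    return result
```

For example, `children_ignore_ad(psutil.Process(), recursive=True, ignore_ad=True)` would list the current process's descendants even when some other users' processes are unreadable.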

@sehoffmann
Author

It should default to False, meaning raise AD by default.

The missing read permission is caused by hidepid=1, a standard Linux mount option for /proc. The problem is that almost any package that uses psutil won't work with hidepid=1, because, let's be realistic, how many packages are going to change that default?
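To check whether a machine is affected, one can inspect the mount options of /proc. This is a minimal sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options, ...); the helper name `proc_hidepid` is hypothetical. Per proc(5), hidepid=1 denies access to other users' /proc/<pid> entries and hidepid=2 additionally hides them; newer kernels also accept symbolic values such as noaccess and invisible, which is why the helper returns the raw string.

```python
def proc_hidepid(mounts_text=None):
    """Return the hidepid= value of the /proc mount, or None if it is unset."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[1] != "/proc" or fields[2] != "proc":
            continue
        # A later mount on /proc shadows earlier ones, so keep the last match.
        result = None
        for opt in fields[3].split(","):
            if opt.startswith("hidepid="):
                result = opt.split("=", 1)[1]
    return result
```

On a system like the one in this report, `proc_hidepid()` would be expected to return "1".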
