IOError: [Errno 14 (EFAULT)] Bad address raised from psutil.Process() #905

Open
ssanderson opened this issue Oct 4, 2016 · 5 comments

Comments

@ssanderson

I'm running an application that periodically tracks the memory usage of its child processes using psutil.Process. Occasionally, instantiating Process fails with an EFAULT raised while reading /proc/pid/stat.

Running with psutil version 4.3.0, I get the following traceback on invocation of psutil.Process():

/usr/local/lib/python2.7/dist-packages/psutil/__init__.py, line 364, in __init__
/usr/local/lib/python2.7/dist-packages/psutil/__init__.py, line 390, in _init
/usr/local/lib/python2.7/dist-packages/psutil/__init__.py, line 650, in create_time
/usr/local/lib/python2.7/dist-packages/psutil/_pslinux.py, line 962, in wrapper
/usr/local/lib/python2.7/dist-packages/psutil/_pslinux.py, line 1108, in create_time
/usr/local/lib/python2.7/dist-packages/psutil/_pslinux.py, line 994, in _parse_stat_file
IOError: [Errno 14] Bad address 

The line number at the bottom of the trace corresponds to the read call in this block:

    def _parse_stat_file(self):
        """Parse /proc/{pid}/stat file. Return a list of fields where
        process name is in position 0.
        Using "man proc" as a reference: where "man proc" refers to
        position N, always subtract 2 (e.g. starttime pos 22 in
        'man proc' == pos 20 in the list returned here).
        """
        with open_binary("%s/%s/stat" % (self._procfs_path, self.pid)) as f:
            data = f.read()

I'm not sure if this is a kernel bug, if this is an error that psutil should be catching and handling, or if it's an error that I should be catching and handling in my application.

If it's helpful, I'm running Ubuntu 16.04, with kernel version 4.4.0-36:

ssanderson@research-i-4c649b5a:~$ uname -r
4.4.0-36-generic

ssanderson@research-i-4c649b5a:~$ dpkg -l | grep `uname -r`
ii  linux-headers-4.4.0-36-generic     4.4.0-36.55                      amd64        Linux kernel headers for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-36-generic       4.4.0-36.55                      amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-extra-4.4.0-36-generic 4.4.0-36.55                      amd64        Linux kernel extra modules for version 4.4.0 on 64 bit x86 SMP

Thanks for maintaining psutil. It's super helpful for us!

@giampaolo
Owner

Mmmm that's bad. No, that's not an error you're supposed to catch. I have no idea why it happens and I couldn't find anything on Google, so as far as I know it may well be a kernel bug. At first I suspected a race condition, meaning the PID/process is gone in between the open() and read() calls. So I made this quick test script:

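from psutil.tests import get_test_subprocess

# Start a throwaway child process, open its stat file, then kill the
# child before reading to simulate the process disappearing mid-read.
subp = get_test_subprocess()
with open("/proc/%s/stat" % subp.pid, "r") as f:
    subp.kill()
    subp.wait()
    print(f.read())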

...but it raises ESRCH (no such process), not EFAULT (buffer outside of accessible address space). How often does this happen? Could you do me a favor and try to investigate this yourself?
Assuming this is not a kernel bug, what I suspect is that EFAULT may also be raised when the process is gone (between the open() and read() calls). What I would ask you to do is put a pdb breakpoint in there and check whether the process is gone. You can change the code like this:

    def _parse_stat_file(self):
        with open_binary("%s/%s/stat" % (self._procfs_path, self.pid)) as f:
            try:
                data = f.read()
            except EnvironmentError as err:
                if err.errno == errno.EFAULT:
                    # Drop into the debugger to inspect the process state.
                    import pdb; pdb.set_trace()
                else:
                    raise

Once you're in the pdb you can do pid_exists(self.pid) and check whether the process is gone or not. Could you do that?
If the process is gone I suppose we can treat EFAULT the same way we treat ESRCH and raise NoSuchProcess.
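
A minimal sketch of that idea, using only the standard library. This is not psutil's implementation: `NoSuchProcessError`, `read_stat`, `procfs_path`, and `opener` are hypothetical names introduced for illustration, and a check for the `/proc/PID` directory stands in for `pid_exists`. EFAULT is converted only when the process directory has actually disappeared; otherwise it is re-raised unchanged.

```python
import errno
import os


class NoSuchProcessError(Exception):
    """Hypothetical stand-in for psutil.NoSuchProcess."""


def read_stat(pid, procfs_path="/proc", opener=open):
    """Return the raw bytes of {procfs_path}/{pid}/stat.

    ESRCH always means the process is gone. EFAULT is treated the same
    way only if the process directory no longer exists.
    """
    proc_dir = os.path.join(procfs_path, str(pid))
    try:
        with opener(os.path.join(proc_dir, "stat"), "rb") as f:
            return f.read()
    except EnvironmentError as err:
        if err.errno == errno.ESRCH:
            raise NoSuchProcessError(pid)
        if err.errno == errno.EFAULT and not os.path.exists(proc_dir):
            raise NoSuchProcessError(pid)
        # Anything else, including EFAULT for a live process, propagates.
        raise
```

The `opener` parameter exists only so the error path can be exercised in a test without a real kernel fault.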

@ssanderson
Author

> Assuming this is not a kernel bug, what I suspect is that EFAULT may also be raised in case the process is gone (between open() and read() calls)

I don't think this can be the issue, since in the case where I'm seeing this, the process whose stat file I'm failing to read is the process in which the exception is being raised.

I forgot to mention this above, but another confounding variable here is that I'm doing this in a Docker container, which means my PID namespace is being virtualized. My current best guess is that this is somehow the culprit, but I haven't been able to figure out a clear pattern yet.

@giampaolo
Owner

giampaolo commented Oct 5, 2016

Mmm...

  1. What if you break into pdb there, then try to reopen the file and read it again? Is it a permanent condition (keeps returning EFAULT), or does it resolve on a second try or after a while?

  2. What about other /proc files, e.g. /proc/PID/status? Do they have the same problem?

  3. What does "ps" show about that process?

If it's a temporary condition, we could retry the read on EFAULT.
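
A retry along those lines could look roughly like the sketch below. It assumes the fault is transient, which the thread has not established. `read_with_retry`, `retries`, `delay`, and `opener` are hypothetical names, not part of psutil. The file is reopened on every attempt, and only EFAULT triggers a retry.

```python
import errno
import time


def read_with_retry(path, retries=3, delay=0.01, opener=open):
    """Read `path` in binary mode, reopening and retrying on EFAULT.

    Any other error, or an EFAULT on the final attempt, is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            with opener(path, "rb") as f:
                return f.read()
        except EnvironmentError as err:
            if err.errno != errno.EFAULT or attempt == retries:
                raise
            # Brief pause before reopening the file and trying again.
            time.sleep(delay)
```

For example, `read_with_retry("/proc/self/stat")` makes up to four attempts in total with the default settings.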

@giampaolo
Owner

Any news about this? It would be great if this could be investigated.

@ssanderson
Author

@giampaolo sorry, I lost track of this thread. I haven't seen the issue here in a bit. I'll try to investigate if/when I see it next. I might be able to at least get better logging around whether retrying is helpful. The pdb solution isn't really viable since I'm only seeing the error here intermittently on a large-ish production deployment where a bunch of containers are all checking their memory usage every 10 seconds or so.
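
Where an interactive debugger is not an option, the same questions (is the process directory still there, does a second read succeed, is /proc/PID/status readable) could be answered with logging instead. The sketch below is one way to do that under stated assumptions: `probe_stat`, `_try_read`, the `efault_probe` logger name, and the `procfs_path` and `opener` parameters are all hypothetical and not part of psutil. The original error is always re-raised, so behaviour for callers is unchanged.

```python
import errno
import logging
import os

log = logging.getLogger("efault_probe")  # hypothetical logger name


def _try_read(path, opener=open):
    """Return (ok, detail): byte count on success, errno name on failure."""
    try:
        with opener(path, "rb") as f:
            return True, "%d bytes" % len(f.read())
    except EnvironmentError as err:
        return False, errno.errorcode.get(err.errno, str(err.errno))


def probe_stat(pid, procfs_path="/proc", opener=open):
    """Read {procfs_path}/{pid}/stat; on EFAULT, log diagnostics and re-raise."""
    proc_dir = os.path.join(procfs_path, str(pid))
    stat_path = os.path.join(proc_dir, "stat")
    try:
        with opener(stat_path, "rb") as f:
            return f.read()
    except EnvironmentError as err:
        if err.errno == errno.EFAULT:
            log.warning(
                "EFAULT reading %s: proc_dir_exists=%s retry=%s status=%s",
                stat_path,
                os.path.exists(proc_dir),
                _try_read(stat_path, opener),
                _try_read(os.path.join(proc_dir, "status"), opener),
            )
        raise
```

Each warning records one occurrence, so the log lines from a large deployment can be aggregated to see whether a second read usually succeeds.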

@giampaolo giampaolo changed the title IOError: [Errno 14] Bad address raised from psutil.Process() IOError: [Errno 14 (EFAULT)] Bad address raised from psutil.Process() Feb 27, 2019

2 participants