Extend Process.open_files with other FDs #1119

Closed
hzpc-joostk opened this issue Aug 30, 2017 · 6 comments

Comments

@hzpc-joostk

hzpc-joostk commented Aug 30, 2017

This is a proposal to query all file descriptors separately.

For now, this only works for Linux.

code

# module: psutil._pslinux

# ...

class Process(object):
    """Linux process implementation."""

    # ...

    def get_fdinfo(self, fd):
        # Get file position and flags.
        file = "%s/%s/fdinfo/%s" % (
            self._procfs_path, self.pid, fd)
        with open_binary(file) as f:
            pos = int(f.readline().split()[1])
            flags = int(f.readline().split()[1], 8)
        mode = file_flags_to_mode(flags)
        return pos, flags, mode

    @wrap_exceptions
    def open_fds(self, kind=None):
        assert kind in {'file', 'special', 'pipe', None}, kind

        retlist = []
        files = os.listdir("%s/%s/fd" % (self._procfs_path, self.pid))
        hit_enoent = False

        for fd in files:
            file = "%s/%s/fd/%s" % (self._procfs_path, self.pid, fd)
            try:
                path = readlink(file)
            except OSError as err:
                # ENOENT == file which is gone in the meantime
                if err.errno in (errno.ENOENT, errno.ESRCH):
                    hit_enoent = True
                    continue
                elif err.errno == errno.EINVAL:
                    # not a link
                    continue
                else:
                    raise
            else:
                # If path is not absolute there's no way to tell
                # whether it's a regular file or not, so we skip it.
                # A regular file is always supposed to have an
                # absolute path though.
                if path.startswith('/'):
                    if isfile_strict(path):
                        if (kind is not None and kind != 'file'):
                            continue
                    else:
                        if (kind is not None and kind != 'special'):
                            continue
                elif path.startswith('pipe'):
                    if (kind is not None and kind != 'pipe'):
                        continue
                    path = int(path[6:-1])
                
                else:
                    continue

                pos, flags, mode = self.get_fdinfo(fd)
                ntuple = popenfile(path, int(fd), pos, mode, flags)
                retlist.append(ntuple)

        if hit_enoent:
            # raise NSP if the process disappeared on us
            os.stat('%s/%s' % (self._procfs_path, self.pid))

        return retlist

    def open_files(self):
        return self.open_fds(kind='file')

    def open_pipes(self):
        return self.open_fds(kind='pipe')

    def open_specials(self):
        return self.open_fds(kind='special')
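
The fdinfo parsing step can be tried in isolation, without psutil. The sketch below is only an illustration: `parse_fdinfo` and `flags_to_mode` are hypothetical names, `flags_to_mode` is a stand-in for `file_flags_to_mode` (whose exact behaviour is not shown in this thread), and, unlike `get_fdinfo` above, the fields are looked up by name rather than by line order.

```python
import os


def parse_fdinfo(text):
    """Return (pos, flags) from the contents of a /proc/<pid>/fdinfo/<fd> file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    pos = int(fields["pos"])
    flags = int(fields["flags"], 8)  # the kernel prints the flags in octal
    return pos, flags


def flags_to_mode(flags):
    """Translate open(2) flags into a Python-style mode string (assumed mapping)."""
    access = flags & (os.O_RDONLY | os.O_WRONLY | os.O_RDWR)
    append = bool(flags & os.O_APPEND)
    if access == os.O_RDWR:
        return "a+" if append else "r+"
    if access == os.O_WRONLY:
        return "a" if append else "w"
    return "r"


# Sample text shaped like an fdinfo file; the values mirror the /tmp/foo example.
sample = "pos:\t292\nflags:\t0100000\nmnt_id:\t27\n"
pos, flags = parse_fdinfo(sample)
print(pos, flags, flags_to_mode(flags))
```

Looking fields up by name is a design choice of this sketch, not something the proposal does; it merely avoids depending on the order of the lines in the file.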

benchmark

The following cat | python pipeline sleeps in the background.

$ seq 100 > /tmp/foo
$ cat | python -c 'import time; nul = open("/dev/null"); foo = open("/tmp/foo"); foo.readline(20); time.sleep(3600); nul.close(); foo.close()' &
$ pgrep cat; pgrep python
43620
43621

Benchmarking in IPython. With the original _pslinux.Process.open_files():

In [1]: import psutil

In [2]: cat = psutil.Process(43620)
   ...: python = psutil.Process(43621)

In [3]: cat.open_files()
Out[3]: []

In [4]: python.open_files()
Out[4]: [popenfile(path='/tmp/foo', fd=4, position=292, mode='r', flags=32768)]

In [5]: %timeit cat.open_files()
   ...: %timeit python.open_files()
40.8 µs ± 417 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
104 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

After the implementation of open_fds and related methods above:

In [6]: cat.open_files()     # or cat._proc.open_files()
Out[6]: []

In [7]: python.open_files()  # or python._proc.open_files()
Out[7]: [popenfile(path='/tmp/foo', fd=4, position=292, mode='r', flags=32768)]

In [8]: cat._proc.open_fds()
Out[8]:
[popenfile(path='/dev/pts/5', fd=0, position=0, mode='r+', flags=32770),  # FD0 (STDIN), special character (tty) on pts/5
 popenfile(path=121124880, fd=1, position=0, mode='w', flags=1),          # FD1 (STDOUT), pipe #121124880
 popenfile(path='/dev/pts/5', fd=2, position=0, mode='r+', flags=32770)]  # FD2 (STDERR), special character (tty) on pts/5

In [9]: python._proc.open_fds()
Out[9]:
[popenfile(path=121124880, fd=0, position=0, mode='r', flags=0),          # FD0 (STDIN),  pipe #121124880
 popenfile(path='/dev/pts/5', fd=1, position=0, mode='r+', flags=32770),  # FD1 (STDOUT), special character (tty) on pts/5
 popenfile(path='/dev/pts/5', fd=2, position=0, mode='r+', flags=32770),  # FD2 (STDERR), special character (tty) on pts/5
 popenfile(path='/dev/null', fd=3, position=0, mode='r', flags=32768),    # FD3 special character file '/dev/null'
 popenfile(path='/tmp/foo', fd=4, position=292, mode='r', flags=32768)]   # FD4 regular file '/tmp/foo'

The values returned by open_files() are the same as before.
Timings for the new methods:

%timeit cat._proc.open_fds()          # => 168 µs ± 1 µs per loop
%timeit cat._proc.open_files()        # =>  43 µs ± 1 µs per loop
%timeit cat._proc.open_pipes()        # =>  90 µs ± 1 µs per loop
%timeit cat._proc.open_specials()     # => 128 µs ± 1 µs per loop

%timeit python._proc.open_fds()       # => 273 µs ± 7 µs per loop
%timeit python._proc.open_files()     # => 109 µs ± 1 µs per loop
%timeit python._proc.open_pipes()     # => 113 µs ± 1 µs per loop
%timeit python._proc.open_specials()  # => 194 µs ± 1 µs per loop

_pslinux.Process.open_fds() accepts the following kind values for limiting the results (and processing):

  • file
  • pipe
  • special

Calling _pslinux.Process.open_fds() without specifying a kind is noticeably slower than calling it through the filtering helper methods, but it returns a complete overview of all file descriptors.
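
As a rough illustration of how the three kind values relate to the link targets found under /proc/<pid>/fd, here is a minimal standalone sketch. `classify_target` is a hypothetical helper, not part of psutil, and it differs from the proposed code in one labelled assumption: a path that can no longer be stat'ed is reported as None instead of 'special'.

```python
import os
import stat
import tempfile


def classify_target(target):
    """Map an fd link target to 'file', 'special', 'pipe' or None."""
    if target.startswith("/"):
        try:
            st_mode = os.stat(target).st_mode
        except OSError:
            # Assumption of this sketch: vanished or inaccessible paths are skipped.
            return None
        return "file" if stat.S_ISREG(st_mode) else "special"
    if target.startswith("pipe:["):
        return "pipe"
    return None  # sockets, anonymous inodes and anything else are ignored


with tempfile.NamedTemporaryFile() as tmp:
    print(classify_target(tmp.name))           # an existing regular file
print(classify_target(tempfile.gettempdir()))  # a directory is not a regular file
print(classify_target("pipe:[121124880]"))
print(classify_target("socket:[12345]"))
```

Note that under this scheme a directory ends up in the 'special' bucket, simply because it is an absolute path that is not a regular file.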

new uses

We can see - as intended - these processes share the same pipe:

In [10]: cat._proc.open_pipes()
Out[10]: [popenfile(path=121124880, fd=1, position=0, mode='w', flags=1)]

In [11]: python._proc.open_pipes()
Out[11]: [popenfile(path=121124880, fd=0, position=0, mode='r', flags=0)]

With this information, we can find out which processes are attached to a certain pipe. For example, when you want to know which process is attached to the other end of a pipe:

In [12]: pp = []
    ...: pipe_no = 121124880
    ...: for p in psutil.process_iter():
    ...:     try:
    ...:         pipes = p._proc.open_pipes()
    ...:     except psutil.Error:
    ...:         continue
    ...:     if any(pipe.path == pipe_no for pipe in pipes):
    ...:         pp.append(p)
    ...:
    ...: pp

Out[12]:
[<psutil.Process(pid=43620, name='cat') at 140217780332304>,
 <psutil.Process(pid=43621, name='python') at 140217780333480>]

Or more complex:

$ sleep 6000 | cat <(sleep 6000) - &
$ pgrep sleep
9133
9134
$ pgrep cat
9135
In [20]: cat = psutil.Process(9135); cat
Out[20]: <psutil.Process(pid=9135, name='cat') at 140217741637896>

In [21]: cat._proc.open_pipes()
Out[21]:
[popenfile(path=121260467, fd=0, position=0, mode='r', flags=0),
 popenfile(path=121260469, fd=3, position=0, mode='r', flags=32768),
 popenfile(path=121260469, fd=12, position=0, mode='r', flags=0)]

In [22]: %time
    ...: pp = []
    ...: pipes = cat._proc.open_pipes()
    ...: pipe_nos = set(pipe.path for pipe in pipes)
    ...:
    ...: for p in psutil.process_iter():
    ...:     try:
    ...:         pipes = p._proc.open_pipes()
    ...:     except psutil.Error:
    ...:         continue
    ...:     if any(pipe.path in pipe_nos for pipe in pipes):
    ...:         pp.append(p)
    ...:
    ...: pp

CPU times: user 42 ms, sys: 76 ms, total: 118 ms
Wall time: 118 ms

Out[22]:
[<psutil.Process(pid=9133, name='sleep') at 140217736428960>,
 <psutil.Process(pid=9134, name='sleep') at 140217780361200>,
 <psutil.Process(pid=9135, name='cat') at 140217780361536>]

In [23]: for p in pp:
    ...:     print(p.pid, p.name(), p._proc.open_pipes())

9133 sleep [popenfile(path=121260467, fd=1, position=0, mode='w', flags=1)]
9134 sleep [popenfile(path=121260469, fd=1, position=0, mode='w', flags=1)]
9135 cat [popenfile(path=121260467, fd=0, position=0, mode='r', flags=0), popenfile(path=121260469, fd=3, position=0, mode='r', flags=32768), popenfile(path=121260469, fd=12, position=0, mode='r', flags=0)]

We could reconstruct the pipeline using this information and a little knowledge of how cat opens its files. I am not sure why cat has pipe:121260469 open in two file descriptors, but whatever. :-)

thoughts

  • need to think of cross-platform implementation.
  • to make this user-friendly, the new methods should be exposed to psutil.Process.
  • think of better kind names; file should probably be regular or something similar.
  • optionally add kind as a value to named tuple popenfile.
  • optionally add helper function to retrieve the process at the other end of the pipe.
    • maybe a class that describes a pipe and its attached processes.
  • optionally implement limiting to a file descriptor number, which might be useful.
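
The idea of a class describing a pipe and its attached processes could look roughly like the sketch below. Everything here is hypothetical (`PipeEnd`, `Pipe`, `group_by_pipe` and `peers` are illustrative names, not psutil API); the records are copied from the sleep | cat example above rather than read from a live system.

```python
from collections import namedtuple

# Hypothetical record: one end of a pipe held open by a process.
PipeEnd = namedtuple("PipeEnd", "pid name fd mode")


class Pipe:
    """A pipe identified by its inode number, plus the fds attached to it."""

    def __init__(self, inode):
        self.inode = inode
        self.ends = []

    def readers(self):
        return [e for e in self.ends if e.mode.startswith("r") or "+" in e.mode]

    def writers(self):
        return [e for e in self.ends if e.mode[0] in "wa" or "+" in e.mode]


def group_by_pipe(records):
    """Group (pid, name, fd, mode, inode) tuples into a {inode: Pipe} dict."""
    pipes = {}
    for pid, name, fd, mode, inode in records:
        if inode not in pipes:
            pipes[inode] = Pipe(inode)
        pipes[inode].ends.append(PipeEnd(pid, name, fd, mode))
    return pipes


def peers(pipes, pid):
    """Return the pids that share at least one pipe with the given pid."""
    found = set()
    for pipe in pipes.values():
        pids = {e.pid for e in pipe.ends}
        if pid in pids:
            found |= pids - {pid}
    return found


# Data taken from the open_pipes() output shown earlier in this issue.
records = [
    (9133, "sleep", 1, "w", 121260467),
    (9134, "sleep", 1, "w", 121260469),
    (9135, "cat", 0, "r", 121260467),
    (9135, "cat", 3, "r", 121260469),
    (9135, "cat", 12, "r", 121260469),
]
pipes = group_by_pipe(records)
for inode, pipe in sorted(pipes.items()):
    print(inode, [e.pid for e in pipe.writers()], "->", [e.pid for e in pipe.readers()])
print(peers(pipes, 9135))
```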

What do you think, @giampaolo? Thanks for your consideration.
I will create a PR and unit tests if you (and others) think this is useful.

@giampaolo
Owner

giampaolo commented Aug 31, 2017

Hello there and thanks for putting this up. To be honest this doesn't strike me as something consistent or easily portable across platforms. For instance:

  • flags really isn't useful unless also some constants are exposed (which ones? where are they defined? what do they represent? where are they documented?)
  • path being an integer is kinda weird, as one would expect a string (a minor thing, just mentioning)
  • open_fds should at least return a type, otherwise it's kinda useless. To quote your example:
In [9]: python._proc.open_fds()
Out[9]:
[popenfile(path=121124880, fd=0, position=0, mode='r', flags=0),          # FD0 (STDIN),  pipe #121124880
 popenfile(path='/dev/pts/5', fd=1, position=0, mode='r+', flags=32770),  # FD1 (STDOUT), special character (tty) on pts/5

...you inferred that the first element is a pipe and the second is a "special" but it should be the lib which tells you that (also what's a "special" fd?). That poses the disturbing question: what are the available fd types and what do they represent? Are there just "special", "pipe" and "file" or also others? To my knowledge there are many others: fifos, sockets, char and block devices, open directories and who knows what else. Should they be returned? Also, each fd type represents a completely different thing. How should they be exposed to the end user?

A utility which managed to wrap this all up is lsof, but that's a cmdline utility which just prints stuff on screen. A lib is something different: you have to think carefully about an API which is consistent across all possible types and which is usable, which is far from easy if possible at all.

Note: this same proposal has been raised before: #285

@hzpc-joostk
Author

Hi. Thanks for your time and sharing your thoughts. Good point about #285. I've read it through.

Instead of the suggested hard coded types, the user could parse the file's os.stat(file).st_mode using the stat module. What do you think?

Here is a test in which get_type(mode) returns the names of the stat-module functions that test positive for the file's type. (Technically it returns a list, which should contain only one function name, AFAIK.)

In [   ]: import os, stat, psutil

In [   ]: proc = psutil.Process()

In [   ]: proc
Out[   ]: <psutil.Process(pid=38894, name='ipython') at 140000526814512>

In [   ]: def get_type(mode):
     ...:     f_names = [fn for fn in dir(stat) if fn.startswith('S_IS') and callable(getattr(stat, fn))]
     ...:     types = []
     ...:     for name in f_names:
     ...:         func = getattr(stat, name)
     ...:         if func(mode):
     ...:             types.append(name)
     ...:     return types

In [   ]: !ls -l /proc/38894/fd
total 0
lrwx------. 1 joostk wheel 64 Sep  1 13:31 0 -> /dev/pts/7
lrwx------. 1 joostk wheel 64 Sep  1 13:31 1 -> /dev/pts/7
lr-x------. 1 joostk wheel 64 Sep  1 13:31 10 -> pipe:[127721742]
l-wx------. 1 joostk wheel 64 Sep  1 13:31 11 -> pipe:[127721742]
lrwx------. 1 joostk wheel 64 Sep  1 13:31 2 -> /dev/pts/7
lr-x------. 1 joostk wheel 64 Sep  1 13:31 3 -> /dev/urandom
l-wx------. 1 joostk wheel 64 Sep  1 13:31 4 -> /dev/null
lrwx------. 1 joostk wheel 64 Sep  1 13:31 5 -> /home/joostk/.ipython/profile_default/history.sqlite
lrwx------. 1 joostk wheel 64 Sep  1 13:31 6 -> /home/joostk/.ipython/profile_default/history.sqlite
lrwx------. 1 joostk wheel 64 Sep  1 13:31 7 -> [eventpoll]
lr-x------. 1 joostk wheel 64 Sep  1 13:31 8 -> pipe:[127721741]
l-wx------. 1 joostk wheel 64 Sep  1 13:31 9 -> pipe:[127721741]
In [   ]: for fd in os.listdir('/proc/self/fd'):
     ...:     file = '/proc/self/fd/%s' % fd
     ...:     path = os.readlink(file)
     ...:     st = os.stat(file)
     ...:     mode = st.st_mode
     ...:     print(fd, stat.filemode(mode), mode, stat.S_IFMT(mode), get_type(mode), path)
     ...:
0  crw--w----  8592  8192 ['S_ISCHR']  /dev/pts/7
1  crw--w----  8592  8192 ['S_ISCHR']  /dev/pts/7
2  crw--w----  8592  8192 ['S_ISCHR']  /dev/pts/7
3  crw-rw-rw-  8630  8192 ['S_ISCHR']  /dev/urandom
4  crw-rw-rw-  8630  8192 ['S_ISCHR']  /dev/null
5  -rw------- 33152 32768 ['S_ISREG']  /home/joostk/.ipython/profile_default/history.sqlite
6  -rw------- 33152 32768 ['S_ISREG']  /home/joostk/.ipython/profile_default/history.sqlite
7  -rw------- 33152 32768 ['S_ISREG']  [eventpoll]
8  prw-------  4480  4096 ['S_ISFIFO'] pipe:[127721741]
9  prw-------  4480  4096 ['S_ISFIFO'] pipe:[127721741]
10 prw-------  4480  4096 ['S_ISFIFO'] pipe:[127721742]
11 prw-------  4480  4096 ['S_ISFIFO'] pipe:[127721742]
12 -rw------- 33152 32768 ['S_ISREG']  /home/joostk/.ipython/profile_default/history.sqlite-journal

The user could also test whether the stat.S_IFREG bit was set in os.stat(file).st_mode or use the helper function stat.S_IFMT(mode).
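
To make the numbers in the listing above concrete, the following sketch decodes three of the st_mode values from it using only the standard stat module. `is_kind` is a hypothetical helper name used for illustration.

```python
import stat


def is_kind(st_mode, kind):
    """True if the file-type bits of st_mode equal the given stat.S_IF* constant."""
    return stat.S_IFMT(st_mode) == kind


# st_mode values copied from the listing: a tty, a regular file and a pipe.
for st_mode in (8592, 33152, 4480):
    print(
        st_mode,
        stat.filemode(st_mode),       # ls-style string, e.g. 'crw--w----'
        stat.S_IFMT(st_mode),         # file-type bits only
        oct(stat.S_IMODE(st_mode)),   # permission bits only
    )

print(is_kind(33152, stat.S_IFREG))
print(is_kind(4480, stat.S_IFIFO))
```

stat.S_IFMT() masks out the permission bits, so comparing its result with a stat.S_IF* constant is equivalent to calling the matching stat.S_IS*() function.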

The constants in the stat module can be used as the kind parameter in my proposal:

class Process(object):
    """Linux process implementation."""

    # ...

    @wrap_exceptions
    def open_fds(self, kind=None):
        # ...
        if kind is None or stat.S_IFMT(st_mode) == kind:
            # parse fdinfo, append to retlist
            ...

    def open_files(self):
        return self.open_fds(kind=stat.S_IFREG)
  • Probably drop .open_pipes and .open_specials or rename them accordingly.
  • Add stat_result, st_mode or type to popenfile:
    • os.stat(path):
      popenfile(..., stat=os.stat_result(...))
    • os.stat(path).st_mode:
      popenfile(..., st_mode=1234)
    • stat.S_IFMT(os.stat(path).st_mode):
      popenfile(..., type=1234)
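
A runnable approximation of the kind filter is sketched below. It is not the proposed psutil method: it only inspects descriptors of the current process through os.fstat() instead of reading /proc, and `filter_fds` is a hypothetical name chosen for this illustration.

```python
import os
import stat
import tempfile


def filter_fds(fds, kind=None):
    """Return [(fd, st_mode)] for fds of the current process, optionally by kind.

    `kind` is one of the stat.S_IF* constants, or None for no filtering.
    """
    result = []
    for fd in fds:
        try:
            st_mode = os.fstat(fd).st_mode
        except OSError:
            continue  # the descriptor was closed in the meantime
        if kind is None or stat.S_IFMT(st_mode) == kind:
            result.append((fd, st_mode))
    return result


read_end, write_end = os.pipe()
with tempfile.TemporaryFile() as tmp:
    fds = [read_end, write_end, tmp.fileno()]
    print(filter_fds(fds, kind=stat.S_IFIFO))  # the two pipe ends
    print(filter_fds(fds, kind=stat.S_IFREG))  # the temporary file
    print(filter_fds(fds))                     # everything
os.close(read_end)
os.close(write_end)
```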

What do you think?

@giampaolo
Owner

giampaolo commented Sep 3, 2017

Uhm... yes, using the stat.S_IS* functions (and the matching stat.S_IF* constants) looks like a good idea (and the doc should mention them).
Maybe we should have an extra "type" field which will be a human readable string like "regular", "fifo", etc.
It must be noted that the stat module does not cover all possible types though (there are more), so it is not enough to fill all the possible "type" fields.
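
One possible shape for such a human readable "type" field is sketched here. The strings and the `type_name` helper are assumptions made for illustration; nothing in this thread fixes the actual names, and any mode whose type bits are not in the table simply falls back to "unknown".

```python
import stat

# Hypothetical mapping from stat file-type bits to human readable names.
_TYPE_NAMES = {
    stat.S_IFREG: "regular",
    stat.S_IFDIR: "directory",
    stat.S_IFCHR: "char",
    stat.S_IFBLK: "block",
    stat.S_IFIFO: "fifo",
    stat.S_IFLNK: "link",
    stat.S_IFSOCK: "socket",
}


def type_name(st_mode):
    """Return a readable name for the file type encoded in st_mode."""
    return _TYPE_NAMES.get(stat.S_IFMT(st_mode), "unknown")


# st_mode values as they appear in the earlier /proc/self/fd listing.
for st_mode in (8592, 33152, 4480):
    print(st_mode, type_name(st_mode))
```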

"position" (offset) only makes sense for regular files. I see that lsof has a column called "SIZE/OFF(set)" so I suppose that depending on the fd sometimes it refers to its size (e.g. a directory - 4096), some other times to its offset/position (e.g. a regular file). As such maybe our field should be called differently ("sizeoff" probably).

lsof also has a NODE column which should be the inode. AFAIU that's needed in order to identify a fd uniquely so it looks like it's something we want to provide as well.

The "path" field no longer makes sense, as an fd can be a socket or something other than a path. lsof calls this "NAME", which looks like a better name.

open_pipes() and open_specials() are probably useless. I envision this in the form of a single open_fds(type=None) method where the type argument is probably a string ("regular", "pipe", "fifo", etc).

In summary, I envision this as something along these lines:

>>> p.open_fds()
[popenfile(name='/tmp/foo', fd=4, sizeoff=292, flags=32768, node=2, type="regular"),
 ...]

One thing which leaves me puzzled is how socket fds should be represented, because they have extra information (local / remote address and status). Right now we already have Process.connections() which covers this use case, so we risk duplicating functionality. FWIW lsof puts local/remote address and states in the NAME column.

@hzpc-joostk
Author

hzpc-joostk commented Sep 12, 2017

Thanks. Comparing the lsof output for a given process with the output of htop's l (L) command for the same process, it makes sense to reuse the NAME, NODE/INODE, TYPE and OFFSET/SIZE columns in popenfile.

It seems that lsof doesn't always show the OFFSET unless the -o option is passed. Perhaps both offset/position and size can be included in popenfile, with None for values that could not be determined (e.g. a CHR has neither a size nor an offset).

popenfile(name='/tmp/foo', fd=4, size=292, offset=123, flags=32768, node=2, type="REG")

Changing the field names of the popenfile named tuple may break your users' code. Should we preserve the current names as aliases in the named tuple?

class popenfile(psutil._psplatform.popenfile):
    @property
    def path(self):
        return self.name

    @property
    def position(self):
        return self.sizeoff  # or self.offset
    ...
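
A self-contained version of the alias idea, for experimenting outside psutil, might look like this. The base tuple and its field names are hypothetical stand-ins taken from the discussion above, not psutil's actual popenfile.

```python
from collections import namedtuple

# Hypothetical new field names; the real tuple lives in psutil and differs.
_popenfile = namedtuple(
    "popenfile", ["name", "fd", "size", "offset", "flags", "node", "type"]
)


class popenfile(_popenfile):
    """Named tuple with read-only aliases for the old field names."""

    __slots__ = ()  # keep instances as lightweight as a plain named tuple

    @property
    def path(self):
        return self.name

    @property
    def position(self):
        return self.offset


f = popenfile(name="/tmp/foo", fd=4, size=292, offset=123,
              flags=32768, node=2, type="REG")
print(f.path, f.position)
print(f._fields)
```

The aliases only cover attribute access. Code that unpacks the tuple positionally, or that relies on the number and order of fields, would still be affected by a change of layout.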

@hzpc-joostk
Author

hzpc-joostk commented Oct 3, 2017

Hey @giampaolo I guess there is not a lot of interest in this feature. 😉
I could create a pull request with some of the discussed ideas or drop the idea, especially as we haven't thought about OSs other than Linux.

@giampaolo
Owner

Sorry, I've been travelling and didn't have much time. With that said, I'm not against the idea per se, but it looks like a very big task with a hard-to-define API, so let's just say it's low priority for me at this point. I'd say let's just drop it.
