Process.is_running() speedup #59

Closed
giampaolo opened this issue May 23, 2014 · 7 comments

Comments

@giampaolo
Owner

From billiej...@gmail.com on July 14, 2009 17:59:42

The current is_running() implementation relies on __eq__, which compares
every property of two Process object instances in pure Python.

Some quick benchmarks make it easy to see that a call to this method takes
far longer than calls to other Process methods and properties.

It would be worth trying to find alternative approaches to speed up the
current implementation.

Original issue: http://code.google.com/p/psutil/issues/detail?id=59

@giampaolo giampaolo self-assigned this May 23, 2014
@giampaolo
Owner Author

From jlo...@gmail.com on July 14, 2009 09:22:14

Did you have anything in mind on how to speed this up? Have you profiled this at all
to see what takes the most time? My guess (but I could be wrong) is that it's the
creation of the new Process object in is_running() and not the code in __eq__ that
takes the most time. The __eq__() code is just using built-in functions to read some
attributes and compare them, so it shouldn't be all that slow. 

If it turns out that __eq__() is the culprit after all, then one thing that comes to
mind is comparing a specific subset of items instead of checking all of them. For
example, just look at PID, ppid, name, command line, path. That would cut down the
number of items being checked and also eliminate several function calls currently
made in the body of __eq__, including the startswith() string operation and the
callable() check.

@giampaolo
Owner Author

From billiej...@gmail.com on July 14, 2009 10:45:32

One problem is that we are comparing against too many properties: currently
*everything* the Process class has to offer except callables and private methods.

It is true that __eq__ uses fast builtin functions for comparison, but every time it
asks for a property, that's time spent on calling the underlying C code and we should
avoid that whenever possible.
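
To make the cost argument concrete, here is a minimal sketch that counts how many property fetches each comparison strategy triggers. `CountingProcess`, its field list and both helper functions are hypothetical stand-ins written for this illustration, not psutil code; the counter only models the idea that each property read may end up calling into C.

```python
class CountingProcess:
    """Toy stand-in for a Process whose properties are costly to fetch."""

    # Hypothetical property names, loosely based on the ones discussed here.
    FIELDS = ("pid", "ppid", "name", "path", "cmdline",
              "uid", "gid", "create_time")

    def __init__(self, **values):
        self._values = values
        self.fetches = 0  # number of property reads performed so far

    def fetch(self, field):
        # In the real library a read like this may call the underlying C code.
        self.fetches += 1
        return self._values[field]


def equal_all(a, b):
    """Compare every field, roughly what a compare-everything __eq__ does."""
    return all(a.fetch(f) == b.fetch(f) for f in CountingProcess.FIELDS)


def equal_signature(a, b, signature=("pid", "create_time")):
    """Compare only a small, fixed subset of fields."""
    return all(a.fetch(f) == b.fetch(f) for f in signature)


values = dict(pid=1234, ppid=1, name="python", path="/usr/bin",
              cmdline=["python"], uid=1000, gid=1000,
              create_time=1247594382.0)

full_a, full_b = CountingProcess(**values), CountingProcess(**values)
full_result = equal_all(full_a, full_b)

sig_a, sig_b = CountingProcess(**values), CountingProcess(**values)
sig_result = equal_signature(sig_a, sig_b)

print(full_result, full_a.fetches)
print(sig_result, sig_a.fetches)
```

For two identical toy processes, both strategies agree on the result, but the signature comparison reads far fewer properties per object.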

What I had in mind was to determine a reliable and *limited* subset of properties to
use as a "signature" to identify a Process object uniquely.

Given that the kernel is unlikely to reuse the same PID within a short period of
time, combining the PID with the process creation time already gives us a decent
degree of uniqueness:

def __eq__(self, other):
    h1 = (self.pid, self.create_time)
    h2 = (other.pid, other.create_time)
    return h1 == h2
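
As a self-contained sketch of the same idea outside psutil, the toy class below is identified only by its (pid, create_time) pair. `ProcessHandle` and the sample values are made up for illustration, and Python 3 semantics are assumed (`!=` is derived from `__eq__`, and `__hash__` has to be defined explicitly once `__eq__` is).

```python
class ProcessHandle:
    """Toy handle identified only by (pid, create_time)."""

    def __init__(self, pid, create_time):
        self.pid = pid
        self.create_time = create_time

    def _signature(self):
        # The limited "signature" used to tell processes apart.
        return (self.pid, self.create_time)

    def __eq__(self, other):
        if not isinstance(other, ProcessHandle):
            return NotImplemented
        return self._signature() == other._signature()

    def __hash__(self):
        # Keep hashing consistent with equality so handles work in sets/dicts.
        return hash(self._signature())


original = ProcessHandle(pid=4242, create_time=1247594382.10)
same = ProcessHandle(pid=4242, create_time=1247594382.10)
# Same PID handed out again later: the creation time differs.
recycled = ProcessHandle(pid=4242, create_time=1247599999.55)

print(original == same)
print(original == recycled)
```

A recycled PID compares unequal here because its creation time differs, which is the property the signature relies on.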

Since we're not sure how the kernel assigns new PIDs on every platform, we may need
to strengthen that uniqueness by adding some other properties, but I'm not sure
which ones exactly.

I'd be for using cmdline, but the underlying C call that determines it also
determines ppid, name and path in one shot, so it might not be the best choice.

Thoughts?

@giampaolo
Owner Author

From jlo...@gmail.com on July 14, 2009 14:02:43

I think PID + create time is good enough, since a process can't have both a reused
PID and the same create time in any normal circumstance I can come up with. That
should speed things up a bunch.

I'm not sure why is_running got coded this way: 

    def is_running(self):
        """Return whether the current process is running in the current process
        list."""
        try:
            new_proc = Process(self.pid)
            # calls get_process_info() which may in turn trigger NSP exception
            str(new_proc)
        except NoSuchProcess:
            return False
        return self == new_proc

That's going to be much slower because the call to str() forces the new_proc
Process object to fill in all of its attributes by calling the C code before we check
for equality. Whatever the reason, if we change it as shown below it should work
fine and be much faster once the changes to __eq__() are made:

    def is_running(self):
        """Return whether the current process is running in the current process
        list."""
        try:
            new_proc = Process(self.pid)
            return self == new_proc
        except NoSuchProcess:
            return False
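
The control flow above can be tried out without psutil using a fake process table. Everything in this sketch (`PROCESS_TABLE`, `FakeProcess`, the local `NoSuchProcess`) is a hypothetical stand-in, and unlike the lazy Process object discussed in this thread, the toy constructor reads the creation time eagerly.

```python
class NoSuchProcess(Exception):
    """Local stand-in for the library's exception of the same name."""


# Hypothetical process table: pid -> creation time.
PROCESS_TABLE = {100: 1000.0, 200: 2000.0}


class FakeProcess:
    def __init__(self, pid):
        # Assumption: the lookup happens eagerly and raises if the PID is gone.
        if pid not in PROCESS_TABLE:
            raise NoSuchProcess(pid)
        self.pid = pid
        self.create_time = PROCESS_TABLE[pid]

    def __eq__(self, other):
        return (self.pid, self.create_time) == (other.pid, other.create_time)

    def is_running(self):
        """Return True if a process with the same PID and create time exists."""
        try:
            return self == FakeProcess(self.pid)
        except NoSuchProcess:
            return False


p = FakeProcess(100)
alive = p.is_running()       # still in the table, same creation time

del PROCESS_TABLE[100]
gone = p.is_running()        # PID no longer exists

PROCESS_TABLE[100] = 5000.0
reused = p.is_running()      # PID exists again, but it is a different process

print(alive, gone, reused)
```

The third case is the one the equality check is there for: the PID exists, but the process behind it is not the one the object was created for.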

@giampaolo
Owner Author

From billiej...@gmail.com on July 15, 2009 02:35:58

Committed as r416.

Before the patch:
$ python -m timeit -s "import os, psutil; p = psutil.Process(os.getpid())" "p.is_running()"
1000 loops, best of 3: 1.29 msec per loop


After the patch:
$ python -m timeit -s "import os, psutil; p = psutil.Process(os.getpid())" "p.is_running()"
10000 loops, best of 3: 135 usec per loop


That's about 10 times faster.
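
For anyone who wants to repeat this kind of measurement from a script rather than the shell, here is a rough sketch using the standard `timeit` module. `best_of` is a hypothetical helper name, and the statement timed here is a trivial placeholder so the snippet runs without psutil; the setup and statement strings from the commands above could be passed in instead.

```python
import timeit


def best_of(stmt, setup="pass", repeat=3, number=1000):
    """Return the best per-loop time in seconds over `repeat` runs."""
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # Each total covers `number` executions; report the fastest run per loop.
    return min(totals) / number


# Placeholder workload, not the psutil call measured in this thread.
per_loop = best_of("sorted(data)", setup="data = list(range(100))")
print("%.2f usec per loop" % (per_loop * 1e6))
```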

Status: Fixed

@giampaolo
Owner Author

From billiej...@gmail.com on September 03, 2009 12:48:09

Status: FixedInSVN

@giampaolo
Owner Author

From billiej...@gmail.com on September 17, 2009 01:57:42

Status: Fixed

@giampaolo
Owner Author

From g.rodola on March 02, 2013 03:50:02

Updated csets after the SVN -> Mercurial migration: r416 == revision 498c34a2245c
