Speedup Process methods #799

Closed
giampaolo opened this Issue Mar 29, 2016 · 20 comments

@giampaolo
Owner

giampaolo commented Mar 29, 2016

This is something I've been thinking about for a while. The problem with the current Process class implementation is that if you want to fetch multiple pieces of process info, the underlying (C / Python) implementation may unnecessarily do the same work more than once.

For instance, on Linux we read the /proc/pid/stat file to get terminal, cpu_times, and create_time, and each time we invoke one of those methods we open the file and read it in full. We extract the one piece of info we're interested in and discard the rest.
A similar thing happens on basically every OS. For instance, on BSD we use the kinfo_proc syscall to get basically 80% of all process info (uids, gids, create_time, ppid, io_counters, status, etc.).
Again, all this info is retrieved in one shot (in C) and then re-requested every time we call a Process method.

Since we typically request more than one piece of info about a process (think of a top-like app), it seems clear that this could (and should) be done in a single operation. A possible solution would be to provide a context manager which temporarily puts the Process instance in a state where the requested metrics are determined in a single shot and then cached internally:

p = psutil.Process()
with p.oneshot():
    p.terminal()  # internally, this retrieves terminal, cpu_times and create time
    p.cpu_times()  # return the cached value
    p.create_time()  # return the cached value

Note: the Process.as_dict() method would use this context manager implicitly.
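The caching idea can be sketched as a context manager that performs the expensive read once and serves every getter from the cached result. This is a toy model (the Info class, parse_stat and the field values are all made up for illustration), not psutil's actual implementation:

```python
import contextlib


class Info:
    """Toy stand-in for a process backend where parse_stat() simulates one
    expensive read of /proc/<pid>/stat yielding several fields at once."""

    def __init__(self):
        self.reads = 0       # how many times the "file" was read
        self._cache = None   # populated only while inside oneshot()

    def parse_stat(self):
        # A real implementation would open and parse /proc/<pid>/stat here.
        self.reads += 1
        return {"terminal": "/dev/pts/0",
                "cpu_times": (1.0, 2.0),
                "create_time": 1234567890.0}

    def _stat(self):
        # Inside oneshot() reuse the cached parse; outside, re-read each time.
        return self._cache if self._cache is not None else self.parse_stat()

    @contextlib.contextmanager
    def oneshot(self):
        self._cache = self.parse_stat()   # one read, shared by all getters
        try:
            yield self
        finally:
            self._cache = None            # back to normal per-call mode

    def terminal(self):
        return self._stat()["terminal"]

    def cpu_times(self):
        return self._stat()["cpu_times"]

    def create_time(self):
        return self._stat()["create_time"]


p = Info()
p.terminal(); p.cpu_times(); p.create_time()
assert p.reads == 3   # normal mode: one read per method call
with p.oneshot():
    p.terminal(); p.cpu_times(); p.create_time()
assert p.reads == 4   # oneshot mode: a single read served all three calls
```

In this model, an as_dict()-style method would simply enter the context itself before iterating over the requested attributes.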

=== EDITS AFTER COMMENTS BELOW ===

Branch

master...oneshotfiles_bucket

Benchmark scripts

Linux (+2.56x speedup)

$ python scripts/internal/bench_oneshot.py 
11 methods involved on platform 'linux2' (1000 iterations):
    cpu_percent
    cpu_times
    create_time
    gids
    name
    num_ctx_switches
    num_threads
    ppid
    status
    terminal
    uids
normal:  0.233 secs
oneshot: 0.091 secs
speedup: +2.56x

Windows (+1.9x or +6.5x speedup)

user's process:

C:\Python27\python.exe scripts\internal\bench_oneshot.py
13 methods involved on platform 'win32' (1000 iterations, psutil 4.5.0):
    cpu_affinity
    cpu_percent
    cpu_times
    io_counters
    ionice
    memory_info
    memory_percent
    nice
    num_ctx_switches
    num_handles
    num_threads
    parent
    ppid
normal:  1.243 secs
oneshot: 0.655 secs
speedup: +1.90x

other user's process:

C:\Python27\python.exe scripts\internal\bench_oneshot.py
11 methods involved on platform 'win32' (1000 iterations, psutil 4.4.2):
    cpu_percent
    cpu_times
    create_time
    io_counters
    memory_info
    memory_percent
    num_ctx_switches
    num_handles
    num_threads
    parent
    ppid
normal:  5.027 secs
oneshot: 0.765 secs
speedup: +6.57x

FreeBSD (+2.18x speedup)

$ python scripts/internal/bench_oneshot.py 
13 methods involved on platform 'freebsd10' (1000 iterations):
    cpu_percent
    cpu_times
    create_time
    gids
    io_counters
    memory_full_info
    memory_info
    memory_percent
    num_ctx_switches
    ppid
    status
    terminal
    uids
normal:  0.121 secs
oneshot: 0.056 secs
speedup: +2.18x

OSX (+1.92x speedup)

$ python scripts/internal/bench_oneshot.py
14 methods involved on platform 'darwin' (1000 iterations):
    cpu_percent
    cpu_times
    create_time
    gids
    memory_info
    memory_percent
    name
    num_ctx_switches
    num_threads
    parent
    ppid
    terminal
    uids
    username
normal:  0.200 secs
oneshot: 0.104 secs
speedup: +1.92x

SunOS (+1.37x speedup)

$ python scripts/internal/bench_oneshot.py
12 methods involved on platform 'sunos5' (1000 iterations):
    cmdline
    create_time
    gids
    memory_full_info
    memory_info
    memory_percent
    name
    num_threads
    ppid
    status
    terminal
    uids
normal:  0.087 secs
oneshot: 0.064 secs
speedup: +1.37x
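For reference, the speedup figures above are simply the ratio between the normal and oneshot timings. A quick sketch (the actual bench_oneshot.py script may compute and round these from unrounded timings, which would explain small discrepancies such as FreeBSD's 0.121/0.056 ≈ 2.16 vs. the reported +2.18x):

```python
# Speedup factor: normal time divided by oneshot time.
def speedup(normal_secs, oneshot_secs):
    return normal_secs / oneshot_secs

print("Linux:   +%.2fx" % speedup(0.233, 0.091))  # matches the reported +2.56x
print("Windows: +%.2fx" % speedup(5.027, 0.765))  # matches the reported +6.57x
print("OSX:     +%.2fx" % speedup(0.200, 0.104))  # matches the reported +1.92x
```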
@nicolargo
Contributor

nicolargo commented Apr 26, 2016

+1 for this enhancement request. It will be awesome for the Glances project.

@giampaolo
Owner

giampaolo commented Apr 30, 2016

I started working on this in a separate branch (master...oneshotfiles_bucket) and completed the Linux implementation. The code below runs about twice as fast:

import psutil
import time

attrs = ['ppid', 'uids', 'gids', 'num_ctx_switches', 'num_threads', 'status',
         'name', 'cpu_times', 'terminal']
p = psutil.Process()
t = time.time()
for x in range(1000):
    p.as_dict(attrs)
print(time.time() - t)
@nicolargo
Contributor

nicolargo commented Jul 10, 2016

Any heads-up on this enhancement?

@giampaolo
Owner

giampaolo commented Jul 10, 2016

I completed the Linux implementation but I still have to benchmark it properly. All other platform implementations are still missing. It's gonna take a while.

@giampaolo
Owner

giampaolo commented Aug 2, 2016

Linux benchmark. With this I get a 2x speedup (twice as fast) when involving all the "one shot" methods, which emulates the best possible scenario:

import psutil
import time


def doit(p):
    p.name()
    p.terminal()
    p.cpu_times()
    p.create_time()
    p.status()
    p.ppid()
    #
    p.num_ctx_switches()
    p.num_threads()
    p.uids()
    p.gids()


p = psutil.Process()

t = time.time()
for x in range(1000):
    doit(p)
print("normal:  %f" % (time.time() - t))

t = time.time()
for x in range(1000):
    with p.oneshot():
        doit(p)
print("oneshot: %f" % (time.time() - t))

Output:

normal:  0.189042
oneshot: 0.097632
@giampaolo
Owner

giampaolo commented Aug 2, 2016

On FreeBSD, the impact of retrieving multiple (14) pieces of info when only 1 is needed is negligible (0.46 secs vs. 0.42), so retrieving a single piece of process info does not get slower even when NOT using oneshot().

giampaolo added a commit that referenced this issue Aug 3, 2016

@giampaolo
Owner

giampaolo commented Aug 3, 2016

Linux speedup went from 1.9x to 2.6x after f851be9.

giampaolo added a commit that referenced this issue Aug 3, 2016

giampaolo added a commit that referenced this issue Aug 3, 2016

giampaolo added a commit that referenced this issue Aug 3, 2016

@giampaolo
Owner

giampaolo commented Aug 3, 2016

The BSD platforms implementation is completed. On FreeBSD I get a +2.18x speedup.
I also added a benchmark script here: https://github.com/giampaolo/psutil/blob/oneshot/scripts/internal/bench_oneshot.py.

@nicolargo
Contributor

nicolargo commented Aug 4, 2016
@giampaolo
Owner

giampaolo commented Aug 4, 2016

Yes, this is intended for all OSes, even though Windows is probably gonna be the most difficult platform because it has fewer C APIs which can be used to directly retrieve multiple pieces of info in one shot.
BSD, for instance, is the exact opposite, as in one shot you get a whole blob of stuff:

#ifdef __FreeBSD__

The only Windows C call I can think of that is used basically all the time is OpenProcess.
We use a wrapper around it:

...which is extensively used in the main C extension module:

~/svn/psutil {master}$ grep psutil_handle psutil/_psutil_windows.c | wc -l
16

What we can do is get the handle once, store it in Python (as an int), then pass it back to the C extension as an argument, and keep doing this as long as we're in the oneshot context (then on __exit__ we're gonna "CloseHandle() it"). The methods involved should be (at least): cpu_times(), create_time(), memory_info(), nice(), io_counters(), cpu_affinity(), num_handles() and memory_maps(). So yes, also on Windows there's a lot of room for speeding things up quite a bit.
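The handle-caching bookkeeping described above can be simulated in pure Python with counters standing in for the real Win32 calls (everything below — WinProcess, open_process, close_handle and the returned metric values — is hypothetical and only illustrates the open-once-per-context idea, not psutil's actual C code):

```python
import contextlib

calls = {"OpenProcess": 0, "CloseHandle": 0}   # counters standing in for Win32


def open_process(pid):
    calls["OpenProcess"] += 1
    return ("handle", pid)                     # a fake handle object


def close_handle(handle):
    calls["CloseHandle"] += 1


class WinProcess:
    def __init__(self, pid):
        self.pid = pid
        self._handle = None                    # cached only inside oneshot()

    @contextlib.contextmanager
    def _handle_ctx(self):
        if self._handle is not None:           # oneshot mode: reuse the handle
            yield self._handle
        else:                                  # normal mode: open/close per call
            h = open_process(self.pid)
            try:
                yield h
            finally:
                close_handle(h)

    @contextlib.contextmanager
    def oneshot(self):
        self._handle = open_process(self.pid)  # open once...
        try:
            yield self
        finally:
            close_handle(self._handle)         # ..."CloseHandle() it" on exit
            self._handle = None

    def cpu_times(self):
        with self._handle_ctx() as h:
            return (0.0, 0.0)                  # fake value; real code queries h

    def num_handles(self):
        with self._handle_ctx() as h:
            return 42                          # fake value


p = WinProcess(1234)
p.cpu_times(); p.num_handles()
assert calls["OpenProcess"] == 2               # normal: one open per method
with p.oneshot():
    p.cpu_times(); p.num_handles()
assert calls["OpenProcess"] == 3               # oneshot: one open for both
```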

giampaolo added a commit that referenced this issue Aug 5, 2016

@giampaolo
Owner

giampaolo commented Aug 5, 2016

Solaris implementation landed in 630b40d: +1.37x speedup.

giampaolo added a commit that referenced this issue Aug 6, 2016

#799, oneshot(), windows: expose C functions to OpenProcess and CloseHandle in order to keep the handle reference at Python level and allow caching

giampaolo added a commit that referenced this issue Aug 6, 2016

giampaolo added a commit that referenced this issue Aug 6, 2016

@giampaolo
Owner

giampaolo commented Aug 8, 2016

It turns out that storing the OpenProcess handle in Python is slower than retrieving it in C every time. I experimented with this here:
oneshot...oneshot-winfiles_bucket
...and I get a 1.5x slowdown. As such, Windows is apparently the only platform which cannot take advantage of this.

giampaolo added a commit that referenced this issue Aug 23, 2016

giampaolo added a commit that referenced this issue Aug 23, 2016

giampaolo added a commit that referenced this issue Oct 7, 2016

giampaolo added a commit that referenced this issue Oct 7, 2016

@giampaolo
Owner

giampaolo commented Oct 7, 2016

OSX implemented as of 7b2a6b3 and cf21849. The speedup is 1.8x! Unless I'm missing something, we should be done with all platforms.

@nicolargo
Contributor

nicolargo commented Oct 7, 2016

Good news @giampaolo !

giampaolo added a commit that referenced this issue Oct 7, 2016

@giampaolo
Owner

giampaolo commented Oct 7, 2016

OSX: going from a 1.8x to a 1.9x speedup with 1e8cef9.

giampaolo added a commit that referenced this issue Oct 8, 2016

giampaolo added a commit that referenced this issue Oct 28, 2016

@giampaolo
Owner

giampaolo commented Oct 28, 2016

It turns out the apparent slowdown occurring on Windows as per my previous message #799 (comment) was due to the benchmark script not being stable enough, so we're good on Windows too.
The https://github.com/giampaolo/psutil/blob/7f51f0074b6d727a01fea0290ed0988dd51ad288/scripts/internal/bench_oneshot_2.py script, which relies on the perf module, shows a +1.2x speedup.
With c10a7aa and 3efb6bf I went from +1.2x to +1.8x.

@giampaolo
Owner

giampaolo commented Oct 28, 2016

The interesting thing about Windows is that, because some Process methods use a dual implementation (see #304), we can get a way bigger speedup for PIDs owned by other users, for which the first "fast" implementation raises AccessDenied.
On a high-privileged PID, by using oneshot() I am now getting an awesome +6.3x speedup!
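The dual-implementation effect can be sketched with counters: a cheap query is assumed to raise AccessDenied for other users' PIDs, forcing an expensive fallback which oneshot() pays only once per context instead of once per method (all names here — query_fast, query_slow, the field values — are made up for illustration; see #304 for the real mechanism):

```python
import contextlib

calls = {"fast": 0, "slow": 0}
OUR_PID = 1000                       # pretend this is the current user's PID


class AccessDenied(Exception):
    pass


def query_fast(pid):
    # Cheap query which only works on processes owned by the current user.
    calls["fast"] += 1
    if pid != OUR_PID:
        raise AccessDenied
    return {"cpu_times": (1.0, 2.0), "num_handles": 42}


def query_slow(pid):
    # Expensive fallback which works for any PID.
    calls["slow"] += 1
    return {"cpu_times": (1.0, 2.0), "num_handles": 42}


class Process:
    def __init__(self, pid):
        self.pid = pid
        self._cache = None

    def _query(self):
        if self._cache is not None:  # oneshot mode: serve from the cache
            return self._cache
        try:
            return query_fast(self.pid)
        except AccessDenied:
            return query_slow(self.pid)

    @contextlib.contextmanager
    def oneshot(self):
        self._cache = self._query()  # the fast-or-slow dance happens once
        try:
            yield self
        finally:
            self._cache = None

    def cpu_times(self):
        return self._query()["cpu_times"]

    def num_handles(self):
        return self._query()["num_handles"]


p = Process(4)                       # a PID owned by another user
p.cpu_times(); p.num_handles()
assert calls["slow"] == 2            # normal: the costly fallback runs twice
with p.oneshot():
    p.cpu_times(); p.num_handles()
assert calls["slow"] == 3            # oneshot: the fallback ran only once more
```

The bigger the gap between the fast and slow paths, the bigger the win, which is why high-privileged PIDs see the most dramatic speedup.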

@giampaolo
Owner

giampaolo commented Nov 5, 2016

OK, this is now merged into master as of de41bcc.

@giampaolo giampaolo closed this Nov 5, 2016

@nicolargo
Contributor

nicolargo commented Nov 5, 2016

Great job @giampaolo !

Many thanks.

@nicolargo nicolargo referenced this issue in nicolargo/glances Nov 5, 2016

Closed

Relatively high CPU usage #519

@suzaku

suzaku commented Nov 7, 2016

Great job!
