[cachetop] top-like cachestat #615
|
Hey Can you provide an example file and man page also. Al
|
|
@mcaleavya will do |
|
Tested, looks awesome! Also, this is the first tool that uses curses -- and I think it's very appropriate here. This is also the first tool that uses A couple of questions regarding the UI:
|
I am just using default curse here. The only part of the initialization done is through the wrapper: http://svn.python.org/projects/python/branches/py3k/Lib/curses/wrapper.py
yes, I will fix that. I am currently just checking the height available but should definitely also check the width. I have a local change that fit perfectly in 80chars, even less after I removed unecesary white-spaces. |
|
Might just be my weird system, but I get a segfault whenever I run the tool. Looks like it's in libcurses:
|
| + misses = 0 | ||
| + rhits = 0 | ||
| + whits = 0 | ||
| + | ||
| + counts = bpf.get_table("counts") | ||
| + stats = defaultdict(lambda: defaultdict(int)) | ||
| + for k, v in counts.items(): | ||
| + stats["%d-%d-%s" % (k.pid, k.uid, k.comm)][k.ip] = v.value | ||
| + stats_list = [] | ||
| + | ||
| + for pid, count in sorted(stats.items(), key=lambda stat: stat[0]): | ||
| + for k, v in count.items(): | ||
| + if re.match('mark_page_accessed', bpf.ksym(k)) is not None: | ||
| + mpa = v | ||
| + if mpa < 0: | ||
| + mpa = 0 |
|
markdrayton
We could make these lines more succinct, e.g.:
though since this code is also in cachestat.py we might want to commit as-is then change both at once.
4ast
Agree with Mark that this piece can be more python-ish. |
|
This is great! A few suggestions:
|
yeah, this is how I had it originally. I was not the most
Definitely, but I thought of keeping this for a next iteration and keep this first-pass diff as simple as it is. |
@brendangregg what do you run? There is a good chance I am not handling edge cases, that being said it should except within python rather than segfault :) |
|
It would also be good to have an option to not have the tool clear the screen with maybe a timestamp so we can look at what process were using the cache historically. |
|
@mcaleavya: That definitely sounds like something that can be added in v2 :-) |
@mcaleavya do you mean something like top in batch mode? I thought about something along the line of keeping a ring buffer with stats and being able to navigate back and forth in time, that would be from within the current run though. |
|
@chantra yes that would be good, even if we can also have a timestamp at the top next to buffers etc then that would make finding things easier when looking at historical output. I had in mind something similar to solaris prstat -c which prints reports below rather than clearing the screen. looking forward to the next update. Cheers |
|
@brendangregg I am also getting a curses segfault when running ubuntu 16.04 from https://atlas.hashicorp.com/bento/boxes/ubuntu-16.04 with 4.6 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-yakkety/ As much as I can tell, this only reproduce when using bcc module and only once BPF code is loaded using
|
|
redirecting stderr make this work.... @brendangregg does that work for you too?
|
|
@chantra yes, that works, thanks! I'd copy-n-paste a screenshot to prove it, but it cleared the screen. Need -C, --no-clear-screen :-) Pretty great to see a ncurses example anyway. |
that's seems harder than it should with python's ncurses. Looks like I would have to reimplement what |
|
I was trying to debug the segfault without much luck. Do you know of a simple example of Python curses wrapper to try? I have some simple examples of Python curses, but they work, so I'm guessing it might be related to wrapper. |
|
@brendangregg, it is actually |
|
I'd updated some things including bcc, and my segfault stack is now:
|
|
oh, I was messing with |cat, but that's likely an unrelated error... |
|
down the rabbit hole I found after installing: http://packages.ubuntu.com/xenial/libncurses5-dbg
Attempting a quick local patch of |
|
I see you're still in doupdate(). I'm now always in _nc_keypad()... I added libncurses5-dbg, but needed to also add the source in /build to make gdb happy. I noticed that that URL has the source tarball for ncurses-6.0+20160213, but not ncurses-6.0-20160213 -- I don't know if there's any difference between the "+" and "-"... |
|
I think it's because (include/term.h):
and
So we're both hitting the same problem: cur_term is NULL. And of course that's the type of segfault I'm in:
|
|
cur_term is set by set_curterm() (at least), which is called with NULL before the crash:
Using bcc to debug bcc. :) |
|
... need to add a "-s" option to trace.py for printing stack traces. |
|
using gdb instead:
|
|
Now in llvm... so FileDescriptorHasColors() calls terminalHasColors(), which: static bool terminalHasColors(int fd) {
[...]
// Now extract the structure allocated by setupterm and free its memory
// through a really silly dance.
struct term *termp = set_curterm((struct term *)nullptr);
(void)del_curterm(termp); // Drop any errors here.Which seems like it would trash cur_term, and leave it set to NULL... |
|
Here's what that llvm code used to be: static bool terminalHasColors() {
if (const char *term = std::getenv("TERM")) {
// Most modern terminals support ANSI escape sequences for colors.
// We could check terminfo, or have a list of known terms that support
// colors, but that would be overkill.
// The user can always ask for no colors by setting TERM to dumb, or
// using a commandline flag.
return strcmp(term, "dumb") != 0;
}
return false;
}And the change was llvm-mirror/llvm@d485e7b#diff-a4fb6575a290937bc9142e3d7efc8989 |
|
wow. great sleuthing! |
as expected, this crashed just a little further down the road :) |
|
I had it crash on me but rebuild my Vm. Can't remember if I had built to On 22 Jul 2016 09:42, "chantra" notifications@github.com wrote:
|
|
@4ast work around is to redirect stderr to /dev/null. Alternatively... I can add this patch: diff --git a/tools/cachetop.py b/tools/cachetop.py
index b1ea9a6..e18e5c5 100755
--- a/tools/cachetop.py
+++ b/tools/cachetop.py
@@ -21,6 +21,7 @@ from bcc import BPF
import argparse
import curses
+import os
import pwd
import re
import signal
@@ -252,4 +253,7 @@ def parse_arguments():
return args
args = parse_arguments()
+# hacky hacky
+nullfd = os.open('/dev/null', os.O_WRONLY)
+os.dup2(nullfd, 2)
curses.wrapper(handle_loop, args) |
|
would passing -fno-color-diagnostics to clang solve it as well? |
|
stderr didn't work for me. Where would I add -fno-color-diagnostics? |
|
something like this:
completely untested. |
|
Thanks, it gets further:
(Which I also saw earlier when running it from strace...) I can fix it using: b.attach_kprobe(event=str("add_to_page_cache_lru"), fn_name="do_count")
b.attach_kprobe(event=str("mark_page_accessed"), fn_name="do_count")
b.attach_kprobe(event=str("account_page_dirtied"), fn_name="do_count")
b.attach_kprobe(event=str("mark_buffer_dirty"), fn_name="do_count")Or by deleting:
@chantra you can pick which makes most sense. :) Also, if you update loader.cc, please put a comment in there somewhere like:
Since at some point in the future we might be able to remove it. |
|
The assert is mine. Removing the unicode import is a good bet for now. I'll make it all better in #623. |
|
ok, compiled bcc with This is now working for me without redirecting stderr. |
|
Tested, works, thanks! I'll merge it. A couple of possible follow-ons for later:
|
| + Return a tuple containing: | ||
| + buffer | ||
| + cached | ||
| + list of tuple with per process cache stats | ||
| + ''' | ||
| + rtaccess = 0 | ||
| + wtaccess = 0 | ||
| + mpa = 0 | ||
| + mbd = 0 | ||
| + apcl = 0 | ||
| + apd = 0 | ||
| + access = 0 | ||
| + misses = 0 | ||
| + rhits = 0 | ||
| + whits = 0 | ||
| + |
|
chantra
As I was refactoring stuff between |
Alike cachestat.py but providing cache stats at the process level.