-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cachetop] top-like cachestat #615
Conversation
Hey Can you provide an example file and man page also. Al
|
@mcaleavya will do |
Tested, looks awesome! Also, this is the first tool that uses curses -- and I think it's very appropriate here. This is also the first tool that uses A couple of questions regarding the UI:
|
I am just using default curse here. The only part of the initialization done is through the wrapper: http://svn.python.org/projects/python/branches/py3k/Lib/curses/wrapper.py
yes, I will fix that. I am currently just checking the height available but should definitely also check the width. I have a local change that fit perfectly in 80chars, even less after I removed unecesary white-spaces. |
Might just be my weird system, but I get a segfault whenever I run the tool. Looks like it's in libcurses:
|
if re.match('mark_page_accessed', bpf.ksym(k)) is not None: | ||
mpa = v | ||
if mpa < 0: | ||
mpa = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make these lines more succinct, e.g.:
mpa = v if mpa > 0 else 0
though since this code is also in cachestat.py we might want to commit as-is then change both at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact I guess:
mpa = max(0, v)
is the ultimate in brevity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Mark that this piece can be more python-ish.
The rest looks good to me.
This is great! A few suggestions:
|
yeah, this is how I had it originally. I was not the most
Definitely, but I thought of keeping this for a next iteration and keep this first-pass diff as simple as it is. |
@brendangregg what do you run? There is a good chance I am not handling edge cases, that being said it should except within python rather than segfault :) |
It would also be good to have an option to not have the tool clear the screen with maybe a timestamp so we can look at what process were using the cache historically. |
@mcaleavya: That definitely sounds like something that can be added in v2 :-) |
@mcaleavya do you mean something like top in batch mode? I thought about something along the line of keeping a ring buffer with stats and being able to navigate back and forth in time, that would be from within the current run though. |
@chantra yes that would be good, even if we can also have a timestamp at the top next to buffers etc then that would make finding things easier when looking at historical output. I had in mind something similar to solaris prstat -c which prints reports below rather than clearing the screen. looking forward to the next update. Cheers |
@brendangregg I am also getting a curses segfault when running ubuntu 16.04 from https://atlas.hashicorp.com/bento/boxes/ubuntu-16.04 with 4.6 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-yakkety/ As much as I can tell, this only reproduce when using bcc module and only once BPF code is loaded using
|
redirecting stderr make this work.... @brendangregg does that work for you too?
|
@chantra yes, that works, thanks! I'd copy-n-paste a screenshot to prove it, but it cleared the screen. Need -C, --no-clear-screen :-) Pretty great to see a ncurses example anyway. |
that's seems harder than it should with python's ncurses. Looks like I would have to reimplement what |
I was trying to debug the segfault without much luck. Do you know of a simple example of Python curses wrapper to try? I have some simple examples of Python curses, but they work, so I'm guessing it might be related to wrapper. |
@brendangregg, it is actually |
I'd updated some things including bcc, and my segfault stack is now:
|
oh, I was messing with |cat, but that's likely an unrelated error... |
down the rabbit hole I found after installing: http://packages.ubuntu.com/xenial/libncurses5-dbg
Attempting a quick local patch of |
I see you're still in doupdate(). I'm now always in _nc_keypad()... I added libncurses5-dbg, but needed to also add the source in /build to make gdb happy. I noticed that that URL has the source tarball for ncurses-6.0+20160213, but not ncurses-6.0-20160213 -- I don't know if there's any difference between the "+" and "-"... |
I think it's because (include/term.h):
and
So we're both hitting the same problem: cur_term is NULL. And of course that's the type of segfault I'm in:
|
cur_term is set by set_curterm() (at least), which is called with NULL before the crash:
Using bcc to debug bcc. :) |
... need to add a "-s" option to trace.py for printing stack traces. |
using gdb instead:
|
Now in llvm... so FileDescriptorHasColors() calls terminalHasColors(), which: static bool terminalHasColors(int fd) {
[...]
// Now extract the structure allocated by setupterm and free its memory
// through a really silly dance.
struct term *termp = set_curterm((struct term *)nullptr);
(void)del_curterm(termp); // Drop any errors here. Which seems like it would trash cur_term, and leave it set to NULL... |
Here's what that llvm code used to be: static bool terminalHasColors() {
if (const char *term = std::getenv("TERM")) {
// Most modern terminals support ANSI escape sequences for colors.
// We could check terminfo, or have a list of known terms that support
// colors, but that would be overkill.
// The user can always ask for no colors by setting TERM to dumb, or
// using a commandline flag.
return strcmp(term, "dumb") != 0;
}
return false;
} And the change was llvm-mirror/llvm@d485e7b#diff-a4fb6575a290937bc9142e3d7efc8989 |
wow. great sleuthing! |
as expected, this crashed just a little further down the road :) |
I had it crash on me but rebuild my Vm. Can't remember if I had built to On 22 Jul 2016 09:42, "chantra" notifications@github.com wrote:
|
@4ast work around is to redirect stderr to /dev/null. Alternatively... I can add this patch: diff --git a/tools/cachetop.py b/tools/cachetop.py
index b1ea9a6..e18e5c5 100755
--- a/tools/cachetop.py
+++ b/tools/cachetop.py
@@ -21,6 +21,7 @@ from bcc import BPF
import argparse
import curses
+import os
import pwd
import re
import signal
@@ -252,4 +253,7 @@ def parse_arguments():
return args
args = parse_arguments()
+# hacky hacky
+nullfd = os.open('/dev/null', os.O_WRONLY)
+os.dup2(nullfd, 2)
curses.wrapper(handle_loop, args) |
would passing -fno-color-diagnostics to clang solve it as well? |
stderr didn't work for me. Where would I add -fno-color-diagnostics? |
something like this:
completely untested. |
Thanks, it gets further:
(Which I also saw earlier when running it from strace...) I can fix it using: b.attach_kprobe(event=str("add_to_page_cache_lru"), fn_name="do_count")
b.attach_kprobe(event=str("mark_page_accessed"), fn_name="do_count")
b.attach_kprobe(event=str("account_page_dirtied"), fn_name="do_count")
b.attach_kprobe(event=str("mark_buffer_dirty"), fn_name="do_count") Or by deleting:
@chantra you can pick which makes most sense. :) Also, if you update loader.cc, please put a comment in there somewhere like:
Since at some point in the future we might be able to remove it. |
The assert is mine. Removing the unicode import is a good bet for now. I'll make it all better in #623. |
ok, compiled bcc with This is now working for me without redirecting stderr. |
Alike cachestat.py but providing cache stats at the process level.
make interval a positional parameter.
* pass -fno-color-diagnostics to clang * remove unicode import (iovisor#623) * add time to cachetop output * add keybindings to cachetop.8 * add cachetop links to README.md
Tested, works, thanks! I'll merge it. A couple of possible follow-ons for later:
|
misses = 0 | ||
rhits = 0 | ||
whits = 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I was refactoring stuff between cachetop
and cachestats
I realized this should be within theloop on line 90.
The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback iovisor#615 (comment) Tested by running both tools manually.
The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback iovisor#615 (comment) Tested by running both tools manually.
The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback #615 (comment) Tested by running both tools manually.
The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback iovisor#615 (comment) Tested by running both tools manually.
The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback iovisor#615 (comment) Tested by running both tools manually.
Alike cachestat.py but providing cache stats at the process level.