[cachetop] top-like cachestat #615

chantra · 2016-07-14T20:15:28Z

Alike cachestat.py but providing cache stats at the process level.

mcaleavya · 2016-07-14T20:18:45Z

Hey

Can you provide an example file and man page also.

Al
On 14 Jul 2016 21:15, "chantra" notifications@github.com wrote:

Alike cachestat.py but providing cache stats at the process level.

You can view, comment on, or merge this pull request online at:

#615
Commit Summary

[cachetop] top-like cachestat

File Changes

A tools/cachetop.py
https://github.com/iovisor/bcc/pull/615/files#diff-0 (252)

Patch Links:

https://github.com/iovisor/bcc/pull/615.patch

https://github.com/iovisor/bcc/pull/615.diff

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#615, or mute the thread
https://github.com/notifications/unsubscribe/ADPbVHq8IRdkBK-JzmFn7WzARvW9-h4fks5qVpjkgaJpZM4JMzpt
.

chantra · 2016-07-14T20:24:22Z

@mcaleavya will do

goldshtn · 2016-07-15T08:13:23Z

Tested, looks awesome! Also, this is the first tool that uses curses -- and I think it's very appropriate here. This is also the first tool that uses bpf_get_current_uid_gid.

A couple of questions regarding the UI:

Among my other sins, I use a black on white terminal. Is there a way to get the curses UI to respect that? Right now I'm getting grey on black.
With terminal width = 80 the text doesn't quite fit on the screen. Could you possibly reduce the column widths slightly so that it fits?

chantra · 2016-07-15T17:59:57Z

Among my other sins, I use a black on white terminal. Is there a way to get the curses UI to respect that? Right now I'm getting grey on black.

I am just using default curse here. The only part of the initialization done is through the wrapper: http://svn.python.org/projects/python/branches/py3k/Lib/curses/wrapper.py
I guess this is a matter of detecting the terminal's color pair and initializing them properly, however, I would keep this for a separate diff though as this would actually be better of in a separate lib file so other tools could import and reuse the code without going through writing the same boiler plater and hitting the same issues.

With terminal width = 80 the text doesn't quite fit on the screen. Could you possibly reduce the column widths slightly so that it fits?

yes, I will fix that. I am currently just checking the height available but should definitely also check the width. I have a local change that fit perfectly in 80chars, even less after I removed unecesary white-spaces.

brendangregg · 2016-07-15T22:45:21Z

Might just be my weird system, but I get a segfault whenever I run the tool. Looks like it's in libcurses:

Program received signal SIGSEGV, Segmentation fault.
         0x00007ffff34ae40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
(gdb) bt
#0  0x00007ffff34ae40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
#1  0x00007ffff34a27e6 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.5
#2  0x00007ffff349b616 in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.5
#3  0x00007ffff349c325 in wgetch () from /lib/x86_64-linux-gnu/libncursesw.so.5
#4  0x00007ffff36c8ec3 in ?? () from /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
#5  0x00000000004c4d5a in PyEval_EvalFrameEx ()
#6  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#7  0x00000000004def08 in ?? ()
#8  0x00000000004b1153 in PyObject_Call ()
#9  0x00000000004c73ec in PyEval_EvalFrameEx ()
#10 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#11 0x00000000004caf42 in PyEval_EvalFrameEx ()
#12 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#13 0x00000000004c2ba9 in PyEval_EvalCode ()
#14 0x00000000004f20ef in ?? ()
#15 0x00000000004eca72 in PyRun_FileExFlags ()
#16 0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
#17 0x000000000049e18a in Py_Main ()
#18 0x00007ffff7811830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7fffffffdfc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfb8) at ../csu/libc-start.c:291
#19 0x000000000049da19 in _start ()

markdrayton · 2016-07-16T00:33:08Z

tools/cachetop.py

+            if re.match('mark_page_accessed', bpf.ksym(k)) is not None:
+                mpa = v
+                if mpa < 0:
+                    mpa = 0


We could make these lines more succinct, e.g.:

mpa = v if mpa > 0 else 0

though since this code is also in cachestat.py we might want to commit as-is then change both at once.

In fact I guess:

mpa = max(0, v)

is the ultimate in brevity.

Agree with Mark that this piece can be more python-ish.
The rest looks good to me.

markdrayton · 2016-07-16T00:46:14Z

This is great! A few suggestions:

pass the interval as a positional argument. Many other interval-based tools (vmstat, cachestat, memleak) are invoked as toolname [interval [count]] so this would be more consistent.
can you add something to indicate what the current sort column is? After a few < and > I can't remember which column the output is sorted by :)

chantra · 2016-07-16T10:09:41Z

pass the interval as a positional argument. Many other interval-based tools (vmstat, cachestat, memleak) are invoked as toolname [interval [count]] so this would be more consistent.

yeah, this is how I had it originally. I was not the most argparse friendly and I would have had to handle the arguments. Also, I retired the count part as it was less meaningful with a top interface (unless batching is handle, in which case it could be reintroduced).
It seems the *top ecosystem has not agreed on a standard way either :)

can you add something to indicate what the current sort column is? After a few < and > I can't remember which column the output is sorted by :)

Definitely, but I thought of keeping this for a next iteration and keep this first-pass diff as simple as it is.

chantra · 2016-07-16T10:11:26Z

Might just be my weird system, but I get a segfault whenever I run the tool.

@brendangregg what do you run? There is a good chance I am not handling edge cases, that being said it should except within python rather than segfault :)

mcaleavya · 2016-07-16T12:52:14Z

It would also be good to have an option to not have the tool clear the screen with maybe a timestamp so we can look at what process were using the cache historically.

goldshtn · 2016-07-16T12:59:14Z

@mcaleavya: That definitely sounds like something that can be added in v2 :-)

chantra · 2016-07-16T15:41:24Z

It would also be good to have an option to not have the tool clear the screen with maybe a timestamp so we can look at what process were using the cache historically.

@mcaleavya do you mean something like top in batch mode?

I thought about something along the line of keeping a ring buffer with stats and being able to navigate back and forth in time, that would be from within the current run though.
But yes, definitely not for this first pass :)

mcaleavya · 2016-07-16T16:30:25Z

@chantra yes that would be good, even if we can also have a timestamp at the top next to buffers etc then that would make finding things easier when looking at historical output. I had in mind something similar to solaris prstat -c which prints reports below rather than clearing the screen. looking forward to the next update.

Cheers
Al

chantra · 2016-07-18T06:46:42Z

@brendangregg I am also getting a curses segfault when running ubuntu 16.04 from https://atlas.hashicorp.com/bento/boxes/ubuntu-16.04 with 4.6 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-yakkety/

As much as I can tell, this only reproduce when using bcc module and only once BPF code is loaded using b = BPF(text=bpf_text) and the first interaction with curses happen (s = stdscr.getch() here.)

(gdb) bt
#0  0x00007ffff307740d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
#1  0x00007ffff306b7e6 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.5
#2  0x00007ffff3064616 in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.5
#3  0x00007ffff3065325 in wgetch () from /lib/x86_64-linux-gnu/libncursesw.so.5
#4  0x00007ffff3291ec3 in PyCursesWindow_GetCh.lto_priv.109 (self=0x7ffff7f7d198, args=())
    at /build/python2.7-M_HuAO/python2.7-2.7.12/Modules/_cursesmodule.c:853
#5  0x00000000004c41da in call_function (oparg=<optimized out>, pp_stack=0x7fffffffde00) at ../Python/ceval.c:4350
#6  PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#7  0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#8  0x00000000004de9b8 in function_call.lto_priv () at ../Objects/funcobject.c:523
#9  0x00000000004b0de3 in PyObject_Call () at ../Objects/abstract.c:2546
#10 0x00000000004c68e3 in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fffffffe0b8,
    func=<function at remote 0x7ffff34b3d70>) at ../Python/ceval.c:4664
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:3026
#12 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#13 0x00000000004ca3d2 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffe2c0,
    func=<function at remote 0x7ffff360e230>) at ../Python/ceval.c:4445
#14 call_function (oparg=<optimized out>, pp_stack=0x7fffffffe2c0) at ../Python/ceval.c:4370
#15 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#16 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#17 0x00000000004c2089 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:669
#18 0x00000000004f1e6f in run_mod.lto_priv () at ../Python/pythonrun.c:1376
#19 0x00000000004ec802 in PyRun_FileExFlags () at ../Python/pythonrun.c:1362
#20 0x00000000004eaf81 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#21 0x000000000049e36a in Py_Main () at ../Modules/main.c:640
#22 0x00007ffff7811830 in __libc_start_main (main=0x49dcd0 <main>, argc=2, argv=0x7fffffffe708, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe6f8) at ../csu/libc-start.c:291
#23 0x000000000049dbf9 in _start ()

chantra · 2016-07-18T22:30:35Z

redirecting stderr make this work.... @brendangregg does that work for you too?

./tools/cachetop.py -i 1 2> /dev/null

brendangregg · 2016-07-18T23:21:55Z

@chantra yes, that works, thanks!

I'd copy-n-paste a screenshot to prove it, but it cleared the screen. Need -C, --no-clear-screen :-)

Pretty great to see a ncurses example anyway.

chantra · 2016-07-19T16:48:51Z

I'd copy-n-paste a screenshot to prove it, but it cleared the screen. Need -C, --no-clear-screen :-)

that's seems harder than it should with python's ncurses. Looks like I would have to reimplement what endwin does.

brendangregg · 2016-07-20T23:04:26Z

I was trying to debug the segfault without much luck. Do you know of a simple example of Python curses wrapper to try? I have some simple examples of Python curses, but they work, so I'm guessing it might be related to wrapper.

chantra · 2016-07-20T23:36:04Z

@brendangregg, it is actually BPF(text) that triggers the segfault. I did not troubleshoot it any further than that, but without initializing BPF, I dont repro the issue.
I should have said that in my comment before, sorry for that.

brendangregg · 2016-07-21T20:43:19Z

I'd updated some things including bcc, and my segfault stack is now:

Program received signal SIGSEGV, Segmentation fault.
  0x00007ffff3c7855c in _nc_keypad (sp=0xb00590, flag=0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_options.c:362
362         (void) NCURSES_PUTP2_FLUSH("keypad_xmit", keypad_xmit);
(gdb) bt
#0  0x00007ffff3c7855c in _nc_keypad (sp=0xb00590, flag=0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_options.c:362
#1  0x00007ffff36cace5 in ?? () from /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
#2  0x00000000004c4d5a in PyEval_EvalFrameEx ()
#3  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#4  0x00000000004caf42 in PyEval_EvalFrameEx ()
#5  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#6  0x00000000004c2ba9 in PyEval_EvalCode ()
#7  0x00000000004f20ef in ?? ()
#8  0x00000000004eca72 in PyRun_FileExFlags ()
#9  0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
#10 0x000000000049e18a in Py_Main ()
#11 0x00007ffff7811830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7fffffffdfc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfb8) at ../csu/libc-start.c:291
#12 0x000000000049da19 in _start ()

brendangregg · 2016-07-21T21:21:42Z

oh, I was messing with |cat, but that's likely an unrelated error...

chantra · 2016-07-21T21:39:22Z

down the rabbit hole I found after installing: http://packages.ubuntu.com/xenial/libncurses5-dbg

(gdb) bt
#0  ClrBlank (win=0xb095c0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
#1  ClrUpdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1147
#2  doupdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1010
#3  0x00007ffff306b7e6 in wrefresh (win=win@entry=0xb095c0)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_refresh.c:65
#4  0x00007ffff3064499 in recur_wrefresh (win=win@entry=0xb095c0)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:384
#5  0x00007ffff3064616 in _nc_wgetch (win=win@entry=0xb095c0, result=result@entry=0x7fffffffdcf4, use_meta=1)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:491
#6  0x00007ffff3065325 in wgetch (win=0xb095c0)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:672
#7  0x00007ffff3291ec3 in PyCursesWindow_GetCh.lto_priv.109 (self=0x7ffff7f7d1b0, args=())
    at /build/python2.7-M_HuAO/python2.7-2.7.12/Modules/_cursesmodule.c:853
#8  0x00000000004c41da in call_function (oparg=<optimized out>, pp_stack=0x7fffffffde00) at ../Python/ceval.c:435
#9  PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#10 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#11 0x00000000004de9b8 in function_call.lto_priv () at ../Objects/funcobject.c:523
#12 0x00000000004b0de3 in PyObject_Call () at ../Objects/abstract.c:2546
#13 0x00000000004c68e3 in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>,
    pp_stack=0x7fffffffe0b8, func=<function at remote 0x7ffff34b2d70>) at ../Python/ceval.c:4664
#14 PyEval_EvalFrameEx () at ../Python/ceval.c:3026
#15 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#16 0x00000000004ca3d2 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffe2c0,
    func=<function at remote 0x7ffff360e230>) at ../Python/ceval.c:4445
#17 call_function (oparg=<optimized out>, pp_stack=0x7fffffffe2c0) at ../Python/ceval.c:4370
#18 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#19 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#20 0x00000000004c2089 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at ../Python/ceval.c:669
#21 0x00000000004f1e6f in run_mod.lto_priv () at ../Python/pythonrun.c:1376
#22 0x00000000004ec802 in PyRun_FileExFlags () at ../Python/pythonrun.c:1362
#23 0x00000000004eaf81 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#24 0x000000000049e36a in Py_Main () at ../Modules/main.c:640
#25 0x00007ffff7811830 in __libc_start_main (main=0x49dcd0 <main>, argc=2, argv=0x7fffffffe708,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe6f8)
    at ../csu/libc-start.c:291
#26 0x000000000049dbf9 in _start ()

ncurses/tty/tty_update.c

1124
1125 static NCURSES_INLINE NCURSES_CH_T
1126 ClrBlank(NCURSES_SP_DCLx WINDOW *win)
1127 {
1128     NCURSES_CH_T blank = blankchar;
1129     if (back_color_erase)
1130     AddAttr(blank, (AttrOf(BCE_BKGD(SP_PARM, win)) & BCE_ATTRS));
1131     return blank;
1132 }
1133

(gdb) print back_color_erase
No symbol "back_color_erase" in current context.

Attempting a quick local patch of tty_update.c and rebuilding the package to see what happens.

brendangregg · 2016-07-21T21:58:35Z

I see you're still in doupdate(). I'm now always in _nc_keypad()... I added libncurses5-dbg, but needed to also add the source in /build to make gdb happy. I noticed that that URL has the source tarball for ncurses-6.0+20160213, but not ncurses-6.0-20160213 -- I don't know if there's any difference between the "+" and "-"...

brendangregg · 2016-07-21T23:25:30Z

I think it's because (include/term.h):

#define CUR cur_term->type. 
[...]
#define back_color_erase               CUR Booleans[28]  
[...]
#define keypad_local                   CUR Strings[88]

and

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f055f29fb54 in _nc_keypad (sp=0x1c24af0, flag=0) at ../ncurses/./tinfo/lib_options.c:377
377             if (!flag && keypad_local) {
(gdb) print cur_term
$1 = (TERMINAL *) 0x0

So we're both hitting the same problem: cur_term is NULL.

And of course that's the type of segfault I'm in:

(gdb) disas _nc_keypad
Dump of assembler code for function _nc_keypad:
   0x00007ffff3c78540 <+0>: test   %rdi,%rdi
   0x00007ffff3c78543 <+3>: je     0x7ffff3c785bd <_nc_keypad+125>
   0x00007ffff3c78545 <+5>: push   %rbp
   0x00007ffff3c78546 <+6>: push   %rbx
   0x00007ffff3c78547 <+7>: mov    %esi,%ebp
   0x00007ffff3c78549 <+9>: mov    %rdi,%rbx
   0x00007ffff3c7854c <+12>:    sub    $0x8,%rsp
   0x00007ffff3c78550 <+16>:    mov    0x217a69(%rip),%rax        # 0x7ffff3e8ffc0
   0x00007ffff3c78557 <+23>:    test   %esi,%esi
   0x00007ffff3c78559 <+25>:    mov    (%rax),%rax
=> 0x00007ffff3c7855c <+28>:    mov    0x20(%rax),%rax
[...]
(gdb) i r
rax            0x0  0
[...]

brendangregg · 2016-07-21T23:27:22Z

cur_term is set by set_curterm() (at least), which is called with NULL before the crash:

# ./trace.py 'p:tinfo:set_curterm "%d", arg1'
TIME     PID    COMM         FUNC             -
01:00:20 31698  python       set_curterm      38018416
01:00:20 31698  python       set_curterm      38396640
01:00:20 31698  python       set_curterm      39624608
01:00:20 31698  python       set_curterm      0

Using bcc to debug bcc. :)

brendangregg · 2016-07-21T23:28:26Z

... need to add a "-s" option to trace.py for printing stack traces.

brendangregg · 2016-07-21T23:29:27Z

using gdb instead:

(gdb) b set_curterm
Function "set_curterm" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (set_curterm) pending.
(gdb) r cachetop.py
Starting program: /usr/bin/python cachetop.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, set_curterm (termp=termp@entry=0xa818c0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xab59f0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xbe2e80) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) bt
#0  set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
#1  0x00007ffff5a45df5 in llvm::sys::Process::FileDescriptorHasColors(int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#2  0x00007ffff45cbb38 in clang::driver::tools::Clang::ConstructJob(clang::driver::Compilation&, clang::driver::JobAction const&, clang::driver::InputInfo const&, llvm::SmallVector<clang::driver::InputInfo, 4u> const&, llvm::opt::ArgList const&, char const*) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#3  0x00007ffff4570f25 in clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, char const*, bool, bool, char const*, clang::driver::InputInfo&) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#4  0x00007ffff4571481 in clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#5  0x00007ffff45731ca in clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#6  0x00007ffff4397c5a in ebpf::ClangLoader::parse(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >*, std::unique_ptr<std::vector<ebpf::TableDesc, std::allocator<ebpf::TableDesc> >, std::default_delete<std::vector<ebpf::TableDesc, std::allocator<ebpf::TableDesc> > > >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, char const**, int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#7  0x00007ffff4345314 in ebpf::BPFModule::load_cfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, char const**, int) ()
   from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#8  0x00007ffff434ae5e in ebpf::BPFModule::load_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const**, int) ()
   from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#9  0x00007ffff43440c8 in bpf_module_create_c_from_string () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#10 0x00007ffff690ae40 in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#11 0x00007ffff690a8ab in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#12 0x00007ffff6b1a68c in _ctypes_callproc () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#13 0x00007ffff6b1ed82 in ?? () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#14 0x00000000004b1153 in PyObject_Call ()
#15 0x00000000004ca5ca in PyEval_EvalFrameEx ()
#16 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#17 0x00000000004def08 in ?? ()
#18 0x00000000004b1153 in PyObject_Call ()
#19 0x00000000004f4c3e in ?? ()
#20 0x00000000004b1153 in PyObject_Call ()
#21 0x00000000004f49b7 in ?? ()
#22 0x00000000004b6e2c in ?? ()
#23 0x00000000004b1153 in PyObject_Call ()
#24 0x00000000004ca5ca in PyEval_EvalFrameEx ()
#25 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#26 0x00000000004def08 in ?? ()
#27 0x00000000004b1153 in PyObject_Call ()
#28 0x00000000004c73ec in PyEval_EvalFrameEx ()
#29 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#30 0x00000000004caf42 in PyEval_EvalFrameEx ()
#31 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#32 0x00000000004c2ba9 in PyEval_EvalCode ()
#33 0x00000000004f20ef in ?? ()
#34 0x00000000004eca72 in PyRun_FileExFlags ()
#35 0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
---Type <return> to continue, or q <return> to quit---
#36 0x000000000049e18a in Py_Main ()
#37 0x00007ffff7811830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7fffffffdfc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfb8) at ../csu/libc-start.c:291
#38 0x000000000049da19 in _start ()

brendangregg · 2016-07-21T23:36:40Z

Now in llvm... so FileDescriptorHasColors() calls terminalHasColors(), which:

static bool terminalHasColors(int fd) {
[...]
  // Now extract the structure allocated by setupterm and free its memory
  // through a really silly dance.
  struct term *termp = set_curterm((struct term *)nullptr);
  (void)del_curterm(termp); // Drop any errors here.

Which seems like it would trash cur_term, and leave it set to NULL...

brendangregg · 2016-07-22T02:24:50Z

Here's what that llvm code used to be:

static bool terminalHasColors() {
  if (const char *term = std::getenv("TERM")) {
    // Most modern terminals support ANSI escape sequences for colors.
    // We could check terminfo, or have a list of known terms that support
    // colors, but that would be overkill.
    // The user can always ask for no colors by setting TERM to dumb, or
    // using a commandline flag.
    return strcmp(term, "dumb") != 0;
  }
  return false;
}

And the change was llvm-mirror/llvm@d485e7b#diff-a4fb6575a290937bc9142e3d7efc8989

4ast · 2016-07-22T02:27:45Z

wow. great sleuthing!
any ideas for workaround?

chantra · 2016-07-22T08:42:15Z

Attempting a quick local patch of tty_update.c and rebuilding the package to see what happens.

as expected, this crashed just a little further down the road :)

mcaleavya · 2016-07-22T09:10:52Z

I had it crash on me but rebuild my Vm. Can't remember if I had built to
rc7, but looks OK on rc6.

On 22 Jul 2016 09:42, "chantra" notifications@github.com wrote:

Attempting a quick local patch of tty_update.c and rebuilding the package
to see what happens.
as expected, this crashed just a little further down the road :)

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#615 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ADPbVL9aQTaOKlY3XL4IpRBcPS4vac6Zks5qYIJqgaJpZM4JMzpt
.

chantra · 2016-07-22T09:11:51Z

@4ast work around is to redirect stderr to /dev/null. Alternatively... I can add this patch:

diff --git a/tools/cachetop.py b/tools/cachetop.py
index b1ea9a6..e18e5c5 100755
--- a/tools/cachetop.py
+++ b/tools/cachetop.py
@@ -21,6 +21,7 @@ from bcc import BPF

 import argparse
 import curses
+import os
 import pwd
 import re
 import signal
@@ -252,4 +253,7 @@ def parse_arguments():
     return args

 args = parse_arguments()
+# hacky hacky
+nullfd = os.open('/dev/null', os.O_WRONLY)
+os.dup2(nullfd, 2)
 curses.wrapper(handle_loop, args)

4ast · 2016-07-22T16:33:36Z

would passing -fno-color-diagnostics to clang solve it as well?

brendangregg · 2016-07-22T17:36:34Z

stderr didn't work for me. Where would I add -fno-color-diagnostics?

4ast · 2016-07-22T17:42:55Z

something like this:

diff --git a/src/cc/frontends/clang/loader.cc b/src/cc/frontends/clang/loader.cc
index 4b0729df6b66..4c88406eb2e9 100644
--- a/src/cc/frontends/clang/loader.cc
+++ b/src/cc/frontends/clang/loader.cc
@@ -102,6 +102,7 @@ int ClangLoader::parse(unique_ptr<llvm::Module> *mod, unique_ptr<vector<TableDes
   vector<const char *> flags_cstr({"-O0", "-emit-llvm", "-I", dstack.cwd(),
                                    "-Wno-deprecated-declarations",
                                    "-Wno-gnu-variable-sized-type-not-at-end",
+                                   "-fno-color-diagnostics",
                                    "-x", "c", "-c", abs_file.c_str()});

   KBuildHelper kbuild_helper(kdir);

completely untested.

brendangregg · 2016-07-22T18:32:52Z

Thanks, it gets further:

# ./cachetop.py
Traceback (most recent call last):
  File "./cachetop.py", line 255, in <module>
    curses.wrapper(handle_loop, args)
  File "/usr/lib/python2.7/curses/wrapper.py", line 43, in wrapper
    return func(stdscr, *args, **kwds)
  File "./cachetop.py", line 177, in handle_loop
    b.attach_kprobe(event="add_to_page_cache_lru", fn_name="do_count")
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 362, in attach_kprobe
    assert isinstance(event, str), "event must be a string"
AssertionError: event must be a string

(Which I also saw earlier when running it from strace...)

I can fix it using:

    b.attach_kprobe(event=str("add_to_page_cache_lru"), fn_name="do_count")
    b.attach_kprobe(event=str("mark_page_accessed"), fn_name="do_count")
    b.attach_kprobe(event=str("account_page_dirtied"), fn_name="do_count")
    b.attach_kprobe(event=str("mark_buffer_dirty"), fn_name="do_count")

Or by deleting:

from __future__ import unicode_literals

@chantra you can pick which makes most sense. :) Also, if you update loader.cc, please put a comment in there somewhere like:

// -fno-color-diagnostics: this is a workaround for a bug in llvm terminalHasColors() as of
// 22 Jul 2016. Also see bcc #615.

Since at some point in the future we might be able to remove it.

markdrayton · 2016-07-22T18:51:15Z

The assert is mine. Removing the unicode import is a good bet for now. I'll make it all better in #623.

chantra · 2016-07-23T13:38:54Z

ok, compiled bcc with -fno-color-diagnostics, removed unicode import, updated doc.

This is now working for me without redirecting stderr.
Please try and comment feedback, I will squash the commits if needed once we are happy with it.

Alike cachestat.py but providing cache stats at the process level.

make interval a positional parameter.

* pass -fno-color-diagnostics to clang * remove unicode import (iovisor#623) * add time to cachetop output * add keybindings to cachetop.8 * add cachetop links to README.md

brendangregg · 2016-07-24T17:01:06Z

Tested, works, thanks! I'll merge it.

A couple of possible follow-ons for later:

When the command is first run, the screen goes blank as it collects data. Would be nice for it to say "Tracing... Report every 5 seconds.", only once on first startup, so the user isn't look at a blank screen when they first run the command and are wondering if it's working.
Breakdown by file instead of by PID.

chantra · 2016-07-25T15:54:11Z

tools/cachetop.py

+    misses = 0
+    rhits = 0
+    whits = 0
+


As I was refactoring stuff between cachetop and cachestats I realized this should be within theloop on line 90.

Piggyback iovisor#615 (comment)

The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback iovisor#615 (comment) Tested by running both tools manually.

The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback #615 (comment) Tested by running both tools manually.

The current logic was only initializing page accesses, mark dirty.. at the beginning of the method, preventing counters to be ever reset for each PIDs. Piggyback iovisor#615 (comment) Tested by running both tools manually.

markdrayton reviewed Jul 16, 2016
View reviewed changes

chantra force-pushed the cachetop branch from 571a2a2 to 2abcc28 Compare July 19, 2016 14:56

markdrayton mentioned this pull request Jul 22, 2016

Fix string type handling in attach_kprobe() #623

Closed

chantra and others added 3 commits July 23, 2016 15:48

[cachetop] top-like cachestat

a021fd8

Alike cachestat.py but providing cache stats at the process level.

[cachetop] add example and man page.

75dfd5a

make interval a positional parameter.

[cachetop] fix and doc

e159f7e

* pass -fno-color-diagnostics to clang * remove unicode import (iovisor#623) * add time to cachetop output * add keybindings to cachetop.8 * add cachetop links to README.md

chantra force-pushed the cachetop branch from 5c6516b to e159f7e Compare July 23, 2016 13:48

brendangregg merged commit fa5f2f9 into iovisor:master Jul 24, 2016

chantra reviewed Jul 25, 2016
View reviewed changes

chantra added a commit to chantra/bcc that referenced this pull request Jul 28, 2016

[cachetop] fix stats computation per processes.

0a24ad6

Piggyback iovisor#615 (comment)

chantra mentioned this pull request Jul 28, 2016

[cachetop] fix stats computation per processes. #635

Merged

chantra deleted the cachetop branch July 28, 2016 16:35

osandov mentioned this pull request Jul 31, 2016

llvm::SMDiagnostic::print breaks GNU readline osandov/asmase#1

Open

[cachetop] top-like cachestat #615

[cachetop] top-like cachestat #615

Conversation

chantra commented Jul 14, 2016

mcaleavya commented Jul 14, 2016

Alike cachestat.py but providing cache stats at the process level.

chantra commented Jul 14, 2016

goldshtn commented Jul 15, 2016 • edited Loading

chantra commented Jul 15, 2016

brendangregg commented Jul 15, 2016

markdrayton Jul 16, 2016

Choose a reason for hiding this comment

markdrayton Jul 16, 2016

Choose a reason for hiding this comment

4ast Jul 23, 2016

Choose a reason for hiding this comment

markdrayton commented Jul 16, 2016

chantra commented Jul 16, 2016

chantra commented Jul 16, 2016

mcaleavya commented Jul 16, 2016

goldshtn commented Jul 16, 2016

chantra commented Jul 16, 2016

mcaleavya commented Jul 16, 2016

chantra commented Jul 18, 2016

chantra commented Jul 18, 2016 • edited Loading

brendangregg commented Jul 18, 2016

chantra commented Jul 19, 2016 • edited Loading

brendangregg commented Jul 20, 2016

chantra commented Jul 20, 2016

brendangregg commented Jul 21, 2016

brendangregg commented Jul 21, 2016

chantra commented Jul 21, 2016

brendangregg commented Jul 21, 2016

brendangregg commented Jul 21, 2016

brendangregg commented Jul 21, 2016

brendangregg commented Jul 21, 2016

brendangregg commented Jul 21, 2016

brendangregg commented Jul 21, 2016 • edited Loading

brendangregg commented Jul 22, 2016

4ast commented Jul 22, 2016

chantra commented Jul 22, 2016 • edited Loading

mcaleavya commented Jul 22, 2016

chantra commented Jul 22, 2016

4ast commented Jul 22, 2016

brendangregg commented Jul 22, 2016

4ast commented Jul 22, 2016

brendangregg commented Jul 22, 2016

markdrayton commented Jul 22, 2016

chantra commented Jul 23, 2016

brendangregg commented Jul 24, 2016

chantra Jul 25, 2016

Choose a reason for hiding this comment

goldshtn commented Jul 15, 2016 •

edited

Loading

chantra commented Jul 18, 2016 •

edited

Loading

chantra commented Jul 19, 2016 •

edited

Loading

brendangregg commented Jul 21, 2016 •

edited

Loading

chantra commented Jul 22, 2016 •

edited

Loading