Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cachetop] top-like cachestat #615

Merged
merged 3 commits into from
Jul 24, 2016
Merged

Conversation

chantra
Copy link
Contributor

@chantra chantra commented Jul 14, 2016

Alike cachestat.py but providing cache stats at the process level.

@mcaleavya
Copy link
Contributor

Hey

Can you provide an example file and man page also.

Al
On 14 Jul 2016 21:15, "chantra" notifications@github.com wrote:

Alike cachestat.py but providing cache stats at the process level.

You can view, comment on, or merge this pull request online at:

#615
Commit Summary

  • [cachetop] top-like cachestat

File Changes

Patch Links:


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#615, or mute the thread
https://github.com/notifications/unsubscribe/ADPbVHq8IRdkBK-JzmFn7WzARvW9-h4fks5qVpjkgaJpZM4JMzpt
.

@chantra
Copy link
Contributor Author

chantra commented Jul 14, 2016

@mcaleavya will do

@goldshtn
Copy link
Collaborator

goldshtn commented Jul 15, 2016

Tested, looks awesome! Also, this is the first tool that uses curses -- and I think it's very appropriate here. This is also the first tool that uses bpf_get_current_uid_gid.

A couple of questions regarding the UI:

  1. Among my other sins, I use a black on white terminal. Is there a way to get the curses UI to respect that? Right now I'm getting grey on black.

  2. With terminal width = 80 the text doesn't quite fit on the screen. Could you possibly reduce the column widths slightly so that it fits?

@chantra
Copy link
Contributor Author

chantra commented Jul 15, 2016

  1. Among my other sins, I use a black on white terminal. Is there a way to get the curses UI to respect that? Right now I'm getting grey on black.

I am just using default curse here. The only part of the initialization done is through the wrapper: http://svn.python.org/projects/python/branches/py3k/Lib/curses/wrapper.py
I guess this is a matter of detecting the terminal's color pair and initializing them properly, however, I would keep this for a separate diff though as this would actually be better of in a separate lib file so other tools could import and reuse the code without going through writing the same boiler plater and hitting the same issues.

  1. With terminal width = 80 the text doesn't quite fit on the screen. Could you possibly reduce the column widths slightly so that it fits?

yes, I will fix that. I am currently just checking the height available but should definitely also check the width. I have a local change that fit perfectly in 80chars, even less after I removed unecesary white-spaces.

@brendangregg
Copy link
Member

Might just be my weird system, but I get a segfault whenever I run the tool. Looks like it's in libcurses:

Program received signal SIGSEGV, Segmentation fault.
         0x00007ffff34ae40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
(gdb) bt
#0  0x00007ffff34ae40d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
#1  0x00007ffff34a27e6 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.5
#2  0x00007ffff349b616 in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.5
#3  0x00007ffff349c325 in wgetch () from /lib/x86_64-linux-gnu/libncursesw.so.5
#4  0x00007ffff36c8ec3 in ?? () from /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
#5  0x00000000004c4d5a in PyEval_EvalFrameEx ()
#6  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#7  0x00000000004def08 in ?? ()
#8  0x00000000004b1153 in PyObject_Call ()
#9  0x00000000004c73ec in PyEval_EvalFrameEx ()
#10 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#11 0x00000000004caf42 in PyEval_EvalFrameEx ()
#12 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#13 0x00000000004c2ba9 in PyEval_EvalCode ()
#14 0x00000000004f20ef in ?? ()
#15 0x00000000004eca72 in PyRun_FileExFlags ()
#16 0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
#17 0x000000000049e18a in Py_Main ()
#18 0x00007ffff7811830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7fffffffdfc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfb8) at ../csu/libc-start.c:291
#19 0x000000000049da19 in _start ()

if re.match('mark_page_accessed', bpf.ksym(k)) is not None:
mpa = v
if mpa < 0:
mpa = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make these lines more succinct, e.g.:

mpa = v if mpa > 0 else 0

though since this code is also in cachestat.py we might want to commit as-is then change both at once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact I guess:

mpa = max(0, v)

is the ultimate in brevity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Mark that this piece can be more python-ish.
The rest looks good to me.

@markdrayton
Copy link
Contributor

This is great! A few suggestions:

  • pass the interval as a positional argument. Many other interval-based tools (vmstat, cachestat, memleak) are invoked as toolname [interval [count]] so this would be more consistent.
  • can you add something to indicate what the current sort column is? After a few < and > I can't remember which column the output is sorted by :)

@chantra
Copy link
Contributor Author

chantra commented Jul 16, 2016

pass the interval as a positional argument. Many other interval-based tools (vmstat, cachestat, memleak) are invoked as toolname [interval [count]] so this would be more consistent.

yeah, this is how I had it originally. I was not the most argparse friendly and I would have had to handle the arguments. Also, I retired the count part as it was less meaningful with a top interface (unless batching is handle, in which case it could be reintroduced).
It seems the *top ecosystem has not agreed on a standard way either :)

can you add something to indicate what the current sort column is? After a few < and > I can't remember which column the output is sorted by :)

Definitely, but I thought of keeping this for a next iteration and keep this first-pass diff as simple as it is.

@chantra
Copy link
Contributor Author

chantra commented Jul 16, 2016

Might just be my weird system, but I get a segfault whenever I run the tool.

@brendangregg what do you run? There is a good chance I am not handling edge cases, that being said it should except within python rather than segfault :)

@mcaleavya
Copy link
Contributor

It would also be good to have an option to not have the tool clear the screen with maybe a timestamp so we can look at what process were using the cache historically.

@goldshtn
Copy link
Collaborator

@mcaleavya: That definitely sounds like something that can be added in v2 :-)

@chantra
Copy link
Contributor Author

chantra commented Jul 16, 2016

It would also be good to have an option to not have the tool clear the screen with maybe a timestamp so we can look at what process were using the cache historically.

@mcaleavya do you mean something like top in batch mode?

I thought about something along the line of keeping a ring buffer with stats and being able to navigate back and forth in time, that would be from within the current run though.
But yes, definitely not for this first pass :)

@mcaleavya
Copy link
Contributor

@chantra yes that would be good, even if we can also have a timestamp at the top next to buffers etc then that would make finding things easier when looking at historical output. I had in mind something similar to solaris prstat -c which prints reports below rather than clearing the screen. looking forward to the next update.

Cheers
Al

@chantra
Copy link
Contributor Author

chantra commented Jul 18, 2016

@brendangregg I am also getting a curses segfault when running ubuntu 16.04 from https://atlas.hashicorp.com/bento/boxes/ubuntu-16.04 with 4.6 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-yakkety/

As much as I can tell, this only reproduce when using bcc module and only once BPF code is loaded using b = BPF(text=bpf_text) and the first interaction with curses happen (s = stdscr.getch() here.)

(gdb) bt
#0  0x00007ffff307740d in doupdate () from /lib/x86_64-linux-gnu/libncursesw.so.5
#1  0x00007ffff306b7e6 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.5
#2  0x00007ffff3064616 in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.5
#3  0x00007ffff3065325 in wgetch () from /lib/x86_64-linux-gnu/libncursesw.so.5
#4  0x00007ffff3291ec3 in PyCursesWindow_GetCh.lto_priv.109 (self=0x7ffff7f7d198, args=())
    at /build/python2.7-M_HuAO/python2.7-2.7.12/Modules/_cursesmodule.c:853
#5  0x00000000004c41da in call_function (oparg=<optimized out>, pp_stack=0x7fffffffde00) at ../Python/ceval.c:4350
#6  PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#7  0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#8  0x00000000004de9b8 in function_call.lto_priv () at ../Objects/funcobject.c:523
#9  0x00000000004b0de3 in PyObject_Call () at ../Objects/abstract.c:2546
#10 0x00000000004c68e3 in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>, pp_stack=0x7fffffffe0b8,
    func=<function at remote 0x7ffff34b3d70>) at ../Python/ceval.c:4664
#11 PyEval_EvalFrameEx () at ../Python/ceval.c:3026
#12 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#13 0x00000000004ca3d2 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffe2c0,
    func=<function at remote 0x7ffff360e230>) at ../Python/ceval.c:4445
#14 call_function (oparg=<optimized out>, pp_stack=0x7fffffffe2c0) at ../Python/ceval.c:4370
#15 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#16 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#17 0x00000000004c2089 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at ../Python/ceval.c:669
#18 0x00000000004f1e6f in run_mod.lto_priv () at ../Python/pythonrun.c:1376
#19 0x00000000004ec802 in PyRun_FileExFlags () at ../Python/pythonrun.c:1362
#20 0x00000000004eaf81 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#21 0x000000000049e36a in Py_Main () at ../Modules/main.c:640
#22 0x00007ffff7811830 in __libc_start_main (main=0x49dcd0 <main>, argc=2, argv=0x7fffffffe708, init=<optimized out>,
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe6f8) at ../csu/libc-start.c:291
#23 0x000000000049dbf9 in _start ()

@chantra
Copy link
Contributor Author

chantra commented Jul 18, 2016

redirecting stderr make this work.... @brendangregg does that work for you too?

./tools/cachetop.py -i 1 2> /dev/null

@brendangregg
Copy link
Member

@chantra yes, that works, thanks!

I'd copy-n-paste a screenshot to prove it, but it cleared the screen. Need -C, --no-clear-screen :-)

Pretty great to see a ncurses example anyway.

@chantra
Copy link
Contributor Author

chantra commented Jul 19, 2016

I'd copy-n-paste a screenshot to prove it, but it cleared the screen. Need -C, --no-clear-screen :-)

that's seems harder than it should with python's ncurses. Looks like I would have to reimplement what endwin does.

@brendangregg
Copy link
Member

I was trying to debug the segfault without much luck. Do you know of a simple example of Python curses wrapper to try? I have some simple examples of Python curses, but they work, so I'm guessing it might be related to wrapper.

@chantra
Copy link
Contributor Author

chantra commented Jul 20, 2016

@brendangregg, it is actually BPF(text) that triggers the segfault. I did not troubleshoot it any further than that, but without initializing BPF, I dont repro the issue.
I should have said that in my comment before, sorry for that.

@brendangregg
Copy link
Member

I'd updated some things including bcc, and my segfault stack is now:

Program received signal SIGSEGV, Segmentation fault.
  0x00007ffff3c7855c in _nc_keypad (sp=0xb00590, flag=0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_options.c:362
362         (void) NCURSES_PUTP2_FLUSH("keypad_xmit", keypad_xmit);
(gdb) bt
#0  0x00007ffff3c7855c in _nc_keypad (sp=0xb00590, flag=0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_options.c:362
#1  0x00007ffff36cace5 in ?? () from /usr/lib/python2.7/lib-dynload/_curses.x86_64-linux-gnu.so
#2  0x00000000004c4d5a in PyEval_EvalFrameEx ()
#3  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#4  0x00000000004caf42 in PyEval_EvalFrameEx ()
#5  0x00000000004c2e05 in PyEval_EvalCodeEx ()
#6  0x00000000004c2ba9 in PyEval_EvalCode ()
#7  0x00000000004f20ef in ?? ()
#8  0x00000000004eca72 in PyRun_FileExFlags ()
#9  0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
#10 0x000000000049e18a in Py_Main ()
#11 0x00007ffff7811830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7fffffffdfc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfb8) at ../csu/libc-start.c:291
#12 0x000000000049da19 in _start ()

@brendangregg
Copy link
Member

oh, I was messing with |cat, but that's likely an unrelated error...

@chantra
Copy link
Contributor Author

chantra commented Jul 21, 2016

down the rabbit hole I found after installing: http://packages.ubuntu.com/xenial/libncurses5-dbg

(gdb) bt
#0  ClrBlank (win=0xb095c0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1129
#1  ClrUpdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1147
#2  doupdate () at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tty/tty_update.c:1010
#3  0x00007ffff306b7e6 in wrefresh (win=win@entry=0xb095c0)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_refresh.c:65
#4  0x00007ffff3064499 in recur_wrefresh (win=win@entry=0xb095c0)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:384
#5  0x00007ffff3064616 in _nc_wgetch (win=win@entry=0xb095c0, result=result@entry=0x7fffffffdcf4, use_meta=1)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:491
#6  0x00007ffff3065325 in wgetch (win=0xb095c0)
    at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/base/lib_getch.c:672
#7  0x00007ffff3291ec3 in PyCursesWindow_GetCh.lto_priv.109 (self=0x7ffff7f7d1b0, args=())
    at /build/python2.7-M_HuAO/python2.7-2.7.12/Modules/_cursesmodule.c:853
#8  0x00000000004c41da in call_function (oparg=<optimized out>, pp_stack=0x7fffffffde00) at ../Python/ceval.c:435
#9  PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#10 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#11 0x00000000004de9b8 in function_call.lto_priv () at ../Objects/funcobject.c:523
#12 0x00000000004b0de3 in PyObject_Call () at ../Objects/abstract.c:2546
#13 0x00000000004c68e3 in ext_do_call (nk=<optimized out>, na=<optimized out>, flags=<optimized out>,
    pp_stack=0x7fffffffe0b8, func=<function at remote 0x7ffff34b2d70>) at ../Python/ceval.c:4664
#14 PyEval_EvalFrameEx () at ../Python/ceval.c:3026
#15 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#16 0x00000000004ca3d2 in fast_function (nk=0, na=<optimized out>, n=<optimized out>, pp_stack=0x7fffffffe2c0,
    func=<function at remote 0x7ffff360e230>) at ../Python/ceval.c:4445
#17 call_function (oparg=<optimized out>, pp_stack=0x7fffffffe2c0) at ../Python/ceval.c:4370
#18 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#19 0x00000000004c22e5 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#20 0x00000000004c2089 in PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>)
    at ../Python/ceval.c:669
#21 0x00000000004f1e6f in run_mod.lto_priv () at ../Python/pythonrun.c:1376
#22 0x00000000004ec802 in PyRun_FileExFlags () at ../Python/pythonrun.c:1362
#23 0x00000000004eaf81 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#24 0x000000000049e36a in Py_Main () at ../Modules/main.c:640
#25 0x00007ffff7811830 in __libc_start_main (main=0x49dcd0 <main>, argc=2, argv=0x7fffffffe708,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe6f8)
    at ../csu/libc-start.c:291
#26 0x000000000049dbf9 in _start ()

ncurses/tty/tty_update.c

1124
1125 static NCURSES_INLINE NCURSES_CH_T
1126 ClrBlank(NCURSES_SP_DCLx WINDOW *win)
1127 {
1128     NCURSES_CH_T blank = blankchar;
1129     if (back_color_erase)
1130     AddAttr(blank, (AttrOf(BCE_BKGD(SP_PARM, win)) & BCE_ATTRS));
1131     return blank;
1132 }
1133

(gdb) print back_color_erase
No symbol "back_color_erase" in current context.

Attempting a quick local patch of tty_update.c and rebuilding the package to see what happens.

@brendangregg
Copy link
Member

I see you're still in doupdate(). I'm now always in _nc_keypad()... I added libncurses5-dbg, but needed to also add the source in /build to make gdb happy. I noticed that that URL has the source tarball for ncurses-6.0+20160213, but not ncurses-6.0-20160213 -- I don't know if there's any difference between the "+" and "-"...

@brendangregg
Copy link
Member

I think it's because (include/term.h):

#define CUR cur_term->type. 
[...]
#define back_color_erase               CUR Booleans[28]  
[...]
#define keypad_local                   CUR Strings[88]     

and

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f055f29fb54 in _nc_keypad (sp=0x1c24af0, flag=0) at ../ncurses/./tinfo/lib_options.c:377
377             if (!flag && keypad_local) {
(gdb) print cur_term
$1 = (TERMINAL *) 0x0

So we're both hitting the same problem: cur_term is NULL.

And of course that's the type of segfault I'm in:

(gdb) disas _nc_keypad
Dump of assembler code for function _nc_keypad:
   0x00007ffff3c78540 <+0>: test   %rdi,%rdi
   0x00007ffff3c78543 <+3>: je     0x7ffff3c785bd <_nc_keypad+125>
   0x00007ffff3c78545 <+5>: push   %rbp
   0x00007ffff3c78546 <+6>: push   %rbx
   0x00007ffff3c78547 <+7>: mov    %esi,%ebp
   0x00007ffff3c78549 <+9>: mov    %rdi,%rbx
   0x00007ffff3c7854c <+12>:    sub    $0x8,%rsp
   0x00007ffff3c78550 <+16>:    mov    0x217a69(%rip),%rax        # 0x7ffff3e8ffc0
   0x00007ffff3c78557 <+23>:    test   %esi,%esi
   0x00007ffff3c78559 <+25>:    mov    (%rax),%rax
=> 0x00007ffff3c7855c <+28>:    mov    0x20(%rax),%rax
[...]
(gdb) i r
rax            0x0  0
[...]

@brendangregg
Copy link
Member

cur_term is set by set_curterm() (at least), which is called with NULL before the crash:

# ./trace.py 'p:tinfo:set_curterm "%d", arg1'
TIME     PID    COMM         FUNC             -
01:00:20 31698  python       set_curterm      38018416
01:00:20 31698  python       set_curterm      38396640
01:00:20 31698  python       set_curterm      39624608
01:00:20 31698  python       set_curterm      0

Using bcc to debug bcc. :)

@brendangregg
Copy link
Member

... need to add a "-s" option to trace.py for printing stack traces.

@brendangregg
Copy link
Member

using gdb instead:

(gdb) b set_curterm
Function "set_curterm" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (set_curterm) pending.
(gdb) r cachetop.py
Starting program: /usr/bin/python cachetop.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, set_curterm (termp=termp@entry=0xa818c0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xab59f0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=termp@entry=0xbe2e80) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) c
Continuing.

Breakpoint 1, set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
80  {
(gdb) bt
#0  set_curterm (termp=0x0) at /build/ncurses-pKZ1BN/ncurses-6.0+20160213/ncurses/tinfo/lib_cur_term.c:80
#1  0x00007ffff5a45df5 in llvm::sys::Process::FileDescriptorHasColors(int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#2  0x00007ffff45cbb38 in clang::driver::tools::Clang::ConstructJob(clang::driver::Compilation&, clang::driver::JobAction const&, clang::driver::InputInfo const&, llvm::SmallVector<clang::driver::InputInfo, 4u> const&, llvm::opt::ArgList const&, char const*) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#3  0x00007ffff4570f25 in clang::driver::Driver::BuildJobsForAction(clang::driver::Compilation&, clang::driver::Action const*, clang::driver::ToolChain const*, char const*, bool, bool, char const*, clang::driver::InputInfo&) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#4  0x00007ffff4571481 in clang::driver::Driver::BuildJobs(clang::driver::Compilation&) const () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#5  0x00007ffff45731ca in clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#6  0x00007ffff4397c5a in ebpf::ClangLoader::parse(std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >*, std::unique_ptr<std::vector<ebpf::TableDesc, std::allocator<ebpf::TableDesc> >, std::default_delete<std::vector<ebpf::TableDesc, std::allocator<ebpf::TableDesc> > > >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, char const**, int) () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#7  0x00007ffff4345314 in ebpf::BPFModule::load_cfile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, char const**, int) ()
   from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#8  0x00007ffff434ae5e in ebpf::BPFModule::load_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const**, int) ()
   from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#9  0x00007ffff43440c8 in bpf_module_create_c_from_string () from /usr/lib/x86_64-linux-gnu/libbcc.so.0
#10 0x00007ffff690ae40 in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#11 0x00007ffff690a8ab in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#12 0x00007ffff6b1a68c in _ctypes_callproc () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#13 0x00007ffff6b1ed82 in ?? () from /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so
#14 0x00000000004b1153 in PyObject_Call ()
#15 0x00000000004ca5ca in PyEval_EvalFrameEx ()
#16 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#17 0x00000000004def08 in ?? ()
#18 0x00000000004b1153 in PyObject_Call ()
#19 0x00000000004f4c3e in ?? ()
#20 0x00000000004b1153 in PyObject_Call ()
#21 0x00000000004f49b7 in ?? ()
#22 0x00000000004b6e2c in ?? ()
#23 0x00000000004b1153 in PyObject_Call ()
#24 0x00000000004ca5ca in PyEval_EvalFrameEx ()
#25 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#26 0x00000000004def08 in ?? ()
#27 0x00000000004b1153 in PyObject_Call ()
#28 0x00000000004c73ec in PyEval_EvalFrameEx ()
#29 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#30 0x00000000004caf42 in PyEval_EvalFrameEx ()
#31 0x00000000004c2e05 in PyEval_EvalCodeEx ()
#32 0x00000000004c2ba9 in PyEval_EvalCode ()
#33 0x00000000004f20ef in ?? ()
#34 0x00000000004eca72 in PyRun_FileExFlags ()
#35 0x00000000004eb1f1 in PyRun_SimpleFileExFlags ()
---Type <return> to continue, or q <return> to quit---
#36 0x000000000049e18a in Py_Main ()
#37 0x00007ffff7811830 in __libc_start_main (main=0x49daf0 <main>, argc=2, argv=0x7fffffffdfc8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfb8) at ../csu/libc-start.c:291
#38 0x000000000049da19 in _start ()

@brendangregg
Copy link
Member

brendangregg commented Jul 21, 2016

Now in llvm... so FileDescriptorHasColors() calls terminalHasColors(), which:

static bool terminalHasColors(int fd) {
[...]
  // Now extract the structure allocated by setupterm and free its memory
  // through a really silly dance.
  struct term *termp = set_curterm((struct term *)nullptr);
  (void)del_curterm(termp); // Drop any errors here.

Which seems like it would trash cur_term, and leave it set to NULL...

@brendangregg
Copy link
Member

Here's what that llvm code used to be:

static bool terminalHasColors() {
  if (const char *term = std::getenv("TERM")) {
    // Most modern terminals support ANSI escape sequences for colors.
    // We could check terminfo, or have a list of known terms that support
    // colors, but that would be overkill.
    // The user can always ask for no colors by setting TERM to dumb, or
    // using a commandline flag.
    return strcmp(term, "dumb") != 0;
  }
  return false;
}

And the change was llvm-mirror/llvm@d485e7b#diff-a4fb6575a290937bc9142e3d7efc8989

@4ast
Copy link
Member

4ast commented Jul 22, 2016

wow. great sleuthing!
any ideas for workaround?

@chantra
Copy link
Contributor Author

chantra commented Jul 22, 2016

Attempting a quick local patch of tty_update.c and rebuilding the package to see what happens.

as expected, this crashed just a little further down the road :)

@mcaleavya
Copy link
Contributor

I had it crash on me but rebuild my Vm. Can't remember if I had built to
rc7, but looks OK on rc6.

On 22 Jul 2016 09:42, "chantra" notifications@github.com wrote:

Attempting a quick local patch of tty_update.c and rebuilding the package
to see what happens.
as expected, this crashed just a little further down the road :)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#615 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/ADPbVL9aQTaOKlY3XL4IpRBcPS4vac6Zks5qYIJqgaJpZM4JMzpt
.

@chantra
Copy link
Contributor Author

chantra commented Jul 22, 2016

@4ast work around is to redirect stderr to /dev/null. Alternatively... I can add this patch:

diff --git a/tools/cachetop.py b/tools/cachetop.py
index b1ea9a6..e18e5c5 100755
--- a/tools/cachetop.py
+++ b/tools/cachetop.py
@@ -21,6 +21,7 @@ from bcc import BPF

 import argparse
 import curses
+import os
 import pwd
 import re
 import signal
@@ -252,4 +253,7 @@ def parse_arguments():
     return args

 args = parse_arguments()
+# hacky hacky
+nullfd = os.open('/dev/null', os.O_WRONLY)
+os.dup2(nullfd, 2)
 curses.wrapper(handle_loop, args)

@4ast
Copy link
Member

4ast commented Jul 22, 2016

would passing -fno-color-diagnostics to clang solve it as well?

@brendangregg
Copy link
Member

stderr didn't work for me. Where would I add -fno-color-diagnostics?

@4ast
Copy link
Member

4ast commented Jul 22, 2016

something like this:

diff --git a/src/cc/frontends/clang/loader.cc b/src/cc/frontends/clang/loader.cc
index 4b0729df6b66..4c88406eb2e9 100644
--- a/src/cc/frontends/clang/loader.cc
+++ b/src/cc/frontends/clang/loader.cc
@@ -102,6 +102,7 @@ int ClangLoader::parse(unique_ptr<llvm::Module> *mod, unique_ptr<vector<TableDes
   vector<const char *> flags_cstr({"-O0", "-emit-llvm", "-I", dstack.cwd(),
                                    "-Wno-deprecated-declarations",
                                    "-Wno-gnu-variable-sized-type-not-at-end",
+                                   "-fno-color-diagnostics",
                                    "-x", "c", "-c", abs_file.c_str()});

   KBuildHelper kbuild_helper(kdir);

completely untested.

@brendangregg
Copy link
Member

Thanks, it gets further:

# ./cachetop.py
Traceback (most recent call last):
  File "./cachetop.py", line 255, in <module>
    curses.wrapper(handle_loop, args)
  File "/usr/lib/python2.7/curses/wrapper.py", line 43, in wrapper
    return func(stdscr, *args, **kwds)
  File "./cachetop.py", line 177, in handle_loop
    b.attach_kprobe(event="add_to_page_cache_lru", fn_name="do_count")
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 362, in attach_kprobe
    assert isinstance(event, str), "event must be a string"
AssertionError: event must be a string

(Which I also saw earlier when running it from strace...)

I can fix it using:

    b.attach_kprobe(event=str("add_to_page_cache_lru"), fn_name="do_count")
    b.attach_kprobe(event=str("mark_page_accessed"), fn_name="do_count")
    b.attach_kprobe(event=str("account_page_dirtied"), fn_name="do_count")
    b.attach_kprobe(event=str("mark_buffer_dirty"), fn_name="do_count")

Or by deleting:

from __future__ import unicode_literals

@chantra you can pick which makes most sense. :) Also, if you update loader.cc, please put a comment in there somewhere like:

// -fno-color-diagnostics: this is a workaround for a bug in llvm terminalHasColors() as of
// 22 Jul 2016. Also see bcc #615.

Since at some point in the future we might be able to remove it.

@markdrayton
Copy link
Contributor

The assert is mine. Removing the unicode import is a good bet for now. I'll make it all better in #623.

@chantra
Copy link
Contributor Author

chantra commented Jul 23, 2016

ok, compiled bcc with -fno-color-diagnostics, removed unicode import, updated doc.

This is now working for me without redirecting stderr.
Please try and comment feedback, I will squash the commits if needed once we are happy with it.

chantra and others added 3 commits July 23, 2016 15:48
Alike cachestat.py but providing cache stats at the process level.
make interval a positional parameter.
* pass -fno-color-diagnostics to clang
* remove unicode import (iovisor#623)
* add time to cachetop output
* add keybindings to cachetop.8
* add cachetop links to README.md
@brendangregg
Copy link
Member

Tested, works, thanks! I'll merge it.

A couple of possible follow-ons for later:

  • When the command is first run, the screen goes blank as it collects data. Would be nice for it to say "Tracing... Report every 5 seconds.", only once on first startup, so the user isn't look at a blank screen when they first run the command and are wondering if it's working.
  • Breakdown by file instead of by PID.

@brendangregg brendangregg merged commit fa5f2f9 into iovisor:master Jul 24, 2016
misses = 0
rhits = 0
whits = 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was refactoring stuff between cachetop and cachestats I realized this should be within theloop on line 90.

chantra added a commit to chantra/bcc that referenced this pull request Jul 28, 2016
chantra added a commit to chantra/bcc that referenced this pull request Jul 28, 2016
The current logic was only initializing page accesses, mark dirty.. at
the beginning of the method, preventing counters to be ever reset for
each PIDs.

Piggyback iovisor#615 (comment)

Tested by running both tools manually.
@chantra chantra deleted the cachetop branch July 28, 2016 16:35
chantra added a commit to chantra/bcc that referenced this pull request Jul 29, 2016
The current logic was only initializing page accesses, mark dirty.. at
the beginning of the method, preventing counters to be ever reset for
each PIDs.

Piggyback iovisor#615 (comment)

Tested by running both tools manually.
4ast pushed a commit that referenced this pull request Jul 29, 2016
The current logic was only initializing page accesses, mark dirty.. at
the beginning of the method, preventing counters to be ever reset for
each PIDs.

Piggyback #615 (comment)

Tested by running both tools manually.
palmtenor pushed a commit to palmtenor/bcc that referenced this pull request Jul 31, 2016
The current logic was only initializing page accesses, mark dirty.. at
the beginning of the method, preventing counters to be ever reset for
each PIDs.

Piggyback iovisor#615 (comment)

Tested by running both tools manually.
chantra added a commit to chantra/bcc that referenced this pull request Sep 16, 2016
The current logic was only initializing page accesses, mark dirty.. at
the beginning of the method, preventing counters to be ever reset for
each PIDs.

Piggyback iovisor#615 (comment)

Tested by running both tools manually.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants