USDT probes #327
Definitely very useful, not sure how to tackle this yet. |
@brendangregg I want to start hacking on this, and was wondering if you know how I can easily simulate a probe that requires a semaphore. I've looked at this SystemTap page and can't see any mention of it. |
BTW, the SystemTap page I linked to above says that you can simply |
@brendangregg can you explain what sema in that script is for? |
@goldshtn you're right, so steps 1 & 2 can just be:
Or we can include our own sdt.h. I haven't compared the two, but I'd guess the SystemTap one has had more work to make it behave well on Linux.
Another thing: on the iovisor call today (11am PST, biweekly) we briefly discussed that there were at least 3 ways to do USDT probes. This way, using uprobes, is the most obvious and immediate solution, and we should go ahead with it. But later on (much later on) we might investigate other approaches in addition to or instead of it, including LD_PRELOAD so that tracing can be user-mode to user-mode, reducing overhead. These other approaches should greatly reduce the overhead of memleak too.
I'll see if I can dig up a semaphore example.
@4ast Why do some probes need semaphores? It's so the program can do some expensive argument preparation only when the probe is enabled. For example:
|
USDT is-enabled example: 1. Create tick-dtrace.d:
2. Then create header and object files:
Note that the generated tick-dtrace.h contains the ENABLED macros and semaphores. Here's the file:
3. Create the target program, tick-main.c:
So the loop1 probe is normal, and loop2 is wrapped in is-enabled. Should help testing. 4. Compile tick-main:
5. Check it has USDT probes:
See NT_STAPSDT. |
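For anyone who wants to do the same check from a script, here is a minimal Python 3 sketch that parses the text printed by readelf -n and collects the stapsdt notes. The function names and the sample text are made up for illustration (the sample is hand-written in the shape readelf prints, not captured from the tick-main binary), and list_usdt_probes assumes binutils is installed.

```python
import re
import subprocess


def parse_readelf_notes(text):
    """Return one dict per stapsdt note found in `readelf -n` output."""
    probes = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Provider:"):
            # A Provider line starts the description of a new probe note.
            current = {"provider": line.split(":", 1)[1].strip()}
            probes.append(current)
        elif current is None:
            continue  # not inside a stapsdt note yet
        elif line.startswith("Name:"):
            current["name"] = line.split(":", 1)[1].strip()
        elif line.startswith("Location:"):
            # Location, Base and Semaphore share one line as hex values.
            for key, value in re.findall(r"(\w+): (0x[0-9a-fA-F]+)", line):
                current[key.lower()] = int(value, 16)
        elif line.startswith("Arguments:"):
            current["arguments"] = line.split(":", 1)[1].strip()
    return probes


def list_usdt_probes(path):
    """Run readelf on a binary and parse its notes (needs binutils)."""
    result = subprocess.run(["readelf", "-n", path], capture_output=True,
                            text=True, check=True)
    return parse_readelf_notes(result.stdout)


# Hand-written sample in the shape readelf prints; not captured output.
SAMPLE = """\
  stapsdt              0x00000030       NT_STAPSDT (SystemTap probe descriptors)
    Provider: demo
    Name: ping
    Location: 0x0000000000400544, Base: 0x00000000004005f0, Semaphore: 0x0000000000000000
    Arguments: -4@%eax
"""

for probe in parse_readelf_notes(SAMPLE):
    print(probe["provider"], probe["name"], hex(probe["location"]))
```

A Semaphore value of zero in the parsed result would indicate a plain probe with no is-enabled check.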
yes, but it looks like steps 1 and 2 are not really necessary. Couldn't such a .h be simplified into a macro, without any need for generation? |
@4ast right |
Cool, thanks for the detailed example. But it looks like the consumer doesn't know if the traced program uses the semaphore or not. So it should increment the semaphore in any case. Correct? |
Sounds correct. Unless there's a check to see if the semaphore is in the compiled binary. Which you might test for anyway by checking if the semaphore address is non-zero (I assume if it's not included in the binary, readelf will show zero). |
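The non-zero check discussed above can be sketched directly against the raw note descriptor. In a stapsdt note the descriptor holds three addresses (probe location, base, semaphore) followed by three NUL-terminated strings (provider, name, argument format). The sketch below assumes a little-endian target and 8-byte addresses by default; parse_stapsdt_desc and uses_semaphore are illustrative names, not part of bcc.

```python
import struct


def parse_stapsdt_desc(desc, addr_size=8):
    """Decode the descriptor bytes of one stapsdt ELF note.

    Assumes a little-endian target; addr_size is 8 for 64-bit binaries
    and 4 for 32-bit ones.
    """
    fmt = "<3Q" if addr_size == 8 else "<3I"
    location, base, semaphore = struct.unpack_from(fmt, desc, 0)
    # The rest is three NUL-terminated strings.
    strings = desc[3 * addr_size:].split(b"\0")
    provider, name, arguments = (s.decode("ascii") for s in strings[:3])
    return {
        "location": location,
        "base": base,
        "semaphore": semaphore,
        "provider": provider,
        "name": name,
        "arguments": arguments,
    }


def uses_semaphore(note):
    """A zero semaphore address means the probe has no is-enabled check."""
    return note["semaphore"] != 0


# Synthetic descriptors built by hand for illustration.
plain = struct.pack("<3Q", 0x400544, 0x4005F0, 0) + b"demo\0loop1\0-4@%eax\0"
gated = struct.pack("<3Q", 0x400560, 0x4005F0, 0x601040) + b"demo\0loop2\0-4@%eax\0"

for desc in (plain, gated):
    note = parse_stapsdt_desc(desc)
    print(note["name"], "needs enabling:", uses_semaphore(note))
```

With this check the consumer only has to touch the target's memory for probes that actually carry a semaphore.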
Yes, I've seen it reported as zero when not using the dtrace-generated infrastructure. BTW, I'm a bit worried about probing things like |
Another thing I just stumbled across: some probe arguments have a format that mentions some global names. For example, in libc I found these probes:
@brendangregg, do you know how we're supposed to resolve this kind of argument? I assume (This output, by the way, is from code I'm planning to integrate into tplist, so you could list USDT probes like tracepoints. For example, |
And here's another interesting phenomenon from my libc -- the same probe name appears multiple times, but at different locations. How would the user specify which probe they are interested in? Or do we probe all the locations?
EDITED: After some testing, this is pretty obvious -- when the probe is used in multiple places in the source program, there are multiple notes for that probe. I also noticed that the arguments don't have to be in the same locations. For example, if the program specifies an immediate value, like 42, the probe's arguments at that location will not use registers, but rather something like 4@$42. The result of this is that the probing program (argdist, trace, etc.) will have to probe each location separately and obtain the arguments in a separate way. |
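To make the per-location argument handling concrete, here is a small Python sketch that parses an argument string such as 8@%r13 8@%rbp or 4@$42. It only covers three operand shapes (register, immediate, and offset-from-register) and reports anything else, such as operands that mention a global name, as "other". The names parse_arg and parse_args and the operand some_global(%rip) are made up for illustration; this is not the parser used by tplist or trace.

```python
import re

ARG_RE = re.compile(r"^(-?\d+)@(\S+)$")      # size@operand
REG_RE = re.compile(r"^%(\w+)$")             # %r13
IMM_RE = re.compile(r"^\$(-?\d+)$")          # $42
MEM_RE = re.compile(r"^(-?\d+)?\(%(\w+)\)$")  # -16(%rbp) or (%rax)


def parse_arg(spec):
    """Parse one 'size@operand' item; a negative size means signed."""
    m = ARG_RE.match(spec)
    if m is None:
        raise ValueError("unrecognized argument spec: %r" % spec)
    size = int(m.group(1))
    operand = m.group(2)
    arg = {"size": abs(size), "signed": size < 0}
    reg = REG_RE.match(operand)
    imm = IMM_RE.match(operand)
    mem = MEM_RE.match(operand)
    if reg:
        arg.update(kind="register", register=reg.group(1))
    elif imm:
        arg.update(kind="immediate", value=int(imm.group(1)))
    elif mem:
        arg.update(kind="memory", offset=int(mem.group(1) or 0),
                   register=mem.group(2))
    else:
        # Anything else (for example an operand naming a global) is
        # passed through untouched for the caller to deal with.
        arg.update(kind="other", text=operand)
    return arg


def parse_args(arguments):
    """Parse the whole space-separated argument string of one location."""
    return [parse_arg(spec) for spec in arguments.split()]


for text in ("8@%r13 8@%rbp", "4@$42", "-8@-16(%rbp)", "8@some_global(%rip)"):
    print(text, "->", parse_args(text))
```

Because each location carries its own argument string, a tool would run this once per location rather than once per probe name.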
All right, I have a version of tplist that can display USDT probes on this branch. Next, I'm going to add support for USDT probes to argdist and trace. I'm thinking of initializing __arg1, __arg2, etc. local variables and having arg1, arg2 etc. expressions provided by the user refer to these. This will be 100% natural for trace, and a new concept for argdist. For example, the user says:
Now, trace knows this probe has two arguments of type u64 in R13 and RBP, and generates the following:
What do you think? |
@goldshtn you might have figured it out, but I assume if a probe is in multiple locations, then they all must be traced at the same time. It's an advantage of static probes: you can define a logical tracepoint, and the end-user doesn't need to know if it's instrumenting one location to satisfy that tracepoint or ten (because of the implementation). I don't know about the global_max_fast and mp_ stuff yet. That trace example looks good. |
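The idea of one logical tracepoint backed by several locations can be sketched as a simple grouping step. The sketch assumes each note has already been parsed into a dict with provider, name, location and arguments keys; the function name and the sample notes are hypothetical.

```python
from collections import defaultdict


def group_by_probe(notes):
    """Map (provider, name) to every (location, arguments) pair for it."""
    groups = defaultdict(list)
    for note in notes:
        key = (note["provider"], note["name"])
        groups[key].append((note["location"], note["arguments"]))
    return dict(groups)


# Made-up notes: the same probe compiled into two call sites, with
# different argument locations at each site.
notes = [
    {"provider": "demo", "name": "ping", "location": 0x400544,
     "arguments": "8@%r13"},
    {"provider": "demo", "name": "ping", "location": 0x4006A0,
     "arguments": "4@$42"},
    {"provider": "demo", "name": "pong", "location": 0x400700,
     "arguments": ""},
]

for (provider, name), sites in group_by_probe(notes).items():
    print("%s:%s has %d location(s)" % (provider, name, len(sites)))
```

The user would then name only demo:ping, and the tool would attach to every location in that group.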
@brendangregg: I have an updated version on my usdt branch. Before I create a PR, do you have any comments on the final syntax? argdist example:
trace example:
|
Looks good. I think we ultimately may want to add long-form versions of these:
and when teaching bcc, people can begin with the long-form, then move onto the shortcuts. Can be added later. tplist needs an _example.txt. Probably my broken system, but when I tried to build it I got llvm errors:
|
I haven't made any changes to the LLVM stuff, so it's probably your system 😺 |
OK, this issue is now mostly taken care of by #451. I still have to figure out what happens to the semaphore address when the probe is in a shared library. I expect the semaphore address to be relative to the library's load address, in which case the enabling program has to take care of it. EDITED: Yes, it's relative, and it seems to be working now. |
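A rough sketch of the relative-address adjustment for a shared library: find where the library is mapped in the target's /proc/PID/maps and add the semaphore address from the note. This assumes the library's lowest mapping corresponds to virtual address 0 in the file, which is the common case for shared objects but is an assumption here. The function names and the sample maps text are made up.

```python
def library_load_base(maps_text, lib_path):
    """Return the lowest start address at which lib_path is mapped."""
    starts = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip() == lib_path:
            starts.append(int(fields[0].split("-")[0], 16))
    if not starts:
        raise LookupError("%s is not mapped" % lib_path)
    return min(starts)


def runtime_semaphore_address(maps_text, lib_path, semaphore_vaddr):
    """Translate a library-relative semaphore address to a process address.

    Assumes the first mapping of the library corresponds to virtual
    address 0 of the shared object.
    """
    return library_load_base(maps_text, lib_path) + semaphore_vaddr


# Hand-written text in the format of /proc/PID/maps, for illustration.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 100 /usr/bin/demo
7f0000000000-7f0000001000 r-xp 00000000 08:01 123 /usr/lib/libdemo.so
7f0000200000-7f0000201000 rw-p 00000000 08:01 123 /usr/lib/libdemo.so
7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0 [stack]
"""

addr = runtime_semaphore_address(SAMPLE_MAPS, "/usr/lib/libdemo.so", 0x200040)
print(hex(addr))
```

For a live process the maps text would come from reading /proc/PID/maps of the traced program.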
@brendangregg I think this can be closed now. |
Figured out how to build Node with USDT probes. And it seems to work -- even though I haven't checked what the arguments mean yet.
|
Even better:
|
awesome! could you do a step-by-step description of how to reproduce the same? |
Yes -- I've written a 1000+ word blog post that I intend to turn into a doc file. Hope to get it done tomorrow. Shows how to compile your own USDT probes (following Brendan's examples) and how to use Node with USDT. |
FWIW, last time I tested the Node.js USDT probes:
readelf should show things like this:
Example server (basic-server.js):
Run using Client can be curl, ab, or node, etc. Looking forward to @goldshtn 's blog post! |
Because my system was lacking a newer GLIB, I switched to node v0.11 (http://nodejs.org/dist/v0.11.0/node-v0.11.0.tar.gz), plus an older version of the HTTP server code. I can now build node, and list probes fine:
But not instrument them:
I used bcc to debug itself:
My |
BTW, I deleted the --skip-aliases from trace.py and src/python/bcc/usdt.py, and it worked:
Arguments didn't work (as well as your example), but that might be my build. |
@4ast @brendangregg And here's the blog post I promised yesterday: http://blogs.microsoft.co.il/sasha/2016/03/30/usdt-probe-support-in-bpfbcc/ |
awesome! |
@4ast: I haven't looked into how <sys/sdt.h> works, but it looks like the hardest part is to generate the information on where to find the arguments (registers, globals, offsets, etc.). Would that be worth replicating in our own sdt.h? @brendangregg: There is another low-hanging fruit, which is tapping into OpenJDK's USDT probes. It seems that there have been USDT markers in OpenJDK for a long time now. Quick example:
|
@goldshtn yes (did you build your own openjdk for that?). So the JVM probes have had limited success: there's some interesting things, like GC, class, and thread activity, but the probes that would be obviously useful -- the Java method ones -- are usually too prohibitive to use due to their overhead. DTrace's overhead, that is. Plus one needs -XX:+ExtendedDTraceProbes to make the method ones available, which adds a lot of overhead straight away (although that might have become dynamic https://blogs.oracle.com/sundararajan/entry/dtrace_java_turning_on_off ). |
@goldshtn I was actually proposing to reuse sdt.h and argument storing into notes, since I can do
and it will generate 'nop' insn and correct notes section without doing 'dtrace -...' steps. |
@4ast: But you would still want that static variable to be enabled externally by the tracing program? How is that different from the current semaphores? |
@brendangregg: I will experiment some more with enabling them dynamically. Still, the ones that are on by default like GC, loader, JNI can be very relevant for production monitoring. |
@brendangregg: forgot to mention, I just did |
Just posted some more examples of using JVM probes: http://blogs.microsoft.co.il/sasha/2016/03/31/probing-the-jvm-with-bpfbcc/ |
@goldshtn enabling a variable externally is technically not different from a semaphore. The difference is that it doesn't need the 'dtrace ..' step and the extra .o |
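Either way, enabling comes down to the tracer changing a small counter in the target's memory. Here is a minimal sketch of that step, assuming the counter is a 16-bit unsigned value (as the sdt.h semaphores are) on a little-endian target. It works on any seekable binary file object; for a real process that would be something like /proc/PID/mem opened for read and write with sufficient privileges, which is an assumption about the setup rather than a description of what bcc does. The name adjust_semaphore is made up.

```python
import io
import struct


def adjust_semaphore(mem, address, delta):
    """Add delta to the 16-bit counter at address and return the new value.

    mem is a seekable binary file object positioned over the target's
    address space. Use delta=+1 to enable and delta=-1 to disable.
    """
    mem.seek(address)
    (value,) = struct.unpack("<H", mem.read(2))
    new_value = value + delta
    if not 0 <= new_value <= 0xFFFF:
        raise ValueError("semaphore would leave the 16-bit range")
    mem.seek(address)
    mem.write(struct.pack("<H", new_value))
    return new_value


# Stand-in for process memory so the sketch runs anywhere.
fake_memory = io.BytesIO(bytes(32))
SEMAPHORE_ADDR = 8  # made-up address inside the fake memory

print(adjust_semaphore(fake_memory, SEMAPHORE_ADDR, +1))  # first tracer
print(adjust_semaphore(fake_memory, SEMAPHORE_ADDR, +1))  # second tracer
print(adjust_semaphore(fake_memory, SEMAPHORE_ADDR, -1))  # one detaches
```

Using a counter rather than a flag lets several tracers enable the same probe without stepping on each other, as long as each one decrements on exit.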
OK, but we would still need the location of that variable recorded in the ELF note, right? |
the stap approach is prone to compile bugs as shown in the example. If we do it kernel style then we can have all probes call single dummy function and pass arguments via standard x64 abi, so args will be in di,si,dx,cx,r8 and no parsing of args necessary, but notes section is still needed to know which sdt probes are there. |
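If arguments were passed through a dummy function call as suggested above, locating them would reduce to a fixed table instead of parsing per-location argument strings. The sketch below maps an argument index to the x86-64 System V integer argument register, using the field names of the kernel's x86-64 struct pt_regs (di, si, dx, cx, r8, r9), and generates illustrative C lines. It is a hypothetical illustration of the idea, not code that bcc generates, and it does not handle arguments beyond the sixth, which the ABI passes on the stack.

```python
# Integer argument registers in x86-64 System V call order, written as
# the matching struct pt_regs field names.
SYSV_INT_ARG_REGS = ("di", "si", "dx", "cx", "r8", "r9")


def arg_register(n):
    """Return the pt_regs field holding the n-th (1-based) integer argument."""
    if not 1 <= n <= len(SYSV_INT_ARG_REGS):
        raise ValueError("only arguments 1..6 are passed in registers")
    return SYSV_INT_ARG_REGS[n - 1]


def generate_arg_reads(count):
    """Build C declarations that read the first count arguments from ctx."""
    return "\n".join(
        "u64 arg%d = ctx->%s;" % (i, arg_register(i))
        for i in range(1, count + 1)
    )


print(generate_arg_reads(3))
```

The notes section would still be needed to discover which probes exist, but the argument string in each note would no longer have to be interpreted.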
It's just that JDK and Node are already instrumented with stap's sdt.h. We could support both I suppose :-) Is that in scope for this issue, or should we open another issue to discuss this? Also, is this something you think is worth investing in at this point in time? |
support for existing stap's sdt is already done and it's absolutely awesome. I think we can push sdt to the next level. |
I have a few questions :-) |
I guess if the check is done at the call site, the function call cost would only be relevant when the probe is enabled, and then it's probably negligible compared to the int 3, kernel transition, BPF execution and so on. But the function call at the call site still makes the calling code a bit bigger. |
@brendangregg: It looks like enabling the ExtendedDTraceProbes flag dynamically is not supported in OpenJDK. I get the following
But the rest of it works fine. I'm going to move on to perf-map support for |
Since USDT does work via trace.py, I want to close this ticket. However, we should have (as part of this ticket or another one) a hello world example. I tried to write one. I think a couple more helper functions would make a big difference. Here is a rough attempt at a basic USDT script:
#!/usr/bin/python
#
# nodejs_http_server    Basic example of node.js USDT tracing.
#                       For Linux, uses BCC, BPF. Embedded C.
#
# USAGE: nodejs_http_server PID
from __future__ import print_function
from bcc import BPF, ProcUtils, USDTReader
import sys
import signal
if len(sys.argv) < 2:
    print("USAGE: nodejs_http_server PID")
    exit()
pid = sys.argv[1]
debug = 0
# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
int do_trace(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid();
    if (pid != PID)
        return 0;
    int __loc_id = 0;
    ARGS
    bpf_trace_printk("%s\\n", arg6);
    return 0;
};
"""
bpf_text = bpf_text.replace('PID', pid)
binary = "node"
probe = "http__server__request"
#
# XXX: the following steps should be simplified, and as much as possible
# moved to the bcc usdt library.
#
# 1. find the binary, and the probe:
path = ProcUtils.which(binary)
if path is None or len(path) == 0:
    print("ERROR: could not find %s path. Exiting." % binary)
    exit()
reader = USDTReader(bin_path=path)
# start with no match so the check below works when the probe is absent
usdtprobe = None
for p in reader.probes:
    if p.name == probe:
        usdtprobe = p
        break
if usdtprobe is None:
    print("ERROR: could not find probe %s. Exiting." % probe)
    exit()
# 2. do probe args
args = usdtprobe.generate_usdt_cases(pid=int(pid))
bpf_text = bpf_text.replace('ARGS', args)
# 3. try to enable the probe, after setting up disable on Ctrl-C
def signal_exit(signal, frame):
    if (usdtprobe.need_enable()):
        usdtprobe.disable(int(pid))
    exit()
try:
    signal.signal(signal.SIGINT, signal_exit)
    if (usdtprobe.need_enable()):
        usdtprobe.enable(int(pid))
except:
    print("ERROR: could not enable probe %s. Exiting." % probe)
    exit()
# 4. attach usdt probe:
if debug:
    print(bpf_text)
b = BPF(text=bpf_text)
for i, location in enumerate(usdtprobe.locations):
    # same USDT probe can appear in more than one code location
    b.attach_uprobe(name=path, addr=location.address,
        fn_name="do_trace", pid=int(pid))
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "ARGS"))
# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
It works. Now here's one proposal for how this could be improved (see the two "XXX"s):
#!/usr/bin/python
#
# nodejs_http_server    Basic example of node.js USDT tracing.
#                       For Linux, uses BCC, BPF. Embedded C.
#
# USAGE: nodejs_http_server PID
from __future__ import print_function
from bcc import BPF, ProcUtils, USDTReader
import sys
import signal
if len(sys.argv) < 2:
    print("USAGE: nodejs_http_server PID")
    exit()
pid = sys.argv[1]
debug = 0
# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
int do_trace(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid();
    if (pid != PID)
        return 0;
    unsigned long long arg6 = bcc_get_usdtarg(6); /* XXX */
    bpf_trace_printk("%s\\n", arg6);
    return 0;
};
"""
bpf_text = bpf_text.replace('PID', pid)
if debug:
    print(bpf_text)
b = BPF(text=bpf_text)
binary = "node"
probe = "http__server__request"
# XXX attach_usdt_probe() finds the binary, probe, enables it if necessary, and
# calls attach_uprobe(). It also adds a signal handler to disable the probe on
# script exit. If anything goes wrong, it returns an error.
if b.attach_usdt_probe(binary=binary, probe=probe, fn_name="do_trace",
        pid=int(pid)) != 0:
    print("ERROR: could not instrument %s:%s. Exiting." % (binary, probe))
    exit()
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "ARGS"))
# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
|
I dig it. The USDT implementation has been fully ported to C++ in #498, and it comes with a lot of extra goodies (including new helpers that would make implementing I'd like to get the old USDT Python APIs working on the new C++ implementation first. Once that's done, I'll get started with your suggestions. The PR could use a couple extra eyes btw. |
Ok, thanks. I need to learn the new lua stuff. /examples/lua is a big help! |
Based on the lua USDT api (#518), I think this is how the Python api could look:
#!/usr/bin/python
#
# nodejs_http_server    Basic example of node.js USDT tracing.
#                       For Linux, uses BCC, BPF. Embedded C.
#
# USAGE: nodejs_http_server PID
from __future__ import print_function
from bcc import BPF, USDT
import sys
if len(sys.argv) < 2:
    print("USAGE: nodejs_http_server PID")
    exit()
pid = sys.argv[1]
debug = 1
# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
int do_trace(struct pt_regs *ctx) {
    uint64_t addr;
    char path[128];
    bpf_usdt_readarg(6, ctx, &addr);
    bpf_probe_read(&path, sizeof(path), (void *)addr);
    bpf_trace_printk("path: %s\\n", path);
    return 0;
};
"""
# enable USDT probe from given PID
u = USDT(pid=pid)
u.enable_probe(event="http__server__request", fn_name="do_trace")
if debug:
    print(u.get_text())
    print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text, usdt=u)
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "ARGS"))
# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
@vmg did you start work porting it to Python already? Looks like there are three things to do:
Is that it? It's a little odd, since normally we create a BPF object and then call attach_kprobe() later. Here we do a USDT enable_probe() and then initialize BPF. Seems backwards. But I get that it's easier to set up this way. CC @goldshtn, who I'm sure is interested as well. |
Here's a start; I'm missing _attach_uprobes() and cleanup.
diff --git a/src/python/bcc/__init__.py b/src/python/bcc/__init__.py
index 048712a..fd0d29a 100644
--- a/src/python/bcc/__init__.py
+++ b/src/python/bcc/__init__.py
@@ -142,7 +142,7 @@ class BPF(object):
                     raise Exception("Could not find file %s" % filename)
         return filename
-    def __init__(self, src_file="", hdr_file="", text=None, cb=None, debug=0, cflags=[]):
+    def __init__(self, src_file="", hdr_file="", text=None, cb=None, usdt=None, debug=0, cflags=[]):
         """Create a a new BPF module with the given source code.
         Note:
@@ -166,6 +166,8 @@ class BPF(object):
         self.tables = {}
         cflags_array = (ct.c_char_p * len(cflags))()
         for i, s in enumerate(cflags): cflags_array[i] = s.encode("ascii")
+        if text and usdt:
+            text = usdt.get_text() + text
         if text:
             self.module = lib.bpf_module_create_c_from_string(text.encode("ascii"),
                     self.debug, cflags_array, len(cflags_array))
@@ -773,5 +775,5 @@ class BPF(object):
         except KeyboardInterrupt:
             exit()
-from .usdt import USDTReader
+from .usdt import USDTReader, USDT
diff --git a/src/python/bcc/usdt.py b/src/python/bcc/usdt.py
index b4e3151..5e8b66d 100644
--- a/src/python/bcc/usdt.py
+++ b/src/python/bcc/usdt.py
@@ -12,6 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
+import ctypes as ct
+from .libbcc import lib
+
 import os
 import struct
 import re
@@ -19,6 +22,21 @@ import re
 from . import BPF
 from . import ProcStat, ProcUtils
+class USDT(object):
+    def __init__(self, pid=None):
+        if pid != None:
+            self.context = lib.bcc_usdt_new_frompid(int(pid))
+            if self.context == 0:
+                raise ValueError("valid pid is required for USDT()")
+
+    def enable_probe(self, probe, fn_name):
+        if lib.bcc_usdt_enable_probe(self.context, probe.encode("ascii"),
+                fn_name.encode("ascii")) < 0:
+            raise ValueError("failed to enable USDT probe: " + probe)
+
+    def get_text(self):
+        return ct.cast(lib.bcc_usdt_genargs(self.context), ct.c_char_p).value
+
 class USDTArgument(object):
     def __init__(self, size, is_signed, location,
And example program:
#!/usr/bin/python
#
# nodejs_http_server    Basic example of node.js USDT tracing.
#                       For Linux, uses BCC, BPF. Embedded C.
#
# USAGE: nodejs_http_server PID
from __future__ import print_function
from bcc import BPF, USDT
import sys
if len(sys.argv) < 2:
    print("USAGE: nodejs_http_server PID")
    exit()
pid = sys.argv[1]
debug = 1
# load BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
int do_trace(struct pt_regs *ctx) {
    uint64_t addr;
    char path[128];
    bpf_usdt_readarg(6, ctx, &addr);
    bpf_probe_read(&path, sizeof(path), (void *)addr);
    bpf_trace_printk("path:%s\\n", path);
    return 0;
};
"""
# enable USDT probe from given PID
u = USDT(pid=pid)
u.enable_probe(probe="http__server__request", fn_name="do_trace")
if debug:
    print(u.get_text())
    print(bpf_text)
# initialize BPF
b = BPF(text=bpf_text, usdt=u)
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "ARGS"))
# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        print("value error")
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
|
@brendangregg: This looks very good, and timely too -- I'm working on generalizing @vmg has some work in progress that he shared over here. I think it's more important to get the basic features in the library -- we can take care of |
Oh cool, so @vmg's is much further along. I'll test it out. oh, so I was missing the libbcc.py stuff, and had to do all the ctype stuff manually... |
Fixed in #624 |
User-level statically defined tracing (USDT) probes have been placed in various applications and runtimes, including Java, Node.js, MySQL, and PostgreSQL. These allow API-stable scripts to be written that do not depend on tracing raw user-level functions (uprobes).
As an example of hacking in USDT tracing using ftrace, see: http://www.brendangregg.com/blog/2015-07-03/hacking-linux-usdt-ftrace.html . The unpublished script I referred to is: https://gist.github.com/brendangregg/f1b3d09c14088522065b
For a simple example to trace:
1. Create tick-dtrace.d:
2. Then create an object file:
3. Create the target program, tick-main.c:
4. Compile tick-main:
5. Check it has USDT probes:
See NT_STAPSDT etc.
This is a basic probe. There is another type, isenabled, which I discussed in the blog post, and requires a semaphore to activate.