Higher-level language support for tracing #425

Open
brendangregg opened this Issue Mar 3, 2016 · 4 comments



bcc has been used successfully for the creation of many tracing tools, and may be thought of as a powerful and explicit language for tool authors. It is, however, verbose, and involves common code constructs that feel like boilerplate, causing many tools to be dozens of lines of code. A high-level language could simplify tool generation.

A high-level language can also encourage ad hoc analysis: the development of custom instrumentation at the command line (one-liners), beyond the coverage of the tool collection.

The value of a high-level tracing language has been explored with other tracing tools in the past. This issue is to discuss if/why/how we develop a high-level language for BPF, which may or may not be part of bcc.

One such project is already underway: https://github.com/iovisor/ply


I wanted to share my own opinions in a separate comment: I think this is important and nice to have, but I also have mixed feelings about it based on prior experience.

About two decades ago Sun released SymbEL (SE for short), a high-level language for creating performance tools. It was able to access all the different sources of performance data, and provided a simple and consistent API for accessing them. Great. SymbEL was shipped as the SE Toolkit, which contained many useful tools as examples: nx.se for network monitoring, zoom.se as system dashboard, etc. You might be able to guess what happened.

Practically no one learned SymbEL, and everyone used the tools. (I might have been the only non-Sun employee who learned SymbEL and published new code in it.) But SymbEL was successful, in a way -- the SE Toolkit became a common system addition, and customers were using its example tools.

If we can make bcc a simple addition to a system, many people will get value from its tools. It'll be the thing you add (like sysstat) to get biosnoop, execsnoop, ext4slower, tcpretrans, etc. People may use it and not even know about bcc or BPF. And the most important work to help this succeed is the high-priority features (#231, #232, #327, #328).

But I'm not opposed to a high-level language either, and I'll be one of its top users for ad hoc analysis. It won't help as much as you may think for writing tools: the bcc boilerplate is not the hard work I do -- I spend much more time testing, researching, and documenting these tools.

Here's one way we could design such a language: take the existing tools in bcc, and prototype how they should look, which we can do in this github issue. Some should just be one-liners, for example, syncsnoop should really be something like:

bpt -c 'kprobe:sys_sync { printf("%-18.6f sync()\n", timestamp_ns / 1000000); }'

I just called it bpt as short for bytecode probe tracer.
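The one-liner above suggests a simple grammar: a probe specification followed by a braced action block. As a purely illustrative sketch (bpt does not exist; `parse_bpt` and the regex are made up for this comment), a prototype parser could split a program into (probe, action) pairs:

```python
import re

# Hypothetical sketch: split a bpt-style program into "probe { action }"
# clauses. The name parse_bpt and this grammar are illustrative only.
CLAUSE = re.compile(r'(\S+)\s*\{(.*?)\}', re.DOTALL)

def parse_bpt(program):
    """Return a list of (probe, action) pairs from a bpt-style program."""
    return [(probe, action.strip()) for probe, action in CLAUSE.findall(program)]

print(parse_bpt('kprobe:sys_sync { count(); }'))
# → [('kprobe:sys_sync', 'count();')]
```

A real implementation would of course need a proper lexer (nested braces, strings containing `}`), but even this much shows how little syntax the one-liner form needs.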

As a tool/script, it can include the header as well. Using "BEGIN" (from awk):

#!/usr/bin/bpt

BEGIN {
        printf("%-18s CALL\n", "TIME(s)");
}

kprobe:sys_sync {
        printf("%-18.6f sync()\n", timestamp_ns / 1000000);
}
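As in awk, BEGIN runs once before any events, and each probe clause fires per matching event. A toy dispatcher (all names here are made up; events are simulated, not traced) shows the intended semantics:

```python
# Toy sketch of awk-style dispatch for the hypothetical bpt script above:
# the BEGIN action runs once up front, then each event fires its probe's action.
def run(handlers, events):
    """handlers: {probe_name: action_fn}; events: iterable of (probe_name, payload)."""
    out = []
    if "BEGIN" in handlers:
        out.append(handlers["BEGIN"](None))
    for probe, payload in events:
        if probe in handlers:
            out.append(handlers[probe](payload))
    return out

handlers = {
    "BEGIN": lambda _: "%-18s CALL" % "TIME(s)",
    "kprobe:sys_sync": lambda ns: "%-18.6f sync()" % (ns / 1000000),
}
for line in run(handlers, [("kprobe:sys_sync", 1234567890)]):
    print(line)
```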

Anyone want to take a swing at other tools?

Of course there's ply, which is already exploring this.

agentzh commented Jul 31, 2016

+1

@brendangregg Your examples look so much like dtrace's D scripts :)


That syntax is based on awk, and DTrace is also based on awk. I think awk is a pretty good fit, although someone may find/create another language that works even better.

It was suggested on twitter recently to try SQL [1]. Too crazy? Here's the previous example in SQL-ish syntax:

SELECT timestamp_ns / 1000000 AS "%-18s sync()" FROM kprobe:sys_sync

Now bitehist.py (from examples/tracing):

SELECT req->__data_len AS log2_hist FROM kprobe:blk_account_io_completion

They aren't as terrible as I would have guessed. :) I'm not advocating for SQL, but I am advocating to experiment and find something that's great.
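The `log2_hist` aggregation named above is just power-of-two bucketing of a value stream. A plain-Python sketch of what it computes (the function name is mine, not bcc's API):

```python
from collections import Counter

def log2_hist(values):
    """Count values into power-of-two buckets, log2-histogram style."""
    buckets = Counter()
    for v in values:
        # bit_length() - 1 is floor(log2(v)); bucket i covers [2**i, 2**(i+1))
        buckets[max(v, 1).bit_length() - 1] += 1
    return buckets

sizes = [512, 4096, 4096, 8192]   # e.g. sampled I/O sizes
hist = log2_hist(sizes)
for i in sorted(hist):
    print("%8d -> %-8d : %d" % (2**i, 2**(i + 1) - 1, hist[i]))
```

The point is that the heavy lifting is a one-line aggregation, so a query language only needs to name it, not spell it out.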

It's been months since I've commented on this, and in the meantime we've created many new tools, and we have more capabilities. I don't think tool authorship is the biggest problem we'll face. The main problem for many will be what to do with these capabilities. (@agentzh and myself already have a laundry list of things we do in tracing, but I don't think most people do.)

As for the role of a new language: if it's not that necessary for tool generation, then it may be best aimed at supporting ad hoc analysis. Perhaps building on @goldshtn's trace and argdist, which already cover most of the common ad hoc use cases.

[1] https://twitter.com/Kentzo/status/757350220250222594

agentzh commented Aug 4, 2016

@brendangregg Well, SQL is not really that crazy; I think it's a natural thought, actually. A running system is, at least conceptually, a database that changes in real time, and dynamic tracing frameworks provide query capabilities over such special "databases". So SQL is a natural analogy for the query language used here, even though our real-time "databases" may not really be relational but far more complicated and richer. I've been thinking along that line myself for the past few years.

But you know, most of the complexity in our tools lies in walking down complicated data structures in the target process (be it kernel space or user land) and collecting enough information or computing results along the way. So I think a good subset of C in the query language can make such walking code much easier to write, since we often already have such traversal code in the target system, usually in C/C++. I believe such a C subset can be very useful in speeding up the hardest part of tool building; at the least, such parts could be "functions" used from a SQL-like query language. Using a SQL-like language to query such complex raw data structures in the virtual memory space could be a fun research project, but reusing existing imperative C/C++ code snippets is much more practical and usually much easier and faster.

Along this line, I'm planning a Y language (or just ylang) compiler that can compile a mixture of a C subset and some other language features for probe specifications. The mix happens in a single language, unlike the mixture of C and Python/Lua in bcc, which is rather inelegant (no offense to the bcc designers). Such a compiler can generate code targeting different debugging frameworks like GDB/Python, LLDB/Python, SystemTap, and even eBPF (well, right now that's not really possible without disabling a good part of bpf_validate to make eBPF Turing-complete). Recreating a tool for different debugging frameworks has been a big pain: they are quite different in input spec but similar in semantics, and they all have their own pitfalls and limitations. Such artificial differences make the tracing and debugging world a HUGE mess, and my ylang compiler is trying to fix it :)

Well, maybe a bit off topic; just some brainstorming :) I'll start the ylang compiler project pretty soon :)
