
Go crash with uretprobe #1320

Open
ggaurav10 opened this issue Aug 25, 2017 · 12 comments

@ggaurav10

I am attempting to get function latencies from a module written in 'Go'. I am using uprobe and uretprobe in a custom eBPF program to track the function entry and exit respectively.
Below is the system config I am working on:

4.4.0-81-generic #104~14.04.1-Ubuntu SMP Wed Jun 14 12:45:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

The issue is that the Go process crashes because of a conflict between the way return probes work and the way Go manages its stack.
I would like to know whether there is a solution or workaround for this. Does uretprobe work with Go in any later kernel versions, or has this issue been fixed?

@4ast
Member

4ast commented Aug 26, 2017

Are you saying that setting a uprobe somewhere in Go code makes it fail?
That is certainly a kernel bug. Please describe the steps to reproduce.
If the uprobe is responsible, it should fail without bcc. Just adding a uprobe
as described here:
https://www.kernel.org/doc/Documentation/trace/uprobetracer.txt
should make it fail.

@shashankmjain

If we replace the uretprobe with a uprobe in the bcc code, it works. If a uretprobe is used, it crashes the process.

@brendangregg
Member

Here's a comment on my go tracing blog post[1] by Suresh Kumar:

Another problem I ran into: the uretprobe seems to place the return probes by modifying the stack, which is in conflict with how Go manages its stack (stacks in Go can grow/shrink at any time; Go does so by copying the entire stack to a new, larger area, adjusting the pointers in the stack to point to the new area, etc.). So if we are doing a uretprobe, and the stack happens to grow (or shrink) at that time, it can lead to Go runtime panics. Please see here for an example panic message: https://github.com/surki/misc/blob/master/go.stp#L32-L58

[1] http://www.brendangregg.com/blog/2017-01-31/golang-bcc-bpf-function-tracing.html

@4ast
Member

4ast commented Aug 27, 2017

@brendangregg thanks!
Indeed that's the case.
A uprobe works, whereas a uretprobe makes C++ exception handling fail:

$ cat throw.cxx
#include <stdexcept>
int foo(int x) { if (x == 1) throw std::runtime_error("oops!"); return x; }
int main(int ac) {
    try { foo(ac); } catch (std::runtime_error&) {}
    return 0;
}
$ clang++ throw.cxx
$ sudo /usr/share/bcc/tools/trace './a.out:_Z3fooi "%d" arg1' &
$ ./a.out
PID    TID    COMM         FUNC             -
2376753 2376753 a.out        _Z3fooi          1
2376841 2376841 a.out        _Z3fooi          1
$ sudo /usr/share/bcc/tools/trace 'r:./a.out:_Z3fooi "%d" retval' &
$ ./a.out
terminate called after throwing an instance of 'std::runtime_error'
  what():  oops!
$ ./a.out 2
PID    TID    COMM         FUNC             -
2413547 2413547 a.out        _Z3fooi          2

cc @yonghong-song @markdrayton @palmtenor

@shashankmjain

shashankmjain commented Aug 27, 2017

So is there a solution to this problem for Golang? Or can you suggest any other means of capturing function latency?

@gianlucaborello

gianlucaborello commented Jul 25, 2018

I have been hitting this problem as well: almost every time the runtime decides to shrink or expand a goroutine stack while a uretprobe is placed, the process crashes because the stack gets corrupted.

For Golang, the solution I'm currently experimenting with is to "simulate" a uretprobe with a series of uprobes. In particular:

  1. Given a Golang binary, parse the ELF symbol table and obtain the address of the symbol we want to trace. If needed, attach a uprobe at that address, for example to record the timestamp at which the function began executing for latency purposes (be careful to key any map by a unique id such as the goroutine id; don't use the tid, since goroutines migrate across threads).

  2. Instead of attaching a uretprobe to the symbol address, read the ELF text section starting at that address and start decoding instructions until the end of the symbol is reached. While scanning, place a uprobe at every instruction that returns from the procedure (e.g. for x86-64 that would be RETN instructions, opcodes 0xC2 and 0xC3). For the symbols I'm interested in, there's always a very manageable number of RETN instructions, in the 1-5 range, so that's reasonable.

  3. When one of the uprobes installed at the previous point triggers, it's effectively as if we were executing a uretprobe, except that we haven't messed with the stack and so the solution is robust enough to not crash when the Go runtime moves the stack (or so it seems). Also, since the uprobe is placed right before the RET instruction, the stack pointer is already conveniently placed at the beginning of the frame, so we have easy access to the input arguments and return values at a constant offset, since they are all stored on the stack in Go.
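Step 2 above can be illustrated with a toy sketch. Note the loud caveat: a real implementation must decode instructions one by one with a proper disassembler (as the comment describes), because the bytes 0xC2/0xC3 can also occur inside the operands of longer instructions; the naive byte scan below is only illustrative.

```python
def find_ret_offsets(code: bytes) -> list:
    """Return the offsets of x86-64 near-return opcodes in `code`.

    0xC3 is RET; 0xC2 imm16 is a RET that also pops imm16 bytes of
    arguments. ILLUSTRATIVE ONLY: a correct tool must walk decoded
    instruction boundaries, not raw bytes, to avoid false positives.
    """
    return [i for i, b in enumerate(code) if b in (0xC2, 0xC3)]

# Machine code for a trivial function:
#   push %rbp; mov %rsp,%rbp; pop %rbp; ret
code = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3])
print(find_ret_offsets(code))  # [5]
```

Each returned offset, added to the symbol's start address, is a candidate uprobe location for step 3.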

Also, this approach has some mild performance benefits, since we avoid the uretprobe overhead. By tracing a tight loop of a simple function from libc:

Non-instrumented function call: 2 ns/call
Instrumented function call with uretprobe: 4 us/call
Instrumented function call with 2 uprobes (at enter + RET instructions): 3 us/call

The drawback is that we now have to decode instructions in userspace, so it's significantly more annoying than the standard alternative, and it's not currently possible with bcc (I'm working with uprobes in a separate project, so I have my own BPF loader).

I would appreciate some feedback on this approach, I am not an expert on this matter and would like to know if I'm missing something important.

Thanks

@brendangregg
Member

Sounds good. But I have to wonder: doesn't the kernel already have the code for walking instructions and finding RETNs for the creation of uretprobes? Could a future solution be a variant or flag for uretprobes so the kernel did this behavior?

@gianlucaborello

In general, since uretprobes are implemented by overwriting the return address on the stack, the kernel currently doesn't need to walk the function instructions.

However, as you correctly point out, the per-arch uprobes code already has a pretty comprehensive support for decoding instructions for all the major archs (e.g. arch_uprobe_analyze_insn() function in the tree), which is currently used mostly for validation and to know how to properly handle the out of line execution of the probed instruction.

I could definitely see a future feature where the kernel would, optionally, directly walk the function and place uprobes at the various return instructions. Userspace would just have to provide the symbol address and likely the symbol length, so a small API change overall.

Why hasn't it been done this way in the first place? I have briefly searched the relevant kernel commit logs, lkml archives and the original uprobes paper, but couldn't find a single mention about this approach. Naively I would say because it's more complicated, but it would be interesting to hear some kernel developer's point of view.

In my specific case, I was sharing the userspace solution because I am trying to introduce this functionality in sysdig, and being able to support this without any kernel changes would mean a much wider user adoption, as opposed to waiting for the feature to be upstreamed and, more painfully, waiting for it to make its way into the various distributions.

Thanks for the feedback!

@gianlucaborello

gianlucaborello commented Nov 26, 2018

Why hasn't it been done this way in the first place?

After spending a bit more time on this, I can easily answer my original question, which might be useful if someone else reads my previous comments.

Simply placing uprobes when arch-specific return instructions are encountered is not enough. Even if we forget for a second "corner cases" such as panic() in Go, we still have to deal with tail call optimizations done by many compilers.

For example, the function gnutls_record_recv from GnuTLS goes like this:

gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
{
	...

	return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA,
				data, data_size, NULL,
				session->internals.record_timeout_ms);
}

The dynamic symbol table tells us gnutls_record_recv is stored at 0x317e0 and is 26 bytes long, so we can inspect the code:

317e0:    44 8b 8f 3c 15 00 00     mov     0x153c(%rdi),%r9d
317e7:    48 89 d1                 mov     %rdx,%rcx
317ea:    45 31 c0                 xor     %r8d,%r8d
317ed:    48 89 f2                 mov     %rsi,%rdx
317f0:    be 17 00 00 00           mov     $0x17,%esi
317f5:    e9 f6 f8 ff ff           jmpq    0x310f0

The function doesn't return itself, it simply jumps into _gnutls_recv_int, which is at 0x310f0, instead of properly calling it and then explicitly returning. This already makes it difficult to proceed, because we don't know if some other function is calling _gnutls_recv_int, so by recursively instrumenting it (which we can do programmatically, by following jumps until we exhaust the search) we might end up instrumenting too much, as opposed to doing the stack manipulation done by uretprobes.
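Following a direct tail jump like the `e9 f6 f8 ff ff` above is mechanical, since the `jmpq rel32` encoding contains a signed displacement relative to the next instruction. A minimal sketch:

```python
import struct

def direct_jmp_target(insn: bytes, insn_addr: int) -> int:
    """Compute the target of an x86-64 `jmpq rel32` (opcode 0xE9).

    The 32-bit displacement is signed, little-endian, and relative to
    the address of the *next* instruction (insn_addr + 5).
    """
    assert len(insn) == 5 and insn[0] == 0xE9
    (rel32,) = struct.unpack("<i", insn[1:5])
    return insn_addr + 5 + rel32

# The tail jump at the end of gnutls_record_recv shown above:
#   317f5: e9 f6 f8 ff ff   jmpq 0x310f0
print(hex(direct_jmp_target(bytes([0xE9, 0xF6, 0xF8, 0xFF, 0xFF]), 0x317F5)))
# 0x310f0
```

This only helps for direct jumps, though; as the next example shows, indirect tail calls through a function pointer defeat static analysis entirely.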

The problem is worse when the tail call optimization is done through a variable function pointer. For example, this is PR_Recv from the NSPR library:

PR_IMPLEMENT(PRInt32) PR_Recv(PRFileDesc *fd, void *buf, PRInt32 amount, PRIntn flags, PRIntervalTime timeout)
{
	return((fd->methods->recv)(fd,buf,amount,flags,timeout));
}

In this case, there's nothing we can do with static analysis, since the function pointer is obtained at runtime by dereferencing an argument, as we can see from the code:

e690:       48 8b 07                mov    (%rdi),%rax
e693:       48 8b 80 88 00 00 00    mov    0x88(%rax),%rax
e69a:       ff e0                   jmpq   *%rax

Placing a uretprobe via stack manipulation seems to be the only feasible way.

"Luckily", it seems Golang, when using the default GC compiler, is not currently doing any tail call optimization [1] [2], and in fact I've done a few experiments and even with very simple functions the call stack is properly kept. So, for the moment the original method I proposed seems to work, but it's important to keep in mind it might break when/if the GC compiler decides to introduce the optimization.

[1] https://blog.gopheracademy.com/recursion/
[2] golang/go#22624

@yzhao1012
Contributor

Sorry to reopen this old thread. We ran into this same problem.

@gianlucaborello Hi Gianluca, for your suggested workaround in
#1320 (comment)
Is the code open-sourced?

@yzhao1012
Contributor

Just to get back to this issue:

We did what's suggested in #1320: https://github.com/pixie-io/pixie/blob/5960b3535bbed52763dec0bbb6b2bf20b74c23ac/src/stirling/obj_tools/elf_reader.cc#L502:44

AFAIK, the known issues in this approach are:

Additionally, we have not investigated the behavior in exotic cases such as recursive functions, which could be a source of additional bugs.

@mariomac

With respect to this comment:

Also, since the uprobe is placed right before the RET instruction, the stack pointer is already conveniently placed at the beginning of the frame, so we have easy access to the input arguments and return values at a constant offset, since they are all stored on the stack in Go.

It seems that since Go 1.17 arguments are passed in registers, so I don't see a way to read an arbitrary argument from a uprobe inserted before the function returns:
https://go.googlesource.com/go/+/refs/heads/dev.regabi/src/cmd/compile/internal-abi.md#function-call-argument-and-result-passing
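To make the concern concrete, here is a small sketch of the register assignment described in the internal ABI document linked above: on amd64, the first nine integer arguments go into a fixed register sequence instead of the stack (this list is taken from that document; treat it as an assumption to verify against your Go version). A uprobe handler would have to read these from pt_regs, and by the time a uprobe at a RET site fires, the compiler may already have clobbered them.

```python
from typing import Optional

# amd64 integer argument register sequence per Go's internal ABI
# document (assumption: verify against the Go version you target).
# Arguments beyond these nine spill to the stack.
GO_AMD64_INT_ARG_REGS = ["rax", "rbx", "rcx", "rdi", "rsi",
                         "r8", "r9", "r10", "r11"]

def go_int_arg_register(n: int) -> Optional[str]:
    """Register holding the n-th (0-based) integer argument under the
    Go 1.17+ register ABI, or None if it is passed on the stack."""
    if 0 <= n < len(GO_AMD64_INT_ARG_REGS):
        return GO_AMD64_INT_ARG_REGS[n]
    return None

print(go_int_arg_register(0), go_int_arg_register(9))  # rax None
```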
