Go crash with uretprobe #1320
Comments
Are you saying that setting a uprobe somewhere in Go code makes it fail? |
If we replace uretprobe with uprobe in the bcc code, it works. If a uretprobe is used, the process crashes. |
Here's a comment on my go tracing blog post[1] by Suresh Kumar:
[1] http://www.brendangregg.com/blog/2017-01-31/golang-bcc-bpf-function-tracing.html |
@brendangregg thanks!
|
So is there a solution to this problem for Go? Or can you suggest any other way to capture function latency? |
I have been hitting this problem as well: almost every time the runtime decides to shrink or expand the goroutine stack, if there is a uretprobe placed the process will crash because the stack is messed up. For Golang, the solution I'm actually experimenting with is to "simulate" a uretprobe by using a series of uprobes. In particular:
Also, this approach has some mild performance benefits, since we avoid the uretprobe overhead. By tracing a tight loop of a simple function from libc:
The drawback is that we now have to decode instructions in userspace, so it's significantly more annoying than the standard alternative, and it's not currently possible with bcc (I'm working with uprobes in a separate project, so I have my own BPF loader). I would appreciate some feedback on this approach. I am not an expert on this matter and would like to know if I'm missing something important. Thanks |
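For readers following along, the idea above (one uprobe at the function entry plus one uprobe at every return instruction) could be sketched roughly like this. It is not the code from the comment: the `(offset, mnemonic)` listing is a hypothetical input that a real tool would get from a disassembler, and `probe_offsets` and `RETURN_MNEMONICS` are names made up for illustration.

```python
from typing import Iterable, List, Tuple

# Mnemonics treated as a function return on x86-64 in this sketch
# (an assumption; other architectures use different instructions).
RETURN_MNEMONICS = {"ret", "retq", "retn"}


def probe_offsets(insns: Iterable[Tuple[int, str]]) -> Tuple[int, List[int]]:
    """Pick uprobe offsets for one function: its entry and every return.

    `insns` is a hypothetical pre-decoded listing of
    (offset_from_symbol_start, mnemonic) pairs, in address order.
    """
    listing = list(insns)
    if not listing:
        raise ValueError("empty function body")
    entry_offset = listing[0][0]
    return_offsets = [
        offset for offset, mnemonic in listing
        if mnemonic.lower() in RETURN_MNEMONICS
    ]
    return entry_offset, return_offsets


# Toy function body with two exit paths (made-up offsets).
decoded = [
    (0x0, "push"), (0x1, "cmp"), (0x4, "je"),
    (0x6, "pop"), (0x7, "ret"),
    (0x8, "pop"), (0x9, "ret"),
]
entry_offset, return_offsets = probe_offsets(decoded)
print(entry_offset, return_offsets)
```

Each returned offset would then get a plain uprobe, so nothing on the goroutine stack is rewritten.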
Sounds good. But I have to wonder: doesn't the kernel already have the code for walking instructions and finding RETNs for the creation of uretprobes? Could a future solution be a variant or flag for uretprobes so the kernel did this behavior? |
In general, since uretprobes are implemented by overwriting the return address on the stack, the kernel currently doesn't need to walk the function instructions. However, as you correctly point out, the per-arch uprobes code already has pretty comprehensive support for decoding instructions for all the major archs (e.g. I could definitely see a future feature where the kernel would, optionally, directly walk the function and place uprobes at the various return instructions. Userspace would just have to provide the symbol address and likely the symbol length, so a small API change overall. Why hasn't it been done this way in the first place? I have briefly searched the relevant kernel commit logs, lkml archives and the original uprobes paper, but couldn't find a single mention of this approach. Naively I would say because it's more complicated, but it would be interesting to hear a kernel developer's point of view. In my specific case, I was sharing the userspace solution because I am trying to introduce this functionality in sysdig, and being able to support this without any kernel changes would mean much wider user adoption, as opposed to waiting for the feature to be upstreamed and, more painfully, waiting for it to make its way into the various distributions. Thanks for the feedback! |
After spending a bit more time on this, I can easily answer my original question, which might be useful if someone else reads my previous comments. Simply placing uprobes when arch-specific return instructions are encountered is not enough. Even if we forget for a second "corner cases" such as For example, the function
The dynamic symbol table tells us
The function doesn't return itself, it simply jumps into The problem is worse when the tail call optimization is done with a variable function pointer. For example, this is
In this case, there's nothing we can do with static analysis, since the function pointer is obtained at runtime by dereferencing an argument, as we can see from the code:
Placing a uretprobe via stack manipulation seems to be the only feasible way. "Luckily", it seems Golang, when using the default GC compiler, is not currently doing any tail call optimization [1] [2], and in fact I've done a few experiments and even with very simple functions the call stack is properly kept. So, for the moment the original method I proposed seems to work, but it's important to keep in mind it might break when/if the GC compiler decides to introduce the optimization. [1] https://blog.gopheracademy.com/recursion/ |
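To make the tail-call limitation concrete, here is a small sketch of the check a userspace tool could run before trusting the "uprobe on every return instruction" method. It assumes the jumps of a function have already been decoded; `Jump` and `classify_jumps` are hypothetical names, and the addresses are made up.

```python
from typing import Dict, List, NamedTuple, Optional


class Jump(NamedTuple):
    address: int            # address of the jmp instruction
    target: Optional[int]   # static target address, or None if indirect


def classify_jumps(start: int, size: int, jumps: List[Jump]) -> Dict[str, List[int]]:
    """Sort the jumps of the function occupying [start, start + size)."""
    result: Dict[str, List[int]] = {
        "internal": [],          # stays inside the function: harmless
        "direct_tail_call": [],  # leaves the function: an exit with no ret
        "indirect": [],          # target known only at runtime (tail call
                                 # through a pointer, or a jump table)
    }
    end = start + size
    for jump in jumps:
        if jump.target is None:
            result["indirect"].append(jump.address)
        elif start <= jump.target < end:
            result["internal"].append(jump.address)
        else:
            result["direct_tail_call"].append(jump.address)
    return result


jumps = [Jump(0x1010, 0x1004), Jump(0x1020, 0x2000), Jump(0x1030, None)]
report = classify_jumps(start=0x1000, size=0x40, jumps=jumps)
print(report)
```

A function with any entry under `direct_tail_call` or `indirect` can exit without executing one of its own return instructions, so return-site uprobes alone would miss those exits.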
Sorry to reopen this old thread. We ran into this same problem. @gianlucaborello Hi Gianluca, for your suggested workaround in |
Just to get back to this issue: We did what's suggested in #1320: https://github.com/pixie-io/pixie/blob/5960b3535bbed52763dec0bbb6b2bf20b74c23ac/src/stirling/obj_tools/elf_reader.cc#L502:44 AFAIK, the known issues in this approach are:
Additionally, we have not investigated the behavior in exotic cases like recursive functions, which could become sources of additional bugs. |
With respect to this comment:
It seems that, since Go 1.17, arguments are passed in registers, so I don't see a way to get an arbitrary argument from the uprobe inserted before the function return: |
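One possible workaround, sketched here in plain Python rather than BPF, is to snapshot the argument registers in the entry probe and look them up again from the return-site probe. The register order is the amd64 integer-argument sequence as I understand it from Go's internal ABI document. Everything else (the goroutine key, the register dictionaries, the function names) is a hypothetical stand-in for what a BPF map and `pt_regs` would provide.

```python
from typing import Dict, Optional

# Integer argument registers, in order, for Go's register-based calling
# convention on amd64 (Go 1.17+). Other architectures differ, and
# non-integer or large arguments are passed differently.
GO_AMD64_INT_ARG_REGS = ("rax", "rbx", "rcx", "rdi", "rsi", "r8", "r9", "r10", "r11")

# Stand-in for a BPF hash map: goroutine key -> registers seen at entry.
_saved_regs: Dict[int, Dict[str, int]] = {}


def on_entry(goroutine_key: int, regs: Dict[str, int]) -> None:
    """Entry probe: copy the registers while they still hold the arguments."""
    _saved_regs[goroutine_key] = dict(regs)


def arg_at_return(goroutine_key: int, index: int) -> Optional[int]:
    """Return-site probe: fetch and discard the snapshot saved at entry."""
    regs = _saved_regs.pop(goroutine_key, None)
    if regs is None:
        return None
    return regs.get(GO_AMD64_INT_ARG_REGS[index])


# Made-up goroutine key and register values.
on_entry(0xC000001A00, {"rax": 7, "rbx": 9})
second_arg = arg_at_return(0xC000001A00, 1)
missing = arg_at_return(0xC000001A00, 0)
print(second_arg, missing)
```

This keeps one snapshot per key, so a recursive call from the same goroutine would overwrite the outer call's arguments.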
I am attempting to get function latencies from a module written in 'Go'. I am using uprobe and uretprobe in a custom eBPF program to track the function entry and exit respectively.
Below is the system config I am working on:
The issue is that the Go process crashes because of a conflict between the way return probes work and the way Go manages its stack.
I am interested in knowing if there is any solution to this issue or a workaround for it. Does uretprobe work with Go in any of the later versions, or has this issue been fixed in later versions?
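Independently of how the exit is detected (a uretprobe, or uprobes placed on return instructions), the latency bookkeeping itself is just pairing each exit with the most recent unmatched entry. A minimal sketch, assuming events arrive as `(key, timestamp_ns)` where the key identifies the goroutine; `LatencyTracker` is a made-up name, not part of bcc.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional


class LatencyTracker:
    """Pair function-entry and function-exit events to compute latencies."""

    def __init__(self) -> None:
        # One stack of entry timestamps per key, so nested (recursive)
        # calls from the same goroutine are matched innermost-first.
        self._starts: DefaultDict[int, List[int]] = defaultdict(list)
        self.samples_ns: List[int] = []

    def on_entry(self, key: int, ts_ns: int) -> None:
        self._starts[key].append(ts_ns)

    def on_exit(self, key: int, ts_ns: int) -> Optional[int]:
        stack = self._starts.get(key)
        if not stack:
            return None  # exit without a recorded entry: ignore it
        latency = ts_ns - stack.pop()
        self.samples_ns.append(latency)
        return latency


tracker = LatencyTracker()
tracker.on_entry(1, 100)   # outer call
tracker.on_entry(1, 150)   # recursive inner call, same goroutine
inner = tracker.on_exit(1, 170)
outer = tracker.on_exit(1, 300)
orphan = tracker.on_exit(2, 400)
print(inner, outer, orphan)
```

Keying by OS thread ID instead would be unreliable for Go, because a goroutine can move between threads during a call.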