Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop hello_world return 0 #139

Closed
brendangregg opened this issue Aug 18, 2015 · 7 comments
Closed

Drop hello_world return 0 #139

brendangregg opened this issue Aug 18, 2015 · 7 comments

Comments

@brendangregg
Copy link
Member

From examples/hello_world.py:

prog = """
int hello(void *ctx) {
  bpf_trace_printk("Hello, World!\\n");
  return 0;
};

Is the int ... return 0 necessary? I don't see how it's handled. If it isn't necessary, it would be good to drop it from the hello world example, and make this void.

@brendangregg
Copy link
Member Author

this has been done. closing ticket.

@drzaeus77
Copy link
Collaborator

Actually, this is probably working on accident. I was delinquent in not responding to this issue with an explanation earlier, and I didn't notice this in the code change a few days back.

It is not advised to return an undefined value, since it will be used by the different kernel hookpoints (socket, kprobe, etc.) in different ways. In the case of socket, the return value is used to denote whether to capture the packet to userspace or not. In the case of kprobe, the return value is used to determine whether the perf event that may be attached to the same kprobe should be recorded or not. So, an undefined return value may cause quirky perf behavior, if they are running concurrently.

I would propose that a void return value should only be allowed for function types that have special handling to mangle the return value to be 0 or whichever sane default is appropriate for that function type. I can add this support for kprobes.

@drzaeus77 drzaeus77 reopened this Sep 11, 2015
@brendangregg
Copy link
Member Author

Not sure I understand what you mean: so the k[ret]probe__ functions could be made void safe, but others should return something properly?

@drzaeus77
Copy link
Collaborator

Let me rephrase. All BPF programs must leave a valid value in R0. All kernel hooks that run a BPF program interpret R0 in some way. If we let BCC allow void programs, it will have to internally stuff in a default return value, since we don't want undefined behavior. The choice of 0 for the default value is the most universal one, and since we are focusing on probes I feel comfortable allowing this default for those types of programs (non-zero is the error/skip case for probes), but not as comfortable for the others. For instance, for SCHED_CLS, the return value currently has a specific meaning depending on how the program is stitched in the tc qdisc/filter/action chain. Additionally, non-kprobe program types usually aren't one-liners, so it isn't a great burden to keep program return type required.

@brendangregg
Copy link
Member Author

I'm fine with returning 0 for now, eg, changing hello_world (and others) back to:

from bcc import BPF

BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()

We'd just need to be clear that it has value and what it does. I'm still not clear what happens if I return 1 in this example. hello_world.py needs to be 100% explainable, and make 100% sense. :)

@drzaeus77
Copy link
Collaborator

I just got a better explanation from Alexei, so now hopefully my explanation makes more sense as well.

Let's try an example with hello_world.

$ python -c "from bcc import BPF; BPF(text='int kprobe__sys_clone(void *ctx) { return 0; }').trace_print()"

Now in another shell:

$ perf list | grep sys_clone
  kprobes:p_sys_clone                                [Tracepoint event]
$ perf record -e kprobes:p_sys_clone -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.360 MB perf.data ]
$ perf report --stdio
Error:
The perf.data file has no samples!

Return 1 instead.

$ python -c "from bcc import BPF; BPF(text='int kprobe__sys_clone(void *ctx) { return 1; }').trace_print()"
$ perf report --stdio
...
# Samples: 47  of event 'kprobes:p_sys_clone'
...
    78.72%  byobu-status  [kernel.vmlinux]  [k] sys_clone
    17.02%  tmux          [kernel.vmlinux]  [k] sys_clone
     4.26%  bash          [kernel.vmlinux]  [k] sys_clone

So what changed? Similar to attaching a bpf program to a socket, a kprobe bpf program acts like a filter for other subscribers to the same event. Returning 0 tells the kprobe infrastructure to return immediately after running the bpf program, whereas non-zero allows the other subscribers to run, thus paying the cost of allocating space in the trace buffer for this event and submitting it (for details see kernel/trace/trace_kprobe.c:kprobe_perf_func()). This last step can be costly, but it can also be powerful.

If we are using bpf programs in a standalone mode, where they do all of the work and coordinate with userspace purely through bpf maps and trace_printks, then it is not necessary or performant to pay the cost of submitting the event to the buffer only to discard it later. Currently this is the case for all of our tools and examples.

Having said that, there is a great future opportunity to use this return value in conjunction with other perf events to intelligently trace with the full power of perf.

I don't think this answer gets towards a 100% explainable and 100% sensical user story, but hopefully it makes sense to you. How should we spin it for the one-liner case without getting into the technical weeds?

@brendangregg
Copy link
Member Author

I'm going to return the "return 0"s, given the explanations above, and close this ticket.

samuelnair added a commit to samuelnair/bcc that referenced this issue Aug 22, 2017
samuelnair added a commit to samuelnair/bcc that referenced this issue Sep 2, 2017
natoscott pushed a commit to performancecopilot/pcp that referenced this issue Jun 1, 2018
Add a USDT module collecting metrics from JVM thread starts/stops.
This module does not require any specific Java parameters to work.
Could be used as an example / reference or for collecting metrics.

Wrt the return value in the BPF program even in the unlikely case
that lookup fails, iovisor/bcc#139 has
explanation on BPF program values and it should be OK for BCC PMDA
modules to use 0 (by default, an admin could of course change it).
FridayOrtiz pushed a commit to redcanaryco/redcanary-ebpf-sensor that referenced this issue Nov 17, 2021
* create `do_exit` specific probes that guarantee exit_exit()

* add a function to grab inode information from task_struct

* grab deeper nested paths, up to 150 directories deep

* return 0 in all cases per iovisor/bcc#139
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants