Atomicity of Map operation helpers #1521
Not only is it not clear to our users, I think it is not clear to the tool authors either 😉 I know most of our tools would probably have these embedded race conditions. I sort of hope it wouldn't be critical in most cases, but for high-frequency events the collisions on multiple CPUs could be frequent and produce skewed results. My personal feeling is that we should opt for the synchronized alternative and keep the unsafe behavior for per-CPU maps, but it also depends on how bad the overhead would be. E.g. for a map update that happens 1M times per second across 8 CPUs (for something like counting sched:sched_switch per pid), what is the overhead we should expect from running the map updates synchronized?
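To make the trade-off concrete, here is a minimal bcc-style sketch (illustrative names, not an existing tool) of the two alternatives for a single shared counter bumped on every sched_switch:

```c
BPF_ARRAY(switch_count, u64, 1);

TRACEPOINT_PROBE(sched, sched_switch) {
    int key = 0;
    u64 *val = switch_count.lookup(&key);
    if (!val)
        return 0;
    // Unsynchronized: (*val)++; -- two CPUs can read the same old value
    // and one of the increments is silently lost.
    // Synchronized: an atomic add (BPF_XADD) on the shared cell:
    __sync_fetch_and_add(val, 1);
    return 0;
}
```

The atomic add serializes access to the shared cache line across CPUs, which is exactly where the overhead question above comes from.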
Should we have …

The way to do synchronization is through …

It would be nice to have examples / tools for percpu tables. Both … Having just … I'm also not quite sure why we can't have at least … The kernel ships with examples doing just that: …

bcc does have one for …
Agreed that we may want to have some examples (or tools, if racing is a concern) showing how to use a per-cpu array/hash. A typical example is like … If there is potential racing here, then maybe the map …
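A per-CPU counter along those lines might look like this minimal sketch (illustrative names, not an existing bcc example); each CPU increments its own slot, so no synchronization is needed on the BPF side:

```c
BPF_PERCPU_ARRAY(calls, u64, 1);

TRACEPOINT_PROBE(raw_syscalls, sys_enter) {
    int key = 0;
    u64 *val = calls.lookup(&key);
    if (val)
        (*val)++;  // slot is private to the current CPU, a plain increment is safe
    return 0;
}
```

User space then sums the per-CPU values when it reads the table.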
Would it be possible to add a few additional sentences about this to the documentation to raise awareness of the issue? Thanks.

Yes, it would be good if we have volunteers for this. Do you want to give it a try?

I'm very low on time and I'm not sure I would be knowledgeable enough at this point to instruct others.

We'd like to track network traffic on file descriptors. It's obvious that threads on different CPUs would write/send/read/recv() on the same file descriptor all the time. This type of atomic operation would be necessary.

bcc should already support this. Does anybody want to try this and maybe write an example to show how it works in the bcc environment?
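As a starting point, a sketch of such an example (hypothetical names; it counts bytes requested per fd, keyed by fd only for brevity, where a real tool would key by pid and fd) could be:

```c
BPF_HASH(tx_bytes, u32, u64);

TRACEPOINT_PROBE(syscalls, sys_enter_write) {
    u32 fd = args->fd;
    u64 bytes = args->count;         // bytes requested, not bytes actually written
    u64 *total = tx_bytes.lookup(&fd);
    if (!total) {
        u64 zero = 0;
        tx_bytes.update(&fd, &zero); // may lose the create race; re-lookup below
        total = tx_bytes.lookup(&fd);
    }
    if (total)
        __sync_fetch_and_add(total, bytes);  // atomic add on the shared value
    return 0;
}
```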
@yonghong-song I haven't had any success so far with bpf_spin_lock. My kernel version is 5.1.3-arch1-1-ARCH.
@housedhorse that is strange. 5.1 should have bpf_spin_lock. You can run …
@yonghong-song … Edit: Here is a link for convenience.

@housedhorse Aha, I missed this. Could you describe your use case here? If we have a solid use case, we can add …
@yonghong-song I'm maintaining per-executable profiles that need to be created and updated asynchronously.

@housedhorse could you describe in detail why …

@yonghong-song I can't go into too much detail since the work I'm doing is related to a research project. Essentially I have profile structures mapped to unique executables which are potentially being updated by multiple processes (running the same binary) at a time. This means per-cpu tables are not an option here.

@housedhorse no problem. Looks like we do have a use case for tracing. Does anybody want to volunteer to do a kernel patch to extend …?

@yonghong-song What kind of work is required?
In this commit, which added support for bpf_spin_lock/bpf_spin_unlock, you can see what is involved. The two helpers would need to be added to the helper list in kernel/trace/bpf_trace.c; a sketch of such a change follows.
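A rough sketch of what such a patch could look like (hypothetical: the exact function name and surrounding cases vary by kernel version, and it assumes bpf_spin_lock_proto/bpf_spin_unlock_proto are visible to bpf_trace.c):

```c
/* kernel/trace/bpf_trace.c (sketch) */
static const struct bpf_func_proto *
tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
	switch (func_id) {
	/* ... existing cases such as BPF_FUNC_map_lookup_elem ... */
	case BPF_FUNC_spin_lock:
		return &bpf_spin_lock_proto;
	case BPF_FUNC_spin_unlock:
		return &bpf_spin_unlock_proto;
	default:
		return NULL;
	}
}
```

As comes up later in the thread, the verifier also rejects bpf_spin_lock for tracing program types, so that check would have to be relaxed as well.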
@yonghong-song I've created a kernel patch with the changes you specified. However, now I'm getting the following: …

For some reason the bpf_spin_lock struct seems not to be included properly. Here's my diff on Linux 5.3: …
Here's my minimal example:

```c
typedef struct {
    u64 my_data;
    struct bpf_spin_lock my_lock;
} locked_data;

BPF_PERF_OUTPUT(event);
BPF_ARRAY(testificate, locked_data, 1);

TRACEPOINT_PROBE(raw_syscalls, sys_enter)
{
    int key = 0;
    locked_data *d = testificate.lookup(&key);
    return 0;
}
```

Do you have any further guidance to offer?

Edit: I have no problems with the struct on my host OS, only on my VM running the custom kernel. I'm thinking I may have screwed up the configuration somehow.
The map cannot be created. I think the reason is most likely that no BTF (the BPF debug format) is available. BTF is only available with LLVM 9 and later. Could you give the latest bcc source a try? We did not really test BTF in our buildbot because of old LLVM compilers, so I'm not 100% sure whether BTF is broken or not for distros. It would be good if you can check it in your environment.
Thanks very much for the reply. I'll probably have some time to investigate tomorrow.

Just to be clear: bpf_spin_lock() requires BTF.
FWIW, it looks like my distro (Ubuntu 19.10) does not have an llvm that emits the right stuff: …

The llvm version appears to be 10.0, but I get the error above. I'm using bcc from iovisor, not the distro-provided version. Is that causing a mismatch leading to the failure to use a spin lock? Thanks for any help and/or advice.
Could you add …
Thanks. I can see from the debug output (below) that it looks like …

If I continue to have issues after forcing my bcc to use a newer version, I'll follow up. In any case, thanks for your response and help!
Yes, please try llvm 10.

Got it working, which required purging the system of a few old versions of llvm & clang; cmake was picking up one of the older versions on the system. Once only a single version was installed (I used llvm 9, actually), the BTF annotations were correctly emitted and the spin lock didn't cause any problems. Thanks again.
Hi @jsommers, I'm hitting "0 maps not supported in current map section!" Did you face any such issue? @yonghong-song @willfindlay I'm using kernel 5.8 and llvm 11.0. I don't know why this happens. Can you give any advice? Thanks!
Did you use bpf_spin_lock() in a tracing program or a socket filter?

@netedwardwu …
I think so. And below is the bpf_spin_lock patch, also for reference. But I don't know it in much detail.
Got it, I'm trying a workaround. I just added BPF_FUNC_spin_lock in bpf_trace.c. Hope it will work!
I don't think it's a good idea. It means you will get into trouble because of insufficient preemption checks.
There are some major issues with using bpf_spin_lock in tracing programs: …
Yes, all these issues need to be resolved before allowing tracing programs (kprobe, etc.) to use bpf_spin_lock. @wwwzrb could you describe your use case? It would be good to do a deep analysis to see whether bpf_spin_lock() is really needed or not. If it is, we could see how to improve the kernel. @netedwardwu is right: in general, using bpf_spin_lock is not safe for tracing programs. But you could do some experiments if you are aware of the pitfalls I listed above.
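For anyone experimenting despite those pitfalls (on a kernel patched as above, since stock kernels reject bpf_spin_lock in tracing programs), the usage pattern would look roughly like this sketch, reusing the locked_data struct from the earlier minimal example:

```c
typedef struct {
    u64 my_data;
    struct bpf_spin_lock my_lock;
} locked_data;

BPF_ARRAY(testificate, locked_data, 1);

TRACEPOINT_PROBE(raw_syscalls, sys_enter) {
    int key = 0;
    locked_data *d = testificate.lookup(&key);
    if (!d)
        return 0;
    bpf_spin_lock(&d->my_lock);
    d->my_data++;  // critical section: keep it short, make no helper calls,
                   // and unlock on every path out
    bpf_spin_unlock(&d->my_lock);
    return 0;
}
```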
I very much agree with you that wrong usage of bpf_spin_lock can easily deadlock. I'll try to clarify the use case. Thanks for your suggestions. @netedwardwu @yonghong-song

Existing bpf programs usually use BPF_HASH_MAP for key/value storage, updates, and so on. I'm trying to replace the functionality of BPF_HASH_MAP with BPF_ARRAY_MAP, which supports mmap, to reduce the overhead of reading a BPF map from user space, i.e., user apps can access the BPF_ARRAY_MAP directly after mmap without invoking the bpf syscall. However, lots of effort remains before BPF_ARRAY_MAP can be used this way: …

So we may need bpf_spin_lock for concurrent read/update of the index, where using only BPF_XADD/__sync_fetch_and_add may not be sufficient.
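For reference, the user-space side of that approach might look like the following sketch, using the current libbpf API (BPF_F_MMAPABLE needs Linux 5.5+; names and sizes here are illustrative):

```c
#include <stdio.h>
#include <sys/mman.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

int main(void)
{
    const int nents = 1024;

    // Create an array map that the kernel allows to be mmap'ed.
    LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE);
    int fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "mmap_arr",
                            sizeof(__u32), sizeof(__u64), nents, &opts);
    if (fd < 0)
        return 1;

    // Map the values directly; reads no longer need a bpf() syscall.
    __u64 *vals = mmap(NULL, nents * sizeof(__u64), PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (vals == MAP_FAILED)
        return 1;

    printf("entry 0 = %llu\n", (unsigned long long)vals[0]);
    return 0;
}
```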
Why do you need to "reduce the overhead of reading the BPF map from user space"? And in the BPF program there is RCU protection... I don't think you need bpf_spin_lock so far...
I mean that in the case where you have many elements in the BPF map and read frequently, the overhead of the syscalls needed to traverse the whole map will be evident. I'll try whether it works without bpf_spin_lock. By the way, how do I enable BTF support? I'm using kernel 5.8, llvm-10.0.0, and pahole 1.17. BTF is also enabled in the kernel config.
Your kernel version, pahole version, and llvm version sound right.
Yes, I modified the map_lock test case under tools/testing/selftests/bpf to check whether bpf_spin_lock works in a tracing context. I found that the bpf prog loads successfully after adding the bpf_spin_lock func proto in bpf_trace.c and commenting out the verifier part which forbids usage of bpf_spin_lock in tracing progs. However, the kprobe/tracepoint is not invoked at all in our testing. The sample code is attached below: …
We have certain helpers that wrap around, and get rewritten into, separate BPF map syscalls, such as `lookup_or_init` and `increment`; both involve a lookup and a potential update. However, a race condition can occur where BPF programs running on two different CPUs both see an empty result and try to do the create (update), so only one of them succeeds. We are not handling those cases in our rewritten code (no error handling here and here), and it's not clear to users that those helpers (which look like one single operation) are not atomic.
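To make the race concrete, the rewritten sequence behaves roughly like the following sketch (a paraphrase in raw-helper form, not bcc's literal output; the exact update flag may differ):

```c
// Assumed: a hash map with u32 keys and u64 values.
static u64 *lookup_or_init_sketch(void *map, u32 *key)
{
    u64 zero = 0;
    u64 *val = bpf_map_lookup_elem(map, key);
    if (!val) {
        // Two CPUs can both see a miss for the same key. Only one of
        // the creates succeeds; the loser's error is currently ignored.
        bpf_map_update_elem(map, key, &zero, BPF_NOEXIST);
        val = bpf_map_lookup_elem(map, key);  // re-read whichever insert won
    }
    return val;
}
```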
In addition, map value operations such as the one in `increment` can also race, for example when two programs try to increment through the value pointer at the same time. Do you think we should replace those with `__sync_fetch_and_add` if the map is not a per-CPU map? But that would certainly cost some performance...