Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crashes caused by improper error handling on bpf_look/update/delete_elem #2623

Merged
merged 2 commits into from Jun 3, 2023

Conversation

maokelong
Copy link

@maokelong maokelong commented May 24, 2023

bpf_*_elem returns -errno when error occured. These functions will return -ENOENT(-2) when they failed to find a entry. It's a harmless error that could be found when doing delete and print/zero/clear operations on the same map simultaneously. It's not reasonable to stop the whole program at this moment.

A simple script could reproduce this annoying situation:

i:us:1 {
    @data[rand % 100] = count();
    if (rand % 5 == 0) {
        delete(@data[rand % 10]);
    }
}

i:ms:1 {
    clear(@data);
    // or print(@data);
    // or zero(@data);
}

This script crashed almost immediately and reports errors likes:

Attaching 2 probes...
ERROR: failed to look up elem: -2
terminate called after throwing an instance of 'std::runtime_error'
  what():  Could not clear map with ident "@data", err=-1
fish: 'sudo bpftrace ../../../workspac…' terminated by signal SIGABRT (Abort)

But It's accepable for users that some entries are deleted asynchronously when tring to clear/print/zero a map.

Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc and if needed in docs/reference_guide.md
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

bpf_*_elem returns -errno when error occured. These functions will
return -ENOENT(-2) when they failed to find a entry. It's a harmless
error that could be found when doing delete and print/zero/clear
operations on the same map simultaneously. It's not reasonable to
stop the whole program at this moment.

Signed-off-by: maokelong <chenjinglong1@huawei.com>
Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change lgtm. But could you please add a runtime test so we can avoid future regressions?

The script you provided in your PR description seems reasonable to use.

Here's some hints on runtime tests: https://github.com/iovisor/bpftrace/blob/master/tests/README.md#runtime-tests

@maokelong maokelong force-pushed the master branch 2 times, most recently from 3468590 to 194743a Compare May 25, 2023 19:52
@maokelong
Copy link
Author

maokelong commented May 25, 2023

Change lgtm. But could you please add a runtime test so we can avoid future regressions?

The script you provided in your PR description seems reasonable to use.

Here's some hints on runtime tests: https://github.com/iovisor/bpftrace/blob/master/tests/README.md#runtime-tests

@danobi Done. And here's the test report on my pc. : )

Without this patch:
image

With this patch:
image

Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit but lgtm otherwise

tests/runtime/scripts/parallel_map_access.bt Show resolved Hide resolved
Signed-off-by: maokelong <chenjinglong1@huawei.com>
Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Contributor

@viktormalik viktormalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! Could you also add a CHANGELOG entry?

@danobi
Copy link
Member

danobi commented Jun 3, 2023

I'll push a changelog entry up for this

@danobi danobi merged commit a20675c into bpftrace:master Jun 3, 2023
21 checks passed
danobi pushed a commit to danobi/bpftrace that referenced this pull request Jun 3, 2023
danobi added a commit to danobi/bpftrace that referenced this pull request Jun 3, 2023
danobi added a commit that referenced this pull request Jun 3, 2023
@ajor
Copy link
Member

ajor commented Jun 7, 2023

Thanks for the fix!

This did confuse me as I thought we'd already handled at least one of these cases years ago: #823

With a bit of digging it turns out that libbpf changed how they returned errors in 2021: libbpf/libbpf@7c7ba06

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants