Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add allow_partial_usdt_init param so partial init of USDT vector is possible #2476

Merged
merged 1 commit into from Nov 14, 2019

Conversation

davemarchevsky
Copy link
Collaborator

Looking for early feedback here before I write tests / flesh this out more.

Consider this scenario: I'm hoping to attach USDTs to many short-lived processes, let's say 30. I want to gather data from USDT handlers on these processes for 5 seconds. It's guaranteed that at least one of the processes that was alive when I constructed the vector of USDTs to pass into BPF::init will not be alive by the time the 5 seconds are up. I'm OK with this and would still like to gather as much data as possible.

If the process dies after USDT::init() is successful but before attach_usdt is called, we're already OK because we can handle each StatusTuple independently outside of bcc. Similarly, if process dies between attach_usdt and detach_usdt calls, we're fine for same reason.

But if it dies between construction of USDT object and USDT::init() being called by BPF::init(), we currently consider failure to init any USDT critical enough to stop the whole BPF::init().

This PR adds a allow_partial_usdt_init param to BPF::init to let users of bcc signal that they're OK with USDT::init() failures. USDT::initialized() getter is added so bcc user can check which USDTs succeeded / failed.

One downside of this initial pass is that the info in StatusTuples for USDT::init is lost. So if we have some USDT::init failures when allow_partial_usdt_init is true it'll be hard to get visibility into them.

/cc @yonghong-song

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song
Copy link
Collaborator

[buildbot, ok to test]

@anakryiko
Copy link
Contributor

Would it be better to have BPF::init and BPF::init_usdt (called for each USDT separately) better and more flexible solution? That way application can decide for itself which USDTs are critical and which can be ignored, etc.

@davemarchevsky
Copy link
Collaborator Author

davemarchevsky commented Jul 30, 2019

@anakryiko This sounds better to me, I was optimizing for "least change to API surface". Some clarification questions:

  • Are you proposing that BPF::init no longer accept a vector of USDTs? And therefore that its functionality be split out into BPF::init_usdt?
    • Alternatively, we can have BPF::init accept USDTs but consider those required to init successfully but allow you to call BPF::init_usdt before calling BPF::init for non-success-required USDT initialization. This would preserve current functionality while enabling my use-case.

I'm going to take a stab at the approach proposed in "Alternatively," unless you're opposed.

@anakryiko
Copy link
Contributor

@davemarchevsky, yeah, I was thinking about second approach, no need to break existing behavior. Thanks!

@palmtenor
Copy link
Member

The only reason we do the USDT::init in BPF::init is that we couldn't copy the initialized probe_ pointer, otherwise user would just initialize the USDT instances outside, pick those succeeded and pass (move) them into BPF::init.

I'm OK with Andrii's idea. We can also add a BPF::add_usdt_info call which need to happen before BPF::init but can fail individually so that users can just add USDTs one by one and choose to ignore the failures.

@davemarchevsky
Copy link
Collaborator Author

davemarchevsky commented Aug 10, 2019

New commit implements @anakryiko's suggestion.

  • Remove no-longer-needed USDT::initialized getter
  • Add BPF::init_usdt: tries to init USDT while being tolerant of failure. Should be called before BPF::init
  • Add BPF::init_fail_reset, contains common logic for "We had a failure somewhere in BPF::init, reset state that we've modified while initing so we can try again.
    • This fixes an additional issue that's maybe a bug. I have a few programs that try doing BPF::init with some USDTs and, if there's an issue initing them, try doing BPF::init again without the usdts arg. Apparently this never worked because usdt_ vector is never cleared so next attempt to do BPF::init would result in USDT::init eventually being called on the USDT which failed to init in the first place (and the program text all_bpf_program being repopulated with duplicate bpf helper functions, which would additionally break things if compilation step was ever reached). Anyways, if we expect folks to be able to do a second BPF::init call after first one fails we must clean some things up to make it possible.

TODO:

  • maybe better name for BPF::init_fail_reset
  • add comments explaining what BPF::init_usdt does to BPF.h, appropriate docs
  • add test
  • validate for my usecase

@yonghong-song
Copy link
Collaborator

I looked at the code. Looks good to me. Function name init_fail_reset() is okay to me. We do need some doc about what is the typical way to use the new API and a test case to actually show how the new API is used.

Copy link
Contributor

@anakryiko anakryiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like this, there is just this convoluted piece I commented separately on.

src/cc/api/BPF.cc Outdated Show resolved Hide resolved
src/cc/api/BPF.cc Outdated Show resolved Hide resolved
@davemarchevsky
Copy link
Collaborator Author

davemarchevsky commented Nov 14, 2019

Addressed comments. Added a test and some docs. Re-validated against my usecase successfully. @yonghong-song ready for final pass. Will squash+rebase after you give it the go-ahead.

@yonghong-song
Copy link
Collaborator

LGTM. Only one minor typo.

… partial init failure of list of USDTs.

modify BPF::init so that failure doesn't leave the BPF object in a broken state such that subsequent inits will also fail
@davemarchevsky
Copy link
Collaborator Author

davemarchevsky commented Nov 14, 2019

@yonghong-song fixed typo, squashed, added descriptive commit msg. Ready for merge IMO

@yonghong-song
Copy link
Collaborator

LGTM. Thanks!

@yonghong-song yonghong-song merged commit ccf8261 into iovisor:master Nov 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants