Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for sharing of python tools/ scripts #1531

Merged
merged 4 commits into from
Jan 22, 2018

Conversation

natoscott
Copy link
Contributor

It would be extremely helpful to the PCP (pcp.io) project, and probably others,
if the eBPF programs in tools/*.py were more readily accessible outside of the
python scripts. This commit suggests a simple way to achieve this, such that
we can use it in PCP ( performancecopilot/pcp#409 ).

If this approach is generally favourable, we'd update more scripts to follow the
same pattern established here.

Also fixes a little bug in BCC python class constructor error reporting.

The exception raised in the BPF class constructor assumed
that the "src_file" argument was used to pass eBPF code -
this is not the case for many of the tools scripts, which
tend to use "text".

Signed-off-by: Nathan Scott <nathans@redhat.com>
Several python tools allow their eBPF code to be printed to
stdout for debugging.  There are other projects that would
like to share these program definitions however, instead of
duplicating code.  Formalise an -e/--ebpf option, and start
using it in several tools (more to come).

Signed-off-by: Nathan Scott <nathans@redhat.com>
@yonghong-song
Copy link
Collaborator

[buildbot, ok to test]

@yonghong-song
Copy link
Collaborator

It is great to see that bcc can be leveraged for PCP.

It looks the PCP just wants to get the bpf program at this point. But be aware that bpf programs may get changed in the future, the data structures and maps defined in the programs could be changed as well. Do PCP have pre-knowledge about how each bpf program may do and what maps should be looked at or it will parse the program to find out the answers? I am asking since I want to understand how future bcc code changes may impact on this dependency.

@natoscott
Copy link
Contributor Author

| It looks the PCP just wants to get the bpf program at this point.

That's correct.

| But be aware that bpf programs may get changed in the future, the data structures and maps defined in the programs could be changed as well.

Yep, understood.

| Do PCP have pre-knowledge about how each bpf program may do and what maps should be looked at or it will parse the program to find out the answers?

It does have prior knowledge, yes, it has to - in PCP we carefully define the metadata (semantics, units, data types) for each metric we export - which is intrinsically part of each script (and the kernel). So, parsing is not an option in practice and due diligence rests with PCP developers and users to make sure the metrics are correctly defined.

| I am asking since I want to understand how future bcc code changes may impact on this dependency.

Thanks! Much of the responsibility rests with us in PCP, in terms of understanding whether changes in bcc/tools and the kernel affect PCP or not.

@yonghong-song
Copy link
Collaborator

cc @brendangregg

@brendangregg
Copy link
Member

I'm fine with adding --ebpf, but I'm not sure I want to spend a letter on it (-e) and also clutter up every USAGE message of every tool, with something that only pcp developers will really care about. Is there a way in python argparse to have a hidden option? We can add it hidden to start with, then unhide it later if there's a good reason for it. Harder to go the other way around.

@pchaigno
Copy link
Contributor

I would also be interested in such an option, even if only for debugging.

Is there a way in python argparse to have a hidden option?

That would be the help=argparse.SUPPRESS option of argparse. From the documentation:

>>> parser = argparse.ArgumentParser(prog='frobble')
>>> parser.add_argument('--foo', help=argparse.SUPPRESS)
>>> parser.print_help()
usage: frobble [-h]

optional arguments:
  -h, --help  show this help message and exit

@myllynen
Copy link
Contributor

myllynen commented Jan 16, 2018

Even though a bpf program would not change in the future, with the proposed patch we would still get program text that needs updating on the fly, for example replacing the PID/port filters, which in turn requires hard-coding BCC tool internal information into external consumers.

This is from biolatency.py:

// output
int trace_req_completion(struct pt_regs *ctx, struct request *req)
{
...
FACTOR
...
STORE
...
}

This is currently substituted by the BCC program like this:

if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000000;')
    label = "msecs"
else:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000;')
    label = "usecs"
if args.disks:
    bpf_text = bpf_text.replace('STORAGE',
        'BPF_HISTOGRAM(dist, disk_key_t);')
    bpf_text = bpf_text.replace('STORE',
        'disk_key_t key = {.slot = bpf_log2l(delta)}; ' +
        'bpf_probe_read(&key.disk, sizeof(key.disk), ' +
        'req->rq_disk->disk_name); dist.increment(key);')
else:
    bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
    bpf_text = bpf_text.replace('STORE',
        'dist.increment(bpf_log2l(delta));')

At this point embedding the entire program in an external consumer starts to look like an alternative worth considering, given that, as mentioned above, these programs may change in the future - for example, an option could be added that would not change the semantics of the output but would add another filtering capability, that would break external consumers merely getting unsubstituted program text whereas in case of embedding the embedded program would remain fully functional and it could be updated - if that would be even needed - on a later occasion as time permits.

I think ideally we would be able to do something like

if args.ebpf:
    print(prepare(bpf_text))

which would remove PID_FILTER and such. With some modules passing options would be needed, like for the above mentioned biolatency.py, queued vs non-queued, milliseconds vs microseconds, combined vs separate.

Thanks.

@fche
Copy link

fche commented Jan 16, 2018

ISTM too that since the C BPF programs are tightly coupled to their python consumers/drivers, there is little to gain from extricating just the C parts to use elsewhere.

If the iovisor copy gets changed, pcp would have to play catchup. That's made even harder since iovisor is updated / distributed independently, so a pcp user or developer could first get wind of an incompatibility with some random distro or snapshot update. And if pcp were to have to support more than one version of a bpf script, oh dear.

If on the other hand these iovisor scripts are pretty much "done", then there is no harm in duplicating the C BPF parts into the pcp side's tightly-integrated consumer/driver. Then pcp users/devs are isolated from iovisor changes.

As discussed in iovisor#1531
review comments.

Signed-off-by: Nathan Scott <nathans@redhat.com>
@natoscott
Copy link
Contributor Author

@brendangregg @pchaigno thanks! - I've updated the PR with --ebpf as a hidden option, and removed the -e short option as well - both changes verified & work as advertised.

| with the proposed patch we would still get program text that needs updating on the fly

@myllynen this (very valid) concern you raise (and again in e.g. "ideally we would be able to do something like [...]") is actually already handled - the emitted eBPF code has the string filtering applied, and it is the responsibility of the PCP code to invoke the script in a manner appropriate to its needs. Since you and I had discussed that aspect already, I was careful to take it into account.

As one of the people who gets handed the broken PCP pieces when things go wrong, I have no doubt the approach in this PR will work better than en-mass duplication of hundreds of lines - thousands, over time - of often version-dependent kernel code from bcc/tools into PCP. And this approach sets us up for some healthy cross-project collaboration too - win, win as far as I'm concerned.

| If the iovisor copy gets changed, pcp would have to play catchup.

... exactly why all this code should not be cut+pasted into PCP. The most common reason for change is changes to adapt to the underlying kernel, not changes to bcc/tools.

@myllynen
Copy link
Contributor

this (very valid) concern you raise (and again in e.g. "ideally we would be able to do something like [...]") is actually already handled

You're right, I had already forgotten that.

So if we conclude that external consumers should call these programs with -e to get the BPF text, then the simple solution to the above mentioned parameterization case would obviously be calling these programs with the wanted command line options, meaning that should a program be extended in the future, external consumers would see/use its new features in their default mode, until one day possibly extended to call the program with newly available options. (As mentioned elsewhere, initially it was helpful to have BPF programs easily available to allow figuring out how to access different type of data structures from PCP side but now when that's all done, there's no real reason for this any more from development perspective.)

So, so far so good, however there's still the concern / trade-off with proposals seen so far as ...

often version-dependent kernel code from bcc/tools into PCP

... should a BPF program change the data structures it uses that would force updating PCP code accordingly since the PCP BCC modules directly access the data structures in use. So in practice if BCC with a changed program gets updated in Fedora but not in Ubuntu, the corresponding PCP module becomes broken in Fedora (possibly initially unbeknownst for PCP developers) and at least PCP upstream version would be broken in Ubuntu if it would be updated for Fedora. Unless of course it would be somehow possible to detect in the PCP module whether to use the old or new way to access BPF data. In simple cases that should certainly be doable, not sure how it would work over the time in more complex cases, I presume it wouldn't be impossible.

@natoscott
Copy link
Contributor Author

@myllynen yep, there's certainly tradeoffs. The flip side of your final point/concern is that eBPF fixes and updates for new kernels arrive out-of-band, and there's no maintenance overhead on the part of you or I for that. The PMDA allows arbitrary eBPF code from outside of both PCP and bcc/tools to be used of course, so its not like we are locking anyone into anything here. Let's take PCP specific discussion over to the PMDA github pull request though.

@yonghong-song @brendangregg I think all earlier feedback was included in that subsequent update - if there's no other changes needed, could this pull request be merged? thanks!

@myllynen
Copy link
Contributor

The PMDA allows arbitrary eBPF code from outside of both PCP and bcc/tools to be used of course, so its not like we are locking anyone into anything here.

Yes, just for the record earlier the options that could be used with PCP BCC PMDA were:

  • embedded custom eBPF programs
  • embedded BCC programs

Now with this patch it'll be also possible to use:

  • non-embedded BCC programs (still requires the corresponding PCP BCC side module, of course)

And since these options are not mutually exclusive, we can decide the best approach on case by case basis if a need arises (for example, if a particular program is constantly changing while others remain perfectly stable). Thanks.

@yonghong-song
Copy link
Collaborator

@myllynen @natoscott The tests failed due to the newly introduced flag allow_abbrev=False. Could you take a look? Goto Details, select a platform, and then check console output and you will see which test failed and why. Thanks.

Use of this argparse constructor keyword is causing regression
test failures, and its use was nice-to-have - this means the -e
shorthand for --ebpf will be available iff its not been used in
another add_argument call.  Neither -e/--ebpf are advertised in
the usage message anyway.
@natoscott
Copy link
Contributor Author

@yonghong-song it looks like "allow_abbrev" is python 3.5 or later only - so I've removed that part. This means -e will be available as a shorthand for --ebpf but only if its not been used by an explicit add_argument() call. Neither -e/--ebpf is advertised in the usage, so I guess this will still be OK for the review comment about -e clutter that @brendangregg made initially.

@yonghong-song yonghong-song merged commit bb95ef2 into iovisor:master Jan 22, 2018
banh-gao pushed a commit to banh-gao/bcc that referenced this pull request Jun 21, 2018
As discussed in iovisor#1531
review comments.

Signed-off-by: Nathan Scott <nathans@redhat.com>
CrackerCat pushed a commit to CrackerCat/bcc that referenced this pull request Jul 31, 2024
As discussed in iovisor#1531
review comments.

Signed-off-by: Nathan Scott <nathans@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants