dtrace: could not enable tracing: BPF program load for '...' failed: No space left on device #6

Closed
euloh opened this issue Jan 8, 2021 · 3 comments

euloh commented Jan 8, 2021

A script like
x = 1;
@ = quantize(x);
@ = quantize(x);
[....97 similar lines...]
@ = quantize(x);
fails, printing about 16 Mbyte of BPF verifier log and ending in:
dtrace: could not enable tracing: BPF program load for '...' failed:
No space left on device
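
For anyone reproducing this, the statements above can be generated rather than typed. This is only a sketch: `make_script_body` is a hypothetical helper, not part of DTrace. The default of 100 `quantize()` lines is read off the elided listing (2 + 97 + 1), and the enclosing probe clause is left out because the report does not show it.

```python
def make_script_body(n_quantize: int = 100) -> str:
    """Return the statements from the report: one assignment, then n aggregations."""
    lines = ["x = 1;"]
    lines += ["@ = quantize(x);"] * n_quantize
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated statements so they can be pasted into a D clause.
    print(make_script_body(), end="")
```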

@euloh euloh self-assigned this Jan 8, 2021

euloh commented Jan 8, 2021

The BPF verifier walks code paths to ensure that BPF code that is run is safe. An action like quantize() must quantize values into one of 127 different bins, representing numerous possible code paths. This taxes the BPF verifier.
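
To make the branching concrete, here is a rough Python model of power-of-two binning into 127 bins. The layout (63 negative bins, one zero bin, 63 positive bins) and the binary search are assumptions for illustration; this is not the code DTrace generates.

```python
NBUCKETS = 127      # bin count stated in this issue
ZERO_BUCKET = 63    # assumed layout: 63 negative bins, one zero bin, 63 positive bins


def floor_log2(magnitude: int) -> int:
    """Binary search for the highest set bit of a value in 1..2**64-1."""
    pos = 0
    for shift in (32, 16, 8, 4, 2, 1):
        if magnitude >> shift:      # each test is a separate branch
            magnitude >>= shift
            pos += shift
    return pos


def quantize_bucket(value: int) -> int:
    """Map a signed 64-bit value to a bin index in 0..126."""
    if value == 0:
        return ZERO_BUCKET
    # Clamp so that -2**63 still lands in the outermost negative bin.
    offset = min(floor_log2(abs(value)) + 1, ZERO_BUCKET)
    return ZERO_BUCKET + offset if value > 0 else ZERO_BUCKET - offset
```

In this model the six tests in `floor_log2` alone allow up to 64 distinct paths, before the sign and zero checks are counted, which is the kind of branching a verifier has to walk.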

While that is in itself a challenge, the failure in this bug is avoidable. DTrace passes a 16-Mbyte log buffer to the kernel, and the error (ENOSPC, "No space left on device") simply means that the verifier log did not fit in that buffer.


euloh commented Jan 8, 2021

One can imagine several fixes.

One is not to pass a log buffer at all. The script in question would then load successfully. Never passing a buffer, however, would mean that genuine BPF verifier failures could never be reported.

Another is to pass a larger buffer, perhaps under user control.

A hybrid solution seems to make the most sense. Try to load the BPF program without specifying a log buffer. If the load is successful, one is done. If there is a problem, retry the load, this time with a log buffer. If the error is ENOSPC, warn the user that a larger buffer is needed to capture the problem. Otherwise, simply report the contents of the log buffer.
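
The control flow of the hybrid approach can be sketched as follows. This is a model under stated assumptions, not the DTrace implementation: `LoadFn`, `load_with_fallback`, and `fake_loader` are hypothetical names, and the fake loader only imitates the behaviour described in this issue (a log that overflows its buffer yields ENOSPC).

```python
import errno
from typing import Callable, Tuple

# Hypothetical loader interface: takes a log buffer size (0 = no buffer) and
# returns (errno value or 0, verifier log text). It stands in for the real
# BPF program load call, which is not modeled here.
LoadFn = Callable[[int], Tuple[int, str]]


def load_with_fallback(load: LoadFn, log_size: int) -> Tuple[bool, str]:
    """Try without a log buffer first; retry with one only to diagnose a failure."""
    err, _ = load(0)
    if err == 0:
        return True, ""
    err, log = load(log_size)
    if err == 0:
        return True, ""   # unexpected, but a successful load is still a success
    if err == errno.ENOSPC:
        return False, (f"verifier log does not fit in {log_size} bytes; "
                       "a larger log buffer is needed to capture the problem")
    return False, log


def fake_loader(program_ok: bool, log_text: str) -> LoadFn:
    """Simulate the behaviour described in this issue (an assumption, not kernel code)."""
    def load(log_size: int) -> Tuple[int, str]:
        if log_size and len(log_text) > log_size:
            return errno.ENOSPC, ""          # log overflow masks the real result
        if program_ok:
            return 0, ""
        return errno.EACCES, (log_text if log_size else "")
    return load
```

For example, `load_with_fallback(fake_loader(True, "x" * 100), 10)` succeeds on the first attempt because no buffer is passed, which mirrors the observation that the script in question would load without one.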

euloh added a commit that referenced this issue Jan 20, 2021
A D script that produces BPF code with many code paths can result
in 16 Mbytes of BPF log file, ending with the above error message.
The log file really says nothing about why the BPF load failed.
The actual problem is that we supplied a 16-Mbyte log buffer that
is too small.

If no log buffer is supplied, this problem is not encountered.

Change DTrace's BPF program load to use no log buffer at first.
If the load fails, then retry with a log buffer.  The load should
again fail, but if the failure is not ENOSPC, we can simply report
the log and be done.  If the failure becomes ENOSPC, inform the
user of the problem and what action can be taken to increase the
buffer size.

Provide a new DTrace option to control the log buffer size.

Add tests for this fix.  Specifically, the aggregation function
quantize() can be used, since it must quantize a value into one
of 127 different bins.  The algorithm used has many code paths
and thereby exercises the BPF verifier well.

#6
Signed-off-by: Eugene Loh <eugene.loh@oracle.com>
Reviewed-by: Kris Van Hees <kris.van.hees@oracle.com>
@kvanhees

Verified as fixed.
