@gnurizen gnurizen commented Jan 16, 2026

Ignore the first commit; that's PR #6.

  • Implement simple rate limiting scheme to not overwhelm the server

@gnurizen gnurizen requested review from brancz and umanwizard January 16, 2026 07:54
@gnurizen
Contributor Author

Gotta say I don't love this:

  1. There's plenty of bandwidth via eBPF perf events, so we could easily push this into the agent instead of the shim.
  2. With some workloads we can miss important events (like rare but higher-impact graph launches in a sea of individual kernel launches).
  3. With bursty activity that isn't constant (inference workloads), we can miss a lot of events.
  4. We can still bust the server if a graph launch has many kernels, since we generate a sample for each kernel under that one graph launch. In practice I usually only see on the order of a dozen or so kernels per graph launch, but I suspect it can get into the hundreds.

That said, we need some form of rate limiting in the shim, because some workloads generate so much unwind activity that we'd exceed the desired CPU limits from that alone.

So I think I'm gonna keep this but change it so that the rate limit isn't applied to graph launches.
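
To make the planned change concrete, here's a rough sketch of a limiter that throttles individual kernel launches but always lets graph launches through. It's Python for readability, not the shim's actual code: the token-bucket policy is an assumption (the PR only says "simple rate limiting scheme"), and `LaunchKind`, `LaunchSampler`, and `should_sample` are hypothetical names.

```python
import time
from enum import Enum, auto


class LaunchKind(Enum):
    KERNEL = auto()  # individual kernel launch (common, high volume)
    GRAPH = auto()   # graph launch (rare, higher impact)


class LaunchSampler:
    """Hypothetical token-bucket limiter; graph launches bypass it entirely."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)   # tokens refilled per second
        self.capacity = float(burst)      # maximum tokens that can accumulate
        self.tokens = float(burst)        # start with a full bucket
        self.clock = clock                # injectable so tests can fake time
        self.last = clock()

    def should_sample(self, kind):
        # Graph launches are exempt: never dropped and never charged a token.
        if kind is LaunchKind.GRAPH:
            return True

        # Refill in proportion to elapsed time, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

        # Spend one token per sampled kernel launch; drop when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Note that exempting graph launches this way does nothing about point 4 above: a graph launch with hundreds of kernels would still produce hundreds of samples.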

I'm also gonna add an env switch to disable it for testing/diagnostic purposes.
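
For the env switch, something along these lines would do. Again just a Python sketch: the variable name `SHIM_DISABLE_RATE_LIMIT` and the accepted values are made up for illustration, not what the shim actually reads.

```python
import os

# Hypothetical variable name; the real switch may be called something else.
DISABLE_VAR = "SHIM_DISABLE_RATE_LIMIT"

# Values treated as "disable the limiter" (an assumption, case-insensitive).
_TRUTHY = {"1", "true", "yes", "on"}


def rate_limit_enabled(environ=None):
    """Return False only when the disable switch is set to a truthy value."""
    env = os.environ if environ is None else environ
    value = env.get(DISABLE_VAR, "")
    return value.strip().lower() not in _TRUTHY
```

Reading the switch once at startup and caching the result would keep the check off the hot launch path.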

@gnurizen gnurizen merged commit a86187f into main Jan 20, 2026
2 checks passed