
put interrupt monitor behind a feature flag #243

Merged: Nieuwejaar merged 1 commit into main from debug on Mar 31, 2026


Conversation

@Nieuwejaar
Collaborator

There is still no root cause for #241, but it seems likely that interrupts are involved. The kernel code handling tofino/tfpkt interrupts hasn't changed in years, but dpd recently added code that monitors and manipulates interrupts in userspace.

Until the specific problem is identified, I would like to disable the userspace interrupt code in the hope that doing so will address the issue and allow us to ship R19. Unfortunately, because the performance issue appears non-deterministically, I can't prove that this change will be effective, even in the short term. The best I can say is that I haven't seen a recurrence on madrid with these changes in place.

@Nieuwejaar Nieuwejaar requested a review from rcgoodfellow March 31, 2026 15:58
@rcgoodfellow
Contributor

The primary thing we lose here is the ability to detect TCAM parity errors. We should work toward being able to detect at least those, and if possible to enable and detect all RAS interrupts. But given the issues we've seen in the lab that seem to be interrupt-related, this seems prudent.

@Nieuwejaar
Collaborator Author


Agreed. I'm going to keep digging for a root cause.

@Nieuwejaar Nieuwejaar merged commit 44a949c into main Mar 31, 2026
6 checks passed
@Nieuwejaar Nieuwejaar deleted the debug branch March 31, 2026 17:14