Sidecar: avoid Monorail restart if Tofino PCIe link not up #1510
Monorail monitors the 10G link with Tofino and periodically restarts the task if this link is not up. This in turn causes the technician port PHY to flap, which is disruptive while working with `pilot racktest`.

To reduce spurious restarts, the PCIe link with Tofino is now also monitored, and restarts are only performed while that link is up; if it is down, the 10G port is assumed to be down as well. In addition, logging in the sequencer task is improved, making the events more accurate and cutting down on noise.
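The gating rule described above can be sketched as a small per-tick state machine. This is a minimal, hypothetical illustration, not the code in this PR: `MonorailWatchdog`, `tick` and `RESTART_INTERVAL_TICKS` are made-up names, the ~25-tick interval is taken from the observations below, and resetting the counter whenever the PCIe link is down or the 10G link is up is an assumption.

```rust
/// Hypothetical per-tick restart logic; names are illustrative, not Hubris APIs.
#[derive(Debug, Default)]
struct MonorailWatchdog {
    ticks_since_restart: u32,
}

/// Assumed restart interval (~25 sequencer ticks, one tick ~ one second).
const RESTART_INTERVAL_TICKS: u32 = 25;

impl MonorailWatchdog {
    /// Called once per sequencer tick; returns true if Monorail should restart.
    fn tick(&mut self, pcie_link_up: bool, tofino_10g_up: bool) -> bool {
        if !pcie_link_up || tofino_10g_up {
            // PCIe down: the 10G port is assumed down too, so restarting would
            // only flap the technician port PHY. 10G up: nothing to fix.
            // Either way, reset the counter (assumption) and do not restart.
            self.ticks_since_restart = 0;
            return false;
        }
        // PCIe up but 10G down: count ticks and restart once per interval.
        self.ticks_since_restart += 1;
        if self.ticks_since_restart >= RESTART_INTERVAL_TICKS {
            self.ticks_since_restart = 0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut wd = MonorailWatchdog::default();

    // PCIe link down: no restarts, however many ticks elapse.
    let mut restarts = 0;
    for _ in 0..116 {
        if wd.tick(false, false) {
            restarts += 1;
        }
    }
    println!("PCIe down: {restarts} restarts in 116 ticks");

    // PCIe link up, 10G link down: one restart every 25 ticks.
    restarts = 0;
    for _ in 0..100 {
        if wd.tick(true, false) {
            restarts += 1;
        }
    }
    println!("PCIe up, 10G down: {restarts} restarts in 100 ticks");
}
```

Under these assumptions the first loop reports 0 restarts and the second reports 4, mirroring the behavior described in the ringbuf observations.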
The following ringbuf snippets show a freshly booted chassis on which the PCIe link with a Gimlet is not (yet) up:
As shown, 116 sequencer ticks have elapsed without Monorail restarting itself, whereas a restart would otherwise be expected every ~25 ticks (seconds).
After attaching a Gimlet:
Monorail now monitors the 10G link and restarts itself every ~25 ticks (seconds).
After disconnecting the PCIe link, 228 ticks (seconds) pass without Monorail restarting itself; the periodic restarts resume once the PCIe link comes back up: