Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: RX activity watchdog #867

Merged
merged 3 commits into from Oct 24, 2022
Merged

net: RX activity watchdog #867

merged 3 commits into from Oct 24, 2022

Conversation

cbiffle
Copy link
Collaborator

@cbiffle cbiffle commented Oct 20, 2022

We're hitting an intermittent bug on the stm32h7 where the Ethernet MAC
stops delivering packets from the MTL, without any status bits to
indicate it or any obvious way to get it unstuck. Resetting the device
works, however!

This changes the netstack to reset itself (by panicking) if no traffic
has emerged from the MAC within a time window (currently 10 seconds).
Note that this should cover any traffic on the wire, not just traffic
addressed to the node, so 10 seconds isn't as aggressive as it might
sound.

It would be nice to reset just the Ethernet driver instead of losing
netstack state, which includes any packets received to sockets but not
yet handed out to tasks. Maybe later.

The basic Smoltcp trait impl for the stm32h7 ethernet driver had been
living in the stm32h7 ethernet driver itself. However, we had a _second_
one for handling our Weird VLAN Shenanigans, and that had wound up in
task/net (probably rightfully so).

I need to add some task/net specific behavior to the trait impl, so this
seemed like a good time to pull both implementations into the same crate
and align them.

This removes stm32h7-eth's dependency on smoltcp.
@cbiffle cbiffle requested a review from mkeeter October 20, 2022 18:43
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 374 files.

Valid Invalid Ignored Fixed
373 1 0 0
Click to see the invalid file list
  • lib/multitimer/src/lib.rs

lib/multitimer/src/lib.rs Show resolved Hide resolved
@cbiffle cbiffle force-pushed the net-watchdog branch 2 times, most recently from 133edc2 to 871ea9f Compare October 20, 2022 18:48
// If the timer has previously fired without us noticing it, preserve
// that across set.
let fired_but_not_observed = self.timers[which].fired_but_not_observed;
self.timers[which] = Timer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vague question: is it better to save fired_but_not_observed then build a whole new Timer, vs modifying the deadline on the existing timer, e.g.

self.timers[which].deadline = Some((deadline, repeat))

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go either way -- my current approach feels more robust if Timer grows new fields in the future, while your approach is simpler with the current definition.

}
}

trait SomeTimer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this trait used anywhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! That's a leftover from my early prototyping. Will remove.


// Syscall fakes for testing!

#[cfg(not(target_os = "none"))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put all this into a single mod fakes (then import super::fakes::* in the test suite), to reduce repeated #[cfg(not(target_os = "none"))]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required a top-level conditional use, but, I think this is cleaner. Will change.

We're hitting an intermittent bug on the stm32h7 where the Ethernet MAC
stops delivering packets from the MTL, without any status bits to
indicate it or any obvious way to get it unstuck. Resetting the device
works, however!

This changes the netstack to reset itself (by panicking) if no traffic
has emerged from the MAC within a time window (currently 60 seconds).
Note that this should cover any traffic on the wire, _not just_ traffic
addressed to the node. The watchdog interval needs to be short enough to
make it practical for an operator to get control over a wedged machine,
but doesn't have to be "machine-fast" because we don't expect this to
happen very often. (If it turns out to happen a lot, we should
reconsider this.)

It would be nice to reset just the Ethernet driver instead of losing
netstack state, which includes any packets received to sockets but not
yet handed out to tasks. Maybe later.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants