Rack 2 powered off and left blinking power sequencers behind #1800
Alright, I think we've managed to tease this one out.

When the PSUs hit certain fault conditions, they drop their "OK" (active-high) line. Until this month, nobody had written code to actually monitor that line, and we learned of fault conditions in which the PSUs required active intervention to turn back on. In that state they would sit with an amber light lit.

I added code to the PSC to attempt to cycle the PSUs and clear faults like this, and that code has been released. Because we don't have a power shelf for testing in EMY, I did all the testing of that change with a hand-wired mockup. It appears my mockup got one of the PSU behaviors wrong: the PSUs must be re-enabled before they will stop indicating a fault condition. I had added logic to avoid cycling them on and off unnecessarily, which in practice means they are never turned back on in this class of fault condition. We need to change this logic to turn the PSU on and wait a bit before deciding whether it has recovered.

While a PSU is disabled in this manner, it blinks its light green at about 1 Hz. Confusingly, this means "I'm off." This is the signal we've been seeing: it indicates that the PSC is commanding the PSU off. Due to my misunderstanding of the PSU fault signals, the PSC then never turns it back on.

This class of fault condition turns out to be relatively easy to reproduce on a lab rack: sneak in via Humility and alter the PSU enable line state. So now that we have an extender card mounted, we have a way to test this in Dogfood.
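A minimal sketch of the revised recovery logic described above, assuming a periodic tick driven by a millisecond clock. `PsuState`, `OFF_TIME_MS`, and `SETTLE_TIME_MS` are hypothetical names and values chosen for illustration, not taken from the PSC firmware. The essential change is that, after holding the PSU off, the PSC re-enables it unconditionally and only judges the OK line once a settle delay has passed.

```rust
// Hypothetical timing constants; the real values would come from the PSU's behavior.
const OFF_TIME_MS: u64 = 1_000; // how long to hold the PSU disabled to clear a fault
const SETTLE_TIME_MS: u64 = 5_000; // how long to wait after re-enabling before checking OK

/// Hypothetical per-PSU recovery state, advanced once per tick.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PsuState {
    /// Enabled, and OK was high at the last check.
    On,
    /// Enable line driven low to clear a fault.
    Off { since_ms: u64 },
    /// Re-enabled; OK is ignored until the settle time has elapsed.
    Settling { since_ms: u64 },
}

impl PsuState {
    /// Advance one tick. `ok` is the PSU's active-high OK line.
    pub fn step(self, now_ms: u64, ok: bool) -> PsuState {
        match self {
            // OK dropped while enabled: start a power cycle.
            PsuState::On if !ok => PsuState::Off { since_ms: now_ms },
            PsuState::On => PsuState::On,
            // Re-enable after the off time *regardless of OK*, because the
            // PSU will not assert OK again until it has been re-enabled.
            PsuState::Off { since_ms } if now_ms.saturating_sub(since_ms) >= OFF_TIME_MS => {
                PsuState::Settling { since_ms: now_ms }
            }
            PsuState::Off { .. } => self,
            // Only now decide whether the PSU came back; if not, cycle again.
            PsuState::Settling { since_ms } if now_ms.saturating_sub(since_ms) >= SETTLE_TIME_MS => {
                if ok {
                    PsuState::On
                } else {
                    PsuState::Off { since_ms: now_ms }
                }
            }
            PsuState::Settling { .. } => self,
        }
    }

    /// Level to drive on the PSU enable line for this state.
    pub fn enable(self) -> bool {
        !matches!(self, PsuState::Off { .. })
    }
}
```

Keeping OK out of the decision during `Settling` is what breaks the chicken-and-egg problem: the PSU gets a chance to start before its OK line is trusted.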
See #1800. In brief: the PSU won't start asserting OK again until it's re-enabled, whereas we were waiting to see OK before we would re-enable. This produced something of a stalemate.
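A toy model of that stalemate, assuming (hypothetically) a PSU that asserts OK only after it has been enabled for a few consecutive ticks. `ToyPsu`, `STARTUP_TICKS`, and `recovers` are illustrative names, not part of the PSC code. Gating the enable line on OK never recovers; enabling first and checking OK afterwards does.

```rust
// Hypothetical: ticks the PSU must be enabled before it asserts OK.
const STARTUP_TICKS: u32 = 3;

/// Toy PSU: OK is low while disabled and only rises after STARTUP_TICKS enabled ticks.
struct ToyPsu {
    enabled: bool,
    ticks_enabled: u32,
}

impl ToyPsu {
    fn ok(&self) -> bool {
        self.enabled && self.ticks_enabled >= STARTUP_TICKS
    }

    /// Apply the enable line for one tick; disabling resets the startup count.
    fn tick(&mut self, enable: bool) {
        self.enabled = enable;
        self.ticks_enabled = if enable { self.ticks_enabled + 1 } else { 0 };
    }
}

/// Starting from a faulted, disabled PSU, returns true if OK is seen within `max_ticks`.
/// `wait_for_ok_before_enable` selects the old (buggy) policy.
fn recovers(wait_for_ok_before_enable: bool, max_ticks: u32) -> bool {
    let mut psu = ToyPsu { enabled: false, ticks_enabled: 0 };
    for _ in 0..max_ticks {
        // Old policy: only enable once OK is already high, which never happens.
        // New policy: enable, then look at OK.
        let enable = if wait_for_ok_before_enable { psu.ok() } else { true };
        psu.tick(enable);
        if psu.ok() {
            return true;
        }
    }
    false
}
```

With the old policy, `recovers(true, n)` is false for every `n`; with the new policy the toy PSU reports OK after `STARTUP_TICKS` ticks.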
It was noticed that rack2 (dogfood) was not responding.
This rack has a single power whip connected to it.
PSC-blinks.mp4
The rack had powered off, and three of the sequencers were blinking green.
The PSC was removed and re-inserted into the chassis.
From chat, @cbiffle wrote this summary:
Alright, what we know / don't about that PSC behavior: