Romulus / Talos does not IPL on DD2.2 #10
Comments
After extensive testing, one of the known working SBE versions is 9b78381. However, there is a confounding factor in all of this: a physical power cycle of at least our Talos boards is required to recover from a "bad" SBE version, or even after a standard SBE update from hostboot. This poorly understood issue means that the "bad" SBE versions also need to be power-cycle tested to see if they recover. Power cycle here means pulling standby power to the entire mainboard, not just cycling the host power via the BMC. This was not seen until recently, so something seems to have changed in newer SBE code and/or the DD2.2 silicon itself. |
Currently almost all zz/ws/zaius systems are DD2.1 or DD2.2 systems. This issue has not been reported anywhere else. I doubt it is related to SBE code. Can you please give us access to a system where this issue reproduces? |
Talos has this issue, and IBM Austin has replicated it on Romulus. |
Do you have a system on which this issue occurs? We will need a live system for any debug, as the BMC currently does not capture any debug data for SBE failures. |
@sgupta2m I have direct access to the DD2.2 system showing the problem and a Cronus box. Just let me know what you need to see / have run on the system. |
To start with, can you please give us the output of this after failure? |
I don't have the sbe-debug.py script, but here are the status fields over CFAM:
|
@sgupta2m Found the sbe-debug.py script and ran it as requested:
|
SBE looks good here. We need to understand from the HB team where they are failing. You need to send this issue to the HB team. |
@sgupta2m Why would downgrading the SBE alone with Cronus (without changing hostboot) cause the system to work again if this is not an SBE issue? FWIW, since I have the debug box up, here is the full trace from the SBE:
|
The SBE image contains HBBL (owned by the HB team). If you are failing after istep 5.2 but before the HB isteps start, the issue is most probably in HBBL. |
OK, that helps. Thanks! |
Migrated back to open-power/hostboot#128 |
On the latest op-build and DD2.2 the SBE hangs on ISTEP 5. Nothing is printed to console. IBM Austin has reproduced this issue on Romulus; we see it on Talos.
We've also reproduced the issue on this end with SBE hash 75ddac2. Working through a bisect / regression test as the SBE images from November 2017 allow hostboot to initialize.
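For anyone repeating the bisect, here is a rough sketch of how the search could be structured so that the power-cycle confound mentioned in the comments is handled on every trial. It is only an illustration, not the tooling actually used: the `flash`, `power_cycle`, and `ipl_ok` callbacks are hypothetical stand-ins for whatever is done by hand or with Cronus, and it assumes the commit list is ordered oldest to newest with a known good first entry and a known bad last entry.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], boots_ok: Callable[[str], bool]) -> str:
    """Return the first commit for which boots_ok() is False.

    Assumptions (not from the thread): commits are ordered oldest to newest,
    commits[0] is known good, commits[-1] is known bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots_ok(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


def make_trial(
    flash: Callable[[str], None],
    power_cycle: Callable[[], None],
    ipl_ok: Callable[[], bool],
) -> Callable[[str], bool]:
    """Build a boots_ok() function from three hypothetical callbacks.

    power_cycle() is meant to represent pulling standby power to the whole
    mainboard, not just cycling host power via the BMC, so that a previous
    "bad" SBE image cannot poison the result for the next one.
    """
    def trial(commit: str) -> bool:
        flash(commit)
        power_cycle()
        return ipl_ok()

    return trial
```

Running the full standby power cycle inside every trial costs time, but without it a good SBE image tested right after a bad one could be misclassified, which would send the bisect in the wrong direction.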