HBRT infinite loop on ECC error during startup #67
Ironically enough, we just encountered this in a different environment. I'm surprised it took this long. I've got some hacks that need to be ironed out. Since this is a secondary failure, it will go on the back burner for a little while.
FWIW, the workaround is to erase the HBEL partition:
and then you'll be okay.
Or, rather, okay until you reboot, then you fail to IPL:
and then you get into this stupid, inescapable mess that is the golden side, and your life is horrible. But it seems that flashing zeros over the partition, rather than just erasing it, does the trick.
Changes Included for package witherspoon-xml, branch master:
- 7bec10c - Erich Hauptli - 2017-09-26 - Adding new WOF data
- 3d66657 - e-liner - 2017-09-21 - Merge pull request open-power#69 from e-liner/memd_binary
- 8b9fa55 - Elizabeth Liner - 2017-09-21 - Adding witherspoon MEMD binary
- ac74311 - Erich Hauptli - 2017-09-21 - Updating Memory Attributes
- c0b9bc1 - William Hoffa - 2017-09-15 - Mark Xbus Targets Deconfigurable (open-power#67)
- 5736f3e - Prachi Gupta - 2017-09-08 - sync with common_mrw_xml -- 09/07 (open-power#66)

Changes Included for package hostboot, branch master:
- 7f59b42 - Jacob Harvey - 2017-09-26 - Increment red_waterfall for low vdn fix
- ad079f5 - Zane Shelley - 2017-09-26 - PRD: Nimbus DD2.0.1 workaround for nce/tce/mpe/impe
- 49d2286 - Matt K. Light - 2017-09-25 - remove cas_latency.H include from p9_mss_freq.H
- 4930d04 - Luke Mulkey - 2017-09-25 - Memory buffer vpd accessor functions
- 3027cb5 - Thi Tran - 2017-09-25 - L3 Update - p9_hcd_cache_stopclocks HWP
- 72b46fb - Ben Gass - 2017-09-25 - Fix DMI scom translation.
- 190d346 - Prem Shanker Jha - 2017-09-25 - 24x7: Corrected handling of MCA on a direct attached systems.
- 3245f4f - Louis Stermole - 2017-09-25 - Restore original training settings if mss_draminit_training_adv fails
- ecb8cf7 - Sachin Gupta - 2017-09-25 - Added comment for INVALID enum value
- 7085e6b - Andre Marin - 2017-09-25 - Add Write CRC attributes to xml and eff_dimm
- 84e9979 - Andre Marin - 2017-09-25 - Modify VPD decoder to take into account deconfigured ports
- b6c7737 - Matthew Hickman - 2017-09-25 - Changed two symbol correction disable to mnfg flag DISABLE_DRAM_REPAIRS
- f0e99cd - Nick Klazynski - 2017-09-25 - Core workarounds for multiple issues.
- d58dbd6 - Soma BhanuTej - 2017-09-25 - Nimbus DD22 support updates to ekb
- c0719c3 - Ben Gass - 2017-09-25 - Updates for HW416934 and HW417233
- bc88548 - Elizabeth Liner - 2017-09-25 - Removing first byte from MEMD binary
- 0e38c62 - Nick Bofferding - 2017-09-25 - Secure Boot: Direct signature temp files to specific scratch dir
- 54c1fc7 - Nick Bofferding - 2017-09-25 - Secure Boot: Support open signing with component IDs
- 1b3b999 - Christian Geddes - 2017-09-25 - Set variables to nullptr after they are deleted
Resolved with 1e784c0. HBRT will now assert and crash on early-life PNOR failures instead of spinning in an infinite loop that pegs the CPU.
Start opal-prd and observe this log before opal-prd gets stuck at 100% CPU.
(at which point everything stops with opal-prd chewing 100% CPU)
This turns out to be a fairly classic race: trying to log an error before everything has been initialized.
Consequently, opal-prd spins a core and goes right off into the weeds.
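To make the failure mode concrete, here is a minimal sketch in Python (not HBRT's actual C++ code) of an error path that runs before the logging subsystem is ready. All names (`ErrorLog`, `NotReady`, `report_spinning`, `report_fail_fast`) are hypothetical. The spinning variant keeps retrying, which is the shape of the 100% CPU hang; the fail-fast variant surfaces the failure immediately, which is the Python analogue of the assert-and-crash behaviour described for 1e784c0.

```python
class NotReady(Exception):
    """Raised when an error is reported before the log subsystem is up."""


class ErrorLog:
    """Hypothetical stand-in for an error-log subsystem (not HBRT's API)."""

    def __init__(self):
        self.ready = False  # flips to True once initialization has finished
        self.entries = []

    def commit(self, msg):
        if not self.ready:
            raise NotReady(msg)
        self.entries.append(msg)


def report_spinning(log, msg, max_spins=1000):
    """Retry until the log accepts the entry; returns the retry count.

    In the real hang nothing else makes progress while this loop runs,
    so `ready` never flips. `max_spins` exists only so the sketch ends.
    """
    spins = 0
    while spins < max_spins:
        try:
            log.commit(msg)
            return spins
        except NotReady:
            spins += 1
    return spins


def report_fail_fast(log, msg):
    """Surface an early-life failure immediately instead of spinning."""
    try:
        log.commit(msg)
    except NotReady:
        raise RuntimeError(f"early-life failure before logging init: {msg}") from None
```

With `ready` still `False`, `report_spinning` burns through every retry without logging anything, while `report_fail_fast` raises on the first attempt, so the failure is visible instead of silently consuming a core.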