OCC sensor can not be activated #58
Comments
It sounds like the hardware is definitely bad, so I'm calling this an error-handling enhancement rather than a bug. The hwmon driver should produce a better error message.
I'd propose a workaround just for testing (the root cause is still not clear to me).
Have you attempted to work out why the sysfs entry already exists?
This is a bug in the OCC driver that needs to be resolved.
@shenki, thanks; I now know how to paste code in comments correctly :-)
Hi @adamliyi and @shenki,

59.83819|ISTEP 21. 1
@Kenthliu, the error message in your last comment looks like an issue with the OCC itself, not the BMC.
I tried @adamliyi's solution, and it fixes this issue. Here is the log. I found another error message on that system; it seems to be a separate issue. Please help check it. @nkskjames

117.60312|================================================
@Kenthliu, you need to open a new bug for this error; it's not related to the original bug. @williamspatrick, how do you want these to be tracked? Should we get people to report the issue against open-power/hostboot?
Submitted a pull request to fix this issue: openbmc/linux#65
…upport more sensors

This patch fixes issue openbmc/skeleton#58. The hwmon sysfs attributes are created using statically defined arrays. Some POWER CPUs have 10 cores, while others have 12. The more cores, the more OCC sensors: for example, a 12-core CPU has 28 temperature sensors, which overflows the statically defined arrays. This is a temporary fix; the hwmon sysfs attributes will need to be generated dynamically.

Signed-off-by: Yi Li <adamliyi@msn.com>
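To illustrate the failure mode this patch describes, here is a minimal userspace C sketch (not the actual driver code) of a statically sized attribute table. The capacity `STATIC_TEMP_SENSORS` and the helper `fill_static_attrs()` are hypothetical; the thread does not say how large the real arrays were. The bounds check shows the kind of explicit error message requested earlier in this issue, instead of silently writing past the end of the array when the OCC reports 28 temperature sensors. Attribute names follow the hwmon `tempN_input` convention.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL capacity: the thread does not give the real array sizes.
 * It is deliberately smaller than the 28 temperature sensors of a
 * 12-core part, to show the overflow case. */
#define STATIC_TEMP_SENSORS 24

/* One sysfs attribute name, following the hwmon "tempN_input" convention. */
struct temp_attr {
    char name[32];
};

/* Statically sized table, as in the original driver design. */
static struct temp_attr temp_attrs[STATIC_TEMP_SENSORS];

/* Hypothetical helper: fill the table for `count` temperature sensors
 * reported by the OCC. Without the bounds check, count > capacity would
 * write past the end of temp_attrs. Returns 0 on success, or -1 after
 * printing an explicit error when the table is too small. */
static int fill_static_attrs(int count)
{
    if (count < 0 || count > STATIC_TEMP_SENSORS) {
        fprintf(stderr,
                "occ: OCC reported %d temperature sensors, table holds %d\n",
                count, STATIC_TEMP_SENSORS);
        return -1;
    }

    for (int i = 0; i < count; i++) {
        assert(i < STATIC_TEMP_SENSORS); /* guaranteed by the check above */
        snprintf(temp_attrs[i].name, sizeof(temp_attrs[i].name),
                 "temp%d_input", i + 1);
    }
    return 0;
}
```

With the hypothetical capacity of 24, `fill_static_attrs(28)` returns -1 and prints the error instead of overflowing the table.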
Errors reported by "0xE500" correspond to the diagnostics component, so the relevant entry is in Mba.prf.err.C:

PRDR_ERROR_SIGNATURE ( 0xffff0010, "", "Maintenance UE")

On Tue, Mar 22, 2016 at 12:26:28AM -0700, KenLiu wrote:

Patrick Williams
@williamspatrick, so you think this is another HW issue, with something wrong in the memory? I think it is related to the SEL log we saw in the attachment.
@adamliyi image-bmc_occ_0322_final works fine. This issue can be closed now. For the DIMM issue, I will open another issue if needed. Thanks. |
@Kenthliu, I would like to close this, but it looks like you need to open another issue for the DIMM failure. I will wait to close this until the other issue is opened so we don't forget.
@nkskjames, I opened the issue as #71. However, the symptom does not reproduce consistently on every system.
This issue happened again in 0.7. Please take a look.
obmc_v0.7 does not include my temporary fix from openbmc/linux#65, so this bug appears again.
This patch is a linux-4.4 port of the previous patch openbmc@f8087df. It is a backup workaround to make the 5/19 release, for the blocking OCC issue openbmc/skeleton#58. I am working on a new fix that creates those sysfs attributes dynamically. If the new fix does not make the 5/19 release, we can use this patch temporarily; once the new fix is available, it can replace this patch.

Signed-off-by: Yi Li <adamliyi@msn.com>
This patch fixes issue openbmc/skeleton#58. The number of OCC sensors varies across platforms. This patch creates the hwmon sysfs attributes dynamically, using sensor information obtained from the OCC. Previously, the sysfs attributes were created from statically defined data structures.

Signed-off-by: Yi Li <adamliyi@msn.com>
This patch fixes issue openbmc/skeleton#58. The number of OCC sensors varies across platforms. This patch creates the hwmon sysfs attributes dynamically, using sensor information obtained from the OCC. Previously, the sysfs attributes were created from statically defined data structures.

Signed-off-by: Yi Li <adamliyi@msn.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
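The dynamic approach in this patch can be sketched in userspace C as well. This is an illustration, not the merged patch: the attribute array is sized from the sensor count obtained at runtime, so 10-core and 12-core parts need no different compile-time limits. `build_temp_attrs()` and `struct occ_attr` are hypothetical names; a real hwmon driver would allocate with kernel APIs and register `sensor_device_attribute` entries rather than plain name strings.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One generated sysfs attribute name, e.g. "temp7_input". */
struct occ_attr {
    char name[32];
};

/* Hypothetical helper: build one "tempN_input" name per temperature
 * sensor the OCC reports. The array is sized from the runtime count,
 * so there is no compile-time limit to overflow. Returns a heap array
 * the caller must free(), or NULL for a zero count or allocation failure. */
static struct occ_attr *build_temp_attrs(size_t sensor_count)
{
    if (sensor_count == 0)
        return NULL;

    struct occ_attr *attrs = calloc(sensor_count, sizeof(*attrs));
    if (attrs == NULL)
        return NULL;

    for (size_t i = 0; i < sensor_count; i++) {
        int n = snprintf(attrs[i].name, sizeof(attrs[i].name),
                         "temp%zu_input", i + 1);
        /* Even a 20-digit index fits in 32 bytes, so no truncation. */
        assert(n > 0 && (size_t)n < sizeof(attrs[i].name));
    }
    return attrs;
}
```

Calling `build_temp_attrs(28)` yields the names `temp1_input` through `temp28_input`; the caller releases the array with `free()`.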
I believe this is resolved. Please reopen if it is observed in the latest tag.
We found that some boards cannot get OCC sensor readings.
The hwmon driver did not appear in /sys/class/hwmon.
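A quick userspace check for this symptom: every registered hwmon device exposes a `name` attribute under `/sys/class/hwmon/hwmonN/`, so counting entries whose name matches the OCC driver shows whether the driver registered at all. This is a hedged sketch; the helper names are hypothetical, and the thread does not say which name string the OCC driver registers, so the `"occ"` in the usage note below is a placeholder.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the hwmon "name" file at `path` holds exactly `want`
 * (ignoring the trailing newline), 0 if it holds something else,
 * -1 if it cannot be read. */
static int name_file_matches(const char *path, const char *want)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char name[64] = "";
    int result = 0;
    if (fgets(name, sizeof(name), f)) {
        name[strcspn(name, "\n")] = '\0';  /* strip trailing newline */
        result = strcmp(name, want) == 0;
    }
    fclose(f);
    return result;
}

/* Count hwmon devices under `root` (normally "/sys/class/hwmon") whose
 * "name" attribute equals `want`. Returns -1 if `root` cannot be opened. */
static int count_hwmon_named(const char *root, const char *want)
{
    assert(root != NULL && want != NULL);

    DIR *dir = opendir(root);
    if (!dir)
        return -1;

    int matches = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;  /* skip "." and ".." */

        char path[512];
        snprintf(path, sizeof(path), "%s/%s/name", root, ent->d_name);
        if (name_file_matches(path, want) == 1)
            matches++;
    }
    closedir(dir);
    return matches;
}
```

For example, `count_hwmon_named("/sys/class/hwmon", "occ")` returning 0 on an affected board would be consistent with the behavior reported here.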
Attached are the hconsole log and the journalctl log.
OCCfail.txt
Linux version 4.3.6-openbmc-20160222-1 (openpower@openpower-VirtualBox) (gcc version 4.9.3 (GCC) ) #1 Tue Mar 8 14:03:29 CST 2016