Consider adding pcm-memory option to display ECC Correctable Errors #603
Chester-Gillon asked this question in Q&A
-
Thanks for sharing your change. I am not sure this is the right path, as I don't see this event in the documentation for more recent processors. Maybe using pcm-raw is a better way?
pcm-raw documentation: https://github.com/opcm/pcm/blob/master/PCM_RAW_README.md
-
I have an Intel® Xeon® Processor E5 v3 Family server-class system on which mcelog was reporting ECC Correctable Errors, so in https://github.com/Chester-Gillon/pcm I hacked together a quick change that adds an option to report the ECC Correctable Errors counter. On the problematic server, this showed that one memory channel was having continuous ECC Correctable Errors, with the error rate varying according to the workload.
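To illustrate how a raw per-channel counter turns into the per-channel error rate described above, here is a minimal sketch that takes two snapshots of counter values and computes correctable errors per second for each channel. It does not use the PCM API: how the readings are obtained is left out, the channel labels such as "S0-Ch1" are hypothetical, and the 48-bit counter width used for wraparound handling is an assumption that should be checked against the uncore documentation for the processor in question.

```python
from typing import Dict, List

# Assumption: uncore counter width in bits. 48 is typical for Xeon uncore
# counters, but verify it against the documentation for your processor.
COUNTER_WIDTH_BITS = 48


def counter_delta(before: int, after: int, width: int = COUNTER_WIDTH_BITS) -> int:
    """Difference between two raw counter readings, tolerating one wraparound."""
    return (after - before) % (1 << width)


def ce_rates(before: Dict[str, int], after: Dict[str, int], interval_s: float) -> Dict[str, float]:
    """Correctable errors per second for each channel present in both snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        ch: counter_delta(before[ch], after[ch]) / interval_s
        for ch in before
        if ch in after
    }


def channels_with_errors(rates: Dict[str, float]) -> List[str]:
    """Channels that logged at least one correctable error, highest rate first."""
    return sorted((ch for ch, r in rates.items() if r > 0), key=lambda ch: -rates[ch])


if __name__ == "__main__":
    # Hypothetical readings for socket 0, channels 0-3, taken 1 s apart.
    t0 = {"S0-Ch0": 0, "S0-Ch1": 120, "S0-Ch2": 0, "S0-Ch3": 0}
    t1 = {"S0-Ch0": 0, "S0-Ch1": 187, "S0-Ch2": 0, "S0-Ch3": 0}
    rates = ce_rates(t0, t1, 1.0)
    print(rates)                        # S0-Ch1 shows 67.0 errors/s, others 0.0
    print(channels_with_errors(rates))  # ['S0-Ch1']
```

Sampling repeatedly at a fixed interval while varying the workload would show the kind of load-dependent error rate seen on the problematic channel.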
It might be worthwhile submitting this option as a pull request to make it a permanent feature. However, I am not sure how to test it on all the server-class processors that support ECC Correctable Errors in ServerUncoreMemoryMetrics.