An OCP SEL decoder
- Operation at scale with Open Compute Project (OCP) / White boxes
When you adopt Open Compute, instead of getting the benefit out of "part level" replacement for a better standardization and operational efficiency, the onsite service team might face to a challenge in the troubleshooting to identify what component needs to be replaced. MoxSpec-OCPSEL is developed and allows SEL (System Event Log) decoded to human readable information, which allows onsite service team easily identify what to be replaced such as a DIMM slot. Democratizing the chance to gain another level of operational scalability to us by adopting open technologies like OCP was crucial and a holistic approach to even a 19” OEM servers was needed.
$ make bin
$ bin/ocpsel
$ ocpsel -s 40 -g 0001 -e ab0700
Decoded Info:
Summary : Machine Chk Err
Event Data 1 : 0xAB, 1010 1011 ... Uncorrectable
Event Data 2 : 0x07, 0000 0111 ... Machine Check bank Number: 7
Event Data 3 : 0x00, 0000 0000 ... CPU Number: 0, Core Number: 0
Generator : 0x0001 ... BIOS/UEFI system Firmware
$ sudo ipmitool sel elist -v > selelistv.log
$ ocpsel -f selelistv.log
SEL Record ID : 00a1
Record Type : 02
Timestamp : 08/08/2019 10:37:17
Generator ID : 0001
EvM Revision : 04
Sensor Type : Processor
Sensor Number : 40
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : ab0700
Description : Uncorrectable machine check exception
Decoded Info:
Summary : Machine Chk Err
Event Data 1 : 0xAB, 1010 1011 ... Uncorrectable
Event Data 2 : 0x07, 0000 0111 ... Machine Check bank Number: 7
Event Data 3 : 0x00, 0000 0000 ... CPU Number: 0, Core Number: 0
Generator : 0x0001 ... BIOS/UEFI system Firmware