-
Notifications
You must be signed in to change notification settings - Fork 177
Send HW inventory #5535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Send HW inventory #5535
Conversation
acd78e4 to
2b189a7
Compare
| func getKernelFlavor() string { | ||
| // Try to deduce from kernel version or /proc/version | ||
| version := getKernelVersion() | ||
| if strings.Contains(version, "-rt") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is also -hwe flavor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the implementation to extracting flavor from kernel version string.
|
why not upstreaming these changes? why we want to start maintaining a library? |
we want to upstream when it done. It's just faster to iterate on our own fork until we get the changes working and we get the info that we need |
35e04fc to
45ee957
Compare
|
|
||
| func getFirmwareAttributes() map[string]string { | ||
| settings := make(map[string]string) | ||
| root := "/sys/class/firmware-attributes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which hw/BIOS vendors populate this? (I don't see it on an old Supermicro nor on my laptop).
Does something need to be enabled in the kernel to get this info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tied it on our ThinkPad laptops and it was working well, but it looks like this is not available on most servers.
The AI tells me that's it's mostly populated on Lenovo and Dell systems and needs CONFIG_THINK_LMI or CONFIG_DELL_WMI_SYSMAN respectively.
But I don't think there is a better unified solution to get the BIOS settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have those options in our current kconfig?
We can separately check if there are other kconfig options needed which will provide this across more manufacturers. @rucoder mentioned that there might be a tool to get this, but unless I misunderstood him. even that tool relies on /sys/class/firware-attributes being populated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we seem to have the support for Dell, but not for Lenovo
pkg/pillar/hardware/leds.go
Outdated
| InitStrategy: StrategyDellCmd, | ||
| DisplayStrategy: StrategyLedCmd, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the introduction of "Strategy", why do you need a separate strategy for init and for display?
Here it could be StrategyDell and you set an init function and display function baseed on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes more sense 👍
45ee957 to
0a4c1dd
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5535 +/- ##
==========================================
+ Coverage 19.52% 27.99% +8.46%
==========================================
Files 19 19
Lines 3021 2322 -699
==========================================
+ Hits 590 650 +60
+ Misses 2310 1528 -782
- Partials 121 144 +23 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| for _, disk := range disksInfo.Disks { | ||
| stDiskInfo := new(info.StorageDiskInfo) | ||
| if disk.CollectingStatus != types.SmartCollectingStatusSuccess { | ||
| stDiskInfo.DiskName = *proto.String(disk.DiskName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this *proto.String for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's just useless boilerplate that we have across the code base - removing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR integrates the ghw library (github.com/zededa/ghw) to collect and report comprehensive hardware inventory data from EVE devices. The main changes move SMART disk health data from periodic hardware info messages to the hardware health reporting, deprecate the timer.hardwareinfo.interval configuration parameter, and introduce hardware inventory capabilities.
Changes:
- Added hardware inventory collection using ghw library for PCI, USB, CPU, memory, storage, serial, network, CAN, BIOS, TPM, and watchdog devices
- Moved SMART disk information from hardwareinfo message to hardwarehealth message
- Deprecated timer.hardwareinfo.interval configuration parameter as hardwareinfo is no longer sent periodically
Reviewed changes
Copilot reviewed 20 out of 237 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/pillar/types/global.go | Removes deprecated HardwareInfoInterval configuration key and adds Failure() method to SenderStatus |
| pkg/pillar/types/global_test.go | Updates test to remove HardwareInfoInterval from expected config items |
| pkg/pillar/hardware/smart.go | Updates import to use zededa/ghw instead of jaypipes/ghw |
| pkg/pillar/hardware/inventory.go | New file implementing hardware inventory collection using ghw library |
| pkg/pillar/hardware/inventory_test.go | New test file for inventory collection functionality |
| pkg/pillar/hardware/kernel.go | New file adding kernel version and flavor detection functions |
| pkg/pillar/hardware/kernel_test.go | New test file for kernel flavor detection |
| pkg/pillar/hardware/leds.go | New file extracting LED control functionality from ledmanager |
| pkg/pillar/hardware/leds_test.go | New test file for LED status presence detection |
| pkg/pillar/evetpm/tpm.go | Updates TPM property constants to use tpm2 library definitions and adds GetSpecVersion function |
| pkg/pillar/cmd/zedagent/hardwareinfo.go | Refactored to use inventory collection and remove SMART data handling |
| pkg/pillar/cmd/zedagent/handlemetrics.go | Moves SMART disk data collection to hardware health reporting |
| pkg/pillar/cmd/zedagent/reportinfo.go | Adds HwInventorySupport capability flag |
| pkg/pillar/cmd/ledmanager/ledmanager.go | Refactored to use LED functions from hardware package |
| pkg/pillar/controllerconn/send.go | Adds URLPath function and updates URLPathString documentation |
| pkg/pillar/go.mod | Updates dependencies including ghw, eve-api, and related packages |
| pkg/pillar/Dockerfile | Adds dmidecode and pciutils for test tools |
| pkg/installer/Dockerfile | Updates eve-debug base image version |
| pkg/debug/spec.sh | Adds error handling for missing hardwaremodel file |
| docs/CONFIG-PROPERTIES.md | Updates documentation to reflect timer.hardwareinfo.interval deprecation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/pillar/controllerconn/send.go
Outdated
| // URLPath - generate url for either v1 or v1 API path | ||
| func URLPath(server string, isV2api bool, devUUID uuid.UUID, action string) *url.URL { | ||
| u, err := url.Parse(URLPathString(server, isV2api, devUUID, action)) | ||
| if err != nil { | ||
| return nil | ||
| } | ||
|
|
||
| return u | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // URLPath - generate url for either v1 or v1 API path | |
| func URLPath(server string, isV2api bool, devUUID uuid.UUID, action string) *url.URL { | |
| u, err := url.Parse(URLPathString(server, isV2api, devUUID, action)) | |
| if err != nil { | |
| return nil | |
| } | |
| return u | |
| } |
I think this is unused now - was in the old PR that sent the inventory to a different http endpoint
pkg/pillar/hardware/inventory.go
Outdated
| var errStr string | ||
| for key, err := range errs { | ||
| if err != nil { | ||
| errStr += fmt.Sprintf("failed to query for %s: %v", key, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use %w here for the error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot use %w here, Sprintf doesn't support this and we concatenate the errors.
but the errors are nested nicely:
querying for hardware failed: failed to query for PCI: could not retrieve PCI information: pcidb: No pci-ids DB files found (and network fetch disabled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but fmt.Errorf can do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then we cannot concatenate the errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
package main
import (
"errors"
"fmt"
"os"
)
func main() {
errs := []error{
fmt.Errorf("error one"),
os.ErrNotExist,
os.ErrPermission,
}
var cumulatedError error
for _, e := range errs {
cumulatedError = fmt.Errorf("%w\n%w", cumulatedError, e)
}
fmt.Printf("cumulated error: %+v\n", cumulatedError)
fmt.Println("-----------")
if errors.Is(cumulatedError, os.ErrNotExist) {
fmt.Println("is ENOENT")
}
if errors.Is(cumulatedError, os.ErrPermission) {
fmt.Println("is EPERM")
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Turns out there is already a solution for this in the standard library: errors.Join!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Turns out there is already a solution for this in the standard library:
errors.Join!
Cool, that's even better then.
| return err | ||
| } | ||
|
|
||
| func stringToPCIAddress(str string) *info.PCIAddress { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we use this implementation here: https://github.com/zededa/ghw/blob/main/pkg/pci/address/address.go#L38 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we are already using it inside the function. the rest is just conversion to uint32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks right then.
pkg/pillar/hardware/inventory.go
Outdated
| IoportRange: serial.IO, | ||
| } | ||
| var irq uint64 | ||
| fmt.Sscanf(serial.IRQ, "%d", &irq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use strconv.ParseUint?
Also the returned error is not checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
| if strings.Contains(s, "gb") { | ||
| mult = 1000 | ||
| } | ||
| // Remove non-digits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work with ethernet adapters providing 2.5gbit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be honest I've never seen this format with gb. On my laptop and qemu it's always just numbers like 10000
| return nil | ||
| } | ||
|
|
||
| func getFirmwareAttributes() map[string]string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is also https://github.com/fenglyu/go-dmidecode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as far as I can see it provides only
bios-vendor
bios-version
bios-release-date
bios-revision
that we already have. Besides as discussed in #5535 (comment) this function might not be that useful for actual edge devices...
pkg/pillar/types/global.go
Outdated
| func (status SenderStatus) Failure() bool { | ||
| if status == SenderStatusNone { | ||
| return false | ||
| } | ||
| if status == SenderStatusDebug { | ||
| return false | ||
| } | ||
|
|
||
| return true | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func (status SenderStatus) Failure() bool { | |
| if status == SenderStatusNone { | |
| return false | |
| } | |
| if status == SenderStatusDebug { | |
| return false | |
| } | |
| return true | |
| } |
I think, this is not used anymore
eriknordmark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the controllers already handle retrieving the SMART from the health message?
We'd like to have that in Eden (if it uses SMART) and in the commercial controller before we stop sending it in the hardware info message.
I informed @zed-rishabh about the API change and am finishing the changes in Eden |
0a4c1dd to
5477c1d
Compare
pkg/pillar/hardware/inventory.go
Outdated
| } | ||
| } | ||
| if usb.Parent.PCI != nil { | ||
| parent.PciParent = stringToPCIAddress(fmt.Sprintf("%s:%s:%s.%s", usb.Parent.PCI.Domain, usb.Parent.PCI.Bus, usb.Parent.PCI.Device, usb.Parent.PCI.Function)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first this string is concatenated and then a regex is used to split it up again with the same components as string and then those strings are converted to uint32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, changed it!
5477c1d to
823ef07
Compare
Let's also keep that info in ZInfoHardware for the moment until all controllers support the new EVE API. I already made the changes to the PR. |
- use eve-api version that includes the hardware inventory protobufs - use our fork of ghw library Signed-off-by: Christoph Ostarek <christoph@zededa.com>
Report is based on the info from ghw library. Also add EVE/kernel release, flavor, version. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
According the the EVE API changes, SMART info is now part of the hardwarehealth message leaving hardwareinfo with only static hardware inventory data. We'll also keep the SMART info in ZInfoHardware for a while for backward compatibility. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Report HW inventory support capability as part of device info message. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
The hardware info message will only be sent after onboarding and after reboots. The hardware health message will continue to be sent periodically as before. Therefore there is no need for a separate interval parameter for hardware info. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Also add tests for kernel info retrieval. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Since we already have this capability in evetpm package, use it instead of ghw. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Previously, the knowledge of which hardware models possess Status LEDs (and how to drive them) was encapsulated entirely within `ledmanager`. To report the presence of a Status LED in the inventory, this logic needed to be accessible to the `hardware` package. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Add logic to set StatusLedPresent based on hardware model. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Collect BIOS/firmware settings from the Linux `/sys/class/firmware-attributes` class. Signed-off-by: Paul Gaiduk <paulg@zededa.com>
823ef07 to
50cc172
Compare
|
I added an Eden test for this functionality lf-edge/eden#1111 |
Description
The main part of this PR is the integration of https://github.com/jaypipes/ghw/ library into zedagent. We use it to collect info about the hardware that EVE is running on and to send this data in the HW inventory message.
Other changes include:
timer.hardwareinfo.intervalconfig parameter, because the hardwareinfo message is not sent periodically anymorePR dependencies
How to test and validate this PR
For now only visual inspection of the HW inventory message and comparison with the actual HW configuration
Also I added the following Eden test: lf-edge/eden#1111
Changelog notes
Make EVE report the HW it's running on for easier onboarding.
PR Backports
No.
Checklist