[GSD-12570] Most counters in Sysman are broken #932

@ProjectPhysX

Pre-submission Checklist

  • I am using the latest GPU driver version (releases)
  • I have searched for similar issues and found none

GPU Hardware

Intel Arc B580

DRI Devices Information

See #926

GPU Detailed Information (lspci output)

See #926

Driver Version

26.18.38308.1

Installed GPU Driver Packages

No response

Driver Installation Details

Followed installation instructions from here: https://github.com/intel/compute-runtime/releases/tag/26.18.38308.1

Linux Distribution

Ubuntu 24.04 LTS

Other Linux Distribution

No response

Kernel Version & Boot Parameters

kernel 6.17.0-22-generic

Actual Behavior

Issue moved from oneapi-src/level-zero#434

Hi all,

I'm developing hw-smi, a universally compatible CPU/GPU monitoring tool, and for Intel GPU support on Linux I'm testing Sysman.

The data I want to get:

  • GPU name
  • GPU usage
  • VRAM bandwidth
  • memory use
  • temperature
  • power draw
  • fan speed
  • core/memory clock speed
  • PCIe throughput

My system configuration is:

  • Intel Arc B580, Intel UHD Graphics 770

Unfortunately the majority of counters do not work:

| Sysman counter | Windows | Linux |
|---|---|---|
| zes_device_properties_t::core.name | ❌ returns "Intel(R) Graphics [0xe20b]" | ✅ returns Arc B580 |
| ze_device_memory_properties_t::maxBusWidth | ❌ returns 0 | ❌ returns 0 |
| zes_mem_properties_t::busWidth | ✅ works correctly | ❌ returned value is 2x too large |
| ZES_FREQ_DOMAIN_MEMORY zes_freq_properties_t::max | ❌ returns 0 | ZES_FREQ_DOMAIN_MEMORY unavailable |
| ZES_FREQ_DOMAIN_MEMORY zes_freq_range_t::max | ❌ returns 0 | ZES_FREQ_DOMAIN_MEMORY unavailable |
| ZES_FREQ_DOMAIN_MEMORY zes_freq_state_t::actual | ⚠ only available with Administrator permissions; ❌ returns frequency in MT/s, not MHz (a factor 8 too large for GDDR6) | ZES_FREQ_DOMAIN_MEMORY unavailable |
| zes_temp_properties_t::maxTemperature | ✅ works correctly | ❌ returns 0.0 |
| zesTemperatureGetState() | ⚠ only available with Administrator permissions; ✅ works correctly | ⚠ only available with sudo permissions; ✅ works correctly |
| zes_power_properties_t::defaultLimit/maxLimit/minLimit | ✅ work correctly | ❌ returns -1 |
| zes_power_sustained_limit_t::power | ❌ returns 0 | ❌ returns 0 |
| zes_power_properties_t::defaultLimit/maxLimit | ✅ work correctly | ❌ returns -1 |
| zes_power_energy_counter_t::energy/timestamp | ⚠ only available with Administrator permissions; ✅ works correctly | ✅ works correctly |
| zes_fan_handle_t | ✅ works correctly | ❌ broken (no fans available) |
| zes_fan_properties.maxRPM | ❌ returns -1 | ❌ broken (no fans available) |
| zes_fan_config.speedFixed.speed | ✅ works correctly | ❌ broken (no fans available) |
| zes_pci_stats_t::txCounter/rxCounter/timestamp | ⚠ only available with Administrator permissions; ✅ works correctly | ⚠ root required; ✅ works correctly |
| zes_pci_stats_t::speed.gen/width | ✅ has been resolved | ✅ has been resolved |
| zes_pci_stats_t::speed.maxBandwidth | ❌ returns 64, instead of max bandwidth in Bytes/s | ❌ returns 64, instead of max bandwidth in Bytes/s |
| zes_pci_state_t::speed.gen/width/maxBandwidth | ✅ work correctly | ❌ return -1 |
| zes_pci_properties_t | ✅ works correctly | ✅ works correctly |
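
While `speed.maxBandwidth` returns 64 instead of a value in Bytes/s, a monitoring tool could derive a theoretical figure from `speed.gen` and `speed.width`. The following is a minimal sketch of that workaround, not a Sysman API: the function name `pcie_max_bandwidth_bytes_per_s` is hypothetical, it assumes C++17, and it uses the standard PCIe line rates and encodings (8b/10b for generations 1 and 2, 128b/130b for generations 3 to 5). It ignores protocol overhead, so real throughput will be lower.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper, not part of Sysman: theoretical one-directional PCIe
// link bandwidth in bytes per second, derived from link generation and width.
// Returns std::nullopt for inputs it does not recognize, such as the -1
// values the table reports for zes_pci_state_t on Linux.
std::optional<std::uint64_t> pcie_max_bandwidth_bytes_per_s(int gen, int width) {
    if (width <= 0) return std::nullopt;

    std::uint64_t transfers_per_s = 0; // raw line rate per lane, in transfers/s
    std::uint64_t payload_bits = 0;    // encoding: payload bits ...
    std::uint64_t encoded_bits = 0;    // ... per encoded bits on the wire
    switch (gen) {
        case 1: transfers_per_s = 2'500'000'000ULL;  payload_bits = 8;   encoded_bits = 10;  break;
        case 2: transfers_per_s = 5'000'000'000ULL;  payload_bits = 8;   encoded_bits = 10;  break;
        case 3: transfers_per_s = 8'000'000'000ULL;  payload_bits = 128; encoded_bits = 130; break;
        case 4: transfers_per_s = 16'000'000'000ULL; payload_bits = 128; encoded_bits = 130; break;
        case 5: transfers_per_s = 32'000'000'000ULL; payload_bits = 128; encoded_bits = 130; break;
        default: return std::nullopt; // generations not covered by this sketch
    }
    assert(encoded_bits != 0); // invariant: every recognized generation sets an encoding

    // bytes/s = lanes * line rate * (payload / encoded) / 8 bits per byte,
    // kept in integer arithmetic so the result is exact up to truncation.
    const std::uint64_t lanes = static_cast<std::uint64_t>(width);
    return lanes * transfers_per_s * payload_bits / (encoded_bits * 8);
}
```

With this sketch, a PCIe 4.0 x8 link comes out at roughly 15.75 GB/s per direction, and a `gen` or `width` of -1 yields no value instead of a misleading number.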

Please remove the Windows Administrator requirement and the Linux sudo requirement for reading counters!

  • affected functions on Windows:
    • zesDevicePciGetState()
    • zesDevicePciGetProperties()
    • zesTemperatureGetState()
    • zesPowerGetEnergyCounter()
    • zesDevicePciGetStats()
  • affected functions on Linux:
    • zesDeviceEnumEngineGroups()
    • zesEngineGetProperties()
    • zesEngineGetActivity()
    • zesDevicePciGetState()
    • zesDevicePciGetProperties()
    • zesTemperatureGetState()
    • zesPowerGetEnergyCounter()
    • zesDevicePciGetStats()
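
For context on why these reads matter to a monitoring tool: several of the functions listed above return monotonic counters paired with a timestamp, and the displayed value is the rate of change between two samples. The sketch below illustrates that calculation under stated assumptions. The `Sample` struct and all function names are hypothetical stand-ins, not Sysman types, and the units follow my reading of the Level Zero Sysman specification: energy in microjoules, PCIe counters in bytes, engine active time in microseconds, timestamps in microseconds. It assumes C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for one reading of a monotonic Sysman counter,
// e.g. (energy, timestamp), (rxCounter, timestamp) or (activeTime, timestamp).
struct Sample {
    std::uint64_t counter;      // monotonic counter value
    std::uint64_t timestamp_us; // time of the reading, assumed microseconds
};

// Counter increase per microsecond between two samples.
// Returns std::nullopt if time did not advance or the counter went backwards.
std::optional<double> rate_per_us(const Sample& prev, const Sample& cur) {
    if (cur.timestamp_us <= prev.timestamp_us || cur.counter < prev.counter) {
        return std::nullopt;
    }
    const std::uint64_t dt = cur.timestamp_us - prev.timestamp_us;
    assert(dt > 0); // guaranteed by the check above
    return static_cast<double>(cur.counter - prev.counter) / static_cast<double>(dt);
}

// Energy counter in microjoules: microjoules per microsecond equals watts.
std::optional<double> power_watts(const Sample& prev, const Sample& cur) {
    return rate_per_us(prev, cur);
}

// PCIe rx/tx counter in bytes: scale bytes per microsecond to bytes per second.
std::optional<double> throughput_bytes_per_s(const Sample& prev, const Sample& cur) {
    const std::optional<double> r = rate_per_us(prev, cur);
    if (!r) return std::nullopt;
    return *r * 1e6;
}

// Engine active time in microseconds: the ratio is a utilization fraction
// (0.0 to 1.0 when active time does not exceed elapsed time).
std::optional<double> engine_utilization(const Sample& prev, const Sample& cur) {
    return rate_per_us(prev, cur);
}
```

Because every value needs two successive reads, a tool polling once per second calls each of these functions continuously, which is why a privilege requirement on them is a practical obstacle.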

Expected Behavior

All counters should be available on Windows and Linux, without admin/sudo permissions, and return valid data.

Reproduction Rate

Always reproduces - 100%

Steps to Reproduce

For debugging you may use https://github.com/ProjectPhysX/hw-smi

Is this a regression?

  • Yes, this is a regression - functionality that previously worked is now broken

Last Known Working Driver Version

No response

First Known Failing Driver Version

No response

API Call Logs

No response

strace Logs

No response

System Logs / dmesg Output

No response

Backtrace (if crash or hang occurred)

No response

Source Code / Reproducer

No response

Command Line / Application Details

No response

oneAPI Version (if applicable)

No response

Screenshots / Video

No response

Additional Notes

No response

Metadata

Assignees

No one assigned

    Labels

    OS: Linux (Issue specific to Linux distributions: Ubuntu, Fedora, RHEL, etc.)
    Type: Bug (General bug report, unexpected behavior or crash)
