
cleanup OSDEV subtypes #540

@bgoglin


For 3.0, I'd like to clean up object subtypes.

OS devices have an integer subtype attribute whose values overlap in confusing ways:

  • OFED and NET: the separation isn't clear now that Ethernet NICs can do RDMA. IB PCI devices usually have one NET osdev (e.g. "ib0") and one OFED osdev (e.g. "mlx5_0"). If we merged the two types, would we end up with two NET devices?
  • BLOCK: since 3.0, BLOCK is split into STORAGE and MEMORY (see #563, Split OS device "Block" type into "Storage" and "Memory"), because BLOCK didn't make sense for PMEM, and even less for volatile memory (HBM or CXL DAX).
  • GPU and COPROC: similar but different. COPROC is basically for things that can be used for computing (OpenCL, CUDA, L0, VectorEngine). GPU is rather for the other OSdevs (NVML and RSMI for management rather than computing, DRM devices, GL devices). A single PCI device often contains both.

The subtype is currently an integer, but the obj->subtype string is also used as a subsubtype (e.g. the "RSMI" subsubtype for the "GPU" subtype). Other object types only use the subtype string: NUMA nodes have "DRAM"/"NVM"/"HBM", and things may get more complicated with CXL memory in the near future. Also related to #498.

It looks like the obj->subtype string is here to stay because it is flexible and precise. The integer subtype is osdev-specific and less flexible, but it allows quick filtering ("I want GPUs, I don't care if it's AMD or NVIDIA"). Maybe the string alone would be enough if it looked like "GPU:RSMI" or "NET:OFED".
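To make the "GPU:RSMI" idea concrete, here is a minimal sketch of quick filtering on a subtype string of the form CLASS or CLASS:DETAIL. The format and the helper names (subtype_has_class, subtype_detail) are hypothetical and not part of the hwloc API; the point is only that matching the part before ':' still gives "I want GPUs" filtering without an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a subtype string such as "GPU:RSMI" or
 * "NET:OFED" belongs to the requested class ("GPU", "NET", ...).
 * Only the part before the first ':' is compared. */
static bool subtype_has_class(const char *subtype, const char *cls)
{
    assert(subtype != NULL && cls != NULL);
    size_t len = strlen(cls);
    /* The class must be followed by the end of the string or by ':',
     * so that "NET" does not match a subtype like "NETWORK". */
    return strncmp(subtype, cls, len) == 0
        && (subtype[len] == '\0' || subtype[len] == ':');
}

/* Hypothetical helper: returns the detail after ':' (e.g. "RSMI"),
 * or NULL when the subtype has no detail part. */
static const char *subtype_detail(const char *subtype)
{
    assert(subtype != NULL);
    const char *colon = strchr(subtype, ':');
    return colon != NULL ? colon + 1 : NULL;
}
```

Comparing up to the separator rather than doing a plain prefix match keeps filters unambiguous if two class names ever share a prefix.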

Maybe the integer subtype should be a bitmask:

  • NET remains NET
  • OFED becomes NET|OFED
  • STORAGE remains STORAGE
  • MEMORY remains MEMORY (when volatile), and maybe becomes MEMORY|STORAGE for some non-volatile cases
  • GPU becomes GPU|COPROC except for GL devices
  • COPROC becomes COPROC|GPU in most cases
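A minimal sketch of the bitmask variant, assuming hypothetical flag names and bit values (they are not the current hwloc enum values). An OFED device would carry NET|OFED, a CUDA or OpenCL device COPROC|GPU, and a GL device GPU only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical osdev subtype bits, one per "role" from the list above. */
enum osdev_flags {
    OSDEV_NET     = 1u << 0,
    OSDEV_OFED    = 1u << 1,
    OSDEV_STORAGE = 1u << 2,
    OSDEV_MEMORY  = 1u << 3,
    OSDEV_GPU     = 1u << 4,
    OSDEV_COPROC  = 1u << 5,
};

/* True if the device's flags contain every bit in 'wanted'. */
static bool osdev_matches(unsigned flags, unsigned wanted)
{
    assert(wanted != 0);
    return (flags & wanted) == wanted;
}
```

With this encoding, "I want GPUs" is a single mask test that also selects COPROC devices tagged GPU, while a GPU-only GL device is skipped by a COPROC filter.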

Side note: when we have multiple OSdevs inside a PCI device, there was some discussion in the past about merging them and just having all OpenCL/RSMI/... attributes in the same object. But we'd need a way to expose multiple names ("ib0" and "mlx5_0"), and we would likely have to merge some info attributes. Maybe we should rather merge NVML back into CUDA but keep OpenCL separate, etc.
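If OSdevs were merged per PCI device, one possible shape is a single object carrying several names. The struct, the fixed MAX_OSDEV_NAMES limit and the helpers below are hypothetical, and merging info attributes is deliberately left out of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OSDEV_NAMES 4 /* arbitrary limit chosen for this sketch */

/* Hypothetical merged OS device: one object for a PCI device, holding all
 * of its OS-level names (e.g. "ib0" and "mlx5_0"). */
struct merged_osdev {
    const char *names[MAX_OSDEV_NAMES];
    size_t nr_names;
};

/* Appends a name; returns false if the fixed-size array is already full. */
static bool merged_osdev_add_name(struct merged_osdev *dev, const char *name)
{
    assert(dev != NULL && name != NULL);
    if (dev->nr_names == MAX_OSDEV_NAMES)
        return false;
    dev->names[dev->nr_names++] = name;
    return true;
}

/* Returns true if any of the device's names equals 'name'. */
static bool merged_osdev_has_name(const struct merged_osdev *dev, const char *name)
{
    assert(dev != NULL && name != NULL);
    for (size_t i = 0; i < dev->nr_names; i++)
        if (strcmp(dev->names[i], name) == 0)
            return true;
    return false;
}
```

Lookup by any name keeps tools that search for "ib0" or "mlx5_0" working, at the cost of no longer having a single canonical name per object.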
