Description
For 3.0, I'd like to clean up object subtypes.
OS devs have an integer attribute that can be:
- OFED and NET: The separation isn't clear when Ethernet NICs can do RDMA. IB PCI devs usually have one NET osdev (eg "ib0") and one OFED osdev (eg "mlx5_0"). If we merge, we'd have two NET devices?
- Since 3.0, BLOCK is split into STORAGE and MEMORY (see "Split OS device "Block" type into "Storage" and "Memory"" #563) because BLOCK didn't make sense for PMEM, and even less for volatile memory (HBM or CXL DAX).
- GPU and COPROC are similar but different. COPROC is basically things that can be used for computing (OpenCL, CUDA, L0, VectorEngine). GPU rather covers the other OSdevs (NVML and RSMI for management instead of computing, DRM devices, GL devices). A single PCI device often contains both.
Subtype is currently an integer, but the obj->subtype string is also used as a subsubtype (eg "RSMI" subsubtype for "GPU" subtype). Other object types use the subtype string only. NUMA has "DRAM"/"NVM"/"HBM" and things may get more complicated with CXL memory in the near future. Also related to #498
It looks like the obj->subtype string is here to stay because it can be flexible and precise. The integer subtype is osdev specific, less flexible, but allows quick filtering ("I want GPUs, I don't care if it's AMD or NVIDIA"). Maybe the string would be enough if it looked like "GPU:RSMI" or "NET:OFED".
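To make the string-only idea concrete, here is a minimal sketch of how filtering could work on a colon-separated string such as "GPU:RSMI". The helper name `subtype_has_class` and the "CLASS:DETAIL" layout are assumptions for illustration only; nothing like this exists in hwloc today.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not an hwloc function: returns true when a
 * colon-separated subtype string such as "GPU:RSMI" belongs to the
 * given top-level class such as "GPU". */
static bool subtype_has_class(const char *subtype, const char *cls)
{
    size_t n;

    if (!subtype || !cls)
        return false;

    n = strlen(cls);
    if (strncmp(subtype, cls, n) != 0)
        return false;

    /* Accept an exact match ("GPU") or the class followed by ':'
     * ("GPU:RSMI"), but not a longer word sharing the prefix ("GPUX"). */
    return subtype[n] == '\0' || subtype[n] == ':';
}
```

With this layout, "I want GPUs, whatever the vendor" would be `subtype_has_class(obj_subtype, "GPU")`. Note that it only matches the leading class, so a "NET:OFED" device is found when asking for "NET" but not when asking for "OFED".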
Maybe the integer subtype should be a bitmask:
- NET remains NET
- OFED becomes NET|OFED
- STORAGE remains STORAGE
- MEM becomes MEM (when volatile) and maybe MEM|STORAGE for some non-volatile cases
- GPU becomes GPU|COPROC except for GL devices
- COPROC becomes COPROC|GPU in most cases
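The bitmask variant could be sketched as follows. The flag names and values (`OSDEV_NET`, `OSDEV_OFED`, ...) and both helpers are hypothetical and only illustrate the combinations listed above; they are not the hwloc 3.0 API.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the hwloc 3.0 API. */
enum osdev_flag {
    OSDEV_NET     = 1 << 0,
    OSDEV_OFED    = 1 << 1,
    OSDEV_STORAGE = 1 << 2,
    OSDEV_MEM     = 1 << 3,
    OSDEV_GPU     = 1 << 4,
    OSDEV_COPROC  = 1 << 5
};

/* True when the device has at least one of the wanted flags,
 * e.g. "anything that is a GPU or a coprocessor". */
static bool osdev_matches_any(unsigned type, unsigned wanted)
{
    return (type & wanted) != 0;
}

/* True when the device has all of the wanted flags,
 * e.g. "a NET device that is also OFED". */
static bool osdev_matches_all(unsigned type, unsigned wanted)
{
    return (type & wanted) == wanted;
}
```

Under these assumptions an "mlx5_0" device would carry `OSDEV_NET | OSDEV_OFED` and still match a plain `OSDEV_NET` query, while a GL device carrying only `OSDEV_GPU` would not match a query for `OSDEV_COPROC`.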
Side note: when we have multiple OSdevs inside a PCIdev, there was some discussion in the past about merging them and just having all OpenCL/RSMI/... attributes in the same object. But we'd need a way to expose multiple names ("ib0" and "mlx5_0") and likely merge some info attributes. Maybe rather merge nvml back into cuda but keep OpenCL separated, etc.