Expose sub-device exposed by ZE_AFFINITY_MASK as devices #1

jandres742 · 2022-03-22T17:22:59Z

jandres742 commented 5 days ago
From customer feedback:

Currently, for a device with two sub-devices, the following mask exposes the root device and both sub-devices:

ZE_AFFINITY_MASK=0.0,0.1

The request is to have these exposed as two separate root devices. In other words, each sub-device selected in the mask would be presented by Level Zero as a device with no sub-devices.

jandres742 commented 5 days ago
When you use the affinity mask, we expose the parent device when at least 2 sub-devices are selected with the mask. See https://spec.oneapi.io/level-zero/latest/core/PROG.html?highlight=affinity#affinity-mask, which shows that on a system with four sub-devices per device, a mask containing 1.3 and 1.0 exposes the root device and two sub-devices for it (see below).

The following examples demonstrate proper usage for a system configuration of two devices, each with four sub-devices:
• …
• 0.2, 1.3, 1.0, 0.3: both device 0 and 1 are reported; device 0 reports sub-devices 2 and 3 as sub-devices 0 and 1, respectively; device 1 reports sub-devices 0 and 3 as sub-devices 0 and 1, respectively; the order is unchanged.
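
The remapping in that specification example can be modeled in a few lines. This is a minimal Python sketch, not Level Zero code: the function name `apply_affinity_mask` and the list of per-device sub-device counts are assumptions made for illustration, and only the two-level `device.subdevice` mask form is handled.

```python
def apply_affinity_mask(mask, subdevices_per_device):
    """Model which sub-devices each root device reports for a given mask.

    Returns {physical root index: sorted physical sub-device indices}.
    The position of a sub-device in the list is the index it is
    reported under (hypothetical model, not the driver's code).
    """
    selected = {}
    for entry in mask.split(","):
        parts = entry.strip().split(".")
        root = int(parts[0])
        if len(parts) == 1:
            # A bare device index selects every sub-device of that device.
            selected[root] = set(range(subdevices_per_device[root]))
        else:
            selected.setdefault(root, set()).add(int(parts[1]))
    # Sorting reflects "the order is unchanged": the order in which
    # entries appear in the mask does not reorder devices or sub-devices.
    return {root: sorted(subs) for root, subs in sorted(selected.items())}


# Two devices, each with four sub-devices, as in the example above.
view = apply_affinity_mask("0.2, 1.3, 1.0, 0.3", [4, 4])
```

Here `view[0]` lists physical sub-devices 2 and 3 (reported as 0 and 1), and `view[1]` lists physical sub-devices 0 and 3 (reported as 0 and 1).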

Now, the reason we do that, instead of exposing 1.3 and 1.0 as separate devices, is threefold:

Flexibility:
a. It exposes everything to the application and lets it decide what to use. If the application wants to see each sub-device as a device, then the middleware library (DPC++, OpenMP) or the application can use the sub-device handles; if another application wants to use the hierarchy of root and sub-device handles, that is available as well. Always exposing sub-devices as devices would limit applications that want to see the hierarchy.

Implicit scaling:
a. By exposing the root device, we allow implicit scaling to be supported with a subset of tiles. In the sample from the specification, we would have implicit scaling across two of the four tiles, 1.3 and 1.0. The application would then decide whether to use the root device with 2T implicit scaling or to use the tiles directly. If we exposed each sub-device as a device, implicit scaling wouldn't be possible with a subset of tiles.

Scalability:
a. In the future we could have further levels in the device hierarchy, with sub-devices inside sub-devices. In that case, it would become difficult to decide what a device is. Imagine this configuration:

1 root device
2 tiles
Each tile with 4 sub-sub-devices.
Now imagine the user passes this mask:

MASK=0.0,0.1.2,0.1.3

In this case, if we exposed each mask entry as a device, each device would have a different set of capabilities, which may further complicate things. However, by exposing the hierarchy in this case as

MASK=0.0,0.1.2,0.1.3 =>

root device handle 0
    sub-device handle 0: representing 0.0
    sub-device handle 1: representing 0.1
        sub-device handle 0: representing 0.1.2
        sub-device handle 1: representing 0.1.3

it is clear, and easier for the application, to traverse the device hierarchy and understand what each device handle represents.

In this case, if the application, DPC++, or OpenMP wants to see 0.0, 0.1.2, and 0.1.3 as separate devices, it can do so by selecting the leaves of the tree; if another application wants to see the whole hierarchy and use implicit scaling, it can use whichever device handle it needs.
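
Selecting the leaves can be sketched as a simple tree walk. The snippet below is a hedged Python illustration only: `DeviceNode` and `leaves` are hypothetical names, and the three-level hierarchy is the future scenario imagined above, not something current hardware necessarily exposes. In a real application the children would come from a call such as `zeDeviceGetSubDevices` rather than a hand-built tree.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceNode:
    """Hypothetical stand-in for a device handle and its sub-device handles."""
    label: str
    children: list = field(default_factory=list)


def leaves(node):
    """Return the labels of all handles that have no sub-devices, left to right."""
    if not node.children:
        return [node.label]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# Hierarchy exposed for MASK=0.0,0.1.2,0.1.3 in the example above.
root = DeviceNode("0", [
    DeviceNode("0.0"),
    DeviceNode("0.1", [DeviceNode("0.1.2"), DeviceNode("0.1.3")]),
])
flat = leaves(root)
```

A layer that wants the flat model uses `flat`, while a layer that wants implicit scaling keeps `root` or the `0.1` node.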

Now, one proposal from customers is to either change the meaning of the affinity mask or define a new variable, such as ZE_VISIBLE_DEVICES, that allows for this model.

servesh commented 4 days ago
@jandres742 Would it make sense to be more pragmatic in the way root devices are shown to the programming layer above?

The current issue seems to stem from "we expose the parent device when at least 2 sub-devices are selected with the mask".

My thinking here is:

MASK=0.0,0.1.2,0.1.3=>

sub device handle 0: representing 0.0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
sub-device handle 0: representing 0.1.2
sub-device handle 1: representing 0.1.3

MASK=0,0.0,0.1,0.1.2,0.1.3=>

root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e. 0's global memory)
    sub-device handle 0: representing 0.0
    sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e. 0.1's global memory)
        sub-device handle 0: representing 0.1.2
        sub-device handle 1: representing 0.1.3

And if the application chooses a device handle that has sub-devices, then implicitly scale the workload across its sub-devices.

jandres742 commented 4 days ago
Thanks @servesh. I think what you are saying is the same as what I am saying, no? The way the affinity mask is defined lets users programmatically select the device handle in the hierarchy that fits their needs, depending on the mask passed. The behavior you showed in your example is exactly that: we would expose several device handles in the hierarchy, do implicit scaling, and color the allocations accordingly, and, as you say, the application can programmatically select the handle it wants.

MASK=0,0.0,0.1,0.1.2,0.1.3=>

root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e. 0's global memory)
    sub-device handle 0: representing 0.0
    sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e. 0.1's global memory)
        sub-device handle 0: representing 0.1.2
        sub-device handle 1: representing 0.1.3

If we instead exposed each of these comma-separated mask entries as a single device, then neither memory coloring nor implicit scaling would be possible.

That's why I don't think we should change the meaning of the mask. The previous suggestion from your team, a separate environment variable that says whether or not we want the comma-separated mask entries exposed as devices, might be more viable.

TApplencourt commented 4 days ago
"That's why I don't think we should change the meaning of the mask, and previous suggestion from your team about having a separate environment variable to say whether or not we want the comma-separated masks as devices might be more viable."

I agree. Having ZE_VISIBLE_DEVICES or another environment variable may be more tractable. It looks like both behaviors (visibility and masking) are needed.

Some users definitely want the same behavior as ROCR_VISIBLE_DEVICES: not a mask, just "expose what I pass you as a device".

So having 2 different ENV seems to be a good idea!
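
To make the contrast with the mask concrete, here is a hedged Python sketch of the "expose what I pass you as a device" model. It is an assumption-laden illustration: `flat_view` is a hypothetical helper, no such Level Zero function exists in this thread, and numbering the devices in the order the entries are given is a choice made only for this sketch.

```python
def flat_view(visible):
    """Model a visible-devices style variable: every comma-separated
    entry becomes its own API device with no sub-devices.

    Returns a list of (api_device_index, physical_entry) pairs.
    Hypothetical model; ordering by position is an assumption.
    """
    entries = [e.strip() for e in visible.split(",") if e.strip()]
    return list(enumerate(entries))


# Under the affinity-mask model, "0.0,0.1" yields one root device with
# two sub-devices; under this flat model it yields two plain devices.
devices = flat_view("0.0,0.1")
```

With this model there is no parent handle for the runtime to scale across, which is the trade-off discussed earlier in the thread.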

TApplencourt · 2022-06-24T18:25:36Z

On oneAPI, at least 2 products have implemented this feature:

OpenMP via LIBOMPTARGET_DEVICES=subdevice
Intel TensorFlow via ITEX_ENABLE_TILE_AS_DEVICE=1

So it looks like multiple projects have this requirement of exposing a tile as a device.

jandres742 · 2022-08-14T01:51:15Z

FYI: we have a debug key in compute-runtime that provides that functionality:

https://github.com/intel/compute-runtime/blob/0101e80b00e5884fe26fbe58acfdd71bc7814670/shared/source/debug_settings/debug_variables_base.inl#L338

DECLARE_DEBUG_VARIABLE(int32_t, ReturnSubDevicesAsApiDevices, -1, "Expose each subdevice as a separate device during clGetDeviceIDs or zeDeviceGet API call")

servesh · 2022-08-25T19:13:29Z

Can confirm this at least enumerates correctly with L0. NEOReadDebugKeys needed to be set as well.

MichalMrozek · 2023-01-26T11:39:00Z

ReturnSubDevicesAsApiDevices currently works only with NEOReadDebugKeys=1.
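
Putting the two keys together, a launcher could prepare the environment as follows. This is a minimal Python sketch under stated assumptions: the helper name `debug_env` and the application path in the comment are hypothetical, and setting `ReturnSubDevicesAsApiDevices` to `1` is assumed to be the enabling value, since the thread only shows its default of `-1`.

```python
import os


def debug_env(base_env=None):
    """Return a copy of the environment with both debug keys set."""
    env = dict(os.environ if base_env is None else base_env)
    # Required for compute-runtime to read debug keys at all (per the thread).
    env["NEOReadDebugKeys"] = "1"
    # Assumed enabling value; the declared default in the source is -1.
    env["ReturnSubDevicesAsApiDevices"] = "1"
    return env


# Example use with a hypothetical binary:
#   import subprocess
#   subprocess.run(["./my_level_zero_app"], env=debug_env(), check=True)
```

Because these are debug keys rather than a documented interface, their behavior may change between driver releases.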

jandres742 mentioned this issue Mar 22, 2022

Expose sub-device exposed by ZE_AFFINITY_MASK as devices oneapi-src/level-zero#86

Closed

jandres742 added this to the v2.0 Release milestone Jan 24, 2023

jandres742 added the enhancement and API: Core labels Jan 24, 2023

jandres742 pushed a commit to jandres742/level-zero-spec that referenced this issue Jun 27, 2023

Add support for flexible device hierarchy model

cb72954

Resolves: oneapi-src#1 Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>

jandres742 mentioned this issue Jun 27, 2023

Add support for flexible device hierarchy model #169

Merged

wdamon-intel closed this as completed in #169 Jul 7, 2023

wdamon-intel pushed a commit that referenced this issue Jul 7, 2023

Add support for flexible device hierarchy model (#169)

05e8e15

Resolves: #1 Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
