Expose sub-device exposed by ZE_AFFINITY_MASK as devices #1

Closed
jandres742 opened this issue Mar 22, 2022 · 4 comments · Fixed by #169
@jandres742

jandres742 commented Mar 22, 2022

Moved from oneapi-src/level-zero#86


jandres742 commented 5 days ago
From customer feedback:

Currently, on a device with two sub-devices, the following mask exposes the root device and both sub-devices:

ZE_AFFINITY_MASK=0.0,0.1

Request is to have these exposed as two separate root devices. In other words, that each sub-device exposed in the mask is presented by Level Zero as a device, with no sub-devices.


@jandres742

Author
jandres742 commented 5 days ago
When you use the affinity mask, we expose the parent device when at least 2 sub-devices are selected with the mask. From https://spec.oneapi.io/level-zero/latest/core/PROG.html?highlight=affinity#affinity-mask: on a system with four sub-devices per device, when 1.3 and 1.0 are in the mask, we expose the root device and two sub-devices for it (see the spec excerpt below).

The following examples demonstrate proper usage for a system configuration of two devices, each with four sub-devices:
• …
• 0.2, 1.3, 1.0, 0.3: both device 0 and 1 are reported; device 0 reports sub-devices 2 and 3 as sub-devices 0 and 1, respectively; device 1 reports sub-devices 0 and 3 as sub-devices 0 and 1, respectively; the order is unchanged.
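
The renumbering in the spec example above can be modeled in a few lines of Python. This is only a sketch of the mask semantics as quoted, not driver code: the function name `reported_hierarchy` is made up for illustration, and it handles only `device.sub-device` entries.

```python
from collections import defaultdict

def reported_hierarchy(mask):
    """Model which devices and sub-devices a mask like "0.2, 1.3" reports.

    Returns {device: [(reported_index, physical_sub_device), ...]}.
    Only "device.sub-device" entries are handled in this sketch.
    """
    selected = defaultdict(set)
    for entry in mask.split(","):
        device, sub_device = (int(part) for part in entry.strip().split("."))
        selected[device].add(sub_device)
    # Sub-devices keep their physical order, regardless of mask order.
    return {
        device: list(enumerate(sorted(subs)))
        for device, subs in sorted(selected.items())
    }

hierarchy = reported_hierarchy("0.2, 1.3, 1.0, 0.3")
for device, subs in hierarchy.items():
    for reported, physical in subs:
        print(f"device {device}: sub-device {physical} reported as sub-device {reported}")
```

For the mask in the spec example, device 0 reports physical sub-devices 2 and 3 as sub-devices 0 and 1, and device 1 reports physical sub-devices 0 and 3 as sub-devices 0 and 1.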

Now, the reason we do that, instead of exposing 1.3 and 1.0 as separate devices is threefold:

Flexibility:
a. It exposes everything to the application, letting it decide what to use and what not to use. If the application wants to see each sub-device as a device, then the middleware library (DPC++, OpenMP) or the application can use the sub-device handles, but if another application wants to use the hierarchy of root and sub-device handles, that is also available. Always exposing sub-devices as devices would limit applications that want to see the hierarchy.

Implicit scaling:
a. By exposing the root device, we allow implicit scaling to be supported with a subset of tiles. In the spec example above, we would have implicit scaling with the two-out-of-four tiles 1.3 and 1.0. The application would then decide whether to use the root device with 2T implicit scaling, or to use the tiles directly. If we exposed each sub-device as a device, then implicit scaling wouldn't be possible with a subset of tiles.

Scalability:
a. In the future we could have further levels in the device hierarchy, with sub-devices inside sub-devices. In this case, it would become difficult to decide what a device is. Imagine the case where you have this:

1 root device
2 tiles
Each tile with 4 sub-sub-devices.
Now imagine the user passes this mask:

MASK=0.0,0.1.2,0.1.3

In this case, if we exposed each mask entry as a device, then each device would have a different set of capabilities, which may further complicate things. However, by exposing the hierarchy in this case

MASK=0.0,0.1.2,0.1.3 =>

root device handle 0
    sub-device handle 0: representing 0.0
    sub-device handle 1: representing 0.1
        sub-device handle 0: representing 0.1.2
        sub-device handle 1: representing 0.1.3
it is clear, and easier for the application, to traverse the device hierarchy and understand what each device handle represents.

In this case, if the application, DPC++, or OpenMP wants to see 0.0, 0.1.2, and 0.1.3 as separate devices, it can do so by selecting the right-most leaves in the tree. If another application wants to see the whole hierarchy and use implicit scaling, it would use whichever device handle it needs.
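
A rough Python sketch of "selecting the leaves" for the hypothetical three-level mask above. The tree here is a plain nested dict keyed by the physical indices in the mask, not Level Zero handles, and `build_tree` and `leaves` are illustrative names. A real application would walk the handles returned by `zeDeviceGet` and `zeDeviceGetSubDevices` instead.

```python
def build_tree(mask):
    """Build a nested dict from dotted mask entries, e.g. "0.1.2"."""
    tree = {}
    for entry in mask.split(","):
        node = tree
        for index in entry.strip().split("."):
            node = node.setdefault(int(index), {})
    return tree

def leaves(tree, prefix=()):
    """Return the dotted paths of all nodes that have no children."""
    paths = []
    for index, children in sorted(tree.items()):
        path = prefix + (index,)
        if children:
            paths.extend(leaves(children, path))
        else:
            paths.append(".".join(map(str, path)))
    return paths

tree = build_tree("0.0,0.1.2,0.1.3")
print(leaves(tree))
```

An application that wants the hierarchy would keep the intermediate nodes (0 and 0.1) instead of discarding them.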

Now, one proposal from customers is to either change the meaning of the affinity mask, or to define a new one, like ZE_VISIBLE_DEVICES, which allows for this model.


@servesh

servesh commented 4 days ago
@jandres742 Would it make sense to be more pragmatic in the way root devices are shown to the programming layer above?

The current issue seems to stem from, "we expose the parent device when at least 2 sub-devices are selected with the mask"

My thinking here is,

MASK=0.0,0.1.2,0.1.3=>

sub device handle 0: representing 0.0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
sub-device handle 0: representing 0.1.2
sub-device handle 1: representing 0.1.3
MASK=0,0.0,0.1,0.1.2,0.1.3=>

root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e 0's global memory)
    sub-device handle 0: representing 0.0
    sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
        sub-device handle 0: representing 0.1.2
        sub-device handle 1: representing 0.1.3
And if the application chooses a device handle with subdevice, then implicitly scale the workload across its subdevices.


@jandres742

Author
jandres742 commented 4 days ago
Thanks @servesh. I think what you are saying is the same as what I am saying, no? The way the affinity mask is defined lets users programmatically select the device handle in the hierarchy that fits their needs, depending on the mask passed. The behavior you showed in your example is exactly that. We would expose several device handles in the hierarchy, do implicit scaling, and color the allocations accordingly, and as you say, the application can programmatically select the handle it wants.

MASK=0,0.0,0.1,0.1.2,0.1.3=>

root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e 0's global memory)
    sub-device handle 0: representing 0.0
    sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
        sub-device handle 0: representing 0.1.2
        sub-device handle 1: representing 0.1.3
If we instead exposed each of these comma-separated mask entries as a single device, then neither memory coloring nor implicit scaling would be possible.

That's why I don't think we should change the meaning of the mask. The previous suggestion from your team, to have a separate environment variable that says whether or not we want the comma-separated mask entries exposed as devices, might be more viable.


@TApplencourt

TApplencourt commented 4 days ago
> That's why I dont think we should change the meaning of the mask, and previous suggestion from your team about having a separate environment variable to say whether or not we want the comma-separated masks as devices might be more viable.

I agree. Having ZE_VISIBLE_DEVICES or another ENV may be more tractable. It looks like both behaviors (visibility and masking) are needed.

Some users definitely want the same behavior as ROCR_VISIBLE_DEVICES. So not giving a mask, just an "expose what I pass you as a device".

So having 2 different ENV seems to be a good idea!
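
To make the difference between the two behaviors concrete, here is a small Python model. It is a sketch only: the `flat` parameter stands in for the proposed second environment variable, whose name and exact semantics were not settled in this thread, and `enumerate_devices` is an illustrative name. Only `device.sub-device` entries are handled.

```python
def enumerate_devices(mask, flat):
    """Return a list of (device, sub_devices) pairs for a mask.

    flat=False models the current masking behavior (simplified):
    sub-devices are grouped under their root device.
    flat=True models the requested behavior: every mask entry is
    exposed as its own device with no sub-devices.
    """
    entries = [entry.strip() for entry in mask.split(",")]
    if flat:
        return [(entry, []) for entry in entries]
    roots = {}
    for entry in entries:
        root = entry.split(".")[0]
        roots.setdefault(root, []).append(entry)
    return list(roots.items())

print(enumerate_devices("0.0,0.1", flat=False))
print(enumerate_devices("0.0,0.1", flat=True))
```

With the mask from the original report, the hierarchical mode yields one root device with two sub-devices, while the flat mode yields two devices with none.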

@TApplencourt
Contributor

TApplencourt commented Jun 24, 2022

In oneAPI, at least 2 products have implemented this feature:

  • OpenMP via LIBOMPTARGET_DEVICES=subdevice
  • Intel tensorflow via some ITEX_ENABLE_TILE_AS_DEVICE=1

So it looks like multiple projects have this requirement of exposing a tile as a device.

@jandres742
Author

FYI: we have a debug key in compute-runtime that provides that functionality:

https://github.com/intel/compute-runtime/blob/0101e80b00e5884fe26fbe58acfdd71bc7814670/shared/source/debug_settings/debug_variables_base.inl#L338

DECLARE_DEBUG_VARIABLE(int32_t, ReturnSubDevicesAsApiDevices, -1, "Expose each subdevice as a separate device during clGetDeviceIDs or zeDeviceGet API call")

@servesh

servesh commented Aug 25, 2022

Can confirm this at least enumerates correctly with L0. NEOReadDebugKeys needed to be set as well.

@jandres742 jandres742 added this to the v2.0 Release milestone Jan 24, 2023
@jandres742 jandres742 added enhancement New feature or request API: Core labels Jan 24, 2023
@MichalMrozek

ReturnSubDevicesAsApiDevices currently works only with NEOReadDebugKeys=1.
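
A minimal Python sketch of launching an application with both keys set. It rests on two assumptions not stated in this thread: that the debug key is read from an environment variable with the same name as the key, and that the value 1 enables it (the declaration only shows a default of -1). The helper name `env_with_subdevices_as_devices` is made up for illustration.

```python
import os

def env_with_subdevices_as_devices(base_env=None):
    """Return a copy of the environment with both debug keys set."""
    env = dict(os.environ if base_env is None else base_env)
    # Debug keys are only honored when this is set (per the comments above).
    env["NEOReadDebugKeys"] = "1"
    # Assumption: 1 enables the behavior; the declared default is -1.
    env["ReturnSubDevicesAsApiDevices"] = "1"
    return env

env = env_with_subdevices_as_devices({})
print(env)
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the Level Zero application.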

jandres742 pushed a commit to jandres742/level-zero-spec that referenced this issue Jun 27, 2023
Resolves: oneapi-src#1

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
wdamon-intel pushed a commit that referenced this issue Jul 7, 2023
Resolves: #1

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>