Expose sub-device exposed by ZE_AFFINITY_MASK as devices #86

Closed
jandres742 opened this issue Mar 18, 2022 · 5 comments

Comments

@jandres742

From customer feedback:

Currently, on a device with two sub-devices, the following mask exposes the root device and both sub-devices:

ZE_AFFINITY_MASK=0.0,0.1

Request is to have these exposed as two separate root devices. In other words, that each sub-device exposed in the mask is presented by Level Zero as a device, with no sub-devices.

@jandres742
Author

jandres742 commented Mar 18, 2022

When you use the affinity mask, we expose the parent device when at least 2 sub-devices are selected with the mask. See the spec at https://spec.oneapi.io/level-zero/latest/core/PROG.html?highlight=affinity#affinity-mask: on a system with four sub-devices per device, when the mask contains 1.3 and 1.0, we expose the root device and two sub-devices for it, as in this excerpt:

The following examples demonstrate proper usage for a system configuration of two devices, each with four sub-devices:
• …
• 0.2, 1.3, 1.0, 0.3: both device 0 and 1 are reported; device 0 reports sub-devices 2 and 3 as sub-devices 0 and 1, respectively; device 1 reports sub-devices 0 and 3 as sub-devices 0 and 1, respectively; the order is unchanged.
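
The renumbering in the spec example above can be sketched in a few lines of Python. This is only an illustrative model, not driver code: the function name `reported_sub_devices` is hypothetical, and the sketch assumes every mask entry has the form `device.sub_device`.

```python
def reported_sub_devices(mask: str) -> dict[int, list[int]]:
    """Map each physical device in the mask to its selected physical sub-devices.

    The position of a sub-device in the returned list is the index it is
    reported under, following the spec example quoted above.
    Assumption: every entry looks like "device.sub_device".
    """
    selected: dict[int, set[int]] = {}
    for entry in mask.split(","):
        device, sub_device = entry.strip().split(".")
        selected.setdefault(int(device), set()).add(int(sub_device))
    # Physical order is kept regardless of the order used in the mask.
    return {dev: sorted(subs) for dev, subs in sorted(selected.items())}


layout = reported_sub_devices("0.2, 1.3, 1.0, 0.3")
for device, subs in layout.items():
    for reported, physical in enumerate(subs):
        print(f"device {device}: physical sub-device {physical} "
              f"-> reported sub-device {reported}")
```

For the mask in the excerpt, `layout` is `{0: [2, 3], 1: [0, 3]}`, which matches the spec text: device 1 reports sub-devices 0 and 3 as 0 and 1 even though the mask lists 1.3 first.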

Now, the reason we do that, instead of exposing 1.3 and 1.0 as separate devices, is threefold:

  1. Flexibility:
    a. It exposes everything to the application and lets it decide what to use. If the application wants to see each sub-device as a device, the middleware library (DPC++, OpenMP) or the application can use the sub-device handles; if another application wants to use the hierarchy of root and sub-device handles, that is also available. Always exposing sub-devices as devices would limit applications that want to see the hierarchy.

  2. Implicit scaling:
    a. By exposing the root device, we allow implicit scaling to be supported with a subset of tiles. In the example above, we would have implicit scaling across two of the four tiles, 1.3 and 1.0. The application would then decide whether to use the root device with 2T implicit scaling or to use the tiles directly. If we exposed each sub-device as a device, implicit scaling wouldn’t be possible with a subset of tiles.

  3. Scalability:
    a. In the future we could have further levels in the device hierarchy, with sub-devices inside sub-devices. In this case, it would become difficult to decide what a device is. Imagine the case where you have this:

  • 1 root device
    • 2 tiles
      • Each tile with 4 sub-sub-devices.

Now imagine the user passes this mask:

MASK=0.0,0.1.2,0.1.3

In this case, if we exposed each entry as a separate device, each device would have a different set of capabilities, which may further complicate things. However, by exposing the hierarchy in this case

MASK=0.0,0.1.2,0.1.3 =>

  • root device handle 0
    • sub-device handle 0: representing 0.0
    • sub-device handle 1: representing 0.1
      • sub-device handle 0: representing 0.1.2
      • sub-device handle 1: representing 0.1.3

it is clear, and easier for the application, to traverse the device hierarchy and understand what each device handle represents.

In this case, if the application, DPC++, or OpenMP wants to see 0.0, 0.1.2, and 0.1.3 as separate devices, it can do so by selecting the leaves of the tree; if another application wants to see the whole hierarchy and use implicit scaling, it can use whichever device handle it needs.
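
As a rough illustration of the traversal described above, here is a small Python model of the hypothetical multi-level mask. It is not Level Zero code: `build_hierarchy` and `leaves` are made-up helper names, and the nested dict only stands in for the tree of device handles.

```python
def build_hierarchy(mask):
    """Build a nested dict keyed by physical index from a dotted mask."""
    root = {}
    for entry in mask.split(","):
        node = root
        for index in entry.strip().split("."):
            node = node.setdefault(int(index), {})
    return root


def leaves(tree, prefix=()):
    """Yield the dotted path of every node that has no children."""
    for index, children in sorted(tree.items()):
        path = prefix + (index,)
        if children:
            yield from leaves(children, path)
        else:
            yield ".".join(map(str, path))


tree = build_hierarchy("0.0,0.1.2,0.1.3")
print(tree)                # {0: {0: {}, 1: {2: {}, 3: {}}}}
print(list(leaves(tree)))  # ['0.0', '0.1.2', '0.1.3']
```

A layer that wants flat devices would take the leaves, while a layer that wants implicit scaling would pick an inner node such as 0.1 or the root.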

Now, one proposal from customers is to either change the meaning of the affinity mask or define a new environment variable, like ZE_VISIBLE_DEVICES, that allows for this model.

@servesh

servesh commented Mar 18, 2022

@jandres742 Would it make sense to be more pragmatic in the way root devices are shown to the programming layer above?

The current issue seems to stem from, "we expose the parent device when at least 2 sub-devices are selected with the mask"

My thinking here is,

MASK=0.0,0.1.2,0.1.3=>

  • sub device handle 0: representing 0.0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
    • sub-device handle 0: representing 0.1.2
    • sub-device handle 1: representing 0.1.3

MASK=0,0.0,0.1,0.1.2,0.1.3=>

  • root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e 0's global memory)
    • sub device handle 0: representing 0.0
    • sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
      • sub-device handle 0: representing 0.1.2
      • sub-device handle 1: representing 0.1.3

And if the application chooses a device handle that has sub-devices, then implicitly scale the workload across those sub-devices.

@jandres742
Author

Thanks @servesh. I think what you are saying is the same as what I am saying, no? The way the affinity mask is defined lets users programmatically select the device handle in the hierarchy that fits their needs, depending on the mask passed. The behavior you showed in your example is exactly that. We would expose several device handles in the hierarchy, do implicit scaling, and color the allocations accordingly; as you say, the application can programmatically select the handle it wants.

MASK=0,0.0,0.1,0.1.2,0.1.3=>

  • root device handle 0 (Default device 0 from programming layer, if chosen implicitly scale across 0.1 and 0.2. Memory allocations should be split across the closest domain to these devices, i.e 0's global memory)
    • sub device handle 0: representing 0.0
    • sub-device handle 1: representing 0.1 (device 2 from programming layer, if chosen implicitly scale across 0.1.2 and 0.1.3. Memory allocations should be split across the closest domain to these devices, i.e 0.1's global memory)
      • sub-device handle 0: representing 0.1.2
      • sub-device handle 1: representing 0.1.3

If we instead exposed each of these comma-separated mask entries as a single device, then neither memory coloring nor implicit scaling would be possible.

That's why I don't think we should change the meaning of the mask. The previous suggestion from your team, a separate environment variable that says whether the comma-separated mask entries should be exposed as devices, might be more viable.

@TApplencourt

TApplencourt commented Mar 18, 2022

That's why I don't think we should change the meaning of the mask. The previous suggestion from your team, a separate environment variable that says whether the comma-separated mask entries should be exposed as devices, might be more viable.

I agree. Having ZE_VISIBLE_DEVICES or another environment variable may be more tractable. It looks like both behaviors (visibility and masking) are needed.

Some users definitely want the same behavior as ROCR_VISIBLE_DEVICES: not giving a mask, just "expose what I pass you as a device".

So having 2 different ENV seems to be a good idea!
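
To make the difference between the two behaviors concrete, here is a small Python model. It is a sketch under stated assumptions, not an implementation: the function names are hypothetical, ZE_VISIBLE_DEVICES is only a proposal in this thread, and the masking model covers only the case discussed here (entries of the form `device.sub_device`, with at least two sub-devices selected per device).

```python
def expose_with_mask(entries):
    """Masking behavior as described in this thread: group entries by parent device."""
    devices = {}
    for entry in entries.split(","):
        device, sub_device = entry.strip().split(".")
        devices.setdefault(int(device), []).append(int(sub_device))
    return [
        {"represents": str(dev),
         "sub_devices": [f"{dev}.{sub}" for sub in sorted(subs)]}
        for dev, subs in sorted(devices.items())
    ]


def expose_flat(entries):
    """Proposed visibility behavior: each entry becomes a device with no sub-devices."""
    return [{"represents": entry.strip(), "sub_devices": []}
            for entry in entries.split(",")]


print(expose_with_mask("0.0,0.1"))  # one root device with two sub-devices
print(expose_flat("0.0,0.1"))       # two devices, neither with sub-devices
```

With masking, the root device is still available for implicit scaling; with the flat model, the application sees two independent devices, which is what the original request asks for.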

@jandres742
Author

Moved to the public spec repo we now have:

oneapi-src/level-zero-spec#1
