Merged
32 changes: 16 additions & 16 deletions sycl/doc/extensions/supported/sycl_ext_intel_cslice.asciidoc
@@ -20,7 +20,7 @@
== Notice

[%hardbreaks]
-Copyright (C) 2022-2022 Intel Corporation. All rights reserved.
+Copyright (C) 2022-2023 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
@@ -65,7 +65,7 @@ new partitioning type
`info::partition_property::ext_intel_partition_by_cslice`.

The only Intel GPU devices that currently support this type of partitioning
-are the Data Center GPU Max series (aka PVC), and this support is only
+are the Intel(R) Data Center GPU Max Series (aka PVC), and this support is only
Contributor review comment, suggested change:
-are the Intel(R) Data Center GPU Max Series (aka PVC), and this support is only
+are the Intel(R) Data Center GPU Max Series, and this support is only

available when the device driver is configured in {multi-CCS}[multi-CCS] mode.
See that documentation for instructions on how to enable this mode and for
other important information. Currently, it is only possible to partition a
@@ -83,20 +83,20 @@ be further partitioned by `ext_intel_partition_by_cslice`.

It is important to understand that the device driver virtualizes work
submission to the cslice sub-devices. (More specifically, the device driver
-virtualizes work submission to different CCS-es, and this means that on Data
-Center GPU Max series devices the work submission to a cslice is virtualized.)
-This virtualization happens only between processes, and not within a single
-process. For example, consider a single process that constructs two SYCL
-queues on cslice sub-device #0. Kernels submitted to these two queues are
-guaranteed to conflict, both using the same set of execution units. Therefore,
-if a single process wants to explicitly submit kernels to cslice sub-devices
-and it wants to avoid conflict, it should create queues on different
-sub-devices. By contrast, consider an example where two separate processes
-create a SYCL queue on cslice sub-device #0. In this case, the device driver
-virtualizes access to this cslice, and kernels submitted from the first process
-may run on different execution units than kernels submitted from the second
-process. In this second case, the device driver binds the process's requested
-cslice to a physical cslice according to the overall system load.
+virtualizes work submission to different CCS-es, and this means that on
+Intel(R) Data Center GPU Max Series devices the work submission to a cslice is
+virtualized.) This virtualization happens only between processes, and not
+within a single process. For example, consider a single process that
+constructs two SYCL queues on cslice sub-device #0. Kernels submitted to these
+two queues are guaranteed to conflict, both using the same set of execution
+units. Therefore, if a single process wants to explicitly submit kernels to
+cslice sub-devices and it wants to avoid conflict, it should create queues on
+different sub-devices. By contrast, consider an example where two separate
+processes create a SYCL queue on cslice sub-device #0. In this case, the
+device driver virtualizes access to this cslice, and kernels submitted from the
+first process may run on different execution units than kernels submitted from
+the second process. In this second case, the device driver binds the process's
+requested cslice to a physical cslice according to the overall system load.

Note that this extension can be supported by any implementation. If an
implementation supports a backend or device without the concept of cslice
@@ -20,7 +20,7 @@
== Notice

[%hardbreaks]
-Copyright (C) 2022-2022 Intel Corporation. All rights reserved.
+Copyright (C) 2022-2023 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
@@ -181,7 +181,7 @@ other devices.
On many Intel devices, there is just one available queue index, and there is
therefore no advantage to using the `compute_index` property. However, this
property can sometimes be useful when running on Data Center GPU Flex series
-devices (aka ATS-M) or Data Center GPU Max series devices (aka PVC).
+devices (aka ATS-M) or Intel(R) Data Center GPU Max Series devices (aka PVC).
Contributor review comment, suggested change:
-devices (aka ATS-M) or Intel(R) Data Center GPU Max Series devices (aka PVC).
+devices or Intel(R) Data Center GPU Max Series devices.

Some models of ATS-M support multiple queue indices with the semantics
described in the sections above. When a single process submits kernels to