[XPU] XPU accelerator support for Intel GPU device #4547

delock · 2023-10-20T03:11:17Z

This PR includes XPU support for Intel GPUs. With this PR, DeepSpeed can support XPU devices without installing Intel Extension for DeepSpeed.
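One way to check which accelerator DeepSpeed has selected is to query its accelerator abstraction. The sketch below assumes `deepspeed` is installed and relies only on `get_accelerator()` and `device_name()`; the helper name `active_accelerator_name` is hypothetical, and it returns `None` when DeepSpeed cannot be imported. On an Intel GPU setup the reported name would be expected to be `xpu`, though that depends on the installed software stack.

```python
def active_accelerator_name():
    """Return the device name of DeepSpeed's active accelerator, or None.

    Hypothetical helper for illustration; only get_accelerator() and
    device_name() come from DeepSpeed's accelerator abstraction.
    """
    try:
        from deepspeed.accelerator import get_accelerator
    except ImportError:
        # DeepSpeed (or one of its dependencies) is not installed.
        return None
    return get_accelerator().device_name()


if __name__ == "__main__":
    print(active_accelerator_name())
```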

* add aio in xpu_upstream * Update async_io.py deleting private path

* add aio in xpu_upstream * Update async_io.py deleting private path * Update async_io.py

* add SYCLomatic code into upstream enable jit_load for sycl kernels * find Python.h using general code * * add SYCLAutoOpBuilder to support InferenceOpBuilder * move scripts path to op_builder/xpu * only change cuda files extension * delete unused code in inferenceBuilder * change third-party relative path to enable python install * extract smaller functions from sycl_extension * change from_blob in source code to avoid big part post processing * run pre-commit * add BF16 support * add license to csrc/xpu code

delock · 2023-10-30T07:38:38Z

SYCLAutoOpBuilder is integrated to convert CUDA kernels into SYCL kernels. Currently, the transformer inference kernels are converted automatically at installation time. We are investigating whether this builder can be extended to other kernels so that we can reduce the number of SYCL kernel files.

cc @baodii, who is working on SYCLAutoOpBuilder.

* add SYCLomatic code into upstream enable jit_load for sycl kernels * find Python.h using general code * * add SYCLAutoOpBuilder to support InferenceOpBuilder * move scripts path to op_builder/xpu * only change cuda files extension * change third-party relative path to enable python install * extract smaller functions from sycl_extension * change from_blob in source code to avoid big part post processing * run pre-commit * add BF16 support * add other OPBuilder. fused_adam done * cpu_adam done * all xpu OpBuilder done, need more test * delete csrc/xpu * delete useless files

delock · 2023-11-08T03:19:31Z

@tjruwase With @baodii's contribution we now have SYCLAutoOpBuilder, which converts CUDA kernels into SYCL kernels for Intel GPUs. We can now remove most of the manually written SYCL kernels in this PR (only one is left, and a fix is on the way).

CaoZhongZ · 2023-11-13T07:31:55Z

oneapi-src/SYCLomatic#1398 tracks the last remaining SYCL porting issue. Once it is resolved, we will fully migrate all kernels. @delock @baodii

* fix xpu builder to make install successfully * fix AT_CUDA_CHECK error

* * delete SYCLAutoOpBuilder * add optimizer SYCLOpBuilder * delete transformer_inference op * fix format error

delock · 2023-11-29T03:29:38Z

Hi @tjruwase @jeffra, after internal discussion we removed SYCLAutoOpBuilder. Although we have validated the automatically converted SYCL kernels as of now, a fully automated process carries a risk: if an existing CUDA kernel changes and the new code is converted automatically by SYCLAutoOpBuilder, the converted SYCL code could have broken functionality or degraded performance.

Instead, we prefer to 1) convert CUDA kernels with SYCLAutoOpBuilder offline, 2) validate the converted SYCL code, and 3) upstream the validated SYCL kernels. The SYCL kernels currently in this PR are the result of this process.
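The validation step in this workflow can be pictured as comparing a converted kernel against a trusted reference on the same inputs. The sketch below is only an illustration in plain Python: `validate_kernel` and the three `*_scale` functions are hypothetical stand-ins, not code from this PR, and real validation would run actual kernels on a device and also measure performance.

```python
import math


def validate_kernel(reference_fn, candidate_fn, test_inputs, rel_tol=1e-5, abs_tol=1e-8):
    """Compare a candidate kernel against a reference on a set of inputs.

    Returns a list of (input, expected, actual) tuples for every mismatch;
    an empty list means the candidate matched on all inputs.
    """
    mismatches = []
    for x in test_inputs:
        expected = reference_fn(x)
        actual = candidate_fn(x)
        if not math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((x, expected, actual))
    return mismatches


# Hypothetical stand-ins for an original kernel and two converted versions.
def reference_scale(x):
    return 2.0 * x


def converted_scale(x):
    return x + x


def broken_scale(x):
    return 2.0 * x + 0.5


inputs = [0.0, 1.5, -3.25, 100.0]
# Only a conversion with no mismatches would be considered for upstreaming.
ok_to_upstream = not validate_kernel(reference_scale, converted_scale, inputs)
```

Keeping this check offline means a change in the original kernel cannot reach users until someone has rerun the comparison on the newly converted code.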

Let us know your thoughts and comments, and we can discuss how to go forward. Thanks!

tjruwase · 2023-12-04T17:36:42Z

@delock, thanks for the update. If I understand correctly, you plan to upstream new SYCL kernels via PRs?

delock · 2023-12-05T02:11:48Z

> @delock, thanks for the update. If I understand correctly, you plan to upstream new SYCL kernels via PRs?

Hi @tjruwase, it depends. To support DeepSpeed OpBuilders we have two methods:

1. If the functionality in an OpBuilder is relatively specific to DeepSpeed, we plan to upstream SYCL kernels for that OpBuilder via PRs. The DeepSpeed optimizers are the set of SYCL kernels we are upstreaming.
2. If the functionality in an OpBuilder is relatively generic, we plan to build the kernel inside Intel Extension for PyTorch and reuse that functionality inside the OpBuilder implementation. This is similar to NPU's implementation (https://github.com/microsoft/DeepSpeed/blob/master/op_builder/npu/fused_adam.py); in this case no SYCL kernel will be upstreamed.

Overall, we expect most DeepSpeed features that need an OpBuilder to be supported through method 2 for XPU, with the intention that the functionality can also be reused elsewhere. Method 1 may be used in two situations:

* If the kernel function is DeepSpeed-specific.
* If another party wants to contribute, implementing through SYCLOpBuilder is a more direct way to do so.
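The difference between the two methods can be sketched as two kinds of builder: one that lists kernel sources to compile, and one that compiles nothing and returns functionality provided elsewhere. Everything below is hypothetical and written in plain Python for illustration: `SketchOpBuilder`, `UpstreamedKernelBuilder`, `ReusedFunctionalityBuilder`, `ExternalAdam`, and the source file name are assumptions, not DeepSpeed's or Intel Extension for PyTorch's actual classes.

```python
import math
from abc import ABC, abstractmethod


class SketchOpBuilder(ABC):
    """Hypothetical base class; not DeepSpeed's real OpBuilder."""

    @abstractmethod
    def sources(self):
        """Kernel source files that must be compiled for this op."""

    @abstractmethod
    def load(self):
        """Return an object that exposes the op's functions."""


class UpstreamedKernelBuilder(SketchOpBuilder):
    """Method 1: SYCL sources are upstreamed and compiled by the builder."""

    def sources(self):
        # Placeholder file name, for illustration only.
        return ["example_sycl_kernel.cpp"]

    def load(self):
        # A real builder would compile self.sources() and return the module.
        raise NotImplementedError("compilation is out of scope for this sketch")


class ExternalAdam:
    """Stand-in for an op provided by an external library (plain Python here)."""

    @staticmethod
    def adam_step(param, grad, exp_avg, exp_avg_sq, step,
                  lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Standard Adam update with bias correction, on scalars.
        exp_avg = beta1 * exp_avg + (1 - beta1) * grad
        exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad * grad
        m_hat = exp_avg / (1 - beta1**step)
        v_hat = exp_avg_sq / (1 - beta2**step)
        param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
        return param, exp_avg, exp_avg_sq


class ReusedFunctionalityBuilder(SketchOpBuilder):
    """Method 2: nothing to compile; load() returns the external implementation."""

    def sources(self):
        return []

    def load(self):
        return ExternalAdam


op = ReusedFunctionalityBuilder().load()
new_param, m, v = op.adam_step(param=1.0, grad=1.0, exp_avg=0.0,
                               exp_avg_sq=0.0, step=1, lr=0.1)
```

With method 2 the builder carries no kernel sources at all, which is why no SYCL kernel needs to be upstreamed in that case.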


tjruwase · 2024-01-04T16:47:01Z

@delock, apologies for the delay in reviewing this PR. We will prioritize this now.

mrwyattii · 2024-01-04T18:15:15Z

Thanks @delock this all looks great! Do you have tests that you are running internally to verify this code?

delock · 2024-01-05T01:11:59Z

> Thanks @delock this all looks great! Do you have tests that you are running internally to verify this code?

Hi @mrwyattii, yes, we regularly validate this code on XPU devices for inference and training workloads.

This PR includes XPU support for Intel GPUs. With this PR, DeepSpeed can support XPU devices without installing Intel Extension for DeepSpeed.

---------

Co-authored-by: Liangliang-Ma <1906710196@qq.com>
Co-authored-by: baodi <di.bao@intel.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Yizhou Wang <yizhou.wang@intel.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

delock and others added 20 commits September 5, 2023 11:09

initial merge of Intel Extension for DeepSpeed

27cb0ab

Remove intel_extension_for_deepspeed from xpu accelerator

a98bf46

remove sycl_kernel_path and sycl_kernel_include

28b4871

better support for external XPU_Accelerator from IDEX

d736af7

Remove FlashAttention op

01b8049

workaround for ipex.has_xpu()

3df5fc8

op builder clean up

ae1364a

fix format

f0f7a4c

remove TransformerBuilder

0b25435

fix source file path in transformer_inference

28f9feb

remove inference builder

a786286

Merge branch 'up-master' into gma/xpu_upstream

cde57d2

add new missing accelerator interfaces

5a54238

fix syntax error

27e07d6

add aio in xpu_upstream (#17)

32ecdae

remove duplicate is_pinned

e3daa59

deleting private path in aio builder (#20)

6024e05

* add aio in xpu_upstream * Update async_io.py deleting private path

aio op_builder delete unused method importing (#22)

beab27b

* add aio in xpu_upstream * Update async_io.py deleting private path * Update async_io.py

remove white changes

674907c

delock and others added 4 commits October 30, 2023 07:41

Merge branch 'master' into gma/xpu_upstream

289c88b

add available_memory in XPU accelerator

f9da802

Merge branch 'master' into gma/xpu_upstream

f6ae4b5

delock marked this pull request as ready for review November 8, 2023 06:40

delock requested review from jeffra, RezaYazdaniAminabadi and cmikeh2 as code owners November 8, 2023 06:40

baodii and others added 2 commits November 16, 2023 08:32

Cpuinfo fix (#30)

09fe24b

* fix xpu builder to make install successfully * fix AT_CUDA_CHECK error

Merge branch 'master' into gma/xpu_upstream

41a2539

delock mentioned this pull request Nov 27, 2023

(Do not merge) (CPU) aggregation of few recent fixes/optimizations #3920

Draft

25 tasks

baodii and others added 2 commits November 29, 2023 11:03

upstream xpu support (#36)

6ab7849

* * delete SYCLAutoOpBuilder * add optimizer SYCLOpBuilder * delete transformer_inference op * fix format error

remove unnecessary change in cuda code

6222291

Merge branch 'master' into gma/xpu_upstream

f2c60a8

add export_env in accordance with PR#4830

5d8f0bd

delock requested a review from mrwyattii as a code owner January 3, 2024 07:01

YizhouZ and others added 2 commits January 4, 2024 11:22

xpu_accelerator.py: add graph operations in accordance with PR#4318 (#40)

2665cfd

Merge branch 'master' into gma/xpu_upstream

8b9e2c7

tjruwase requested review from ShadenSmith and tjruwase and removed request for jeffra, cmikeh2, awan-10, RezaYazdaniAminabadi and arashb January 4, 2024 16:45

tjruwase assigned mrwyattii Jan 4, 2024

mrwyattii approved these changes Jan 4, 2024


Merge branch 'master' into gma/xpu_upstream

790ee18

mrwyattii merged commit f4f3131 into microsoft:master Jan 5, 2024
14 checks passed

mrwyattii mentioned this pull request Jan 20, 2024

[BUG] pip install deepspeed==0.13.0 fails #4984

Closed
