
[1/2] Intel GPU Runtime Upstreaming for Generator #118528

Closed · wants to merge 65 commits

Conversation

@guangyey (Collaborator) commented Jan 29, 2024

Stack from ghstack (oldest at bottom):

Motivation

As mentioned in [RFC] Intel GPU Runtime Upstreaming, the last runtime component we would like to upstream is Generator, which is responsible for pseudo-random number generation. To facilitate code review, we split the changes into two PRs. This is the first of the two and covers the changes under aten.

Design

Following the previous design, c10::GeneratorImpl is the device-agnostic abstraction of a random number generator. We therefore introduce an XPU generator, XPUGeneratorImpl, inheriting from c10::GeneratorImpl, to manage random states on an Intel GPU device. The Intel GPU Generator adopts the same algorithm as the CPU generator. The corresponding C++ files are placed in the aten/src/ATen/xpu/ folder and built into libtorch_xpu.so.
This PR provides the following APIs (a minimal sketch of the pattern follows the list):

  • getDefaultXPUGenerator
  • createXPUGenerator
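
To make the shape of this surface concrete, here is a minimal standalone C++ sketch of the pattern: a device-agnostic generator base, an XPU subclass, and the two APIs above. The toy GeneratorImpl below stands in for c10::GeneratorImpl, and everything beyond the two listed API names is illustrative, not the actual ATen code.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>

// Toy stand-in for the device-agnostic c10::GeneratorImpl abstraction.
struct GeneratorImpl {
  virtual ~GeneratorImpl() = default;
  virtual void set_current_seed(uint64_t seed) = 0;
  virtual uint64_t current_seed() const = 0;
};

// Device-specific subclass managing random state for one XPU device.
struct XPUGeneratorImpl : GeneratorImpl {
  explicit XPUGeneratorImpl(int device_index) : device_index_(device_index) {}
  void set_current_seed(uint64_t seed) override { seed_ = seed; }
  uint64_t current_seed() const override { return seed_; }
  int device_index() const { return device_index_; }

 private:
  int device_index_;
  uint64_t seed_ = 0;
};

// The two APIs this PR lists: a per-device default generator, plus a
// factory for fresh generators. (Toy version: a single device only.)
XPUGeneratorImpl& getDefaultXPUGenerator(int /*device_index*/ = 0) {
  static XPUGeneratorImpl default_gen(0);
  return default_gen;
}

std::unique_ptr<XPUGeneratorImpl> createXPUGenerator(int device_index = 0) {
  return std::make_unique<XPUGeneratorImpl>(device_index);
}

int main() {
  getDefaultXPUGenerator().set_current_seed(7);
  std::cout << getDefaultXPUGenerator().current_seed() << '\n';  // prints 7
}
```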

Additional Context

The second PR will cover the Python frontend.

The differences from CUDA:
The generator-related ATen C++ APIs map 1:1 to CUDA's.
XPUGeneratorImpl's member functions differ slightly from CUDA's.
The following CUDA-specific APIs have no XPU counterpart:

  • capture_prologue
  • capture_epilogue
  • philox_cuda_state
  • reset_rnn_state

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot (bot) commented Jan 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118528

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 03217cd with merge base 685d862:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey marked this pull request as draft January 29, 2024 14:47
guangyey added a commit that referenced this pull request Jan 29, 2024
ghstack-source-id: 84837bf48446dafba2376a60502d1d578e38416a
Pull Request resolved: #118528
@guangyey changed the title [1/2] Intel GPU Runtime Upstreaming for Generator → [WIP] [1/2] Intel GPU Runtime Upstreaming for Generator Jan 29, 2024
guangyey added a commit that referenced this pull request Jan 29, 2024
ghstack-source-id: e99b8f26a89895d4d3473bdd6d40c348ddb5f3b2
Pull Request resolved: #118528
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 03e07858cc3a7c26e9e62e97b562450dee013e5c
Pull Request resolved: #118528
@guangyey added the ciflow/xpu, intel, and topic: new features labels Jan 30, 2024
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 155181e7fcb0cdfb42e00e1050dd7196be43a809
Pull Request resolved: #118528
@guangyey added the release notes: xpu label Jan 30, 2024
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 238401393fe0ee08bfcc6e840c2b466c4523886b
Pull Request resolved: #118528
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 67bb9442cf18d440f4bb136ae4e80b1efc3bc7d0
Pull Request resolved: #118528
pytorch-bot added the ciflow/trunk label Feb 26, 2024
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job failed: xpu / linux-jammy-xpu-py3.8 / build

Details for Dev Infra team: raised by workflow job.

guangyey added a series of commits that referenced this pull request Feb 26, 2024 (ghstack updates carrying the message of the follow-up PR, #118613; see the merged commit below).
@guangyey requested a review from albanD February 26, 2024 08:30
guangyey added further commits that referenced this pull request Feb 26, 2024.
@albanD (Collaborator) left a comment

SGTM!

```cpp
c10::once_flag init_flag;
DeviceIndex num_gpus = -1;
std::deque<c10::once_flag> xpu_gens_init_flag;
```
A collaborator commented on the snippet above:

Oh, interesting. Sounds good!
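
For context on why that snippet is interesting: it is the classic lazy, thread-safe, per-device initialization pattern. A std::deque (rather than std::vector) holds the per-device flags because std::once_flag is neither copyable nor movable, while a deque can grow without relocating existing elements. Below is a hedged, standalone sketch of that pattern using standard-library types; all names and the hard-coded device count are illustrative, not the actual PyTorch code.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>
#include <mutex>
#include <vector>

namespace {

std::once_flag init_flag;                   // guards one-time global setup
int num_devices = -1;                       // device count, filled in lazily
std::deque<std::once_flag> gens_init_flag;  // one init flag per device
std::vector<uint64_t> default_seeds;        // stand-in for default generators

// Runs exactly once: discover devices and size the per-device tables.
void initGlobalState() {
  num_devices = 2;  // illustrative; a real runtime would query the device count
  gens_init_flag.resize(num_devices);
  default_seeds.resize(num_devices);
}

// Lazily initializes and returns the "default generator" for one device.
uint64_t getDefaultSeed(int device) {
  std::call_once(init_flag, initGlobalState);
  std::call_once(gens_init_flag[device], [device] {
    default_seeds[device] = 42 + device;  // seed this device exactly once
  });
  return default_seeds[device];
}

}  // namespace

int main() {
  std::cout << getDefaultSeed(0) << ' ' << getDefaultSeed(1) << '\n';  // 42 43
}
```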

@guangyey (Collaborator, Author) commented:
@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

guangyey added further commits that referenced this pull request Feb 27–28, 2024.
pytorchmergebot pushed a commit that referenced this pull request Feb 28, 2024
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](#114842) and following [[1/2] Intel GPU Runtime Upstreaming for Generator](#118528), this second PR covers the changes under the `python frontend`.

# Design
Currently, it primarily offers generator-related APIs, including

- `torch.xpu.default_generators`
- `torch.xpu.get_rng_state`
- `torch.xpu.get_rng_state_all`
- `torch.xpu.initial_seed`
- `torch.xpu.manual_seed`
- `torch.xpu.manual_seed_all`
- `torch.xpu.seed`
- `torch.xpu.seed_all`
- `torch.xpu.set_rng_state`
- `torch.xpu.set_rng_state_all`

# Additional Context
The differences from CUDA:
The generator-related Python frontend APIs map 1:1 to CUDA's.

Pull Request resolved: #118613
Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD
@github-actions (bot) deleted the gh/guangyey/9/head branch March 28, 2024 01:53