
[1/2] Intel GPU Runtime Upstreaming for Generator #118528

Closed · wants to merge 65 commits

Conversation

@guangyey (Collaborator) commented Jan 29, 2024

Stack from ghstack (oldest at bottom):

Motivation

As mentioned in [RFC] Intel GPU Runtime Upstreaming, the last runtime component we would like to upstream is Generator, which is responsible for pseudo-random number generation. To facilitate code review, we split the changes into two PRs. This is the first of the two and covers the changes under aten.

Design

Following the previous design, c10::GeneratorImpl is the device-agnostic abstraction of a random number generator. We therefore introduce an XPU generator, XPUGeneratorImpl, inheriting from c10::GeneratorImpl, to manage random states on an Intel GPU device. The Intel GPU Generator adopts the same algorithm as the CPU generator. The corresponding C++ files are placed in the aten/src/ATen/xpu/ folder and built into libtorch_xpu.so.
This PR provides the following APIs (a minimal sketch of the pattern follows the list):

  • getDefaultXPUGenerator
  • createXPUGenerator
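
To make the shape of this surface concrete, here is a minimal standalone C++ sketch of the pattern: a device-agnostic generator base, an XPU subclass, and the two APIs above. The toy GeneratorImpl below stands in for c10::GeneratorImpl, and everything beyond the two listed API names is illustrative, not the actual ATen code.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>

// Toy stand-in for the device-agnostic c10::GeneratorImpl abstraction.
struct GeneratorImpl {
  virtual ~GeneratorImpl() = default;
  virtual void set_current_seed(uint64_t seed) = 0;
  virtual uint64_t current_seed() const = 0;
};

// Device-specific subclass managing random state for one XPU device.
struct XPUGeneratorImpl : GeneratorImpl {
  explicit XPUGeneratorImpl(int device_index) : device_index_(device_index) {}
  void set_current_seed(uint64_t seed) override { seed_ = seed; }
  uint64_t current_seed() const override { return seed_; }
  int device_index() const { return device_index_; }

 private:
  int device_index_;
  uint64_t seed_ = 0;
};

// The two APIs this PR lists: a per-device default generator, plus a
// factory for fresh generators. (Toy version: a single device only.)
XPUGeneratorImpl& getDefaultXPUGenerator(int /*device_index*/ = 0) {
  static XPUGeneratorImpl default_gen(0);
  return default_gen;
}

std::unique_ptr<XPUGeneratorImpl> createXPUGenerator(int device_index = 0) {
  return std::make_unique<XPUGeneratorImpl>(device_index);
}

int main() {
  getDefaultXPUGenerator().set_current_seed(7);
  std::cout << getDefaultXPUGenerator().current_seed() << '\n';  // prints 7
}
```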

Additional Context

The second PR will cover the Python frontend.

The differences from CUDA:
The generator-related ATen C++ APIs map 1:1 to CUDA's.
XPUGeneratorImpl's member functions differ slightly from CUDA's.
The following CUDA-specific APIs have no XPU counterpart:

  • capture_prologue
  • capture_epilogue
  • philox_cuda_state
  • reset_rnn_state

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot (bot) commented Jan 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118528

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 03217cd with merge base 685d862:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey marked this pull request as draft January 29, 2024 14:47
guangyey added a commit that referenced this pull request Jan 29, 2024
ghstack-source-id: 84837bf48446dafba2376a60502d1d578e38416a
Pull Request resolved: #118528
@guangyey changed the title [1/2] Intel GPU Runtime Upstreaming for Generator → [WIP] [1/2] Intel GPU Runtime Upstreaming for Generator Jan 29, 2024
guangyey added a commit that referenced this pull request Jan 29, 2024
ghstack-source-id: e99b8f26a89895d4d3473bdd6d40c348ddb5f3b2
Pull Request resolved: #118528
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 03e07858cc3a7c26e9e62e97b562450dee013e5c
Pull Request resolved: #118528
@guangyey added the ciflow/xpu, intel, and topic: new features labels Jan 30, 2024
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 155181e7fcb0cdfb42e00e1050dd7196be43a809
Pull Request resolved: #118528
@guangyey added the release notes: xpu label Jan 30, 2024
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 238401393fe0ee08bfcc6e840c2b466c4523886b
Pull Request resolved: #118528
guangyey added a commit that referenced this pull request Jan 30, 2024
ghstack-source-id: 67bb9442cf18d440f4bb136ae4e80b1efc3bc7d0
Pull Request resolved: #118528
pytorch-bot added the ciflow/trunk label Feb 26, 2024
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job failed: xpu / linux-jammy-xpu-py3.8 / build

Details for Dev Infra team: raised by workflow job.

guangyey added a series of commits that referenced this pull request Feb 26, 2024 (ghstack updates carrying the message of the follow-up PR, #118613; see the merged commit below).
@guangyey requested a review from albanD February 26, 2024 08:30
guangyey added further commits that referenced this pull request Feb 26, 2024.
@albanD (Collaborator) left a comment

SGTM!

```cpp
c10::once_flag init_flag;
DeviceIndex num_gpus = -1;
std::deque<c10::once_flag> xpu_gens_init_flag;
```
A collaborator commented on the snippet above:

Oh, interesting. Sounds good!
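
For context on why that snippet is interesting: it is the classic lazy, thread-safe, per-device initialization pattern. A std::deque (rather than std::vector) holds the per-device flags because std::once_flag is neither copyable nor movable, while a deque can grow without relocating existing elements. Below is a hedged, standalone sketch of that pattern using standard-library types; all names and the hard-coded device count are illustrative, not the actual PyTorch code.

```cpp
#include <cstdint>
#include <deque>
#include <iostream>
#include <mutex>
#include <vector>

namespace {

std::once_flag init_flag;                   // guards one-time global setup
int num_devices = -1;                       // device count, filled in lazily
std::deque<std::once_flag> gens_init_flag;  // one init flag per device
std::vector<uint64_t> default_seeds;        // stand-in for default generators

// Runs exactly once: discover devices and size the per-device tables.
void initGlobalState() {
  num_devices = 2;  // illustrative; a real runtime would query the device count
  gens_init_flag.resize(num_devices);
  default_seeds.resize(num_devices);
}

// Lazily initializes and returns the "default generator" for one device.
uint64_t getDefaultSeed(int device) {
  std::call_once(init_flag, initGlobalState);
  std::call_once(gens_init_flag[device], [device] {
    default_seeds[device] = 42 + device;  // seed this device exactly once
  });
  return default_seeds[device];
}

}  // namespace

int main() {
  std::cout << getDefaultSeed(0) << ' ' << getDefaultSeed(1) << '\n';  // 42 43
}
```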

@guangyey (Collaborator, Author) commented:
@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

guangyey added further commits that referenced this pull request Feb 27–28, 2024.
pytorchmergebot pushed a commit that referenced this pull request Feb 28, 2024
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](#114842) and following [[1/2] Intel GPU Runtime Upstreaming for Generator](#118528), this second PR covers the changes under the `python frontend`.

# Design
Currently, it primarily offers generator-related APIs, including

- `torch.xpu.default_generators`
- `torch.xpu.get_rng_state`
- `torch.xpu.get_rng_state_all`
- `torch.xpu.initial_seed`
- `torch.xpu.manual_seed`
- `torch.xpu.manual_seed_all`
- `torch.xpu.seed`
- `torch.xpu.seed_all`
- `torch.xpu.set_rng_state`
- `torch.xpu.set_rng_state_all`

# Additional Context
The differences from CUDA:
The generator-related Python frontend APIs map 1:1 to CUDA's.

Pull Request resolved: #118613
Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD
@github-actions (bot) deleted the gh/guangyey/9/head branch March 28, 2024 01:53