Conversation

@guangyey guangyey commented Jan 29, 2024

Stack from ghstack (oldest at bottom):

Motivation

According to [[RFC] Intel GPU Runtime Upstreaming](#114842), the 5th runtime component we would like to upstream is `Guard`. This PR covers both the device guard and the stream guard.

Design

The device guard is mainly used by the op dispatcher in PyTorch. PyTorch already has a device guard abstraction, `c10::impl::DeviceGuardImplInterface`. In our design, we introduce an `XPUGuardImpl` class that inherits from `c10::impl::DeviceGuardImplInterface`, and register `XPUGuardImpl` with PyTorch once the device switch management mechanism is implemented in it. Besides, we introduce `XPUGuard`, `OptionalXPUGuard`, `XPUStreamGuard`, and `OptionalXPUStreamGuard`, all following the design of their CUDA counterparts. The corresponding C++ files are placed in the `c10/xpu/` folder.

Additional Context

It is unnecessary to add `Guard` code to the PyTorch frontend.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot bot commented Jan 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118523

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 014dfdb with merge base fff9d98:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey added topic: new features topic category intel This tag is for PR from Intel ciflow/xpu Run XPU CI tasks release notes: xpu release notes category labels Jan 30, 2024
@guangyey guangyey changed the title [WIP] Intel GPU Runtime Upstreaming for Guard Intel GPU Runtime Upstreaming for Guard Jan 30, 2024
@guangyey guangyey marked this pull request as ready for review January 30, 2024 15:30
@guangyey guangyey requested a review from albanD February 13, 2024 02:16
@malfet malfet left a comment
Please explain why "XPUGuard is more efficient than DeviceGuard", and why this efficiency cannot be retained by inheriting from DeviceGuard and implementing only the more efficient methods, rather than duplicating implementation boilerplate like:

```cpp
void set_device(Device device) {
  guard_.set_device(device);
}
```

@guangyey guangyey commented
> Please explain why "XPUGuard is more efficient than DeviceGuard" and why this efficiency can not be retained while inheriting from DeviceGuard and implementing only more efficient methods, while not duplicating an implementation boilerplate like
>
> ```cpp
> void set_device(Device device) {
>   guard_.set_device(device);
> }
> ```

I think `XPUGuard` is implemented via `InlineDeviceGuard` and `XPUGuardImpl`. The former is a template class, which gives `XPUGuard` the opportunity to inline these functions. See:

```cpp
/// Sets the device to the given one.
template <
    typename U = T,
    typename std::enable_if_t<!std::is_same_v<U, VirtualGuardImpl>, int> = 0>
void set_device(at::Device device) {
  AT_ASSERT(
      (U::static_type == DeviceType::HIP && device.is_cuda()) ||
      device.type() == U::static_type);
  auto index = device.index();
  if (index == -1)
    return;
  impl_.setDevice(device);
  current_device_ = device;
}
```

And if we use `DeviceGuard` directly, there is an additional device-type indexing overhead and the calls cannot be inlined. See:

```cpp
DeviceGuardImplRegistrar::DeviceGuardImplRegistrar(
    DeviceType type,
    const DeviceGuardImplInterface* impl) {
  device_guard_impl_registry[static_cast<size_t>(type)].store(impl);
}
```

I can't say for certain how much removing `XPUGuard` would impact performance. I will remove it first, and we can revisit if any performance issues come up.

@guangyey guangyey requested a review from malfet February 21, 2024 17:15
@malfet malfet left a comment
LGTM, thank you for the refactors


On the following code:

```cpp
Device exchangeDevice(Device d) const override {
  TORCH_INTERNAL_ASSERT(d.is_xpu());
  auto old_device_index = c10::xpu::exchange_device(d.index());
```

Nit (though it does not really matter):

```diff
-  auto old_device_index = c10::xpu::exchange_device(d.index());
+  const auto old_device_index = c10::xpu::exchange_device(d.index());
```

@guangyey guangyey commented Feb 22, 2024

Thanks, addressed these comments. Added the `const` keyword.

@guangyey guangyey commented
@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 22, 2024
@pytorchmergebot commented
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@github-actions github-actions bot deleted the gh/guangyey/8/head branch March 24, 2024 01:54