Intel GPU Runtime Upstreaming for Guard #118523
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118523
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 014dfdb with merge base fff9d98. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
# Motivation

According to [[RFC] Intel GPU Runtime Upstreaming](#114842), the fifth runtime component we would like to upstream is `Guard`. This PR covers both the device guard and the stream guard.

# Design

The device guard is used mainly by the op dispatcher in PyTorch. PyTorch already has a device guard abstraction, `c10::impl::DeviceGuardImplInterface`. In our design, we introduce an `XPUGuardImpl` class that inherits from `c10::impl::DeviceGuardImplInterface`, and register it with PyTorch once the device switch management mechanism is implemented in `XPUGuardImpl`. We also introduce `XPUGuard`, `OptionalXPUGuard`, `XPUStreamGuard`, and `OptionalXPUStreamGuard`, all following the design of their CUDA counterparts. The corresponding C++ files are placed in the c10/xpu/ folder.

# Additional Context

It is unnecessary to add `Guard` code to the PyTorch frontend.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
Please explain why "XPUGuard is more efficient than DeviceGuard", and why this efficiency cannot be retained by inheriting from DeviceGuard and overriding only the more efficient methods, rather than duplicating implementation boilerplate like:

```cpp
void set_device(Device device) {
  guard_.set_device(device);
}
```
I think the code in pytorch/c10/core/impl/InlineDeviceGuard.h (lines 116 to 129 at cfddfce) explains it.

And if we use `DeviceGuard` directly, there is additional device-type indexing overhead and the calls cannot be inlined; see pytorch/c10/core/impl/DeviceGuardImplInterface.cpp (lines 10 to 14 at cfddfce).

I can't be sure how much removing `XPUGuard` would impact performance. I will remove it first and revisit if any performance issues come up.
LGTM, thank you for the refactors
c10/xpu/impl/XPUGuardImpl.h (outdated):

```cpp
Device exchangeDevice(Device d) const override {
  TORCH_INTERNAL_ASSERT(d.is_xpu());
  auto old_device_index = c10::xpu::exchange_device(d.index());
```
Nit (though it does not really matter):

```diff
- auto old_device_index = c10::xpu::exchange_device(d.index());
+ const auto old_device_index = c10::xpu::exchange_device(d.index());
```
Thanks, addressed these comments and added the `const` keyword.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.