Conversation

@guangyey guangyey commented Jan 29, 2024

Stack from ghstack (oldest at bottom):

Motivation

According to [[RFC] Intel GPU Runtime Upstreaming](#114842), the 5th runtime component we would like to upstream is `Guard`. This PR covers both the device guard and the stream guard.

Design

The device guard is mainly used by the op dispatcher in PyTorch. PyTorch already has a device guard abstraction, `c10::impl::DeviceGuardImplInterface`. In our design, we introduce an `XPUGuardImpl` class that inherits from `c10::impl::DeviceGuardImplInterface`, and register `XPUGuardImpl` with PyTorch once the device switch management mechanism is implemented in it. Besides, we introduce `XPUGuard`, `OptionalXPUGuard`, `XPUStreamGuard`, and `OptionalXPUStreamGuard`, all following the design of their CUDA counterparts. The corresponding C++ files are placed in the `c10/xpu/` folder.

Additional Context

It is unnecessary to add `Guard` code to the PyTorch frontend.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10


pytorch-bot bot commented Jan 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118523

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 014dfdb with merge base fff9d98:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey added topic: new features topic category intel This tag is for PR from Intel ciflow/xpu Run XPU CI tasks release notes: xpu release notes category labels Jan 30, 2024
@guangyey guangyey changed the title [WIP] Intel GPU Runtime Upstreaming for Guard Intel GPU Runtime Upstreaming for Guard Jan 30, 2024
@guangyey guangyey marked this pull request as ready for review January 30, 2024 15:30
@guangyey guangyey requested a review from albanD February 13, 2024 02:16
@malfet malfet left a comment
Please explain why "XPUGuard is more efficient than DeviceGuard", and why this efficiency cannot be retained by inheriting from DeviceGuard and implementing only the more efficient methods, rather than duplicating implementation boilerplate like:

```cpp
void set_device(Device device) {
  guard_.set_device(device);
}
```

@guangyey guangyey commented
> Please explain why "XPUGuard is more efficient than DeviceGuard" and why this efficiency can not be retained while inheriting from DeviceGuard and implementing only more efficient methods, while not duplicating an implementation boilerplate like
>
> ```cpp
> void set_device(Device device) {
>   guard_.set_device(device);
> }
> ```

I think `XPUGuard` is implemented via `InlineDeviceGuard` and `XPUGuardImpl`. The former is a template class, which gives `XPUGuard` the opportunity to inline these functions. See:

```cpp
/// Sets the device to the given one.
template <
    typename U = T,
    typename std::enable_if_t<!std::is_same_v<U, VirtualGuardImpl>, int> = 0>
void set_device(at::Device device) {
  AT_ASSERT(
      (U::static_type == DeviceType::HIP && device.is_cuda()) ||
      device.type() == U::static_type);
  auto index = device.index();
  if (index == -1)
    return;
  impl_.setDevice(device);
  current_device_ = device;
}
```

And if we use `DeviceGuard` directly, there is an additional device-type indexing overhead and the calls cannot be inlined. See:

```cpp
DeviceGuardImplRegistrar::DeviceGuardImplRegistrar(
    DeviceType type,
    const DeviceGuardImplInterface* impl) {
  device_guard_impl_registry[static_cast<size_t>(type)].store(impl);
}
```

I can't say for certain how much removing `XPUGuard` would impact performance. I will remove it first, and we can revisit if any performance issues come up.

@guangyey guangyey requested a review from malfet February 21, 2024 17:15
@malfet malfet left a comment
LGTM, thank you for the refactors


On the following code:

```cpp
Device exchangeDevice(Device d) const override {
  TORCH_INTERNAL_ASSERT(d.is_xpu());
  auto old_device_index = c10::xpu::exchange_device(d.index());
```

Nit (though it does not really matter):

```diff
-  auto old_device_index = c10::xpu::exchange_device(d.index());
+  const auto old_device_index = c10::xpu::exchange_device(d.index());
```

@guangyey guangyey commented Feb 22, 2024

Thanks, addressed these comments. Added the `const` keyword.

@guangyey guangyey commented
@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 22, 2024
@pytorchmergebot commented
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@github-actions github-actions bot deleted the gh/guangyey/8/head branch March 24, 2024 01:54