[xpu][feature] Introduce ExpandableSegment for XPU #166299

Closed

guangyey wants to merge 8 commits into gh/guangyey/230/base from gh/guangyey/230/head

Collaborator

guangyey commented Oct 27, 2025

Stack from ghstack (oldest at bottom):

Motivation

This PR adds the ExpandableSegment struct, which supports the expandable segment feature. I split it into a separate PR to make code review easier.

guangyey requested review from EikanWang and gujinghui as code owners

October 27, 2025 11:53

pytorch-bot bot commented Oct 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166299

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm failures during provisioning step due to network issues

✅ No Failures

As of commit f8699ce with merge base 3206677:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey mentioned this pull request

[xpu][feature] Support expandable segment feature for XPU #166292

Closed

guangyey added ciflow/trunk release notes: xpu labels

pytorchbot added the open source label


Update

831a02a

[ghstack-poisoned]

etaf changed the title ~~Introduce ExpandableSegment for XPU~~ [xpu][enable_feature] Introduce ExpandableSegment for XPU

guangyey changed the title ~~[xpu][enable_feature] Introduce ExpandableSegment for XPU~~ [xpu][feature] Introduce ExpandableSegment for XPU

This was referenced Oct 28, 2025

[xpu][feature] Introduce PeerToPeerAccess API for XPU #166424

Closed

[xpu][test] Add UT for expandable segments #166495

Closed

guangyey added this to PyTorch Intel


Update

01e522c

[ghstack-poisoned]

guangyey requested a review from albanD

October 29, 2025 16:38

guangyey added 3 commits

October 30, 2025 02:06


Update

01f34b3

[ghstack-poisoned]


Update

5f97824

[ghstack-poisoned]


Update

7dde3f2

[ghstack-poisoned]

EikanWang requested a review from Copilot

October 31, 2025 02:20

Copilot AI reviewed

Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces the ExpandableSegment struct for XPU, which manages virtual memory segments that can be dynamically expanded by mapping physical memory on demand. This is part of supporting an expandable segment feature for XPU memory allocation.

Key changes:

Added SegmentRange struct to represent contiguous virtual memory segments
Implemented ExpandableSegment class with map/unmap operations for virtual memory management
Integrated SYCL's virtual memory APIs for reservation, mapping, and access control
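
The map/unmap bookkeeping summarized above can be pictured with a small host-only toy model. This is a sketch under stated assumptions, not the PR's implementation: `ToyExpandableSegment`, its members, and the integer "handles" are hypothetical, and no SYCL virtual memory API is called.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model only: no device memory is reserved or mapped. Type names follow
// the summary above; members and bodies are illustrative assumptions.
struct SegmentRange {
  std::size_t begin;  // first segment index in the range
  std::size_t end;    // one past the last segment index
};

class ToyExpandableSegment {
 public:
  ToyExpandableSegment(std::size_t segment_size, std::size_t max_handles)
      : segment_size_(segment_size), max_handles_(max_handles) {}

  // Back segments [begin, end) with "physical" handles. Returns false when
  // the range is malformed or exceeds the reserved virtual range.
  bool map(SegmentRange r) {
    if (r.begin > r.end || r.end > max_handles_) {
      return false;
    }
    if (r.end > handles_.size()) {
      handles_.resize(r.end);  // new slots start out unmapped
    }
    for (std::size_t i = r.begin; i < r.end; ++i) {
      assert(!handles_[i].has_value() && "segment is already mapped");
      handles_[i] = next_handle_++;  // stands in for a physical allocation
    }
    return true;
  }

  // Release the handles for segments [begin, end); the virtual range stays
  // reserved so it can be mapped again later.
  void unmap(SegmentRange r) {
    for (std::size_t i = r.begin; i < r.end && i < handles_.size(); ++i) {
      handles_[i].reset();
    }
  }

  // Bytes currently backed by a handle.
  std::size_t mappedBytes() const {
    std::size_t mapped = 0;
    for (const auto& h : handles_) {
      if (h.has_value()) {
        ++mapped;
      }
    }
    return mapped * segment_size_;
  }

 private:
  std::size_t segment_size_;
  std::size_t max_handles_;
  int next_handle_ = 0;
  std::vector<std::optional<int>> handles_;
};
```

In this model, mapping `SegmentRange{0, 2}` and then unmapping `SegmentRange{0, 1}` leaves one segment backed, and the freed slot can be mapped again without reserving a new address range.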

c10/xpu/XPUCachingAllocator.cpp Outdated

    
            .get_info<sycl::info::device::global_mem_size>();
    // The extra 1/8 allows flexibility for remapping or moving pages within the
    // segment when unmapping earlier regions.
    max_handles_ = numSegments(device_total * (1 + 1.0 / 8));

Copilot AI Oct 31, 2025

[nitpick] The magic number 1.0 / 8 is hardcoded without a named constant. Consider defining a named constant like VIRTUAL_MEM_OVERSUBSCRIPTION_FACTOR to improve code readability and make the purpose of this calculation clearer.

Collaborator

gujinghui Oct 31, 2025

reasonable comment

Collaborator Author

guangyey Oct 31, 2025

Done
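
A named constant along the lines suggested in this thread might look like the sketch below. The constant name `kVirtualMemOversubscriptionFactor`, the free-function form of `numSegments`, the helper `maxHandles`, and the 2 MiB segment size are all illustrative assumptions, not the code that was merged.

```cpp
#include <cstddef>

// Hypothetical names and values for illustration; not the merged PyTorch code.
// Assumed segment granularity of 2 MiB.
constexpr std::size_t kSegmentSize = std::size_t{2} << 20;

// Reserve 1/8 more virtual address space than the device's physical memory,
// leaving room to remap pages after earlier regions are unmapped.
constexpr double kVirtualMemOversubscriptionFactor = 1.0 + 1.0 / 8;

// Number of segments needed to cover `bytes`, rounding up.
constexpr std::size_t numSegments(std::size_t bytes) {
  return (bytes + kSegmentSize - 1) / kSegmentSize;
}

// Upper bound on the number of physical handles for a device with
// `device_total` bytes of global memory.
constexpr std::size_t maxHandles(std::size_t device_total) {
  return numSegments(static_cast<std::size_t>(
      static_cast<double>(device_total) * kVirtualMemOversubscriptionFactor));
}
```

With these assumed values, a device reporting 16 MiB would get room for 9 segments (18 MiB of virtual range) rather than 8.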

c10/xpu/XPUCachingAllocator.cpp

Comment on lines +274 to +275

    
      size_t offset = p - ptr();
      return offset / segment_size_;

Copilot AI Oct 31, 2025

Potential undefined behavior if p is less than ptr(), resulting in a negative offset that wraps around. Add a check to ensure p >= ptr() before computing the offset.

Collaborator

gujinghui Oct 31, 2025

This comment looks valid. Should we add an assert to confirm that p is always greater than or equal to ptr()?

Collaborator Author

guangyey Oct 31, 2025

Done
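
The guard discussed here could be sketched as follows. This is a simplified stand-in, assuming a hypothetical `SegmentMath` struct with a `base` field in place of `ptr()`, and the standard `assert` in place of whatever assertion macro the merged code uses.

```cpp
#include <cassert>
#include <cstddef>

// Simplified, hypothetical stand-in for the segment bookkeeping in this thread.
struct SegmentMath {
  char* base;                // start of the reserved virtual range, i.e. ptr()
  std::size_t segment_size;  // bytes per segment

  // Index of the segment containing `p`, rounding down.
  constexpr std::size_t segmentLeft(char* p) const {
    // Without this check, p < base would yield a negative difference that
    // wraps to a huge value when converted to size_t.
    assert(p >= base && "pointer precedes the start of the segment range");
    return static_cast<std::size_t>(p - base) / segment_size;
  }
};
```

The check costs one comparison in debug builds and turns a silent wrap-around into an immediate failure.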

c10/xpu/XPUCachingAllocator.cpp

    
    // If `p` lies exactly on a segment boundary, this is equal to segmentLeft(p).
    // Otherwise, it rounds up and returns segmentLeft(p) + 1.
    size_t segmentRight(char* p) const {
      size_t offset = p - ptr();

Copilot AI Oct 31, 2025

Same issue as in segmentLeft(): potential undefined behavior if p < ptr(). Add validation to ensure p >= ptr() before computing the offset.

Collaborator

gujinghui Oct 31, 2025

ditto

Collaborator Author

guangyey Oct 31, 2025

done
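
For the rounding-up counterpart, a minimal sketch of the same guard is shown below. The struct `SegmentBounds` and its field names are hypothetical, and the standard `assert` again stands in for the project's own assertion macro.

```cpp
#include <cassert>
#include <cstddef>

// Simplified, hypothetical stand-in; field names are assumptions.
struct SegmentBounds {
  char* base;                // start of the reserved virtual range, i.e. ptr()
  std::size_t segment_size;  // bytes per segment

  // Number of segments needed to cover `bytes`, rounding up.
  constexpr std::size_t numSegments(std::size_t bytes) const {
    return (bytes + segment_size - 1) / segment_size;
  }

  // Exclusive upper segment index for an end pointer `p`: equal to the
  // round-down index on a boundary, one past it otherwise.
  constexpr std::size_t segmentRight(char* p) const {
    assert(p >= base && "pointer precedes the start of the segment range");
    return numSegments(static_cast<std::size_t>(p - base));
  }
};
```

With 16-byte segments, an end pointer at offset 16 yields 1 and an end pointer at offset 17 yields 2, which is what a `[begin, end)` range needs.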

gujinghui reviewed

c10/xpu/XPUCachingAllocator.cpp Outdated

    
      return numSegments(offset);
    }
    // Constructs a SegmentRange starting at [start, end) indices.

Collaborator

gujinghui Oct 31, 2025

Suggested change

      
    - // Constructs a SegmentRange starting at [start, end) indices.
    + // Constructs a SegmentRange in the range of [begin, end).

Collaborator Author

guangyey Oct 31, 2025

Done

gujinghui reviewed

c10/xpu/XPUCachingAllocator.cpp

    
    // bound, useful for [begin, end) style ranges.
    // If `p` lies exactly on a segment boundary, this is equal to segmentLeft(p).
    // Otherwise, it rounds up and returns segmentLeft(p) + 1.
    size_t segmentRight(char* p) const {

Collaborator

gujinghui Oct 31, 2025

Should we define a dedicated type for the segment index instead of using size_t directly?
Otherwise the code is harder to read and prone to ambiguity.

Collaborator Author

guangyey Oct 31, 2025

Good suggestion — using SegmentIndex could improve readability. However, it’s used in very few places, and size_t is the standard type for array/container indices in C++. Since it specifically represents an index into handles_, I think it’s fine to keep it as is for now and refactor it if needed during the code unification.
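
For readers weighing the trade-off in this thread, a dedicated index type could look roughly like the sketch below. The class `SegmentIndex` and the helper `segmentIndexOf` are hypothetical; the PR itself keeps plain `size_t`.

```cpp
#include <cstddef>

// Hypothetical strong index type; the PR itself keeps plain size_t.
class SegmentIndex {
 public:
  constexpr explicit SegmentIndex(std::size_t value) : value_(value) {}
  constexpr std::size_t value() const { return value_; }

 private:
  std::size_t value_;
};

constexpr bool operator==(SegmentIndex a, SegmentIndex b) {
  return a.value() == b.value();
}

// Round-down segment index for a byte offset. The explicit constructor keeps
// a byte count from being passed where an index is expected.
constexpr SegmentIndex segmentIndexOf(std::size_t offset,
                                      std::size_t segment_size) {
  return SegmentIndex{offset / segment_size};
}
```

The cost is the extra `.value()` call wherever the index is used to subscript a container, which is part of why keeping `size_t` is a reasonable choice when the index appears in only a few places.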

gujinghui reviewed

c10/xpu/XPUCachingAllocator.cpp

    
      // Ensure handles_ vector is large enough to hold all segments.
      while (end > handles_.size()) {
        handles_.emplace_back(std::nullopt);
      }

Collaborator

gujinghui Oct 31, 2025

Can we do the same thing without the while loop?

Collaborator Author

guangyey Oct 31, 2025

Good idea, use resize instead.
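
The loop-free version agreed on here can be sketched as below. The `Handle` alias and the helper name `ensureHandleSlots` are hypothetical; the only library behavior relied on is that `std::vector::resize` default-constructs new elements, which for `std::optional` means empty.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the physical-memory handle type stored in handles_.
using Handle = int;

// Grow `handles` so that indices [0, end) are valid. New slots are empty
// (unmapped). Never shrinks, matching the original `while (end > size())` loop.
inline void ensureHandleSlots(std::vector<std::optional<Handle>>& handles,
                              std::size_t end) {
  if (end > handles.size()) {
    handles.resize(end);  // new elements are default-constructed: std::nullopt
  }
  assert(handles.size() >= end);
}
```

The size check matters: calling `resize` unconditionally would truncate the vector whenever `end` is smaller than the current size, which the original loop never did.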


Update

fcd2c20

[ghstack-poisoned]

guangyey requested a review from gujinghui

October 31, 2025 11:24

guangyey added the ciflow/xpu label

guangyey added 2 commits

October 31, 2025 14:00


Update

7c6ae5c

[ghstack-poisoned]


Update

f8699ce

[ghstack-poisoned]

EikanWang approved these changes

albanD approved these changes

Collaborator

albanD left a comment

Sure

Collaborator

pytorchmergebot commented Nov 4, 2025

Starting merge as part of PR stack under #166424

gujinghui approved these changes

pytorchmergebot closed this in

875b18d

pytorchmergebot added the Merged label

pytorchmergebot pushed a commit that referenced this pull request


[xpu][feature] Support expandable segment feature for XPU (#166292)

167e64b

# Motivation
This PR intends to add expandable segment feature support on XPU. This will help
- Reduce memory fragmentation;
- Gradually map physical pages into virtual address space as needed.

# Additional Context
The traditional caching allocator frequently allocates and frees device memory blocks. However, over time, with varying tensor size, the device address space becomes fragmented. Even when there's enough total free memory, a lack of contiguous space can cause large allocations to fail.
The **expandable segment** feature addresses this by dynamically extending physical memory within a reserved virtual address range, reducing fragmentation and minimizing reallocation overhead.
The potential drawbacks are
- Virtual memory overhead;
- Potential page mapping overhead;
- Increased complexity.

Pull Request resolved: #166292
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299

pytorchmergebot pushed a commit that referenced this pull request


[xpu][feature] Introduce PeerToPeerAccess API for XPU (#166424)

f70faf2

# Motivation
This PR introduces support for peer-to-peer (P2P) access between devices, including querying and enabling P2P connections between two devices.
It supports two categories of allocations:
- Regular allocations;
- Expandable segment allocations.

# Additional Context
The follow-up is that we should use this feature to optimize our copy kernel when P2P is supported.

Pull Request resolved: #166424
Approved by: https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #166299, #166292

github-project-automation bot moved this to Done in PyTorch Intel

pytorchmergebot pushed a commit that referenced this pull request


[xpu][test] Add UT for expandable segments (#166495)

8fff7e3

# Motivation
This PR aims to reuse some UT to validate the expandable segment feature.

# Additional Context
Currently, the failure is related to the internal track `GSD-11403`, we could get the fix when upgrading the driver to `ci-neo-master-034630` or greater
TODO: add test conv and gemm into this test case when upgrading the driver.

Pull Request resolved: #166495
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299, #166292, #166424

pytorch-bot bot pushed a commit that referenced this pull request


[xpu][feature] Introduce ExpandableSegment for XPU (#166299)

2520aa1

# Motivation
This PR intends to add `ExpandableSegment` struct, which is used to help support the expandable segment feature. I split it to a single PR to facilitate the code review.

Pull Request resolved: #166299
Approved by: https://github.com/EikanWang, https://github.com/albanD, https://github.com/gujinghui

pytorch-bot bot pushed a commit that referenced this pull request


[xpu][feature] Support expandable segment feature for XPU (#166292)

f8f4b76

# Motivation
This PR intends to add expandable segment feature support on XPU. This will help
- Reduce memory fragmentation;
- Gradually map physical pages into virtual address space as needed.

# Additional Context
The traditional caching allocator frequently allocates and frees device memory blocks. However, over time, with varying tensor size, the device address space becomes fragmented. Even when there's enough total free memory, a lack of contiguous space can cause large allocations to fail.
The **expandable segment** feature addresses this by dynamically extending physical memory within a reserved virtual address range, reducing fragmentation and minimizing reallocation overhead.
The potential drawbacks are
- Virtual memory overhead;
- Potential page mapping overhead;
- Increased complexity.

Pull Request resolved: #166292
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299

pytorch-bot bot pushed a commit that referenced this pull request


[xpu][feature] Introduce PeerToPeerAccess API for XPU (#166424)

e345db2

# Motivation
This PR introduces support for peer-to-peer (P2P) access between devices, including querying and enabling P2P connections between two devices.
It supports two categories of allocations:
- Regular allocations;
- Expandable segment allocations.

# Additional Context
The follow-up is that we should use this feature to optimize our copy kernel when P2P is supported.

Pull Request resolved: #166424
Approved by: https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #166299, #166292

pytorch-bot bot pushed a commit that referenced this pull request


[xpu][test] Add UT for expandable segments (#166495)

59af98f

# Motivation
This PR aims to reuse some UT to validate the expandable segment feature.

# Additional Context
Currently, the failure is related to the internal track `GSD-11403`, we could get the fix when upgrading the driver to `ci-neo-master-034630` or greater
TODO: add test conv and gemm into this test case when upgrading the driver.

Pull Request resolved: #166495
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299, #166292, #166424

Labels

ciflow/trunk ciflow/xpu Merged open source release notes: xpu