Skip to content

Conversation

@guangyey
Copy link
Collaborator

@guangyey guangyey commented Oct 27, 2025

Stack from ghstack (oldest at bottom):

Motivation

This PR intends to add ExpandableSegment struct, which is used to help support the expandable segment feature. I split it to a single PR to facilitate the code review.

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166299

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit f8699ce with merge base 3206677 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@guangyey guangyey added ciflow/trunk Trigger trunk jobs on your pull request release notes: xpu release notes category labels Oct 27, 2025
[ghstack-poisoned]
@etaf etaf changed the title Introduce ExpandableSegment for XPU [xpu][enable_feature] Introduce ExpandableSegment for XPU Oct 28, 2025
@guangyey guangyey changed the title [xpu][enable_feature] Introduce ExpandableSegment for XPU [xpu][feature] Introduce ExpandableSegment for XPU Oct 28, 2025
[ghstack-poisoned]
@guangyey guangyey requested a review from albanD October 29, 2025 16:38
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@EikanWang EikanWang requested a review from Copilot October 31, 2025 02:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the ExpandableSegment struct for XPU, which manages virtual memory segments that can be dynamically expanded by mapping physical memory on demand. This is part of supporting an expandable segment feature for XPU memory allocation.

Key changes:

  • Added SegmentRange struct to represent contiguous virtual memory segments
  • Implemented ExpandableSegment class with map/unmap operations for virtual memory management
  • Integrated SYCL's virtual memory APIs for reservation, mapping, and access control

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

.get_info<sycl::info::device::global_mem_size>();
// The extra 1/8 allows flexibility for remapping or moving pages within the
// segment when unmapping earlier regions.
max_handles_ = numSegments(device_total * (1 + 1.0 / 8));
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The magic number 1.0 / 8 is hardcoded without a named constant. Consider defining a named constant like VIRTUAL_MEM_OVERSUBSCRIPTION_FACTOR to improve code readability and make the purpose of this calculation clearer.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reasonable comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +274 to +275
size_t offset = p - ptr();
return offset / segment_size_;
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential undefined behavior if p is less than ptr(), resulting in a negative offset that wraps around. Add a check to ensure p >= ptr() before computing the offset.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this comment is valid? We should add assert to confirm the p is always greater than or equal to ptr?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// If `p` lies exactly on a segment boundary, this is equal to segmentLeft(p).
// Otherwise, it rounds up and returns segmentLeft(p) + 1.
size_t segmentRight(char* p) const {
size_t offset = p - ptr();
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as in segmentLeft(): potential undefined behavior if p < ptr(). Add validation to ensure p >= ptr() before computing the offset.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return numSegments(offset);
}

// Constructs a SegmentRange starting at [start, end) indices.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Constructs a SegmentRange starting at [start, end) indices.
// Constructs a SegmentRange in the range of [begin, end).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// bound, useful for [begin, end) style ranges.
// If `p` lies exactly on a segment boundary, this is equal to segmentLeft(p).
// Otherwise, it rounds up and returns segmentLeft(p) + 1.
size_t segmentRight(char* p) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should define a specific type for segment index, instead of using size_t directly?
Otherwise, it's not easy to read, and prone to ambiguity?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion — using SegmentIndex could improve readability. However, it’s used in very few places, and size_t is the standard type for array/container indices in C++. Since it specifically represents an index into handles_, I think it’s fine to keep it as is for now and refactor it if needed during the code unification.

// Ensure handles_ vector is large enough to hold all segments.
while (end > handles_.size()) {
handles_.emplace_back(std::nullopt);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do same thing without the while loop?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, use resize instead.

[ghstack-poisoned]
@guangyey guangyey requested a review from gujinghui October 31, 2025 11:24
@guangyey guangyey added the ciflow/xpu Run XPU CI tasks label Oct 31, 2025
[ghstack-poisoned]
[ghstack-poisoned]
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@pytorchmergebot
Copy link
Collaborator

Starting merge as part of PR stack under #166424

pytorchmergebot pushed a commit that referenced this pull request Nov 4, 2025
# Motivation
This PR intends to add expandable segment feature support on XPU. This will help
- Reduce memory fragmentation;
- Gradually map physical pages into virtual address space as needed.

# Additional Context
The traditional caching allocator frequently allocates and frees device memory blocks. However, over time, with varying tensor size, the device address space becomes fragmented. Even when there's enough total free memory, a lack of contiguous space can cause large allocations to fail.
The **expandable segment** feature addresses this by dynamically extending physical memory within a reserved virtual address range, reducing fragmentation and minimizing reallocation overhead.
The potential drawbacks are
- Virtual memory overhead;
- Potential page mapping overhead;
- Increased complexity.

Pull Request resolved: #166292
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299
pytorchmergebot pushed a commit that referenced this pull request Nov 4, 2025
# Motivation
This PR introduces support for peer-to-peer (P2P) access between devices, including querying and enabling P2P connections between two devices.
It supports two categories of allocations:
- Regular allocations;
- Expandable segment allocations.

# Additional Context
The follow-up is that we should use this feature to optimize our copy kernel when P2P is supported.

Pull Request resolved: #166424
Approved by: https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #166299, #166292
pytorchmergebot pushed a commit that referenced this pull request Nov 4, 2025
# Motivation
This PR aims to reuse some UT to validate the expandable segment feature.

# Additional Context
Currently, the failure is related to the internal track `GSD-11403`, we could get the fix when upgrading the driver to `ci-neo-master-034630` or greater
TODO: add test conv and gemm into this test case when upgrading the driver.

Pull Request resolved: #166495
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299, #166292, #166424
pytorch-bot bot pushed a commit that referenced this pull request Nov 4, 2025
# Motivation
This PR intends to add `ExpandableSegment` struct, which is used to help support the expandable segment feature. I split it to a single PR to facilitate the code review.

Pull Request resolved: #166299
Approved by: https://github.com/EikanWang, https://github.com/albanD, https://github.com/gujinghui
pytorch-bot bot pushed a commit that referenced this pull request Nov 4, 2025
# Motivation
This PR intends to add expandable segment feature support on XPU. This will help
- Reduce memory fragmentation;
- Gradually map physical pages into virtual address space as needed.

# Additional Context
The traditional caching allocator frequently allocates and frees device memory blocks. However, over time, with varying tensor size, the device address space becomes fragmented. Even when there's enough total free memory, a lack of contiguous space can cause large allocations to fail.
The **expandable segment** feature addresses this by dynamically extending physical memory within a reserved virtual address range, reducing fragmentation and minimizing reallocation overhead.
The potential drawbacks are
- Virtual memory overhead;
- Potential page mapping overhead;
- Increased complexity.

Pull Request resolved: #166292
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299
pytorch-bot bot pushed a commit that referenced this pull request Nov 4, 2025
# Motivation
This PR introduces support for peer-to-peer (P2P) access between devices, including querying and enabling P2P connections between two devices.
It supports two categories of allocations:
- Regular allocations;
- Expandable segment allocations.

# Additional Context
The follow-up is that we should use this feature to optimize our copy kernel when P2P is supported.

Pull Request resolved: #166424
Approved by: https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #166299, #166292
pytorch-bot bot pushed a commit that referenced this pull request Nov 4, 2025
# Motivation
This PR aims to reuse some UT to validate the expandable segment feature.

# Additional Context
Currently, the failure is related to the internal track `GSD-11403`, we could get the fix when upgrading the driver to `ci-neo-master-034630` or greater
TODO: add test conv and gemm into this test case when upgrading the driver.

Pull Request resolved: #166495
Approved by: https://github.com/albanD, https://github.com/EikanWang, https://github.com/gujinghui
ghstack dependencies: #166299, #166292, #166424
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks Merged open source release notes: xpu release notes category

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants