Conversation

EikanWang (Collaborator)

Currently, grouped_gemm_jagged_persistent in examples/grouped_gemm.py obtains the number of workers by querying the multi_processor_count device property, which is CUDA-specific. XPU exposes a similar API, so this PR extends that support to XPU.

@EikanWang EikanWang requested a review from Copilot October 3, 2025 17:59
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 3, 2025
Copilot AI left a comment


Pull Request Overview

This PR extends XPU support in the Helion kernel by enabling torch.xpu._XpuDeviceProperties to work with the grouped GEMM functionality. The changes allow the kernel to query XPU device properties (specifically gpu_subslice_count) similar to how it currently queries CUDA device properties (multi_processor_count).

Key changes:

  • Added XPU device property support in the type propagation system
  • Updated grouped GEMM example to handle both CUDA and XPU devices
  • Modified test expectations to reflect the new conditional logic

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File descriptions:

  • examples/grouped_gemm.py — Added XPU device detection and property querying logic; updated the main function to prefer XPU when available
  • helion/_compiler/type_propagation.py — Extended type propagation to support torch.xpu._XpuDeviceProperties alongside CUDA properties
  • test/test_examples.expected — Updated test expectations to match the new conditional device property logic


```python
device = A_packed.device
num_workers = torch.cuda.get_device_properties(device).multi_processor_count  # type: ignore[arg-type]
if device.type == "xpu":
    # TODO(EikanWang): gpu_subslice_count is an out-of-date term. we change update it to XeCore number.
```
Copilot AI Oct 3, 2025

Corrected grammar in comment: 'we change update it' should be 'we should update it'.

Suggested change:

```diff
-    # TODO(EikanWang): gpu_subslice_count is an out-of-date term. we change update it to XeCore number.
+    # TODO(EikanWang): gpu_subslice_count is an out-of-date term. we should update it to XeCore number.
```

Comment on lines 274 to 277:

```python
if type(value) in [
    torch.cuda._CudaDeviceProperties,
    torch.xpu._XpuDeviceProperties,
]:
```
Contributor

Suggested change:

```diff
-if type(value) in [
-    torch.cuda._CudaDeviceProperties,
-    torch.xpu._XpuDeviceProperties,
-]:
+if isinstance(value, (torch.cuda._CudaDeviceProperties,
+                      torch.xpu._XpuDeviceProperties)):
```

why not this?
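For reference, the practical difference between the two checks: `type(value) in [...]` matches only exact types, while `isinstance` also accepts subclasses and is the idiomatic form in Python. A minimal, torch-free illustration (the class names here are hypothetical):

```python
class Base:
    pass


class Child(Base):
    pass


v = Child()

# Exact-type membership test misses subclasses:
exact = type(v) in [Base]      # False

# isinstance accepts subclasses and reads more idiomatically:
inst = isinstance(v, (Base,))  # True
```

For the device-properties case either form works today, since the torch property classes are concrete leaf types, but isinstance is the safer and more conventional choice.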

EikanWang (Collaborator, Author)

Sure.

@EikanWang EikanWang requested a review from oulgen October 3, 2025 18:35
@EikanWang EikanWang merged commit 53ed177 into pytorch:main Oct 3, 2025
13 checks passed
