[OpenReg][distributed] Refactor OCCL backend registration #183257

Closed
KarhouTam wants to merge 2 commits into pytorch:main from KarhouTam:openreg/distributed-init-fix

Conversation

@KarhouTam
Collaborator

Summary

Refactors OCCL distributed backend registration to address code review feedback from PR #171250.

Changes

  1. Remove unnecessary library dependency: torch_python_library is no longer linked into the backend library, because the pybind11 bindings live in the separate torch_bindings library.

  2. Separate distributed bindings: the ProcessGroupOCCL bindings move out of Module.cpp, where they were defined inline, into dedicated ProcessGroupInit.cpp/hpp files.

  3. Remove factory function: the _createProcessGroupOCCL factory function is removed; the ProcessGroupOCCL constructor is now exposed directly via pybind11.

  4. Clean up includes: removed the unnecessary <pybind11/chrono.h> from Module.cpp and <c10/util/intrusive_ptr.h> from ProcessGroupInit.cpp.

cc @fffrog

Separate distributed bindings from device module and remove unnecessary
factory API. Address review comments from PR pytorch#171250:

- Remove torch_python_library link from backend CMake (not needed)
- Move ProcessGroupOCCL bindings to dedicated ProcessGroupInit.cpp
- Remove _createProcessGroupOCCL factory, expose constructor directly
- Clean up unnecessary imports
@pytorch-bot

pytorch-bot Bot commented May 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/183257

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit dc9f630 with merge base f8b2b60:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@KarhouTam KarhouTam added the ciflow/trunk Trigger trunk jobs on your pull request label May 11, 2026
Collaborator

@fffrog fffrog left a comment

SGTM, thank you.

@fffrog
Collaborator

fffrog commented May 11, 2026

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request May 13, 2026
…3257)

## Summary
Refactors OCCL distributed backend registration to address code review feedback from PR pytorch#171250.

## Changes
1. **Remove unnecessary library dependency**: `torch_python_library` is no longer linked to the backend library - pybind11 bindings are handled by the separate `torch_bindings` library.

2. **Separate distributed bindings**: ProcessGroupOCCL bindings moved from inline in `Module.cpp` to dedicated `ProcessGroupInit.cpp/hpp` files.

3. **Remove factory function**: `_createProcessGroupOCCL` factory function removed - now exposing `ProcessGroupOCCL` constructor directly via pybind11.

4. **Clean up imports**: Removed unnecessary `<pybind11/chrono.h>` from Module.cpp and `<c10/util/intrusive_ptr.h>` from ProcessGroupInit.cpp.

Pull Request resolved: pytorch#183257
Approved by: https://github.com/fffrog
Alokksinha00 pushed a commit to Alokksinha00/pytorch that referenced this pull request May 15, 2026
…3257)

mansiag05 added a commit to mansiag05/pytorch that referenced this pull request May 17, 2026
The OCCL backend registration was refactored: the standalone
createProcessGroupOCCL factory function was removed, pybind11
bindings moved from Module.cpp to a dedicated init.cpp, and Python
registration now imports ProcessGroupOCCL directly. Update the doc
to match.

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: openreg, open source, topic: not user facing (topic category)


4 participants