Skip to content

Conversation

@YichengDWu
Copy link
Contributor

Closes #5471

Signed-off-by: Yicheng Wu <yichengdwu@outlook.com>
Copilot AI review requested due to automatic review settings October 21, 2025 03:45
@YichengDWu YichengDWu requested a review from a team as a code owner October 21, 2025 03:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds Apple GPU support for the syncwarp function by implementing a SIMDGROUP barrier. The implementation provides execution synchronization for Apple GPUs, complementing the existing NVIDIA and AMD GPU support.

  • Adds Apple GPU-specific barrier implementation using the llvm.air.simdgroup.barrier intrinsic
  • Updates documentation to clarify Apple GPU behavior and limitations
  • Notes that lane masks are not supported on Apple GPUs and only execution synchronization (not memory ordering) is provided

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@YichengDWu YichengDWu changed the title sync: add Apple SIMDGROUP barrier implementation for syncwarp [stdlib] Add Apple SIMDGROUP barrier implementation for syncwarp Oct 21, 2025
@JoeLoser JoeLoser requested review from lsh and npanchen October 21, 2025 11:48
Copy link
Contributor

@npanchen npanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!
Can you please add simple test for syncward similar to test_barrier.mojo ?

Also, since nightly compiler by default mangles AIR intrinsics, I'll need to update compiler internals not do to this for air.simdgroup.barrier. I'll share an update when it will be safe to merge

Co-authored-by: Nikolay Panchenko <npanchen@modular.com>
@YichengDWu
Copy link
Contributor Author

Thanks! A deterministic syncwarp-only test (no memory fence) relies on a warp-wide register op like shuffle, otherwise the test either becomes flaky or ends up testing a different primitive (e.g., threadgroup barrier + fence). I'll follow up with the test once shuffle lands.

@npanchen
Copy link
Contributor

Thanks for adding this! Can you please add simple test for syncward similar to test_barrier.mojo ?

Also, since nightly compiler by default mangles AIR intrinsics, I'll need to update compiler internals not do to this for air.simdgroup.barrier. I'll share an update when it will be safe to merge

Alright. I've merged compiler changes to avoid mangling of this intrinsic. After remaining things are done it's good for me to be merged

@JoeLoser
Copy link
Collaborator

Thanks for adding this! Can you please add simple test for syncward similar to test_barrier.mojo ?
Also, since nightly compiler by default mangles AIR intrinsics, I'll need to update compiler internals not do to this for air.simdgroup.barrier. I'll share an update when it will be safe to merge

Alright. I've merged compiler changes to avoid mangling of this intrinsic. After remaining things are done it's good for me to be merged

@YichengDWu I just released a new nightly release that has Kolya's change, can you try it out please? Mojo compiler won't mangle this AIR intrinsic now.

@BradLarson
Copy link
Collaborator

I tried this out locally, and it builds correctly on the latest nightly. We unfortunately don't have any direct tests of syncwarp(), but I tried a manual test of it and the intrinsic did compile and the code ran correctly as far as I can tell.

@BradLarson
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Oct 22, 2025
@YichengDWu
Copy link
Contributor Author

Glad to hear that! I just pulled the latest nightly and can confirm syncwarp() now builds and runs on my Mac.

@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the main branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Oct 22, 2025
@JoeLoser
Copy link
Collaborator

Glad to hear that! I just pulled the latest nightly and can confirm syncwarp() now builds and runs on my Mac.

Nice! We just synced this in and landed it internally: it'll be available in tomorrow's nightly release. Great contribution!

If you're interested in more Apple Metal work, #5472 is an interesting one and would unlock several of the GPU puzzles to work on Apple Metal from https://github.com/modular/mojo-gpu-puzzles.

@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Oct 23, 2025
@modularbot
Copy link
Collaborator

Landed in 98447e5! Thank you for your contribution 🎉

@github-actions github-actions bot locked and limited conversation to collaborators Oct 23, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Support syncwarp on Apple GPU

5 participants