@yf225 yf225 (Contributor) commented Oct 3, 2025

This fixes #755 and unblocks a faster JSD kernel (#733).

@yf225 yf225 requested review from jansel and oulgen October 3, 2025 02:57
@meta-cla meta-cla bot added the CLA Signed label Oct 3, 2025
@yf225 yf225 changed the title from "Disable tensor_descriptor if range_num_stages > 1" to "Disable tensor_descriptor if range_num_stages > 1, to avoid CUDA misaligned address error" Oct 3, 2025
@yf225 yf225 force-pushed the tensor_descriptor_alignment_fix_v3 branch from 32f110e to 06510db October 3, 2025 02:58
@jansel jansel (Contributor) left a comment

Let's do the opposite of this: disable range_num_stages if tensor_descriptor is used.

tensor_descriptor is a far more important config than range_num_stages.
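
For illustration only, the rule suggested above (keep tensor_descriptor, cap range_num_stages at 1) could be sketched as a small config-normalization step. This is not the code from this PR: the function name `clamp_range_num_stages`, the plain-dict config, and the key names `indexing` and `range_num_stages` are assumptions made for the example.

```python
from typing import Any


def clamp_range_num_stages(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` in which every range_num_stages entry is
    capped at 1 whenever tensor_descriptor indexing is selected.

    Hypothetical helper: the dict layout and key names are assumptions,
    not the project's actual config API.
    """
    fixed = dict(config)  # shallow copy; the caller's dict is left untouched
    uses_descriptor = fixed.get("indexing") == "tensor_descriptor"
    if uses_descriptor and "range_num_stages" in fixed:
        # Build a new list so the original list is not mutated either.
        fixed["range_num_stages"] = [min(s, 1) for s in fixed["range_num_stages"]]
    return fixed


if __name__ == "__main__":
    cfg = {"indexing": "tensor_descriptor", "range_num_stages": [0, 3, 1]}
    print(clamp_range_num_stages(cfg))
```

Configs that use a different indexing mode pass through unchanged, so the more important setting (tensor_descriptor) is never the one that gets dropped.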

@yf225 yf225 force-pushed the tensor_descriptor_alignment_fix_v3 branch 5 times, most recently from 6641dee to 4bf782a October 3, 2025 04:51
@yf225 yf225 requested a review from jansel October 3, 2025 04:51
@yf225 yf225 changed the title from "Disable tensor_descriptor if range_num_stages > 1, to avoid CUDA misaligned address error" to "Set range_num_stages <= 1 if using tensor_descriptor, to avoid CUDA misaligned address error" Oct 3, 2025
@yf225 yf225 force-pushed the tensor_descriptor_alignment_fix_v3 branch from 4bf782a to 862a242 October 4, 2025 05:53
@yf225 yf225 force-pushed the tensor_descriptor_alignment_fix_v3 branch from 862a242 to dbe4916 October 4, 2025 05:55
@yf225 yf225 merged commit 2d54358 into main Oct 4, 2025
13 checks passed
@yf225 yf225 mentioned this pull request Oct 4, 2025
Labels

CLA Signed

Development

Successfully merging this pull request may close these issues.

Misaligned address in JSD kernel

3 participants