@wangkuiyi wangkuiyi commented May 5, 2025

Fixes #152848

I didn't fix this bug earlier because the example script did not exhaustively cover all combinations of 1D/2D tensors, 1D/2D meshes, and possible sharding specs. In this PR, I therefore enriched the example script to cover every combination.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

pytorch-bot bot commented May 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152871

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 1630ad1 with merge base a769114:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the "oncall: distributed" label May 5, 2025
@wangkuiyi wangkuiyi marked this pull request as draft May 5, 2025 21:54
@wangkuiyi
Contributor Author

@pytorchbot label "release notes: distributed (dtensor)"

@wangkuiyi wangkuiyi marked this pull request as ready for review May 6, 2025 05:39
@wangkuiyi wangkuiyi changed the title from "[WIP] Fix bug visualizing 1D Tensor using rich" to "Fix bug visualizing 1D Tensor using rich" May 6, 2025
- dtensor_height = shape[0] if len(shape) > 0 else 1
- dtensor_width = shape[1] if len(shape) > 0 else shape[0]
+ dtensor_height = shape[0]
+ dtensor_width = shape[1] if len(shape) == 2 else 1
Contributor Author

Here I am fixing a bug that I introduced in the previous PR. When the tensor is 1D, treat it as a column vector.
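To make the column-vector convention concrete, here is a minimal sketch. The helper name `tensor_height_and_width` is hypothetical and not part of the PR; only the two expressions inside it come from the diff above. It also shows why the previous expression could not work for a 1D shape.

```python
from typing import Tuple


def tensor_height_and_width(shape: Tuple[int, ...]) -> Tuple[int, int]:
    """Hypothetical helper wrapping the two corrected lines from the diff."""
    if len(shape) not in (1, 2):
        raise ValueError(f"expected a 1D or 2D shape, got {shape!r}")
    dtensor_height = shape[0]
    # A 1D tensor is drawn as a column vector, so its width is 1.
    dtensor_width = shape[1] if len(shape) == 2 else 1
    return dtensor_height, dtensor_width


def old_width(shape: Tuple[int, ...]) -> int:
    """The previous expression, kept only to show the failure mode."""
    return shape[1] if len(shape) > 0 else shape[0]


try:
    old_width((8,))
    old_failed = False
except IndexError:
    # len((8,)) > 0 is true, so shape[1] is evaluated and is out of range.
    old_failed = True
```

With this sketch, a shape of `(8,)` maps to a height of 8 and a width of 1, while a 2D shape passes through unchanged.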

)
for device_index, (shape, offset) in device_shard_shape_and_offsets.items()
}

Contributor Author

Here I am fixing another bug. When the tensor is 1D, the shape and offset of each shard are 1-tuples. Because we draw them in 2D screen space, each 1-tuple must be extended to a 2-tuple: the shard's width becomes 1, and its offset along the screen's x-axis becomes 0.
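The extension described here can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the helper `extend_to_2d` is hypothetical, and the sample data assumes an 8-element 1D tensor split evenly across 4 devices. Only the dictionary name `device_shard_shape_and_offsets` and its `(shape, offset)` values come from the diff context.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]


def extend_to_2d(shape: Tuple[int, ...], offset: Tuple[int, ...]) -> Tuple[Pair, Pair]:
    """Hypothetical helper: pad a 1D shard's shape and offset to 2-tuples."""
    if len(shape) == 1:
        # Width becomes 1 and the x-axis (column) offset becomes 0.
        return (shape[0], 1), (offset[0], 0)
    return (shape[0], shape[1]), (offset[0], offset[1])


# Assumed sample data: device index -> (shard shape, shard offset).
device_shard_shape_and_offsets: Dict[int, Tuple[Tuple[int, ...], Tuple[int, ...]]] = {
    0: ((2,), (0,)),
    1: ((2,), (2,)),
    2: ((2,), (4,)),
    3: ((2,), (6,)),
}

extended = {
    device_index: extend_to_2d(shape, offset)
    for device_index, (shape, offset) in device_shard_shape_and_offsets.items()
}
```

Each shard then occupies a 2-row by 1-column block, stacked vertically, which matches the column-vector treatment of 1D tensors.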

"""
To run the example, use the following command:
- torchrun --standalone --nnodes=1 --nproc-per-node=4 visualize_sharding_example.py
+ TERM=xterm-256color torchrun --nproc-per-node=4 visualize_sharding_example.py
Contributor Author

The environment variable TERM controls the terminal's coloring capability. Set it to xterm-256color instead of the default value to unlock the full color range of your terminal app.
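Writing `TERM=xterm-256color` in front of a command sets the variable for that one process only. A rough Python equivalent is sketched below; the child command is a stand-in that just prints the `TERM` it received, not the actual `torchrun` invocation, and the helper name `run_with_term` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_term(term: str) -> str:
    """Run a child process with TERM overridden and return what it saw."""
    # Copy the current environment and override TERM for the child only.
    env = dict(os.environ, TERM=term)
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['TERM'])"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


child_term = run_with_term("xterm-256color")
```

The parent process keeps whatever `TERM` it already had; only the child sees the override.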

Collaborator

@wanchaol wanchaol left a comment

thanks for the fix!

@wanchaol
Collaborator

wanchaol commented May 7, 2025

@pytorchbot merge -f "inductor lint error not related to the PR"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@wangkuiyi
Contributor Author

Follow-up: #152027

Labels
Merged, oncall: distributed, open source, release notes: distributed (dtensor)
Development

Successfully merging this pull request may close these issues.

Failed visualized 1D DTensor
4 participants