
Conversation

@wz337 (Contributor) commented Oct 2, 2024

Stack from ghstack (oldest at bottom):

This unblocks Llama 3.2 Vision's use case of resizing positional embeddings for fine-tuning. Context is in the internal Workplace post.
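For context, a minimal usage sketch of what this enables: calling F.interpolate on a DTensor without hitting a missing-sharding-strategy error. The mesh setup, shapes, sharding, and interpolate arguments below are illustrative assumptions, not taken from this PR:

```python
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

# Assumes a distributed run (e.g. launched with torchrun) with one GPU per rank.
mesh = init_device_mesh("cuda", (torch.cuda.device_count(),))

# A positional-embedding-like tensor (N, C, H, W), sharded on the channel dim.
pos_embed = torch.randn(1, 64, 16, 16, device="cuda")
dt = distribute_tensor(pos_embed, mesh, [Shard(1)])

# With a replication strategy registered for the upsampling ops, this call
# dispatches through DTensor; sharded inputs get redistributed to Replicate.
resized = F.interpolate(dt, size=(32, 32), mode="bilinear")
```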

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wconstab @d4l3k @c-p-i-o @tianyu-l

@pytorch-bot bot commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137201

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4abdf2a with merge base 8962610:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the oncall: distributed (Add this issue/PR to distributed oncall triage queue) label Oct 2, 2024
@wz337 added the module: dtensor (distributed tensor tag) and topic: not user facing (topic category) labels Oct 2, 2024
@wz337 changed the title from "add interpoloate" to "[DTensor] Register replication strategy for a few upsampling interpolate ops" Oct 2, 2024
@wz337 requested review from XilunWu and weifengpy October 2, 2024 18:48
@XilunWu (Contributor) left a comment


Replicate strategy should be fine for now.

wz337 added a commit that referenced this pull request Oct 2, 2024
ghstack-source-id: e7dd90a
Pull Request resolved: #137201
@wz337 (Contributor, Author) commented Oct 2, 2024

Replicate strategy should be fine for now.

Yes. I looked at these ops, and it doesn't seem we can provide a better sharding strategy for them, so they will always require redistributing to replicate.
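As an illustration (not code from this PR; the setup below is assumed), the replicate-only strategy behaves conceptually like gathering the shards and running the op on the full tensor on every rank:

```python
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Replicate, Shard, distribute_tensor

# Hypothetical setup: a 1-D mesh, input sharded on the channel dim.
mesh = init_device_mesh("cuda", (torch.cuda.device_count(),))
dt = distribute_tensor(torch.randn(1, 64, 16, 16, device="cuda"), mesh, [Shard(1)])

# Conceptual equivalent of the replicate-only strategy: all-gather the shards
# into a replicated DTensor, then run the op on the full local tensor.
full_local = dt.redistribute(placements=[Replicate()]).to_local()
out_local = F.interpolate(full_local, size=(32, 32), mode="bilinear")
```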

@awgu (Collaborator) commented Oct 2, 2024

How would the replicated DTensor get converted back to sharded in the state dict load flow?

@wz337 (Contributor, Author) commented Oct 2, 2024

How would the replicated DTensor get converted back to sharded in the state dict load flow?

Ah. Thanks for raising it. We won't be able to define the layout for the output, since it is determined by the next op.

To shard it back, users would have to call dtensor.redistribute(placements=[Shard(0)]). Done this way, there is no communication, unlike going through distribute_tensor().
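A minimal sketch of that shard-back step; replicated_dt is a hypothetical stand-in for the replicated output (e.g. of the interpolate call, or a full tensor loaded during state dict load):

```python
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Replicate, Shard, distribute_tensor

mesh = init_device_mesh("cuda", (torch.cuda.device_count(),))

# Stand-in for the replicated output of the op (or a loaded full tensor).
replicated_dt = distribute_tensor(torch.randn(8, 64, device="cuda"), mesh, [Replicate()])

# Replicate -> Shard(0) only slices the local tensor on each rank, so it
# issues no communication, unlike building the sharded DTensor from a full
# tensor with distribute_tensor().
sharded_dt = replicated_dt.redistribute(placements=[Shard(0)])
```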

@wz337 (Contributor, Author) commented Oct 2, 2024

@pytorchmergebot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 2, 2024
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@github-actions bot deleted the gh/wz337/35/head branch November 6, 2024 02:08

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: dtensor (distributed tensor tag), oncall: distributed (Add this issue/PR to distributed oncall triage queue), topic: not user facing (topic category)
