[dtensor] make replicate -> partial do division instead #110898
Conversation
This PR switches replicate -> partial to do division instead of zeroing out the value on all but one rank. It preserves the same numerics, avoids the per-rank behavior difference, and is friendlier to torch.compile.
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110898
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 75a68e2 with merge base 201d02e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for doing this. This definitely makes TP (the bias of a row-wise linear) less complicated for users to understand.
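To see why the row-wise linear case comes up, here is a hedged sketch (plain NumPy standing in for DTensor, with illustrative names and shard sizes): a linear sharded on the inner dimension produces partial matmul results on each rank, so a replicated bias must be converted to partial before the add. Dividing the bias by the world size means the post-all-reduce sum adds the bias exactly once, with every rank running the same code.

```python
import numpy as np

# Illustrative sizes; 2 "ranks" simulated in one process.
WORLD = 2
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))   # input, sharded on dim 1 across ranks
w = rng.normal(size=(4, 5))   # weight, sharded on dim 0
b = rng.normal(size=(5,))     # bias, replicated on every rank

# Each rank computes a partial matmul over its slice of the inner dim.
partials = [x[:, r * 2:(r + 1) * 2] @ w[r * 2:(r + 1) * 2, :] for r in range(WORLD)]

# Converting the replicated bias to partial via division: each rank
# adds b / WORLD, so the all-reduce sum contributes b exactly once.
outputs = [p + b / WORLD for p in partials]
result = sum(outputs)  # stand-in for the all-reduce

assert np.allclose(result, x @ w + b)
```

Under the previous zeroing strategy, only one rank would add the full bias and the others would add zero, giving the same sum but rank-dependent local computation.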
make random ops a set instead of a list. Pull Request resolved: #110900. Approved by: https://github.com/fduwjj. ghstack dependencies: #110898
as titled. Pull Request resolved: #111091. Approved by: https://github.com/awgu, https://github.com/wz337. ghstack dependencies: #110898, #110900
Stack from ghstack (oldest at bottom):
This PR switches replicate -> partial to do division instead of
zeroing out the value on all but one rank. It preserves the same
numerics, avoids the per-rank behavior difference, and is friendlier
to torch.compile.
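The two strategies can be contrasted with a minimal sketch (not the actual DTensor code; ranks are simulated in one process). A "partial" value is only meaningful after summing across all ranks, so both conversions are correct, but only the division variant runs identical code on every rank:

```python
WORLD_SIZE = 4  # hypothetical number of ranks

def replicate_to_partial_zeroing(value, rank):
    # Old approach: keep the value on one rank, zero it elsewhere.
    # Correct after the sum, but each rank computes something different.
    return value if rank == 0 else 0.0

def replicate_to_partial_division(value, rank):
    # New approach: divide by the world size on every rank.
    # Every rank runs the same computation, which traces uniformly.
    return value / WORLD_SIZE

value = 8.0
for convert in (replicate_to_partial_zeroing, replicate_to_partial_division):
    shards = [convert(value, r) for r in range(WORLD_SIZE)]
    # Summing the partial values (the eventual all-reduce) recovers
    # the original replicated value in both cases.
    assert sum(shards) == value
```

With division there is no rank-conditional branch, so a compiled graph of the conversion is the same on every rank.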