
[Question] Why does torchrec explicitly wrap the dp lookup in DistributedDataParallel instead of letting DistributedModelParallel handle it? #1829

Open
shijieliu opened this issue Mar 27, 2024 · 0 comments

Comments

@shijieliu

Hi team,

In ShardedEmbeddingBagCollection, I found that torchrec explicitly wraps the dp lookup in DistributedDataParallel (code here). I also know that DistributedModelParallel has a ddp wrapper that wraps the non-sharded parts of the model, such as the MLP, in DDP, and that this ddp wrapper also uses DistributedDataParallel. A rough sketch of the setup I mean is below.
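
For reference, here is a rough sketch of that setup (launch with e.g. `torchrun --nproc_per_node=2 sketch.py`). The table/feature names, sizes, and the dense `Linear` are placeholders I made up, and the planner constraint is only there to force data_parallel sharding for the table so its lookup goes down the dp path:

```python
import torch
import torch.distributed as dist
import torchrec
from torchrec.distributed.embeddingbag import EmbeddingBagCollectionSharder
from torchrec.distributed.model_parallel import DistributedModelParallel
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

dist.init_process_group(backend="gloo")
device = torch.device("cpu")


class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Placeholder table/feature names and sizes.
        self.ebc = torchrec.EmbeddingBagCollection(
            tables=[
                torchrec.EmbeddingBagConfig(
                    name="t1",
                    embedding_dim=8,
                    num_embeddings=100,
                    feature_names=["f1"],
                )
            ],
            device=torch.device("meta"),
        )
        self.mlp = torch.nn.Linear(8, 1)  # dense part, not sharded


model = Model()
sharders = [EmbeddingBagCollectionSharder()]
# Constrain table "t1" to data_parallel so the sharded EBC uses a dp lookup.
planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=dist.get_world_size(), compute_device="cpu"),
    constraints={
        "t1": ParameterConstraints(
            sharding_types=[ShardingType.DATA_PARALLEL.value]
        )
    },
)
plan = planner.collective_plan(model, sharders, dist.group.WORLD)

dmp = DistributedModelParallel(model, device=device, plan=plan, sharders=sharders)
# torchrec wraps the dp lookup inside the resulting ShardedEmbeddingBagCollection
# in DistributedDataParallel itself, while the dense `mlp` is covered by the ddp
# wrapper that DistributedModelParallel applies to the non-sharded part.
print(type(dmp.module.ebc))
```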

So I am wondering why torchrec chooses to explicitly wrap the dp lookup instead of letting the ddp wrapper in DistributedModelParallel handle the dp lookup and the MLP together. Is there a hidden restriction?

Since DistributedDataParallel relies on .named_parameters() (code here), I am not sure whether overriding .named_parameters() on ShardedEmbeddingBagCollection would be enough to let the ddp wrapper in DistributedModelParallel handle the dp lookup. A toy version of what I have in mind is sketched below.
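
To make that last point concrete, something like the following toy override is what I have in mind. `ToyShardedEBC` and its `_dp_lookup` attribute are hypothetical stand-ins, not the real ShardedEmbeddingBagCollection internals:

```python
import torch
import torch.nn as nn


class ToyShardedEBC(nn.Module):
    """Hypothetical stand-in for a sharded EBC that owns a data-parallel lookup."""

    def __init__(self) -> None:
        super().__init__()
        # Pretend this is the dp-sharded lookup table (not wrapped in DDP here).
        self._dp_lookup = nn.EmbeddingBag(num_embeddings=100, embedding_dim=8)

    def named_parameters(self, prefix: str = "", recurse: bool = True,
                         remove_duplicate: bool = True):
        # Expose the dp lookup's parameters so that an outer DDP wrapper
        # (e.g. the one DistributedModelParallel applies to the non-sharded
        # part of the model) would see them and allreduce their gradients
        # together with the dense parameters.
        child_prefix = prefix + ("." if prefix else "") + "_dp_lookup"
        yield from self._dp_lookup.named_parameters(
            prefix=child_prefix,
            recurse=recurse,
            remove_duplicate=remove_duplicate,
        )

    def forward(self, ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self._dp_lookup(ids, offsets)
```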

@shijieliu shijieliu changed the title [Question] why we explicit make dp lookup as DistributedDataParallel instead of letting DistributedModelParallel handle it? [Question] why torchrec explicit make dp lookup as DistributedDataParallel instead of letting DistributedModelParallel handle it? Mar 27, 2024