
[Question] Why does torchrec explicitly wrap the dp lookup in DistributedDataParallel instead of letting DistributedModelParallel handle it? #1829

Open
shijieliu opened this issue Mar 27, 2024 · 0 comments

Comments

@shijieliu

Hi team,

In ShardedEmbeddingBagCollection, I found that torchrec explicitly wraps the dp lookup in DistributedDataParallel (code here). I also know that DistributedModelParallel has a ddp wrapper that wraps the non-sharded parts of the model, such as the MLP, in DDP, and that this ddp wrapper also uses DistributedDataParallel. A rough sketch of the setup I mean is below.
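
For reference, here is a rough sketch of that setup (launch with e.g. `torchrun --nproc_per_node=2 sketch.py`). The table/feature names, sizes, and the dense `Linear` are placeholders I made up, and the planner constraint is only there to force data_parallel sharding for the table so its lookup goes down the dp path:

```python
import torch
import torch.distributed as dist
import torchrec
from torchrec.distributed.embeddingbag import EmbeddingBagCollectionSharder
from torchrec.distributed.model_parallel import DistributedModelParallel
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

dist.init_process_group(backend="gloo")
device = torch.device("cpu")


class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Placeholder table/feature names and sizes.
        self.ebc = torchrec.EmbeddingBagCollection(
            tables=[
                torchrec.EmbeddingBagConfig(
                    name="t1",
                    embedding_dim=8,
                    num_embeddings=100,
                    feature_names=["f1"],
                )
            ],
            device=torch.device("meta"),
        )
        self.mlp = torch.nn.Linear(8, 1)  # dense part, not sharded


model = Model()
sharders = [EmbeddingBagCollectionSharder()]
# Constrain table "t1" to data_parallel so the sharded EBC uses a dp lookup.
planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=dist.get_world_size(), compute_device="cpu"),
    constraints={
        "t1": ParameterConstraints(
            sharding_types=[ShardingType.DATA_PARALLEL.value]
        )
    },
)
plan = planner.collective_plan(model, sharders, dist.group.WORLD)

dmp = DistributedModelParallel(model, device=device, plan=plan, sharders=sharders)
# torchrec wraps the dp lookup inside the resulting ShardedEmbeddingBagCollection
# in DistributedDataParallel itself, while the dense `mlp` is covered by the ddp
# wrapper that DistributedModelParallel applies to the non-sharded part.
print(type(dmp.module.ebc))
```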

So I am wondering why torchrec chooses to explicitly wrap the dp lookup instead of letting the ddp wrapper in DistributedModelParallel handle the dp lookup and the MLP together. Is there a hidden restriction?

Since DistributedDataParallel relies on .named_parameters() (code here), I am not sure whether overriding .named_parameters() on ShardedEmbeddingBagCollection would be enough to let the ddp wrapper in DistributedModelParallel handle the dp lookup. A toy version of what I have in mind is sketched below.
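
To make that last point concrete, something like the following toy override is what I have in mind. `ToyShardedEBC` and its `_dp_lookup` attribute are hypothetical stand-ins, not the real ShardedEmbeddingBagCollection internals:

```python
import torch
import torch.nn as nn


class ToyShardedEBC(nn.Module):
    """Hypothetical stand-in for a sharded EBC that owns a data-parallel lookup."""

    def __init__(self) -> None:
        super().__init__()
        # Pretend this is the dp-sharded lookup table (not wrapped in DDP here).
        self._dp_lookup = nn.EmbeddingBag(num_embeddings=100, embedding_dim=8)

    def named_parameters(self, prefix: str = "", recurse: bool = True,
                         remove_duplicate: bool = True):
        # Expose the dp lookup's parameters so that an outer DDP wrapper
        # (e.g. the one DistributedModelParallel applies to the non-sharded
        # part of the model) would see them and allreduce their gradients
        # together with the dense parameters.
        child_prefix = prefix + ("." if prefix else "") + "_dp_lookup"
        yield from self._dp_lookup.named_parameters(
            prefix=child_prefix,
            recurse=recurse,
            remove_duplicate=remove_duplicate,
        )

    def forward(self, ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self._dp_lookup(ids, offsets)
```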

@shijieliu shijieliu changed the title [Question] why we explicit make dp lookup as DistributedDataParallel instead of letting DistributedModelParallel handle it? [Question] why torchrec explicit make dp lookup as DistributedDataParallel instead of letting DistributedModelParallel handle it? Mar 27, 2024