
[DTensor] High CPU memory usage/slowdown for aten._foreach_addcdiv_.ScalarList #123457

Closed
awgu opened this issue Apr 5, 2024 · 1 comment

Labels
module: dtensor distributed tensor tag

@awgu
Contributor

awgu commented Apr 5, 2024

Repro script (internal only): P1206409663

For `aten._foreach_addcdiv_.ScalarList`:

```
Using 291 tensors
NCCL version 2.19.3+cuda12.0
Running optimizer!
Ran optimizer in 25.393 seconds
Peak start: 2956176
Peak end: 81930936
```

For `aten._foreach_addcdiv_.Scalar`:

```
Using 291 tensors
NCCL version 2.19.3+cuda12.0
Running optimizer!
Ran optimizer in 0.047 seconds
Peak start: 2951704
Peak end: 2955256
```

Some observations:

  • The issue is specific to the `ScalarList` overload; the `Scalar` overload is unaffected.
  • The overhead scales with tensor size: with tiny tensors there is no measurable issue.
  • The issue cannot be reproduced with `TwoTensor` in place of `DTensor`, i.e. it is not common to all tensor wrapper subclasses.
cc @wanchaol @XilunWu @tianyu-l @chauhang

@awgu
Contributor Author

awgu commented Apr 5, 2024

Replaced with a single-GPU repro: #123461

@awgu awgu closed this as completed Apr 5, 2024