
Give wait_tensor() a schema that reflects its side effect #126773

Open
bdhirsh opened this issue May 21, 2024 · 4 comments
Assignees
Labels
oncall: distributed Add this issue/PR to distributed oncall triage queue triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@bdhirsh
Contributor

bdhirsh commented May 21, 2024

wait_tensor currently has a schema that advertises it as functional/pure (it takes in a tensor and returns a fresh tensor), but this is not really true:

(1) it has the side effect of waiting for a pending collective to finish
(2) it returns its input directly: https://github.com/pytorch/pytorch/blob/main/torch/csrc/distributed/c10d/Functional.cpp#L307

One reason this is error-prone: if you call this API for its side effect and discard the result (e.g. the bug here), we will DCE it during tracing (unless we land this PR).
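[Editor's note: the DCE hazard described above can be reproduced in miniature with torch.fx, using `Tensor.add` as a stand-in for any op whose schema advertises purity; this is an illustrative sketch, not the actual tracing path involved in the linked bug.]

```python
import torch
import torch.fx

def f(x):
    # Result discarded: an op whose schema advertises purity looks dead.
    x.add(1)
    return x

g = torch.fx.symbolic_trace(f)
# Dead-code elimination drops the unused "pure" call from the graph.
g.graph.eliminate_dead_code()
g.recompile()
assert not any(n.op == "call_method" and n.target == "add" for n in g.graph.nodes)
```

If `add` here really had a side effect (like waiting on a collective), the traced program would silently lose it.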

The more proper thing to do would be either to:

(1) tell functionalization to explicitly not functionalize the op.

(2) update the schema of this op to mark it as mutating its input (to reflect the side effect), and add a functional variant of the op so we can functionalize it, and "de-functionalize" it in inductor (although to be fair, this all amounts to a giant no-op).
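[Editor's note: option (2) can be sketched with the torch.library API; the `mylib` namespace and kernel bodies below are hypothetical stand-ins, not the real `_c10d_functional` registration.]

```python
import torch

# Hypothetical sketch: a mutable-schema variant plus a functional variant
# that functionalization could swap in. "mylib" is an illustrative namespace.
lib = torch.library.Library("mylib", "DEF")
lib.define("wait_tensor_(Tensor(a!) self) -> Tensor(a!)")
lib.define("wait_tensor(Tensor self) -> Tensor")

def wait_tensor_kernel_(self):
    # A real kernel would block here until the pending collective finished.
    return self

def wait_tensor_kernel(self):
    # Functional variant: returns a fresh tensor, so it is safe to functionalize.
    return self.clone()

lib.impl("wait_tensor_", wait_tensor_kernel_, "CompositeExplicitAutograd")
lib.impl("wait_tensor", wait_tensor_kernel, "CompositeExplicitAutograd")

x = torch.ones(2)
out = torch.ops.mylib.wait_tensor_(x)
assert out.data_ptr() == x.data_ptr()  # per the schema, the output aliases the input
```

With the `(a!)` annotation in place, tracing treats the op as mutating and will not DCE a call whose result is discarded.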

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k

@bdhirsh bdhirsh added the oncall: distributed Add this issue/PR to distributed oncall triage queue label May 21, 2024
@zou3519
Contributor

zou3519 commented May 21, 2024

What are the chances effect tokens would help here?

@wconstab
Contributor

Just curious, is there such a notion today of a schema that marks a side effect? What would that look like?

@wanchaol
Contributor

should this just be wait_tensor(Tensor(a!) self) -> Tensor(a!)?
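[Editor's note: as a sanity check on what that schema string carries, the (private, so treat as illustrative) `torch._C.parse_schema` helper exposes its alias annotations.]

```python
import torch

# Parse the proposed schema and inspect the alias annotation on `self`.
schema = torch._C.parse_schema("wait_tensor(Tensor(a!) self) -> Tensor(a!)")
self_arg = schema.arguments[0]
assert self_arg.alias_info is not None          # (a!) attaches alias info
assert self_arg.alias_info.is_write             # and marks the input as written to
assert schema.returns[0].alias_info is not None # the output aliases the same set
```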

@zou3519
Contributor

zou3519 commented May 22, 2024

There are a couple of options for option 2 "update the schema of this op to mark it as mutating its input (to reflect the side effect)":

a) wait_tensor(Tensor(a!) self) -> Tensor(a!). This requires us to add a functional version of the op and reinplacing logic to inductor for it.
b) wait_tensor(Tensor(a!) self) -> (). This one doesn't require us to add a functional op (we will automatically generate a functional op).
c) we've been adding new "print" operators that have arbitrary side effects that automatically get functionalized (@angelayi @yanboliang). There's a side table where we actually mark the operators as having side effects.

```python
SIDE_EFFECTS: Dict[torch._ops.OpOverload, _EffectType] = {
    torch.ops.aten._print.default: _EffectType.ORDERED,
}
```
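[Editor's note: the effect-token idea behind that side table can be modeled in plain Python; names below are illustrative, not the real `torch._higher_order_ops.effects` API. Threading a token through effectful calls gives tracing explicit data dependencies, so it can neither DCE nor reorder them.]

```python
# Toy model of effect tokens: each effectful call consumes a token and
# produces a fresh one, chaining the calls into an ordered dependency graph.
def with_effects(token, op, *args):
    result = op(*args)
    return object(), result  # fresh token sequences the next effectful op

log = []
token = object()  # initial token for this trace
token, _ = with_effects(token, log.append, "all_reduce issued")
token, _ = with_effects(token, log.append, "wait_tensor ran")
assert log == ["all_reduce issued", "wait_tensor ran"]
```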

@LucasLLC LucasLLC self-assigned this Jun 5, 2024
@LucasLLC LucasLLC added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jun 5, 2024