
Conversation

wonjoo-wj
Collaborator

@wonjoo-wj wonjoo-wj commented Feb 9, 2023

Fixes for PyTorch/XLA functionalization integration


Some notable changes include:

  • More asserts in FunctionalTensorWrapper, so bugs show up more cleanly in cases where we e.g. forget to wrap an output
  • Make the *_scatter ops CompositeExplicitAutogradNonFunctional, so we get a better error message and XLA doesn't accidentally try to use them (see the short functionalization sketch after this list)
  • Fix LTC/XLA codegen in core to handle multi-tensor out= ops with no returns
  • Better erroring: allow XLA to use the CPU fallback from core in a way that always errors on view ops, which XLA should no longer see.
  • Update MetaConverter to exclude XLA tensors in raising NotImplemented…
  • Add _propagate_xla_data op
  • Add meta tensor support for some ops
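As a quick illustration of the behavior described above (not code from this PR), the public `torch.func.functionalize` transform shows how an in-place op through a view gets rewritten into out-of-place ops plus a `*_scatter` call, which is why a backend like XLA should no longer see raw view or mutation ops:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch.func import functionalize

def f(x):
    tmp = x.clone()
    row = tmp[0]   # a view into an intermediate
    row.add_(1)    # in-place mutation through the view
    return tmp

# Tracing the functionalized version shows clone / select_copy / add /
# select_scatter style ops, with no views or mutations left in the graph.
gm = make_fx(functionalize(f, remove="mutations_and_views"))(torch.zeros(2, 2))
print(gm.code)
```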

@pytorch-bot

pytorch-bot bot commented Feb 9, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94537

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 18b6dca:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

(Review thread on the MetaConverter diff, where the check was changed from `if any(` to `if t.device.type != "xla" and any(`.)
Contributor

Do you know why this was needed? (Or what codepath in the functionalization <> XLA integration hit this code?)

Collaborator Author

Yep, AFAIK, in the dynamo codepath PyTorch converts the tensor into a fake tensor and then runs the ops on that fake tensor. That process involves making a meta tensor and then turning it into the fake-tensor subclass.

And now that XLA tensors are functional tensors, the `torch._is_functional_tensor(t)` check here returns true, which results in hitting the `return NotImplemented` path. So we added this device check to bypass it.

More details at pytorch/xla#4414 (comment).
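For readers following along, here is a minimal sketch of the guard being discussed (a hypothetical helper, not the verbatim MetaConverter source; the other predicates inside the `any(...)` are elided):

```python
import torch

def _should_bail_out_of_meta_conversion(t: torch.Tensor) -> bool:
    # XLA tensors are functional-tensor wrappers after this integration, so the
    # device check exempts them from the functional-tensor predicate that would
    # otherwise make MetaConverter return NotImplemented during fake-tensor
    # conversion.
    return t.device.type != "xla" and any(
        [
            torch._is_functional_tensor(t),
            # ...other unsupported-tensor predicates elided in this sketch...
        ]
    )
```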

Contributor

So, a while ago we talked about the functionalization code paths for dynamo vs. lazy tensor. I think my take here is that it would be preferable for pytorch/XLA to go through all of this functionalization infra only when using the normal, non-dynamo code paths, and in the dynamo integration for XLA to do what it used to do before this PR (not bother wrapping tensors into FunctionalWrappers, etc.).

The main reason I'm leaning this way is that when you're using the torch.compile() / dynamo API, functionalization for XLA is completely redundant: our infra will send a graph to the backend to compile, and you're guaranteed that the graph is already functionalized.

If you end up keeping all of this functionalization logic in both code paths, one thing you might (?) run into is that there will be two levels of functionalization happening. This isn't really something that's tested / supported today. So my take here is that:

(1) If this one check here is enough to get everything working smoothly for dynamo, then this carve-out seems fine to me.

(2) If it's not, and you start hitting other weird functionalization failures, we might want to think about turning XLA functionalization off in the dynamo codepath.
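A small sketch of that point (not pytorch/xla code; the inspect_compiler / inspect_backend names are made up, while aot_autograd and make_boxed_func are the existing wrappers in core): when a custom backend is plugged in through AOTAutograd, the graph it receives has already been functionalized, so any backend-side functionalization would be a second, redundant pass.

```python
import torch
from torch._dynamo.backends.common import aot_autograd
from functorch.compile import make_boxed_func

def inspect_compiler(gm: torch.fx.GraphModule, example_inputs):
    print(gm.code)  # ATen-level graph; the in-place add below shows up as a functional add
    return make_boxed_func(gm.forward)

inspect_backend = aot_autograd(fw_compiler=inspect_compiler)

@torch.compile(backend=inspect_backend)
def f(x):
    y = x.clone()
    y.add_(1)       # mutation is functionalized away before the compiler sees it
    return y * 2

f(torch.ones(3))
```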

Collaborator Author

Thanks for the pointers. So far, this one check was enough to get everything working smoothly with dynamo, until we recently got a new dynamo regression in one of our unit tests (pytorch/xla#4680). I haven't had a chance to debug it in depth yet, but aside from that, it seems like we're not hitting any other weird functionalization failures.

@JackCaoG, what do you think about this? And if we were to take PyTorch/XLA's dynamo path off the functionalization codepath, would that happen in our dynamo bridge (https://github.com/pytorch/xla/blob/master/torch_xla/core/dynamo_bridge.py#L39), by manually updating each tensor to a non-functional tensor?
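For context, a purely hypothetical sketch of what "manually update each tensor to a non-functional tensor" could look like, using private helpers that already exist in core (this is not code from dynamo_bridge.py, and the helper name is made up):

```python
import torch

def unwrap_if_functional(t: torch.Tensor) -> torch.Tensor:
    if torch._is_functional_tensor(t):
        torch._sync(t)                            # flush any pending view/mutation updates
        return torch._from_functional_tensor(t)   # unwrap back to the inner (XLA) tensor
    return t
```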

Collaborator

My take is: it looks like we can keep this approach for now, and then investigate skipping functionalization for dynamo later on.

Collaborator

We can try to disable the functionalization in https://github.com/pytorch/xla/blob/master/torch_xla/core/dynamo_bridge.py#L224. That might be the right thing to do (if there is an easy flag to turn it on and off).

Collaborator

Probably not, otherwise we could just land functionalization with the flag!

Collaborator

Well, then there is no way for us to do that. The dynamo path's tracing will go through the same lazy tracing logic.

Collaborator

Well, we could add a runtime flag... There are just a few places where we wrap the XLATensor into the FunctionalWrapper. I have been thinking about this for a while, in case we have to land the feature to unblock people...
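A purely hypothetical sketch of that runtime-flag idea (neither the flag name nor the helper exists as written here; it only illustrates gating the wrapping so the dynamo path could opt out of functionalization):

```python
import os
import torch

# Hypothetical flag name, shown only to illustrate the proposal above.
_DISABLE_FUNCTIONALIZATION = os.environ.get("XLA_DISABLE_FUNCTIONALIZATION", "0") == "1"

def maybe_wrap_functional(t: torch.Tensor) -> torch.Tensor:
    if _DISABLE_FUNCTIONALIZATION or torch._is_functional_tensor(t):
        return t
    return torch._to_functional_tensor(t)
```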

@alanwaketan alanwaketan force-pushed the functionalization branch 2 times, most recently from bfc0a1b to 36d4eb6 Compare February 21, 2023 21:13
@wonjoo-wj
Collaborator Author

I'll mark this one ready for review now since all CIs are green.

@wonjoo-wj wonjoo-wj marked this pull request as ready for review February 24, 2023 10:03
@wonjoo-wj wonjoo-wj force-pushed the functionalization branch 2 times, most recently from e152beb to 54a76e3 Compare February 25, 2023 00:57
@ezyang ezyang added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Feb 27, 2023
@alanwaketan alanwaketan requested a review from a team as a code owner February 27, 2023 20:31
alanwaketan and others added 14 commits March 1, 2023 23:39
Summary:
This pull request adds an op called _propagate_xla_data to help propagate
information between the updated_tensor created during the in-place-ops
transform and the original input, so that pytorch/xla can keep its in-place-op
optimization after adopting functionalization.

Test Plan:
In XLA: PJRT_DEVICE=CPU python test/test_input_output_aliases.py -v -k test_non_view
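A hedged illustration of how the new op is meant to be used (the exact schema lives in native_functions.yaml; the call shape below is approximate and the wrapper function is made up):

```python
import torch

def propagate(original_input: torch.Tensor, updated_tensor: torch.Tensor) -> None:
    # For most backends this is effectively a no-op; for XLA it links the
    # functionalization-created updated tensor back to the original input so
    # the in-place-op aliasing optimization survives functionalization.
    torch.ops.aten._propagate_xla_data(original_input, updated_tensor)
```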
Contributor

@bdhirsh bdhirsh left a comment

Thanks! :)

@bdhirsh
Contributor

bdhirsh commented Mar 2, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 2, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@alanwaketan
Collaborator

Thanks, Brian.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 5, 2023
Fixes for PyTorch/XLA functionalization integration
Pull Request resolved: pytorch/pytorch#94537
Approved by: https://github.com/bdhirsh

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 5, 2023
Fixes for PyTorch/XLA functionalization integration
Pull Request resolved: pytorch/pytorch#94537
Approved by: https://github.com/bdhirsh

ydwu4 added a commit that referenced this pull request Mar 11, 2023
Fixes for PyTorch/XLA functionalization integration
Approved by: https://github.com/bdhirsh

ydwu4 added a commit to ydwu4/pytorch that referenced this pull request Mar 13, 2023
Fixes for PyTorch/XLA functionalization integration
Approved by: https://github.com/bdhirsh
@github-actions github-actions bot deleted the functionalization branch August 31, 2024 02:00