
Conversation

skpark-rh (Contributor) commented Oct 17, 2025

Fixes #160513

The Problem Summary

The issue boils down to dtype promotion logic. The code base has two different code paths that handle dtype promotion. For purely multi-dimensional tensor operations, the C++ path is used, and it follows the NumPy promotion rules; that is why the N-dim tensors in #160513 are fine, since their dtypes take precedence. The problem arises with Python scalars and 0-dim tensors. When "scalars" are detected, a Python implementation of the promotion logic runs instead (torch/_prims_common/__init__.py:1544). On the Python side the implementation cannot distinguish a wrapped number from a genuine 0-dim tensor, so it simply takes the highest dtype, which is the wrapped Python double.
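A rough repro sketch (the exact trigger below is an assumption based on the description above, not taken verbatim from the issue):

```python
import torch
import torch.autograd.forward_ad as fwAD

# A float32 dual with a 0-dim primal/tangent is multiplied by a Python float;
# the tangent dtype is expected to stay float32, but the Python-side promotion
# logic reportedly promotes it to float64.
primal = torch.tensor(1.0, dtype=torch.float32)   # 0-dim tensor
tangent = torch.tensor(1.0, dtype=torch.float32)  # 0-dim tensor
with fwAD.dual_level():
    dual = fwAD.make_dual(primal, tangent)
    out = dual * 2.0  # 2.0 gets wrapped into a 0-dim double tensor internally
    out_tangent = fwAD.unpack_dual(out).tangent
print(out_tangent.dtype)  # expected torch.float32; the reported bug promotes it
```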

The Fix

The Python implementation of dtype promotion had to know where the scalar came from; once scalars can be distinguished, the appropriate dtype can be chosen. The first approach was to expose the is_wrapped_number method, but that came with a big issue: during forward AD, the derivatives of those scalars turn out to be ZeroTensors. ZeroTensor internally uses a hack that initializes a meta tensor, which skips expensive dispatch operations, but the copy does not grab everything, in particular the is_wrapped_number_ property. I thought about modifying the copy, but that seemed to go against the spirit of what the copy was intended for; in addition, the tests for is_wrapped_number_ require dim > 0, and a scalar ZeroTensor is a meta tensor, which complicates things.

So I chose the route of creating a new property called was_wrapped_number and exposing it through the Python tensor API. I had to modify the autograd code generation to set was_wrapped_number in the mul, add, and div operations in VariableType.cpp. Once this property was set, the dtype promotion logic could be updated to distinguish wrapped numbers from 0-dim tensors, and once that hierarchy was taken care of, the buggy behavior was fixed.

I wrote a new ops testing module, TestForwardADWithScalars, since this bug seemed unique enough to require a new testing paradigm. It only tests multiply, add, and divide; I chose those because the affected operations boil down to these three.
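A minimal sketch of the kind of check this describes (not the actual TestForwardADWithScalars code; it encodes the expected, post-fix behavior):

```python
import torch
import torch.autograd.forward_ad as fwAD

# The tangent dtype of a float32 dual should be preserved under mul, add,
# and div with a Python scalar.
def tangent_dtype_after(op):
    primal = torch.tensor(1.0, dtype=torch.float32)
    tangent = torch.tensor(1.0, dtype=torch.float32)
    with fwAD.dual_level():
        out = op(fwAD.make_dual(primal, tangent), 3.0)
        return fwAD.unpack_dual(out).tangent.dtype

for op in (torch.mul, torch.add, torch.div):
    assert tangent_dtype_after(op) == torch.float32, op
```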

@ezyang @OihanJoyot

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel

… a "is_wrapped_number" as true if the derived derivative is also a wrapped number.
…numbers. Then using the correct dtype promotions on the python side.
@pytorch-bot added the release notes: autograd label Oct 17, 2025
skpark-rh (Contributor Author) commented:
@pytorchbot label "module: forward ad"

skpark-rh (Contributor Author) commented Oct 17, 2025

@ezyang - continuing from our conversation in the closed PR...
The first way I tried has its limitations. _efficientzerotensor_symint copies the size of the input tensor, but that input tensor is an empty meta tensor. Even if I pass in an is_wrapped_number_ option, the ZeroTensor's device type is meta with dim=0, and set_wrapped_number has a strict constraint that a tensor cannot be a scalar tensor and have is_wrapped_number_ set. Either we would have to remove that constraint from the setter, or modify _efficientzerotensor_symint to not use the meta device type. I can't get around having the ZeroTensor; the derivative of these scalars has to be zeros.

ezyang (Contributor) commented Oct 18, 2025

My intuition is that we should relax the is_wrapped_number setter so that a ZeroTensor 0d tensor can be a wrapped number. This lets us represent a zero 0d regular tensor. Do you know where the check you're running up against is? I checked set_wrapped_number, but it seems like it should work with a 0d tensor.

skpark-rh (Contributor Author) commented Oct 18, 2025

In VariableType_0.cpp:13495 -> result_new_fw_grad_opt = other_t * self_p + self_t * other_p; other_t and self_t become ZeroTensors. I can set the is_wrapped_number_ property just fine there. The problem happens when the scalar operations start to execute: once the addition/multiplication happens, the code eventually lands in Python, torch/_prims_common/__init__.py:1537-1726, to compute the correct dtype. When the code reaches the Python side, I need to access the is_wrapped_number_ (or _wrapped_number) property. I tried to expose the property through the Python bindings, but torch/csrc/jit/python/pybind_utils.cpp:590 prevented me because the incoming ZeroTensor has device type meta; there is a hard internal assert, TORCH_INTERNAL_ASSERT(tensor.device().is_cpu());, that keeps me from grabbing the is_wrapped_number_ property.

This is why I needed a new property was_wrapped_number to get around this constraint. I believe this problem becomes difficult because the code suddenly goes to the python side to compute the appropriate dtypes... If I can get away with removing that internal assert from the JIT code then this becomes a simpler problem. I just don't have the expertise or the experience to make that judgement.
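For reference, the promotion rule at play can be illustrated with the public torch.result_type API (this is only an analogy; the forward-AD path goes through torch/_prims_common rather than torch.result_type):

```python
import torch

# A genuine Python scalar does not promote a tensor's floating dtype...
print(torch.result_type(torch.zeros((), dtype=torch.float32), 2.0))
# torch.float32

# ...but once that scalar has been materialized as a real 0-dim double tensor
# (and the Python side cannot tell it used to be a wrapped number),
# the double dtype wins the promotion.
print(torch.result_type(torch.zeros((), dtype=torch.float32),
                        torch.tensor(2.0, dtype=torch.float64)))
# torch.float64
```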

ezyang (Contributor) commented Oct 18, 2025

Ok, this makes sense. I propose we relax the assert: specifically, if you have a ZeroTensor that is 0d, then it is also OK for the device type to be meta.

Another strategy would be to avoid using ZeroTensor in the 0d case, but I do not know how easy that is.

@albanD removed their request for review October 19, 2025 03:29
@albanD added the triaged label Oct 19, 2025
skpark-rh (Contributor Author) commented:
Okay, I'll pursue that avenue of relaxing the assert. I'll check whether the tensor is a meta device with dim=0 and, in that case, allow the property to pass through.

As for avoiding ZeroTensor in the 0-dim case, I do not know either... If the above path becomes a dead end, I can pursue this route for sure.

skpark-rh (Contributor Author) commented Oct 20, 2025

@ezyang
So my comment got lost in the woods with the botched merge in my previous PR, but I need your advice on modifying TensorOptions. Now that I can set the is_wrapped_number property on the EfficientZeroTensor and see it on the Python side, I need to propagate is_wrapped_number from the parent tensor to the child tensor through TensorOptions when the EfficientZeroTensor is created. I am planning to add is_wrapped_number as an optional boolean to EfficientZeroTensor. I think I can minimize the fallout by modifying torchgen/native_function_generation.py to look for the pattern ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None and append bool? is_wrapped_number=None to it. That limits the blast radius from inspecting all native functions to just the ones with that pattern. Since the added boolean is optional, it won't do any harm to other native functions, and it has the added benefit of letting developers use it when the time comes.

Your thoughts?
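For illustration, the kind of pattern-based rewrite being proposed might look roughly like this (the helper name, regex, and the example schema string are assumptions, not actual torchgen/native_function_generation.py code):

```python
import re

# Schemas that end with the standard TensorOptions argument group.
OPTIONS_PATTERN = (
    r"ScalarType\? dtype=None, Layout\? layout=None, "
    r"Device\? device=None, bool\? pin_memory=None"
)

def add_wrapped_number_flag(schema: str) -> str:
    """Append an optional `bool? is_wrapped_number=None` after the group."""
    return re.sub(
        OPTIONS_PATTERN,
        lambda m: m.group(0) + ", bool? is_wrapped_number=None",
        schema,
    )

print(add_wrapped_number_flag(
    "_efficientzerotensor(SymInt[] size, *, ScalarType? dtype=None, "
    "Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor"
))
```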

ezyang (Contributor) commented Oct 21, 2025

Is there a reason you have to update the constructor? I would have just set the property directly on the field after you made the ZeroTensor.

skpark-rh (Contributor Author) commented:
Oh, from your comments about pursuing my initial plan I just assumed you wanted me to not add code to the autogeneration but to do it through the constructor (tools/autograd/gen_variable_type.py:766-773,1919-1925). I am basically using a setter there. I assumed that modifying TensorOptions would provide a more robust way to catch any dtype issues with ZeroTensor. I am only using the setter for the mul, add, and div operations (tools/autograd/gen_variable_type.py:1919).

ezyang (Contributor) commented Oct 27, 2025

We discussed this at PTC; there is a next step.

skpark-rh (Contributor Author) commented:
@ezyang So I removed the was_wrapped_number property and used the existing is_wrapped_number property instead. I am exposing is_wrapped_number on the Python side so the dtype promotion logic can distinguish wrapped Python numbers from 0-dim tensor scalars.
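Roughly, the intent on the promotion side is something like the sketch below (the helper and its return values are illustrative, not the actual torch/_prims_common code):

```python
import torch

def classify_operand(x):
    """Illustrative ranking of operands for dtype promotion purposes."""
    if isinstance(x, torch.Tensor):
        # is_wrapped_number is the property this PR exposes; stock tensors
        # won't have it, hence the getattr default.
        if x.dim() == 0 and getattr(x, "is_wrapped_number", False):
            return "wrapped_number"    # from a Python scalar: lowest priority
        if x.dim() == 0:
            return "zero_dim_tensor"   # its dtype should beat wrapped numbers
        return "ndim_tensor"           # highest priority, as before
    return "python_scalar"
```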


if (ivalue.isTensor()) {
auto tensor = std::move(ivalue).toTensor();
bool is_wrapped_number = (tensor.unsafeGetTensorImpl()->is_wrapped_number()) ? true : false;
Contributor:
Not sure why you have to ternary here

Contributor Author:
Yeah, this was dumb of me...

TORCH_CHECK(!mut_arg, "ZeroTensors are immutable. Please use the materialized zero tensor ",
"obtained using .clone() if you want a mutable tensor.");
tensors[j] = at::zeros({}, tensor.options()).expand(tensor.sizes());
Tensor updated_tensor = tensors[j];
Contributor:
skip writing into tensors[j] and then writing into it again.

Contributor Author:
Yep, dumb of me... Will get this updated.

r.copy_(self, non_blocking);
if (self.unsafeGetTensorImpl()->is_wrapped_number()) {
r.unsafeGetTensorImpl()->set_wrapped_number(true);
}
Contributor:
These two are kind of sus, are you sure these were necessary?

Contributor Author:
So when copy_ gets invoked, is_wrapped_number does not get copied over, which I thought was weird. Without this setter, the flag gets lost.

"is_mkldnn": ["is_mkldnn: _bool"],
"is_vulkan": ["is_vulkan: _bool"],
"is_ipu": ["is_ipu: _bool"],
"is_wrapped_number": ["is_wrapped_number: _bool"],
Contributor:
It shouldn't be necessary to expose this in Python; wrapped numbers always turn into plain numbers when you get to Python. Why did you need it? Is it because of ZeroTensor wrapped number? I think ZeroTensor should turn into a plain number too.

skpark-rh (Contributor Author), Oct 29, 2025:
Yes, the reason this is needed is that ZeroTensor is a meta tensor with empty storage; there is no number to convert once we get to the Python side.

skpark-rh (Contributor Author), Oct 29, 2025:
I think if we turn ZeroTensor into a plain number then this becomes easier; I wouldn't have to expose anything on the Python side. I do wonder what would happen to precision/accuracy if a double-precision ZeroTensor gets converted to just a plain number: the resulting scalar tensor will be lower precision if that scalar tensor is a float.

Contributor Author:
Although I do think the mul_zerotensor (pytorch/aten/src/ATen/native/BinaryOps.cpp:1012-1019) will be affected. Line 1017 is the one in question. I think it'll be okay if we handle the conversion in the JIT code.

zero_dim_tensor_dtype = get_higher_dtype(
zero_dim_tensor_dtype, _dtype
)
if x.is_wrapped_number:
Contributor:
so this shouldn't be possible here, specifically

skpark-rh (Contributor Author), Oct 29, 2025:
So when I go through the stack trace and debug here specifically, the arguments are tensors and they are both meta devices. I think this is because of the forward autograd operation requiring them to be tensors all the way through. I can post the stack trace up until this point. The code uses two meta tensors to get the respective dtypes.
[Screenshot: debugger stack trace at this point, 2025-10-29]

skpark-rh (Contributor Author), Oct 29, 2025:
This is why at this point, the code needs to know if those meta devices are wrapped numbers to properly determine the right promotion.

ezyang (Contributor) left a comment:
This is promising but I hope it can be tighter



Labels

module: forward ad, oncall: jit, open source, release notes: autograd, triaged


Development

Successfully merging this pull request may close these issues.

Forward autodiff : Multiplying by python float changes the dual dtype in some situations
