[PT2] sum operator int32 compile + eager backend fails #100698
Comments
I'm actually not sure this is a good first issue, but it's all in Python and a strong engineer should be able to figure it out. The PrimTorch decomposition for sum must be adjusted to handle a mismatch between the out precision and the inferred precision. You'll need to first figure out what the eager semantics are, and then replicate them (or conclude that the eager semantics are wrong and fix up eager not to allow this program). cc @cpuhrsch
Hi, I'm new to contributing to PyTorch and I'd like to work on this issue. It seems that for integer types the output dtype is set to int64, I assume to handle potential overflow. pytorch/torch/_refs/__init__.py Lines 2178 to 2182 in 6308563
But then this results in a failure here: pytorch/torch/_refs/__init__.py Lines 2059 to 2066 in 6308563
I assume this handling is meant to address potential overflow issues. For floats, I see that similar handling is done here: pytorch/torch/_refs/__init__.py Lines 2078 to 2080 in 6308563
With the relevant mappings here: pytorch/torch/_prims_common/__init__.py Lines 1171 to 1175 in 6308563
If I have diagnosed the issue correctly, then there are a few potential solutions.
Let me know if I'm on the right track or if I'm overlooking something important; I assume there must be a good reason why float type promotion seems to be handled more gracefully than integer promotion. Thanks!
The important thing to figure out is how the eager mode test is done. Can you check that out?
OK, I'll figure that out.
It seems that for eager mode the semantics are (what's highlighted are the args to sum):
Relevant code: pytorch/aten/src/ATen/native/ReduceOps.cpp Lines 135 to 148 in 6308563
pytorch/aten/src/ATen/native/ReduceOpsUtils.h Lines 323 to 335 in 6308563
These semantics make sense to me, so it seems the TorchDynamo semantics should be updated to match.
It seems the following patch works to prevent the failure, but I will not be able to submit a PR for it until at least next week, as I will be away for a few days and still need to go through what is required for testing and other PyTorch PR requirements.

index cfc81762468..0ca454f5e97 100644
--- a/torch/_refs/__init__.py
+++ b/torch/_refs/__init__.py
@@ -2176,7 +2176,9 @@ def sum(
out: Optional[Tensor] = None,
) -> TensorLikeType:
if dtype is None:
- if utils.is_boolean_dtype(a.dtype) or utils.is_integer_dtype(a.dtype):
+ if out is not None:
+ dtype = out.dtype
+ elif utils.is_boolean_dtype(a.dtype) or utils.is_integer_dtype(a.dtype):
dtype = torch.int64
else:
        dtype = a.dtype
Fixes [pytorch#100698](pytorch#100698)

The current behaviour for dynamo is to set the dtype to torch.int64 for integral types if the dtype is not specified explicitly, which results in mismatched behaviour compared to eager mode. In eager mode the semantics are:

- If both out and dtype are specified, then they have to match
- If dtype is not specified but out is specified, then the dtype is set to match the out dtype
- If neither dtype nor out is set, then the dtype is set to kLong if the input is a bool or an integral type
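The rules above can be expressed as a small standalone sketch. The helper below is hypothetical (not part of the PyTorch codebase, and it works over dtype names rather than real torch.dtype objects); it only illustrates the resolution order the eager code enforces:

```python
# Hypothetical sketch of the eager dtype-resolution rules for sum.
INTEGRAL = {"bool", "int8", "int16", "int32", "int64"}

def resolve_sum_dtype(input_dtype, dtype=None, out_dtype=None):
    if dtype is not None:
        # Rule 1: an explicit dtype must agree with out, if out is given.
        if out_dtype is not None and dtype != out_dtype:
            raise RuntimeError(
                "dtype argument and out dtype must match in reduction"
            )
        return dtype
    if out_dtype is not None:
        # Rule 2: no explicit dtype -> follow the out tensor.
        return out_dtype
    # Rule 3: bool/integral inputs default to int64 (kLong).
    return "int64" if input_dtype in INTEGRAL else input_dtype
```

For example, `resolve_sum_dtype("int32", out_dtype="int32")` yields `"int32"`, matching eager, whereas the unpatched decomposition would have inferred int64 and tripped the mismatch check against the int32 out tensor.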
🐛 Describe the bug
The sum operator, in its int32 variant with the output initialized to an empty tensor, fails with `dtype argument and out dtype must match in reduction`.
Please use the code below to reproduce the issue:
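The original snippet was not preserved in this copy of the issue; the following is a minimal sketch of the kind of program described above (the shapes, values, and function name are illustrative):

```python
import torch

def fn(x, out):
    # dtype is left as None; only the out tensor carries the int32 type
    return torch.sum(x, dim=0, dtype=None, out=out)

x = torch.ones(4, dtype=torch.int32)
out = torch.empty((), dtype=torch.int32)

# Eager mode succeeds: the reduction dtype follows out (int32)
fn(x, out)

compiled = torch.compile(fn)
# On torch 2.0.0, calling the compiled function raised:
#   "dtype argument and out dtype must match in reduction"
# compiled(x, out)
```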
Error logs
Minified repro
Versions
Name: torch
Version: 2.0.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/jthakur/.venv_pt_op_test/lib/python3.8/site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
cc @ezyang @mruberry @ngimel @lezcano @peterbell10 @soumith @msaroufim @wconstab @bdhirsh @anijain2305