Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TensorIteratorBase::is_scalar return false for empty numpy tensors, but true for empty Torch ones #113037

Open
malfet opened this issue Nov 6, 2023 · 9 comments
Labels
module: cpu CPU specific problem (e.g., perf, algorithm) module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: regression It used to work, and now it doesn't triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@malfet
Copy link
Contributor

malfet commented Nov 6, 2023

🐛 Describe the bug

Following example causes null pointer dereference at

opmath_t b = iter.original_scalar_value<opmath_t>(2);
which was introduced by #98820

import torch
import numpy as np

args = {
    'input': torch.from_numpy(np.array(np.random.uniform(-1000_000, 1000_000, [8, 0])).astype('float16')),
    'other': torch.from_numpy(np.array(np.random.uniform(-1000_000, 1000_000, [0, 8, 0])).astype('float16')),
    'out': torch.from_numpy(np.array(np.random.uniform(-1000_000, 1000_000, [0])).astype('float32')),
    'rounding_mode': 'floor'
}
torch.div(**args)

Versions

2.1

cc @ezyang @gchanan @zou3519 @kadeng @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@malfet malfet added high priority module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: cpu CPU specific problem (e.g., perf, algorithm) module: regression It used to work, and now it doesn't labels Nov 6, 2023
@malfet malfet self-assigned this Nov 6, 2023
@malfet malfet added this to the 2.1.1 milestone Nov 6, 2023
@malfet
Copy link
Contributor Author

malfet commented Nov 6, 2023

As @tringwald already commented is is due to a slightly odd definition of TensorIterator::is_scalar which returns true for empty numpy tensors

bool TensorIteratorBase::is_scalar(int arg) const {
const auto& stride = operands_[arg].stride_bytes;
for (const auto i : c10::irange(ndim())) {
if (stride[i] != 0 && shape_[i] != 1) {
return false;
}
}
return true;
}

@malfet malfet modified the milestones: 2.1.1, 2.1.2 Nov 6, 2023
@colesbury colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 6, 2023
@mingfeima
Copy link
Collaborator

@jiayisunx could you please take a look at this one ?

As @tringwald already commented is is due to a slightly odd definition of TensorIterator::is_scalar which returns true for empty numpy tensors

bool TensorIteratorBase::is_scalar(int arg) const {
const auto& stride = operands_[arg].stride_bytes;
for (const auto i : c10::irange(ndim())) {
if (stride[i] != 0 && shape_[i] != 1) {
return false;
}
}
return true;
}

might be ndim() return 0 on numpy empty tensor.

@tringwald
Copy link
Collaborator

might be ndim() return 0 on numpy empty tensor.

ndim does not seem to be the problem. However, when initializing a tensor from an empty numpy array, all strides seem to be 0. This is not the case when directly creating an empty torch tensor.

>>> a = torch.from_numpy(np.ones((0, 0, 0)))
>>> a.stride()
(0, 0, 0)
>>> a.shape
torch.Size([0, 0, 0])
>>> a.ndim
3
>>> b = torch.ones(0, 0, 0)
>>> b.stride()
(1, 1, 1)

Due to this behavior, the stride[i] != 0 && shape_[i] != 1 check will always we false, as stride[i] == 0.

@malfet
Copy link
Contributor Author

malfet commented Nov 7, 2023

Very easy fix is to add data_ptr(arg) != null:

diff --git a/aten/src/ATen/TensorIterator.cpp b/aten/src/ATen/TensorIterator.cpp
index b42188858f6..fb7edeb5cfd 100644
--- a/aten/src/ATen/TensorIterator.cpp
+++ b/aten/src/ATen/TensorIterator.cpp
@@ -801,7 +801,7 @@ bool TensorIteratorBase::is_scalar(int arg) const {
       return false;
     }
   }
-  return true;
+  return data_ptr(arg) != nullptr;
 }
 
 bool TensorIteratorBase::is_cpu_scalar(int arg) const {

malfet added a commit that referenced this issue Nov 7, 2023
`TensorIterator::is_scalr()` should return false for empty numpy tensor
(i.e. ones created using `torch.from_numpy(np.empty([1, 2, 0]))`)

Fixes #113037 and #113156
malfet added a commit that referenced this issue Nov 8, 2023
`TensorIterator::is_scalr()` should return false for empty numpy tensor
(i.e. ones created using `torch.from_numpy(np.empty([1, 2, 0]))`)

Fixes #113037 and #113156
malfet added a commit that referenced this issue Dec 5, 2023
Targeted fix for #113037

A more fundamental one, where those functions are not even called for
empty tensors are coming later
pytorchmergebot pushed a commit that referenced this issue Dec 6, 2023
Targeted fix for #113037

A more fundamental one, where those functions are not even called for
empty tensors are coming later

Pull Request resolved: #115183
Approved by: https://github.com/drisspg, https://github.com/atalman, https://github.com/huydhn
huydhn pushed a commit that referenced this issue Dec 6, 2023
Targeted fix for #113037

A more fundamental one, where those functions are not even called for
empty tensors are coming later

Pull Request resolved: #115183
Approved by: https://github.com/drisspg, https://github.com/atalman, https://github.com/huydhn
huydhn added a commit that referenced this issue Dec 6, 2023
* Fix NULL dereference in binary CPU ops (#115183)

Targeted fix for #113037

A more fundamental one, where those functions are not even called for
empty tensors are coming later

Pull Request resolved: #115183
Approved by: https://github.com/drisspg, https://github.com/atalman, https://github.com/huydhn

* Fix build after conflict resolution

* Also include #113262 to pass the test

---------

Co-authored-by: Nikita Shulga <nshulga@meta.com>
malfet added a commit that referenced this issue Dec 9, 2023
Targeted fix for #113037

A more fundamental one, where those functions are not even called for
empty tensors are coming later

Cherry-pick of release #115183 into release/2.2 branch

(cherry picked from commit b56b002)
malfet added a commit that referenced this issue Dec 9, 2023
Targeted fix for #113037

A more fundamental one, where those functions are not even called for
empty tensors are coming later

Cherry-pick of release #115183 into release/2.2 branch

(cherry picked from commit b56b002)
@atalman
Copy link
Contributor

atalman commented Dec 14, 2023

closing this task since included in 2.1.2 and validated

@atalman atalman closed this as completed Dec 14, 2023
@malfet malfet modified the milestones: 2.1.2, 2.2.0 Dec 14, 2023
@malfet malfet reopened this Dec 14, 2023
@malfet
Copy link
Contributor Author

malfet commented Dec 14, 2023

I want to have a proper fix, when we don't even dispatch to the implementation if number of elements is 0

dmenig pushed a commit to dmenig/pytorch that referenced this issue Dec 21, 2023
Targeted fix for pytorch#113037

A more fundamental one, where those functions are not even called for
empty tensors are coming later

Pull Request resolved: pytorch#115183
Approved by: https://github.com/drisspg, https://github.com/atalman, https://github.com/huydhn
@atalman
Copy link
Contributor

atalman commented Jan 18, 2024

Validated with final rc. No failure

@atalman
Copy link
Contributor

atalman commented Feb 1, 2024

Removing from 2.2.0 milestone

@atalman atalman removed this from the 2.2.0 milestone Feb 1, 2024
@malfet malfet removed their assignment Jun 5, 2024
@malfet malfet changed the title torch.div on empty tensors causes segmentation fault TensorIteratorBase::is_scalar return false for empty numpy tensors, but true for empty Torch ones Jun 5, 2024
@malfet
Copy link
Contributor Author

malfet commented Jun 5, 2024

Unassigning myself, changing title and removing crash and high priority, as underlying crash was fixed by #115183 but still question remains what is scalar from TensorIterator point of view and why native kernels are even called ones structured kernel execution on an empty tensors has finished

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: cpu CPU specific problem (e.g., perf, algorithm) module: crash Problem manifests as a hard crash, as opposed to a RuntimeError module: regression It used to work, and now it doesn't triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants