
Mysterious Tensor Indexing Problem #22013

Open
adodge opened this issue Jun 20, 2019 · 8 comments
Labels
high priority · module: advanced indexing (Related to x[i] = y, index functions) · module: error checking (Bugs related to incorrect/lacking error checking) · module: numpy (Related to numpy support, and also numpy compatibility of our operators) · module: ux · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@adodge

adodge commented Jun 20, 2019

🐛 Bug

Indexing into a tensor with a 2-D list of indices fails when the list has fewer than 32 elements, but works once it has 32 or more.

To Reproduce

import torch

def index_test(n):
    M = torch.tensor([0.]*n) # Trivial example for illustrative purposes
    idxes = [(a,) for a in range(n)]
    return M[idxes]

index_test(31) # Note that this fails
index_test(32) # But this works

n=31 fails with:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-169-222ec1878948> in <module>
      6     return M[idxes]
      7 
----> 8 index_test(31)

<ipython-input-169-222ec1878948> in index_test(n)
      4     M = torch.tensor([0.]*n) # Trivial example for illustrative purposes
      5     idxes = [(a,) for a in range(n)]
----> 6     return M[idxes]
      7 
      8 index_test(31)

IndexError: too many indices for tensor of dimension 1

Expected behavior

I expected this indexing to work the same for any number of indices. This is problematic in my actual code as the size of the data varies.

Environment

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: Could not collect

Python version: 3.6
Is CUDA available: No
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA

Versions of relevant libraries:
[pip3] numpy==1.16.4
[pip3] pytorch-pretrained-bert==0.6.2
[pip3] torch==1.1.0
[conda] Could not collect

Additional context

cc @ezyang @gchanan @zou3519 @mruberry @rgommers @heitorschueroff

@umanwizard
Contributor

Does M[torch.tensor(idxes)] do what you want?

@adodge
Author

adodge commented Jun 20, 2019

Aha! Indeed it does. Thank you! That's an excellent workaround.

I'm tempted to say this is still a bug.

Indexing by native Python lists should either be equivalent to indexing by a tensor or be disallowed entirely. At the very least, the error message should mention that indexing by native Python lists is unsupported.
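For reference, here is the suggested workaround applied to the original repro (a sketch; the shape shown assumes standard advanced-indexing semantics):

```python
import torch

n = 31
M = torch.tensor([0.] * n)
idxes = [(a,) for a in range(n)]

# M[idxes] raises IndexError for n < 32; converting the index list
# to a tensor makes the behavior uniform for any n:
result = M[torch.tensor(idxes)]
print(result.shape)  # torch.Size([31, 1])
```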

@mrshenli mrshenli added module: advanced indexing Related to x[i] = y, index functions module: operators labels Jun 20, 2019
@umanwizard umanwizard added triage review triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Jun 21, 2019
@umanwizard
Contributor

In particular it does seem that we should have the same behavior regardless of whether the list has more or fewer than 32 elements.

But I don't know the exact PyTorch indexing rules -- will discuss with the rest of the team and update this issue accordingly.

@colesbury
Member

Lists with 31 or fewer elements are interpreted as tuples in some cases. This is to match NumPy's behavior, which in turn is for backwards compatibility reasons. NumPy recently added a warning for this case. We should probably do the same.

static bool treatSequenceAsTuple(PyObject* index) {
  if (PyTuple_Check(index)) {
    return true;
  }
  if (!PySequence_Check(index)) {
    return false;
  }
  // This uses a heuristics from NumPy for determining whether to treat
  // non-tuple sequences as if they were a tuple. From the NumPy code comments:
  //
  // "At this point, we're left with a non-tuple, non-array, sequence:
  //  typically, a list. We use some somewhat-arbitrary heuristics from here
  //  onwards to decided whether to treat that list as a single index, or a
  //  list of indices. Backwards compatibility only takes effect for short
  //  sequences - otherwise we treat it like any other scalar."
  auto n = PySequence_Size(index);
  if (n < 0) {
    // Negative size indicates a Python error in the PySequence_Size call.
    PyErr_Clear();
    return false;
  }
  if (n >= 32) {
    return false;
  }
  for (Py_ssize_t i = 0; i < n; i++) {
    auto obj = THPObjectPtr{PySequence_GetItem(index, i)};
    if (!obj.get()) {
      PyErr_Clear();
      return false;
    }
    if (THPVariable_Check(obj.get()) || PySequence_Check(obj.get()) || PySlice_Check(obj.get())) {
      return true;
    }
    if (obj.get() == Py_Ellipsis || obj.get() == Py_None) {
      return true;
    }
  }
  return false;
}
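The heuristic above is observable from Python: a plain list of ints never satisfies it (no nested sequences, slices, or tensors), while a short list of tuples does (a sketch, matching the behavior reported in this issue):

```python
import torch

M = torch.zeros(3)

# A plain list of ints contains no sequences/slices/tensors,
# so it is treated as an index array:
print(M[[0, 1]])  # tensor([0., 0.])

# A short list of tuples trips the heuristic and behaves like
# M[(0,), (1,)] -- two indices into a 1-D tensor:
try:
    M[[(0,), (1,)]]
except IndexError as e:
    print(e)
```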

NumPy warns:

FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
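The two unambiguous spellings from that warning behave very differently, which is why NumPy asks the user to pick one explicitly (a sketch):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
seq = [0, 1]

# Tuple: a multidimensional index -- selects the scalar arr[0, 1]
print(arr[tuple(seq)])      # 1

# Array: a fancy index along axis 0 -- selects rows 0 and 1
print(arr[np.array(seq)])   # [[0 1 2]
                            #  [3 4 5]]
```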

@ezyang
Contributor

ezyang commented Jun 24, 2019

cc @VitalyFedyunin

@ezyang
Contributor

ezyang commented Jun 24, 2019

We should definitely improve the error message here.

@mruberry mruberry added module: ux module: error checking Bugs related to incorrect/lacking error checking module: numpy Related to numpy support, and also numpy compatibility of our operators and removed module: operators (deprecated) labels Oct 10, 2020
@maPiche

maPiche commented Jun 27, 2022

I am surprised this bug is still silently active:

def bernoulli_i(p, size):
    return (torch.rand(size) < p).int()

c32 = bernoulli_i(0.5, 32)
print(c32)
idx2 = [x == 1 for x in c32]
print(idx2.count(True))
print(c32[idx2])

c31 = bernoulli_i(0.5, 31)
print(c31)
idx1 = [x == 1 for x in c31]
print(idx1.count(True))
print(c31[idx1])

returns an empty tensor (`tensor([], size=(0, s))`) for an index list created with size s < 32
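A boolean tensor mask sidesteps the heuristic entirely, since the index is then a single tensor rather than a Python list of 0-d tensors (a sketch of the intent above):

```python
import torch

def bernoulli_i(p, size):
    return (torch.rand(size) < p).int()

c31 = bernoulli_i(0.5, 31)
mask = c31 == 1    # boolean tensor of shape (31,), not a Python list
print(c31[mask])   # the selected ones, regardless of size < 32
```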

@v-liuwei

v-liuwei commented Dec 3, 2022

On version 1.12.1+cu116, this bug still exists:

To reproduce:

torch.tensor([1])[[torch.tensor(0)] * 31]  # raises an IndexError
torch.tensor([1])[[torch.tensor(0)] * 32]  # works fine
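As before, collapsing the Python list into a single index tensor makes both cases behave the same (a sketch):

```python
import torch

t = torch.tensor([1])
idx = [torch.tensor(0)] * 31

# torch.stack turns the list of 0-d tensors into one shape-(31,)
# index tensor, which works identically for any length:
print(t[torch.stack(idx)])  # tensor of 31 ones
```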
