
grid_sample returns corrupted tensor when grid.max() and grid.min() have a huge difference #35202

@KoykL

Description


🐛 Bug

torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None) returns a corrupted tensor when grid.max() and grid.min() have very different values on GPU.

"Corrupted" meaning: attempt to read the returned tensor, such as printing the tensor will result in: RuntimeError: CUDA error: an illegal memory access was encountered.

The issue goes away when I clamp the values of grid to [-2, 2].
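For reference, a minimal sketch of that workaround as a hypothetical helper (the clamp bound of 2 is arbitrary as long as it stays outside the valid normalized coordinate range of [-1, 1]):

import torch

def grid_sample_clamped(input_tensor, grid, **kwargs):
    # Illustrative helper, not part of PyTorch: clamp grid coordinates to
    # [-2, 2] so the CUDA kernel never sees the huge out-of-range values.
    return torch.nn.functional.grid_sample(input_tensor, grid.clamp(-2, 2), **kwargs)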

Maybe related to #24823.

To Reproduce

For example, when grid.min() == -10059144 and grid.max() == 67680944, the returned tensor shows the behavior described above. The issue disappears if I clamp grid to [-2, 2].

Example that reproduces the error, minimized as much as I could:

import torch

input_tensor = torch.rand(1, 1, 480, 640).cuda()
# Grid with extremely large, far out-of-range coordinates; shape (1, 1, 2, 2).
coords = torch.FloatTensor([[-10059144, 67680944], [67680944, 67680944]]).unsqueeze(0).unsqueeze(0).repeat(1, 1, 1, 1).cuda()
result = torch.nn.functional.grid_sample(input_tensor, coords)
print(result)
> python3 prod.py
/usr/lib/python3.8/site-packages/torch/nn/functional.py:2705: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  warnings.warn("Default grid_sample and affine_grid behavior has changed "
THCudaCheck FAIL file=/build/python-pytorch/src/pytorch-1.4.0-opt-cuda/aten/src/THC/THCReduceAll.cuh line=327 error=700 : an illegal memory access was encountered
Traceback (most recent call last):
  File "prod.py", line 8, in <module>
    print(result)
  File "/usr/lib/python3.8/site-packages/torch/tensor.py", line 159, in __repr__
    return torch._tensor_str._str(self)
  File "/usr/lib/python3.8/site-packages/torch/_tensor_str.py", line 311, in _str
    tensor_str = _tensor_str(self, indent)
  File "/usr/lib/python3.8/site-packages/torch/_tensor_str.py", line 209, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/lib/python3.8/site-packages/torch/_tensor_str.py", line 87, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /build/python-pytorch/src/pytorch-1.4.0-opt-cuda/aten/src/THC/THCReduceAll.cuh:327

Expected behavior

grid_sample should not cause an illegal memory access error regardless of the input.
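Concretely, every coordinate in the example grid lies far outside [-1, 1], and the default padding_mode='zeros' is documented to sample zero for out-of-bound locations, so the call should simply return a zero tensor of shape (1, 1, 1, 2). A sketch of the same call on CPU (assuming the CPU path handles this correctly):

import torch

input_tensor = torch.rand(1, 1, 480, 640)
coords = torch.FloatTensor([[-10059144, 67680944], [67680944, 67680944]]).unsqueeze(0).unsqueeze(0)
# With padding_mode='zeros', all far out-of-bound samples should come back as 0.
result = torch.nn.functional.grid_sample(input_tensor, coords, align_corners=False)
print(result)  # expected: tensor([[[[0., 0.]]]])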

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Arch Linux
GCC version: (Arch Linux 9.3.0-1) 9.3.0
CMake version: version 3.17.0

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti

Nvidia driver version: 440.59
cuDNN version: /usr/lib/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.2
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[conda] Could not collect

Additional context

PyTorch is installed on Arch Linux inside nvidia-container-toolkit. The host machine is Debian buster.

cc @ezyang @gchanan @zou3519 @ngimel

Labels

high priority, module: cuda (related to torch.cuda and CUDA support in general), triaged (this issue has been looked at by a team member, triaged, and prioritized into an appropriate module)