Description
🐛 Bug
torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None) returns a corrupted tensor on the GPU when grid.min() and grid.max() differ by a very large amount (values far outside the normalized [-1, 1] range).
Here, "corrupted" means that any attempt to read the returned tensor, such as printing it, fails with: RuntimeError: CUDA error: an illegal memory access was encountered.
The issue goes away when I clamp the values of grid to [-2, 2].
Maybe related to #24823.
To Reproduce
For example, when grid.min() == -10059144 and grid.max() == 67680944, the returned tensor behaves as described above, and the issue disappears if I clamp grid to [-2, 2].
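One plausible explanation, offered as an assumption rather than something verified against the 1.4.0 kernel, is integer overflow: grid_sample converts each normalized coordinate into a pixel coordinate, and if the CUDA kernel stores the resulting sample indices as 32-bit ints, the values above cannot fit. The sketch below uses the standard unnormalization formulas (for align_corners=False, the default since 1.3.0 as the warning in the log notes: ((x + 1) * size - 1) / 2). The helper names `unnormalize` and `fits_int32` are hypothetical.

```python
# Sketch: convert normalized grid_sample coordinates to pixel coordinates
# and check whether they fit in a signed 32-bit integer.
# Assumption (not verified): the CUDA kernel stores sample indices as int32.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def unnormalize(coord, size, align_corners=False):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate."""
    if align_corners:
        # -1 and 1 refer to the centers of the corner pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the corner pixels.
    return ((coord + 1) * size - 1) / 2


def fits_int32(value):
    return INT32_MIN <= value <= INT32_MAX


# x coordinates from this report; the input width is 640.
for label, coord in [
    ("grid.max()", 67680944.0),
    ("grid.min()", -10059144.0),
    ("clamped to 2", 2.0),
    ("clamped to -2", -2.0),
]:
    pixel = unnormalize(coord, 640)
    print(f"{label:>14}: pixel x = {pixel}, fits int32: {fits_int32(pixel)}")
```

With these formulas grid.max() maps to a pixel x of 21657902399.5 and grid.min() to -3218925760.5, both outside the int32 range, while the clamped values 2 and -2 map to 959.5 and -320.5, which fit easily and simply fall outside the 640-pixel-wide image.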
The following example, minimized as much as I could, reproduces the error:
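```python
import torch

input_tensor = torch.rand(1, 1, 480, 640).cuda()  # 1x1x480x640 image on the GPU
# grid of shape (N=1, H_out=1, W_out=2, 2) holding extreme normalized coordinates
coords = torch.FloatTensor([[-10059144, 67680944], [67680944, 67680944]]).unsqueeze(0).unsqueeze(0).repeat(1, 1, 1, 1).cuda()
result = torch.nn.functional.grid_sample(input_tensor, coords)

print(result)  # reading the result raises the illegal memory access
```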
```
> python3 prod.py
/usr/lib/python3.8/site-packages/torch/nn/functional.py:2705: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.
  warnings.warn("Default grid_sample and affine_grid behavior has changed "
THCudaCheck FAIL file=/build/python-pytorch/src/pytorch-1.4.0-opt-cuda/aten/src/THC/THCReduceAll.cuh line=327 error=700 : an illegal memory access was encountered
Traceback (most recent call last):
  File "prod.py", line 8, in <module>
    print(result)
  File "/usr/lib/python3.8/site-packages/torch/tensor.py", line 159, in __repr__
    return torch._tensor_str._str(self)
  File "/usr/lib/python3.8/site-packages/torch/_tensor_str.py", line 311, in _str
    tensor_str = _tensor_str(self, indent)
  File "/usr/lib/python3.8/site-packages/torch/_tensor_str.py", line 209, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/lib/python3.8/site-packages/torch/_tensor_str.py", line 87, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /build/python-pytorch/src/pytorch-1.4.0-opt-cuda/aten/src/THC/THCReduceAll.cuh:327
```
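The traceback points at print(result) and THCReduceAll.cuh rather than at grid_sample because CUDA kernels run asynchronously: an illegal memory access inside a kernel is reported by a later CUDA call, here the reduction used while formatting the tensor. When narrowing down a report like this, one option is to rerun with the environment variable CUDA_LAUNCH_BLOCKING=1; another is to synchronize right after the suspect call, as in this sketch. `sample_and_sync` is a hypothetical helper name, not a PyTorch API.

```python
import torch
import torch.nn.functional as F


def sample_and_sync(input, grid, **kwargs):
    """Hypothetical debugging helper: run grid_sample and, for CUDA tensors,
    wait for the kernel to finish so that an asynchronous error (such as an
    illegal memory access) is raised here rather than at a later,
    unrelated operation like print()."""
    out = F.grid_sample(input, grid, **kwargs)
    if out.is_cuda:
        torch.cuda.synchronize(out.device)
    return out


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    image = torch.rand(1, 1, 480, 640, device=device)
    grid = torch.zeros(1, 1, 2, 2, device=device)  # in-range coordinates
    print(sample_and_sync(image, grid, align_corners=False).shape)
```

Substituting this helper for the grid_sample call in the reproduction should make the error surface on that line instead of on print(result).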
Expected behavior
grid_sample should not cause an illegal memory access, whatever the input.
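Until grid_sample handles such inputs itself, the clamping workaround described above can be wrapped in a small helper. This is a sketch, not a fix: `grid_sample_clamped` and its `bound` parameter are hypothetical names. With the default padding_mode='zeros', points beyond ±2 normally sample only padding, so clamping them should leave the output unchanged for typical input sizes; with padding_mode='reflection', clamping can change which pixel is sampled.

```python
import torch
import torch.nn.functional as F


def grid_sample_clamped(input, grid, bound=2.0, **kwargs):
    """Hypothetical wrapper: clamp grid coordinates to [-bound, bound]
    before calling F.grid_sample, mirroring the reporter's workaround.
    Extra keyword arguments (mode, padding_mode, align_corners) are
    forwarded unchanged."""
    return F.grid_sample(input, grid.clamp(-bound, bound), **kwargs)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    input_tensor = torch.rand(1, 1, 480, 640, device=device)
    # The extreme coordinates from this report, shaped (N=1, H_out=1, W_out=2, 2).
    coords = torch.tensor(
        [[[[-10059144.0, 67680944.0], [67680944.0, 67680944.0]]]],
        device=device,
    )
    result = grid_sample_clamped(input_tensor, coords, align_corners=False)
    # Both clamped points lie outside the image, so this prints zeros.
    print(result)
```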
Environment
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Arch Linux
GCC version: (Arch Linux 9.3.0-1) 9.3.0
CMake version: version 3.17.0
Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
Nvidia driver version: 440.59
cuDNN version: /usr/lib/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] numpy==1.18.2
[pip3] torch==1.4.0
[pip3] torchvision==0.5.0
[conda] Could not collect
Additional context
PyTorch is installed on Arch Linux inside a container that uses nvidia-container-toolkit. The host machine runs Debian buster.