RuntimeError: CUDA error: an illegal memory access was encountered with channels_last #37449

Description

@tstandley

I get an illegal memory access error when trying to train mnasnet (any variant) with apex AMP (opt level O1) and the channels_last memory format.
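For reference, a minimal single-GPU sketch of the same configuration (no torch.distributed launch, no data loading); the batch size, learning rate, and random tensors below are placeholders rather than the values from the full command in the repro section:

import torch
import torchvision
from apex import amp

model = torchvision.models.mnasnet1_3().cuda()
# Convert parameters/buffers to channels_last, as --channels-last=True does in main_amp.py
model = model.to(memory_format=torch.channels_last)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
criterion = torch.nn.CrossEntropyLoss().cuda()

# Placeholder batch; the real run uses the ImageNet data loader
images = torch.randn(32, 3, 224, 224, device="cuda").contiguous(memory_format=torch.channels_last)
target = torch.randint(0, 1000, (32,), device="cuda")

output = model(images)
loss = criterion(output, target)
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
# The traceback below is raised while apex unscales the gradients as the scale_loss context exits.
optimizer.step()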

To Reproduce

Steps to reproduce the behavior:

Use the apex ImageNet example (main_amp.py):

python -m torch.distributed.launch --nproc_per_node=2 main_amp.py -a=mnasnet1_3 --b 224 --workers 4 --channels-last=True --opt-level=O1 -b=256 /intel_nvme/imagenet_data/

Traceback (most recent call last):
  File "main_amp.py", line 542, in <module>
    main()
  File "main_amp.py", line 247, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main_amp.py", line 353, in train
    scaled_loss.backward()
  File "/home/tstand/anaconda3/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
    scale_override=(grads_have_scale, stashed_have_scale, out_scale))
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 184, in unscale_with_stashed
    out_scale/stashed_have_scale)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 148, in unscale_with_stashed_python
    self.dynamic)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 22, in axpby_check_overflow_python
    cpu_sum = float(model_grad.float().sum())
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:771)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f5827507536 in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x7ae (0x7f582774afbe in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f58274f7abd in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: + 0x5236b2 (0x7f58732c06b2 in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x523756 (0x7f58732c0756 in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x19dfce (0x55748c40bfce in /home/tstand/anaconda3/bin/python)
frame #6: + 0x103948 (0x55748c371948 in /home/tstand/anaconda3/bin/python)
frame #7: + 0x114267 (0x55748c382267 in /home/tstand/anaconda3/bin/python)
frame #8: + 0x11427d (0x55748c38227d in /home/tstand/anaconda3/bin/python)
frame #9: + 0x11427d (0x55748c38227d in /home/tstand/anaconda3/bin/python)
frame #10: PyDict_SetItem + 0x502 (0x55748c3cd602 in /home/tstand/anaconda3/bin/python)
frame #11: PyDict_SetItemString + 0x4f (0x55748c3ce0cf in /home/tstand/anaconda3/bin/python)
frame #12: PyImport_Cleanup + 0x9e (0x55748c40d91e in /home/tstand/anaconda3/bin/python)
frame #13: Py_FinalizeEx + 0x67 (0x55748c483367 in /home/tstand/anaconda3/bin/python)
frame #14: + 0x227d93 (0x55748c495d93 in /home/tstand/anaconda3/bin/python)
frame #15: _Py_UnixMain + 0x3c (0x55748c4960bc in /home/tstand/anaconda3/bin/python)
frame #16: __libc_start_main + 0xf3 (0x7f5875ba81e3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #17: + 0x1d0990 (0x55748c43e990 in /home/tstand/anaconda3/bin/python)

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
Traceback (most recent call last):
  File "main_amp.py", line 542, in <module>
    main()
  File "main_amp.py", line 247, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main_amp.py", line 353, in train
    scaled_loss.backward()
  File "/home/tstand/anaconda3/lib/python3.7/contextlib.py", line 119, in __exit__
    next(self.gen)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
    scale_override=(grads_have_scale, stashed_have_scale, out_scale))
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 184, in unscale_with_stashed
    out_scale/stashed_have_scale)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 148, in unscale_with_stashed_python
    self.dynamic)
  File "/home/tstand/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 22, in axpby_check_overflow_python
    cpu_sum = float(model_grad.float().sum())
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:771)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7f911c250536 in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x7ae (0x7f911c493fbe in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f911c240abd in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: + 0x5236b2 (0x7f91680096b2 in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x523756 (0x7f9168009756 in /home/tstand/.local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x19dfce (0x5599d63f8fce in /home/tstand/anaconda3/bin/python)
frame #6: + 0x103948 (0x5599d635e948 in /home/tstand/anaconda3/bin/python)
frame #7: + 0x114267 (0x5599d636f267 in /home/tstand/anaconda3/bin/python)
frame #8: + 0x11427d (0x5599d636f27d in /home/tstand/anaconda3/bin/python)
frame #9: + 0x11427d (0x5599d636f27d in /home/tstand/anaconda3/bin/python)
frame #10: PyDict_SetItem + 0x502 (0x5599d63ba602 in /home/tstand/anaconda3/bin/python)
frame #11: PyDict_SetItemString + 0x4f (0x5599d63bb0cf in /home/tstand/anaconda3/bin/python)
frame #12: PyImport_Cleanup + 0x9e (0x5599d63fa91e in /home/tstand/anaconda3/bin/python)
frame #13: Py_FinalizeEx + 0x67 (0x5599d6470367 in /home/tstand/anaconda3/bin/python)
frame #14: + 0x227d93 (0x5599d6482d93 in /home/tstand/anaconda3/bin/python)
frame #15: _Py_UnixMain + 0x3c (0x5599d64830bc in /home/tstand/anaconda3/bin/python)
frame #16: __libc_start_main + 0xf3 (0x7f916a8f11e3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #17: + 0x1d0990 (0x5599d642b990 in /home/tstand/anaconda3/bin/python)

Environment

Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Ubuntu 19.10
GCC version: (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect (not installed outside pytorch)
GPU models and configuration:
GPU 0: TITAN RTX
GPU 1: TITAN RTX

Nvidia driver version: 440.82
cuDNN version: Could not collect (not installed outside pytorch)

Versions of relevant libraries:
[pip3] numpy==1.18.3
[pip3] torch==1.5.0
[pip3] torchvision==0.6.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_0
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py37he904b0f_0
[conda] mkl_fft 1.0.15 py37ha843d7b_0
[conda] mkl_random 1.1.0 py37hd6b4f25_0
[conda] numpy 1.18.1 py37h4f9e942_0
[conda] numpy-base 1.18.1 py37hde5b4d6_1
[conda] numpydoc 0.9.2 py_0 conda-forge
[conda] pytorch 1.5.0 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchvision 0.6.0 py37_cu102 pytorch

cc @ezyang @gchanan @zou3519 @bdhirsh @heitorschueroff @seemethere @malfet @walterddr @ngimel @csarofeen @ptrblck

Labels: high priority, module: binaries, module: cuda, module: cudnn, module: dependency bug, triaged
