forward compatibility was attempted on non supported HW #40671

Closed
drozzy opened this issue Jun 28, 2020 · 11 comments

Comments

drozzy commented Jun 28, 2020

🐛 Bug

Trying to run training with PyTorch after a recent Ubuntu update (it used to work before) results in:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1591914855613/work/aten/src/THC/THCGeneral.cpp line=47 error=804 : forward compatibility was attempted on non supported HW
Traceback (most recent call last):
  File "fc_model_sub_episodes.py", line 278, in <module>
    main()
  File "fc_model_sub_episodes.py", line 101, in main
    predict(module, module.test_dataloader(), num=3)
  File "fc_model_sub_episodes.py", line 264, in predict
    states, actions = xb['states'].cuda(), xb['actions'].cuda()
  File "/home/andriy/miniconda3/envs/my_proj/lib/python3.7/site-packages/torch/cuda/__init__.py", line 153, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (804) : forward compatibility was attempted on non supported HW at /opt/conda/conda-bld/pytorch_1591914855613/work/aten/src/THC/THCGeneral.cpp:47

I have no idea what's wrong. The only thing I can think of is that I ran some updates in Ubuntu; it was literally working yesterday.
I'm on Ubuntu 20.04 LTS.
I'm using conda, and here is my complete environment file:

name: my_proj
channels:
  - conda-forge
  - pytorch
  - defaults
dependencies:
  - ipywidgets
  - altair 
  - vega_datasets
  - python=3.7
  - numpy
  - jupyterlab
  - h5py
  - matplotlib
  - pytorch
  - torchvision
  - pydantic
  - pip
  - pytest
  - scikit-learn
  - pip:    
    - wandb
    - pytorch-lightning
    - tqdm
    - tensorflow
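
For what it's worth, CUDA initialization in PyTorch is lazy, so the failure only surfaces on the first CUDA call. A minimal sketch (hypothetical snippet, not part of the original script) that hits the same torch._C._cuda_init() path without the full training code:

# minimal_repro.py (hypothetical file name)
import torch

print(torch.__version__)          # PyTorch build from the conda env
print(torch.version.cuda)         # CUDA version PyTorch was compiled against
print(torch.cuda.is_available())  # usually False (with a warning) when the driver is broken

x = torch.zeros(1).cuda()         # first CUDA call -> "cuda runtime error (804)" is raised here
print(x)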

ppershing commented Jun 28, 2020

Same thing here. I'm running Ubuntu 20; I know it was working about a week ago. I did nothing with PyTorch, but Ubuntu probably installed updates, and now PyTorch shows the same error (just at line 50, probably a different PyTorch version).

However, this seems to be a general problem with the NVIDIA driver rather than PyTorch, as nvidia-smi returns the error Failed to initialize NVML: Driver/library version mismatch.
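
A quick way to confirm that kind of mismatch is to compare the driver version of the loaded kernel module with what the user-space tools report. A small sketch (the /proc path and the nvidia-smi call are the standard places to look; the script itself is just an illustration):

# check_driver_mismatch.py (hypothetical name)
import subprocess

# Version of the NVIDIA kernel module currently loaded
with open("/proc/driver/nvidia/version") as f:
    print(f.read())

# User-space side: reports "Failed to initialize NVML: Driver/library version mismatch"
# until the kernel module matches the freshly updated libraries again
result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
print(result.stdout or result.stderr)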

drozzy commented Jun 28, 2020

Yes, I also get Failed to initialize NVML: Driver/library version mismatch from nvidia-smi.

usamec commented Jun 28, 2020

Just reboot or reset the NVIDIA drivers.
https://stackoverflow.com/a/45319156/1391392
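
For anyone who can't reboot right away, the linked answer boils down to unloading and reloading the NVIDIA kernel modules so they match the updated user-space libraries again. A hedged sketch of that approach (module names and order are the usual ones; it assumes no process, e.g. X or another CUDA job, is still holding /dev/nvidia*):

# reload_nvidia_modules.py (hypothetical name) -- no-reboot alternative, run as root
import subprocess

# Unload dependent modules first; ignore "module not loaded" failures
for module in ["nvidia_uvm", "nvidia_drm", "nvidia_modeset", "nvidia"]:
    subprocess.run(["rmmod", module], check=False)

# nvidia-smi loads the matching kernel module again and should now succeed
subprocess.run(["nvidia-smi"], check=True)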

drozzy commented Jun 28, 2020

Rebooting fixed the problem. Thanks!

drozzy closed this as completed Jun 28, 2020

lidorshimoni commented Oct 15, 2020

Rebooting fixed the issue for me as well

cristyioan2000 commented Jan 13, 2021

Rebooting fixed the issue, thanks!

gemfield commented Feb 1, 2021

Apart from the above, I also hit this problem when building PyTorch master against the latest CUDA 11.1.
The build is fine when based on CUDA 11.0.

moncio commented Feb 6, 2021

Hi, I'm using the Docker base image nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04; it shows the same error, but when I build with CUDA 11.0 everything is fine. Does PyTorch support CUDA 11.1?

Thanks!

gemfield commented Apr 1, 2021

@moncio You can have a look at https://zhuanlan.zhihu.com/p/361545761

rajb245 commented Aug 24, 2021

I once got this when I had NVIDIA driver 460 on the machine and was running CUDA 11.3 inside a Docker container. When the host outside the container runs a driver that doesn't support the CUDA version inside the container, you can see this message if you don't have a datacenter/Tesla GPU.
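
A quick way to check for that situation from inside the container (illustrative sketch only; the nvidia-smi query flags are standard): compare the CUDA version the framework was built against with the driver version passed through from the host. Driver 460 covers CUDA up to 11.2, so a CUDA 11.3 image on top of it needs forward compatibility, which consumer GeForce cards don't get.

# check_forward_compat.py (hypothetical name) -- run inside the container
import subprocess
import torch

print("torch built for CUDA:", torch.version.cuda)

driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print("host driver version:", driver)  # e.g. 460.xx -> too old for CUDA 11.3 without forward compat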

taeyoungYoo commented Oct 20, 2021

Still stuck on this problem.
I'm trying to run just a simple hello.cu in Docker with driver version 460 and CUDA version 11.3, but I keep running into this error.
