THCudaCheck FAIL invalid device function #1351

selcukyazar · 2023-02-05T11:49:37Z

❓ Questions and Help

Hi,

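I am trying to use the https://github.com/fregu856/ebms_regression repo with its Dockerfile. The setup seems fine (the Ubuntu 16.04 Docker image runs successfully), but training stops with the `invalid device function` error shown at the end of the log below. I know this repo is very old, but is there any chance to solve it?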

Regards.

2023-01-31 14:55:41,056 maskrcnn_benchmark INFO: Saving config into: /root/ebms_regression/detection/checkpoints/nce+/config.yml
2023-01-31 15:13:17,753 maskrcnn_benchmark INFO: Using 1 GPUs
2023-01-31 15:13:17,753 maskrcnn_benchmark INFO: Namespace(config_file='configs/nce+_train.yaml', distributed=False, local_rank=0, opts=[], skip_test=False)
2023-01-31 15:13:17,753 maskrcnn_benchmark INFO: Collecting env info (might take some time)
2023-01-31 15:13:18,383 maskrcnn_benchmark INFO:
PyTorch version: 1.0.0.dev20190401
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.14.20190401-g3e12

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 512.78
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.2

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch-nightly==1.0.0.dev20190401
[pip3] torchvision-nightly==0.2.1
.......................................
Selected optimization level O0: Pure FP32 training.

Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
2023-02-05 11:30:55,867 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from /root/ebms_regression/detection/pretrained_models/e2e_faster_R-50-FPN_1x.pkl
2023-02-05 11:30:56,796 maskrcnn_benchmark.utils.c2_model_loading INFO: Remapping C2 weights
2023-02-05 11:30:56,796 maskrcnn_benchmark.utils.c2_model_loading INFO: C2 name: bbox_pred_b mapped name: bbox_pred.bias
2023-02-05 11:30:56,797 maskrcnn_benchmark.utils.c2_model_loading INFO: C2 name: bbox_pred_w mapped name: bbox_pred.weight
...........
creating index...
index created!
2023-02-05 10:49:27,779 maskrcnn_benchmark.utils.miscellaneous INFO: Saving labels mapping into /root/ebms_regression/detection/checkpoints/nce+/labels.json
2023-02-05 10:49:27,781 maskrcnn_benchmark.trainer INFO: Start training
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=8 : invalid device function
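
One possible explanation, offered as an assumption and not a confirmed diagnosis: the log reports a PyTorch build against CUDA 9.0.176 and an RTX 3060 Laptop GPU. That GPU is an Ampere part with compute capability 8.6, while a CUDA 9.0 toolkit can only compile for architectures up to 7.0. Kernels built in this image may therefore have no binary for the device, which commonly surfaces as `invalid device function`. The sketch below is a minimal check of that mismatch. `MIN_CUDA_FOR_CAPABILITY`, `parse_version`, and `toolkit_supports` are hypothetical helper names, the table is partial and should be verified against NVIDIA's release notes, and the check ignores PTX just-in-time compilation.

```python
# Partial table (assumption, verify against NVIDIA's CUDA release notes):
# minimum CUDA toolkit version able to compile for a compute capability.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series)
}


def parse_version(text):
    """Turn a version string such as '9.0.176' into (9, 0)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, capability):
    """Return True/False if the toolkit can target the capability,
    or None when the capability is not in the table above."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # Values copied from the environment report in the log.
    print(toolkit_supports("9.0.176", (8, 6)))

    # Optional live check; skipped when torch or a GPU is not present.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        print(torch.version.cuda, capability,
              toolkit_supports(torch.version.cuda, capability))
```

With the values from the log, `toolkit_supports("9.0.176", (8, 6))` returns `False`. If that is indeed the cause, an image based on CUDA 11.1 or newer with a matching PyTorch build would likely be needed, and the repo's compiled extensions might have to be rebuilt or ported for it.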
