V100 GPU -- RuntimeError: Unable to find a valid cuDNN algorithm to run convolution #18

monajalal · 2023-06-27T19:02:58Z

Could you please tell me the minimum GPU requirement (compute capability or GPU model names) for running your code?

I got this error after 72 minutes of running.

[2023-06-27 11:35:12.206] [warning] [FeatureManager.cpp:2690] _raw_matches found exsting pair (1668813085642941848, 1668813085474306823)
[loftr_wrapper.py] image0: torch.Size([28, 1, 400, 400])
Traceback (most recent call last):
  File "run_custom.py", line 203, in <module>
    run_one_video(video_dir=args.video_dir, out_folder=args.out_folder, use_segmenter=args.use_segmenter, use_gui=args.use_gui)
  File "run_custom.py", line 103, in run_one_video
    tracker.run(color, depth, K, id_str, mask=mask, occ_mask=None, pose_in_model=pose_in_model)
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 543, in run
    self.process_new_frame(frame)
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 494, in process_new_frame
    self.find_corres(pairs)
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 362, in find_corres
    corres = self.loftr.predict(rgbAs=imgs[::2], rgbBs=imgs[1::2])
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/azureuser/BundleSDF/loftr_wrapper.py", line 50, in predict
    self.matcher(tmp)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/azureuser/BundleSDF/BundleTrack/LoFTR/src/loftr/loftr.py", line 49, in forward
    feats_c, feats_f = self.backbone(torch.cat([data['image0'], data['image1']], dim=0))
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/azureuser/BundleSDF/BundleTrack/LoFTR/src/loftr/backbone/resnet_fpn.py", line 116, in forward
    x1_out = self.layer1_outconv2(x1_out+x2_out_2x)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 89, in run_nerf
    join = p_dict['join']
  File "<string>", line 2, in __getitem__
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod

(py38) root@bundlesdf:/home/azureuser/BundleSDF# nvidia-smi
Tue Jun 27 12:02:42 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000001:00:00.0 Off |                    0 |
| N/A   29C    P0    40W / 250W |   2729MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000002:00:00.0 Off |                    0 |
| N/A   27C    P0    25W / 250W |      4MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
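For anyone comparing GPUs against this report, the memory column of the table above can be read programmatically. The following is a minimal sketch, assuming the default `nvidia-smi` table layout shown here; `parse_gpu_memory` is a hypothetical helper, not part of BundleSDF. Note that this snapshot is stamped 12:02:42, after the 11:35:12 log line that precedes the traceback, so it probably does not show the memory in use when the error occurred.

```python
import re

# Matches the "used / total" memory column of the default nvidia-smi table,
# e.g. "2729MiB / 16160MiB". Power readings such as "40W / 250W" do not match.
_MEM_RE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def parse_gpu_memory(smi_text):
    """Return one (used_mib, total_mib, free_mib) tuple per GPU row found."""
    rows = []
    for used, total in _MEM_RE.findall(smi_text):
        used, total = int(used), int(total)
        rows.append((used, total, total - used))
    return rows


# The two memory rows copied from the table in this issue.
sample = """\
| N/A   29C    P0    40W / 250W |   2729MiB / 16160MiB |      0%      Default |
| N/A   27C    P0    25W / 250W |      4MiB / 16160MiB |      0%      Default |
"""

for idx, (used, total, free) in enumerate(parse_gpu_memory(sample)):
    print(f"GPU {idx}: {used} MiB used, {free} MiB free of {total} MiB")
```

To capture peak usage, the same parsing would have to run repeatedly while the job is active, for example on the output of `nvidia-smi` sampled in a loop.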


wenbowen123 · 2023-06-28T02:12:40Z

It seems your GPU memory is not sufficient (see this). It would be better to try a larger GPU, e.g. a 3090 or above. Alternatively, you can resize the images to be smaller.
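If you take the resizing route, remember that the camera intrinsics `K` passed to the tracker have to be scaled by the same factor as the images. Below is a minimal sketch of that bookkeeping, assuming a standard 3x3 pinhole matrix and a uniform scale; `resize_plan` and its `shorter_side` parameter are hypothetical names for illustration, not the repository's API, and half-pixel offset conventions are ignored.

```python
def resize_plan(width, height, K, shorter_side):
    """Compute the new image size and scaled intrinsics for a uniform resize.

    K is a 3x3 nested list [[fx, s, cx], [0, fy, cy], [0, 0, 1]].
    The shorter image side is mapped to `shorter_side` pixels.
    """
    scale = shorter_side / min(width, height)
    new_w = int(round(width * scale))
    new_h = int(round(height * scale))

    # Copy K so the caller's matrix is left untouched.
    new_K = [row[:] for row in K]
    # Focal lengths and principal point live in the first two rows and
    # scale linearly with image size; the last row stays [0, 0, 1].
    for r in range(2):
        for c in range(3):
            new_K[r][c] = K[r][c] * scale
    return new_w, new_h, new_K


# Example with made-up values: halve a 640x480 image.
K = [[600.0, 0.0, 320.0],
     [0.0, 600.0, 240.0],
     [0.0, 0.0, 1.0]]
new_w, new_h, new_K = resize_plan(640, 480, K, shorter_side=240)
print(new_w, new_h, new_K)
```

When applying the computed size to the actual frames, depth maps and masks are usually resized with nearest-neighbor interpolation so that depth values from different surfaces are not blended.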

bkyCadida · 2023-06-28T12:00:21Z

I think it would be helpful to update your README with the minimum hardware requirements, since this is a serious obstacle for many people and it leads to non-obvious errors.

monajalal · 2023-06-28T13:39:30Z

@wenbowen123
Since we are interested in your project, we would like to buy a new GPU.
However, we would prefer a 4090 over a 3090 Ti. We could not find any 4090 Ti for sale.
Could you please confirm whether your code works with a 4090 (with 24 GB VRAM)?
I found these two on Amazon and am open to buying whichever 4090 you suggest may work.

Thanks a lot for your help.

Link 1

Link 2

wenbowen123 · 2023-06-28T23:39:27Z

It has been tested on 3090 and 3090 Ti. In your case, I think either will work.

wenbowen123 mentioned this issue Jun 28, 2023

near real-time?? #17

Closed

monajalal closed this as completed Jun 29, 2023

athnzc mentioned this issue May 15, 2024

GPU requirements to run Bundle SDF #155

Open
