V100 GPU -- RuntimeError: Unable to find a valid cuDNN algorithm to run convolution #18

Closed
monajalal opened this issue Jun 27, 2023 · 4 comments


@monajalal

Could you please tell me the minimum GPU requirement (compute capability or GPU model names) for running your code?

I got this error after 72 minutes of running.

[2023-06-27 11:35:12.206] [warning] [FeatureManager.cpp:2690] _raw_matches found exsting pair (1668813085642941848, 1668813085474306823)                                                                           
[loftr_wrapper.py] image0: torch.Size([28, 1, 400, 400])                                                                                                                                                           
Traceback (most recent call last):                                                                                                                                                                                 
  File "run_custom.py", line 203, in <module>                                                                                                                                                                      
    run_one_video(video_dir=args.video_dir, out_folder=args.out_folder, use_segmenter=args.use_segmenter, use_gui=args.use_gui)                                                                                    
  File "run_custom.py", line 103, in run_one_video                                                                                                                                                                 
    tracker.run(color, depth, K, id_str, mask=mask, occ_mask=None, pose_in_model=pose_in_model)                                                                                                                    
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 543, in run                                                                                                                                                  
    self.process_new_frame(frame)
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 494, in process_new_frame
    self.find_corres(pairs)
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 362, in find_corres
    corres = self.loftr.predict(rgbAs=imgs[::2], rgbBs=imgs[1::2])
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/azureuser/BundleSDF/loftr_wrapper.py", line 50, in predict
    self.matcher(tmp)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/azureuser/BundleSDF/BundleTrack/LoFTR/src/loftr/loftr.py", line 49, in forward
    feats_c, feats_f = self.backbone(torch.cat([data['image0'], data['image1']], dim=0))
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/azureuser/BundleSDF/BundleTrack/LoFTR/src/loftr/backbone/resnet_fpn.py", line 116, in forward
    x1_out = self.layer1_outconv2(x1_out+x2_out_2x)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/azureuser/BundleSDF/bundlesdf.py", line 89, in run_nerf
    join = p_dict['join']
  File "<string>", line 2, in __getitem__
  File "/opt/conda/envs/py38/lib/python3.8/multiprocessing/managers.py", line 835, in _callmethod

(py38) root@bundlesdf:/home/azureuser/BundleSDF# nvidia-smi
Tue Jun 27 12:02:42 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000001:00:00.0 Off |                    0 |
| N/A   29C    P0    40W / 250W |   2729MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  On   | 00000002:00:00.0 Off |                    0 |
| N/A   27C    P0    25W / 250W |      4MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
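For a sense of scale, here is a rough, hedged estimate of how much memory a single activation tensor needs for the batch shape printed in the log (`torch.Size([28, 1, 400, 400])`). The traceback shows `image0` and `image1` being concatenated along the batch dimension, so the backbone sees twice that batch. The 128-channel, half-resolution feature map is an assumption made for illustration, not a value taken from the log, and the helper names are hypothetical.

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


def to_mib(num_bytes):
    """Convert bytes to MiB, the unit nvidia-smi reports."""
    return num_bytes / 2**20


# Shape from the log line: image0 is [28, 1, 400, 400].
batch, channels, height, width = 28, 1, 400, 400

# image0 and image1 are concatenated on dim 0, so the batch doubles.
input_bytes = tensor_bytes((2 * batch, channels, height, width))

# ASSUMPTION: one feature map with 128 channels at half resolution.
feature_bytes = tensor_bytes((2 * batch, 128, height // 2, width // 2))

print(f"input batch:     {to_mib(input_bytes):8.2f} MiB")
print(f"one feature map: {to_mib(feature_bytes):8.2f} MiB")
```

Under these assumptions a single feature map is already about 1 GiB, and a forward pass holds several such tensors at once, so a 16 GiB card can run short even though the input itself is small.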

@wenbowen123
Collaborator

It seems your GPU does not have enough memory; see this. It is better to try a GPU with more memory, e.g. a 3090 or above. Alternatively, you can resize the images to be smaller.
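If you go the resizing route, keep in mind that a pinhole intrinsics matrix (the `K` passed to `tracker.run` in the traceback) generally has to be scaled by the same factor as the images. Below is a minimal sketch of that step, assuming a uniform scale factor and the simple convention that ignores the half-pixel offset; `scale_intrinsics` and the example values are hypothetical and not part of this repository.

```python
def scale_intrinsics(K, scale):
    """Return a copy of a 3x3 pinhole intrinsics matrix for an image resized by `scale`.

    The first two rows (fx, skew, cx and fy, cy) are multiplied by `scale`;
    the last row [0, 0, 1] is left unchanged.
    """
    scaled = [list(row) for row in K]
    for r in (0, 1):
        for c in range(3):
            scaled[r][c] = K[r][c] * scale
    return scaled


# Hypothetical intrinsics for a 640x480 camera.
K = [
    [600.0, 0.0, 320.0],
    [0.0, 600.0, 240.0],
    [0.0, 0.0, 1.0],
]

# Halving the image size halves the focal lengths and the principal point.
K_half = scale_intrinsics(K, 0.5)
print(K_half)  # [[300.0, 0.0, 160.0], [0.0, 300.0, 120.0], [0.0, 0.0, 1.0]]
```

The color, depth, and mask images would need to be resized by the same factor; depth and masks are usually resized with nearest-neighbor interpolation so that values are not blended across object boundaries.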

@bkyCadida

I think it would be helpful to update your README with the minimum hardware requirements, since this is a serious obstacle for many people and it leads to non-obvious errors.

@monajalal
Author

@wenbowen123
Since we are interested in your project, we would like to buy a new GPU.
However, we would prefer to buy a 4090 rather than a 3090 Ti. We couldn't find any 4090 Ti for sale.
Could you please confirm whether your code works with a 4090 (with 24 GB VRAM)?
I found these two on Amazon and am open to buying whichever 4090 you think may work.

Thanks a lot for your help.

Screenshot from 2023-06-28 09-36-06
Screenshot from 2023-06-28 09-36-14

Link 1

Link 2

@wenbowen123
Collaborator

It has been tested on the 3090 and 3090 Ti. In your case, I think either will work.
