If you are running into problems with TensorFlow #173

chogovadze · 2020-10-27T22:22:12Z

Hello everyone,
It seems that several users are reporting the same kind of problems with training and prediction.
After some research, this appears to be a compatibility issue between old versions of TensorFlow 1.x and newer GPUs when TensorFlow is installed through pip. Compiling TensorFlow from source resolves the issue, but it is very time-consuming. I hope this write-up helps other users who are having trouble with their environment.

This method requires the use of conda.

1. Create a new conda environment and run: conda install tensorflow-gpu=1.12 (conda will automatically pull in the matching cuda/cudnn versions).
2. Once the installation is complete, remove tensorflow-gpu==1.12 from requirements.txt and run the makefile.
3. Change every batch_size and eval_batch_size in the config files to 1.
4. Finally, run export TF_FORCE_GPU_ALLOW_GROWTH=true followed by export TMPDIR=/tmp/ in your current terminal session.
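
For anyone who prefers to script the batch-size change and the two exports, here is a minimal Python sketch. It is not part of the repository: the helper names set_batch_sizes_to_one and configure_tf_environment are hypothetical, and it assumes the config files are plain text with keys written as "batch_size: N" and "eval_batch_size: N" at the start of a line (after indentation). If you set the variables from Python, do it before TensorFlow touches the GPU; setting them before the import is the safe choice.

```python
import os
import re

# Matches lines such as "    batch_size: 64" or "eval_batch_size: 50".
# Group 1 keeps the indentation and the key; only the integer is replaced.
# Assumption: keys are written exactly as batch_size / eval_batch_size.
_BATCH_KEY = re.compile(r"^([ \t]*(?:eval_)?batch_size[ \t]*:[ \t]*)\d+", re.MULTILINE)


def set_batch_sizes_to_one(config_text):
    """Return config_text with every batch_size / eval_batch_size set to 1."""
    return _BATCH_KEY.sub(lambda m: m.group(1) + "1", config_text)


def configure_tf_environment(env=os.environ):
    """Python equivalent of the two export commands from the last step."""
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    env["TMPDIR"] = "/tmp/"
    return env
```

To use it, read a config file into a string, pass it through set_batch_sizes_to_one, write the result back, and call configure_tf_environment() at the top of your script.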

If you are still having issues, make sure that you have NOT:

- Used an old conda environment with cuda/cudnn already configured.
- Installed cuda/cudnn separately with a command such as conda install cudnn=x.x.x=cudax.x_x.
- Run the makefile within the new conda environment before the steps above, which installs tensorflow through pip.

I have successfully worked with this repository with the following setup:

- Ubuntu 18.04
- Ryzen 3700
- RTX 2070 Super (8GB)

If you are still having some issues, please do not hesitate to reach out.

paragghosh · 2021-08-10T20:37:15Z

@chogovadze, thanks for outlining the steps here. I was having the same issues and followed the steps to fix the incompatibility between the TF and CUDA versions. After finishing them, I got an error when I tried to run superpoint (script export_detections.py):
ImportError: No module named 'superpoint'
Following thread #206, I ran make install again. It finished fine, but I am still getting the same error. Any ideas?
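
An ImportError like this often means the package was installed into a different environment than the one running the script. A quick way to check is sketched below. This is only an illustration, not something from the repository: describe_module is a hypothetical helper, and it uses only the standard library.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and where `name` would be imported from."""
    spec = importlib.util.find_spec(name)  # None if the top-level module is not found
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }
```

Calling describe_module("superpoint") with the same interpreter that runs export_detections.py shows whether that interpreter can see the package, and the "interpreter" entry shows which environment is actually in use.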

paragghosh · 2021-08-10T23:01:42Z

I realized my error: I was pointing to my earlier venv in the makefile. After I removed that, I reran make install (which reinstalled superpoint). However, I now get the following error when I try to run the export_detections.py script:
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
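
Note that libcuda.so.1 is normally installed by the NVIDIA driver rather than by the conda cuda/cudnn packages, so this error usually points at the driver setup or the library search path. The sketch below is a hypothetical helper (can_load_shared_library is not from the repository) that asks the dynamic loader directly, without involving TensorFlow.

```python
import ctypes


def can_load_shared_library(name):
    """Return (True, None) if the dynamic loader can open `name`, else (False, message)."""
    try:
        ctypes.CDLL(name)
    except OSError as exc:
        # Same class of failure as "cannot open shared object file".
        return False, str(exc)
    return True, None
```

Running can_load_shared_library("libcuda.so.1") inside the environment tells you whether the problem is independent of TensorFlow: if it returns False, the loader itself cannot find the driver library.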

David-Willo · 2023-10-20T15:22:23Z

For those having difficulty running on GPUs that are not supported by older CUDA versions (an RTX 3080 in my case), try switching to NVIDIA's TensorFlow repo: https://github.com/NVIDIA/tensorflow#install
This solved my issue.

iMeleon · 2023-10-27T08:06:34Z

> For those having difficulty running on GPUs that are not supported by older CUDA versions (an RTX 3080 in my case), try switching to NVIDIA's TensorFlow repo: https://github.com/NVIDIA/tensorflow#install This solved my issue.

Thanks. This solved my issue with loss nan, precision nan, recall 0.0000 on an RTX 3090.

20181313zhang · 2024-02-12T11:55:05Z

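> > For those having difficulty running on GPUs that are not supported by older CUDA versions (an RTX 3080 in my case), try switching to NVIDIA's TensorFlow repo: https://github.com/NVIDIA/tensorflow#install This solved my issue.
>
> Thanks. This solved my issue with loss nan, precision nan, recall 0.0000 on an RTX 3090.

Hello, I have an RTX 3080 Ti. Did your training succeed? I would like to get in touch so we can learn from each other. Thanks.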

vegetable233 · 2024-08-22T08:20:10Z

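> > > For those having difficulty running on GPUs that are not supported by older CUDA versions (an RTX 3080 in my case), try switching to NVIDIA's TensorFlow repo: https://github.com/NVIDIA/tensorflow#install This solved my issue.
> >
> > Thanks. This solved my issue with loss nan, precision nan, recall 0.0000 on an RTX 3090.
>
> Hello, I have an RTX 3080 Ti. Did your training succeed? I would like to get in touch so we can learn from each other. Thanks.

I also ran into the loss nan problem when training MagicPoint. Have you solved it? You can reach me on QQ at 972048746.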

rpautrat mentioned this issue Nov 19, 2020

version of cudnn #179

Closed

rpautrat mentioned this issue Dec 15, 2020

The Problem about Training MagicPoint on Synthetic Shapes #183

Closed

aelsaer mentioned this issue Jan 30, 2021

Training on Synthetic Shapes (loss nan, precision nan, recall 0.0000) #189

Closed

jack-turkey mentioned this issue Mar 16, 2021

Magic Point (and hence Superpoint) does not train well #194

Closed

popovata mentioned this issue Apr 17, 2021

HPatches homography estimation #202

Closed

This was referenced Aug 1, 2021

Check failed: cusolverDnCreate(&cusolver_dn_handle) == CUSOLVER_STATUS_SUCCESS Failed to create cuSolverDN instance. #225

Closed

About Synthetic shapes? #118

Closed

rpautrat mentioned this issue Oct 28, 2021

Problem in training…… #238

Closed

akshadshyam mentioned this issue Nov 19, 2021

Superpoint evaluation error #239

Closed

rpautrat mentioned this issue May 2, 2022

cuda version #250

Closed

rpautrat mentioned this issue Dec 5, 2022

About Step1 Training MagicPoint on Synthetic Shapes Problem #282

Open

This was referenced Jul 12, 2023

Error encountered while running step1 #213

Closed

Error encountered while running step1：loss nan, precision nan, recall 0.0000 #296

Closed

This was referenced Jul 20, 2023

from superpoint.settings import EXPER_PATH #297

Closed

step2 error： #299

Closed

rpautrat mentioned this issue Aug 22, 2024

Training MagicPoint on Synthetic Shape related issues #327

Closed
