Compiled successfully with torch==1.1 installed by pip #24

Closed
youmi-zym opened this issue Jul 6, 2019 · 10 comments

@youmi-zym

Actually, my torch was installed with pip, and I successfully compiled the GANet and sync_bn modules!
Here is my env:
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.3 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.10.0

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 7.5.17
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: TITAN X (Pascal)

Nvidia driver version: 430.09

@cuizelu

cuizelu commented Jul 7, 2019

When I run "sh compile.sh", I get the error: command '/usr/local/cuda-10.0/bin/bin/nvcc' failed with exit status 1. My environment variables are set as follows:
export CUDA_HOME=/usr/local/cuda-10.0/bin
export LD_LIBRARTY_PATH=/usr/local/cuda-10.0/lib64
export PATH="$PATH:$LD_LIBRARY_PATH:$CUDA_HOME"\

My env is the same as yours. What is your $PATH, and do you get the same error?

@youmi-zym
Author

@cuizelu I'm sure my $PATH is the same as yours.
But can you provide more env info with the code below:

from torch.utils.collect_env import get_pretty_env_info
print(get_pretty_env_info())

@feihuzhang
Owner

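@cuizelu Your environment variable settings are wrong. You can follow the examples in "compile.sh".
$CUDA_HOME is the folder where CUDA is installed (for example /usr/local/cuda-10.0), not its bin subfolder. That is why the build looked for nvcc at /usr/local/cuda-10.0/bin/bin/nvcc.
$PATH lists the folders that contain executables (your own or the system bin folders). It should not contain library folders or the CUDA root folder.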

export LD_LIBRARY_PATH="/home/feihu/anaconda3/lib:$LD_LIBRARY_PATH"
export LD_INCLUDE_PATH="/home/feihu/anaconda3/include:$LD_INCLUDE_PATH"
export CUDA_HOME="/usr/local/cuda-10.0"
export PATH="/home/feihu/anaconda3/bin:/usr/local/cuda-10.0/bin:$PATH"
export CPATH="/usr/local/cuda-10.0/include"
export CUDNN_INCLUDE_DIR="/usr/local/cuda-10.0/include"
export CUDNN_LIB_DIR="/usr/local/cuda-10.0/lib64"
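
For anyone who wants to sanity-check their settings before running compile.sh, here is a minimal sketch. It is not part of this repository. It assumes the build looks for nvcc at `<CUDA_HOME>/bin/nvcc`, which matches the `bin/bin/nvcc` path in the error above. The function names `expected_nvcc` and `check_cuda_env` are made up for illustration.

```python
import os


def expected_nvcc(cuda_home):
    """Where a build would look for nvcc, assuming the layout <CUDA_HOME>/bin/nvcc."""
    return os.path.join(cuda_home, "bin", "nvcc")


def check_cuda_env(env):
    """Return a list of likely problems in an environment mapping (e.g. os.environ)."""
    problems = []
    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif os.path.basename(cuda_home.rstrip("/")) == "bin":
        # CUDA_HOME should be the install root, not the bin subfolder.
        problems.append(
            "CUDA_HOME ends in 'bin'; nvcc would be looked up at "
            + expected_nvcc(cuda_home)
        )
    # PATH should list folders of executables, not library folders.
    for entry in env.get("PATH", "").split(os.pathsep):
        if os.path.basename(entry.rstrip("/")) in ("lib", "lib64"):
            problems.append("PATH contains a library folder: " + entry)
    return problems


if __name__ == "__main__":
    for problem in check_cuda_env(os.environ):
        print("warning:", problem)
```

With CUDA_HOME=/usr/local/cuda-10.0/bin this reports the doubled bin/bin/nvcc path, and with the settings listed above it reports nothing. It only inspects the strings, so it does not confirm that nvcc actually exists on disk.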

@feihuzhang
Owner

@youmi-zym Good to know.
Have you tried running the code for training and testing?
I hope that goes well too.
Compiling PyTorch from source is mainly meant to avoid library conflicts.

If the pip install works, we can recommend it to others as an easier setup, since compiling PyTorch is really time-consuming.
Thank you so much for sharing your case.

@youmi-zym
Author

youmi-zym commented Jul 8, 2019

I have tried running the code. Here is the output:

Namespace(batchSize=8, crop_height=240, crop_width=528, cuda=1, data_path='/home/youmin/data/StereoMatching/SceneFlow/', in_path=None, job_name=None, kitti=0, kitti2015=0, left_right=0, lr=0.001, max_disp=192, nEpochs=11, out_path=None, pretrained_path=None, resume='', save_path='/home/youmin/exps/GANet/clean-test', seed=123, shift=0, testBatchSize=1, threads=16, training_list='/home/youmin/data/annotations/SceneFlow/cleanpass_train.json', val_list='/home/youmin/data/annotations/SceneFlow/cleanpass_test.json')
===> Loading datasets
===> Building model
0.001
===> Epoch[1](0/4431): Loss: 127.7617, Error: (68.4038 59.5425 79.2540)
===> Epoch[1](1/4431): Loss: 90.6924, Error: (46.6843 48.0715 53.4087)
===> Epoch[1](2/4431): Loss: 85.0300, Error: (49.5175 50.6676 45.6224)
===> Epoch[1](3/4431): Loss: 67.8248, Error: (46.5083 42.8192 33.7256)
===> Epoch[1](4/4431): Loss: 67.2712, Error: (45.5687 40.1456 34.9652)
===> Epoch[1](5/4431): Loss: 66.7984, Error: (46.9146 39.7734 34.4440)
===> Epoch[1](6/4431): Loss: 55.6109, Error: (41.5324 33.2189 28.2651)

Training runs fine, and evaluating on the test dataset works too.

@lizolson

Heads up for Docker users: I had success with the pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-devel image.

@UESTCtubiao

@youmi-zym Hello, I am running the code the same way you did. What do the three average error values mean? And why does it go wrong for me on the test data?

@UESTCtubiao

What should be the final training and prediction results?

@youmi-zym
Author

@UESTCtubiao
For training:
The network produces three disparity outputs, and an average error is calculated for each of them.
For testing:
There is only one disparity output, so only one error is calculated.
As for the final training and prediction results, I think the README states them clearly.
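
To make the three numbers concrete, here is a rough sketch of how one average error per output could be computed. It is an illustration only and not the repository's training code. It assumes the error is the mean absolute difference to the ground truth, taken over pixels whose ground-truth disparity is below max_disp (192 in the run above). It uses plain lists instead of tensors, and the function names are made up.

```python
def mean_abs_error(pred, target, max_disp=192):
    """Average |pred - target| over pixels with a valid ground-truth disparity.

    Assumption: a pixel is valid when 0 <= target < max_disp.
    """
    diffs = [abs(p - t) for p, t in zip(pred, target) if 0 <= t < max_disp]
    return sum(diffs) / len(diffs) if diffs else 0.0


def training_errors(outputs, target, max_disp=192):
    """One average error per disparity output (three outputs during training)."""
    return tuple(mean_abs_error(out, target, max_disp) for out in outputs)


# Toy example: three predicted disparity maps for the same three pixels.
target = [10.0, 20.0, 500.0]          # the last pixel exceeds max_disp and is ignored
outputs = [
    [12.0, 24.0, 0.0],                # first output
    [11.0, 22.0, 0.0],                # second output
    [10.5, 20.5, 0.0],                # third output
]
errors = training_errors(outputs, target)
print("Error: (%.4f %.4f %.4f)" % errors)  # prints "Error: (3.0000 1.5000 0.5000)"
```

At test time the same helper would be called on a single output, which gives a single error value.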

@UESTCtubiao

Thank you very much. I got the final results, and I understand what you mean.
