Compiled successfully with torch==1.1 installed by pip #24

youmi-zym · 2019-07-06T06:18:42Z

Actually, my torch was installed with pip, and I successfully compiled GANet and the sync_bn module!
Here is my env:
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.3 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.10.0

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 7.5.17
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: TITAN X (Pascal)

Nvidia driver version: 430.09

cuizelu · 2019-07-07T14:17:11Z

When I run "sh compile.sh", I get the error: command '/usr/local/cuda-10.0/bin/bin/nvcc' failed with exit status 1. My $PATH is:
export CUDA_HOME=/usr/local/cuda-10.0/bin
export LD_LIBRARTY_PATH=/usr/local/cuda-10.0/lib64
export PATH="$PATH:$LD_LIBRARY_PATH:$CUDA_HOME"\

My env is the same as yours. Could you share your $PATH, and did you get the same error?

youmi-zym · 2019-07-08T06:33:45Z

@cuizelu I'm sure my $PATH is the same as yours.
But can you provide more env info with the code below:

from torch.utils.collect_env import get_pretty_env_info
print(get_pretty_env_info())

feihuzhang · 2019-07-08T13:52:33Z

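@cuizelu Your environment variable settings are wrong. You can follow the examples in "compile.sh".
$CUDA_HOME is the folder where CUDA is installed (the toolkit root, not its bin subfolder). The build looks for nvcc under $CUDA_HOME/bin, which is why your error shows '/usr/local/cuda-10.0/bin/bin/nvcc'.
$PATH lists the folders that contain executables, your own or the system's. It should not contain library folders or the CUDA root folder.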

export LD_LIBRARY_PATH="/home/feihu/anaconda3/lib:$LD_LIBRARY_PATH"
export LD_INCLUDE_PATH="/home/feihu/anaconda3/include:$LD_INCLUDE_PATH"
export CUDA_HOME="/usr/local/cuda-10.0"
export PATH="/home/feihu/anaconda3/bin:/usr/local/cuda-10.0/bin:$PATH"
export CPATH="/usr/local/cuda-10.0/include"
export CUDNN_INCLUDE_DIR="/usr/local/cuda-10.0/include"
export CUDNN_LIB_DIR="/usr/local/cuda-10.0/lib64"
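
To check settings like these before running compile.sh, something along the following lines might help. It is only a minimal sketch: the function name find_cuda_env_problems is made up for illustration and is not part of GANet or PyTorch, and it only inspects the variable strings (it does not touch the filesystem). It assumes the build looks for nvcc at $CUDA_HOME/bin/nvcc, which matches the path in the error message above.

```python
import posixpath


def find_cuda_env_problems(env):
    """Return a list of problems found in a dict of environment variables.

    Hypothetical helper for illustration; it checks strings only.
    """
    problems = []
    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif posixpath.basename(cuda_home.rstrip("/")) == "bin":
        # nvcc is looked up as $CUDA_HOME/bin/nvcc, so this gives .../bin/bin/nvcc.
        problems.append("CUDA_HOME points at a bin folder: " + cuda_home)
    else:
        # With a correct CUDA_HOME, its bin folder should be listed in PATH.
        cuda_bin = posixpath.join(cuda_home.rstrip("/"), "bin")
        path_dirs = [d.rstrip("/") for d in env.get("PATH", "").split(":") if d]
        if cuda_bin not in path_dirs:
            problems.append("PATH does not contain " + cuda_bin)
    if "LD_LIBRARTY_PATH" in env:
        # Common typo: the variable the loader reads is LD_LIBRARY_PATH.
        problems.append("misspelled variable: LD_LIBRARTY_PATH")
    return problems


if __name__ == "__main__":
    import os

    for problem in find_cuda_env_problems(dict(os.environ)):
        print(problem)
```

Run it in the same shell where you exported the variables; an empty output means none of these particular mistakes were found, not that the build is guaranteed to succeed.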

feihuzhang · 2019-07-08T14:18:00Z

@youmi-zym Good to know.
Did you also try running the code for training and testing?
I hope that goes well too.
Compiling PyTorch from source is mainly meant to avoid library conflicts.

If the pip install works, we could recommend it to others as an easier setup, since compiling PyTorch is really time-consuming.
Thank you so much for sharing your case.

youmi-zym · 2019-07-08T14:47:50Z

I have tried running the code. Here is the output:

Namespace(batchSize=8, crop_height=240, crop_width=528, cuda=1, data_path='/home/youmin/data/StereoMatching/SceneFlow/', in_path=None, job_name=None, kitti=0, kitti2015=0, left_right=0, lr=0.001, max_disp=192, nEpochs=11, out_path=None, pretrained_path=None, resume='', save_path='/home/youmin/exps/GANet/clean-test', seed=123, shift=0, testBatchSize=1, threads=16, training_list='/home/youmin/data/annotations/SceneFlow/cleanpass_train.json', val_list='/home/youmin/data/annotations/SceneFlow/cleanpass_test.json')
===> Loading datasets
===> Building model
0.001
===> Epoch[1](0/4431): Loss: 127.7617, Error: (68.4038 59.5425 79.2540)
===> Epoch[1](1/4431): Loss: 90.6924, Error: (46.6843 48.0715 53.4087)
===> Epoch[1](2/4431): Loss: 85.0300, Error: (49.5175 50.6676 45.6224)
===> Epoch[1](3/4431): Loss: 67.8248, Error: (46.5083 42.8192 33.7256)
===> Epoch[1](4/4431): Loss: 67.2712, Error: (45.5687 40.1456 34.9652)
===> Epoch[1](5/4431): Loss: 66.7984, Error: (46.9146 39.7734 34.4440)
===> Epoch[1](6/4431): Loss: 55.6109, Error: (41.5324 33.2189 28.2651)

Training goes well.
Evaluating on the testing dataset works too.

lizolson · 2019-07-19T19:16:50Z

Heads up for Docker users, I had success with pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-devel

UESTCtubiao · 2019-10-23T14:58:14Z

Hello, I am running the code the same way you did. What do the three average error values mean? And why do my results look wrong on the test data?

UESTCtubiao · 2019-10-23T14:59:52Z

What should be the final training and prediction results?

youmi-zym · 2019-10-30T14:21:46Z

@UESTCtubiao
For training:
The network produces three disparity outputs, and an average error is calculated for each of them.
For testing:
There is only one disparity output, so only one error is calculated.
As for the final training and prediction results, I think the README file states them clearly.
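
To make the three numbers concrete, here is a minimal sketch of how one error per output could be computed. It assumes the error is the mean absolute difference between predicted and ground-truth disparity, taken only over pixels whose ground truth is below max_disp (192 in the run above). Those assumptions, the function names, and the use of flat Python lists instead of tensors are mine for illustration; this is not copied from the GANet code.

```python
def mean_abs_error(pred, target, max_disp=192):
    """Mean |pred - target| over pixels whose ground truth is below max_disp."""
    diffs = [abs(p - t) for p, t in zip(pred, target) if t < max_disp]
    # Return 0.0 when no pixel has a valid ground-truth disparity.
    return sum(diffs) / len(diffs) if diffs else 0.0


def training_errors(outputs, target, max_disp=192):
    """One average error per disparity output (three outputs during training)."""
    return tuple(mean_abs_error(out, target, max_disp) for out in outputs)


if __name__ == "__main__":
    # Toy data: three "pixels"; the last one has no valid ground truth.
    target = [10.0, 20.0, 300.0]
    outputs = [
        [12.0, 24.0, 0.0],
        [11.0, 22.0, 0.0],
        [10.0, 21.0, 0.0],
    ]
    print(training_errors(outputs, target))  # (3.0, 1.5, 0.5)
```

During testing there is a single output, so the same calculation would give a single value.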

UESTCtubiao · 2019-10-30T14:30:30Z

Thank you very much. I got the final results, and I understand what you mean.

youmi-zym closed this as completed Jul 13, 2019
