Runtime error #5

Open
farshidfarhat opened this issue Oct 17, 2016 · 14 comments

Comments

@farshidfarhat

farshidfarhat commented Oct 17, 2016

Could you please let me know the issue with my demo?

error.txt
...
I1016 22:46:16.365223 24943 net.cpp:816] Ignoring source layer loss_loc
I1016 22:46:16.374922 24943 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
F1016 22:46:17.488354 24943 syncedmem.cpp:136] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***
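In a long Caffe log like this one, the line that aborts the run is the one with severity `F`. A minimal sketch for pulling those lines out of a saved log, assuming glog's default prefix (severity letter, `mmdd`, time, thread id, `file:line]`); the name `fatal_lines` is hypothetical and not part of Caffe or deepcut:

```python
import re

# Assumed glog prefix, e.g.:
# F1016 22:46:17.488354 24943 syncedmem.cpp:136] message
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\s:]+:\d+)\] (?P<msg>.*)$"
)


def fatal_lines(log_text):
    """Return (source, message) pairs for every fatal (F) glog line."""
    hits = []
    for line in log_text.splitlines():
        m = GLOG_LINE.match(line.strip())
        if m and m.group("sev") == "F":
            hits.append((m.group("src"), m.group("msg")))
    return hits
```

Calling `fatal_lines(open("error.txt").read())` on the attached log would be one way to use it; lines that do not follow the assumed prefix (such as the stack trace banner) are ignored.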

@eldar
Owner

eldar commented Oct 17, 2016

Hi, can you try changing this line https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47 to caffe.set_mode_cpu(); ? I always use a GPU, so it never occurred to me that people might not have one with enough memory, sorry!

@eldar
Owner

eldar commented Oct 18, 2016

It's actually very difficult to tell from this log what the error is. I've never seen anything like that.
So how exactly did you build Caffe? "After applying the solution from issue 1799" - what was this fix?

@farshidfarhat
Author

Here, https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/models/finetune_flickr_style/solver.prototxt#L17, I uncommented the last line to resolve the "Cannot use GPU in CPU-only Caffe" error.

I installed Caffe locally (without sudo/root access) on a Red Hat-based cluster. I changed Makefile.config as follows, based on my system configuration:
CXXFLAGS += -std=c++11
CPU_ONLY := 1
BLAS := mkl

I also commented out the following part myself, https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/src/caffe/layers/softmax_loss_vec_layer.cpp#L236-L251, in the same way as in softmax_loss_layer.cpp.

I couldn't run "make solver-callback" from your instructions, as there was no "solver-callback:" target in the Makefile!

I also made your change "caffe.set_mode_cpu();" in https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47

@eldar
Owner

eldar commented Oct 18, 2016

"make solver-callback" has to be run in the solver's directory, not in the Caffe directory.

Can you run the CNN-only demo as described here: https://github.com/eldar/deepcut-cnn/#installation-instructions
adding the use_cpu flag like so:

python ./pose_demo.py image.png --out_name=prediction

This will at least confirm that the CNN itself runs.

@farshidfarhat
Author

After debugging, I could run "python ./pose_demo.py image.png --out_name=prediction".
But "make solver-callback" gives the following log:
[ 50%] Building CXX object CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o
cc1plus: error: unrecognized command line option "-std=c++11"
make[3]: *** [CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o] Error 1
make[2]: *** [CMakeFiles/solver-callback.dir/all] Error 2
make[1]: *** [CMakeFiles/solver-callback.dir/rule] Error 2
make: *** [solver-callback] Error 2
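For context, `cc1plus` rejects `-std=c++11` when the compiler predates GCC 4.7 (releases 4.3 to 4.6 only accept `-std=c++0x`). A minimal sketch for checking which compiler the build picks up, assuming it is GCC and that `-dumpversion` prints a dotted version; the helper names are hypothetical:

```python
import re
import subprocess


def parse_gcc_version(text):
    """Extract (major, minor) from output such as '4.4.7' or '7'."""
    m = re.search(r"(\d+)(?:\.(\d+))?", text)
    if not m:
        raise ValueError("no version number in %r" % text)
    return int(m.group(1)), int(m.group(2) or 0)


def supports_std_cxx11(version):
    """GCC accepts -std=c++11 from 4.7 on; 4.3 to 4.6 only know -std=c++0x."""
    return version >= (4, 7)


def compiler_version(compiler="c++"):
    """Ask the compiler for its version; return None if it cannot be run."""
    try:
        out = subprocess.check_output([compiler, "-dumpversion"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gcc_version(out.decode())
```

If `supports_std_cxx11(compiler_version("c++"))` is false, pointing CMake at a newer compiler via `-DCMAKE_CXX_COMPILER` is the usual way out.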

@farshidfarhat
Author

farshidfarhat commented Oct 19, 2016

I used this command to solve the above error:

cmake . -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=c++ -DGUROBI_ROOT_DIR=/usr/global/gurobi/gurobi651/linux64 -DGUROBI_VERSION=65

GCC and GUROBI should be compatible in this case.
With that, the build finally succeeded on my system.

make.err.txt

@farshidfarhat
Author

farshidfarhat commented Oct 20, 2016

Segmentation fault after running the demo:

...
I1020 11:20:43.944026 15336 net.cpp:228] conv1 does not need backward computation.
I1020 11:20:43.944032 15336 net.cpp:270] This network produces output loc_pred
I1020 11:20:43.944036 15336 net.cpp:270] This network produces output next_pred
I1020 11:20:43.944042 15336 net.cpp:270] This network produces output prob
I1020 11:20:43.944288 15336 net.cpp:283] Network initialization done.
I1020 11:20:44.850095 15336 net.cpp:816] Ignoring source layer data
I1020 11:20:44.850126 15336 net.cpp:816] Ignoring source layer label_data_1_split
I1020 11:20:44.902542 15336 net.cpp:816] Ignoring source layer res4b4_up_pose
I1020 11:20:44.902570 15336 net.cpp:816] Ignoring source layer crop_res4b4
I1020 11:20:44.902576 15336 net.cpp:816] Ignoring source layer loss_part_res4b4
I1020 11:20:44.902582 15336 net.cpp:816] Ignoring source layer res4b12_up_pose
I1020 11:20:44.902587 15336 net.cpp:816] Ignoring source layer crop_res4b12
I1020 11:20:44.902593 15336 net.cpp:816] Ignoring source layer loss_part_res4b12
I1020 11:20:44.902909 15336 net.cpp:816] Ignoring source layer loss_part_res5c
I1020 11:20:44.903682 15336 net.cpp:816] Ignoring source layer loss_loc
I1020 11:20:44.912511 15336 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
/usr/global/matlab/R2015a/bin/matlab: line 1: 15216 Segmentation fault pbs_taskset matlab-bin $@

@eldar
Owner

eldar commented Oct 20, 2016

Hey, I can't see from the log what exactly the problem is, but it could be that you didn't set the Gurobi license file appropriately. The location is set in the code here: https://github.com/eldar/deepcut/blob/master/lib/pose/exp_params.m#L18, and you can modify it. You can obtain an academic license for free from the Gurobi website.

P.S. In the next couple of days we will update the repository with a completely new solver that runs fast and doesn't require any license.

@farshidfarhat
Author

Hi Eldar,

Thanks for your reply.
I followed all the instructions in README.md, including the Gurobi license setup.
I don't know whether the Matlab version matters, but there is an error when I run ./start_matlab.sh:

                                                           < M A T L A B (R) >
                                                 Copyright 1984-2015 The MathWorks, Inc.
                                                 R2015a (8.5.0.197613) 64-bit (glnxa64)
                                                            February 12, 2015

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Pose startup done

Academic License

Error using dbstop
Not enough input arguments.

@eldar
Owner

eldar commented Oct 20, 2016

Can you modify start_matlab.sh script or just start it with this command instead?

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 matlab
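The library path in the command above follows the Debian/Ubuntu layout; on a 64-bit Red Hat-style system the file usually lives under `/usr/lib64` instead. A minimal sketch that picks whichever candidate exists and builds the environment for launching Matlab; both candidate paths and the helper names are assumptions, not something start_matlab.sh provides:

```python
import os

# Assumed locations: the first is the path from the command above,
# the second is the usual 64-bit Red Hat location.
CANDIDATES = [
    "/usr/lib/x86_64-linux-gnu/libstdc++.so.6",
    "/usr/lib64/libstdc++.so.6",
]


def find_libstdcxx(candidates=CANDIDATES):
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def matlab_env(lib_path, base_env=None):
    """Copy the environment and set LD_PRELOAD to the given library."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = lib_path
    return env
```

With a library found, `subprocess.call(["matlab"], env=matlab_env(find_libstdcxx()))` would be the equivalent of the one-line command.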

@farshidfarhat
Author

farshidfarhat commented Oct 21, 2016

Yes. I ran "dbstop if error" later inside Matlab, and the error is as follows:

...
I1021 11:12:10.756536 2446 net.cpp:270] This network produces output next_pred
I1021 11:12:10.756551 2446 net.cpp:270] This network produces output prob
I1021 11:12:10.757047 2446 net.cpp:283] Network initialization done.
Unexpected Standard exception from MEX file.
What() is:basic_string::append
..

Error in caffe.Net/copy_from (line 123)
caffe_('net_copy_from', self.hNet_self, weights_file);

Error in caffe.get_net (line 34)
net.copy_from(weights_file);

Error in caffe.Net (line 31)
self = caffe.get_net(varargin{:});

Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');

Error in demo_multiperson (line 9)
cnn_cache_features( experiment_index, 'test', image_index, 1);

123 caffe_('net_copy_from', self.hNet_self, weights_file);

@eldar
Owner

eldar commented Oct 21, 2016

Can you stop the debugger on this line:

Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');

and check whether net_def_file points to an existing model definition file (somewhere in /models) and net_bin_file points to the correct Caffe binary weights file (something.caffemodel)?
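The same check can also be scripted outside the debugger. A minimal sketch, assuming the two paths are copied from the Matlab session; `check_model_files` is a hypothetical helper and only looks at the filesystem, not at the file contents:

```python
import os


def check_model_files(net_def_file, net_bin_file):
    """Return a list of problems; an empty list means both files look usable."""
    problems = []
    for label, path in (("net_def_file", net_def_file),
                        ("net_bin_file", net_bin_file)):
        if not os.path.isfile(path):
            problems.append("%s: no such file: %s" % (label, path))
        elif os.path.getsize(path) == 0:
            problems.append("%s: file is empty: %s" % (label, path))
        elif not os.access(path, os.R_OK):
            problems.append("%s: not readable: %s" % (label, path))
    return problems
```

An empty result only rules out missing, empty, or unreadable files; a truncated or corrupted download would still pass.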

@farshidfarhat
Author

farshidfarhat commented Oct 24, 2016

It seems fine! Could it be related to copying a huge model file?

...

Cleared 0 solvers and 0 stand-alone nets
52 net = caffe.Net(net_def_file, net_bin_file, 'test');

K>> net_def_file
net_def_file =
/gpfs/work/f/fuf111/deepcut/models/ResNet-101-FCN_out_14_sigmoid_locreg_allpairs_test.prototxt

K>> net_bin_file
net_bin_file =
/gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel

@eldar
Owner

eldar commented Oct 25, 2016

Sorry, it's quite difficult to say what's wrong without a proper error log. The model definitely fits on a 12 GB GPU. Maybe the file was corrupted during download? Here's the hash for mine:

deepercut-models$ md5sum ResNet-101-mpii-multiperson.caffemodel
a1aa7fb45c4f1a0e90087d6ddac24cf1  ResNet-101-mpii-multiperson.caffemodel
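Where `md5sum` is not available, the comparison can be done with a few lines of Python. A minimal sketch that hashes the file in chunks so the large model does not have to fit in memory; the function names are hypothetical, and the expected value is the hash quoted above:

```python
import hashlib

EXPECTED_MD5 = "a1aa7fb45c4f1a0e90087d6ddac24cf1"  # hash quoted above


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 equals the expected digest (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Running `matches_expected("ResNet-101-mpii-multiperson.caffemodel")` in the model directory returns False if the download differs from the copy hashed above.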
