Runtime error #5

farshidfarhat · 2016-10-17T02:51:46Z

Could you please let me know the issue with my demo?

error.txt
...
I1016 22:46:16.365223 24943 net.cpp:816] Ignoring source layer loss_loc
I1016 22:46:16.374922 24943 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
F1016 22:46:17.488354 24943 syncedmem.cpp:136] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***

eldar · 2016-10-17T22:47:04Z

Hi, can you try changing this line https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47 to caffe.set_mode_cpu(); ? I always use a GPU, and it never occurred to me that people might not have GPUs with large enough memory, sorry!

eldar · 2016-10-18T13:34:37Z

It's actually very difficult to tell from this log what the error is. I've never seen anything like that.
So how exactly did you build caffe? "After applying the solution from issue 1799" - what was this fix?

farshidfarhat · 2016-10-18T14:11:32Z

here https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/models/finetune_flickr_style/solver.prototxt#L17, I uncommented the last line to solve the issue about "Cannot use GPU in CPU-only Caffe".

Actually I installed Caffe locally (without sudo/root access) on a Red Hat-based cluster. I changed Makefile.config as follows, based on my system config:
CXXFLAGS += -std=c++11
CPU_ONLY := 1
BLAS := mkl
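
A quick way to confirm that a given Makefile.config really builds CPU-only Caffe (which is what triggers "Cannot use GPU in CPU-only Caffe" as soon as GPU mode is requested) is to look for an uncommented CPU_ONLY line. The following is only a minimal sketch: the helper name `is_cpu_only` is hypothetical, and it assumes the setting is written on a single line as in the snippet above.

```python
import re


def is_cpu_only(makefile_config_text):
    """Return True if the text contains an uncommented `CPU_ONLY := 1` line.

    Hypothetical helper for illustration; it is not part of Caffe or deepcut.
    """
    pattern = re.compile(r"^\s*CPU_ONLY\s*:?=\s*1\s*(#.*)?$")
    return any(pattern.match(line) for line in makefile_config_text.splitlines())


# The three settings quoted in the comment above.
sample = """CXXFLAGS += -std=c++11
CPU_ONLY := 1
BLAS := mkl
"""

if is_cpu_only(sample):
    print("CPU-only build: select CPU mode (caffe.set_mode_cpu) before creating the net")
```

If the helper returns True, any code path that still selects GPU mode will abort at runtime, so every mode-setting call has to be switched to CPU.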

I commented out the following part myself, https://github.com/eldar/deepcut-cnn/blob/9b5de9cb70a0a440311178f26fbd6984d81e5c54/src/caffe/layers/softmax_loss_vec_layer.cpp#L236-L251, similar to what is done in softmax_loss_layer.cpp.

I couldn't run "make solver-callback" from your instructions, as there was no "solver-callback:" target in the Makefile!

I also made your change "caffe.set_mode_cpu();" in https://github.com/eldar/deepcut/blob/master/lib/pose/cnn_cache_features.m#L47

eldar · 2016-10-18T14:28:11Z

"make solver-callback" has to be executed in the directory of the solver, not in the Caffe directory.

Can you run the CNN-only demo as described here: https://github.com/eldar/deepcut-cnn/#installation-instructions
adding the use_cpu flag like so:

python ./pose_demo.py image.png --out_name=prediction

This will ensure that you got the CNN running, at the very least.

farshidfarhat · 2016-10-18T22:42:14Z

After debugging, I could run "python ./pose_demo.py image.png --out_name=prediction".
But "make solver-callback" gives the following log:
[ 50%] Building CXX object CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o
cc1plus: error: unrecognized command line option "-std=c++11"
make[3]: *** [CMakeFiles/solver-callback.dir/src/pose/research/solver-callback.cxx.o] Error 1
make[2]: *** [CMakeFiles/solver-callback.dir/all] Error 2
make[1]: *** [CMakeFiles/solver-callback.dir/rule] Error 2
make: *** [solver-callback] Error 2
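
The "unrecognized command line option" message usually means the compiler picked up by CMake predates the -std=c++11 spelling: GCC accepts -std=c++11 from version 4.7 on, while 4.3 to 4.6 only know -std=c++0x. The sketch below checks a version string such as the one printed by `gcc -dumpversion`. The function name is hypothetical, and the version numbers in the usage lines are illustrative only, since the thread does not say which GCC the cluster provides.

```python
import re


def accepts_std_cxx11(gcc_version):
    """Return True if a GCC with this version string recognizes -std=c++11.

    Hypothetical helper: GCC added the -std=c++11 spelling in 4.7.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", gcc_version.strip())
    if match is None:
        raise ValueError("not a version string: %r" % gcc_version)
    major = int(match.group(1))
    minor = int(match.group(2) or 0)  # newer GCCs may print only the major number
    return (major, minor) >= (4, 7)


# Illustrative version strings, not taken from the thread.
for version in ("4.4.7", "4.8.5"):
    print(version, accepts_std_cxx11(version))
```

When the default compiler is too old, pointing CMake at a newer one through -DCMAKE_C_COMPILER and -DCMAKE_CXX_COMPILER is the usual way out.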

farshidfarhat · 2016-10-19T23:03:57Z

I used this command to solve the above error:

cmake . -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=c++ -DGUROBI_ROOT_DIR=/usr/global/gurobi/gurobi651/linux64 -DGUROBI_VERSION=65

GCC and Gurobi need to be compatible in this case.
Finally I got it to build on my system.

make.err.txt

farshidfarhat · 2016-10-20T15:22:59Z

Segmentation fault after running the demo:

...
I1020 11:20:43.944026 15336 net.cpp:228] conv1 does not need backward computation.
I1020 11:20:43.944032 15336 net.cpp:270] This network produces output loc_pred
I1020 11:20:43.944036 15336 net.cpp:270] This network produces output next_pred
I1020 11:20:43.944042 15336 net.cpp:270] This network produces output prob
I1020 11:20:43.944288 15336 net.cpp:283] Network initialization done.
I1020 11:20:44.850095 15336 net.cpp:816] Ignoring source layer data
I1020 11:20:44.850126 15336 net.cpp:816] Ignoring source layer label_data_1_split
I1020 11:20:44.902542 15336 net.cpp:816] Ignoring source layer res4b4_up_pose
I1020 11:20:44.902570 15336 net.cpp:816] Ignoring source layer crop_res4b4
I1020 11:20:44.902576 15336 net.cpp:816] Ignoring source layer loss_part_res4b4
I1020 11:20:44.902582 15336 net.cpp:816] Ignoring source layer res4b12_up_pose
I1020 11:20:44.902587 15336 net.cpp:816] Ignoring source layer crop_res4b12
I1020 11:20:44.902593 15336 net.cpp:816] Ignoring source layer loss_part_res4b12
I1020 11:20:44.902909 15336 net.cpp:816] Ignoring source layer loss_part_res5c
I1020 11:20:44.903682 15336 net.cpp:816] Ignoring source layer loss_loc
I1020 11:20:44.912511 15336 net.cpp:816] Ignoring source layer loss_next
save dir /gpfs/work/f/fuf111/deepcut/data/mpii-multiperson/scoremaps/test
testing from net file /gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel
deepcut: test (MPII multiperson test) 2/1758
/usr/global/matlab/R2015a/bin/matlab: line 1: 15216 Segmentation fault pbs_taskset matlab-bin $@

eldar · 2016-10-20T15:34:39Z

Hey, I can't see from the log what exactly the problem is, but it could be that you didn't set the Gurobi license file appropriately. The location is set in the code here: https://github.com/eldar/deepcut/blob/master/lib/pose/exp_params.m#L18, and you can modify it. You can obtain an academic license for free from the Gurobi website.

P.S. In the next couple of days we will update the repository with a completely new solver that runs fast and doesn't require any license.

farshidfarhat · 2016-10-20T15:40:21Z

Hi Eldar,

Thanks for your reply.
Actually I followed all the instructions you posted in README.md, including the Gurobi license.
I don't know whether the Matlab version matters, but there is an error when I run ./start_matlab.sh:

                                                           < M A T L A B (R) >
                                                 Copyright 1984-2015 The MathWorks, Inc.
                                                 R2015a (8.5.0.197613) 64-bit (glnxa64)
                                                            February 12, 2015

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Pose startup done

Academic License

Error using dbstop
Not enough input arguments.

eldar · 2016-10-20T16:06:07Z

Can you modify the start_matlab.sh script, or just start Matlab with this command instead?

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6 matlab

farshidfarhat · 2016-10-21T15:14:09Z

Yes. I ran "dbstop if error" later inside Matlab, and the error is as follows:

...
I1021 11:12:10.756536 2446 net.cpp:270] This network produces output next_pred
I1021 11:12:10.756551 2446 net.cpp:270] This network produces output prob
I1021 11:12:10.757047 2446 net.cpp:283] Network initialization done.
Unexpected Standard exception from MEX file.
What() is:basic_string::append
..

Error in caffe.Net/copy_from (line 123)
caffe_('net_copy_from', self.hNet_self, weights_file);

Error in caffe.get_net (line 34)
net.copy_from(weights_file);

Error in caffe.Net (line 31)
self = caffe.get_net(varargin{:});

Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');

Error in demo_multiperson (line 9)
cnn_cache_features( experiment_index, 'test', image_index, 1);

123 caffe_('net_copy_from', self.hNet_self, weights_file);

eldar · 2016-10-21T15:26:00Z

Can you stop the debugger on this line:

Error in cnn_cache_features (line 52)
net = caffe.Net(net_def_file, net_bin_file, 'test');

and check whether net_def_file points to an existing model definition file (somewhere in /models) and net_bin_file points to the correct Caffe binary weights file (something.caffemodel)?
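
The same check can also be scripted outside the Matlab debugger. This is a minimal sketch with a hypothetical helper name; it only verifies that both paths are existing, non-empty files and says nothing about whether their contents are valid.

```python
import os


def check_model_files(net_def_file, net_bin_file):
    """Return a list of problems; an empty list means both files exist and are non-empty.

    Hypothetical helper for illustration; it does not parse the files.
    """
    problems = []
    for label, path in (("net_def_file", net_def_file), ("net_bin_file", net_bin_file)):
        if not os.path.isfile(path):
            problems.append("%s: not a file: %s" % (label, path))
        elif os.path.getsize(path) == 0:
            problems.append("%s: file is empty: %s" % (label, path))
    return problems
```

Calling it with the two values printed by the debugger and getting an empty list back rules out a wrong or missing path, which leaves a damaged weights file or a build problem as candidates.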

farshidfarhat · 2016-10-24T03:59:12Z

It seems fine! Could it be related to copying a huge model file?

...

Cleared 0 solvers and 0 stand-alone nets
52 net = caffe.Net(net_def_file, net_bin_file, 'test');

K>> net_def_file
net_def_file =
/gpfs/work/f/fuf111/deepcut/models/ResNet-101-FCN_out_14_sigmoid_locreg_allpairs_test.prototxt

K>> net_bin_file
net_bin_file =
/gpfs/work/f/fuf111/deepcut/data/caffe-models/ResNet-101-mpii-multiperson.caffemodel

eldar · 2016-10-25T15:27:35Z

Sorry, it's quite difficult to say what's wrong without a proper error log. The model definitely fits on a 12 GB GPU. Maybe the file was corrupted during download? Here's the hash for mine:

deepercut-models$ md5sum ResNet-101-mpii-multiperson.caffemodel
a1aa7fb45c4f1a0e90087d6ddac24cf1  ResNet-101-mpii-multiperson.caffemodel
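
Where md5sum is not at hand, the comparison can be done with Python's hashlib. This is a minimal sketch: the function names are hypothetical, and the file is read in chunks so that a large model file does not have to fit in memory.

```python
import hashlib

# MD5 quoted above for ResNet-101-mpii-multiperson.caffemodel.
EXPECTED_MD5 = "a1aa7fb45c4f1a0e90087d6ddac24cf1"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5_of_file(path) == expected.lower()
```

A mismatch for the downloaded .caffemodel would point to a corrupted or incomplete download, in which case fetching the file again is the first thing to try.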

minhtriet mentioned this issue Oct 17, 2016

Failed to build Cafee #4

Closed
