
train_val prototxt #6

Closed
ShervinAr opened this issue May 22, 2017 · 11 comments

@ShervinAr

ShervinAr commented May 22, 2017

Hello, could you please provide the train_val.prototxt file?
I would like to do some fine-tuning, but it seems the number of blobs in the batch norm layers differs between the training and deploy models.

@shicai
Owner

shicai commented May 23, 2017

Except for the data layers and the loss/accuracy layers, the main body of the training and deploy prototxt files should be the same. Please check your training prototxt files.
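As an illustration, a minimal sketch of what such a shared layer body could look like (the parameter values here are illustrative assumptions, not taken from the repository):

```protobuf
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # In BVLC Caffe the three internal blobs (mean, variance,
  # moving-average factor) are not configured in the prototxt,
  # so this definition can be identical in train_val and deploy.
  batch_norm_param {
    use_global_stats: true  # typically false during training
  }
}
```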

@ShervinAr
Author

Thanks for your reply. Is it possible for you to share your train_val.prototxt file?

@shicai
Owner

shicai commented May 23, 2017

They are actually the same.
So if you have any trouble, would you please provide your error information?

@ShervinAr
Author

ShervinAr commented May 23, 2017

There seems to be an inconsistency between the batch normalization layer definitions in mobilenet_deploy.prototxt and those in the provided Caffe model. For example, for the conv1/bn layer, I get the following error message:

ERROR: Check failed: target_blobs.size() == source_layer.blobs_size() (5 vs. 3) Incompatible number of blobs for layer conv1/bn

I could make the error go away by renaming ALL the batch normalization layers, but I would like to use exactly the same model you have provided.

@shicai
Owner

shicai commented May 23, 2017

Do you use the official Caffe? Its batch_norm_layer has only 3 blobs (the running mean, the running variance, and the moving-average factor).
Please take a look at: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/batch_norm_layer.cpp#L25
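For reference, here is a rough Python paraphrase of how BVLC Caffe uses those 3 blobs at inference time (use_global_stats mode); this is a sketch of the layer's logic, not the actual implementation:

```python
import math

def bvlc_batchnorm_forward(x, mean_blob, var_blob, factor_blob, eps=1e-5):
    """Normalize per-channel inputs x using the 3 stored blobs.

    blobs_[0] = accumulated mean, blobs_[1] = accumulated variance,
    blobs_[2] = moving-average factor used to rescale the first two.
    x is a list of channels, each a list of values.
    """
    scale = 0.0 if factor_blob == 0 else 1.0 / factor_blob
    mean = [m * scale for m in mean_blob]
    var = [v * scale for v in var_blob]
    return [[(xi - mean[c]) / math.sqrt(var[c] + eps) for xi in chan]
            for c, chan in enumerate(x)]
```

In BVLC models the learned scale (gamma) and bias (beta) live in a separate Scale layer, which is why BatchNorm itself stores only 3 blobs, whereas nvCaffe can fold gamma and beta into two extra blobs of the same layer.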

@ShervinAr
Author

ShervinAr commented May 23, 2017

I am using nvCaffe and the NVIDIA DIGITS environment for fine-tuning.
The corresponding batch norm layer has five blobs:

this->blobs_[0].reset(new Blob<Dtype>(sz));  // scale
this->blobs_[1].reset(new Blob<Dtype>(sz));  // bias
this->blobs_[2].reset(new Blob<Dtype>(sz));  // mean
this->blobs_[3].reset(new Blob<Dtype>(sz));  // variance
...
this->blobs_[4].reset(new Blob<Dtype>(sz));

Do you have any suggestions on how to work around this problem?

@shicai
Owner

shicai commented May 23, 2017

please take a look at https://github.com/NVIDIA/caffe/blob/caffe-0.16/src/caffe/layers/batch_norm_layer.cpp#L25

  scale_bias_ = false;
  scale_bias_ = param.scale_bias(); // by default = false;
  if (param.has_scale_filler() || param.has_bias_filler()) { // implicit set
    scale_bias_ = true;
  }

  if (this->blobs_.size() > 0) {
    LOG(INFO) << "Skipping parameter initialization";
  } else {
    if (scale_bias_)
      this->blobs_.resize(5);
    else
      this->blobs_.resize(3);

Please confirm that you set scale_bias to false and have no scale_filler or bias_filler in your batch norm layers.
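That decision logic from the nvCaffe snippet above can be paraphrased in Python (a sketch for clarity, not an actual API):

```python
def nvcaffe_batchnorm_blob_count(scale_bias=False,
                                 has_scale_filler=False,
                                 has_bias_filler=False):
    # In nvCaffe, a scale_filler or bias_filler implicitly turns
    # scale_bias on, so the layer then stores 5 blobs (mean, variance,
    # factor, scale, bias) instead of the 3 that BVLC Caffe uses.
    if has_scale_filler or has_bias_filler:
        scale_bias = True
    return 5 if scale_bias else 3
```

So an nvCaffe net whose prototxt implies scale_bias expects 5 blobs per batch norm layer and cannot load weights that store only 3, which matches the "5 vs. 3" error above.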

@ShervinAr
Author

Many thanks for your reply. As I am very new to Caffe, could you please tell me exactly how to do that?

@shicai
Owner

shicai commented May 23, 2017

Please update your Caffe to the newest version from https://github.com/NVIDIA/caffe/
and the default parameter settings should work for you.

@ShervinAr
Author

ShervinAr commented May 23, 2017

Thanks for your reply. I have upgraded Caffe but still face the same problem. Any help on how to resolve this would be greatly appreciated. The configuration of the required libraries is as follows:
-- ******************* Caffe Configuration Summary *******************
-- General:
-- Version : 0.16.1
-- Git : v0.16.1-6-g5a06f0e
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -std=c++11 -Wno-sign-compare -Wno-uninitialized
-- Debug CXX flags : -g -DDEBUG -fPIC -Wall -std=c++11 -Wno-sign-compare -Wno-uninitialized
-- Build type : Release

-- BUILD_SHARED_LIBS : ON
-- BUILD_python : ON
-- BUILD_matlab : OFF
-- BUILD_docs : ON
-- CPU_ONLY : OFF
-- USE_OPENCV : ON
-- USE_LEVELDB : ON
-- USE_LMDB : ON
-- ALLOW_LMDB_NOLOCK : OFF
-- TEST_FP16 : OFF

-- Dependencies:
-- BLAS : Yes (Atlas)
-- Boost : Yes (ver. 1.54)
-- glog : Yes
-- gflags : Yes
-- protobuf : Yes (ver. 2.5.0)
-- lmdb : Yes (ver. 0.9.16)
-- LevelDB : Yes (ver. 1.15)
-- Snappy : Yes (ver. 1.1.0)
-- OpenCV : Yes (ver. 2.4.8)
-- CUDA : Yes (ver. 8.0)

-- NVIDIA CUDA:
-- Target GPU(s) : Auto
-- GPU arch(s) : sm_52
-- cuDNN : Yes (ver. 6.0)
-- NCCL : Not found
-- NVML : /usr/lib/nvidia-361/libnvidia-ml.so

-- Python:
-- Interpreter : /usr/bin/python2.7 (ver. 2.7.6)
-- Libraries : /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.6)
-- NumPy : /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.8.2)

-- Documentaion:
-- Doxygen : No
-- config_file :

@shicai
Owner

shicai commented May 24, 2017

Would you please share a link to your train_val.prototxt with me?
I need to take a good look at it.
