On weights of the loss layers #1

Open
happynear opened this issue Oct 25, 2014 · 2 comments

Comments

@happynear

I noticed that in this project the weights of the loss layers are realized by setting the blob_lr parameter of the innerproduct layer. This is equivalent to formulation (3) only as far as training the innerproduct layer itself is concerned. However, the gradients backpropagated to the bottom conv layer are not influenced by this weight (0.001 in the prototxt file).
In other words, this implementation only makes the innerproduct layers of the SVMs attached to the earlier layers learn more slowly, while applying the classifiers' gradients to the net with equal strength, so all SVMs effectively carry the same weight.
Caffe now provides a parameter called "loss_weight", which, as far as I can see, is the correct way to realize the model described in the paper.
This is just my opinion. If I am wrong, please correct me.
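For illustration, here is a rough sketch of the two configurations in Caffe's prototxt format; the layer names, bottoms, and numbers are placeholders of mine, not copied from this repo's files:

```protobuf
# (a) Current approach (as I understand it): shrink blobs_lr on the side classifier.
# This only slows the update of fc_side's own weights; the gradient passed back
# to conv3 is NOT scaled by 0.001.
layers {
  name: "fc_side"
  type: INNER_PRODUCT
  bottom: "conv3"
  top: "fc_side"
  blobs_lr: 0.001          # lr multiplier for the weights
  blobs_lr: 0.002          # lr multiplier for the bias
  inner_product_param { num_output: 10 }
}
layers {
  name: "loss_side"
  type: HINGE_LOSS         # SVM-style companion objective
  bottom: "fc_side"
  bottom: "label"
  top: "loss_side"
}

# (b) Suggested alternative: keep the normal lr on fc_side and set loss_weight on
# the loss layer, which scales the reported loss and every gradient leaving this
# branch, including the one that reaches conv3.
layers {
  name: "loss_side"
  type: HINGE_LOSS
  bottom: "fc_side"
  bottom: "label"
  top: "loss_side"
  loss_weight: 0.001
}
```
The branch weight in (b) matches the 0.001 in (a), but now it also down-weights the gradient that flows back into the shared conv layers.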

@s9xie
Owner

s9xie commented Nov 2, 2014

Yes, that is exactly the case. Back when I ran the experiments, Caffe did not support per-layer loss weights; you can find several lengthy discussions about this in Caffe's repo. We should eventually switch to the per-layer loss weight instead of the per-layer lr, but please note that you can always achieve a given configuration by tuning the per-layer lr, so the correctness of the experiments is not affected here.

@happynear
Author

@s9xie, I have run some experiments, and the performance of the two strategies is indeed quite close to the numbers you reported. However, when only the lr is tuned, the SVMs attached to the earlier layers still backpropagate their gradients with full weight 1, while their own weights are learned much more slowly with lr 0.001. The lr and the loss weight are quite different things, so I am rather surprised that the results come out so close. Nonetheless, the idea of attaching SVMs to the conv layers is really impressive and works well.
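To spell out why the two knobs differ, here is a small sketch (my notation, not from the paper) for a linear companion classifier z = W x on a conv feature x with companion loss ℓ:

```latex
% The gradient sent back to the conv feature does not involve the classifier's lr:
\frac{\partial \ell}{\partial x} = W^{\top}\frac{\partial \ell}{\partial z},
\qquad
\Delta W = -\eta_W \frac{\partial \ell}{\partial z}\, x^{\top}.
% blobs_lr = 0.001 only shrinks \eta_W, so the SVM's own weights learn slowly
% while the signal sent back to the conv layers keeps weight 1.
% loss_weight = w replaces \ell by w\ell, so both terms are scaled by w:
\frac{\partial (w\ell)}{\partial x} = w\, W^{\top}\frac{\partial \ell}{\partial z},
\qquad
\Delta W = -\eta_W\, w\, \frac{\partial \ell}{\partial z}\, x^{\top}.
```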
