
training problem #47

Open
peternara opened this issue Jul 24, 2017 · 5 comments

peternara commented Jul 24, 2017

I am trying to train your base code, but I am seeing a strange phenomenon in the process: during training, the values of ap and an keep growing. Do you know what the problem is?
Here is an example.

Initial step (iteration 0):

I0724 11:03:43.994547 9413 solver.cpp:338] Iteration 0, Testing net (#0)
loss: diff: -0.000300745 ap:0.0187428 an:0.0184421
I0724 11:03:44.687252 9413 solver.cpp:406] Test net output #0: loss = 0.0998072 (* 1 = 0.0998072 loss)
loss: diff: -0.0166375 ap:0.204302 an:0.187664
I0724 11:03:45.168706 9413 solver.cpp:229] Iteration 0, loss = 0.101533
I0724 11:03:45.168732 9413 solver.cpp:245] Train net output #0: loss = 0.101533 (* 1 = 0.101533 loss)
I0724 11:03:45.168741 9413 sgd_solver.cpp:106] Iteration 0, lr = 0.05
fc9_1: 0.0270555
loss: diff: 0.00402641 ap:0.187749 an:0.191775
fc9_1: 0.02706
loss: diff: -0.000589401 ap:0.185847 an:0.185258
fc9_1: 0.0270641
loss: diff: 0.00889902 ap:0.18142 an:0.190319
fc9_1: 0.0270755
loss: diff: -0.00702135 ap:0.179443 an:0.172421
fc9_1: 0.0270836
loss: diff: -0.000574797 ap:0.179911 an:0.179336
fc9_1: 0.0270977
loss: diff: -0.00920136 ap:0.198604 an:0.189403
fc9_1: 0.0271129
loss: diff: -0.00556538 ap:0.195801 an:0.190236
fc9_1: 0.0271284
loss: diff: -0.00923073 ap:0.194738 an:0.185507
fc9_1: 0.0271592
loss: diff: 0.00120996 ap:0.199738 an:0.200948
fc9_1: 0.027195
loss: diff: 0.00612946 ap:0.193083 an:0.199212
fc9_1: 0.0272452
loss: diff: -0.00397289 ap:0.195506 an:0.191533
fc9_1: 0.0273063
loss: diff: -0.00386688 ap:0.195402 an:0.191535
fc9_1: 0.0273628
loss: diff: 0.00654422 ap:0.172184 an:0.178728
fc9_1: 0.0273842
loss: diff: -0.00313556 ap:0.187025 an:0.183889
fc9_1: 0.0274232
loss: diff: 0.00579476 ap:0.197591 an:0.203386
fc9_1: 0.0274561
loss: diff: -0.010954 ap:0.201672 an:0.190718
fc9_1: 0.0274747
loss: diff: 0.0195905 ap:0.170718 an:0.190309
fc9_1: 0.0274855
loss: diff: 0.00869015 ap:0.193575 an:0.202265
fc9_1: 0.0275473
loss: diff: -0.0020541 ap:0.196721 an:0.194667
fc9_1: 0.0275993
loss: diff: -0.00750799 ap:0.204833 an:0.197325

Step 3360:

loss: diff: 303786.0 ap:229735.0 an:533521.0
I0724 11:03:09.143044 511 solver.cpp:229] Iteration 3360, loss = 5285.04
I0724 11:03:09.143069 511 solver.cpp:245] Train net output #0: loss = 5285.04 (* 1 = 5285.04 loss)
I0724 11:03:09.143077 511 sgd_solver.cpp:106] Iteration 3360, lr = 0.05
fc9_1: 49.6616
loss: diff: 374721.0 ap:515413.0 an:890134.0
fc9_1: 49.8801
loss: diff: -581244.0 ap:2.19052e+06 an:1.60928e+06
fc9_1: 49.9985
loss: diff: 547982.0 ap:352190.0 an:900172.0
fc9_1: 50.1899
loss: diff: 393501.0 ap:617288.0 an:1.01079e+06
fc9_1: 50.5875
loss: diff: 501302.0 ap:433316.0 an:934618.0
fc9_1: 50.9914
loss: diff: -466161.0 ap:870104.0 an:403944.0
fc9_1: 51.5431
loss: diff: -457406.0 ap:1.35256e+06 an:895150.0
fc9_1: 51.9892
loss: diff: 60009.7 ap:679217.0 an:739227.0
fc9_1: 52.2999
loss: diff: -328786.0 ap:879514.0 an:550729.0
fc9_1: 52.7494
loss: diff: -152841.0 ap:813207.0 an:660366.0
fc9_1: 53.1168
loss: diff: -457695.0 ap:1.29021e+06 an:832511.0
fc9_1: 53.2979
loss: diff: 171481.0 ap:1.07467e+06 an:1.24615e+06
fc9_1: 53.524
loss: diff: -172325.0 ap:787410.0 an:615085.0
fc9_1: 53.811
loss: diff: -35747.0 ap:1.9394e+06 an:1.90365e+06
fc9_1: 54.0079
loss: diff: 113833.0 ap:421736.0 an:535569.0
fc9_1: 54.3732
loss: diff: 241538.0 ap:1.02858e+06 an:1.27012e+06
fc9_1: 54.7839
loss: diff: -186023.0 ap:930481.0 an:744458.0
fc9_1: 55.2063
loss: diff: 616794.0 ap:955952.0 an:1.57275e+06
fc9_1: 55.2271
loss: diff: -755828.0 ap:1.74845e+06 an:992624.0
fc9_1: 55.2024
loss: diff: -33648.6 ap:638003.0 an:604355.0

@JoeFannie

I also ran into this problem and solved it by removing the last ip layer, i.e. the "fc9_1" layer. Since the "fc9_1" layer will magnify the norm of the feature to enlarge the gap between ap and an, you can see the distances keep increasing until they overflow. I also recommend adding a "PReLU" layer after the last ip layer, before the normalization.
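For illustration, a minimal NumPy sketch (not the repo's code) of the effect described above: squared distances grow quadratically with the feature norm, while L2-normalized features keep ap and an bounded.

import numpy as np

# Toy demonstration: squared distances scale with the square of the feature norm,
# so a layer that keeps magnifying the norm makes ap/an blow up during training.
rng = np.random.RandomState(0)
a, p, n = rng.randn(3, 512)

for scale in (1.0, 10.0, 100.0):
    ap = np.sum((scale * a - scale * p) ** 2)
    an = np.sum((scale * a - scale * n) ** 2)
    print("scale=%g  ap=%.1f  an=%.1f" % (scale, ap, an))  # grows with scale**2

# After L2 normalization the squared distance lies in [0, 4] regardless of the
# raw feature magnitude, so ap/an stay in a fixed range.
def l2norm(x):
    return x / np.linalg.norm(x)

print(np.sum((l2norm(a) - l2norm(p)) ** 2),
      np.sum((l2norm(a) - l2norm(n)) ** 2))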

peternara commented Jul 27, 2017

@JoeFannie Thanks.
Do you mean to remove the embedding layer?

layer {
  name: "norm2"
  type: "Python"
  bottom: "fc7"
  top: "norm2"
  python_param {
    module: "norm2layer"
    layer: "Norm2Layer"
  }
}
layer {
  name: "fc9_1"
  type: "InnerProduct"
  bottom: "norm2"
  top: "fc9_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 512
    weight_filler {
      type: "xavier"
    }
  }
}

Then, which layer's output should I use as the feature for measurement?
(I need a dimensionally reduced vector, so I want to know how to reduce its dimension.)

@JoeFannie

layer {
  name: "norm2"
  type: "Python"
  bottom: "fc7"
  top: "norm2"
  python_param {
    module: "norm2layer"
    layer: "Norm2Layer"
  }
}
Adding only the norm2 layer is enough. If you want to reduce the dimension, you can move the norm2 layer so that it comes right after the last ip layer "fc9_1". In short, the feature is supposed to be normalized before being fed to the triplet loss layer. Hope it helps.
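As a reference, a rough sketch of what an L2-normalization Python layer such as norm2layer.Norm2Layer could look like; the repo's actual implementation may differ in details.

import caffe
import numpy as np

class Norm2Layer(caffe.Layer):
    """Sketch of an L2-normalization layer: y = x / ||x||, applied per sample."""

    def setup(self, bottom, top):
        if len(bottom) != 1:
            raise Exception("Norm2Layer expects exactly one bottom blob")

    def reshape(self, bottom, top):
        top[0].reshape(*bottom[0].data.shape)

    def forward(self, bottom, top):
        x = bottom[0].data
        self.norm = np.linalg.norm(x, axis=1, keepdims=True) + 1e-12
        top[0].data[...] = x / self.norm

    def backward(self, top, propagate_down, bottom):
        if not propagate_down[0]:
            return
        y = top[0].data
        dy = top[0].diff
        # gradient of x / ||x||: remove the component of dy along y, then rescale
        dot = np.sum(y * dy, axis=1, keepdims=True)
        bottom[0].diff[...] = (dy - y * dot) / self.norm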

@peternara

@JoeFannie Sorry for the very late reply, and thanks.

fc7 -> fc9_1 -> norm2 -> "triplet_select"
Do you mean this order?

So, like below:

layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type:"xavier"
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}

layer {
  name: "fc9_1"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc9_1"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  inner_product_param {
    num_output: 512
    weight_filler {
      type: "xavier"
    }
  }
}

layer {
  name: "norm2"
  type: "Python"
  bottom: "fc9_1"
  top: "norm2"
  python_param {
    module: "norm2layer"
    layer: "Norm2Layer"
  }
}
# adding an activation layer here may increase the feature's expressiveness
layer {
  name: "triplet_select"
  type: "Python"
  bottom: "norm2"
  bottom: "labels"
  top: "archor"
  top: "positive"
  top: "negative"
  python_param {
    module: "tripletselectlayer"
    layer: "TripletSelectLayer"
  }
}
layer {
  name: "tripletloss"
  type: "Python"
  bottom: "archor"
  bottom: "positive"
  bottom: "negative"
  top: "loss"
  python_param {
    module: "tripletlosslayer"
    layer: "TripletLayer"
    param_str: "'margin': 0.2"
  }
  loss_weight: 1
}
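For context, a rough sketch of how a triplet loss Python layer like tripletlosslayer.TripletLayer might compute the ap / an / diff values printed in the logs above, using the margin from param_str; the repo's actual layer and its exact scaling may differ.

import caffe
import numpy as np

class TripletLayer(caffe.Layer):
    """Sketch of a triplet loss: loss = max(margin + ap - an, 0)."""

    def setup(self, bottom, top):
        # param_str is "'margin': 0.2" in the prototxt above
        params = eval("{" + self.param_str + "}")
        self.margin = params["margin"]

    def reshape(self, bottom, top):
        top[0].reshape(1)

    def forward(self, bottom, top):
        a, p, n = bottom[0].data, bottom[1].data, bottom[2].data
        batch = a.shape[0]
        self.ap = np.sum((a - p) ** 2) / batch  # anchor-positive squared distance
        self.an = np.sum((a - n) ** 2) / batch  # anchor-negative squared distance
        # in the logs above, "diff" matches an - ap
        top[0].data[0] = max(self.margin + self.ap - self.an, 0.0)

    def backward(self, top, propagate_down, bottom):
        a, p, n = bottom[0].data, bottom[1].data, bottom[2].data
        batch = a.shape[0]
        if self.margin + self.ap - self.an > 0:  # only violating triplets contribute
            grads = (2.0 * (n - p) / batch,      # d loss / d anchor
                     2.0 * (p - a) / batch,      # d loss / d positive
                     2.0 * (a - n) / batch)      # d loss / d negative
            for blob, g, prop in zip(bottom, grads, propagate_down):
                blob.diff[...] = g if prop else 0.0
        else:
            for blob in bottom:
                blob.diff[...] = 0.0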

@JoeFannie

That is what I mean.
