Rescaling the convolution output with a tanh-based factor changes the scale of the logits before they reach the Softmax, which makes the loss more sensitive to examples that are hard to classify.
A characteristic of tanh is that it changes most rapidly around zero; in terms of the predicted probabilities, values near zero correspond to examples that are not classified well, so the scaling reacts strongly exactly where the model is uncertain.
However, the range of tanh is bounded between -1 and 1, and using it directly as the activation would make the model hard to converge. Instead, tanh is used only as a scaling factor: the input is multiplied by (2 + tanh(x)), so the output is not constrained to a bounded range yet still inherits the sensitivity of tanh.
Combined with the exponential inside the Softmax, the rescaled logits become increasingly sensitive as the difference between class scores grows.
Forward:
y = (2 + tanh(x)) * x
Backward:
y' = x * (1 - tanh(x)^2) + (2 + tanh(x))
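As a sanity check on these formulas, here is a minimal NumPy sketch (not the actual Caffe C++ implementation) of the forward pass and its analytic gradient, compared against a central-difference numerical gradient:

```python
import numpy as np

def tanh_scale_forward(x):
    # y = (2 + tanh(x)) * x: tanh acts only as a scale factor,
    # so the output is not bounded to (-1, 1) like a plain tanh.
    return (2.0 + np.tanh(x)) * x

def tanh_scale_backward(x):
    # Product rule: y' = x * (1 - tanh(x)^2) + (2 + tanh(x)),
    # using d/dx tanh(x) = 1 - tanh(x)^2.
    t = np.tanh(x)
    return x * (1.0 - t ** 2) + (2.0 + t)

# Compare the analytic gradient with a numerical estimate.
x = np.linspace(-3.0, 3.0, 13)
eps = 1e-6
numeric = (tanh_scale_forward(x + eps) - tanh_scale_forward(x - eps)) / (2.0 * eps)
print(np.allclose(tanh_scale_backward(x), numeric, atol=1e-5))
```

In a real Caffe layer, the backward pass would multiply this local derivative elementwise with the top diff to produce the bottom diff.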
layer {
  name: "TanhScale"
  type: "TanhScale"
  bottom: "deconv1"
  top: "TanhScale"
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "TanhScale"
  bottom: "label"
  top: "loss"
  propagate_down: true   # backpropagate into TanhScale
  propagate_down: false  # do not backpropagate into the labels
}