Weights before softmax error in the weighted loss function #8

Closed
FelixGruen opened this issue Dec 5, 2016 · 11 comments

@FelixGruen
Contributor

Hi,

in the implementation of the weighted loss function the weights are applied to the logits before the softmax activation function. The result for a two class problem is that the bigger value after the application of the softmax function will increase, the smaller value will decrease. In other words, the network will look more confident in its predictions. If the weight was large and the prediction was wrong the gradients will also be larger though not necessarily by the expected amount. If the prediction was right, however, the gradients will be smaller than they would have been otherwise.

To ensure correct scaling, the weights should be applied after the call to tf.nn.softmax_cross_entropy_with_logits() and before the call to tf.reduce_mean()
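
In code the fix would look roughly like this (untested sketch; the names and shapes are just for illustration, not the actual tf_unet code — `flat_logits` and `flat_labels` are the logits and one-hot labels flattened to shape [n_pixels, n_classes], and `weight_map` is the per-pixel weight of length n_pixels):

```python
import tensorflow as tf

# Untested sketch -- variable names and shapes are illustrative, not the actual tf_unet code.
# flat_logits, flat_labels: [n_pixels, n_classes] (labels one-hot)
# weight_map: [n_pixels], the weight of each pixel's true class

# Per-pixel cross entropy, a 1-D tensor of length n_pixels:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits,
                                                        labels=flat_labels)

# Weights are applied after the softmax/cross entropy and before the mean:
loss = tf.reduce_mean(cross_entropy * weight_map)
```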

@jakeret
Owner

jakeret commented Dec 5, 2016

I'm not sure I understand your argument. The return value of softmax_cross_entropy_with_logits is a 1-D tensor. How should one apply a class weight to this?
My intention with the implementation was, in case of class imbalance, to dampen the activations of a dominant class and amplify the others. However, it could well be that I misunderstood the concept.

@FelixGruen
Contributor Author

I think you didn't understand my argument because I made a mistake in my explanation. I should have read the code more closely before posting. But I'll try to explain the problem in more detail (correctly this time ;) and outline the implementation.

So let's imagine you have ten times more pixels with label a than with label b. To balance this out you want the gradients that come from pixels with label b to count ten times as much as the gradients that come from pixels with label a, so that when you add it all up, learning for both cases happens at the same speed.

The gradients are the derivatives of the loss with respect to the network outputs. The math is a bit involved, but I found this explanation of how to compute the derivatives of the last output layer in the case of a subsequent softmax activation function with cross-entropy loss.

The basic takeaway is that the gradient is ∂L/∂o_i = p_i − y_i, where L is the loss, o_i is the output of the last layer (the logits), y_i is the label (vector) and p_i is the result of the softmax activation function: p_i = e^{o_i} / Σ_j e^{o_j}

Now if you multiply the loss L by 10, the gradient will also be multiplied by 10. But if you multiply the logits o_i by 10, it will only influence the gradient through the result of the softmax p_i.
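
A tiny numerical example (plain Python, just for illustration):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

o = np.array([2.0, 1.0])
print(softmax(o))       # ~ [0.73, 0.27]
print(softmax(10 * o))  # ~ [0.99995, 0.00005] -- scaling the logits only sharpens p_i
# Multiplying the loss L by 10 instead multiplies the gradient p_i - y_i by exactly 10.
```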

Specifically, the way it is implemented now (and here is where I was wrong above), the values o_i (or o_{b,w,h,i} to be consistent with TensorFlow dimensions) are multiplied by a certain weight w_i across the whole feature map, depending only on their class i and irrespective of the label of their pixel.

I sent you an (untested) PR for the implementation. You should multiply the label array with the weights and sum it up across the last dimension (the one that defines the classes), so that you have a weight map which corresponds to the pixel labels. Then reshape it into a 1D vector and multiply it element-wise with the loss. That way you have larger gradients for pixels whose label has a larger weight, and smaller gradients for pixels whose label has a smaller weight.
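
Roughly like this (again an untested sketch; it assumes the one-hot labels have already been flattened to [n_pixels, n_classes] and `class_weights` is a vector of length n_classes):

```python
# Untested sketch of the weight map described above.
# flat_labels: [n_pixels, n_classes] (one-hot), class_weights: [n_classes]
weight_map = tf.reduce_sum(flat_labels * class_weights, axis=1)  # [n_pixels]

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits,
                                                        labels=flat_labels)
loss = tf.reduce_mean(cross_entropy * weight_map)
```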

@jakeret
Owner

jakeret commented Dec 7, 2016

Ok, that's interesting. I need to think about this a little bit. And thanks for the PR, I'll have a closer look.

@jakeret
Owner

jakeret commented Dec 14, 2016

I had a closer look at your PR and the referenced explanation. I think I understand the concept but I'm struggling a bit with the implementation.
Anyway, I checked out your branch and let it run on a problem with a class imbalance. Something doesn't seem to be right. After a few epochs the loss started to drift and exploded (from ~1 to 10^6).

@FelixGruen
Contributor Author

Hm, that's indeed not good :)

As I said, I just coded it down in an editor and didn't have time to test it. I'll see if I have time to look at it again. But if you find an error, either in the concept or in the implementation, I'm of course grateful for the feedback.

@jakeret
Owner

jakeret commented Dec 19, 2016

I just pushed a little extension to the toy problem such that the unet has to segment background (85%), circles (12%) and rectangles (2%). Maybe this will help to track down the issue.

@jakeret
Owner

jakeret commented Jan 8, 2017

Hi nicolov, thanks for also looking into this.
The solution referenced on SO is essentially what is implemented in the master branch. The second solution in the post is what FelixGruen implemented in his new branch.

tf.nn.weighted_cross_entropy_with_logits sounds interesting. If I understand correctly, the weights are supposed to do something slightly different from what we're trying to achieve here. But maybe it could be used for our purpose. Any thoughts?

@nicolov
Contributor

nicolov commented Jan 8, 2017

Yep, I agree with your analysis. I believe tf.nn.weighted_cross_entropy_with_logits is for sigmoid activations, not softmax. Also interesting: this paper shows that a different loss function, based on the Dice coefficient, works better than re-weighting in the case of class imbalance.

@jakeret
Owner

jakeret commented Jan 8, 2017

I just pushed a new branch to make it easier to add new cost functions. Furthermore, I also included an implementation of the Dice coefficient loss.
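
The Dice loss is roughly along these lines (simplified sketch, not necessarily the exact code in the branch):

```python
def dice_coefficient_loss(logits, labels, eps=1e-5):
    # Simplified sketch of a soft Dice loss over the predicted probabilities.
    prediction = tf.nn.softmax(logits)
    intersection = tf.reduce_sum(prediction * labels)
    union = tf.reduce_sum(prediction) + tf.reduce_sum(labels)
    return 1.0 - (2.0 * intersection + eps) / (union + eps)
```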

@jakeret jakeret closed this as completed Feb 19, 2017
@jakeret
Owner

jakeret commented Feb 19, 2017

I merged the branch quite a while ago.
