Weights before softmax error in the weighted loss function #8
Comments
I'm not sure I understand your argument. The return value of
I think you may not have understood my argument because I made a mistake in my explanation; I should have read the code more closely before posting. Let me explain the problem in more detail (correctly this time ;) and outline the implementation.

Let's imagine you have ten times more pixels with label a than with label b. To balance this out, you want the gradients coming from pixels with label b to count ten times as much as the gradients coming from pixels with label a, so that when everything is added up, learning for both cases happens at the same speed. The gradients are the derivatives of the loss. The math is a bit involved, but I found this explanation of how to compute the derivatives of the last output layer when a softmax activation is followed by a cross-entropy loss. The basic takeaway is that the gradient is

∂L/∂o_i = p_i − y_i

where L is the loss, o_i is the output of the last layer (the logits), y_i is the label (vector), and p_i is the result of the softmax activation function:

p_i = e^{o_i} / Σ_j e^{o_j}

Now if you multiply the loss L by 10, the gradient is also multiplied by 10. But if you multiply the logits o_i by 10, that only influences the gradient through the softmax result p_i. Specifically, the way it is implemented now (and here is where I was wrong above), the values o_i (or o_{b,w,h,i}, to be consistent with TensorFlow dimensions) are multiplied by a weight w_i across the whole feature map, depending only on their class i and irrespective of the label of their pixel.

I sent you an (untested) PR with the implementation. You should multiply the label array by the weights and sum across the last dimension (the one that indexes the classes), so that you get a weight map corresponding to the pixel labels. Then reshape it into a 1D vector and multiply it elementwise with the loss. That way you get larger gradients for pixels whose label has a larger weight, and smaller gradients for pixels whose label has a smaller weight.
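The weight-map construction described above can be sketched in NumPy (the shapes, values, and variable names are illustrative; this is not the code from the PR):

```python
import numpy as np

# Hypothetical one-hot labels: batch=1, h=2, w=2, n_classes=2.
labels = np.array([[[[1, 0], [0, 1]],
                    [[1, 0], [1, 0]]]], dtype=np.float64)
class_weights = np.array([1.0, 10.0])  # count class 1 ten times as much

# Multiply the one-hot labels by the class weights and sum over the class
# axis: each pixel picks up the weight of *its own* label.
weight_map = np.sum(labels * class_weights, axis=-1)  # shape (1, 2, 2)

# Placeholder per-pixel cross-entropy values, standing in for the output of
# tf.nn.softmax_cross_entropy_with_logits.
pixel_loss = np.full((1, 2, 2), 0.5)

# Flatten both, weight elementwise, then average.
weighted_loss = np.mean(weight_map.reshape(-1) * pixel_loss.reshape(-1))
```

The pixel labeled with class 1 now contributes ten times as much to the mean loss, and therefore to the gradients, as each class-0 pixel.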
OK, that's interesting. I need to think about this a little bit. Thanks for the PR, I'll have a closer look.
I had a closer look at your PR and the referenced explanation. I think I understand the concept, but I'm struggling a bit with the implementation.
Hm, that's indeed not good :) As I said, I just coded it down in an editor and didn't have time to test it. I'll see if I have time to look at it again. But if you find an error, either in the concept or in the implementation, I'd of course be grateful.
I just pushed a small extension to the toy problem so that the U-Net has to segment background (85%), circles (12%), and rectangles (2%). Maybe this will help to track down the issue.
Some resources I found while looking at this issue:
Hi nicolov, thanks for also looking into this.
|
Yep, I agree with your analysis. I believe the
I just pushed a new branch to make it easier to add new cost functions. Furthermore, I also included an implementation of the Dice coefficient loss.
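For reference, a minimal NumPy sketch of a Dice coefficient loss (the function name, the `eps` smoothing term, and the example values are illustrative, not taken from the branch):

```python
import numpy as np

def dice_loss(probs, labels, eps=1e-5):
    """Soft Dice loss for flattened predicted probabilities and
    one-hot labels; eps avoids division by zero."""
    intersection = np.sum(probs * labels)
    union = np.sum(probs) + np.sum(labels)
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice

# Four pixels, binary foreground mask.
probs = np.array([0.9, 0.8, 0.1, 0.2])
labels = np.array([1.0, 1.0, 0.0, 0.0])
loss = dice_loss(probs, labels)
```

Because the Dice coefficient is a ratio of overlap to total mass, it is inherently less sensitive to class imbalance than an unweighted cross entropy.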
I merged the branch quite a while ago.
Hi,
in the implementation of the weighted loss function, the weights are applied to the logits before the softmax activation function. The result, for a two-class problem, is that the larger value after the softmax increases and the smaller value decreases. In other words, the network merely appears more confident in its predictions. If the weight was large and the prediction wrong, the gradients will also be larger, though not necessarily by the expected amount. If the prediction was right, however, the gradients will be smaller than they would have been otherwise.
To ensure correct scaling, the weights should be applied after the call to
tf.nn.softmax_cross_entropy_with_logits()
and before the call to tf.reduce_mean()
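As a sketch of the intended ordering, here is a NumPy equivalent (the logits, labels, and weights are placeholders, and the manual cross entropy stands in for tf.nn.softmax_cross_entropy_with_logits):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the class axis.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Three flattened pixels, two classes (illustrative values).
logits = np.array([[2.0, 1.0], [0.5, 1.5], [3.0, 0.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # one-hot
weight_map = np.array([1.0, 10.0, 1.0])                  # per-pixel weights

# Per-pixel cross entropy, computed from the *unweighted* logits.
cross_entropy = -np.sum(labels * np.log(softmax(logits)), axis=-1)

# Weights applied after the cross entropy and before the mean reduction,
# so the loss (and hence the gradient) of each pixel scales linearly.
loss = np.mean(weight_map * cross_entropy)
```

With this ordering, multiplying a pixel's weight by 10 multiplies that pixel's contribution to the loss, and to ∂L/∂o_i, by exactly 10.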