
Which loss function works in multi-label classification task? #10371

Closed
buaasky opened this issue Jun 7, 2018 · 15 comments

Comments


buaasky commented Jun 7, 2018

I need to train a multi-label classifier for a text topic classification task. Having searched around the internet, I followed the suggestion to use sigmoid + binary_crossentropy. But I can't get good results (i.e. subset accuracy) on the validation set even though the loss is very small. After reading the Keras source code, I found that the binary_crossentropy loss is implemented like this:

def binary_crossentropy(y_true, y_pred): 
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

My doubt is whether it makes sense to take the average over all label dimensions in a multi-label classification task. Suppose the label set has 30 dimensions and each training sample carries only two or three of the labels. Since most of the labels are zero in most of the samples, I suspect this loss encourages the classifier to predict a tiny probability in every output dimension.
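A small NumPy sketch of that concern (hypothetical numbers: 30 labels, 2 positives, and a classifier that predicts a low probability everywhere):

import numpy as np

# 30 labels, only the first 2 are positive (hypothetical example)
y_true = np.zeros(30)
y_true[:2] = 1.0
# the classifier predicts a small probability for every label
y_pred = np.full(30, 0.1)

eps = 1e-7
bce = -np.mean(y_true * np.log(y_pred + eps) +
               (1 - y_true) * np.log(1 - y_pred + eps))
print(bce)  # ~0.25: the averaged loss looks small,
            # yet thresholding at 0.5 recovers none of the positive labels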

Following the idea in #2826, I also gave categorical_crossentropy a try, but still had no luck.

Any tips on choosing the loss function for a multi-label classification task are more than welcome. Thanks in advance.


ismaeIfm commented Jun 7, 2018

The standard way to train a multi-label classifier is with sigmoid + binary_crossentropy, but you can also train one with tanh + hinge, as long as the targets are in {-1, 1}. I don't think your issue has to do with the loss or the output activation; I think it is more related to the complexity of your model. Also, I'm curious: how are you evaluating your model?
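A minimal sketch of the two setups (assuming a 30-label problem and a toy dense encoder standing in for the actual model):

import tensorflow as tf

n_labels = 30    # hypothetical label-set size
input_dim = 300  # hypothetical feature size

# Option 1: sigmoid outputs + binary cross-entropy, targets in {0, 1}
model_bce = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(input_dim,)),
    tf.keras.layers.Dense(n_labels, activation='sigmoid'),
])
model_bce.compile(optimizer='adam', loss='binary_crossentropy')

# Option 2: tanh outputs + hinge loss, targets remapped from {0, 1} to {-1, 1}
model_hinge = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(input_dim,)),
    tf.keras.layers.Dense(n_labels, activation='tanh'),
])
model_hinge.compile(optimizer='adam', loss='hinge')
# y_hinge = 2 * y_binary - 1  # label remapping before training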


buaasky commented Jun 11, 2018

@ismaeIfm
Thanks for your answer.
Maybe I did not state my question precisely. The model I use is a BLSTM with an attention mechanism. I used it for a text topic multi-class classification task with categorical_crossentropy and it proved to work well. So, when I encountered the text topic multi-label classification task, I just switched from softmax + categorical_crossentropy to sigmoid + binary_crossentropy, but the results are not that good. So I am wondering if there is something wrong with my loss function.
The way I evaluate my model is subset accuracy, which is the same as accuracy in a multi-class problem: a prediction is counted as correct only when the output label set exactly matches the true label set.

@ismaeIfm

As far as I understand, subset accuracy needs explicit classes in {0, 1}, but your model outputs probabilities. How did you choose the threshold to binarize the labels? Have you tried using LRAP (label ranking average precision) to evaluate your model?
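For illustration, a sketch of both metrics with scikit-learn (the arrays here are made-up examples):

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# hypothetical ground truth and predicted probabilities for 2 samples, 4 labels
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
y_prob = np.array([[0.8, 0.1, 0.6, 0.2],
                   [0.3, 0.7, 0.2, 0.1]])

# subset accuracy: binarize at 0.5 and count a sample only if every label matches
y_bin = (y_prob >= 0.5).astype(int)
subset_acc = np.mean(np.all(y_bin == y_true, axis=1))

# LRAP is threshold-free: it measures how well true labels are ranked above false ones
lrap = label_ranking_average_precision_score(y_true, y_prob)
print(subset_acc, lrap)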


daniel410 commented Jun 12, 2018

For multi-label classification, you can try tanh + hinge with {-1, 1} values in labels like (1, -1, -1, 1).
Or sigmoid + hamming loss with {0, 1} values in labels like (1, 0, 0, 1).
In my case, sigmoid + focal loss with {0, 1} values in labels like (1, 0, 0, 1) worked well.
You can check this paper https://arxiv.org/abs/1708.02002.


buaasky commented Jun 17, 2018

@ismaeIfm
I’ve chosen 0.5 as the threshold to get binary outputs because each output represents the probability of the corresponding label.
I will try LRAP to evaluate my model and see how it works.
Thanks a lot.


buaasky commented Jun 17, 2018

@daniel410
Thanks for your answer. It helps me a lot, and I will try the methods you provided. I think focal loss will help because it can alleviate the issue of imbalanced labels.


BovineEnthusiast commented Nov 5, 2018

I found an implementation of multi-label focal loss here:

https://github.com/Umi-you/FocalLoss

EDIT: Seems like his implementation doesn't work.


dberma15 commented Dec 6, 2018

The multi-label focal loss equation doesn't seem to work.


Abhijit-2592 commented Jan 3, 2019

@dberma15 focal loss doesn't work as in it doesn't converge, or as in there's an implementation error? I feel it is the latter, because of two major issues: it shouldn't use NumPy, and its implementation of the cross-entropy loss is flawed.


Abhijit-2592 commented Jan 7, 2019

import tensorflow as tf

K = tf.keras.backend


class FocalLoss(object):
    def __init__(self, gamma=2, alpha=0.25):
        self._gamma = gamma  # focusing parameter; down-weights well-classified labels
        self._alpha = alpha  # balancing weight for the positive class

    def compute_loss(self, y_true, y_pred):
        # per-label binary cross-entropy; y_pred are probabilities, not logits
        cross_entropy_loss = K.binary_crossentropy(y_true, y_pred, from_logits=False)
        # p_t: probability assigned to the true class of each label
        p_t = ((y_true * y_pred) +
               ((1 - y_true) * (1 - y_pred)))
        modulating_factor = 1.0
        if self._gamma:
            modulating_factor = tf.pow(1.0 - p_t, self._gamma)
        alpha_weight_factor = 1.0
        if self._alpha is not None:
            alpha_weight_factor = (y_true * self._alpha +
                                   (1 - y_true) * (1 - self._alpha))
        focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
                                    cross_entropy_loss)
        # average over the label axis -> one loss value per sample
        return K.mean(focal_cross_entropy_loss, axis=-1)

@MrSnappingTurtle and @dberma15
My implementation of focal loss for Keras
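A usage sketch, assuming the class above and an already-built model: Keras expects the loss argument to be a callable taking (y_true, y_pred), so the bound method is what goes into compile:

focal = FocalLoss(gamma=2, alpha=0.25)
model.compile(optimizer='adam', loss=focal.compute_loss, metrics=['accuracy'])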

@randomwalker42

> For multi-label classification, you can try tanh + hinge with {-1, 1} values in labels like (1, -1, -1, 1).
> Or sigmoid + hamming loss with {0, 1} values in labels like (1, 0, 0, 1).
> In my case, sigmoid + focal loss with {0, 1} values in labels like (1, 0, 0, 1) worked well.
> You can check this paper https://arxiv.org/abs/1708.02002.

@daniel410 Hi, would you mind sharing how you implemented your focal loss for the multi-label task, if it's not too much trouble?

@talhaanwarch

    def compute_loss(self, y_true, y_pred):
        cross_entropy_loss = K.binary_crossentropy(y_true, y_pred, from_logits=False)
        p_t = ((y_true * y_pred) +
               ((1 - y_true) * (1 - y_pred)))
        modulating_factor = 1.0
        if self._gamma:
            modulating_factor = tf.pow(1.0 - p_t, self._gamma)
        alpha_weight_factor = 1.0
        if self._alpha is not None:
            alpha_weight_factor = (y_true * self._alpha +
                                   (1 - y_true) * (1 - self._alpha))
        focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
                                    cross_entropy_loss)
        return K.mean(focal_cross_entropy_loss, axis=-1)

@Abhijit-2592 Is it for multi-label classification?

It gives me an error:
AttributeError: 'FocalLoss' object has no attribute 'get_shape'
I used it as follows:
model.compile('adam',loss=FocalLoss,metrics=['accuracy'])
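A likely cause, as a hedged note: loss=FocalLoss passes the class itself, so Keras ends up calling FocalLoss(y_true, y_pred) and then treats the resulting FocalLoss object as a tensor, hence the missing get_shape attribute. Passing the bound method of an instance should avoid this:

model.compile('adam', loss=FocalLoss().compute_loss, metrics=['accuracy'])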

@sushanttripathy

You can try my implementation and let me know if it works. https://github.com/sushanttripathy/Keras_loss_functions/blob/master/focal_loss.py

@Vishnux0pa

@sushanttripathy: I tried your code and it works, but the output focal_loss_tensor is a 2-D tensor. Should I take a mean to arrive at the final loss?

@sushanttripathy

@Vishnux0pa I am not sure if the auto-differentiation requires me to provide the loss per sample (instead of per batch). I looked at categorical_crossentropy, and it seemed like that's what it was doing.

I did not get convergence with the earlier version of the loss (the one that yielded a scalar). It does converge with this one though.
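For context, a minimal sketch of that distinction, using plain binary cross-entropy as a stand-in for the focal loss: Keras's built-in losses reduce only over the label axis and leave the batch axis intact, and Keras then averages over the batch itself.

import tensorflow as tf
K = tf.keras.backend

def per_sample_loss(y_true, y_pred):
    # reduce over the label axis only -> shape (batch_size,), as Keras built-ins do
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

def scalar_loss(y_true, y_pred):
    # reduce over everything -> a single scalar for the whole batch
    return K.mean(K.binary_crossentropy(y_true, y_pred))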
