
Which loss function works in multi-label classification task? #10371

Closed
buaasky opened this issue Jun 7, 2018 · 15 comments

Comments


buaasky commented Jun 7, 2018

I need to train a multi-label classifier for a text topic classification task. Having searched around the internet, I followed the suggestion to use sigmoid + binary_crossentropy. But I can't get good results (i.e. subset accuracy) on the validation set even though the loss is very small. After reading the Keras source code, I found that the binary_crossentropy loss is implemented like this:

def binary_crossentropy(y_true, y_pred): 
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

My doubt is whether it makes sense to take the average over all label dimensions in a multi-label classification task. Suppose the label set has 30 dimensions and each training sample carries only two or three of the labels. Since most of the labels are zero in most of the samples, I suspect this loss encourages the classifier to predict a tiny probability in every output dimension.
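A small NumPy sketch of that concern (hypothetical numbers: 30 labels, 2 positives, and a classifier that predicts a low probability everywhere):

import numpy as np

# 30 labels, only the first 2 are positive (hypothetical example)
y_true = np.zeros(30)
y_true[:2] = 1.0
# the classifier predicts a small probability for every label
y_pred = np.full(30, 0.1)

eps = 1e-7
bce = -np.mean(y_true * np.log(y_pred + eps) +
               (1 - y_true) * np.log(1 - y_pred + eps))
print(bce)  # ~0.25: the averaged loss looks small,
            # yet thresholding at 0.5 recovers none of the positive labels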

Following the idea in #2826, I also gave categorical_crossentropy a try, but still had no luck.

Any tips on choosing the loss function for a multi-label classification task are more than welcome. Thanks in advance.


ismaeIfm commented Jun 7, 2018

The standard way to train a multi-label classifier is with sigmoid + binary_crossentropy, but you can also train one with tanh + hinge, as long as the targets are in {-1, 1}. I don't think your issue has to do with the loss or the output activation; I think it is more related to the complexity of your model. Also, I'm curious: how are you evaluating your model?
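A minimal sketch of the two setups (assuming a 30-label problem and a toy dense encoder standing in for the actual model):

import tensorflow as tf

n_labels = 30    # hypothetical label-set size
input_dim = 300  # hypothetical feature size

# Option 1: sigmoid outputs + binary cross-entropy, targets in {0, 1}
model_bce = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(input_dim,)),
    tf.keras.layers.Dense(n_labels, activation='sigmoid'),
])
model_bce.compile(optimizer='adam', loss='binary_crossentropy')

# Option 2: tanh outputs + hinge loss, targets remapped from {0, 1} to {-1, 1}
model_hinge = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(input_dim,)),
    tf.keras.layers.Dense(n_labels, activation='tanh'),
])
model_hinge.compile(optimizer='adam', loss='hinge')
# y_hinge = 2 * y_binary - 1  # label remapping before training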


buaasky commented Jun 11, 2018

@ismaeIfm
Thanks for your answer.
Maybe I did not state my question precisely. The model I use is a BLSTM with an attention mechanism. I used it for a text topic multi-class classification task with categorical_crossentropy and it proved to work well. So, when I encountered the text topic multi-label classification task, I just switched from softmax + categorical_crossentropy to sigmoid + binary_crossentropy, but the results are not that good. So I am wondering if there is something wrong with my loss function.
The way I evaluate my model is subset accuracy, which is the same as accuracy in a multi-class problem: a prediction is counted as correct only when the output label set exactly matches the true label set.

@ismaeIfm

As far as I understand, subset accuracy needs explicit classes in {0, 1}, but your model outputs probabilities. How did you choose the threshold to binarize the labels? Have you tried using LRAP (label ranking average precision) to evaluate your model?
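For illustration, a sketch of both metrics with scikit-learn (the arrays here are made-up examples):

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# hypothetical ground truth and predicted probabilities for 2 samples, 4 labels
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
y_prob = np.array([[0.8, 0.1, 0.6, 0.2],
                   [0.3, 0.7, 0.2, 0.1]])

# subset accuracy: binarize at 0.5 and count a sample only if every label matches
y_bin = (y_prob >= 0.5).astype(int)
subset_acc = np.mean(np.all(y_bin == y_true, axis=1))

# LRAP is threshold-free: it measures how well true labels are ranked above false ones
lrap = label_ranking_average_precision_score(y_true, y_prob)
print(subset_acc, lrap)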


daniel410 commented Jun 12, 2018

For multi-label classification, you can try tanh + hinge with {-1, 1} values in labels like (1, -1, -1, 1).
Or sigmoid + hamming loss with {0, 1} values in labels like (1, 0, 0, 1).
In my case, sigmoid + focal loss with {0, 1} values in labels like (1, 0, 0, 1) worked well.
You can check this paper https://arxiv.org/abs/1708.02002.


buaasky commented Jun 17, 2018

@ismaeIfm
I’ve chosen 0.5 as the threshold to get binary outputs because each output represents the probability of the corresponding label.
I will try LRAP to evaluate my model and see how it works.
Thanks a lot.


buaasky commented Jun 17, 2018

@daniel410
Thanks for your answer. It helps me a lot, and I will try the methods you provided. I think focal loss will help because it can alleviate the issue of imbalanced labels.


BovineEnthusiast commented Nov 5, 2018

I found an implementation of multi-label focal loss here:

https://github.com/Umi-you/FocalLoss

EDIT: Seems like his implementation doesn't work.


dberma15 commented Dec 6, 2018

The multi-label focal loss equation doesn't seem to work.


Abhijit-2592 commented Jan 3, 2019

@dberma15 focal loss doesn't work as in it doesn't converge, or as in there's an implementation error? I feel it is the latter, because of two major issues: it shouldn't use NumPy, and its implementation of the cross-entropy loss is flawed.


Abhijit-2592 commented Jan 7, 2019

import tensorflow as tf

K = tf.keras.backend


class FocalLoss(object):
    def __init__(self, gamma=2, alpha=0.25):
        self._gamma = gamma  # focusing parameter; down-weights well-classified labels
        self._alpha = alpha  # balancing weight for the positive class

    def compute_loss(self, y_true, y_pred):
        # per-label binary cross-entropy; y_pred are probabilities, not logits
        cross_entropy_loss = K.binary_crossentropy(y_true, y_pred, from_logits=False)
        # p_t: probability assigned to the true class of each label
        p_t = ((y_true * y_pred) +
               ((1 - y_true) * (1 - y_pred)))
        modulating_factor = 1.0
        if self._gamma:
            modulating_factor = tf.pow(1.0 - p_t, self._gamma)
        alpha_weight_factor = 1.0
        if self._alpha is not None:
            alpha_weight_factor = (y_true * self._alpha +
                                   (1 - y_true) * (1 - self._alpha))
        focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
                                    cross_entropy_loss)
        # average over the label axis -> one loss value per sample
        return K.mean(focal_cross_entropy_loss, axis=-1)

@MrSnappingTurtle and @dberma15
My implementation of focal loss for Keras
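A usage sketch, assuming the class above and an already-built model: Keras expects the loss argument to be a callable taking (y_true, y_pred), so the bound method is what goes into compile:

focal = FocalLoss(gamma=2, alpha=0.25)
model.compile(optimizer='adam', loss=focal.compute_loss, metrics=['accuracy'])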

@randomwalker42

> For multi-label classification, you can try tanh + hinge with {-1, 1} values in labels like (1, -1, -1, 1).
> Or sigmoid + hamming loss with {0, 1} values in labels like (1, 0, 0, 1).
> In my case, sigmoid + focal loss with {0, 1} values in labels like (1, 0, 0, 1) worked well.
> You can check this paper https://arxiv.org/abs/1708.02002.

@daniel410 Hi, would you mind sharing how you implemented your focal loss for the multi-label task, if it's not too much trouble?

@talhaanwarch

    def compute_loss(self, y_true, y_pred):
        cross_entropy_loss = K.binary_crossentropy(y_true, y_pred, from_logits=False)
        p_t = ((y_true * y_pred) +
               ((1 - y_true) * (1 - y_pred)))
        modulating_factor = 1.0
        if self._gamma:
            modulating_factor = tf.pow(1.0 - p_t, self._gamma)
        alpha_weight_factor = 1.0
        if self._alpha is not None:
            alpha_weight_factor = (y_true * self._alpha +
                                   (1 - y_true) * (1 - self._alpha))
        focal_cross_entropy_loss = (modulating_factor * alpha_weight_factor *
                                    cross_entropy_loss)
        return K.mean(focal_cross_entropy_loss, axis=-1)

@Abhijit-2592 Is it for multi-label classification?

It gives me an error:
AttributeError: 'FocalLoss' object has no attribute 'get_shape'
I used it as follows:
model.compile('adam',loss=FocalLoss,metrics=['accuracy'])
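A likely cause, as a hedged note: loss=FocalLoss passes the class itself, so Keras ends up calling FocalLoss(y_true, y_pred) and then treats the resulting FocalLoss object as a tensor, hence the missing get_shape attribute. Passing the bound method of an instance should avoid this:

model.compile('adam', loss=FocalLoss().compute_loss, metrics=['accuracy'])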

@sushanttripathy

You can try my implementation and let me know if it works. https://github.com/sushanttripathy/Keras_loss_functions/blob/master/focal_loss.py

@Vishnux0pa

@sushanttripathy: I tried your code and it works, but the output focal_loss_tensor is a 2-D tensor. Should I take a mean to arrive at the final loss?

@sushanttripathy

@Vishnux0pa I am not sure if the auto-differentiation requires me to provide the loss per sample (instead of per batch). I looked at categorical_crossentropy, and it seemed like that's what it was doing.

I did not get convergence with the earlier version of the loss (the one that yielded a scalar). It does converge with this one though.
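For context, a minimal sketch of that distinction, using plain binary cross-entropy as a stand-in for the focal loss: Keras's built-in losses reduce only over the label axis and leave the batch axis intact, and Keras then averages over the batch itself.

import tensorflow as tf
K = tf.keras.backend

def per_sample_loss(y_true, y_pred):
    # reduce over the label axis only -> shape (batch_size,), as Keras built-ins do
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

def scalar_loss(y_true, y_pred):
    # reduce over everything -> a single scalar for the whole batch
    return K.mean(K.binary_crossentropy(y_true, y_pred))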
