
TFSequenceClassificationLoss for MultiLabel classification #30792

Open
2 of 4 tasks
ds-mike opened this issue May 13, 2024 · 1 comment
Labels: Examples (related to examples in general), TensorFlow (anything TensorFlow)

Comments

ds-mike commented May 13, 2024

System Info

  • transformers version: 4.40.2
  • Platform: Linux-6.1.84-x86_64-with-glibc2.31
  • Python version: 3.11.9
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): not installed (NA)
  • Tensorflow version (GPU?): 2.16.1 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help?

Tagging (up to) three individuals who may be able to help:
@ArthurZucker (Text Models), @gante (Tensorflow), @Rocketknight1 (Tensorflow examples)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hi,

I am experimenting with multi-label classification using TFBertForSequenceClassification as the base model. I declare the model roughly as follows, where my number of classes is 36; an individual training sample can carry several labels.

```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification

model = TFBertForSequenceClassification.from_pretrained(
    <my path>,
    problem_type="multi_label_classification",
    label2id=labels["label2id"],
    id2label=labels["id2label"],
    num_labels=num_classes,
)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
```

My inputs are as follows:

```
Inputs: {'input_ids': <tf.Tensor: shape=(1, 9), dtype=int32, numpy=
 array([[ 101, 2061, 2061, 2172, 3793, 2302, 2151, 3574,  102]],
       dtype=int32)>,
 'token_type_ids': <tf.Tensor: shape=(1, 9), dtype=int32, numpy=array([[0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>,
 'attention_mask': <tf.Tensor: shape=(1, 9), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>}
Labels: <tf.Tensor: shape=(1, 36), dtype=int64, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])>
```
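(For reference, a multi-hot target like the one above, with classes 13, 15, and 33 active, can be built from a plain list of label indices; this is a minimal illustrative sketch, not part of my actual pipeline:)

```python
# Build the 36-way multi-hot label vector shown above from the
# list of active class indices (13, 15 and 33 in this example).
num_classes = 36
active_labels = [13, 15, 33]

multi_hot = [0] * num_classes
for idx in active_labels:
    multi_hot[idx] = 1
```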

Only when I call the model with the labels present do I get the following error:

```
`labels.shape` must equal `logits.shape` except for the last dimension. Received: labels.shape=(36,) and logits.shape=(1, 36)
```

This appears to be because SparseCategoricalCrossentropy() is being set as the loss function here. If I manually override this with what I understand to be the more appropriate loss function, in this case BinaryCrossentropy, the loss computes as expected.
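To illustrate the shape contract with a standalone numpy sketch (not the transformers code path): binary cross-entropy with logits expects labels with the same shape as the logits, one target per logit, whereas sparse categorical cross-entropy expects a single integer class index per example. The stable formula below is the standard one that sigmoid-based BCE implementations use.

```python
import numpy as np

def bce_with_logits(labels, logits):
    """Mean binary cross-entropy computed directly on logits, using the
    numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|))."""
    x, z = logits, labels
    per_element = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
    return per_element.mean()

# One example, four classes, two of them active (multi-hot target).
logits = np.array([[2.0, -1.0, 0.5, -3.0]])
labels = np.array([[1.0, 0.0, 1.0, 0.0]])

# BCE requires labels.shape == logits.shape -- one target per logit.
assert labels.shape == logits.shape
loss = bce_with_logits(labels, logits)

# Sparse categorical cross-entropy would instead expect a single
# integer index per example, e.g. labels = [[2]], which is why a
# (36,) multi-hot vector trips its shape check against (1, 36) logits.
```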

I tested the torch code here and it seems to work fine. I get something like the following: tensor(0.8118, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>).

I believe the TF loss function should be changed at the line referenced above. Furthermore, compiling the model with a loss as in my code above doesn't seem to have any effect. Please let me know where I have gone wrong, or if there are other details I can provide.

Thank you!
Mike

Expected behavior

Loss should be computed with binary cross-entropy, given that this is a multi-label classification scenario and I am passing a multi-hot encoded label vector.

@amyeroberts added the TensorFlow and Examples labels on May 14, 2024
@Rocketknight1
Member

Hi @ds-mike, our TF models don't support multi-label classification with the default loss (I'm not sure Torch does either - you got a loss value there but I don't know if it's the correct one!)

Our TFXXXForSequenceClassification models expect labels to just be integer category indices, so if num_labels=36, then the label should be an integer between 0 and 35. This is by design - if we switched to binary crossentropy, then lots of people's existing code wouldn't work anymore!

Multi-label classification with these models is still possible, but you'll need either to compile with a new loss function such as binary cross-entropy, as you did there, or, for more advanced use cases, to use the base TFBertModel as a layer inside your own model class and add a custom output head onto it that can have whatever shape and loss you want.
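A minimal sketch of that custom-head idea, with a random numpy vector standing in for TFBertModel's pooled output and untrained random weights standing in for a trained tf.keras Dense layer (the names and sizes here are illustrative assumptions, not transformers API):

```python
import numpy as np

rng = np.random.default_rng(0)

num_labels = 36
hidden_size = 768  # width of BERT-base's pooled output

# Stand-in for the base model's pooled output on one example;
# in a real model this would come from the pretrained encoder.
pooled = rng.standard_normal((1, hidden_size))

# Custom classification head: a single dense layer producing one
# logit per label (these weights would normally be trained).
W = rng.standard_normal((hidden_size, num_labels)) * 0.02
b = np.zeros(num_labels)
logits = pooled @ W + b  # shape (1, 36)

# An independent sigmoid per label gives multi-label probabilities,
# thresholded at 0.5 to get the predicted label set.
probs = 1.0 / (1.0 + np.exp(-logits))
predicted = (probs > 0.5).astype(int)
```

Training such a head against `tf.keras.losses.BinaryCrossentropy(from_logits=True)` on multi-hot targets is exactly the override described above.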

In other words, I don't think you're making any mistake here - you just have a slightly advanced use-case that requires a bit of custom loss computation rather than relying on the built-in losses, and that's okay!


3 participants