
When labels is integer instead of float, the training crashes in MPS / CUDA. Could this be validated and fail with a better error higher up the stack ? #27707

Closed
przem8k opened this issue Nov 26, 2023 · 5 comments

przem8k commented Nov 26, 2023

System Info

  • transformers version: 4.31.0
  • Platform: macOS-14.1.1-arm64-arm-64bit
  • Python version: 3.11.4
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0.dev20230804 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes (MPS)
  • Using distributed or parallel set-up in script?: No

Who can help?

@muellerzr @pacman100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import pandas as pd
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from io import StringIO
from transformers import TrainingArguments, Trainer

sample_data = """input,labels
bazinga,0
please-just-work,1
"""

df = pd.read_csv(StringIO(sample_data))
ds = Dataset.from_pandas(df)

model_name = "microsoft/deberta-v3-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenizer_func(x): return tokenizer(x["input"])
ds_tokenized = ds.map(tokenizer_func, batched=True)
dds = ds_tokenized.train_test_split(0.2, seed=42)

bs = 16
epochs = 4
lr = 8e-5

args = TrainingArguments(
    "outputs",
    learning_rate=lr,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    fp16=False,
    evaluation_strategy="epoch",
    per_device_train_batch_size=bs,
    per_device_eval_batch_size=bs * 2,
    num_train_epochs=epochs,
    weight_decay=0.01,
    report_to="none",
)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
trainer = Trainer(
    model,
    args,
    train_dataset=dds["train"],
    eval_dataset=dds["test"],
    tokenizer=tokenizer,
)

# This crashes
trainer.train()

Expected behavior

When running on an Apple Silicon Mac, the repro above crashes with the following MPS error:

2023-11-25 09:29:20.582 Python[15924:310024] Error getting visible function:
 (null) Function square_i64 was not found in the library
/AppleInternal/Library/BuildRoots/495c257e-668e-11ee-93ce-926038f30c31/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSKernelDAG.mm:805: failed assertion `Error getting visible function:

EDIT: When running on CUDA, the error is RuntimeError: "mse_cuda" not implemented for 'Long', see this notebook

After much head-banging, I realized that the issue is simply that in my sample data the labels are integers instead of floats.

If integer labels are not supported, could this be validated so that it fails with a better error higher up the stack?
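For anyone hitting this in the meantime, a minimal workaround sketch (assuming you keep `num_labels=1` and treat the task as regression) is to cast the labels column to float before building the `Dataset`:

```python
import pandas as pd
from io import StringIO

sample_data = """input,labels
bazinga,0
please-just-work,1
"""

df = pd.read_csv(StringIO(sample_data))
print(df["labels"].dtype)  # int64: pandas infers an integer dtype from the CSV

# Cast to float32 so the regression head's MSELoss receives float targets
df["labels"] = df["labels"].astype("float32")
print(df["labels"].dtype)  # float32
```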

Thanks a lot for all the awesome work on transformers 🥳, I'm having a lot of fun learning the library !

@ArthurZucker (Collaborator)

Hey! Thanks for raising the issue! 🤗
It's hard to add architecture-specific errors for something that works well everywhere else, especially since it usually makes more sense for labels to be ints rather than floats 😅 So I'm not really in favor of this.


przem8k commented Nov 28, 2023

Hi @ArthurZucker, thanks for taking a look! This actually fails on CUDA as well, resulting in:

RuntimeError: "mse_cuda" not implemented for 'Long'

See https://www.kaggle.com/przem8k/transformers-issue-27707-re-when-label-is-int

Given that it breaks on both Metal and CUDA I assumed it's not supported. Do you think the issue may be specific to the microsoft/deberta-v3-small model ?

@przem8k przem8k changed the title When labels is integer instead of float, the training crashes in MPS. Could this be validated and fail with a better error higher up the stack ? When labels is integer instead of float, the training crashes in MPS / CUDA. Could this be validated and fail with a better error higher up the stack ? Nov 28, 2023
@pacman100 (Contributor)

Hello @przem8k, let's take a step back and look at the loss function implemented in https://huggingface.co/microsoft/deberta-v3-small.

If we check the modeling file, we can see that it uses MSELoss when num_labels is 1, i.e. it treats the task as regression rather than classification. For regression tasks the labels are floats, and so are the predictions.
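In rough pseudocode, the dispatch works like this (paraphrased, not the exact transformers source; the real logic lives in each model's modeling file and keys off `config.problem_type`):

```python
import torch

def infer_problem_type(num_labels: int, labels: torch.Tensor) -> str:
    """Paraphrase of the loss dispatch used by sequence-classification
    heads when config.problem_type is unset."""
    if num_labels == 1:
        # Single output -> regression -> MSELoss, which expects float labels
        return "regression"
    if labels.dtype in (torch.long, torch.int):
        # Integer labels -> class indices -> CrossEntropyLoss
        return "single_label_classification"
    # Float labels with num_labels > 1 -> BCEWithLogitsLoss
    return "multi_label_classification"

int_labels = torch.tensor([0, 1])
print(infer_problem_type(1, int_labels))  # regression (the crashing configuration)
print(infer_problem_type(2, int_labels))  # single_label_classification
```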

Now, if you instead set num_labels=2 as shown below (which is the default, and which fits your use case of binary classification), training runs as expected because the model now uses CrossEntropyLoss, which accepts integer labels.

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
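A quick standalone check of why this fixes it: with two output logits per example, CrossEntropyLoss takes integer class indices directly (sketch with random logits standing in for the model's output):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 2)            # (batch, num_labels=2) from the classifier head
labels = torch.tensor([0, 1, 1, 0])   # integer class indices, exactly what CE expects

loss = nn.CrossEntropyLoss()(logits, labels)
print(f"cross-entropy loss: {loss.item():.4f}")
```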



This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this as completed Jan 4, 2024

przem8k commented Jan 21, 2024

@pacman100 that makes a lot of sense, thank you for taking the time to point this out!

I think what ultimately confused me was num_labels: I thought it was the number of labels applied to each example (here we only apply one label per example), but I now understand it's the number of distinct possible label values. I took some notes here.
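In code terms, the distinction is just the output width of the classification head (a sketch with a hypothetical hidden size):

```python
import torch.nn as nn

hidden_size = 768  # hypothetical encoder output width

# num_labels is the number of distinct classes the head can score,
# not the number of labels attached to each example
regression_head = nn.Linear(hidden_size, 1)   # num_labels=1 -> one float score
binary_head = nn.Linear(hidden_size, 2)       # num_labels=2 -> one logit per class

print(regression_head.out_features, binary_head.out_features)  # 1 2
```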

Thank you again!
