
When labels is integer instead of float, the training crashes in MPS / CUDA. Could this be validated and fail with a better error higher up the stack ? #27707

Closed
przem8k opened this issue Nov 26, 2023 · 5 comments

przem8k commented Nov 26, 2023

System Info

  • transformers version: 4.31.0
  • Platform: macOS-14.1.1-arm64-arm-64bit
  • Python version: 3.11.4
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0.dev20230804 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes (MPS)
  • Using distributed or parallel set-up in script?: No

Who can help?

@muellerzr @pacman100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import pandas as pd
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from io import StringIO
from transformers import TrainingArguments, Trainer

sample_data = """input,labels
bazinga,0
please-just-work,1
"""

df = pd.read_csv(StringIO(sample_data))
ds = Dataset.from_pandas(df)

model_name = "microsoft/deberta-v3-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenizer_func(x): return tokenizer(x["input"])
ds_tokenized = ds.map(tokenizer_func, batched=True)
dds = ds_tokenized.train_test_split(0.2, seed=42)

bs = 16
epochs = 4
lr = 8e-5

args = TrainingArguments(
    "outputs",
    learning_rate=lr,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    fp16=False,
    evaluation_strategy="epoch",
    per_device_train_batch_size=bs,
    per_device_eval_batch_size=bs * 2,
    num_train_epochs=epochs,
    weight_decay=0.01,
    report_to="none",
)

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
trainer = Trainer(
    model,
    args,
    train_dataset=dds["train"],
    eval_dataset=dds["test"],
    tokenizer=tokenizer,
)

# This crashes
trainer.train()

Expected behavior

When running on an Apple Silicon Mac, the repro above crashes with the following MPS error:

2023-11-25 09:29:20.582 Python[15924:310024] Error getting visible function:
 (null) Function square_i64 was not found in the library
/AppleInternal/Library/BuildRoots/495c257e-668e-11ee-93ce-926038f30c31/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Utility/MPSKernelDAG.mm:805: failed assertion `Error getting visible function:

EDIT: When running on CUDA, the error is RuntimeError: "mse_cuda" not implemented for 'Long', see this notebook

After much head-banging, I realized that the issue is simply that in my sample data the labels are integers instead of floats.

If integer labels are not supported, could this be validated so that it fails with a better error higher up the stack?
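For anyone hitting this in the meantime, a minimal workaround sketch (assuming you keep `num_labels=1` and treat the task as regression) is to cast the labels column to float before building the `Dataset`:

```python
import pandas as pd
from io import StringIO

sample_data = """input,labels
bazinga,0
please-just-work,1
"""

df = pd.read_csv(StringIO(sample_data))
print(df["labels"].dtype)  # int64: pandas infers an integer dtype from the CSV

# Cast to float32 so the regression head's MSELoss receives float targets
df["labels"] = df["labels"].astype("float32")
print(df["labels"].dtype)  # float32
```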

Thanks a lot for all the awesome work on transformers 🥳, I'm having a lot of fun learning the library !

@ArthurZucker (Collaborator)

Hey! Thanks for raising the issue! 🤗
It's hard to add architecture-specific errors for something that works well everywhere else, especially since it usually makes more sense for labels to be ints rather than floats 😅 So I'm not really in favor of this.


przem8k commented Nov 28, 2023

Hi @ArthurZucker, thanks for taking a look! This actually fails on CUDA as well, resulting in:

RuntimeError: "mse_cuda" not implemented for 'Long'

See https://www.kaggle.com/przem8k/transformers-issue-27707-re-when-label-is-int

Given that it breaks on both Metal and CUDA I assumed it's not supported. Do you think the issue may be specific to the microsoft/deberta-v3-small model ?

@przem8k przem8k changed the title When labels is integer instead of float, the training crashes in MPS. Could this be validated and fail with a better error higher up the stack ? When labels is integer instead of float, the training crashes in MPS / CUDA. Could this be validated and fail with a better error higher up the stack ? Nov 28, 2023
@pacman100 (Contributor)

Hello @przem8k, let's take a step back and look at the loss function implemented in https://huggingface.co/microsoft/deberta-v3-small.

If we check the modeling file, we can see that it uses MSELoss when num_labels is 1, i.e. it treats the task as regression rather than classification. For regression tasks the labels are floats, and so are the predictions.
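In rough pseudocode, the dispatch works like this (paraphrased, not the exact transformers source; the real logic lives in each model's modeling file and keys off `config.problem_type`):

```python
import torch

def infer_problem_type(num_labels: int, labels: torch.Tensor) -> str:
    """Paraphrase of the loss dispatch used by sequence-classification
    heads when config.problem_type is unset."""
    if num_labels == 1:
        # Single output -> regression -> MSELoss, which expects float labels
        return "regression"
    if labels.dtype in (torch.long, torch.int):
        # Integer labels -> class indices -> CrossEntropyLoss
        return "single_label_classification"
    # Float labels with num_labels > 1 -> BCEWithLogitsLoss
    return "multi_label_classification"

int_labels = torch.tensor([0, 1])
print(infer_problem_type(1, int_labels))  # regression (the crashing configuration)
print(infer_problem_type(2, int_labels))  # single_label_classification
```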

Now, if you instead set num_labels=2 as shown below (which is the default, and which fits your use case of binary classification), training runs as expected because the model now uses CrossEntropyLoss, which accepts integer labels.

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
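A quick standalone check of why this fixes it: with two output logits per example, CrossEntropyLoss takes integer class indices directly (sketch with random logits standing in for the model's output):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 2)            # (batch, num_labels=2) from the classifier head
labels = torch.tensor([0, 1, 1, 0])   # integer class indices, exactly what CE expects

loss = nn.CrossEntropyLoss()(logits, labels)
print(f"cross-entropy loss: {loss.item():.4f}")
```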



This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this as completed Jan 4, 2024

przem8k commented Jan 21, 2024

@pacman100 that makes a lot of sense, thank you for taking the time to point this out!

I think what ultimately confused me was num_labels: I thought it was the number of labels applied to each example (here we only apply one label per example), but I now understand it's the number of distinct possible label values. I took some notes here.
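In code terms, the distinction is just the output width of the classification head (a sketch with a hypothetical hidden size):

```python
import torch.nn as nn

hidden_size = 768  # hypothetical encoder output width

# num_labels is the number of distinct classes the head can score,
# not the number of labels attached to each example
regression_head = nn.Linear(hidden_size, 1)   # num_labels=1 -> one float score
binary_head = nn.Linear(hidden_size, 2)       # num_labels=2 -> one logit per class

print(regression_head.out_features, binary_head.out_features)  # 1 2
```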

Thank you again!
