When labels are integers instead of floats, training crashes on MPS / CUDA. Could this be validated and fail with a better error higher up the stack? #27707
Comments
Hey! Thanks for raising the issue! 🤗
Hi @ArthurZucker, thanks for taking a look! This actually does not work on CUDA either, resulting in:
See https://www.kaggle.com/przem8k/transformers-issue-27707-re-when-label-is-int Given that it breaks on both Metal and CUDA, I assumed it's not supported. Do you think the issue may be specific to the microsoft/deberta-v3-small model?
Hello @przem8k, let's take a step back here and understand the loss function implemented in https://huggingface.co/microsoft/deberta-v3-small. If we go to the modeling file and check the lines below:
We observe that it is using MSELoss as the [...]. Now, if you change the [...]
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@pacman100 that makes a lot of sense, thank you for taking the time to point this out! I think what ultimately confused me was 'num_labels': I thought it was the number of resulting labels (in this case we only apply one label), but I now understand it's the number of different possible label values. I took some notes here. Thank you again!
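To make the 'num_labels' point concrete, here is a minimal sketch of how a sequence-classification head might pick its loss. The helper name `pick_loss` is hypothetical and this is not the library's code; the only part confirmed in this thread is that a single output ends up on MSELoss. The branches for `num_labels > 1` are an assumption based on common PyTorch practice.

```python
def pick_loss(num_labels: int, labels_are_integers: bool) -> str:
    """Hypothetical helper: name the loss a classification head would typically use."""
    if num_labels < 1:
        raise ValueError("num_labels must be at least 1")
    if num_labels == 1:
        # One output value is treated as regression; MSE expects float targets.
        return "MSELoss"
    if labels_are_integers:
        # Assumption: integer class ids in [0, num_labels) mean one class per example.
        return "CrossEntropyLoss"
    # Assumption: float targets with several outputs mean multi-label classification.
    return "BCEWithLogitsLoss"


# Binary classification with labels 0/1 has two possible label values:
print(pick_loss(num_labels=2, labels_are_integers=True))
# A single output is regression, whatever the label dtype is:
print(pick_loss(num_labels=1, labels_are_integers=True))
```

Under these assumptions, integer 0/1 labels combined with `num_labels=1` land on the regression branch, which matches the "mse_cuda" error reported here.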
System Info
transformers version: 4.31.0
Who can help?
@muellerzr @pacman100
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
When running on an Apple Silicon Mac, the repro above crashes on MPS:
EDIT: When running on CUDA, the error is
RuntimeError: "mse_cuda" not implemented for 'Long'
(see this notebook).
After much head-banging I realized that the issue is just that in my sample data, labels are integers instead of floats.
If integer labels are not supported, could this be validated and fail with a better error higher up the stack?
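As an illustration of the kind of early check being asked for, here is a rough sketch in plain Python. The function `check_regression_labels` is hypothetical and not part of transformers; it assumes labels arrive as a plain list of Python numbers and that `num_labels == 1` means a regression loss expecting floats.

```python
from numbers import Integral, Real


def check_regression_labels(labels, num_labels):
    """Hypothetical pre-training check (not a transformers API)."""
    if num_labels != 1:
        # Only the single-output (regression) case is validated in this sketch.
        return list(labels)
    checked = []
    for i, label in enumerate(labels):
        # Note: bool is a subclass of int, so True/False are rejected here too.
        if isinstance(label, Integral):
            raise TypeError(
                f"labels[{i}] = {label!r} is an integer, but num_labels=1 "
                "selects a regression loss that expects float labels. "
                "Cast the labels to float, or set num_labels to the number "
                "of possible label values."
            )
        if not isinstance(label, Real):
            raise TypeError(f"labels[{i}] = {label!r} is not a number")
        checked.append(float(label))
    return checked


try:
    check_regression_labels([0, 1, 1], num_labels=1)
except TypeError as err:
    print(err)
```

A message like this points at the data instead of a backend kernel name such as "mse_cuda", which is the improvement this issue is asking about.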
Thanks a lot for all the awesome work on transformers 🥳, I'm having a lot of fun learning the library!