Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tensors to same device to enable IDEFICS naive MP training #27746

Merged
merged 1 commit into from
Dec 5, 2023

Conversation

willemsenbram
Copy link
Contributor

What does this PR do?

Fixes #27736

logits, labels, and attention_mask are expected to be on the same device.
This PR moves these tensors to the same device, making IDEFICS training compatible with naive model parallelism.

See also #22561

@amyeroberts @ArthurZucker @younesbelkada

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for adding this fix!

@ArthurZucker would setting the attentions here to the logits device cause any issues?

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks alright! Would be good to add a test for this IMO @amyeroberts but otherwise good to go

Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot !

@younesbelkada
Copy link
Contributor

For the test I suggest to do it in a follow-up PR to test that for all architectures, through a test that runs on the daily CI

@ArthurZucker ArthurZucker merged commit e5c12c0 into huggingface:main Dec 5, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RuntimeError(s) when attempting multi-GPU fine-tuning of IDEFICS with naive model parallelism
4 participants