[trainer] make generate work with multigpu #8716
Conversation
It did - thank you! This is a very ambiguous situation for a user who wants to use the HF Trainer in their code. Here are some possible solutions to resolve this ambiguity:
We can certainly improve the documentation and the debugging experience. I think I prefer solution 2, since 1 is too magic (so will probably make things harder to debug) and 3 is not compatible with the regular `Trainer` usage.
Did you mean to say "needs the wrapped model"? Unless I'm misreading what you wrote, the 3rd solution is the right one, since the Trainer doesn't do anything with the wrapped model. I don't know, though, whether this is so everywhere. The 4th solution is passing
Except it won't be wrapped per se most of the time - very confusing to the user. Currently it should be called
I meant the wrapped model, sorry.
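To make the wrapped-vs-unwrapped ambiguity discussed above concrete, here is a hypothetical, torch-free sketch (all class and variable names are illustrative, not the Trainer's actual API). The wrapper mimics `torch.nn.DataParallel`'s behavior: it forwards calls to `forward` but does not expose the model's other methods, such as `generate`.

```python
class Model:
    """Hypothetical model with a custom method beyond forward()."""
    def forward(self, x):
        return x * 2

    def generate(self, x):
        return self.forward(x) + 1


class Wrapper:
    """Mimics torch.nn.DataParallel: forwards __call__ to the wrapped
    module's forward(), but does not expose arbitrary attributes."""
    def __init__(self, module):
        self.module = module

    def __call__(self, x):
        return self.module.forward(x)


trainer_model = Model()           # like Trainer.model (always unwrapped)
model = Wrapper(trainer_model)    # like the possibly-wrapped model in the loop

print(model(3))                   # 6 -- forward works through the wrapper
# model.generate(3)               # AttributeError -- the wrapper has no `generate`
print(trainer_model.generate(3))  # 7 -- custom methods need the unwrapped reference
```

The user-facing confusion is exactly this: both references "are the model", but only one of them supports `generate`.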
I'm getting this issue too, using a T5 model on multiple GPUs.
Is this supposed to be resolved? I've never seen this before. I've tried with 4.10.0 as well as the current master branch.
Is it possible you somehow have a really old version installed? If not, as always, we need a way to reproduce the problem as the first step. And ideally in a new issue so that it can be tracked. But you can also see the fix in this PR and try to trace it through the current code. Thank you.
This PR:

Chances are that this would be the same problem with any other `model.foo` calls, as this is not the first time this is happening. i.e., the base model class most likely needs to be made aware of `DataParallel` and transparently get the underlying `model` at the calling point.

@sgugger, @LysandreJik, @patrickvonplaten

Fixes: #8713
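The root cause can be reproduced outside the Trainer. A minimal sketch, assuming PyTorch is installed (the toy model below is hypothetical, standing in for a transformers model that defines a custom `generate` method):

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for a model with a custom generate() method."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

    def generate(self, x):
        # custom method beyond forward(), like a transformers model's generate
        return self.forward(x).argmax(dim=-1)


model = ToyModel()
wrapped = nn.DataParallel(model)  # what the training loop does on multi-GPU

# DataParallel only dispatches forward(); custom methods are not visible:
assert not hasattr(wrapped, "generate")

# The original module is still reachable as `.module`, so unwrapping
# restores access to generate() -- the pattern this PR applies:
unwrapped = wrapped.module if hasattr(wrapped, "module") else wrapped
out = unwrapped.generate(torch.randn(2, 4))
assert out.shape == (2,)
```

Calling `wrapped.generate(...)` raises `AttributeError`, which is the failure reported in #8713; always unwrapping before calling model-specific methods avoids it.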