Expected to have finished reduction in the prior iteration before starting a new one. #24
Comments
Hi there! For your first problem, you have to set `find_unused_parameters=True` on the `DistributedDataParallel` wrapper, which you can do by passing a `DistributedDataParallelKwargs` handler when creating your `Accelerator` (see the sketch below).
This will still let your script run on one GPU/CPU (the flag is just ignored then) and, in distributed training, it should fix your first issue: that error only appears when some parameters receive no gradient during a step, which is exactly what this flag tells DDP to tolerate.
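A minimal sketch of that setup (this is standard Accelerate API; only the variable name is mine):

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# Ask DDP to tolerate parameters that receive no gradient in a given step,
# which is what triggers the "Expected to have finished reduction" error.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```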
For the second issue, the model is not the same object once you have passed it to `accelerator.prepare` (in multi-GPU training it comes back wrapped in `DistributedDataParallel`, which has no `generate` method), so replace your generate line with a call on the unwrapped model:
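For instance (the batch tensor names are placeholders for whatever your eval loop provides):

```python
# `model` returned by accelerator.prepare is wrapped in DistributedDataParallel
# under multi-GPU, and the wrapper has no `generate` method. Unwrap it first:
generated_tokens = accelerator.unwrap_model(model).generate(
    batch["input_ids"],
    attention_mask=batch["attention_mask"],
)
```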
@sgugger that worked, thanks!
@sgugger on a related note, I am using `accelerator.gather` to collect the predictions and labels from every process before computing metrics.
Is this the correct way to do it? Or should I specifically tell the metric to run in distributed mode?
This is the proper way indeed! You don't have to use the metric's distributed mode, since the gather already puts every prediction and label on all processes.
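For reference, the pattern looks like this (the metric choice is a placeholder; the nlp_example script uses a GLUE metric):

```python
from datasets import load_metric

metric = load_metric("accuracy")  # placeholder metric choice

# Inside the evaluation loop: gather predictions and labels from all processes
# so that every process scores the full evaluation set.
predictions = accelerator.gather(predictions)  # tensor of predicted ids
references = accelerator.gather(labels)        # tensor of gold labels
metric.add_batch(predictions=predictions, references=references)
```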
Cool! Lastly, I am saving the optimizer state, model, etc. with `torch.save`. Is there anything special to do for that in distributed training?
Yes, for the model you should first make sure every process has finished training with `accelerator.wait_for_everyone()`
(it should work without, but let's be cautious). Then you should unwrap the model to get the underlying model back, with `accelerator.unwrap_model(model)`.
Then pass along the unwrapped model's `state_dict` to `accelerator.save`, which only writes from the main process:
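Putting those three steps together (the checkpoint path is a placeholder):

```python
# 1. Make sure every process has finished training.
accelerator.wait_for_everyone()

# 2. Unwrap the model to recover the underlying transformers model.
unwrapped_model = accelerator.unwrap_model(model)

# 3. Save its state dict; accelerator.save only writes on the main process.
accelerator.save(unwrapped_model.state_dict(), "model_checkpoint.bin")
```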
Note that there are more examples using accelerate in the official Transformers examples if you want to see more use cases of the library :-)
That's great, thanks for all the awesome work you do!
(hi Rahul :)) @sgugger, is there a way to pass find_unused_parameters to accelerate just through the config file? |
No, only with the `kwargs_handlers` argument shown above; there is no config-file option for it.
@sgugger This answer really helped. |
@sgugger thank you so much! worked like magic! |
I have modified the `nlp_example` script to finetune an `EncoderDecoder` model on translation data, roughly along the lines of the sketch below. During training I get the error from the title, and generation then fails with a second error. Both work fine if I change the configuration to use only one GPU via `accelerate config`.
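A rough, self-contained sketch of the kind of modification described (the model choice, tokenization, and toy data are placeholders, not the original code):

```python
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator
from transformers import BertTokenizerFast, EncoderDecoderModel

accelerator = Accelerator()

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
# EncoderDecoder models need these set explicitly before training.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

def collate_fn(pairs):
    sources, targets = zip(*pairs)
    batch = tokenizer(list(sources), padding=True, return_tensors="pt")
    labels = tokenizer(list(targets), padding=True, return_tensors="pt").input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    batch["labels"] = labels
    return batch

# Toy source/target pairs standing in for the real translation dataset.
data = [("hello world", "hallo welt"), ("good morning", "guten morgen")] * 8
train_dataloader = DataLoader(data, batch_size=4, collate_fn=collate_fn)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Everything goes through prepare so the script runs on 1 or N GPUs.
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

model.train()
for batch in train_dataloader:
    outputs = model(**batch)
    accelerator.backward(outputs.loss)
    optimizer.step()
    optimizer.zero_grad()
```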