Distributed launch raises error #7160
Comments
Hey @xixiaoyao, could you please copy-paste the command you used to run SQuAD so that I can be 100% sure we are running the same command? How did you enable distributed training? It would be great if you could copy-paste a runnable code snippet here :-)
Getting the same issue when using Reformer with pytorch-lightning's DistributedDataParallel, although not using one of the official training scripts.
I am also getting this exact same error with Reformer, but only when I wrap it in DDP and train across multiple GPUs on the same box. I do not get this error with Longformer, and without DDP, Reformer works fine. I am doing vanilla autoregressive language-model training with a custom script, which works fine on a single GPU with no DDP. The error seems to indicate that there is something about Reformer which DDP does not yet support: "RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the […]". I am using transformers 3.5.1.
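For reference, here is a minimal sketch of the failing setup described above: a Reformer LM wrapped in DistributedDataParallel and trained on multiple GPUs of one node. This is an untested illustration, not the reporter's script; the file name `repro.py`, the tiny batch, and the config tweaks (`is_decoder=True`, `axial_pos_embds=False`, chosen only so a short sequence is accepted) are assumptions. Reformer's reversible layers run part of their backward pass inside a custom autograd function, which is the kind of re-entrant backward that DDP's "mark a variable ready only once" check is known to reject (the same check trips on gradient checkpointing).

```python
# Untested sketch of the reported failure: Reformer + DDP on one node.
# Launch with, e.g.:
#   python -m torch.distributed.launch --nproc_per_node=2 repro.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import ReformerConfig, ReformerModelWithLMHead

def main():
    # torch.distributed.launch sets RANK/WORLD_SIZE/MASTER_* for us.
    dist.init_process_group(backend="nccl")
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Hypothetical small config: is_decoder=True is required by the LM
    # head; axial position embeddings are disabled so short inputs work.
    config = ReformerConfig(is_decoder=True, axial_pos_embds=False)
    model = ReformerModelWithLMHead(config).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # While training, the sequence length must be a multiple of the
    # attention chunk lengths (64 by default), hence length 64 here.
    input_ids = torch.randint(
        0, config.vocab_size, (2, 64), device=torch.device("cuda", local_rank)
    )
    outputs = model(input_ids, labels=input_ids, return_dict=True)
    outputs.loss.backward()  # reportedly raises the DDP error here

if __name__ == "__main__":
    main()
```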
Same issue as @trias702.
This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions. If you think this still needs to be addressed, please comment on this thread.
Same issue as @trias702 and @yuanenming.
Environment info
- `transformers` version:

Who can help
Longformer/Reformer: @patrickvonplaten
Information
Model I am using: LongformerForQuestionAnswering
The problem arises when using:
The task I am working on is:
To reproduce
Steps to reproduce the behavior:
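The steps were left blank in the report. Based on the thread (LongformerForQuestionAnswering, SQuAD, the distributed launcher), a plausible reconstruction would look something like the command below; the data files, output directory, and number of GPUs are illustrative guesses, not taken from the original issue.

```bash
# Hypothetical command, not from the original report: run the SQuAD
# example on 2 GPUs of one machine via the distributed launcher.
python -m torch.distributed.launch --nproc_per_node=2 run_squad.py \
  --model_type longformer \
  --model_name_or_path allenai/longformer-base-4096 \
  --do_train \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --output_dir ./squad_out
```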
Expected behavior