Error when using gloo as DDP backend #30
Comments
@Saltychtao Hey! So I recently fixed a bug with distributed training (and yet another one this morning in regards to What version are you on? Could you retry on the latest version, and if that doesn't work, send me a script for reproducing the error?
@lucidrains Hello, I have tried the newest version of the code; however, it still hangs when using
I am using this library in
Hello! Thank you for your great work on implementing the VQ layer. When I use the VQ layer in DDP mode with `gloo` as the backend, as suggested in the README, I get the following error:

terminate called after throwing an instance of 'gloo::EnforceNotMet'
what(): [enforce fail at ../third_party/gloo/gloo/transport/tcp/pair.cc:510] op.preamble.length <= op.nbytes. 8773632 vs 8386560
Do you have any ideas on how to solve this problem?
I also tried `nccl` as the backend, but then the program just hangs forever.
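
For anyone trying to read the error above: the two numbers in the failed check `op.preamble.length <= op.nbytes` look like byte counts, with the incoming message (8773632) larger than the buffer the receiving rank had prepared (8386560). That pattern would be consistent with ranks passing differently sized tensors to a collective, though nothing in this thread confirms that as the cause. The snippet below is only a rough sketch for turning those byte counts into element counts. The float32 element size is an assumption, and the helper name `to_elements` is made up for illustration.

```python
# Byte counts copied verbatim from the gloo error message in this issue.
incoming_bytes = 8773632  # op.preamble.length: size of the message that arrived
expected_bytes = 8386560  # op.nbytes: size of the buffer the receiver prepared

# Assumption (not stated in the issue): the tensors are float32, 4 bytes each.
BYTES_PER_ELEMENT = 4


def to_elements(nbytes: int, itemsize: int = BYTES_PER_ELEMENT) -> int:
    """Convert a byte count to an element count (hypothetical helper).

    Raises ValueError if the byte count is not a whole number of elements,
    which would suggest the assumed element size is wrong.
    """
    if nbytes % itemsize != 0:
        raise ValueError(f"{nbytes} bytes is not a multiple of {itemsize}")
    return nbytes // itemsize


incoming_elems = to_elements(incoming_bytes)
expected_elems = to_elements(expected_bytes)
extra_elems = incoming_elems - expected_elems

print(f"sender tensor:   {incoming_elems} elements")
print(f"receiver buffer: {expected_elems} elements")
print(f"sender is larger by {extra_elems} elements")
```

If the element counts differ like this between ranks, logging the shape of each tensor on every rank just before the collective call would be one way to check whether a per-rank size mismatch is really what is happening.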