RuntimeError: Distributed package doesn't have NCCL built in (On Windows machine) #2

Closed
justinjohn0306 opened this issue Jan 17, 2023 · 4 comments


@justinjohn0306

Any ideas on how to fix this?

@leng-yue
Member

This is probably a PyTorch / Lightning DDP environment problem; I believe there are some solutions on Stack Overflow :)

leng-yue reopened this Jan 19, 2023
@leng-yue
Member

Please don't email me directly :) Keeping the discussion in GitHub issues helps others find and solve the same problem.

Original email: Windows doesn't have NCCL. If you can switch to gloo, it might do the trick, but I have no idea how to do that.

To use the gloo backend, edit the trainer config in `configs/_base_/trainers/base.py` and set:

```python
# In the trainer config: NCCL is not available on Windows, so use gloo instead
strategy=DDPStrategy(find_unused_parameters=True, process_group_backend="gloo"),
```

The `process_group_backend` argument is described in the PyTorch Lightning documentation for `DDPStrategy`.
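A minimal sketch, not part of this project: rather than hard-coding `"gloo"`, the backend could be picked at runtime so the same config works on both Windows and Linux. The helper names `nccl_is_available` and `choose_process_group_backend` are hypothetical; `torch.distributed.is_available()` and `torch.distributed.is_nccl_available()` are real PyTorch functions. The sketch assumes the only choice is between NCCL and gloo.

```python
import sys


def nccl_is_available() -> bool:
    """Return True only if PyTorch is installed and was built with NCCL."""
    try:
        import torch.distributed as dist
    except ImportError:
        # No PyTorch at all, so NCCL is certainly not usable.
        return False
    return dist.is_available() and dist.is_nccl_available()


def choose_process_group_backend(platform: str, nccl_available: bool) -> str:
    """Hypothetical helper: pick a torch.distributed backend name for DDP.

    Windows builds of PyTorch do not ship NCCL, so always fall back to gloo
    there; elsewhere use NCCL when the installed build supports it.
    """
    if platform.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


if __name__ == "__main__":
    backend = choose_process_group_backend(sys.platform, nccl_is_available())
    print(f"process_group_backend={backend!r}")
    # Assumed wiring into the trainer config (requires pytorch_lightning):
    # strategy=DDPStrategy(find_unused_parameters=True, process_group_backend=backend),
```

On Windows (`sys.platform == "win32"`) this always selects `"gloo"`, which matches the fix above; on a Linux machine with a CUDA build of PyTorch it would keep NCCL.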

@justinjohn0306
Author

Yes! That did the trick. Thanks :)

@leng-yue
Member

Sounds good :)
