All tensors must be on devices[0]: 0 #177
Comments
Hi, did you build colossalai from source?
yes, using
OK, let me try to reproduce this error. May I know which GPU you are using and how many GPUs are available on your machine?
Tesla V100, and 2 GPUs are available on my machine.
Got it, let me try to reproduce this issue. I will get back to you soon!
Hi, sorry for my late reply. We only have A100 machines, so it took a while to find a V100 machine. This bug can be reproduced on torch 1.8 but not on torch 1.10. It is caused by an optional argument in PyTorch's DistributedDataParallel and will be fixed in #194.
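For readers hitting the same message before the fix lands, here is a minimal sketch of a workaround. The thread does not say which optional argument is involved; this sketch assumes it is `device_ids` (together with `output_device`) and pins each process to its own GPU explicitly. The helper names `resolve_local_rank`, `ddp_kwargs`, and `wrap_model` are hypothetical and are not part of ColossalAI or the example script.

```python
import argparse
import os


def resolve_local_rank(argv=None):
    """Return the local rank handed over by torch.distributed.launch.

    The launcher passes --local_rank=<n> on the command line by default and
    sets the LOCAL_RANK environment variable when started with --use_env.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", type=int, default=None)
    args, _ = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    return int(os.environ.get("LOCAL_RANK", 0))


def ddp_kwargs(local_rank):
    """Build explicit device arguments for DistributedDataParallel.

    Assumption: the error comes from DDP's device check, so each process
    names the single GPU it owns instead of relying on the default.
    """
    return {"device_ids": [local_rank], "output_device": local_rank}


def wrap_model(model, local_rank):
    """Move the model to this process's GPU and wrap it in DDP.

    Requires CUDA and an already initialized process group
    (torch.distributed.init_process_group), so torch is imported lazily.
    """
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    torch.cuda.set_device(local_rank)
    model = model.to(f"cuda:{local_rank}")
    return DDP(model, **ddp_kwargs(local_rank))
```

In a script launched with `torch.distributed.launch`, one would call `wrap_model(model, resolve_local_rank())` after the process group has been initialized. Whether this matches the change made in #194 is not confirmed by the thread.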
🐛 Describe the bug
For https://github.com/hpcaitech/ColossalAI-Examples/tree/main/image/resnet, when running
python -m torch.distributed.launch --nproc_per_node 2 --master_addr localhost --master_port 29500 run_resnet_cifar10_with_engine.py
the following error is raised: All tensors must be on devices[0]: 0
Environment
torch=1.8.1