Mismatched ALLREDUCE CPU/GPU #748
Comments
@weiwenjiang, can you run |
Hi, I got the same error. Here is the dmesg log. |
Same issue :( |
@ssstuvz, can you check whether there are any errors in |
The same error, when I try to
When I run dmesg, the output is as follows:
I am trying to run across two computers with different NVIDIA GPUs: one has a 980 Ti (1 GPU), the other has 2080 Ti (2 GPUs). My friend runs on the 2080 Ti machine alone without any problem, but when I combine the different GPUs I get this error. Does Horovod not support mixing different NVIDIA GPUs? Any suggestions for solving this issue? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I have 4 nodes, each with 2 GPUs.
Nodes 1-3 work smoothly, and the 4th node works fine on its own with its 2 GPUs.
But when the 4th node runs together with the others, the job fails with the error "Mismatched ALLREDUCE CPU/GPU", as follows.
[1,1]:tensorflow.python.framework.errors_impl.FailedPreconditionError: Mismatched ALLREDUCE CPU/GPU device selection: One rank specified device GPU, but another rank specified device CPU.
[1,1]: [[{{node training/Adadelta/DistributedAdadelta_Allreduce/HorovodAllreduce_training_Adadelta_gradients_dense_2_MatMul_grad_MatMul_1_0}} = HorovodAllreduceT=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
I have tried the existing solutions, such as reinstalling tensorflow-gpu, Horovod, and NCCL, but the problem persists.
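To narrow down which rank placed the allreduce on the CPU, it may help to pull the rank and device out of the tagged log lines. The snippet below is only a rough sketch, not part of Horovod: it assumes the `[1,1]:` prefix is the mpirun `--tag-output` style `[jobid,rank]` tag (possibly followed by `<stdout>` or `<stderr>`), and that the `_device="..."` string in the node description reflects where that rank placed the op. The helper name `devices_by_rank` is made up for illustration.

```python
import re

# Assumed prefix format: "[jobid,rank]:" with an optional "<stdout>"/"<stderr>" tag.
TAG_RE = re.compile(r"^\[(\d+),(\d+)\](?:<\w+>)?:")
# TensorFlow device strings end in e.g. "device:CPU:0" or "device:GPU:0".
DEVICE_RE = re.compile(r'_device="[^"]*device:(CPU|GPU):\d+"')


def devices_by_rank(log_lines):
    """Map each rank to the set of device types named in its tagged log lines."""
    found = {}
    for line in log_lines:
        tag = TAG_RE.match(line)
        dev = DEVICE_RE.search(line)
        # Only lines that carry both a rank tag and a device string are counted.
        if tag and dev:
            rank = int(tag.group(2))
            found.setdefault(rank, set()).add(dev.group(1))
    return found


# The two log lines from the report above.
log = [
    "[1,1]:tensorflow.python.framework.errors_impl.FailedPreconditionError: "
    "Mismatched ALLREDUCE CPU/GPU device selection: One rank specified device GPU, "
    "but another rank specified device CPU.",
    "[1,1]: [[{{node training/Adadelta/DistributedAdadelta_Allreduce/"
    "HorovodAllreduce_training_Adadelta_gradients_dense_2_MatMul_grad_MatMul_1_0}} "
    '= HorovodAllreduceT=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]',
]

print(devices_by_rank(log))
```

If a rank shows up with `CPU` here while the others report `GPU`, that would suggest TensorFlow on that rank's host is not seeing its GPUs (for example a CPU-only TensorFlow build or a driver problem), which is worth checking before reinstalling Horovod or NCCL again.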