Closed
Description
Version
Microsoft Windows [Version 10.0.19044.1706]
WSL Version
- WSL 2
- WSL 1
Kernel Version
5.10.102.1
Distro Version
Ubuntu 20.04
Other Software
No response
Repro Steps
PyTorch 1.10.0 + CUDA 10.2 (from the WSL 2 kernel)
Expected Behavior
Multi-GPU training should work normally.
Actual Behavior
When training with PyTorch on multiple GPUs, it crashes after about 10-20 minutes.
I have tried two cases.
First, training on multiple GPUs with DDP in PyTorch.
Second, training on one GPU per program in PyTorch, with two programs open at the same time to train different models.
Both cases crash within 20 minutes.
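For anyone trying to reproduce the second case, here is a rough sketch of how the two programs could be started side by side. It assumes each program is pinned to its own GPU through `CUDA_VISIBLE_DEVICES`, which the report does not state explicitly. The helper names `gpu_env` and `launch` and the script names `train_model_a.py` and `train_model_b.py` are hypothetical placeholders, not the actual training code.

```python
import os
import subprocess
import sys


def gpu_env(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one GPU to a child process."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_VISIBLE_DEVICES limits which GPUs the CUDA runtime can see.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch(jobs):
    """Start one Python process per (script, gpu_index) pair and return the handles."""
    return [
        subprocess.Popen([sys.executable, script], env=gpu_env(gpu))
        for script, gpu in jobs
    ]


# Example usage (script names are hypothetical; substitute the real entry points):
#   procs = launch([("train_model_a.py", 0), ("train_model_b.py", 1)])
#   for p in procs:
#       print(p.args, "exited with", p.wait())
```

Recording the exit code of each process and the time until it exits may help confirm whether both programs fail together or one at a time.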
Diagnostic Logs
No response
jdonley