Description
🐛 Describe the bug
My workstation has an AMD Threadripper 3960X and NVIDIA 30-series GPUs, and runs Ubuntu 22.04.1.
Before I enabled AMD SEV, everything worked normally. I enabled AMD SEV as follows:
- Change the line <GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"> to <GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem_encrypt=on kvm_amd.sev=1"> in /etc/default/grub.
- Run 'sudo update-grub'.
- Reboot.
After the reboot, the ".to('cuda')" and ".cuda()" operations hang on PyTorch tensors. I tested in the terminal:
import torch
a = torch.rand(5).cuda()
and it hung (no response for a long time). Meanwhile, I found that GPU memory usage did not change and only one CPU core was busy.
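For anyone trying to reproduce this without losing their terminal, the sketch below runs the snippet above in a child process and reports a hang after a timeout. It is only an illustration: `run_with_timeout`, `REPRO`, and the timeout value are hypothetical names and choices that are not part of the original report, and a child stuck inside the driver might not be killable, in which case the helper itself could block.

```python
import subprocess
import sys

# The two-line reproduction from the report, joined into one command string.
REPRO = "import torch; a = torch.rand(5).cuda(); print(a)"


def run_with_timeout(code: str, timeout_s: float) -> str:
    """Run `code` in a fresh Python process.

    Returns 'ok' if it exits with status 0, 'error' if it exits with a
    non-zero status, and 'hang' if it does not finish within `timeout_s`.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "error"
```

Calling `run_with_timeout(REPRO, timeout_s=120.0)` would be expected to return "hang" with SEV enabled and "ok" with it disabled, if the behavior matches what is described here.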
After I disabled AMD SEV by restoring <GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"> in /etc/default/grub, the bug disappeared.
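To confirm which state the machine is actually in after each reboot, a small check like the following may help. It is a minimal sketch: `sev_flags` and `read_text_or_none` are hypothetical helpers, and the sysfs path for the `kvm_amd` module parameter is an assumption about this kernel rather than something verified in the report.

```python
from pathlib import Path


def sev_flags(cmdline: str) -> dict:
    """Pick the two boot parameters used in this report out of a kernel command line."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return {
        "mem_encrypt": params.get("mem_encrypt"),
        "kvm_amd.sev": params.get("kvm_amd.sev"),
    }


def read_text_or_none(path: str):
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    print("boot flags:", sev_flags(read_text_or_none("/proc/cmdline") or ""))
    # Assumed location of the kvm_amd 'sev' parameter; prints None if absent.
    print("kvm_amd sev:", read_text_or_none("/sys/module/kvm_amd/parameters/sev"))
```

Both values reading None would suggest the GRUB change did not take effect for the current boot.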
Versions
CPU: AMD Threadripper 3960X
OS: Ubuntu 22.04.1
Linux kernel: 5.15.0-52-generic
GCC version: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Tested driver: nvidia-driver-510, nvidia-driver-520
Tested PyTorch version: 1.12, 1.13 (latest)
Tested python version: 3.10.4, 3.10.6.
cc @ngimel @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport