
Moving tensor to GPU by .cuda() gets stuck when AMD Secure Encrypted Virtualization (SEV) is activated #88189

@chichidd

🐛 Describe the bug

My workstation is equipped with an AMD Threadripper 3960X and 30-series GPUs, running Ubuntu 22.04.1.

Before I activated AMD SEV, everything worked normally. I then activated AMD SEV as follows:

  1. Change the line <GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"> to <GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem_encrypt=on kvm_amd.sev=1"> in /etc/default/grub.
  2. Run 'sudo update-grub'.
  3. Reboot.
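
To confirm after the reboot that the settings from the steps above actually took effect, something like the following sketch may help. It assumes a Linux system where /proc/cmdline is readable and where the kvm_amd module exposes its `sev` parameter under /sys/module/kvm_amd/parameters/sev; the helper names (`cmdline_flags`, `read_text`, `sev_status`) are made up for this example.

```python
from pathlib import Path


def cmdline_flags(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    flags = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        flags[name] = value
    return flags


def read_text(path: str):
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def sev_status() -> dict:
    """Collect the SEV-related settings visible to the running kernel."""
    flags = cmdline_flags(read_text("/proc/cmdline") or "")
    return {
        "mem_encrypt (cmdline)": flags.get("mem_encrypt"),
        "kvm_amd.sev (cmdline)": flags.get("kvm_amd.sev"),
        # Assumed sysfs location of the kvm_amd module parameter.
        "kvm_amd sev (sysfs)": read_text("/sys/module/kvm_amd/parameters/sev"),
    }


if __name__ == "__main__":
    for key, value in sev_status().items():
        print(f"{key}: {value}")
```

A value of None means the flag is absent from the command line or the file could not be read, for example because the kvm_amd module is not loaded.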

With SEV enabled, the ".to('cuda')" and ".cuda()" operations hang for PyTorch tensors. I tested in the terminal:

import torch
a = torch.rand(5).cuda()

and it hung (no response for a long time). Meanwhile, I found that GPU memory usage did not change and only one CPU core was busy.
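
For anyone trying to reproduce this without losing their terminal, a possible approach is to run the snippet in a child process with a timeout. This is only a sketch: `run_with_timeout` and `REPRO` are names invented here, the 60-second limit is an arbitrary choice, and it assumes the hung process can still be killed (a process stuck inside the driver may not respond).

```python
import subprocess
import sys

# The two-line reproduction from the report, plus a print to confirm success.
REPRO = "import torch; a = torch.rand(5).cuda(); print(a.device)"


def run_with_timeout(code: str, timeout_s: float) -> str:
    """Run `code` in a fresh interpreter; return 'ok', 'error', or 'hang'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child did not finish in time; subprocess.run has killed it.
        return "hang"
    return "ok" if result.returncode == 0 else "error"
```

Calling `run_with_timeout(REPRO, timeout_s=60.0)` should return 'hang' on a machine showing the behavior described above, and 'ok' where the transfer completes normally.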

After I disabled AMD SEV by restoring <GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"> in /etc/default/grub, the bug disappeared.

Versions

CPU: AMD Threadripper 3960X
OS: Ubuntu 22.04.1
Linux kernel: 5.15.0-52-generic
GCC version: gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Tested driver: nvidia-driver-510, nvidia-driver-520
Tested PyTorch versions: 1.12, 1.13 (latest)
Tested Python versions: 3.10.4, 3.10.6

cc @ngimel @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport

Labels

module: cuda (Related to torch.cuda, and CUDA support in general)
module: rocm (AMD GPU support for Pytorch)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
