
torch.cuda.OutOfMemoryError to infer for a large protein #4

Closed
yingnan-hou opened this issue Apr 11, 2024 · 1 comment

Comments

@yingnan-hou

Hi, when I attempted to infer (not train) a protein with a sequence length of 525 using Str2Str, the following memory error occurred:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.19 GiB. GPU 0 has a total capacity of 44.40 GiB of which 4.20 GiB is free. Including non-PyTorch memory, this process has 40.20 GiB memory in use. Of the allocated memory 39.85 GiB is allocated by PyTorch, and 40.98 MiB is reserved by PyTorch but unallocated.
The code ran on an NVIDIA A40-Xeon-48GB GPU. Is there any way I can successfully infer this protein using Str2Str within the constraints of this computing resource? Looking forward to your reply. Thanks.

@lujiarui
Owner

Hi @yingnan-hou, for CUDA memory issues in general, you can use lower precision such as fp16 to run on the same device without changing any code. Also note that the inference script uses a default inference batch size larger than one (see this); if that applies to you, reduce the batch size. Hopefully this addresses your concern.
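For readers hitting the same error, the two suggestions above can be sketched in plain PyTorch. This is an illustrative snippet, not Str2Str's actual inference code: the `torch.nn.Linear` stand-in model, the batch size of 8, and the tensor shapes are all hypothetical placeholders, and the autocast pattern shown is the generic PyTorch mechanism for half-precision inference.

```python
import torch

# Hypothetical stand-in for the real model; Str2Str's module is not shown here.
model = torch.nn.Linear(64, 64)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Suggestion 1: use a smaller inference batch (8 here, purely illustrative).
x = torch.randn(8, 64, device=device)

# Suggestion 2: run the forward pass under autocast so matmuls execute in
# half precision (fp16 on CUDA, bf16 on CPU), roughly halving activation
# memory versus fp32. no_grad() also avoids storing the autograd graph.
dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.no_grad(), torch.autocast(device_type=device, dtype=dtype):
    y = model(x)

print(tuple(y.shape))
```

If the OOM persists, combining both changes (halving the batch size and running under autocast) compounds the savings, since activation memory scales linearly with batch size.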
