Description
In the example below, after calling torch.matmul the GPU memory usage increases by 181796864 bytes, which is almost the sum of the sizes of c and b.transpose(2, 3). So I guess the unreferenced intermediate result of b.transpose(2, 3) is kept in GPU memory. How can I release the GPU memory allocated to this intermediate result to save GPU memory?
```python
import torch
from torch.autograd import Variable

a = Variable(torch.rand(32, 8, 151, 1024), requires_grad=True).cuda()
b = Variable(torch.rand(32, 8, 151, 1024), requires_grad=True).cuda()
torch.cuda.memory_allocated(0)  # 316669952

c = torch.matmul(a, b.transpose(2, 3))
torch.cuda.memory_allocated(0)  # 498466816, increased by 181796864

c.element_size() * c.nelement()  # 23348224
b.transpose(2, 3).element_size() * b.transpose(2, 3).nelement()  # 158334976
```
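If the extra allocation is indeed a contiguous copy of b.transpose(2, 3) that matmul materializes internally (an assumption based on the byte counts above), two workarounds may help: create and drop the contiguous copy explicitly, or disable autograd when no backward pass is needed, so no intermediate is saved for gradients. A minimal sketch on small CPU tensors; on GPU you would additionally call torch.cuda.empty_cache() to hand cached blocks back to the driver:

```python
import torch

a = torch.rand(2, 3, 5, 7, requires_grad=True)
b = torch.rand(2, 3, 5, 7, requires_grad=True)

# Option 1: materialize the transposed copy yourself and drop the
# Python reference as soon as matmul is done. Note: if c requires
# grad, autograd keeps its own reference to bt for backward, so the
# memory is only actually freed after backward() or when c is freed.
bt = b.transpose(2, 3).contiguous()
c = torch.matmul(a, bt)
del bt
# torch.cuda.empty_cache()  # on GPU: return cached blocks to the driver

# Option 2: if no backward pass is needed, disable autograd so the
# intermediate is not retained for gradient computation at all.
with torch.no_grad():
    c2 = torch.matmul(a, b.transpose(2, 3))
```

Both paths produce the same result (c equals c2); the difference is only in what the allocator and autograd keep alive afterwards.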
Environment
- PyTorch Version (e.g., 1.0): 1.0.1
- OS (e.g., Linux): CentOS
- How you installed PyTorch (conda, pip, source): pip
- Build command you used (if compiling from source):
- Python version: 3.6.9
- CUDA/cuDNN version: CUDA 9.2 / cuDNN 7.4.2
- GPU models and configuration: NVIDIA 1080 Ti
- Any other relevant information:
cc @ngimel