Description
In the example below, after calling torch.matmul the GPU memory usage increases by 181796864 bytes, which is almost the sum of the sizes of c and b.transpose(2, 3). So I guess the unreferenced intermediate result of b.transpose(2, 3) is kept in GPU memory. How can I release the GPU memory allocated to this intermediate result to save GPU memory?
```python
import torch
from torch.autograd import Variable

a = Variable(torch.rand(32, 8, 151, 1024), requires_grad=True).cuda()
b = Variable(torch.rand(32, 8, 151, 1024), requires_grad=True).cuda()
torch.cuda.memory_allocated(0)  # 316669952

c = torch.matmul(a, b.transpose(2, 3))
torch.cuda.memory_allocated(0)  # 498466816, increased by 181796864

c.element_size() * c.nelement()  # 23348224
b.transpose(2, 3).element_size() * b.transpose(2, 3).nelement()  # 158334976
```
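If the extra allocation is indeed a contiguous copy of b.transpose(2, 3) that matmul materializes internally (an assumption based on the byte counts above), two workarounds may help: create and drop the contiguous copy explicitly, or disable autograd when no backward pass is needed, so no intermediate is saved for gradients. A minimal sketch on small CPU tensors; on GPU you would additionally call torch.cuda.empty_cache() to hand cached blocks back to the driver:

```python
import torch

a = torch.rand(2, 3, 5, 7, requires_grad=True)
b = torch.rand(2, 3, 5, 7, requires_grad=True)

# Option 1: materialize the transposed copy yourself and drop the
# Python reference as soon as matmul is done. Note: if c requires
# grad, autograd keeps its own reference to bt for backward, so the
# memory is only actually freed after backward() or when c is freed.
bt = b.transpose(2, 3).contiguous()
c = torch.matmul(a, bt)
del bt
# torch.cuda.empty_cache()  # on GPU: return cached blocks to the driver

# Option 2: if no backward pass is needed, disable autograd so the
# intermediate is not retained for gradient computation at all.
with torch.no_grad():
    c2 = torch.matmul(a, b.transpose(2, 3))
```

Both paths produce the same result (c equals c2); the difference is only in what the allocator and autograd keep alive afterwards.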
Environment
- PyTorch Version (e.g., 1.0): 1.0.1
- OS (e.g., Linux): CentOS
- How you installed PyTorch (conda, pip, source): pip
- Build command you used (if compiling from source):
- Python version: 3.6.9
- CUDA/cuDNN version: CUDA 9.2 / cuDNN 7.4.2
- GPU models and configuration: NVIDIA 1080 Ti
- Any other relevant information:
cc @ngimel