# Hugging Face Transformers Memory Leak Demo
This notebook demonstrates the possibility of GPU memory leaks when using Hugging Face's Transformers library.

In [None]:
import torch # PyTorch
import gc # Garbage Collector
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
HF_CACHE_LOCATION = os.getenv("HF_CACHE_DIR")  # None if unset; from_pretrained then uses the default cache

torch_device = "cuda" if torch.cuda.is_available() else "cpu"



Before performing inference, we can check GPU memory usage with the `nvidia-smi` command. Since nothing has been allocated on the GPU yet, memory usage should be at or near 0%, assuming no other process is using the GPU.

In [None]:
!nvidia-smi
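
If you prefer to stay in Python, `torch.cuda.mem_get_info()` returns the free and total bytes for the whole device, so, like `nvidia-smi`, its reading also includes other processes. The helper name `device_memory_used_mib` below is hypothetical. Note that calling `mem_get_info` will typically initialize CUDA in this process, so the reading includes the CUDA context's own overhead.

```python
import torch

def device_memory_used_mib(device: int = 0):
    """Device-wide used memory in MiB, or None when CUDA is unavailable.

    Like nvidia-smi, this counts memory used by every process on the device.
    Calling it typically initializes CUDA in this process.
    """
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return (total_bytes - free_bytes) / 2**20

print(f"Device memory in use: {device_memory_used_mib()} MiB")
```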

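We can also verify that PyTorch is not holding any GPU memory. `torch.cuda.memory_allocated()` reports the bytes occupied by live tensors, while `torch.cuda.memory_reserved()` reports the bytes held by PyTorch's caching allocator, which includes the allocated bytes.

In [None]:
print("CUDA memory allocated: " + str(torch.cuda.memory_allocated()))  # Expect zero
print("CUDA memory reserved:  " + str(torch.cuda.memory_reserved()))   # Expect zero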
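
To compare these numbers at several points in the notebook, a small helper can report both in MiB. The name `cuda_memory_mib` is hypothetical. A reserved value larger than the allocated value means the caching allocator is keeping freed blocks around for reuse; that memory still shows up in `nvidia-smi` until `torch.cuda.empty_cache()` releases it.

```python
import torch

def cuda_memory_mib() -> dict:
    """Snapshot of this process's PyTorch CUDA memory, in MiB.

    Both calls return 0 if CUDA has not been initialized (e.g. on CPU-only runs).
    """
    return {
        "allocated_mib": torch.cuda.memory_allocated() / 2**20,  # held by live tensors
        "reserved_mib": torch.cuda.memory_reserved() / 2**20,    # held by the caching allocator
    }

print(cuda_memory_mib())
```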

In [None]:
prompt = "Functional Programming is"
checkpoint = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(checkpoint, cache_dir=HF_CACHE_LOCATION).to(torch_device)
tokenizer = AutoTokenizer.from_pretrained(checkpoint, cache_dir=HF_CACHE_LOCATION)
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(torch_device)

# Inference only: no_grad() avoids building an autograd graph during generation
with torch.no_grad():
    output = model.generate(inputs, max_length=100)  # max_length includes the prompt tokens
    print(tokenizer.batch_decode(output, skip_special_tokens=True))
    del output

# Delete the objects created in this cell (torch_device is kept for later cells),
# collect any reference cycles, then release the allocator's unused cached blocks.
del prompt, checkpoint, model, tokenizer, inputs
gc.collect()
torch.cuda.empty_cache()
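
`del` only removes a name; a tensor's GPU memory is freed only once no reference to it remains anywhere. A common source of apparent leaks is a reference kept somewhere else, such as a list of saved outputs. The sketch below uses a plain tensor instead of a model to keep it quick; `leak_demo` and `saved_outputs` are hypothetical names. On a GPU, `memory_allocated()` stays high after `del` and `empty_cache()` until the hidden reference is dropped; on a CPU-only machine every reading is 0.

```python
import gc
import torch

def allocated_bytes() -> int:
    # Returns 0 if CUDA was never initialized (e.g. on a CPU-only machine)
    return torch.cuda.memory_allocated()

def leak_demo(n_mib: int = 64) -> tuple:
    """Return (bytes still allocated while a hidden reference exists,
    bytes still allocated after dropping it), both relative to a baseline."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    baseline = allocated_bytes()

    x = torch.empty(n_mib * 2**20, dtype=torch.uint8, device=device)
    saved_outputs = [x]  # hypothetical: results kept "for later inspection"

    del x                     # removes the name, not the tensor
    gc.collect()
    torch.cuda.empty_cache()  # cannot release blocks that are still in use
    held = allocated_bytes() - baseline

    saved_outputs.clear()     # drop the last reference
    gc.collect()
    torch.cuda.empty_cache()
    released = allocated_bytes() - baseline
    return held, released

held, released = leak_demo()
print(f"held after del: {held / 2**20:.0f} MiB, after clearing the reference: {released / 2**20:.0f} MiB")
```

Even after a full cleanup, `nvidia-smi` will still list the notebook's process, because the CUDA context itself occupies GPU memory until the Python process exits.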
