
Removing memory/deleting a model: how to properly do this #6753

Closed
2 of 4 tasks
yakazimir opened this issue Aug 26, 2020 · 5 comments

Comments

@yakazimir

yakazimir commented Aug 26, 2020

Environment info

  • transformers version: 2.11.0
  • Platform:
  • Python version: 3.6.7
  • PyTorch version (GPU?): 1.4.0
  • Tensorflow version (GPU?):
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help

@patrickvonplaten

Information

Model I am using (Bert, XLNet ...): T5-large, T5-3b, bert-base-uncased

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Load a model
  2. Try to remove it via del, then clear the GPU memory and cache
import gc

import torch
from transformers import AutoTokenizer, AutoModelWithLMHead

model = AutoModelWithLMHead.from_pretrained("t5-large")  # same behavior for `bert-base-uncased`, larger T5 models, ...
model = model.cuda()
model = model.train()

## delete the model and release the cached GPU memory
del model
torch.cuda.empty_cache()  # public API for torch._C._cuda_emptyCache()
## alternatively, empty the cache for a specific device
# with torch.cuda.device("cuda:0"):
#     torch.cuda.empty_cache()

## list all tensors that are still alive (as per the discussion here:
## https://discuss.pytorch.org/t/how-to-debug-causes-of-gpu-memory-leaks/6741/3)
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            print(type(obj), obj.size())
    except Exception:
        pass
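
A quick way to confirm whether the deletion actually freed anything is to compare the allocator counters before and after; a small sketch using standard PyTorch APIs (the setup is the same as above):

import gc

import torch

print(torch.cuda.memory_allocated())  # bytes currently held by live tensors
# ... del model, as above ...
gc.collect()                          # make sure the Python references are gone
torch.cuda.empty_cache()              # release cached blocks back to the driver
print(torch.cuda.memory_allocated())  # should drop back toward zero if nothing
                                      # else still references the parameters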

Expected behavior

I would expect this to clear the GPU memory, but the tensors still seem to linger. Fuller context: in a larger PyTorch-Lightning script, I'm simply trying to re-load the best model after training (and exiting the pl.Trainer) to run a final evaluation. The behavior seems the same as in this simple example, and I ultimately run out of memory when loading the best model because it is the absolutely massive T5-3b.

@patrickvonplaten
Contributor

I encountered similar problems with freeing GPU memory while implementing the benchmark tools. A trick that worked for me was to wrap the function in a separate process. Maybe you can take a look at this implementation and change your code accordingly so that the model is run in a subprocess:

def separate_process_wrapper_fn(func: Callable[[], None], do_multi_processing: bool) -> Callable[[], None]:
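
For illustration, here is a minimal sketch of the same idea (this is not the linked transformers implementation; the helper function, the queue, and the t5-small example are all just placeholders): load and run the model inside a spawned child process so that its GPU memory is released when the child exits.

import torch
import torch.multiprocessing as mp
from transformers import AutoTokenizer, AutoModelWithLMHead


def _generate_in_subprocess(queue, model_name, text):
    # Everything allocated here (weights, activations, the CUDA context
    # itself) exists only inside this child process.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelWithLMHead.from_pretrained(model_name).cuda().eval()
    input_ids = tokenizer.encode(text, return_tensors="pt").cuda()
    with torch.no_grad():
        output_ids = model.generate(input_ids)
    # Ship the result back on CPU; when this process exits, all of its
    # GPU memory goes with it.
    queue.put(output_ids.cpu())


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # CUDA requires the 'spawn' start method
    queue = ctx.Queue()
    proc = ctx.Process(
        target=_generate_in_subprocess,
        args=(queue, "t5-small", "translate English to German: How are you?"),
    )
    proc.start()
    result = queue.get()  # read the result before joining the child
    proc.join()
    print(result)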

@yakazimir
Author

yakazimir commented Sep 1, 2020

Thanks for getting back!

After investigating a bit further, my particular problems seem to be partly related to PyTorch-Lightning (specifically, to some eval code not properly detaching tensors), but this general piece of advice is good, since this seems to be a more general problem that I've seen in other contexts (as you mentioned). I will look more closely at running a separate process.

As a terrible hack (which probably shouldn't be repeated), I found that converting all models/tensors/training params/... to CPU, then deleting them and applying manual garbage collection, fixed my issue.
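
For what it's worth, a minimal sketch of that hack (the model and optimizer here are just placeholders, and this is not a recommended general solution):

import gc

import torch
from transformers import AutoModelWithLMHead

model = AutoModelWithLMHead.from_pretrained("bert-base-uncased").cuda()
optimizer = torch.optim.Adam(model.parameters())

# ... training / evaluation would happen here ...

model.cpu()               # nn.Module.cpu() moves the parameters off the GPU in place
del model, optimizer      # drop every remaining reference to the CUDA tensors
gc.collect()              # collect the now-unreferenced tensors
torch.cuda.empty_cache()  # hand the cached blocks back to the CUDA driver

print(torch.cuda.memory_allocated())  # should now be (close to) zero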

@jeanmonet

jeanmonet commented Apr 21, 2021

I encountered similar problems with freeing GPU memory while implementing the benchmark tools. A trick that worked for me was to wrap the function in a separate process. Maybe you can take a look at this implementation and change your code accordingly so that the model is run in a subprocess:

def separate_process_wrapper_fn(func: Callable[[], None], do_multi_processing: bool) -> Callable[[], None]:

@patrickvonplaten have you run into the following error using this method?

Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I tried setting the start method as follows, with no success:

import multiprocessing as mp
mp.set_start_method('spawn')
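
In case it helps: that error usually appears when CUDA has already been initialized in the parent process before the child is created, or when set_start_method() is called too late (it has to run once, under if __name__ == "__main__":, before any CUDA work and before any process is started). A sketch that sidesteps the global setting by using an explicit spawn context (the worker function and device id are just examples):

import torch.multiprocessing as mp


def worker(device_id):
    # CUDA is touched only here, inside the spawned child.
    import torch
    torch.cuda.set_device(device_id)
    print(torch.cuda.get_device_name(device_id))


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # per-context start method, no global call needed
    p = ctx.Process(target=worker, args=(0,))
    p.start()
    p.join()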

@iammeizu

I met the same problem. Any updates?

@junjingfn

Very useful!! Thank you so much for sharing your solution!
