Clearing GPU memory in Keras #12625

Closed
sepehrghafari opened this issue Apr 5, 2019 · 5 comments


@sepehrghafari

commented Apr 5, 2019

About 80% of my GPU memory fills up after loading the pre-trained Xception model, but after deleting the model the memory is not released or flushed.
I've tried K.clear_session(), gc.collect(), tf.reset_default_graph(), and del model, but none of them worked; GPU properties still report 85% of memory in use.

Nothing flushes GPU memory except numba.cuda.close(), but that won't let me use the GPU again afterwards. The only way to clear it is to restart the kernel and rerun my code.

I'm looking for a snippet I can add to my code so that I can train in a for loop and clear the GPU on every iteration.
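
For reference, this is roughly the cleanup sequence I tried, written out (TF 1.x with standalone Keras; model is the Xception model from the snippet below); none of it released the memory:

import gc
import tensorflow as tf
from keras import backend as K

del model                  # drop the Python reference to the model
K.clear_session()          # destroy the Keras graph and its session
tf.reset_default_graph()   # reset TensorFlow's default graph
gc.collect()               # force Python garbage collection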

Part of my code :

from keras.layers import Input
from keras.applications.xception import Xception

image_input = Input(shape=(224, 224, 3))
base_model = Xception(input_tensor=image_input, include_top=False, weights='imagenet')
base_model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
hist = base_model.fit(X, Y, epochs=2)  # X, Y defined elsewhere

System information

OS Platform: Windows 10 64-bit
TensorFlow installed from: conda install tensorflow-gpu
TensorFlow version: 1.3
Python version: 3.6
CUDA/cuDNN version: 9.2
GPU model and memory: Asus GTX 1060, 6 GB
@nateraw

commented Apr 9, 2019

This function looks promising, stolen from fastai forums

from keras.backend.tensorflow_backend import set_session
from keras.backend.tensorflow_backend import clear_session
from keras.backend.tensorflow_backend import get_session
import gc
import tensorflow

# Reset Keras Session
def reset_keras():
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier  # this is from global space - change this as you need
    except NameError:
        pass

    print(gc.collect())  # if it's done something you should see a number as output

    # use the same config as you used to create the session
    config = tensorflow.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tensorflow.Session(config=config))
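
A minimal usage sketch for the training-loop case (build_model and the data names here are placeholders, adapt them to your code):

for i in range(num_runs):
    reset_keras()                          # free the previous graph/session first
    model = build_model()                  # placeholder: construct a fresh model
    model.fit(X_train, y_train, epochs=2)
    model.save('model_%d.h5' % i)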
@sepehrghafari

Author

commented Apr 11, 2019

@nateraw
Thank you very much!
This is the only script that lets me rerun a training session without restarting. I no longer have to restart the kernel for every run, and I can now use a for loop to train multiple times.

@Moondra


commented Jun 19, 2019

(quoting @nateraw's reset_keras() function above)

I keep getting "CUDA_ERROR_OUT_OF_MEMORY" when running the above function.
The only thing that clears up my memory is restarting my computer.
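
I suspect the per_process_gpu_memory_fraction = 1 line may be part of the problem in my case: it asks TensorFlow to reserve essentially the whole GPU up front, so if anything else is still holding memory, the new session fails with CUDA_ERROR_OUT_OF_MEMORY. A variant I'm going to try, which only grows the allocation on demand:

import tensorflow
from keras.backend.tensorflow_backend import set_session

config = tensorflow.ConfigProto()
config.gpu_options.allow_growth = True        # allocate GPU memory as needed
config.gpu_options.visible_device_list = "0"  # restrict to GPU 0
set_session(tensorflow.Session(config=config))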

@ambigus9


commented Sep 6, 2019

(quoting @nateraw's reset_keras() function above)

Thanks! Works perfectly!

@cpoptic


commented Oct 2, 2019

Running the above reset_keras() function still throws an OOM error on Ubuntu with 16 GB of GPU memory.

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

TensorFlow installed from: conda install tensorflow-gpu
TensorFlow version: 1.14
Python version: 3.6
CUDA/cuDNN version: 10.0.168
GPU model and memory: Tesla V100-PCIE-16GB, 16 GB

Same when I try running:

import tensorflow as tf
from tensorflow.keras import backend as K

# close the current default session, if any
curr_session = tf.get_default_session()
if curr_session is not None:
    curr_session.close()
# reset the Keras graph and session
K.clear_session()
# create a new session and register it with Keras
s = tf.InteractiveSession()
K.set_session(s)

I find it fascinating that the TensorFlow team has not provided a straightforward way to release GPU memory from a session. So much of TF is broken by little annoyances like this: a user reasonably expects TF to release CUDA memory rather than leak it, yet there appears to be no explicit way to do so. Even K.clear_session() doesn't work. That expectation is not unreasonable. Maybe some of the blame belongs with NVIDIA, as even the following code doesn't clear the memory:

from numba import cuda
cuda.select_device(0)  # select GPU 0
cuda.close()           # tear down the CUDA context (the GPU can't be reused in-process afterwards)

After several hours of scouring StackOverflow and the GitHub issues, and trying the above approaches (none of which worked for some reason), I'm left with the decidedly inelegant approach of restarting the entire kernel. Frustrating.
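
The one reliable workaround I can sketch (assuming your training step can be wrapped in a function; train_once below is a placeholder, not from this thread) is to run each training job in a child process. The CUDA context dies with the process, so the memory is guaranteed to come back:

import multiprocessing as mp

def train_once(run_id):
    # import TF/Keras *inside* the worker so the CUDA context is
    # created, used, and destroyed entirely within the child process
    from tensorflow.keras import layers, models
    model = models.Sequential([layers.Dense(1, input_shape=(10,))])
    model.compile(loss='mse', optimizer='adam')
    # ... fit / evaluate / save here ...
    model.save('model_%d.h5' % run_id)

if __name__ == '__main__':
    ctx = mp.get_context('spawn')  # 'spawn' avoids forking a live CUDA context
    for i in range(5):
        p = ctx.Process(target=train_once, args=(i,))
        p.start()
        p.join()                   # GPU memory is fully released when the child exits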
