GLM keeps crashing when running a loop. GPU memory gets full #175

Closed
khevn opened this issue Sep 14, 2017 · 6 comments

khevn (Contributor) commented Sep 14, 2017

Kernel died after a few hundred iterations with wide data:

import random
import time

import numpy as np
import h2o4gpu


def lasso_search(m, n, s, verbose=True):
    # generate data with s coefficients forced to zero
    X = np.random.uniform(-100, 100, size=(m, n))
    coefs = np.random.randn(n)
    const_coef = np.random.randn(1)

    # indices of the sparse (zero) coefficients
    zero_coef_loc = random.sample(range(n), s)
    coefs[zero_coef_loc] = 0

    y = np.dot(X, coefs) + const_coef

    start = time.time()
    lasso = h2o4gpu.Lasso()
    lasso_model = lasso.fit(X, y)
    print('time to train:', time.time() - start)

    # indices of coefficients the Lasso fit drove to zero
    zero_coef_index = np.where(lasso_model.X[0] == 0)

    if verbose:
        print('original coeffs:', coefs)
        print('Lasso coefficients:', lasso_model.X[0])
        print(const_coef)
        print(zero_coef_loc)
        print(zero_coef_index[0])

    # sr = fraction of the true zero coefficients that were recovered
    if np.array_equiv(zero_coef_index[0], np.sort(zero_coef_loc)):
        sr = 1
    else:
        print(zero_coef_index[0], s)
        sr = 1.0 * zero_coef_index[0].shape[0] / s

    return sr


m_list = []
s_list = []
sr_list = []

n = 1000
m_ratio = 2
s_ratios = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

m = int(m_ratio * n)
for s_ratio in s_ratios:
    s = int(s_ratio * n)
    s_list.append(s)
    for i in range(100):
        print(i, m, n, s)
        sr = lasso_search(m, n, s, verbose=False)
        sr_list.append(sr)
@khevn khevn changed the title GLM keeps crashing when running with Lasso penalty in a loop GLM keeps crashing when running a loop. GPU memory gets full Sep 21, 2017
khevn (Contributor, Author) commented Sep 21, 2017

The kernel crash is due to GPU memory being full. After a model is trained, it is not released from GPU memory, so memory fills up very quickly when training multiple models in a loop.

Also, trying del model in Python doesn't free the memory.
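
A minimal sketch of that failed cleanup attempt, assuming X and y are generated as in the reproduction script above (the shapes are illustrative):

import gc

import numpy as np
import h2o4gpu

X = np.random.uniform(-100, 100, size=(2000, 1000))
y = X.dot(np.random.randn(1000))

for i in range(100):
    lasso_model = h2o4gpu.Lasso().fit(X, y)
    # drop every Python reference and force a garbage collection
    del lasso_model
    gc.collect()
    # GPU memory reported by nvidia-smi still grows with each iteration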

mdymczyk (Contributor) commented:

Tried this yesterday and noticed the same problem (though with notebooks, so memory was only increasing by ~2M with each fit). The finish methods (which are supposed to free up all the pointers) also didn't seem to work.

navdeep-G added the glm label Mar 10, 2018
rockNroll87q commented Oct 31, 2018

Any solution to this?
Still present in 0.3.0.9999, with Ridge regression. Python 3.6 crashes with the error:
Cuda failure /root/repo/src/gpu/matrix/matrix_dense.cu:1954 'out of memory'

pseudotensor (Collaborator) commented:
#204

It's a long-standing bug. I recommend using xgboost's GLM.
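
A minimal sketch of that suggestion using xgboost's gblinear booster with an L1 penalty in place of h2o4gpu.Lasso (parameter values are illustrative, not tuned):

import numpy as np
import xgboost as xgb

X = np.random.uniform(-100, 100, size=(2000, 1000))
y = X.dot(np.random.randn(1000))

# linear booster with an L1 penalty (reg_alpha) gives a Lasso-style fit
model = xgb.XGBRegressor(booster='gblinear', reg_alpha=1.0, reg_lambda=0.0)
model.fit(X, y)
print(model.coef_)  # linear coefficients, analogous to lasso_model.X[0]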

socathie commented Dec 11, 2018

This bug is extremely inconvenient. My Jupyter notebook crashes every few iterations, and I was unable to finish the job.

My current workaround is the following:
from numba import cuda
and after each iteration, do:
cuda.close()
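
A minimal sketch of that workaround applied to the loop from the reproduction script (lasso_search as defined above; the argument values are illustrative):

from numba import cuda

sr_list = []
for i in range(100):
    sr = lasso_search(2000, 1000, 200, verbose=False)
    sr_list.append(sr)
    # tear down the CUDA context after each fit so the leaked GPU memory is released
    cuda.close()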

sh1ng (Contributor) commented Feb 22, 2019

@socathie Could you try the bleeding-edge version?
