GLM keeps crashing when running a loop. GPU memory gets full #175

Closed
khevn opened this issue Sep 14, 2017 · 6 comments

khevn (Contributor) commented Sep 14, 2017

Kernel died after a few hundred iterations with wide data:

import random
import time

import numpy as np
import h2o4gpu


def lasso_search(m, n, s, verbose=True):
    # generate data with s coefficients forced to zero
    X = np.random.uniform(-100, 100, size=(m, n))
    coefs = np.random.randn(n)
    const_coef = np.random.randn(1)

    # indices of the sparse (zero) coefficients
    zero_coef_loc = random.sample(range(n), s)
    coefs[zero_coef_loc] = 0

    y = np.dot(X, coefs) + const_coef

    start = time.time()
    lasso = h2o4gpu.Lasso()
    lasso_model = lasso.fit(X, y)
    print('time to train:', time.time() - start)

    # indices of coefficients the Lasso fit drove to zero
    zero_coef_index = np.where(lasso_model.X[0] == 0)

    if verbose:
        print('original coeffs:', coefs)
        print('Lasso coefficients:', lasso_model.X[0])
        print(const_coef)
        print(zero_coef_loc)
        print(zero_coef_index[0])

    # sr = fraction of the true zero coefficients that were recovered
    if np.array_equiv(zero_coef_index[0], np.sort(zero_coef_loc)):
        sr = 1
    else:
        print(zero_coef_index[0], s)
        sr = 1.0 * zero_coef_index[0].shape[0] / s

    return sr


m_list = []
s_list = []
sr_list = []

n = 1000
m_ratio = 2
s_ratios = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

m = int(m_ratio * n)
for s_ratio in s_ratios:
    s = int(s_ratio * n)
    s_list.append(s)
    for i in range(100):
        print(i, m, n, s)
        sr = lasso_search(m, n, s, verbose=False)
        sr_list.append(sr)
@khevn khevn changed the title GLM keeps crashing when running with Lasso penalty in a loop GLM keeps crashing when running a loop. GPU memory gets full Sep 21, 2017
khevn (Contributor, Author) commented Sep 21, 2017

The kernel crash is due to GPU memory being full. After a model is trained, it is not released from GPU memory, so memory fills up very quickly when training multiple models in a loop.

Also, trying del model in Python doesn't free the memory.
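
A minimal sketch of that failed cleanup attempt, assuming X and y are generated as in the reproduction script above (the shapes are illustrative):

import gc

import numpy as np
import h2o4gpu

X = np.random.uniform(-100, 100, size=(2000, 1000))
y = X.dot(np.random.randn(1000))

for i in range(100):
    lasso_model = h2o4gpu.Lasso().fit(X, y)
    # drop every Python reference and force a garbage collection
    del lasso_model
    gc.collect()
    # GPU memory reported by nvidia-smi still grows with each iteration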

mdymczyk (Contributor) commented:

Tried this yesterday and noticed the same problem (though with notebooks, so memory was only increasing by ~2M with each fit). The finish methods (which are supposed to free up all the pointers) also didn't seem to work.

navdeep-G added the glm label Mar 10, 2018
rockNroll87q commented Oct 31, 2018

Any solution to this?
Still present in 0.3.0.9999, with Ridge regression. Python 3.6 crashes with the error:
Cuda failure /root/repo/src/gpu/matrix/matrix_dense.cu:1954 'out of memory'

pseudotensor (Collaborator) commented:
#204

It's a long-standing bug. I recommend using xgboost's GLM.
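
A minimal sketch of that suggestion using xgboost's gblinear booster with an L1 penalty in place of h2o4gpu.Lasso (parameter values are illustrative, not tuned):

import numpy as np
import xgboost as xgb

X = np.random.uniform(-100, 100, size=(2000, 1000))
y = X.dot(np.random.randn(1000))

# linear booster with an L1 penalty (reg_alpha) gives a Lasso-style fit
model = xgb.XGBRegressor(booster='gblinear', reg_alpha=1.0, reg_lambda=0.0)
model.fit(X, y)
print(model.coef_)  # linear coefficients, analogous to lasso_model.X[0]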

socathie commented Dec 11, 2018

This bug is extremely inconvenient. My Jupyter notebook crashes every few iterations, and I was unable to finish the job.

My current workaround is the following:
from numba import cuda
and after each iteration, do:
cuda.close()
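
A minimal sketch of that workaround applied to the loop from the reproduction script (lasso_search as defined above; the argument values are illustrative):

from numba import cuda

sr_list = []
for i in range(100):
    sr = lasso_search(2000, 1000, 200, verbose=False)
    sr_list.append(sr)
    # tear down the CUDA context after each fit so the leaked GPU memory is released
    cuda.close()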

sh1ng (Contributor) commented Feb 22, 2019

@socathie Could you try the bleeding-edge version?
