New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory issue of _class_cov #10898

bobchennan opened this Issue Mar 31, 2018 · 5 comments


None yet
5 participants

bobchennan commented Mar 31, 2018

When I was running linear discriminant training, I noticed very high memory usage.
In my case training set consists of 500k samples from 20k classes.
Feature dimension is less than 400.
Memory usage reached more than 100GB.

After debugging I found the problem:

In implementation:

    covs = []
    for group in classes:
        Xg = X[y == group, :]
        covs.append(np.atleast_2d(_cov(Xg, shrinkage)))
    return np.average(covs, axis=0, weights=priors)

The problem is that covariance matrices are stored in a list, which is already not necessary since we only need average of those matrices. After that np.average is called which will copy to convert the list to numpy.array. Both these factors dramatically increase memory usage.

I think we can simply take the sum in the loop if precision is not a problem.


This comment has been minimized.


jnothman commented Apr 2, 2018

I think you're right, and don't think that precision was the intention here. _class_means uses a similar idiom in a way that wastes memory (but much less so). PR welcome.


This comment has been minimized.


julietcl commented Apr 2, 2018

I would like to claim this issue.


This comment has been minimized.


bobchennan commented Apr 2, 2018

@julietcl sorry I just saw your comment. Finish the code already.


This comment has been minimized.

nsorros commented Apr 5, 2018

@julietcl @jnothman @bobchennan is that issue taken? If not, I would like to claim it as my first issue.


This comment has been minimized.


qinhanmin2014 commented Apr 5, 2018

It has been taken, see the referenced PR above (#10904)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment