Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory issue of _class_cov #10898

Closed
bobchennan opened this issue Mar 31, 2018 · 5 comments
Closed

memory issue of _class_cov #10898

bobchennan opened this issue Mar 31, 2018 · 5 comments
Labels
Easy Well-defined and straightforward way to resolve Enhancement help wanted

Comments

@bobchennan
Copy link
Contributor

When I was running linear discriminant training, I noticed very high memory usage.
In my case training set consists of 500k samples from 20k classes.
Feature dimension is less than 400.
Memory usage reached more than 100GB.

After debugging I found the problem:

In implementation:

    covs = []
    for group in classes:
        Xg = X[y == group, :]
        covs.append(np.atleast_2d(_cov(Xg, shrinkage)))
    return np.average(covs, axis=0, weights=priors)

The problem is that covariance matrices are stored in a list, which is already not necessary since we only need average of those matrices. After that np.average is called which will copy to convert the list to numpy.array. Both these factors dramatically increase memory usage.

I think we can simply take the sum in the loop if precision is not a problem.

@jnothman
Copy link
Member

jnothman commented Apr 2, 2018

I think you're right, and don't think that precision was the intention here. _class_means uses a similar idiom in a way that wastes memory (but much less so). PR welcome.

@jnothman jnothman added Easy Well-defined and straightforward way to resolve Enhancement help wanted labels Apr 2, 2018
@julietcl
Copy link
Contributor

julietcl commented Apr 2, 2018

I would like to claim this issue.

@bobchennan
Copy link
Contributor Author

@julietcl sorry I just saw your comment. Finish the code already.

@nsorros
Copy link

nsorros commented Apr 5, 2018

@julietcl @jnothman @bobchennan is that issue taken? If not, I would like to claim it as my first issue.

@qinhanmin2014
Copy link
Member

It has been taken, see the referenced PR above (#10904)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Easy Well-defined and straightforward way to resolve Enhancement help wanted
Projects
None yet
Development

No branches or pull requests

5 participants