Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

memory issue of _class_cov #10898

Closed
bobchennan opened this issue Mar 31, 2018 · 5 comments
Closed

memory issue of _class_cov #10898

bobchennan opened this issue Mar 31, 2018 · 5 comments

Comments

@bobchennan
Copy link
Contributor

@bobchennan bobchennan commented Mar 31, 2018

When I was running linear discriminant training, I noticed very high memory usage.
In my case training set consists of 500k samples from 20k classes.
Feature dimension is less than 400.
Memory usage reached more than 100GB.

After debugging I found the problem:

In implementation:

    covs = []
    for group in classes:
        Xg = X[y == group, :]
        covs.append(np.atleast_2d(_cov(Xg, shrinkage)))
    return np.average(covs, axis=0, weights=priors)

The problem is that covariance matrices are stored in a list, which is already not necessary since we only need average of those matrices. After that np.average is called which will copy to convert the list to numpy.array. Both these factors dramatically increase memory usage.

I think we can simply take the sum in the loop if precision is not a problem.

@jnothman
Copy link
Member

@jnothman jnothman commented Apr 2, 2018

I think you're right, and don't think that precision was the intention here. _class_means uses a similar idiom in a way that wastes memory (but much less so). PR welcome.

@julietcl
Copy link
Contributor

@julietcl julietcl commented Apr 2, 2018

I would like to claim this issue.

@bobchennan
Copy link
Contributor Author

@bobchennan bobchennan commented Apr 2, 2018

@julietcl sorry I just saw your comment. Finish the code already.

@nsorros
Copy link

@nsorros nsorros commented Apr 5, 2018

@julietcl @jnothman @bobchennan is that issue taken? If not, I would like to claim it as my first issue.

@qinhanmin2014
Copy link
Member

@qinhanmin2014 qinhanmin2014 commented Apr 5, 2018

It has been taken, see the referenced PR above (#10904)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants