memory issue of _class_cov #10898

bobchennan · 2018-03-31T19:43:19Z

When I was running linear discriminant analysis training, I noticed very high memory usage.
In my case the training set consists of 500k samples from 20k classes.
The feature dimension is less than 400.
Memory usage reached more than 100 GB.

After debugging, I found the problem in the implementation of _class_cov:

    covs = []
    for group in classes:
        Xg = X[y == group, :]
        covs.append(np.atleast_2d(_cov(Xg, shrinkage)))
    return np.average(covs, axis=0, weights=priors)

The problem is that the covariance matrices are stored in a list, which is unnecessary since we only need their weighted average. np.average is then called on that list, which converts it to a numpy.array and makes another copy. Together these two factors dramatically increase memory usage.

I think we can simply accumulate the weighted sum inside the loop, if precision is not a concern.
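A minimal sketch of that idea, not the code that was eventually merged. The function name `class_cov_running_sum` is hypothetical, and `np.cov` stands in for the library's private `_cov(Xg, shrinkage)` helper, so shrinkage is not handled here. It keeps one accumulator matrix instead of a list of per-class matrices, and divides by the sum of the weights so the result matches what `np.average(..., weights=priors)` returns.

```python
import numpy as np


def class_cov_running_sum(X, y, priors):
    """Weighted average of per-class covariances using a single accumulator.

    Hypothetical illustration: np.cov replaces the private _cov helper,
    so no shrinkage is applied.
    """
    classes = np.unique(y)
    n_features = X.shape[1]
    # One (n_features, n_features) buffer, reused for every class.
    cov = np.zeros((n_features, n_features))
    for group, prior in zip(classes, priors):
        Xg = X[y == group, :]
        # Add this class's weighted covariance; nothing is kept per class.
        cov += prior * np.atleast_2d(np.cov(Xg, rowvar=False, bias=True))
    # np.average normalizes by the sum of the weights; do the same here.
    return cov / np.sum(priors)
```

For scale: a 400 x 400 float64 matrix takes 400 * 400 * 8 bytes = 1.28 MB, so a list of 20k such matrices is up to about 25.6 GB before np.average copies it into an array. The accumulator above needs only one such matrix plus the temporary for the current class.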


jnothman · 2018-04-02T12:37:55Z

I think you're right, and don't think that precision was the intention here. _class_means uses a similar idiom in a way that wastes memory (but much less so). PR welcome.

julietcl · 2018-04-02T15:32:42Z

I would like to claim this issue.

bobchennan · 2018-04-02T15:44:42Z

@julietcl Sorry, I just saw your comment. I have already finished the code.

nsorros · 2018-04-05T08:52:05Z

@julietcl @jnothman @bobchennan Is this issue taken? If not, I would like to claim it as my first issue.

qinhanmin2014 · 2018-04-05T09:45:05Z

It has been taken, see the referenced PR above (#10904)

jnothman added the Easy, Enhancement, and help wanted labels on Apr 2, 2018

bobchennan mentioned this issue on Apr 2, 2018 in "[MRG+1] Update discriminant analysis code for better memory usage" #10904 (merged)

TomDLT closed this as completed in #10904 Apr 25, 2018
