
Randomized PCA.transform uses a lot of RAM #11102

Closed
vortexdev opened this issue May 16, 2018 · 8 comments · Fixed by #22553
Comments


vortexdev commented May 16, 2018

Description

Randomized sklearn.decomposition.PCA uses about 2 * n_samples * n_features memory (RAM), including the input samples themselves, while fbpca (https://github.com/facebook/fbpca) uses about half as much.

Is this expected behaviour?
(I understand that the sklearn version computes more things, such as explained_variance_.)

Steps/Code to Reproduce

sklearn version:

import numpy as np
from sklearn.decomposition import PCA

samples = np.random.random((20000, 16384))
pca = PCA(copy=False, n_components=128, svd_solver='randomized', iterated_power=4)
pca.fit_transform(samples)

fbpca version:

import numpy as np
import fbpca

samples = np.random.random((20000, 16384))
(U, s, Va) = fbpca.pca(samples, k=128, n_iter=4)

Expected Results

Randomized sklearn.decomposition.PCA uses about n_samples*n_features + n_samples*n_components + <variance matrices etc.> memory (RAM).
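As a rough back-of-the-envelope check for the reproduction script above (assuming float64 throughout; the numbers are purely arithmetic, not measurements):

n_samples, n_features, n_components = 20_000, 16_384, 128
data      = n_samples * n_features   * 8   # ~2.62e9 bytes, i.e. roughly 2.6 GB for the input
projected = n_samples * n_components * 8   # ~2.1e7 bytes, i.e. roughly 0.02 GB for the projection
print(f"expected peak: ~{(data + projected) / 1e9:.2f} GB plus smaller buffers")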

Actual Results

Randomized sklearn.decomposition.PCA uses about 2 * n_samples * n_features memory (RAM).
We see memory peaks at the transform step.
[Figure: pca_memory_test — memory usage over time, generated with memory_profiler and gnuplot]
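For reference, the peak can also be captured programmatically; below is a minimal sketch using memory_profiler's memory_usage helper (assuming memory_profiler is installed; the factor in the comment reflects the behaviour reported above, not a guaranteed number):

from memory_profiler import memory_usage

import numpy as np
from sklearn.decomposition import PCA

def run():
    samples = np.random.random((20000, 16384))
    pca = PCA(copy=False, n_components=128, svd_solver='randomized', iterated_power=4)
    pca.fit_transform(samples)

# Sample the process memory every 0.1 s while run() executes.
usage = memory_usage((run, (), {}), interval=0.1)
print(f"peak memory: {max(usage):.0f} MiB")  # roughly twice the ~2.4 GiB input on affected versions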

Versions

Darwin-17.4.0-x86_64-i386-64bit
Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:04:09)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.14.3
SciPy 1.1.0
Scikit-Learn 0.19.1
(tested on different Linux machines as well)

P.S.

We are trying to perform PCA on large matrices (2M x 16K, ~110 GB). IncrementalPCA is very slow for us. Randomized PCA is very fast, but we are trying to reduce memory consumption so that we can use cheaper instances.
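For reference, the IncrementalPCA route mentioned above would look roughly like the sketch below; the file name, dtype, and batch size are illustrative only, not from this report:

import numpy as np
from sklearn.decomposition import IncrementalPCA

# Hypothetical on-disk matrix of shape (2_000_000, 16_384); adjust path and dtype as needed.
X = np.memmap("samples.f32", dtype=np.float32, mode="r", shape=(2_000_000, 16_384))

ipca = IncrementalPCA(n_components=128, batch_size=10_000)
for start in range(0, X.shape[0], ipca.batch_size):
    # Each partial_fit call only loads one batch into memory.
    ipca.partial_fit(X[start:start + ipca.batch_size])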

Thank you.

@vortexdev vortexdev changed the title Randomized PCA uses a lot of RAM Randomized PCA.transform uses a lot of RAM May 16, 2018
agramfort commented May 17, 2018 via email

Micky774 commented Feb 14, 2022

Current situation

After looking into this for a while, I believe the extra memory footprint is generated by PCA calculating and storing

total_var = np.var(X, ddof=1, axis=0)

Now, two observations:

  1. We only ever use total_var.sum(), never the full total_var array itself.
  2. That is the last time X is used in that method.

(The short sketch below illustrates the allocation pattern of this np.var call.)
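As a rough illustration (this mimics what np.var conceptually does, not its exact implementation), centering the data materializes a second full-size temporary before squaring and reducing:

import numpy as np

X = np.random.random((20000, 16384))           # ~2.4 GiB of float64
centered = X - X.mean(axis=0)                  # second ~2.4 GiB temporary
total_var = np.square(centered).sum(axis=0) / (X.shape[0] - 1)
print(total_var.sum())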

Potential solution

So I wonder if we could replace np.var(X, ddof=1, axis=0).sum() with a method that uses X in place for the intermediate calculations and then frees X after computing the scalar equivalent of total_var.sum(). I'm not sure how to approach this when copy=False, though.

Edit: the current semantics are that when a user sets copy=False they accept that the underlying array may be mutated for memory efficiency, so this solution works for both copy=True and copy=False.

Example

import numpy as np

def in_place_var(X):
    # Compute np.var(X, ddof=1, axis=0).sum() without allocating a second
    # array the size of X.  Note: X is overwritten in the process.
    N = X.shape[0] - 1
    X -= X.mean(axis=0)            # center in place
    np.square(X, out=X)            # square in place
    np.sum(X, axis=0, out=X[0])    # column sums accumulated into the first row
    X = X[0]                       # keep only that row
    X /= N
    out = np.sum(X)
    del X                          # drop the local view; the caller still owns the array
    return out
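A quick sanity check of the sketch above, reusing the in_place_var defined just there (a copy is passed because the function overwrites its input; the shapes and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
for _ in range(3):
    X = rng.random((500, 300))
    expected = np.var(X, ddof=1, axis=0).sum()
    result = in_place_var(X.copy())   # copy, because the input is mutated
    assert np.isclose(expected, result), (expected, result)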

And when replacing

# Get variance explained by singular values
self.explained_variance_ = (S ** 2) / (n_samples - 1)
total_var = np.var(X, ddof=1, axis=0)
self.explained_variance_ratio_ = self.explained_variance_ / total_var.sum()
self.singular_values_ = S.copy()  # Store the singular values.
if self.n_components_ < min(n_features, n_samples):
    self.noise_variance_ = total_var.sum() - self.explained_variance_.sum()
    self.noise_variance_ /= min(n_features, n_samples) - n_components
else:
    self.noise_variance_ = 0.0

with

# Get variance explained by singular values
self.explained_variance_ = (S ** 2) / (n_samples - 1)

# Use in-place calculation if self.copy
total_var = in_place_var(X) if self.copy else np.var(X, ddof=1, axis=0).sum()
self.explained_variance_ratio_ = self.explained_variance_ / total_var
self.singular_values_ = S.copy()  # Store the singular values.

if self.n_components_ < min(n_features, n_samples):
    self.noise_variance_ = total_var - self.explained_variance_.sum()
    self.noise_variance_ /= min(n_features, n_samples) - n_components
else:
    self.noise_variance_ = 0.0

the memory profile over time goes from
[Figure: regular_var — memory profile with the current np.var-based calculation]
to
[Figure: in_place_var — memory profile with the in-place variance calculation]

Disclaimer

Obviously I have put no thought into error handling or checking for correctness/precision; this is strictly meant as a demonstration, since there are surely better ways of handling it. In my manual tests over several random initializations of samples = np.random.random((20000, 16384)), the two approaches gave the same results, but this is by no means exhaustive.

Thoughts? @thomasjpfan

agramfort commented Feb 14, 2022 via email

@Micky774 (Contributor) replied:

> what is the shape of X you used here?

X.shape = (num_samples, num_features)

agramfort commented Feb 14, 2022 via email

@Micky774 (Contributor) replied:

This was something I only observed for very large sample sizes and feature counts. Specifically, I was testing around (20000, 16384) as originally suggested, but observed this even for sample sizes and feature counts an order of magnitude smaller. I haven't spent much time checking the memory profile of many different shape configurations, though I do note that the effect doesn't seem as apparent or strong in smaller cases.

agramfort commented Feb 15, 2022 via email

@Micky774 (Contributor) replied:

import numpy as np
from sklearn.decomposition import PCA

samples = np.random.random((20000, 16384))
pca = PCA(copy=False, n_components=128, svd_solver='randomized', iterated_power=4)
pca.fit_transform(samples)

I'm using Scalene for memory profiling, although my graphs were generated using memory_profiler.
Between trials, I change the code in _pca.py to reflect the changes proposed in my comments above.
