
Randomized PCA.transform uses a lot of RAM #11102

Closed
vortexdev opened this issue May 16, 2018 · 8 comments · Fixed by #22553
Comments


vortexdev commented May 16, 2018

Description

Randomized sklearn.decomposition.PCA uses about 2 * n_samples * n_features memory (RAM), including the input samples themselves, while fbpca (https://github.com/facebook/fbpca) uses about half as much.

Is this expected behaviour?
(I understand that the sklearn version computes more things, such as explained_variance_.)

Steps/Code to Reproduce

sklearn version:

import numpy as np
from sklearn.decomposition import PCA

samples = np.random.random((20000, 16384))
pca = PCA(copy=False, n_components=128, svd_solver='randomized', iterated_power=4)
pca.fit_transform(samples)

fbpca version:

import numpy as np
import fbpca

samples = np.random.random((20000, 16384))
(U, s, Va) = fbpca.pca(samples, k=128, n_iter=4)

Expected Results

Randomized sklearn.decomposition.PCA uses about n_samples*n_features + n_samples*n_components + <variance matrices etc.> memory (RAM).
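As a rough back-of-the-envelope check for the reproduction script above (assuming float64 throughout; the numbers are purely arithmetic, not measurements):

n_samples, n_features, n_components = 20_000, 16_384, 128
data      = n_samples * n_features   * 8   # ~2.62e9 bytes, i.e. roughly 2.6 GB for the input
projected = n_samples * n_components * 8   # ~2.1e7 bytes, i.e. roughly 0.02 GB for the projection
print(f"expected peak: ~{(data + projected) / 1e9:.2f} GB plus smaller buffers")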

Actual Results

Randomized sklearn.decomposition.PCA uses about 2 * n_samples * n_features memory (RAM).
We see memory peaks at the transform step.
[Figure: pca_memory_test — memory usage over time, generated with memory_profiler and gnuplot]
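For reference, the peak can also be captured programmatically; below is a minimal sketch using memory_profiler's memory_usage helper (assuming memory_profiler is installed; the factor in the comment reflects the behaviour reported above, not a guaranteed number):

from memory_profiler import memory_usage

import numpy as np
from sklearn.decomposition import PCA

def run():
    samples = np.random.random((20000, 16384))
    pca = PCA(copy=False, n_components=128, svd_solver='randomized', iterated_power=4)
    pca.fit_transform(samples)

# Sample the process memory every 0.1 s while run() executes.
usage = memory_usage((run, (), {}), interval=0.1)
print(f"peak memory: {max(usage):.0f} MiB")  # roughly twice the ~2.4 GiB input on affected versions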

Versions

Darwin-17.4.0-x86_64-i386-64bit
Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:04:09)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.14.3
SciPy 1.1.0
Scikit-Learn 0.19.1
(tested on different Linux machines as well)

P.S.

We are trying to perform PCA on large matrices (2M x 16K, ~110 GB). IncrementalPCA is very slow for us. Randomized PCA is very fast, but we are trying to reduce memory consumption so that we can use cheaper instances.
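For reference, the IncrementalPCA route mentioned above would look roughly like the sketch below; the file name, dtype, and batch size are illustrative only, not from this report:

import numpy as np
from sklearn.decomposition import IncrementalPCA

# Hypothetical on-disk matrix of shape (2_000_000, 16_384); adjust path and dtype as needed.
X = np.memmap("samples.f32", dtype=np.float32, mode="r", shape=(2_000_000, 16_384))

ipca = IncrementalPCA(n_components=128, batch_size=10_000)
for start in range(0, X.shape[0], ipca.batch_size):
    # Each partial_fit call only loads one batch into memory.
    ipca.partial_fit(X[start:start + ipca.batch_size])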

Thank you.

@vortexdev vortexdev changed the title Randomized PCA uses a lot of RAM Randomized PCA.transform uses a lot of RAM May 16, 2018
agramfort commented May 17, 2018 via email

Micky774 commented Feb 14, 2022

Current situation

After looking into this for a while, I believe the extra memory footprint is generated by PCA calculating and storing

total_var = np.var(X, ddof=1, axis=0)

Now, two observations:

  1. We only ever use total_var.sum(), never the full total_var array itself.
  2. That is the last time X is used in that method.

(The short sketch below illustrates the allocation pattern of this np.var call.)
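As a rough illustration (this mimics what np.var conceptually does, not its exact implementation), centering the data materializes a second full-size temporary before squaring and reducing:

import numpy as np

X = np.random.random((20000, 16384))           # ~2.4 GiB of float64
centered = X - X.mean(axis=0)                  # second ~2.4 GiB temporary
total_var = np.square(centered).sum(axis=0) / (X.shape[0] - 1)
print(total_var.sum())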

Potential solution

So I wonder if we could replace np.var(X, ddof=1, axis=0).sum() with a method that uses X in place for the intermediate calculations and then frees X after computing the scalar equivalent of total_var.sum(). I'm not sure how to approach this when copy=False, though.

Edit: the current semantics are that when a user sets copy=False they accept that the underlying array may be mutated for memory efficiency, so this solution works for both copy=True and copy=False.

Example

import numpy as np

def in_place_var(X):
    # Compute np.var(X, ddof=1, axis=0).sum() without allocating a second
    # array the size of X.  Note: X is overwritten in the process.
    N = X.shape[0] - 1
    X -= X.mean(axis=0)            # center in place
    np.square(X, out=X)            # square in place
    np.sum(X, axis=0, out=X[0])    # column sums accumulated into the first row
    X = X[0]                       # keep only that row
    X /= N
    out = np.sum(X)
    del X                          # drop the local view; the caller still owns the array
    return out
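A quick sanity check of the sketch above, reusing the in_place_var defined just there (a copy is passed because the function overwrites its input; the shapes and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
for _ in range(3):
    X = rng.random((500, 300))
    expected = np.var(X, ddof=1, axis=0).sum()
    result = in_place_var(X.copy())   # copy, because the input is mutated
    assert np.isclose(expected, result), (expected, result)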

And when replacing

# Get variance explained by singular values
self.explained_variance_ = (S ** 2) / (n_samples - 1)
total_var = np.var(X, ddof=1, axis=0)
self.explained_variance_ratio_ = self.explained_variance_ / total_var.sum()
self.singular_values_ = S.copy()  # Store the singular values.
if self.n_components_ < min(n_features, n_samples):
    self.noise_variance_ = total_var.sum() - self.explained_variance_.sum()
    self.noise_variance_ /= min(n_features, n_samples) - n_components
else:
    self.noise_variance_ = 0.0

with

# Get variance explained by singular values
self.explained_variance_ = (S ** 2) / (n_samples - 1)

# Use in-place calculation if self.copy
total_var = in_place_var(X) if self.copy else np.var(X, ddof=1, axis=0).sum()
self.explained_variance_ratio_ = self.explained_variance_ / total_var
self.singular_values_ = S.copy()  # Store the singular values.

if self.n_components_ < min(n_features, n_samples):
    self.noise_variance_ = total_var - self.explained_variance_.sum()
    self.noise_variance_ /= min(n_features, n_samples) - n_components
else:
    self.noise_variance_ = 0.0

the memory profile over time goes from
[Figure: regular_var — memory profile with the current np.var-based calculation]
to
[Figure: in_place_var — memory profile with the in-place variance calculation]

Disclaimer

Obviously I have put no thought into error handling or checking for correctness/precision; this is strictly meant as a demonstration, since there are surely better ways of handling it. In my manual tests over several random initializations of samples = np.random.random((20000, 16384)), the two approaches gave the same results, but this is by no means exhaustive.

Thoughts? @thomasjpfan

agramfort commented Feb 14, 2022 via email

@Micky774 (Contributor) replied:

> what is the shape of X you used here?

X.shape = (num_samples, num_features)

agramfort commented Feb 14, 2022 via email

@Micky774 (Contributor) replied:

This was something I only observed for very large sample sizes and feature counts. Specifically, I was testing around (20000, 16384) as originally suggested, but observed this even for sample sizes and feature counts an order of magnitude smaller. I haven't spent much time checking the memory profile of many different shape configurations, though I do note that the effect doesn't seem as apparent or strong in smaller cases.

agramfort commented Feb 15, 2022 via email

@Micky774 (Contributor) replied:

import numpy as np
from sklearn.decomposition import PCA

samples = np.random.random((20000, 16384))
pca = PCA(copy=False, n_components=128, svd_solver='randomized', iterated_power=4)
pca.fit_transform(samples)

I'm using Scalene for memory profiling, although my graphs were generated using memory_profiler.
Between trials, I change the code in _pca.py to reflect the changes proposed in my comments above.
