[MRG+1] Made PCA expose the singular values #7685

Merged
merged 28 commits on Oct 30, 2016

Changes shown from 16 of 28 commits

Commits
c637fb0
Manually resolved merge conflict
Jan 24, 2013
8897ab5
Added sparse NIPALS
Jan 24, 2013
d2f913f
Added sparse PCA (L1 penalised)
Jan 25, 2013
4687649
Work in progress. Added SVD and PLS-R.
Jan 28, 2013
ef16c4c
Work in progress. Updated PLS-R and added soft thresholding to it
Jan 29, 2013
2324939
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn
Jan 29, 2013
e807663
Work in progress: Updated PLS-R.
Jan 29, 2013
23c701a
Work in progress: Added several unit tests.
Jan 30, 2013
510a9a1
Work in progress: Rescue-save.
Jan 31, 2013
c11f175
Work in progress: Rescue-save.
Jan 31, 2013
5b4c183
Work in progress.
Jan 31, 2013
948f19a
Work in progress.
Feb 5, 2013
fd6c962
Merge.
tomlof Oct 17, 2016
38d7e1d
Merge.
tomlof Oct 17, 2016
e3acdbf
ENH: Added the singular values to PCA by a singular_values_ instance …
tomlof Oct 17, 2016
51a71f0
DOC: Updates as per PR review.
tomlof Oct 18, 2016
e9fe5e7
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
tomlof Oct 25, 2016
c4a4943
TEST: Added unit tests for PCA, IncrementalPCA and TruncatedSVD.
tomlof Oct 25, 2016
44ab311
BUG: Removed the use of new features from numpy.
tomlof Oct 26, 2016
15816fe
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
tomlof Oct 26, 2016
566f49d
TEST: Reduced the error thresholds in PCA singular value tests.
tomlof Oct 26, 2016
d3df7ed
TEST: Reduced the error thresholds in PCA singular value tests.
tomlof Oct 26, 2016
ce17751
MAINT: PEP8 compliance.
tomlof Oct 26, 2016
279fd60
MAINT: Merge.
tomlof Oct 29, 2016
9fc3420
DOC: Updated whats_new.rst to include news in PCA classes.
tomlof Oct 29, 2016
ae86e2f
TEST: Fixed doctests for truncated PCA.
tomlof Oct 29, 2016
2b731f0
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
tomlof Oct 30, 2016
b5a6356
DOC: Fixed typo in whats_new.rst.
tomlof Oct 30, 2016
8 changes: 7 additions & 1 deletion sklearn/decomposition/incremental_pca.py
@@ -71,7 +71,12 @@ class IncrementalPCA(_BasePCA):
explained_variance_ratio_ : array, shape (n_components,)
Percentage of variance explained by each of the selected components.
If all components are stored, the sum of explained variances is equal
to 1.0
to 1.0.

singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the ``n_components``
variables in the lower-dimensional space.

mean_ : array, shape (n_features,)
Per-feature empirical mean, aggregate over calls to ``partial_fit``.
@@ -166,6 +171,7 @@ def fit(self, X, y=None):
self.singular_values_ = None
self.explained_variance_ = None
self.explained_variance_ratio_ = None
self.singular_values_ = None
Member:
Is there any reason to add this here?

Contributor Author:
Not really, but the other attributes were set there, so I thought it would be good for consistency to have the singular values be set there as well.

self.noise_variance_ = None

X = check_array(X, copy=self.copy, dtype=[np.float64, np.float32])
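Note on the new docstring above: the claim that the singular values equal the 2-norms of the ``n_components`` variables in the lower-dimensional space can be checked directly. A minimal sketch, assuming synthetic data and a ``batch_size`` pinned to the full sample count so the incremental fit reduces to one exact SVD pass:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.randn(100, 5)

# A single batch covering all samples makes the incremental fit exact.
ipca = IncrementalPCA(n_components=2, batch_size=X.shape[0])
X_t = ipca.fit_transform(X)

# Each singular value equals the 2-norm of the corresponding projected
# variable (column) of the transformed data.
print(np.allclose(ipca.singular_values_, np.linalg.norm(X_t, axis=0)))  # True
```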
58 changes: 41 additions & 17 deletions sklearn/decomposition/pca.py
@@ -186,23 +186,28 @@ class PCA(_BasePCA):

Attributes
----------
components_ : array, [n_components, n_features]
components_ : array, shape (n_components, n_features)
Principal axes in feature space, representing the directions of
maximum variance in the data. The components are sorted by
``explained_variance_``.

explained_variance_ : array, [n_components]
explained_variance_ : array, shape (n_components,)
The amount of variance explained by each of the selected components.

.. versionadded:: 0.18

explained_variance_ratio_ : array, [n_components]
explained_variance_ratio_ : array, shape (n_components,)
Percentage of variance explained by each of the selected components.

If ``n_components`` is not set then all components are stored and the
sum of explained variances is equal to 1.0.

mean_ : array, [n_features]
singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the ``n_components``
variables in the lower-dimensional space.

mean_ : array, shape (n_features,)
Per-feature empirical mean, estimated from the training set.

Equal to `X.mean(axis=1)`.
@@ -250,22 +255,28 @@ class PCA(_BasePCA):
>>> pca.fit(X)
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
[ 0.99244... 0.00755...]
>>> print(pca.singular_values_) # doctest: +ELLIPSIS
[ 6.30061... 0.54980...]

>>> pca = PCA(n_components=2, svd_solver='full')
>>> pca.fit(X) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
svd_solver='full', tol=0.0, whiten=False)
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
[ 0.99244... 0.00755...]
>>> print(pca.singular_values_) # doctest: +ELLIPSIS
[ 6.30061... 0.54980...]

>>> pca = PCA(n_components=1, svd_solver='arpack')
>>> pca.fit(X)
PCA(copy=True, iterated_power='auto', n_components=1, random_state=None,
svd_solver='arpack', tol=0.0, whiten=False)
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
[ 0.99244...]
>>> print(pca.singular_values_) # doctest: +ELLIPSIS
[ 6.30061...]

See also
--------
@@ -385,6 +396,7 @@ def _fit_full(self, X, n_components):
explained_variance_ = (S ** 2) / n_samples
total_var = explained_variance_.sum()
explained_variance_ratio_ = explained_variance_ / total_var
singular_values_ = S.copy() # Store the singular values.
Member:
Can't this be calculated by the user as np.sqrt(explained_variance_ * n_samples)?

Member:
Sorry. Stupid question. I've read the issue and figure this is all about making something comparable available in TruncatedSVD.
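For reference, the identity raised above can be verified with plain numpy. A minimal sketch, assuming synthetic data and the same divisor used by ``_fit_full`` in this diff (the relation only holds because PCA centers ``X`` before the SVD; later releases may use a different divisor):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
Xc = X - X.mean(axis=0)                    # PCA centers the data first
U, S, V = np.linalg.svd(Xc, full_matrices=False)

n_samples = X.shape[0]
explained_variance = (S ** 2) / n_samples  # as computed in _fit_full
print(np.allclose(S, np.sqrt(explained_variance * n_samples)))  # True
```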


# Postprocess the number of components required
if n_components == 'mle':
@@ -409,6 +421,7 @@ def _fit_full(self, X, n_components):
self.explained_variance_ = explained_variance_[:n_components]
self.explained_variance_ratio_ = \
explained_variance_ratio_[:n_components]
self.singular_values_ = singular_values_[:n_components]

return U, S, V

@@ -463,6 +476,7 @@ def _fit_truncated(self, X, n_components, svd_solver):
total_var = np.var(X, axis=0)
self.explained_variance_ratio_ = \
self.explained_variance_ / total_var.sum()
self.singular_values_ = S.copy() # Store the singular values.
if self.n_components_ < n_features:
self.noise_variance_ = (total_var.sum() -
self.explained_variance_.sum())
@@ -520,9 +534,11 @@ def score(self, X, y=None):
return np.mean(self.score_samples(X))


@deprecated("RandomizedPCA was deprecated in 0.18 and will be removed in 0.20. "
@deprecated("RandomizedPCA was deprecated in 0.18 and will be removed in "
"0.20. "
"Use PCA(svd_solver='randomized') instead. The new implementation "
"DOES NOT store whiten ``components_``. Apply transform to get them.")
"DOES NOT store whiten ``components_``. Apply transform to get "
"them.")
class RandomizedPCA(BaseEstimator, TransformerMixin):
"""Principal component analysis (PCA) using randomized SVD

@@ -549,8 +565,8 @@ class RandomizedPCA(BaseEstimator, TransformerMixin):
.. versionchanged:: 0.18

whiten : bool, optional
When True (False by default) the `components_` vectors are multiplied by
the square root of (n_samples) and divided by the singular values to
When True (False by default) the `components_` vectors are multiplied
by the square root of (n_samples) and divided by the singular values to
ensure uncorrelated outputs with unit component-wise variances.

Whitening will remove some information from the transformed signal
@@ -564,15 +580,20 @@ class RandomizedPCA(BaseEstimator, TransformerMixin):

Attributes
----------
components_ : array, [n_components, n_features]
components_ : array, shape (n_components, n_features)
Components with maximum variance.

explained_variance_ratio_ : array, [n_components]
explained_variance_ratio_ : array, shape (n_components,)
Percentage of variance explained by each of the selected components.
k is not set then all components are stored and the sum of explained
variances is equal to 1.0
If k is not set then all components are stored and the sum of explained
variances is equal to 1.0.

singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the ``n_components``
variables in the lower-dimensional space.

mean_ : array, [n_features]
mean_ : array, shape (n_features,)
Per-feature empirical mean, estimated from the training set.

Examples
@@ -584,8 +605,10 @@ class RandomizedPCA(BaseEstimator, TransformerMixin):
>>> pca.fit(X) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
RandomizedPCA(copy=True, iterated_power=2, n_components=2,
random_state=None, whiten=False)
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
>>> print(pca.explained_variance_ratio_) # doctest: +ELLIPSIS
[ 0.99244... 0.00755...]
>>> print(pca.singular_values_) # doctest: +ELLIPSIS
[ 6.30061... 0.54980...]

See also
--------
@@ -663,6 +686,7 @@ def _fit(self, X):
self.explained_variance_ = exp_var = (S ** 2) / n_samples
full_var = np.var(X, axis=0).sum()
self.explained_variance_ratio_ = exp_var / full_var
self.singular_values_ = S # Store the singular values.

if self.whiten:
self.components_ = V / S[:, np.newaxis] * sqrt(n_samples)
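The ``whiten`` behaviour documented above (components divided by the singular values and scaled by ``sqrt(n_samples)``, as in ``self.components_ = V / S[:, np.newaxis] * sqrt(n_samples)``) can also be illustrated directly. A minimal numpy sketch on synthetic data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
Xc = X - X.mean(axis=0)
U, S, V = np.linalg.svd(Xc, full_matrices=False)
n_samples = X.shape[0]

# Whitened projection: rescale each component by sqrt(n_samples) / S.
X_white = (Xc @ V.T) / S * np.sqrt(n_samples)

# The outputs are uncorrelated with unit component-wise variances.
print(np.allclose(X_white.std(axis=0), 1.0))           # True
print(np.allclose(np.corrcoef(X_white, rowvar=False),
                  np.eye(X.shape[1]), atol=1e-10))     # True
```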
23 changes: 16 additions & 7 deletions sklearn/decomposition/truncated_svd.py
@@ -71,26 +71,33 @@ class TruncatedSVD(BaseEstimator, TransformerMixin):
----------
components_ : array, shape (n_components, n_features)

explained_variance_ratio_ : array, [n_components]
Percentage of variance explained by each of the selected components.

explained_variance_ : array, [n_components]
explained_variance_ : array, shape (n_components,)
The variance of the training samples transformed by a projection to
each component.

explained_variance_ratio_ : array, shape (n_components,)
Percentage of variance explained by each of the selected components.

singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the ``n_components``
variables in the lower-dimensional space.

Examples
--------
>>> from sklearn.decomposition import TruncatedSVD
>>> from sklearn.random_projection import sparse_random_matrix
>>> X = sparse_random_matrix(100, 100, density=0.01, random_state=42)
>>> svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)
>>> svd.fit(X) # doctest: +NORMALIZE_WHITESPACE
>>> svd.fit(X) # doctest: +NORMALIZE_WHITESPACE
TruncatedSVD(algorithm='randomized', n_components=5, n_iter=7,
random_state=42, tol=0.0)
>>> print(svd.explained_variance_ratio_) # doctest: +ELLIPSIS
>>> print(svd.explained_variance_ratio_) # doctest: +ELLIPSIS
[ 0.0782... 0.0552... 0.0544... 0.0499... 0.0413...]
>>> print(svd.explained_variance_ratio_.sum()) # doctest: +ELLIPSIS
>>> print(svd.explained_variance_ratio_.sum()) # doctest: +ELLIPSIS
0.279...
>>> print(svd.singular_values_) # doctest: +ELLIPSIS
[ 2.6318... 2.2215... 2.1939... 2.1010... 1.9317...]

See also
--------
@@ -185,6 +192,8 @@ def fit_transform(self, X, y=None):
else:
full_var = np.var(X, axis=0).sum()
self.explained_variance_ratio_ = exp_var / full_var
self.singular_values_ = Sigma # Store the singular values.

return X_transformed

def transform(self, X):
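As the review thread above suggests, the practical motivation is TruncatedSVD: it does not center ``X``, so the singular values cannot be recovered from ``explained_variance_`` and have to be stored explicitly. A minimal sketch of both facts, assuming synthetic uncentered data and ``algorithm='arpack'`` so the decomposition is exact rather than randomized:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(42)
X = rng.rand(100, 50)  # uncentered: every column has mean ~ 0.5

svd = TruncatedSVD(n_components=5, algorithm='arpack', random_state=42)
X_t = svd.fit_transform(X)

# The singular values equal the 2-norms of the transformed variables.
print(np.allclose(svd.singular_values_, np.linalg.norm(X_t, axis=0)))  # True

# Unlike PCA, sqrt(explained_variance_ * n_samples) does NOT recover them:
# explained_variance_ is np.var(X_t, axis=0), and the variance discards the
# nonzero column means of the uncentered projection.
print(np.allclose(svd.singular_values_,
                  np.sqrt(svd.explained_variance_ * X.shape[0])))      # False
```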