n_components in PCA explicitly limited by n_features only #8484
As shown in #7947, if n_samples < n_components (as input by the user) < n_features, PCA (in pca.py) proceeds without raising any error but returns a result with only n_samples components, which is what the standard PCA algorithm yields in this case. PCA raises an error when n_components exceeds n_features, but it has no equivalent check for n_samples. This gap leads to a number of inconsistencies in the code. There are also several inconsistencies in the documentation, which I address in my pull request.
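To illustrate why only n_samples components come back: a thin SVD of an n_samples x n_features matrix returns at most min(n_samples, n_features) right singular vectors, so taking the first n_components rows silently stops at the rows that exist. The following is a minimal NumPy sketch that assumes PCA is computed from a thin SVD of the centered data (roughly how the "full" solver works). It uses random data, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(6, 8)      # n_samples=6 < n_features=8
n_components = 7         # user request: n_samples < 7 < n_features

# Thin SVD of the centered data: Vt has shape (min(n_samples, n_features), n_features)
_, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

# Slicing past the available rows does not fail; it just returns fewer rows
components = Vt[:n_components]
print(components.shape)  # (6, 8), not (7, 8)
```

Because NumPy slicing never raises on an out-of-range upper bound, an implementation built this way needs an explicit check to report the problem.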
I am aware that @amueller and @jnothman indicated an error message would not be necessary. However, I did not read that as saying that raising such an error, and dealing with whatever related issues arise, would not be the best solution. Please correct me if I am wrong.
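The kind of check proposed here could be sketched as follows. This is a hypothetical helper (`check_n_components` is not part of scikit-learn). It bounds n_components by both dimensions of X, and for the "arpack" solver it requires a strictly smaller value, because `scipy.sparse.linalg.svds` with ARPACK can only compute fewer than min(X.shape) singular vectors.

```python
def check_n_components(n_components, n_samples, n_features, svd_solver="full"):
    """Hypothetical validation bounding n_components by n_samples AND n_features.

    A sketch of the behaviour proposed in this issue, not scikit-learn's code.
    """
    upper = min(n_samples, n_features)
    if svd_solver == "arpack":
        # ARPACK needs 1 <= k < min(X.shape)
        valid = 1 <= n_components < upper
    else:
        # A thin SVD can provide up to min(n_samples, n_features) components
        valid = 0 <= n_components <= upper
    if not valid:
        raise ValueError(
            "n_components=%r is invalid for svd_solver=%r: "
            "min(n_samples, n_features)=%r" % (n_components, svd_solver, upper)
        )
    return n_components


# The case from this issue: n_samples=6 < n_components=7 < n_features=8
try:
    check_n_components(7, n_samples=6, n_features=8)
except ValueError as exc:
    print(exc)
```

With this check, the same request would fail with an explicit message for every solver, instead of silently returning n_samples components.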
Some of the main inconsistencies:
On the n_components_ attribute, see also my message dated 21 Feb 2017 currently at the bottom of the discussion in #7947.
Steps/Code to Reproduce
import numpy as np
from sklearn.decomposition import PCA

# 6 samples, 8 features
X = np.array([[-1, -1, 3, 4, -1, -1, 3, 4],
              [-2, -1, 5, -1, -1, 3, 4, 2],
              [-3, -2, 1, -1, -1, 3, 4, 1],
              [1, 1, 4, -1, -1, 3, 4, 2],
              [2, 1, 0, -1, -1, 3, 4, 2],
              [3, 2, 10, -1, -1, 3, 4, 10]])

# n_samples (6) < n_components (7) < n_features (8)
pca = PCA(n_components=7, svd_solver="arpack")
pca.fit(X)
This raises the following error: