n_components in PCA explicitly limited by n_features only #8484
Comments
This was referenced Mar 1, 2017
Closed
(I should maybe clarify I created this Issue for reference for my pull request.)
Description
As shown in #7947, if n_samples < n_components < n_features, where n_components is supplied by the user, PCA (in pca.py) proceeds without raising any error but returns a result with a number of components equal to n_samples (which is what the standard PCA algorithm yields). There is an error message that checks n_components against n_features, but no equivalent check against n_samples, and this gap results in a number of inconsistencies in the code. There are also a number of inconsistencies in the documentation, which I address in my pull request.
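To illustrate why the result cannot have more than n_samples components in this situation, here is a minimal sketch that uses plain NumPy rather than the code in pca.py. The data shape (5 samples, 20 features) and the variable names are assumptions chosen only for the example.

```python
import numpy as np

# Illustrative data: fewer samples than features (shape is an assumption).
rng = np.random.RandomState(0)
n_samples, n_features = 5, 20
X = rng.rand(n_samples, n_features)

# PCA works on centered data.
X_centered = X - X.mean(axis=0)

# Thin SVD: the rows of Vt are the candidate principal axes.
# With full_matrices=False, Vt has min(n_samples, n_features) rows.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

max_components = Vt.shape[0]
print(max_components)  # 5, i.e. n_samples here, not n_features
```

Whatever n_components the user asks for, only min(n_samples, n_features) axes exist to be returned, which is why a request of, say, 10 components on this data ends up truncated to 5.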
I am aware that @amueller and @jnothman indicated an error message would not be necessary, but I did not take this to rule out that the best solution would in fact be to raise such an error and deal with whatever related issues arise. Please correct me if I am wrong.
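For concreteness, the kind of check being discussed could look roughly like the sketch below. The function name `check_n_components` and the wording of the message are hypothetical and are not taken from pca.py or from the pull request. The sketch also covers only integer values of n_components.

```python
def check_n_components(n_components, n_samples, n_features):
    """Hypothetical helper (not scikit-learn code): reject an integer
    n_components larger than min(n_samples, n_features)."""
    upper = min(n_samples, n_features)
    if not 1 <= n_components <= upper:
        raise ValueError(
            "n_components=%r must be between 1 and "
            "min(n_samples, n_features)=%r" % (n_components, upper)
        )
    return n_components


# A request within the bound passes through unchanged.
check_n_components(3, n_samples=5, n_features=20)

# n_samples < n_components < n_features is rejected instead of being
# silently truncated.
try:
    check_n_components(10, n_samples=5, n_features=20)
except ValueError as exc:
    print(exc)
```

Bounding by the minimum of the two dimensions treats n_samples and n_features symmetrically, which is the consistency this issue asks for.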
Some of the main inconsistencies:
On the n_components_ attribute, see also my message dated 21 Feb 2017 currently at the bottom of the discussion in #7947.
Here is the code for 2):
Steps/Code to Reproduce
Results
Returns following error
Versions
Darwin-16.4.0-x86_64-i386-64bit
('Python', '2.7.13 |Anaconda custom (x86_64)| (default, Dec 20 2016, 23:05:08) \n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]')
('NumPy', '1.12.0')
('SciPy', '0.18.1')
('Scikit-Learn', '0.19.dev0')