
IncrementalPCA fails if data size % batch size < n_components #12234

@janezd

Description

IncrementalPCA throws "n_components=%r must be less or equal to the batch number of samples %d".

The error occurs because the last batch generated by utils.gen_batches may be smaller than batch_size, and therefore smaller than n_components.

Steps/Code to Reproduce

from sklearn.datasets import load_iris
from sklearn.decomposition import IncrementalPCA

iris = load_iris()
X = iris.data[:101]  # 101 samples: ten batches of 10 plus a final batch of 1
ipca = IncrementalPCA(n_components=2, batch_size=10)
X_ipca = ipca.fit_transform(X)  # raises ValueError on the final batch

I reduced the iris data to 101 instances, so the last batch has only a single data instance, which is less than the number of components.
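
The batch arithmetic can be checked without scikit-learn. The sketch below is only an illustration: `batch_slices` is a hypothetical stand-in that assumes gen_batches yields consecutive slices of length batch_size followed by a shorter final slice. It is not the library's code.

```python
def batch_slices(n_samples, batch_size):
    """Hypothetical stand-in for gen_batches (assumed behaviour, not sklearn code)."""
    start = 0
    while start < n_samples:
        end = min(start + batch_size, n_samples)
        yield slice(start, end)
        start = end


n_samples, batch_size, n_components = 101, 10, 2

# Size of every batch that fit() would pass to partial_fit().
sizes = [s.stop - s.start for s in batch_slices(n_samples, batch_size)]

# Batches with fewer samples than n_components are the ones that trigger the error.
too_small = [size for size in sizes if size < n_components]

print(len(sizes), sizes[-1], too_small)  # prints: 11 1 [1]
```

With 101 samples and batch_size=10, only the eleventh batch falls below n_components=2, which matches the "batch number of samples 1" in the error message.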

As far as I can see, none of the current unit tests run into this. (test_incremental_pca_batch_signs could, if the code that raises the exception compared self.n_components_ with n_samples, which it should but doesn't.)

Skipping the last batch if it is too small, that is, changing

        for batch in gen_batches(n_samples, self.batch_size_):
            self.partial_fit(X[batch], check_input=False)

to

        for batch in gen_batches(n_samples, self.batch_size_):
            if self.n_components is None \
                    or X[batch].shape[0] >= self.n_components:
                self.partial_fit(X[batch], check_input=False)

fixes the problem. @kastnerkyle, please confirm that this solution seems OK before I go preparing the PR and tests.
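
To make the effect of the proposed condition concrete, here is a minimal sketch in plain Python. `kept_batches` is a hypothetical helper, not part of scikit-learn. It applies the same test as the patched loop to batch boundaries only, again assuming batches are consecutive slices with a shorter final one.

```python
def kept_batches(n_samples, batch_size, n_components=None):
    """Return (start, stop) pairs for the batches the proposed condition keeps."""
    kept = []
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        # Same test as in the patched loop: skip a batch smaller than n_components.
        if n_components is None or stop - start >= n_components:
            kept.append((start, stop))
    return kept


kept = kept_batches(101, 10, n_components=2)
used = sum(stop - start for start, stop in kept)
print(len(kept), used)  # prints: 10 100

# With n_components=None nothing is skipped.
print(len(kept_batches(101, 10)))  # prints: 11
```

In the 101-sample example the last sample never reaches partial_fit, so the fit uses 100 of the 101 rows. Whether dropping those rows silently is acceptable is part of what needs confirming.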

Expected Results

No error is thrown.

Actual Results

ValueError: n_components=2 must be less or equal to the batch number of samples 1.

Versions

Darwin-18.0.0-x86_64-i386-64bit
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.15.2
SciPy 1.1.0
Scikit-Learn 0.20.0
