GradientBoostingClassifier.fit accepts sparse X, but .predict does not #6101

@aflaxman

I have a sparse dataset that is too large for main memory if I call X.todense(). If I understand correctly, GradientBoostingClassifier.fit accepts my sparse X, but GradientBoostingClassifier.predict currently rejects sparse input, so I cannot get predictions from the fitted model without densifying the data. It would be great if predict accepted sparse matrices as well.

Here is a minimal example of the issue:

from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=20, n_features=5, random_state=0)
X_sp = sparse.coo_matrix(X)

clf = GradientBoostingClassifier()
clf.fit(X, y)
clf.predict(X)  # works

clf.fit(X_sp, y)  # works
clf.predict(X_sp)  # fails with TypeError: A sparse matrix was passed, but dense data is required.
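A possible workaround while predict rejects sparse input is to densify only a few rows at a time, so the full dense matrix never has to fit in memory. The sketch below is illustrative only: predict_in_dense_batches and its batch_size parameter are hypothetical names, not part of scikit-learn, and it assumes that one batch of rows fits in memory once converted to a dense array.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def predict_in_dense_batches(clf, X_sparse, batch_size=1000):
    """Hypothetical helper (not a scikit-learn API): predict on a sparse
    matrix by converting a limited number of rows to dense at a time."""
    # CSR supports efficient row slicing; COO does not support slicing.
    X_csr = sparse.csr_matrix(X_sparse)
    parts = []
    for start in range(0, X_csr.shape[0], batch_size):
        # Only this block of rows is ever held in dense form.
        block = X_csr[start:start + batch_size].toarray()
        parts.append(clf.predict(block))
    return np.concatenate(parts)


X, y = make_classification(n_samples=20, n_features=5, random_state=0)
X_sp = sparse.coo_matrix(X)

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_sp, y)  # fitting on sparse input works

pred = predict_in_dense_batches(clf, X_sp, batch_size=8)
```

This trades speed for memory: a smaller batch_size lowers peak memory use but adds per-call overhead, so the value probably needs tuning for a given dataset.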
