Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PolynomialFeatures' docstring does not mention that sparse data is allowed for fit #16646

Closed
mfeurer opened this issue Mar 5, 2020 · 11 comments · Fixed by #16656
Closed

PolynomialFeatures' docstring does not mention that sparse data is allowed for fit #16646

mfeurer opened this issue Mar 5, 2020 · 11 comments · Fixed by #16656
Assignees
Labels
Documentation Easy Well-defined and straightforward way to resolve good first issue Easy with clear instructions to resolve

Comments

@mfeurer
Copy link
Contributor

mfeurer commented Mar 5, 2020

Describe the issue linked to the documentation

The docs for PolynomialFeatures.fit() does not mention that sparse data is allowed. The same holds true for fit_transform.

Suggest a potential alternative/fix

The docs for PolynomialFeatures.fit() and PolynomialFeatures.fit_transform() mention that sparse data is allowed.

@rth
Copy link
Member

rth commented Mar 5, 2020

PolynomialFeatures should work with sparse input

If the degree is 2 or 3, the method described in "Leveraging
Sparsity to Speed Up Polynomial Feature Expansions of CSR Matrices
Using K-Simplex Numbers" by Andrew Nystrom and John Hughes is

what error are you getting?

@NicolasHug
Copy link
Member

I think it's just a doc issue,

instead of

X : array-like, shape (n_samples, n_features)

this should be

X : {array-like, sparse matrix}, shape (n_samples, n_features)

for fit and fit_transform

@NicolasHug NicolasHug added Easy Well-defined and straightforward way to resolve good first issue Easy with clear instructions to resolve help wanted labels Mar 5, 2020
@mfeurer
Copy link
Contributor Author

mfeurer commented Mar 5, 2020

what error are you getting?

There's no error, the docs for fit() and fit_transform() simply don't mention that sparse data is allowed. I expected the docs to say X : array-like or sparse matrix of shape (n_samples, n_features) instead of X : array-like of shape (n_samples, n_features) as for example done by TruncatedSVD

@amueller
Copy link
Member

amueller commented Mar 5, 2020

there should probably be a tag and we should test consistency of tag and documentation and actual behavior, right?

@genziano
Copy link
Contributor

genziano commented Mar 6, 2020

Hi all, may I take care of this one?
It would be my first contribution here, and the tag "good first issue" seems appropriate :)

@NicolasHug
Copy link
Member

Go ahead @Alemaudit , thanks!

(focus on the docstring, you can ignore the tag comment which is more advanced stuff).

In order to claim the issue for yourself, please comment exactly take below for the CI to automatically assign the issue to you.

@genziano
Copy link
Contributor

genziano commented Mar 6, 2020

take

@genziano
Copy link
Contributor

genziano commented Mar 6, 2020

Thank you @NicolasHug! Done

@genziano
Copy link
Contributor

genziano commented Mar 7, 2020

Quick question:

The .transform() method's doc mentions only the possibility of CSC/CSR sparse matrices. However, it accepts any type of matrices from scipy.sparse, converting the other types to CSR afterward.

X : array-like or CSR/CSC sparse matrix, shape [n_samples, n_features]
The data to transform, row by row.

X = check_array(X, order='F', dtype=FLOAT_DTYPES,
accept_sparse=('csr', 'csc'))

Is it an idea do document .fit() and transform() the same way?

X : {array-like, sparse matrix}, shape (n_samples, n_features)

@jnothman
Copy link
Member

jnothman commented Mar 8, 2020

Is it an idea do document .fit() and transform() the same way?

Go ahead

genziano added a commit to genziano/scikit-learn that referenced this issue Mar 8, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 8, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 10, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 10, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 11, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 11, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 12, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 17, 2020
genziano added a commit to genziano/scikit-learn that referenced this issue Mar 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Easy Well-defined and straightforward way to resolve good first issue Easy with clear instructions to resolve
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants