Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add warning for pandas sparse Dataframe in check_array #16021

Merged
merged 20 commits into from
Jan 27, 2020

Conversation

rushabh-v
Copy link
Contributor

Fixes #15976

@rushabh-v
Copy link
Contributor Author

rushabh-v commented Jan 4, 2020

The added test fails and gives ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (1,10) on only Linux py35_conda_openblas. And this error is raised by pandas. Any hints why it only fails on Linux py35_conda_openblas?
logs here

@rushabh-v
Copy link
Contributor Author

rushabh-v commented Jan 4, 2020

The added test fails and gives ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (1,10) on only Linux py35_conda_openblas. And this error is raised by pandas. Any hints why it only fails on Linux py35_conda_openblas?
logs here

Linux pylatest_pip_openblas_pandas passes this test.

@rushabh-v
Copy link
Contributor Author

This bug was fixed in numpy/numpy#5710.
But the Linux py35_conda_openblas is using numpy version 1.11.0

@jnothman
Copy link
Member

jnothman commented Jan 8, 2020

Is there a pd 0.23 equivalent of pd.api.types.is_sparse, @TomAugspurger?

I suppose I'm fine with having this feature limited to recent Pandas

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 8, 2020 via email

@rushabh-v
Copy link
Contributor Author

rushabh-v commented Jan 8, 2020

Can somebody please guide me to get rid of these failing codecov/patch checks?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @rushabh-v !

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @rushabh-v !

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise lgtm

from pandas.api.types import is_sparse
if array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas Dataframe having sparse columns found."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"pandas Dataframe having sparse columns found."
"pandas.DataFrame having sparse columns found."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas Dataframe having sparse columns found."
"It will be inflated automatically."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "inflated" the formal term? We may use "densified" elsewhere

Copy link
Contributor Author

@rushabh-v rushabh-v Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used "densified" there now.

@rushabh-v rushabh-v requested a review from jnothman January 14, 2020 03:15
@rushabh-v
Copy link
Contributor Author

@jnothman Can you review it again for the changes you requested, please?

@jnothman
Copy link
Member

I don't think another review from me us needed. Please await another core developer's review. Thanks

@rushabh-v rushabh-v requested review from qinhanmin2014, amueller, rth and thomasjpfan and removed request for thomasjpfan January 25, 2020 11:20
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, otherwise LGTM thanks!

@@ -131,3 +131,6 @@ Changelog

- |Enhancement| improve error message in :func:`utils.validation.column_or_1d`.
:pr:`15926` by :user:`Loïc Estève <lesteve>`.
- |Enhancement| add warning in :func:`utils.validation.check_array` for
pandas sparse Dataframe.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pandas sparse Dataframe.
pandas sparse DataFrame.

pd = pytest.importorskip('pandas')
# restrict the pd versions < '0.24.0' as they have a bug in is_sparse func
if LooseVersion(pd.__version__) < '0.24.0':
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return
pytest.skip(reason="pandas 0.24+ required.")

from pandas.api.types import is_sparse
if array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas.DataFrame having sparse columns found."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"pandas.DataFrame having sparse columns found."
"pandas.DataFrame with sparse columns found."

if array.dtypes.apply(is_sparse).any():
warnings.warn(
"pandas.DataFrame having sparse columns found."
"It will be densified automatically."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"It will be densified automatically."
"It will be converted to a dense numpy array."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made all the changes

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rth rth merged commit c2ede74 into scikit-learn:master Jan 27, 2020
@rushabh-v rushabh-v deleted the patch-3 branch January 27, 2020 15:32
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Throwing a warning for Pandas SparseArrays
5 participants