Frequent Patterns Sparse Support #667
I believe that e3ff3a7 fixes all issues with pandas 1.0, except failures with
Thanks for the feedback! To keep the codebase from becoming more and more cluttered, I thought that dropping SparseDataFrame support entirely might be a good idea (in favor of the new sparse data handling in pandas >= 1.0). I added a check
Thanks for your points, most of the issues are now addressed. One thing, though, is that in some tests we check the columns and the dtypes of the columns. However, when doing something like
this will produce a sparse scipy matrix, which doesn't have dtypes and columns. So we'd just need to do
I guess. But is this now really "sparse", or do we need to do something extra?
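For reference, here is a minimal sketch of one way to check this, assuming pandas >= 1.0 and scipy are installed. It is not the code from this PR; the toy matrix and the column names "a", "b", "c" are made up for illustration. It wraps a scipy sparse matrix back into a DataFrame so that columns and dtypes exist again, then checks whether every column is actually backed by a sparse dtype.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy one-hot style data (hypothetical, for illustration only).
dense = np.array([[True, False, False],
                  [False, False, True]])
mat = csr_matrix(dense)  # a scipy matrix has no .columns or .dtypes

# Wrap the scipy matrix in a DataFrame whose columns are sparse arrays.
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=["a", "b", "c"])

# "Really sparse" here means: every column carries a SparseDtype.
is_sparse = all(isinstance(dt, pd.SparseDtype) for dt in df.dtypes)

# Going back to scipy drops columns and dtypes again.
round_trip = df.sparse.to_coo()
```

If is_sparse is True, the frame keeps the sparse storage and the tests can inspect df.columns and df.dtypes as usual.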
Good idea, you can then remove With
Ah yes, thanks, will do!
Thanks! I was a bit suspicious for some reason ...
I think I addressed everything now. Will make a new "bugfix" version so that there is a version that works for people who installed the latest pandas version. Thanks for your help!
(df.dtypes == bool)).all()
else:
all_bools = (df.dtypes == bool).all()
all_bools = (df.dtypes == bool).all()
You should have kept
all_bools = ((df.dtypes == pd.SparseDtype(bool)) |
(df.dtypes == bool)).all()
Not sure why I deleted this! Now, after adding it back,
(df.dtypes == pd.SparseDtype(bool))
produces a deprecation warning for some reason:
DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
res_values = op(left, right)
Can't reproduce this issue outside the code though.
Indeed, that's weird; I can't find a way to avoid this warning.
This seems to work; is_bool_dtype was introduced in pandas 0.19, so there is no need to check the version:
# Fast path: if all columns are boolean, there is nothing to check
all_bools = df.dtypes.apply(pd.api.types.is_bool_dtype).all()
Awesome, that works! Thanks a lot!
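To illustrate why the is_bool_dtype check covers both cases, here is a small self-contained sketch. The helper name all_columns_bool and the toy frames are hypothetical and only there for the example; the check itself is the one-liner suggested above.

```python
import pandas as pd


def all_columns_bool(df):
    # Hypothetical helper wrapping the suggested one-liner:
    # True only if every column has a boolean dtype, dense or sparse.
    return bool(df.dtypes.apply(pd.api.types.is_bool_dtype).all())


# Dense boolean frame.
dense = pd.DataFrame({"a": [True, False], "b": [False, False]})

# Same data stored with the pandas >= 1.0 sparse dtype.
sparse = dense.astype(pd.SparseDtype(bool, fill_value=False))

# One integer column, so the fast path should not apply.
mixed = pd.DataFrame({"a": [True, False], "b": [0, 1]})

print(all_columns_bool(dense), all_columns_bool(sparse), all_columns_bool(mixed))
```

This avoids comparing df.dtypes against pd.SparseDtype(bool) element by element, which is where the warning above showed up.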
Attempt to add support for Pandas 1.0 in frequent pattern mining functions.
It looks like pandas 1.0 had quite a revamp regarding sparse data frames. I attempted to use the migration guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating but it's a bit tricky, because they don't have fill values anymore etc.
@dbarbier and @DanielMorales9, do you have experience with porting code to pandas 1.0? I would really appreciate your feedback on how to handle these cases.
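As a rough sketch of what the new style looks like, assuming pandas >= 1.0: as far as I can tell, the fill value is no longer set on the frame itself but on each column's SparseDtype. The column names and data below are made up for illustration, and this is not the code used in the PR.

```python
import pandas as pd

# Toy transaction-style data (hypothetical).
dense = pd.DataFrame({
    "milk": [True, False, False, False],
    "eggs": [False, False, True, False],
})

# Instead of a SparseDataFrame, cast a regular DataFrame to a sparse dtype;
# the fill value is part of the dtype.
sparse = dense.astype(pd.SparseDtype(bool, fill_value=False))

# The .sparse accessor exposes frame-level helpers.
density = sparse.sparse.density   # fraction of values that are stored explicitly
back = sparse.sparse.to_dense()   # convert back to ordinary bool columns
```

Code that previously relied on a frame-level default fill value would, under this assumption, read it from the column dtype instead, for example sparse.dtypes["milk"].fill_value.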