Frequent Patterns Sparse Support #667

rasbt · 2020-02-21T19:10:55Z

Attempt to add support for Pandas 1.0 in frequent pattern mining functions.

It looks like pandas 1.0 had quite a revamp regarding sparse data frames. I attempted to use the migration guide at https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#migrating but it's a bit tricky, because they don't have fill values anymore etc.

@dbarbier and @DanielMorales9 , do you have experience with porting code to pandas 1.0? I would really appreciate your feedback for how to handle these cases.

dbarbier · 2020-02-22T14:14:09Z

I believe that e3ff3a7 fixes all issues with pandas 1.0, except for failures in test_sparse_with_zero. That is because pandas 1.0 does not handle sparse dataframes with zeros, see pandas-dev/pandas#29814. You could either remove this test, or skip it with pandas >= 1.0.
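As an illustration of the second option, here is a minimal sketch of a version-gated skip. It assumes a unittest-style test; the class name SparseZeroTests and the empty test body are placeholders, not the actual mlxtend test.

```python
import unittest

import pandas as pd

# Parse "major.minor" from the installed pandas version string.
PANDAS_VERSION = tuple(int(part) for part in pd.__version__.split(".")[:2])
PANDAS_GE_1 = PANDAS_VERSION >= (1, 0)


class SparseZeroTests(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(PANDAS_GE_1, "pandas >= 1.0: see pandas-dev/pandas#29814")
    def test_sparse_with_zero(self):
        # Placeholder body; the real test lives in the mlxtend test suite.
        pass
```

With pytest, the same condition could be passed to pytest.mark.skipif instead.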

pep8speaks · 2020-02-22T19:09:27Z

Hello @rasbt! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-02-24 20:39:52 UTC

rasbt · 2020-02-22T19:15:42Z

Thanks for the feedback! To prevent the codebase from becoming more and more cluttered, I thought that dropping SparseDataFrame support entirely may be a good idea (in favor of the new sparse data handling in pandas >= 1.0). I added a check valid_input_check() (in fpcommon.py) to let users know that we only support the "new" way.
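A rough sketch of what such a guard could look like. The function name reject_legacy_sparse and the error message are hypothetical; this is not the actual valid_input_check() from fpcommon.py.

```python
import pandas as pd


def reject_legacy_sparse(df):
    """Hypothetical guard: raise if df is a legacy SparseDataFrame."""
    # Compare by class name so the check also runs on pandas >= 1.0,
    # where pd.SparseDataFrame no longer exists.
    if type(df).__name__ == "SparseDataFrame":
        raise TypeError(
            "SparseDataFrame is no longer supported; convert columns with "
            "df.astype(pd.SparseDtype(...)) instead."
        )
    return df


# An ordinary DataFrame passes through unchanged.
checked = reject_legacy_sparse(pd.DataFrame({"a": [True, False]}))
```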

Thanks for your points, most of the issues are now addressed. One thing though: in some tests, we check the columns and the dtypes of the columns. However, doing something like

dfs = df.astype(pd.SparseDtype("int", np.nan)).sparse.to_coo()

this will produce a sparse scipy matrix, which doesn't have dtypes and columns. So we'd just need to do

dfs = df.astype(pd.SparseDtype("int", np.nan))

I guess. But is this now really "sparse", or do we need to do something extra?

dbarbier · 2020-02-22T22:58:36Z

Thanks for the feedback! To prevent the codebase from becoming more and more cluttered, I thought that dropping SparseDataFrame support entirely may be a good idea (in favor of the new sparse data handling in pandas >= 1.0). I added a check valid_input_check() (in fpcommon.py) to let users know that we only support the "new" way.

Good idea, you can then remove if hasattr(df, "to_coo") blocks which become useless.

With dfs = df.astype(pd.SparseDtype(...)), all columns become sparse, yes.
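A small sketch to check this on a toy frame. The data and the fill value of 0 are assumptions made for the example; the snippet in the thread uses np.nan as the fill value.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 0, 0], "b": [0, 0, 0, 1]})

# Convert every column to a sparse dtype; 0 is the value that is not stored.
dfs = df.astype(pd.SparseDtype("int", 0))

# Each column dtype is now a SparseDtype, and column labels are preserved.
all_sparse = all(isinstance(dtype, pd.SparseDtype) for dtype in dfs.dtypes)
columns = list(dfs.columns)

# Fraction of stored (non-fill) values across the whole frame.
density = dfs.sparse.density
```

Unlike the result of .sparse.to_coo(), dfs is still a DataFrame, so tests that inspect dfs.columns and dfs.dtypes keep working.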

rasbt · 2020-02-23T22:14:50Z

Good idea, you can then remove if hasattr(df, "to_coo") blocks which become useless.

Ah yes, thanks, will do!

dfs = df.astype(pd.SparseDtype(...))

Thanks! I was a bit suspicious for some reason ...

rasbt · 2020-02-23T22:59:22Z

I think I addressed everything now. Will make a new "bugfix" version so that there is a version that works for people who installed the latest pandas version. Thanks for your help!

dbarbier · 2020-02-23T23:19:37Z

mlxtend/frequent_patterns/fpcommon.py

-                     (df.dtypes == bool)).all()
-    else:
-        all_bools = (df.dtypes == bool).all()
+    all_bools = (df.dtypes == bool).all()


You should have kept

all_bools = ((df.dtypes == pd.SparseDtype(bool)) | (df.dtypes == bool)).all()

rasbt · 2020-02-24

Not sure why I deleted this! Now, after adding it back,

(df.dtypes == pd.SparseDtype(bool))

produces a deprecation warning for some reason.

DeprecationWarning: elementwise comparison failed; this will raise an error in the future. res_values = op(left, right)

Can't reproduce this issue outside the code though.

dbarbier · 2020-02-24

Indeed, that's weird; I can't find a way to avoid this warning.

dbarbier · 2020-02-24

This seems to work; is_bool_dtype was introduced in pandas 0.19, so there is no need to check the version:

# Fast path: if all columns are boolean, there is nothing to check
all_bools = df.dtypes.apply(pd.api.types.is_bool_dtype).all()
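A minimal sketch of how this check behaves on dense and sparse boolean columns. The toy frames and the helper name all_columns_bool are made up for the example.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def all_columns_bool(df):
    """Return True if every column is boolean, whether dense or sparse."""
    return bool(df.dtypes.apply(is_bool_dtype).all())


dense = pd.DataFrame({"a": [True, False], "b": [False, True]})
sparse = dense.astype(pd.SparseDtype(bool, False))
mixed = dense.assign(c=[1, 0])  # adds an integer column

dense_ok = all_columns_bool(dense)
sparse_ok = all_columns_bool(sparse)
mixed_ok = all_columns_bool(mixed)
```

Here dense_ok and sparse_ok are True and mixed_ok is False, without comparing df.dtypes to pd.SparseDtype(bool) elementwise.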

rasbt · 2020-02-24T21:34:50Z

Awesome, that works! Thanks a lot!

sparse attempt

0c6db23

rasbt added 3 commits February 22, 2020 12:57

remove SparseDataFrame support

b8a1cd9

Revert "remove SparseDataFrame support"

ab13c90

This reverts commit b8a1cd9.

remove SparseDataFrame support

0fac24d

cleanup

88c3758

fixes isssues with new sparse format

3c9191f

rasbt changed the title ~~Frequent Patterns Sparse Support [WIP]~~ Frequent Patterns Sparse Support Feb 23, 2020

upd docs

addd4af

dbarbier reviewed Feb 23, 2020

rasbt added 2 commits February 23, 2020 18:35

add back bool comp

28dde81

fix bool check

0c101fc

rasbt merged commit 213fd02 into master Feb 24, 2020

rasbt deleted the fix-pandas branch November 12, 2020 17:32
