add support for newer pandas sparse dataframes in frequent_patterns #621

dbarbier · 2019-11-04T22:44:14Z

Description

SparseDataFrame has been deprecated; pandas 0.24 recommends creating
standard DataFrames and storing sparse Series as SparseArray. This
makes it possible to combine dense and sparse columns in one DataFrame.
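A minimal sketch of the newer layout, assuming a recent pandas where `pd.arrays.SparseArray` and `pd.SparseDtype` are public. The column names are made up for illustration. One column is stored sparsely and the other densely, inside an ordinary DataFrame.

```python
import pandas as pd

# A standard DataFrame that mixes storage types:
# "milk" is a SparseArray that only stores its True entries (fill_value=False),
# while "bread" is an ordinary dense bool column.
df = pd.DataFrame(
    {
        "milk": pd.arrays.SparseArray([True, False, False, True], fill_value=False),
        "bread": [True, True, False, True],
    }
)

print(df.dtypes)
# The sparse column keeps its SparseDtype and exposes the Series .sparse accessor
print(df["milk"].sparse.density)  # fraction of entries that are actually stored
```

Sparseness is now a property of individual columns rather than of a separate DataFrame subclass, which is what allows dense and sparse columns to coexist.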

Improve valid_input_check with sparse values: it previously densified
the whole DataFrame to check for invalid values, which defeats the purpose of sparse storage.
It now only checks the stored values, which prevents memory errors and is
also much faster.
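The idea of checking only stored values can be sketched as follows. This is a hypothetical helper (`all_values_binary` is not the code from this PR), assuming the valid inputs are 0/1 or False/True. For a sparse column it inspects `sp_values`, plus the `fill_value` only when at least one cell implicitly takes it, so no column is ever densified.

```python
import numpy as np
import pandas as pd


def all_values_binary(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every cell is 0/1 (or False/True).

    Sparse columns are never densified: only their stored values
    (sp_values) are inspected, plus the fill_value when at least one
    cell is implicit.
    """
    for _, col in df.items():
        if isinstance(col.dtype, pd.SparseDtype):
            arr = col.array  # the underlying SparseArray
            values = arr.sp_values
            if arr.npoints < len(arr):
                # Some cells are implicit, so the fill value must be valid too
                values = np.append(values, arr.fill_value)
        else:
            values = col.to_numpy()
        if not np.isin(values, [0, 1]).all():
            return False
    return True
```

With this approach the cost grows with the number of stored entries rather than with rows times columns. A sparse float column with the default NaN fill value is correctly reported as invalid, because NaN is neither 0 nor 1.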

Update the apriori.ipynb notebook to use the new pandas sparse DataFrame.
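The notebook cells themselves are not reproduced here. As a pandas-only sketch, the snippet below builds a comparable one-hot encoded frame by hand instead of with mlxtend's TransactionEncoder, using made-up transactions, and converts every column to a sparse bool column. In the notebook setting, such a frame would then be passed to `apriori(df, min_support=..., use_colnames=True)`.

```python
import pandas as pd

# Toy transactions (made up for illustration)
transactions = [["milk", "bread"], ["bread"], ["milk", "eggs"], ["bread", "eggs"]]

# One-hot encode by hand: one row per transaction, one bool column per item
items = sorted({item for t in transactions for item in t})
dense = pd.DataFrame([[item in t for item in items] for t in transactions], columns=items)

# Convert every column to a SparseArray that only stores the True cells
sparse_df = dense.astype(pd.SparseDtype(bool, False))

# The DataFrame .sparse accessor is available because all columns are sparse
print(sparse_df.sparse.density)
```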

Related issues or pull requests

Pull Request Checklist

Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
Modified documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
Ran PYTHONPATH='.' pytest ./mlxtend -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
Checked for style issues by running flake8 ./mlxtend


coveralls · 2019-11-04T22:55:10Z

Coverage decreased (-0.02%) to 92.337% when pulling 058ec7c on dbarbier:db/sparse into 2f928cb on rasbt:master.

rasbt

Thanks a lot. Good catch regarding the checking utility. Also big thanks for modernizing it wrt the SparseDataFrame deprecation. Really appreciate it.

mlxtend/frequent_patterns/fpcommon.py

mlxtend/frequent_patterns/tests/test_fpbase.py

Contrary to what I thought in 0a8fa8f, this issue is still not fixed.

dbarbier · 2019-11-05T17:45:30Z

@rasbt I will investigate the Travis failures, but I have an unrelated question: are issues labelled stat479 reserved for your students? I have some ideas to improve fpgrowth/fpmax with sparse input, but do not want to interfere here.

rasbt · 2019-11-05T17:54:24Z

will investigate travis failures

Thanks, and no rush!

stat479 reserved for your students?

No worries; I have some honors students who would like to learn how to contribute to open source projects. These are simply issues I tagged as suggestions, but no one is working on them right now.

rasbt · 2019-11-06T01:46:42Z

Looks good to me now. Thanks again for another great PR!

dbarbier force-pushed the db/sparse branch from cfe9b0c to 0a8fa8f November 4, 2019 22:45

rasbt reviewed Nov 5, 2019

mlxtend/frequent_patterns/fpcommon.py (outdated, resolved)

mlxtend/frequent_patterns/tests/test_fpbase.py (outdated, resolved)

mlxtend/frequent_patterns/tests/test_fpbase.py (outdated, resolved)

dbarbier added 2 commits November 5, 2019 09:54

use pandas version number to detect new sparse dataframes

ead4f32
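The commit message only says that the pandas version number is used to detect the new sparse DataFrames; the actual code is not shown here. A hypothetical sketch of that kind of check (both helper names are made up): gate on pandas 0.24 or newer, then look for SparseDtype columns, since sparseness is now a per-column property.

```python
import pandas as pd


def pandas_version_at_least(major: int, minor: int) -> bool:
    """Hypothetical helper: compare the installed pandas major.minor version.

    Simplification: only the first two numeric components of pd.__version__
    are parsed, which is enough for a check such as "0.24 or newer".
    """
    installed = tuple(int(part) for part in pd.__version__.split(".")[:2])
    return installed >= (major, minor)


def has_sparse_columns(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if any column is stored as a SparseArray."""
    return any(isinstance(dtype, pd.SparseDtype) for dtype in df.dtypes)
```

A caller could combine them as `pandas_version_at_least(0, 24) and has_sparse_columns(df)`. Inspecting column dtypes, rather than the frame's class, reflects that new-style sparse data lives inside an ordinary DataFrame.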

limitations on columns of sparse DataFrame still apply

22915a8

Contrary to what I thought in 0a8fa8f, this issue is still not fixed.

fix bug introduced by commit 22915a8 with empty DataFrame

058ec7c

dbarbier force-pushed the db/sparse branch from f71c38c to 058ec7c November 5, 2019 18:27

rasbt merged commit fa643e2 into rasbt:master Nov 6, 2019

dbarbier deleted the db/sparse branch November 6, 2019 06:41

rasbt mentioned this pull request Jan 29, 2020

v0.17.1 #660

Merged
