add support for newer pandas sparse dataframes in frequent_patterns #621

Merged on Nov 6, 2019 (4 commits)

dbarbier (Contributor) commented Nov 4, 2019

Description

SparseDataFrame has been deprecated; pandas 0.24 recommends creating
standard DataFrames and storing sparse Series as SparseArray. This
makes it possible to combine dense and sparse columns in one DataFrame.
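
For readers unfamiliar with the newer layout, here is a minimal sketch of a standard DataFrame that mixes sparse and dense columns. The column names and values are made up for illustration and are not taken from this PR or from the mlxtend test suite.

```python
import pandas as pd

# Hypothetical one-hot encoded transactions; the item names are illustrative only.
df = pd.DataFrame(
    {
        # Sparse columns: only values that differ from fill_value are stored.
        "apple": pd.arrays.SparseArray([True, False, False, True], fill_value=False),
        "beer": pd.arrays.SparseArray([False, False, True, False], fill_value=False),
        # A plain dense boolean column can sit next to the sparse ones.
        "milk": [True, True, False, True],
    }
)

# Each column reports its own dtype: the sparse columns carry a SparseDtype,
# the dense column an ordinary bool dtype.
print(df.dtypes)

# Number of explicitly stored values in the sparse "apple" column.
n_stored_apple = len(df["apple"].array.sp_values)
```

An existing dense DataFrame can also be converted column-wise with `astype(pd.SparseDtype(bool, fill_value=False))`.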

Improve valid_input_check for sparse values. It previously densified
the whole DataFrame to check for invalid values, which defeats the purpose
of sparse storage. It now checks only the stored values, which prevents
memory errors and is also much faster.
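
The idea can be sketched roughly as follows. This is not the code from this PR: `has_only_binary_values` is a hypothetical helper written for illustration, and the actual valid_input_check in mlxtend may differ in its details and error handling. For a sparse column it inspects only the explicitly stored values plus the single fill value, so the column is never converted to a dense array.

```python
import numpy as np
import pandas as pd


def has_only_binary_values(df):
    """Hypothetical helper: True if every entry is 0/1 (or False/True)."""
    for _, col in df.items():
        if isinstance(col.dtype, pd.SparseDtype):
            arr = col.array
            # Stored values plus the one fill value cover every entry
            # of the column without densifying it.
            values = np.append(arr.sp_values, arr.fill_value)
        else:
            values = col.to_numpy()
        # True == 1 and False == 0, so this accepts booleans as well.
        if not ((values == 0) | (values == 1)).all():
            return False
    return True


# Illustrative inputs (assumed data, not from the PR).
ok = pd.DataFrame(
    {
        "a": pd.arrays.SparseArray([0, 1, 0, 0], fill_value=0),
        "b": [True, False, True, False],
    }
)
bad = pd.DataFrame({"a": pd.arrays.SparseArray([0, 2, 0, 0], fill_value=0)})

ok_result = has_only_binary_values(ok)
bad_result = has_only_binary_values(bad)
```

The cost of the check then scales with the number of stored entries rather than with the full size of the DataFrame.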

Update apriori.ipynb notebook to use the new pandas sparse DataFrame.

Related issues or pull requests

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

coveralls commented Nov 4, 2019

Coverage Status

Coverage decreased (-0.02%) to 92.337% when pulling 058ec7c on dbarbier:db/sparse into 2f928cb on rasbt:master.

rasbt (Owner) left a comment

Thanks a lot. Good catch regarding the checking utility. Also big thanks for modernizing it wrt the SparseDataFrame deprecation. Really appreciate it.

Review threads (outdated, resolved):

  • mlxtend/frequent_patterns/fpcommon.py
  • mlxtend/frequent_patterns/tests/test_fpbase.py
  • mlxtend/frequent_patterns/tests/test_fpbase.py
dbarbier (Contributor, Author) commented Nov 5, 2019

@rasbt I will investigate the Travis failures, but I have an unrelated question: are issues labelled stat479 reserved for your students? I have some ideas for improving fpgrowth/fpmax with sparse input, but I do not want to interfere.

rasbt (Owner) commented Nov 5, 2019

> will investigate travis failures

Thanks, and no rush!

> stat479 reserved for your students?

No worries; I have some honors students who would like to learn how to contribute to open source projects. These issues are simply ones I tagged as suggestions, but no one is working on them right now.

rasbt (Owner) commented Nov 6, 2019

Looks good to me now. Thanks again for another great PR!

rasbt merged commit fa643e2 into rasbt:master on Nov 6, 2019
dbarbier deleted the db/sparse branch on November 6, 2019 06:41
rasbt mentioned this pull request on Jan 29, 2020
3 participants