add support for newer pandas sparse dataframes in frequent_patterns #621
SparseDataFrame has been deprecated; pandas now recommends creating standard DataFrames and storing sparse Series as SparseArray. This allows combining dense and sparse columns in the same DataFrame. Improve valid_input_check for sparse values; it previously uncompressed the whole DataFrame to check for invalid values, which defeats the purpose of sparse storage. It now checks only the existing (stored) values, which prevents memory errors and is also much faster. Update the apriori.ipynb notebook to use the new pandas sparse DataFrame.
Thanks a lot. Good catch regarding the checking utility. Also big thanks for modernizing it wrt the SparseDataFrame deprecation. Really appreciate it.
Contrary to what I thought in 0a8fa8f, this issue is still not fixed.
@rasbt I will investigate the Travis failures, but I have an unrelated question: are issues labelled stat479 reserved for your students? I have some ideas to improve fpgrowth/fpmax with sparse input, but do not want to interfere here.
Thanks, and no rush!
No worries; I have some honors students who would like to learn how to contribute to open source projects. These issues are simply ones I tagged as suggestions, but no one is working on them right now.
Looks good to me now. Thanks again for another great PR!
Description
SparseDataFrame has been deprecated; pandas 0.24 recommends creating
standard DataFrames and storing sparse Series as SparseArray. This
allows combining dense and sparse columns in the same DataFrame.
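
For illustration, here is a minimal sketch of what a mixed dense/sparse DataFrame can look like. It assumes a recent pandas version that provides `pd.arrays.SparseArray`, `pd.SparseDtype`, and the `.sparse` accessor; the column names and data are made up for the example and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical one-hot encoded transactions; most entries are False.
dense = pd.DataFrame({
    "milk": [True, False, False, True],
    "eggs": [False, False, True, False],
    "bread": [False, False, False, True],
})

# Store a single column as a SparseArray; the other columns stay dense.
mixed = dense.copy()
mixed["milk"] = pd.arrays.SparseArray(dense["milk"].to_numpy(), fill_value=False)

# Alternatively, convert every column to a sparse dtype in one step.
all_sparse = dense.astype(pd.SparseDtype(bool, fill_value=False))

print(mixed.dtypes)
print(all_sparse.sparse.density)
```

Both `mixed` and `all_sparse` are regular DataFrames, so no SparseDataFrame is involved.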
Improve valid_input_check for sparse values; it previously uncompressed the whole
DataFrame to check for invalid values, which defeats the purpose of sparse storage.
It now checks only the existing (stored) values, which prevents memory errors and is
also much faster.
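
The idea of checking only the stored values can be sketched roughly as follows. This is not the code from this PR: `has_only_binary_values` is a hypothetical helper written for illustration, and it assumes a pandas version with the `Series.sparse` accessor (`sp_values`, `fill_value`).

```python
import numpy as np
import pandas as pd


def has_only_binary_values(df):
    """Hypothetical check: True if every value is 0/1 (or False/True)."""
    for name in df.columns:
        col = df[name]
        if isinstance(col.dtype, pd.SparseDtype):
            # Only the explicitly stored values are inspected; the column
            # is never converted to a dense array.
            values = np.asarray(col.sparse.sp_values)
            # The fill value stands in for every position that is not stored,
            # so it only matters if such positions exist.
            if len(values) < len(col) and col.sparse.fill_value not in (0, 1):
                return False
        else:
            values = col.to_numpy()
        if not ((values == 0) | (values == 1)).all():
            return False
    return True


# Made-up inputs: one valid mixed DataFrame, one with an invalid stored value.
ok = pd.DataFrame({
    "a": pd.arrays.SparseArray([0, 0, 1, 0], fill_value=0),
    "b": [True, False, True, True],
})
bad = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 2, 0], fill_value=0)})

print(has_only_binary_values(ok), has_only_binary_values(bad))
```

For a sparse column the work is proportional to the number of stored values rather than the number of rows, which is where the memory and speed gains would come from.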
Update apriori.ipynb notebook to use the new pandas sparse DataFrame.
Related issues or pull requests
Pull Request Checklist
- Updated the ./docs/sources/CHANGELOG.md file (if applicable)
- Added unit tests in the ./mlxtend/*/tests directories (if applicable)
- Updated the documentation under mlxtend/docs/sources/ (if applicable)
- Ran PYTHONPATH='.' pytest ./mlxtend -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
- Ran flake8 ./mlxtend