Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apriori/one hot speedup #327

Merged
merged 3 commits into from
Feb 12, 2018
Merged

Apriori/one hot speedup #327

merged 3 commits into from
Feb 12, 2018

Conversation

jaksmid
Copy link

@jaksmid jaksmid commented Feb 9, 2018

On my computer, the Apriori algorithm could not finish in reasonable time with the document containing millions of rows.
I optimized the code so the algorithm runs much faster

  • Transform: One hot is much faster
  • Apriori is much faster - some invalid combinations omitted

* Transform: Apriori is much faster - some invalid combinations omitted
@pep8speaks
Copy link

pep8speaks commented Feb 9, 2018

Hello @jaksmid! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 12, 2018 at 14:18 Hours UTC

@coveralls
Copy link

coveralls commented Feb 9, 2018

Coverage Status

Coverage increased (+0.02%) to 91.562% when pulling 3f0ae0d on jaksmid:master into ebc692b on rasbt:master.

@rasbt
Copy link
Owner

rasbt commented Feb 9, 2018

This is great, thanks a lot!

Yeah, omitting items/itemsets without enough support in the previous step actually makes sense.

Not sure, but do you have "allow maintainer edits" enabled on this PR? (I wish GitHub would add a visual clue for that). I wanted to adjust the documentation a bit for consistency.

@jaksmid
Copy link
Author

jaksmid commented Feb 10, 2018

Hello @rasbt,
glad that you like my PR.

I see allow edits from maintainers as checked. But if you send me what you want to have changed I will be happy to incorporate the amendments for you.

@rasbt
Copy link
Owner

rasbt commented Feb 10, 2018

Thanks! There are actually only two little things that come to mind:

  • it's not included in the API doc because it's an internal function, but could you change the docstring of "generate_new_combinations" to the NumPy-style docstring format for consistency with other docstrings?

  • An entry in the Changelog (mlxtend/docs/sources/CHANGELOG.md) would be nice. I just see that there's no category for "improvements" but I think the "Changes" category should suffice :)

@jaksmid
Copy link
Author

jaksmid commented Feb 12, 2018

Done. Is this something you had in mind @rasbt?

@rasbt
Copy link
Owner

rasbt commented Feb 12, 2018

Thanks for the update and the PR. Looks good to me now and I am happy to merge this :)

@rasbt rasbt merged commit 98ba8a9 into rasbt:master Feb 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants