-
Notifications
You must be signed in to change notification settings - Fork 855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fpgrowth algorithm to frequent itemset package #550
Conversation
Hmm, I thought I didn't need to sync my fork, but it was out of date so now I have an extra merge commit here (master->feature branch). Let me know if you want me to close this and redo, so there isn't that ugly commit. |
Thanks a lot for the PR! Regarding
no need to worry about such things as "ugly commits" if everything seems to be complete. I haven't gone through all the code yet (currently traveling), but one issue I saw is that somehow the content of the mlxtend/mlxtend/frequent_patterns/tests/test_apriori.py file got deleted (and replaced by some setup code for |
Semi-intentional :). The class name was a typo, I meant to call it FPTestApriori, but otherwise yes, I meant to move most of the contents of that file into a different file (i.e., This is related to what I was describing in (2) in the first comment. Because fpgrowth and apriori solve the same thing and have the same API it made sense to have shared unit test code. So, to use the same unit tests between them, I created a base class (FPTestBase) in I will wait to make any changes related to the unit tests until you let me know if you want to handle the unit test design for this in a different manner. |
…pgrowth function so function was unused
|
||
self.df = pd.DataFrame(self.one_ary, columns=self.cols) | ||
|
||
self.fpalgo = apriori |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is not really needed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you are right.
However, if I remove it, it creates red error lines in my editor because it's not defined (but it will still work because it gets defined).
I guess the best way to handle it may be to force the argument in the constructor:
test_fpbase.py
def setUp(self, fpalgo):
self.one_ary = np.array(
[[0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1],
[0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1],
[1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1],
[0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]])
self.cols = ['Apple', 'Corn', 'Dill', 'Eggs', 'Ice cream',
'Kidney Beans', 'Milk',
'Nutmeg', 'Onion', 'Unicorn', 'Yogurt']
self.df = pd.DataFrame(self.one_ary, columns=self.cols)
self.fpalgo = fpalgo
test_fpgrowth.py
class FPTestGrowth(unittest.TestCase, FPTestBase):
def setUp(self):
FPTestBase.setUp(self, fpgrowth)
test_apriori.py
class FPTestApriori(unittest.TestCase, FPTestBase):
def setUp(self):
FPTestBase.setUp(self, apriori)
Oh I see now ... that is actually quite an elegant solution to avoid duplicating the tests and also making sure that both apriori and fpgrowth produce the same results. What do you think about creating documentation? I'd suggest we make a standalone Jupyter Notebook like for apriori but then only add one simple, single example and mention that the usage is the same as for apriori and refer readers to see the apriori docs for more examples. We could then also add a benchmark example based on the numbers you quoted in the other "issue thread." |
That documentation plan sounds good to me 👍 |
Hello @harenbergsd! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-06-10 07:05:01 UTC |
Alright, just added the docs. Once the unit tests pass, I guess it's good to merge. This really is a great PR, and I really appreciate it! |
Cool, looks good! Thanks for doing the doc. |
Description
Added FP-Growth algorithm for mining frequent itemsets.
Related issues or pull requests
Addresses issue #509 and #248
Pull Request Checklist
./docs/sources/CHANGELOG.md
file (if applicable)./mlxtend/*/tests
directories (if applicable)mlxtend/docs/sources/
(if applicable)PYTHONPATH='.' pytest ./mlxtend -sv
and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv
)flake8 ./mlxtend
A couple notes to discuss.
(1) Regarding the documentation, there is a lot of overlap with apriori; same concept, same API, etc. Maybe we should have a generic documentation, that would basically be what 'Apriori' is now, except have it mention both algorithms and their APIs.
(2) Regarding the unit tests, I figured that we might as well reuse the same tests for both apriori and fpgrowth. I created a base class that defines the tests and then the tests for each algorithm subclass that base class and set a function object.