Add fpgrowth algorithm to frequent itemset package #550

harenbergsd · 2019-06-09T01:02:32Z

Description

Added FP-Growth algorithm for mining frequent itemsets.

Related issues or pull requests

Addresses issue #509 and #248

Pull Request Checklist

Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
Ran PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
Checked for style issues by running flake8 ./mlxtend

A couple notes to discuss.

(1) Regarding the documentation, there is a lot of overlap with apriori; same concept, same API, etc. Maybe we should have a generic documentation, that would basically be what 'Apriori' is now, except have it mention both algorithms and their APIs.

(2) Regarding the unit tests, I figured that we might as well reuse the same tests for both apriori and fpgrowth. I created a base class that defines the tests and then the tests for each algorithm subclass that base class and set a function object.

harenbergsd · 2019-06-09T01:17:03Z

Hmm, I thought I didn't need to sync my fork, but it was out of date so now I have an extra merge commit here (master->feature branch). Let me know if you want me to close this and redo, so there isn't that ugly commit.

coveralls · 2019-06-09T01:27:14Z

Coverage increased (+0.06%) to 92.16% when pulling d240c87 on harenbergsd:fpgrowth into 6937439 on rasbt:master.

rasbt · 2019-06-09T07:50:32Z

Thanks a lot for the PR! Regarding

Let me know if you want me to close this and redo, so there isn't that ugly commit.

no need to worry about such things as "ugly commits" if everything seems to be complete. I haven't gone through all the code yet (currently traveling), but one issue I saw is that somehow the content of the mlxtend/mlxtend/frequent_patterns/tests/test_apriori.py file got deleted (and replaced by some setup code for FPTestGrowth, is that on purpose or an accident?

harenbergsd · 2019-06-09T16:04:13Z

one issue I saw is that somehow the content of the mlxtend/mlxtend/frequent_patterns/tests/test_apriori.py file got deleted (and replaced by some setup code for FPTestGrowth, is that on purpose or an accident?

Semi-intentional :). The class name was a typo, I meant to call it FPTestApriori, but otherwise yes, I meant to move most of the contents of that file into a different file (i.e., test_fpbase.py).

This is related to what I was describing in (2) in the first comment. Because fpgrowth and apriori solve the same thing and have the same API it made sense to have shared unit test code. So, to use the same unit tests between them, I created a base class (FPTestBase) in test_fpbase.py and two subclasses (FPTestGrowth and FPTestApriori) in test_fpgrowth.py and test_apriori.py. The common unit test code (originally the code in test_apriori.py) went into FPTestBase.

I will wait to make any changes related to the unit tests until you let me know if you want to handle the unit test design for this in a different manner.

…pgrowth function so function was unused

rasbt · 2019-06-09T16:45:14Z

mlxtend/frequent_patterns/tests/test_fpbase.py

+
+        self.df = pd.DataFrame(self.one_ary, columns=self.cols)
+
+        self.fpalgo = apriori


this line is not really needed, right?

Yeah, I think you are right.

However, if I remove it, it creates red error lines in my editor because it's not defined (but it will still work because it gets defined).

I guess the best way to handle it may be to force the argument in the constructor:

test_fpbase.py

def setUp(self, fpalgo): self.one_ary = np.array( [[0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1], [0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]]) self.cols = ['Apple', 'Corn', 'Dill', 'Eggs', 'Ice cream', 'Kidney Beans', 'Milk', 'Nutmeg', 'Onion', 'Unicorn', 'Yogurt'] self.df = pd.DataFrame(self.one_ary, columns=self.cols) self.fpalgo = fpalgo

test_fpgrowth.py

class FPTestGrowth(unittest.TestCase, FPTestBase): def setUp(self): FPTestBase.setUp(self, fpgrowth)

test_apriori.py

class FPTestApriori(unittest.TestCase, FPTestBase): def setUp(self): FPTestBase.setUp(self, apriori)

rasbt · 2019-06-09T16:48:46Z

Oh I see now ... that is actually quite an elegant solution to avoid duplicating the tests and also making sure that both apriori and fpgrowth produce the same results.

What do you think about creating documentation? I'd suggest we make a standalone Jupyter Notebook like for apriori but then only add one simple, single example and mention that the usage is the same as for apriori and refer readers to see the apriori docs for more examples. We could then also add a benchmark example based on the numbers you quoted in the other "issue thread."

harenbergsd · 2019-06-09T17:13:15Z

That documentation plan sounds good to me 👍

pep8speaks · 2019-06-10T06:52:49Z

Hello @harenbergsd! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-10 07:05:01 UTC

rasbt · 2019-06-10T06:54:17Z

Alright, just added the docs. Once the unit tests pass, I guess it's good to merge. This really is a great PR, and I really appreciate it!

harenbergsd · 2019-06-10T23:23:39Z

Cool, looks good! Thanks for doing the doc.

harenbergsd added 2 commits June 8, 2019 20:57

Add fpgrowth algorithm to frequent itemset package

dd49b6e

Merge branch 'master' into fpgrowth

b2329e9

Remove build_tree function. The code was incorporated into the main f…

cc6aadf

…pgrowth function so function was unused

rasbt reviewed Jun 9, 2019

View reviewed changes

add documentation for fp growth

dbe4687

fix pep8

d240c87

rasbt merged commit 8dcb6e7 into rasbt:master Jun 10, 2019

harenbergsd deleted the fpgrowth branch June 10, 2019 23:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add fpgrowth algorithm to frequent itemset package #550

Add fpgrowth algorithm to frequent itemset package #550

harenbergsd commented Jun 9, 2019 •

edited by rasbt

Loading

harenbergsd commented Jun 9, 2019

coveralls commented Jun 9, 2019 •

edited

Loading

rasbt commented Jun 9, 2019

harenbergsd commented Jun 9, 2019

rasbt Jun 9, 2019

harenbergsd Jun 9, 2019

rasbt commented Jun 9, 2019

harenbergsd commented Jun 9, 2019

pep8speaks commented Jun 10, 2019 •

edited

Loading

rasbt commented Jun 10, 2019

harenbergsd commented Jun 10, 2019


		self.df = pd.DataFrame(self.one_ary, columns=self.cols)

		self.fpalgo = apriori

Add fpgrowth algorithm to frequent itemset package #550

Add fpgrowth algorithm to frequent itemset package #550

Conversation

harenbergsd commented Jun 9, 2019 • edited by rasbt Loading

Description

Related issues or pull requests

Pull Request Checklist

harenbergsd commented Jun 9, 2019

coveralls commented Jun 9, 2019 • edited Loading

rasbt commented Jun 9, 2019

harenbergsd commented Jun 9, 2019

rasbt Jun 9, 2019

Choose a reason for hiding this comment

harenbergsd Jun 9, 2019

Choose a reason for hiding this comment

rasbt commented Jun 9, 2019

harenbergsd commented Jun 9, 2019

pep8speaks commented Jun 10, 2019 • edited Loading

Comment last updated at 2019-06-10 07:05:01 UTC

rasbt commented Jun 10, 2019

harenbergsd commented Jun 10, 2019

harenbergsd commented Jun 9, 2019 •

edited by rasbt

Loading

coveralls commented Jun 9, 2019 •

edited

Loading

pep8speaks commented Jun 10, 2019 •

edited

Loading