Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fpgrowth algorithm to frequent itemset package #550

Merged
merged 5 commits into from
Jun 10, 2019

Conversation

harenbergsd
Copy link
Contributor

@harenbergsd harenbergsd commented Jun 9, 2019

Description

Added FP-Growth algorithm for mining frequent itemsets.

Related issues or pull requests

Addresses issue #509 and #248

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

A couple notes to discuss.

(1) Regarding the documentation, there is a lot of overlap with apriori; same concept, same API, etc. Maybe we should have a generic documentation, that would basically be what 'Apriori' is now, except have it mention both algorithms and their APIs.

(2) Regarding the unit tests, I figured that we might as well reuse the same tests for both apriori and fpgrowth. I created a base class that defines the tests and then the tests for each algorithm subclass that base class and set a function object.

@harenbergsd
Copy link
Contributor Author

Hmm, I thought I didn't need to sync my fork, but it was out of date so now I have an extra merge commit here (master->feature branch). Let me know if you want me to close this and redo, so there isn't that ugly commit.

@coveralls
Copy link

coveralls commented Jun 9, 2019

Coverage Status

Coverage increased (+0.06%) to 92.16% when pulling d240c87 on harenbergsd:fpgrowth into 6937439 on rasbt:master.

@rasbt
Copy link
Owner

rasbt commented Jun 9, 2019

Thanks a lot for the PR! Regarding

Let me know if you want me to close this and redo, so there isn't that ugly commit.

no need to worry about such things as "ugly commits" if everything seems to be complete. I haven't gone through all the code yet (currently traveling), but one issue I saw is that somehow the content of the mlxtend/mlxtend/frequent_patterns/tests/test_apriori.py file got deleted (and replaced by some setup code for FPTestGrowth, is that on purpose or an accident?

@harenbergsd
Copy link
Contributor Author

one issue I saw is that somehow the content of the mlxtend/mlxtend/frequent_patterns/tests/test_apriori.py file got deleted (and replaced by some setup code for FPTestGrowth, is that on purpose or an accident?

Semi-intentional :). The class name was a typo, I meant to call it FPTestApriori, but otherwise yes, I meant to move most of the contents of that file into a different file (i.e., test_fpbase.py).

This is related to what I was describing in (2) in the first comment. Because fpgrowth and apriori solve the same thing and have the same API it made sense to have shared unit test code. So, to use the same unit tests between them, I created a base class (FPTestBase) in test_fpbase.py and two subclasses (FPTestGrowth and FPTestApriori) in test_fpgrowth.py and test_apriori.py. The common unit test code (originally the code in test_apriori.py) went into FPTestBase.

I will wait to make any changes related to the unit tests until you let me know if you want to handle the unit test design for this in a different manner.


self.df = pd.DataFrame(self.one_ary, columns=self.cols)

self.fpalgo = apriori
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is not really needed, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you are right.

However, if I remove it, it creates red error lines in my editor because it's not defined (but it will still work because it gets defined).

I guess the best way to handle it may be to force the argument in the constructor:

test_fpbase.py

    def setUp(self, fpalgo):
        self.one_ary = np.array(
            [[0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1],
                [0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1],
                [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1],
                [0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]])

        self.cols = ['Apple', 'Corn', 'Dill', 'Eggs', 'Ice cream',
                     'Kidney Beans', 'Milk',
                     'Nutmeg', 'Onion', 'Unicorn', 'Yogurt']

        self.df = pd.DataFrame(self.one_ary, columns=self.cols)

        self.fpalgo = fpalgo

test_fpgrowth.py

class FPTestGrowth(unittest.TestCase, FPTestBase):
    def setUp(self):
        FPTestBase.setUp(self, fpgrowth)

test_apriori.py

class FPTestApriori(unittest.TestCase, FPTestBase):
    def setUp(self):
        FPTestBase.setUp(self, apriori)

@rasbt
Copy link
Owner

rasbt commented Jun 9, 2019

Oh I see now ... that is actually quite an elegant solution to avoid duplicating the tests and also making sure that both apriori and fpgrowth produce the same results.

What do you think about creating documentation? I'd suggest we make a standalone Jupyter Notebook like for apriori but then only add one simple, single example and mention that the usage is the same as for apriori and refer readers to see the apriori docs for more examples. We could then also add a benchmark example based on the numbers you quoted in the other "issue thread."

@harenbergsd
Copy link
Contributor Author

That documentation plan sounds good to me 👍

@pep8speaks
Copy link

pep8speaks commented Jun 10, 2019

Hello @harenbergsd! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-10 07:05:01 UTC

@rasbt
Copy link
Owner

rasbt commented Jun 10, 2019

Alright, just added the docs. Once the unit tests pass, I guess it's good to merge. This really is a great PR, and I really appreciate it!

@rasbt rasbt merged commit 8dcb6e7 into rasbt:master Jun 10, 2019
@harenbergsd harenbergsd deleted the fpgrowth branch June 10, 2019 23:18
@harenbergsd
Copy link
Contributor Author

Cool, looks good! Thanks for doing the doc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants