Fpmax #553

harenbergsd · 2019-06-18T03:32:10Z

Description

Added the FPMax algorithm to frequent pattern mining. FPMax finds maximal itemsets.
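For readers unfamiliar with the term, a maximal itemset is a frequent itemset that has no frequent proper superset. The brute-force sketch below only illustrates that definition; it is not the FPMax algorithm from this PR, and the names frequent_itemsets, maximal_itemsets, and min_count are hypothetical helpers made up for the illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Enumerate every itemset contained in at least min_count transactions.

    Brute force over all item combinations; only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions))
    frequent = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(candidate <= t for t in transactions)
            if count >= min_count:
                frequent.append(candidate)
    return frequent


def maximal_itemsets(transactions, min_count):
    """Keep only the frequent itemsets with no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_count)
    return {s for s in frequent if not any(s < other for other in frequent)}


# Toy data (made up for illustration).
transactions = [
    frozenset("abcd"),
    frozenset("abcd"),
    frozenset("cde"),
    frozenset("cde"),
    frozenset("ab"),
]

maximal = maximal_itemsets(transactions, min_count=2)
# maximal contains exactly frozenset("abcd") and frozenset("cde")
```

Every other frequent itemset in this toy data (for example {a, b} or {c, d}) is a subset of one of the two maximal itemsets, which is why the maximal set is a compact summary of all frequent itemsets.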

Related issues or pull requests

Related to #509

Pull Request Checklist

Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
Ran PYTHONPATH='.' pytest ./mlxtend -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
Checked for style issues by running flake8 ./mlxtend


coveralls · 2019-06-19T03:30:34Z

Coverage increased (+0.06%) to 92.104% when pulling 85816aa on harenbergsd:fpmax into b20e57c on rasbt:master.

rasbt · 2019-06-19T15:13:54Z

mlxtend/frequent_patterns/apriori.py

-            s = ('The allowed values for a DataFrame'
-                 ' are True, False, 0, 1. Found value %s' % (val))
-            raise ValueError(s)
+    idxs = np.where((df.values != 1) & (df.values != 0))


Do you know if that captures True and False values as well by chance?

Yeah, in Python, 1 == True (== 1.0)

haha, right, seems like my basic Python skills are getting a tad rusty
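A quick way to convince yourself of the point above: the sketch below applies the same np.where expression from the diff to plain NumPy arrays (standing in for df.values). The wrapper name find_invalid_cells and the sample arrays are hypothetical, made up for this illustration.

```python
import numpy as np


def find_invalid_cells(values):
    """Return (row_idxs, col_idxs) of cells that equal neither 0 nor 1.

    Because True == 1 and False == 0 in Python and NumPy, boolean cells
    pass the check just like integer 0/1 cells do.
    """
    return np.where((values != 1) & (values != 0))


bool_values = np.array([[True, False], [False, True]])
int_values = np.array([[1, 0], [0, 1]])
bad_values = np.array([[1, 0], [2, 1]])  # the 2 is not an allowed value

print(find_invalid_cells(bool_values)[0].size)  # 0 invalid cells
print(find_invalid_cells(int_values)[0].size)   # 0 invalid cells

rows, cols = find_invalid_cells(bad_values)
print(rows.tolist(), cols.tolist())  # [1] [0]
```

The vectorized comparison also avoids looping over every cell in Python, which fits the "Improve valid val check performance" commit in this PR.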

rasbt · 2019-06-19T15:16:11Z

Again, thanks so much for this very nice PR!

I can take care of the documentation part.

harenbergsd · 2019-06-19T21:56:32Z

Ok, cool, thanks!

harenbergsd · 2019-06-19T22:29:44Z

BTW, one design decision to discuss regarding max_len. It's not entirely clear what max_len should mean for this problem, where you are enumerating maximal itemsets. There are two options:

1. Given the set of all maximal itemsets, return those that are no longer than max_len
2. Given the set of all itemsets no longer than max_len, return those that are maximal

The latter includes max_len in the maximality constraint.

For example, suppose {a,b,c,d} and {c,d,e} are the maximal itemsets. If the user gives a max_len of 3, the options would return:

1. {c,d,e}
2. {a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}, {c,d,e}

I chose to go with the former, option (1), because with option (2) you can return itemsets that are not maximal wrt the complete database. In other words, option (1) gives you a subset of the complete set of solutions.

The other way would be fairly easy to implement as well, so if you think it's better, we can do it.
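To make the two interpretations concrete, here is a minimal sketch that reproduces the example above starting from the two maximal itemsets. It is not the code in fpmax.py; option_1 and option_2 are hypothetical helper names, and option_2 relies on the fact that every subset of a frequent itemset is itself frequent.

```python
from itertools import combinations


def option_1(maximal_itemsets, max_len):
    """Filter the true maximal itemsets by length."""
    return {s for s in maximal_itemsets if len(s) <= max_len}


def option_2(maximal_itemsets, max_len):
    """Take all frequent itemsets up to max_len, then keep the maximal ones.

    The frequent itemsets are exactly the non-empty subsets of the
    maximal itemsets, so they can be enumerated from those.
    """
    candidates = set()
    for s in maximal_itemsets:
        for k in range(1, min(len(s), max_len) + 1):
            candidates.update(frozenset(c) for c in combinations(s, k))
    return {s for s in candidates if not any(s < other for other in candidates)}


maximal = {frozenset("abcd"), frozenset("cde")}

result_1 = option_1(maximal, max_len=3)
# result_1 == {frozenset("cde")}

result_2 = option_2(maximal, max_len=3)
# result_2 holds abc, abd, acd, bcd, and cde
```

Note that four of the five itemsets returned by option_2 are subsets of {a,b,c,d}, so they are not maximal with respect to the complete database.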

rasbt · 2019-06-19T23:04:56Z

Good point. I think there would be edge cases where someone would prefer one over the other. However, when I think of maximal itemsets, intuitively, I would think about it wrt the complete database as in scenario 1. I'd say we should go with 1, like you also suggest.

rasbt · 2019-06-21T23:12:29Z

Regarding what's currently implemented ... looking at the code (mlxtend/mlxtend/frequent_patterns/fpmax.py, line 86 in 9be453c):

if max_len is None or len(largest_set) <= max_len:

it looks like it's option 1, correct?

harenbergsd · 2019-06-21T23:56:41Z

Yep, option 1.

rasbt · 2019-06-23T16:37:35Z

Once the unit tests pass, I think this should be good to merge. Thanks a lot once again. Looks like quite some effort went into the code, and I particularly like the engineering best-practices regarding refactoring and encapsulation.

harenbergsd added 9 commits June 17, 2019 23:21

Add fpmax algorithm to frequent patterns module

53c1075

Refactor unit tests for frequent patterns

1ec3532

Small fix to fpmax

516a74a

Add unit tests for fpmax

128602c

Fix unit tests for apriori and growth plus more refactoring

4a67dfc

Change EOL to match rest of repo (LF instead of CRLF)

39f2ffc

Remove unittest parent class from frequent pattern tests as it is unneeded

a0b2acf

Improve valid val check performance in frequent patterns

962eac6

Fix some pep8 issues

28369e1

harenbergsd added 2 commits June 19, 2019 00:13

Fix pytest issues

00f95e3

Refactor fpgrowth

55905ce

harenbergsd force-pushed the fpmax branch from 8170d87 to 55905ce June 19, 2019 04:14

rasbt reviewed Jun 19, 2019


add boolean array to unit tests

9be453c

rasbt added 2 commits June 23, 2019 11:26

add documentation

283aa77

Merge branch 'master' into fpmax

85816aa

rasbt merged commit cbe17d7 into rasbt:master Jun 23, 2019

rasbt mentioned this pull request Jun 23, 2019

Performance Improvement of apriori function in frequent_patterns #549

Closed
