FPMax issue with large datasets. #570

Closed
Sanjoth opened this issue Aug 2, 2019 · 11 comments · Fixed by #573

Comments

@Sanjoth

Sanjoth commented Aug 2, 2019

Masked Input Data: https://filebin.net/vno7ks0ilbdkmocz/list_itemset.json
Total Transactions: 49613

Params:
min_support=6.046802249410437e-05
Input data: one-hot encoded (OHE)
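
To put the threshold in absolute terms: with 49613 transactions, this min_support corresponds to roughly three transactions per itemset. A quick check of that arithmetic (the variable names are illustrative only):

```python
n_transactions = 49613
min_support = 6.046802249410437e-05

# Relative support times the number of transactions gives the absolute count
# an itemset must reach to be considered frequent.
min_count = min_support * n_transactions
print(round(min_count, 6))

# The same threshold expressed as a fraction of the dataset.
print(abs(min_support - 3 / n_transactions) < 1e-12)
```

A threshold this low keeps nearly every item, so the resulting FP-trees are large and the rarely exercised edge cases in the mining code become more likely to be hit.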

c:\users\xuser\appdata\local\programs\python\python37\lib\site-packages\mlxtend\frequent_patterns\fpmax.py in fpmax_step(tree, minsup, mfit, colnames, max_len, verbose)
86 mfit.insert_itemset(largest_set)
87 if max_len is None or len(largest_set) <= max_len:
---> 88 support = min([tree.nodes[i][0].count for i in items])
89 yield support, largest_set
90

ValueError: min() arg is an empty sequence
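
The ValueError comes from calling min() on an empty list: when items is empty, the list comprehension on line 88 produces nothing to minimize. Below is a minimal, self-contained sketch of that failure mode and one possible guard. The plain dict stands in for the per-item node counts, and min_count_or_default is a hypothetical helper; this is not mlxtend code and not necessarily how the issue was eventually fixed.

```python
def min_count_or_default(node_counts, items, default):
    """Smallest count among `items`, or `default` when `items` is empty."""
    # min() accepts a `default` keyword for exactly this empty-iterable case.
    return min((node_counts[i] for i in items), default=default)


node_counts = {"a": 5, "b": 3}  # toy stand-in for per-item node counts

# Reproduce the error: an empty item list leaves min() with nothing to compare.
try:
    min([node_counts[i] for i in []])
except ValueError as exc:
    print("ValueError:", exc)

# With the guard, the empty case falls back to the supplied default.
print(min_count_or_default(node_counts, ["a", "b"], default=0))
print(min_count_or_default(node_counts, [], default=0))
```

Which default is appropriate depends on the algorithm; the value 0 here is only a placeholder.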

@rasbt
Owner

rasbt commented Aug 2, 2019

Do you have the code for parsing the json file into the transaction array by chance?

@Sanjoth
Author

Sanjoth commented Aug 2, 2019

Sure.

import json

# Load the masked transaction list from the shared JSON file.
with open('list_itemset.json') as fp:
    rs = json.load(fp)
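
For reference, turning the loaded data into one-hot encoded form could look like the following sketch. It assumes the JSON holds a list of transactions, each a list of item labels (the file's actual layout is not shown in this thread), and it uses only the standard library. one_hot_encode is an illustrative helper, not an mlxtend function.

```python
import json


def one_hot_encode(transactions):
    """Return (columns, rows): sorted item labels and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        # True where the transaction contains the column's item.
        rows.append([col in present for col in columns])
    return columns, rows


# Inline stand-in for the contents of list_itemset.json (assumed structure).
rs = json.loads('[["a", "b"], ["b", "c"], ["a"]]')
columns, rows = one_hot_encode(rs)
print(columns)
print(rows)
```

With mlxtend installed, TransactionEncoder from mlxtend.preprocessing performs the same transformation, and its output can be wrapped in a pandas DataFrame before being passed to fpgrowth or fpmax.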

@rasbt
Owner

rasbt commented Aug 2, 2019

Hm, that's weird. Seems to work for me:

[Screenshot: Screen Shot 2019-08-02 at 11 35 33 PM]

I was running this via mlxtend 0.17.0, you can check via

import mlxtend 
mlxtend.__version__

@Sanjoth
Author

Sanjoth commented Aug 3, 2019

Actually, the issue is with the fpmax function, fpgrowth seems to be working fine.

@rasbt
Owner

rasbt commented Aug 3, 2019

Oh good point. It looks like an issue indeed. Regarding the potential fix, there should probably be a check for itemset[0] < len(self.nodes) in line 160:

        for basenode in self.nodes[itemset[0]]:

but I am CC'ing @harenbergsd, who implemented this function and may have more insight into why it fails in this case.

@harenbergsd
Contributor

Hmm, yeah, I will look at this. It may be a simple check, but I want to think it through and make sure this condition makes sense from the algorithm's perspective and that I haven't missed something.

@rasbt
Owner

rasbt commented Aug 4, 2019

Thanks!

rasbt closed this as completed in #573 Aug 6, 2019
rasbt pushed a commit that referenced this issue Aug 6, 2019
* fix fpmax issue (#570) with fptrees that contain no nodes

* Add additional unit test for pattern mining. Also refactored tests.

* update changelog

* bump version to 0.18.0dev0

* add unit test for min_support=0.

@johnny123852

Is this fixed?
I am facing the same issue: ValueError: min() arg is an empty sequence
My data: 75730 transactions with min_support = 0.01
mlxtend.__version__: '0.17.0'

@rasbt
Owner

rasbt commented Nov 15, 2019

Which version of mlxtend are you using? If you could double-check and let us know what the following prints in your case, that would be very helpful:

>>> import mlxtend
>>> print(mlxtend.__version__)

@johnny123852

Hi @rasbt, thanks for replying.
My mlxtend.__version__ is 0.17.0

@rasbt
Owner

rasbt commented Nov 16, 2019

Thanks for the info. I think the change may not be in the latest release version yet. Please try to install the latest development version of mlxtend to see if this issue still persists. To install the latest development version, you can do

pip install git+https://github.com/rasbt/mlxtend.git

which will directly install the latest version from the master branch here on GitHub.

The version should then be

In [1]: import mlxtend

In [2]: mlxtend.__version__
Out[2]: '0.18.0dev0'
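
A programmatic version of that check, as a rough sketch: it extracts the leading numeric components of a version string and compares them with 0.18.0, the development version mentioned in this thread. release_tuple is a hypothetical helper that deliberately ignores suffixes such as dev0, so it is only a coarse test.

```python
import re


def release_tuple(version):
    """Leading dotted integers of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


# Compare the two versions discussed in this thread against 0.18.0.
for version in ("0.17.0", "0.18.0dev0"):
    at_least_0_18 = release_tuple(version) >= (0, 18, 0)
    print(version, at_least_0_18)
```

In practice, the string passed in would be mlxtend.__version__ from the installed package.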
