UpSet plot fails for certain number of entries in input #193
Comments
I am able to replicate it when using the generate_samples function:
@jnothman is this a limitation on how many categories/sets you can plot? Using n_sets=40 in the above code works. I'm working on data sets where there could be 150+ memberships per element. For example, you have 10 people, each with 150 different characteristics, and you want to know the overlap among the 10 people.
I can't reproduce with your snippet, @RockyCanyon, but I can with @mumichae's. I've got a fix in #202. The issue here was one of integer overflow. We create a decimal representation of the binary sequence indicated by the category membership masks. For over 63 categories we start going into negative number territory. In the case of `temp2_big.csv`, the last 64 or more columns are all false. So repeatedly multiplying an already overflowed number by 2 and adding 0 resulted in the integer 0 for both rows of the dataset. Thus we crashed on duplicate values.
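The overflow described above can be reproduced in isolation. The following is a minimal sketch, not upsetplot's actual code and not the fix in #202: the function names `encode_int64` and `encode_python_int` are hypothetical, and the sketch assumes the encoding folds each boolean row into a 64-bit integer by repeated doubling, as the comment describes.

```python
import numpy as np


def encode_int64(mask):
    """Fold each boolean row into one int64 (hypothetical helper).

    Columns are consumed left to right: acc = acc * 2 + bit.
    int64 array arithmetic wraps around silently on overflow.
    """
    acc = np.zeros(mask.shape[0], dtype=np.int64)
    for col in mask.T:
        acc = acc * 2 + col.astype(np.int64)
    return acc


def encode_python_int(mask):
    """Same encoding with arbitrary-precision Python ints (hypothetical helper)."""
    return [int("".join("1" if bit else "0" for bit in row), 2) for row in mask]


# 64 categories with only the first one set: the value 2**63 does not fit
# in a signed 64-bit integer, so the result wraps to a negative number.
mask_64 = np.zeros((1, 64), dtype=bool)
mask_64[0, 0] = True
negative_code = encode_int64(mask_64)[0]

# Two distinct rows whose trailing 128 columns are all false: after 64 or
# more doublings every set bit has been shifted out, so both rows become 0.
mask_130 = np.zeros((2, 130), dtype=bool)
mask_130[0, 0] = True
mask_130[1, 1] = True
wrapped_codes = encode_int64(mask_130)
exact_codes = encode_python_int(mask_130)

print(negative_code)             # negative because of wraparound
print(wrapped_codes.tolist())    # both rows collide
print(len(set(exact_codes)))     # Python ints keep the rows distinct
```

With fixed-width integers, two different membership patterns end up with the same code, which is consistent with the duplicate-value crash reported here. Arbitrary-precision integers are shown only as a contrast; they are one possible way to avoid the collision, not necessarily what the fix does.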
For some datasets with a large number of sets I get the following error.
temp2_big.csv
Error Traceback
I have no idea where the error could come from, given that I use `pandas.value_counts`, which should give me a unique index. The way I constructed the value counts seems to work when I use a smaller set of samples, however. Unfortunately, I couldn't reproduce the error on a random dataset, which I find more confusing.