Inverse_X issue and cluster select. #68

Closed
WolVesz opened this issue Mar 12, 2018 · 6 comments

@WolVesz

WolVesz commented Mar 12, 2018

Hello All,

I really like this project; it's better than all the other Mapper tools in Python right now, imo.

However, I consistently have a problem: whenever I include inverse_X in the mapping, it returns 0 nodes and 0 edges. I have tried this with a number of different data sets without success, including ones with 300 dimensions and 200,000 rows, ones with 200 and 100 dimensions, and all of your sample sets, always with the same result. It's possible I am just ignorant, but I believe all of these data sets should work with inverse_X, given that it is just projecting along the original data. No errors are produced. If inverse_X is not included, the mapper works as expected.

On another note, I am going to start working on a right-click selection tool for displaying the contents of nodes, as I need this for my research. No idea how successful I'll be. I'll let you know.

@sauln
Member

sauln commented Mar 12, 2018

Hi!

Thanks for the feedback. I look forward to seeing how the right click tool works out.

I'm not sure what you mean by including inverse_X in the mapping. Are you passing it as the second argument of the map method, or are you leaving the second argument out? If you want to build mapper on your data without using a lens or filter function, it should work just fine.

Here is how you would do that with the cat example.

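import numpy as np
import sklearn.cluster
import kmapper as km
mapper = km.KeplerMapper()
data = np.genfromtxt('cat-reference.csv', delimiter=',')  # no lens: map directly on the data
graph = mapper.map(data,
                   clusterer=sklearn.cluster.DBSCAN(eps=0.1, min_samples=5),
                   coverer=km.Cover(nr_cubes=15, overlap_perc=0.2))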

Is this what you're looking for?

@WolVesz
Author

WolVesz commented Mar 13, 2018

[Screenshot 1: mapping with inverse_X included]
[Screenshot 2: mapping with only projected_data]
I am referring to the first case in the screenshots above. If I had passed only projected_data, it would have worked fine (second screenshot). Including inverse_X results in 0 edges and 0 nodes (first screenshot).

@sauln
Member

sauln commented Mar 13, 2018

I suspect the parameters for DBSCAN are too small when it's clustering in the full 161 dimensions. For example, if I lower eps in the cat example, I also get 0 nodes and 0 edges:

>>> graph = mapper.map(lens,
...                    inverse_X=data,
...                    clusterer=sklearn.cluster.DBSCAN(eps=0.0001, min_samples=5),
...                    coverer=km.Cover(nr_cubes=15, overlap_perc=0.2))
Created 0 edges and 0 nodes in 0:00:00.052353.
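
To illustrate why an eps that works in a low-dimensional projection can be far too small in the full space, here is a rough sketch using only the standard library. It assumes uniform random data, which is not the data from this issue, and `median_nn_distance` is a hypothetical helper written for this example. It compares the typical nearest-neighbour distance in 2 dimensions with that in 161 dimensions.

```python
import math
import random


def median_nn_distance(n_points, n_dims, seed=0):
    """Median distance from each point to its nearest neighbour.

    Points are drawn uniformly from the unit cube [0, 1]^n_dims
    (an assumption made purely for illustration).
    """
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    nearest = []
    for i, p in enumerate(pts):
        # Brute-force nearest neighbour; fine for a few hundred points.
        nearest.append(min(math.dist(p, q) for j, q in enumerate(pts) if j != i))
    nearest.sort()
    return nearest[len(nearest) // 2]


low = median_nn_distance(200, 2)
high = median_nn_distance(200, 161)
print(f"2 dims:   median nearest-neighbour distance = {low:.3f}")
print(f"161 dims: median nearest-neighbour distance = {high:.3f}")
```

If eps is below the nearest-neighbour distance for most points, DBSCAN finds no core points and labels everything as noise, which would show up as 0 nodes. Printing a similar statistic for your own inverse_X data is one way to pick a starting value for eps.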

@MLWave
Member

MLWave commented Mar 14, 2018

It could also be min_samples=5: with 15,625 hypercubes, it may be that none of them contain 5 or more samples (you can check this by setting verbose=2).
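
A quick way to see this effect is to count how many cubes hold enough samples. The sketch below is only an approximation under loud assumptions: it reads the figure above as 15,625 cubes coming from 125 intervals per dimension of a 2-D lens, uses 5,000 hypothetical Gaussian lens values, and ignores the overlap between cubes. `cubes_with_enough_samples` is a made-up helper, not part of kmapper.

```python
import random
from collections import Counter


def cubes_with_enough_samples(points, cubes_per_dim, min_samples):
    """Bin 2-D points into a regular grid (no overlap).

    Returns (cells holding >= min_samples points, cells holding any points).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def cell(value, lo, hi):
        # Clamp so the maximum value lands in the last cell, not one past it.
        return min(int((value - lo) / (hi - lo) * cubes_per_dim), cubes_per_dim - 1)

    counts = Counter((cell(x, x_min, x_max), cell(y, y_min, y_max)) for x, y in points)
    enough = sum(1 for c in counts.values() if c >= min_samples)
    return enough, len(counts)


rng = random.Random(0)
lens = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]  # hypothetical 2-D lens
total_cubes = 125 * 125  # assumed layout: 125 intervals per lens dimension
enough, occupied = cubes_with_enough_samples(lens, cubes_per_dim=125, min_samples=5)
print(f"{total_cubes} cubes, {occupied} occupied, {enough} with at least 5 samples")
```

A cube with fewer than min_samples points cannot contain a DBSCAN core point, so it contributes no node. Lowering nr_cubes or min_samples raises the share of cubes that can produce clusters.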

If you want to force nodes and links without worrying about eps, try clustering with sklearn.cluster.KMeans(2) or, probably better, sklearn.cluster.AgglomerativeClustering(2). hdbscan.HDBSCAN() also does not require guessing the correct eps.
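
The difference between the two kinds of clusterer can be seen outside of mapper with scikit-learn alone. This is a minimal sketch, assuming 100 random Gaussian points in 161 dimensions as a stand-in for the samples inside one hypercube; the data and the eps value are illustrative, not taken from this issue.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 161))  # hypothetical contents of a single hypercube

# eps far below the typical pairwise distance: every point is labelled noise (-1).
dbscan_labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)

# A fixed cluster count needs no eps and assigns every point to a cluster.
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

print("DBSCAN clusters:", len(set(dbscan_labels.tolist()) - {-1}))
print("Agglomerative clusters:", len(set(agglo_labels.tolist())))
```

The trade-off is that a fixed-count clusterer splits every cube into that many clusters whether or not the data in the cube has that structure, so the resulting graph is guaranteed to have nodes but may be less meaningful.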

@sauln
Member

sauln commented Apr 11, 2018

@WolVesz were you able to get this working?

@sauln
Member

sauln commented Apr 29, 2018

I assume the issue has been cleared up. If not, please reopen and we will figure out what's wrong.
