Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation Enhancement Request: Example of Spectral Clustering for a graph #9481

Closed
alexlenail opened this issue Aug 2, 2017 · 5 comments

Comments

@alexlenail
Copy link
Contributor

Spectral Clustering is one of the primary ways of clustering graphs. It would very helpful to document a recipe for using this code to cluster graphs.

@jnothman
Copy link
Member

jnothman commented Aug 2, 2017 via email

@alexlenail
Copy link
Contributor Author

The "Iris Dataset" in the graph theory world is called the Karate Club Dataset. It's quite small, and the goal should be to predict two clusters. Libraries like networkx include this graph as a method.

@alexlenail
Copy link
Contributor Author

@jnothman bump

@alexlenail
Copy link
Contributor Author

@jnothman I would love to do this myself, but that's just it -- I'm don't know how to use spectral clustering from this library for graphs. If you give me the general gist, i can experiment and figure it out myself perhaps, and then try to write this documentation. Let me know.

@sschnug
Copy link

sschnug commented Sep 17, 2017

(Coming from the StackOverflow-question by the author)

I tried to approach the karate-club task with Spectral-Clustering with minimal knowledge and only using sklearn's docs and some definition of Normalized Graph Cuts (to see if that's what we want; yes). See above link for more information.

Tuning assign_labels, this gives a nearly perfect result.

If someone thinks about adding an example, one probably has to think if this dataset should be somehow included (as i'm taking it from networkx).

import numpy as np
import networkx as nx
from sklearn.cluster import SpectralClustering
from sklearn import metrics
np.random.seed(1)

G = nx.karate_club_graph()
gt_dict = nx.get_node_attributes(G, 'club')
gt = [gt_dict[i] for i in G.nodes()]
gt = np.array([0 if i == 'Mr. Hi' else 1 for i in gt])
adj_mat = nx.to_numpy_matrix(G)
print('ground truth')
print(gt)

sc = SpectralClustering(2, affinity='precomputed', n_init=100, assign_labels='discretize')
sc.fit(adj_mat)
print('spectral clustering')
print(sc.labels_)
print('just for better-visualization: invert clusters (permutation)')
print(np.abs(sc.labels_ - 1))

print(metrics.adjusted_rand_score(gt, sc.labels_))
print(metrics.adjusted_mutual_info_score(gt, sc.labels_))

Output:

ground truth
[0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
spectral clustering
[0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
just for better-visualization: invert clusters (permutation)
[1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
0.771725032425
0.722546051351

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants