nmonath/scc

Sub-Cluster Component Clustering Algorithm

This is a SciPy / NumPy / Python implementation of SCC (Sub-Cluster Component clustering). For relatively sparse input graphs, it should scale relatively easily to datasets with millions of nodes. The implementation assumes that pairwise similarities are given as input.
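
To make "similarities are given" concrete, here is a minimal sketch of one way to build a sparse k-nearest-neighbor similarity graph with SciPy. The helper `knn_similarity_graph` is hypothetical and is not this repository's `graph_from_vectors`. The dot-product similarity and the COO output format are also assumptions, chosen only for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix


def knn_similarity_graph(X, k):
    """Hypothetical helper: sparse kNN graph of dot-product similarities."""
    n = X.shape[0]
    sims = X @ X.T                   # dense n x n matrix; only viable for small n
    np.fill_diagonal(sims, -np.inf)  # exclude self-edges from the neighbor search
    # Column indices of the k most similar neighbors per row (unordered).
    nbrs = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    vals = sims[rows, cols]
    return coo_matrix((vals, (rows, cols)), shape=(n, n))


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
graph = knn_similarity_graph(X, k=25)
print(graph.shape, graph.nnz)
```

Each row stores only k entries, so memory grows with the number of edges rather than with the square of the number of nodes. The dense `X @ X.T` step in this sketch would have to be replaced by a batched or approximate neighbor search for large datasets.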

There is an example use in demo.py. This demo shows:

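import numpy as np
# SCC and graph_from_vectors come from this repository; see demo.py for the imports.

upper = 1.0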
lower = 0.1
num_rounds = 50
X = np.random.randn(100,5)
graph = graph_from_vectors(X, k=25, batch_size=5000)
taus = np.geomspace(start=upper, stop=lower, num=num_rounds)

scc = SCC(graph, num_rounds, taus)
scc.fit()

# To inspect the result:
# Everything stored for the round at index 3 (0-based).
scc.rounds[3].__dict__

# The cluster assignment of the data point at index 18 (0-based).
scc.rounds[3].cluster_assignments[18]

# The id, in the next round, of the parent of node 2 (0-based).
scc.rounds[3].parents[2]
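
The threshold schedule in the snippet above can be looked at on its own. This small sketch uses only NumPy and the same values as the demo. It shows that `np.geomspace` produces one threshold per round, decreasing from `upper` to `lower` with a constant ratio between consecutive values. Reading these as similarity thresholds that loosen from round to round is an interpretation, not something the snippet states.

```python
import numpy as np

upper, lower, num_rounds = 1.0, 0.1, 50
taus = np.geomspace(start=upper, stop=lower, num=num_rounds)

# Ratio between consecutive thresholds; constant for a geometric schedule.
ratios = taus[1:] / taus[:-1]

print(taus[:3], taus[-1])
print(ratios.min(), ratios.max())
```

A geometric schedule spaces thresholds evenly on a log scale, so the steps are finer near `lower` than near `upper`.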

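The parent pointers described above can be chained across rounds to find which cluster an original point belongs to after several rounds. The sketch below uses hand-written stand-in arrays, not real output from `SCC`. It assumes each round's parents can be represented as an integer array indexed by node id, and the function `assignments_after` is hypothetical.

```python
import numpy as np

# Stand-in data, NOT output of SCC: parents[r][j] is the id, in round r + 1,
# of the cluster that node j of round r was merged into.
parents = [
    np.array([0, 0, 1, 1, 2, 2]),  # round 0: 6 points -> 3 clusters
    np.array([0, 0, 1]),           # round 1: 3 clusters -> 2 clusters
    np.array([0, 0]),              # round 2: 2 clusters -> 1 cluster
]


def assignments_after(parents, num_rounds_applied):
    """Hypothetical helper: follow parent pointers for the given number of rounds."""
    labels = np.arange(len(parents[0]))  # each point starts as its own node
    for r in range(num_rounds_applied):
        labels = parents[r][labels]      # map every point to its node's parent
    return labels


print(assignments_after(parents, 1))
print(assignments_after(parents, 2))
print(assignments_after(parents, 3))
```

Applying more rounds yields coarser clusterings, which is how a sequence of rounds forms a hierarchy.
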
Citation:

@article{scc2020arxiv,
  author    = {Nicholas Monath and
               Avinava Dubey and
               Guru Guruganesh and
               Manzil Zaheer and
               Amr Ahmed and
               Andrew McCallum and
               G{\"{o}}khan Mergen and
               Marc Najork and
               Mert Terzihan and
               Bryon Tjanaka and
               Yuan Wang and
               Yuchen Wu},
  title     = {Scalable Bottom-Up Hierarchical Clustering},
  journal   = {arXiv preprint, 2010.11821},
  year      = {2020}
}
