This is a scipy/numpy/Python implementation of SCC, the clustering algorithm from the paper cited below. For sparse graph inputs, it should scale to datasets of millions of nodes. This implementation assumes similarities are given as input.
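Since similarities are assumed to be given, the input graph can also be built directly from precomputed similarities rather than from vectors. A minimal sketch, assuming the graph is a symmetric scipy.sparse matrix of similarities (the format that graph_from_vectors produces; the pairs and values here are made-up illustration data):

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical precomputed similarities for a 4-node graph; keep only
# the pairs you trust so the matrix stays sparse.
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 2, 2, 3])
sims = np.array([0.9, 0.4, 0.7, 0.8])
graph = csr_matrix((sims, (rows, cols)), shape=(4, 4))
graph = graph.maximum(graph.T)  # symmetrize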
An example of usage is given in demo.py. The demo shows:
import numpy as np

# SCC and graph_from_vectors are defined in this repository; see demo.py
# for the exact import paths.
from scc import SCC, graph_from_vectors

upper = 1.0      # largest similarity threshold
lower = 0.1      # smallest similarity threshold
num_rounds = 50  # number of rounds of the algorithm

X = np.random.randn(100, 5)                           # 100 points in 5 dimensions
graph = graph_from_vectors(X, k=25, batch_size=5000)  # sparse k-NN similarity graph
taus = np.geomspace(start=upper, stop=lower, num=num_rounds)  # one threshold per round

scc = SCC(graph, num_rounds, taus)
scc.fit()

# Inspecting the result:
# everything stored for round 3 of the algorithm (rounds are 0-indexed)
scc.rounds[3].__dict__
# the cluster assignment in round 3 of point 18 of the dataset (0-indexed)
scc.rounds[3].cluster_assignments[18]
# the id, in the next round, of the parent of node 2 (0-indexed)
scc.rounds[3].parents[2]
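The per-round fields above are enough to see how the hierarchy coarsens, or to trace a single point up the tree. A sketch, assuming cluster_assignments and parents relate across rounds as the comments describe:

import numpy as np

# number of distinct clusters in each round (shrinks as tau decreases)
for i, r in enumerate(scc.rounds):
    print(i, len(np.unique(r.cluster_assignments)))

# the cluster containing point 18, one per round
path = [r.cluster_assignments[18] for r in scc.rounds]

# equivalently, follow parent pointers up from the first round
node = scc.rounds[0].cluster_assignments[18]
for r in scc.rounds[:-1]:
    node = r.parents[node]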
Citation:
@article{scc2020arxiv,
  author  = {Nicholas Monath and
             Avinava Dubey and
             Guru Guruganesh and
             Manzil Zaheer and
             Amr Ahmed and
             Andrew McCallum and
             G{\"{o}}khan Mergen and
             Marc Najork and
             Mert Terzihan and
             Bryon Tjanaka and
             Yuan Wang and
             Yuchen Wu},
  title   = {Scalable Bottom-Up Hierarchical Clustering},
  journal = {arXiv preprint arXiv:2010.11821},
  year    = {2020}
}