nitinsaroha/Spectral-Clustering-on-Apache-Spark

Spectral Clustering

Implementation of Spectral Clustering in Apache Spark.

The dataset is synthetic: it is generated on the fly with random number generators (specifically, the scikit-learn sample generators) and does not represent any real data.
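As an illustration of how such synthetic data could be produced, here is a minimal sketch using scikit-learn's `make_blobs`. The function names `generate_blobs` and `save_points`, the default sizes, and the output format (one point per line, space-separated coordinates) are assumptions made for this example; the repository does not document how `blobs.txt` was actually created.

```python
import numpy as np
from sklearn.datasets import make_blobs


def generate_blobs(n_samples=300, n_clusters=3, seed=0):
    """Draw 2-D points from `n_clusters` isotropic Gaussian blobs.

    Returns (X, y): X has shape (n_samples, 2), y holds the blob index of each point.
    """
    X, y = make_blobs(
        n_samples=n_samples,
        centers=n_clusters,
        n_features=2,
        random_state=seed,
    )
    return X, y


def save_points(X, path):
    """Write one point per line, coordinates separated by spaces (assumed format)."""
    np.savetxt(path, X, fmt="%.6f")
```

Calling `X, y = generate_blobs()` followed by `save_points(X, "blobs.txt")` would write a file in the assumed format; fixing `seed` makes the data reproducible.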

Matplotlib is used to plot the clusters.

How to Run

$ spark-submit ~/absolute/path/to/the/directory/spectral_clustering.py 3 10 1.0 a5_data/blobs.txt

  • Argument 1 is the number of clusters
  • Argument 2 is the upper bound
  • Argument 3 is the value of gamma
  • Argument 4 is the path to the input data file
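The positional arguments in the command above could be read along the following lines. This is only a sketch: the `Args` type, the field names, and the choice to parse the upper bound as an integer are assumptions, not taken from `spectral_clustering.py`.

```python
from typing import NamedTuple


class Args(NamedTuple):
    n_clusters: int    # argument 1: number of clusters
    upper_bound: int   # argument 2: upper bound (integer type is an assumption)
    gamma: float       # argument 3: value of gamma
    input_path: str    # last value on the command line: the data file


def parse_args(argv):
    """Parse the values that follow the script name on the command line."""
    if len(argv) != 4:
        raise SystemExit(
            "usage: spectral_clustering.py <clusters> <upper_bound> <gamma> <input_file>"
        )
    return Args(int(argv[0]), int(argv[1]), float(argv[2]), argv[3])
```

Inside a script this would typically be called as `parse_args(sys.argv[1:])`, so that a wrong argument count fails early with a usage message instead of an `IndexError`.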

Note: spark-submit must be on your PATH.

About

Uses Spark’s built-in power iteration clustering (PIC) to simulate an approximate variant of spectral clustering.
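To make the idea concrete, the sketch below builds pairwise similarities and hands them to PIC from `pyspark.mllib`. It assumes that gamma is the parameter of an RBF (Gaussian) kernel, s_ij = exp(-gamma * ||x_i - x_j||^2), which the repository does not state explicitly. The helper names `rbf_similarities` and `cluster_with_pic` are hypothetical, and computing all pairs on the driver is only practical for small inputs.

```python
import math
from itertools import combinations


def rbf_similarities(points, gamma):
    """Return (i, j, s_ij) triples for every pair i < j.

    s_ij = exp(-gamma * squared Euclidean distance), i.e. an RBF kernel (assumed).
    """
    triples = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
        triples.append((i, j, math.exp(-gamma * sq_dist)))
    return triples


def cluster_with_pic(sc, triples, k, max_iterations=20):
    """Run Spark MLlib's power iteration clustering on the similarity triples.

    `sc` is an existing SparkContext; returns a dict mapping point id to cluster id.
    """
    # Imported lazily so the similarity helper works without Spark installed.
    from pyspark.mllib.clustering import PowerIterationClustering

    model = PowerIterationClustering.train(sc.parallelize(triples), k, max_iterations)
    return {a.id: a.cluster for a in model.assignments().collect()}
```

PIC expects each pair only once because it treats the similarity graph as undirected, which is why the helper emits only pairs with i < j.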
