gaborbarna/spark_dbscan

 
 


Spark DBSCAN is an implementation of the DBSCAN clustering algorithm on top of Apache Spark. It also includes two simple tools that help you choose the parameters of the DBSCAN algorithm.
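To make the role of the DBSCAN parameters concrete, here is a minimal single-machine sketch in plain Scala of the standard core-point test: a point is a core point when at least `minPts` points (itself included) lie within distance `eps` of it. The object and function names (`CorePointSketch`, `neighborCount`, `isCorePoint`) are hypothetical and are not taken from the Spark DBSCAN API; the sketch assumes Euclidean distance and a brute-force scan, and says nothing about how this project distributes the work on Spark.

```scala
// Illustrative only: names are hypothetical, not part of Spark DBSCAN.
object CorePointSketch {
  type Point = Array[Double]

  // Euclidean distance between two points with the same number of features.
  private def euclidean(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Number of points in `data` within `eps` of `p`
  // (p itself is counted when it is part of `data`).
  def neighborCount(p: Point, data: Seq[Point], eps: Double): Int =
    data.count(q => euclidean(p, q) <= eps)

  // Standard DBSCAN core-point test for the parameters eps and minPts.
  def isCorePoint(p: Point, data: Seq[Point], eps: Double, minPts: Int): Boolean =
    neighborCount(p, data, eps) >= minPts
}
```

Larger values of `eps` or smaller values of `minPts` turn more points into core points, which is why the two parameters have to be chosen together for a given dataset.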

Clusters identified by the DBSCAN algorithm

This software is EXPERIMENTAL: it supports only the Euclidean and Manhattan distance measures (why?) and it is not well optimized yet. I have tested it only on small datasets (millions of records with 2 features in each record).
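For reference, the two supported distance measures can be sketched in plain Scala as follows. This is only an illustration of the definitions, assuming points are represented as arrays of doubles; the names `DistanceSketch`, `euclidean`, and `manhattan` are hypothetical and do not reflect how Spark DBSCAN exposes or configures its distance measures.

```scala
// Illustrative only: names are hypothetical, not part of Spark DBSCAN.
object DistanceSketch {
  type Point = Array[Double]

  // Euclidean distance: square root of the sum of squared coordinate differences.
  def euclidean(a: Point, b: Point): Double = {
    require(a.length == b.length, "points must have the same number of features")
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
  }

  // Manhattan distance: sum of absolute coordinate differences.
  def manhattan(a: Point, b: Point): Double = {
    require(a.length == b.length, "points must have the same number of features")
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum
  }
}
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5 and the Manhattan distance is 7, so the same `eps` value selects different neighborhoods under the two measures.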

You can use Spark DBSCAN as a standalone application that you submit to a Spark cluster (Learn how). Alternatively, you can include it in your own app; its API is documented and easy to use (Learn how).

Learn more about:

Performance

Performance chart

Contact me

Any questions, comments, suggestions as well as criticism are welcome! You can contact me by:

About

DBSCAN clustering algorithm on top of Apache Spark


Languages

  • JavaScript 55.3%
  • Scala 38.7%
  • CSS 5.2%
  • R 0.8%