Clustering Data Analysis

CluDA

The aim of CluDA is to group data points into clusters so that similar items end up in the same cluster, using different supervised or unsupervised learning techniques.

#Installation

gem install cluda

#Usage

The current version provides only Kmeans as a clustering algorithm; future updates are intended to offer several algorithms to choose from.

Every clustering algorithm implemented within CluDA is used the same way: call its 'classify' method to get the output. 'classify' takes one mandatory parameter and several optional ones:

Cluda::X.classify( list, k: K, distance_method: DISTANCE, max_iterations: MAX )

Mandatory:

  • list => List of points that you wish to classify

Optional:

  • k => Number of clusters. 1 (default)
  • centroids => Initial centroids to start from, if you wish to work with specific ones
  • distance_method => Should be a string in lowercase and can be:
      * 'euclidean' (default)
      * 'manhattan'
      * 'chebyshev'
  • be_smart => If necessary, CluDA will add new centroids to the set passed as a parameter. False (default)
  • margin_distance_percentage => Applies to Smart Clustering, where CluDA creates as many centroids as it sees in the data, so take care with the distances between centroids. This parameter is a way to control the number of clusters. Should be a number between 0 and 1. 0 (default)
  • max_iterations => Maximum number of iterations, a natural number greater than 0, which bounds the run in case of local minimums. 50 (default)
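
To make the three distance_method options concrete, here is a minimal sketch of the standard Euclidean, Manhattan, and Chebyshev distances on points shaped like the hashes used in this README. The helper names are hypothetical and are not part of CluDA's API; CluDA's internal implementation may differ.

```ruby
# Hypothetical helpers, not part of CluDA's API. Each takes two points
# shaped like the hashes used in this README: { x: ..., y: ... }.

# Straight-line distance: square root of the sum of squared differences.
def euclidean(a, b)
  Math.sqrt((a[:x] - b[:x])**2 + (a[:y] - b[:y])**2)
end

# Sum of the absolute differences along each axis.
def manhattan(a, b)
  (a[:x] - b[:x]).abs + (a[:y] - b[:y]).abs
end

# Largest absolute difference along any single axis.
def chebyshev(a, b)
  [(a[:x] - b[:x]).abs, (a[:y] - b[:y]).abs].max
end

p1 = { x: 1, y: 1 }
p2 = { x: 4, y: 5 }

puts euclidean(p1, p2) # 5.0
puts manhattan(p1, p2) # 7
puts chebyshev(p1, p2) # 4
```

The same pair of points is at a different distance under each metric, which is why the choice of distance_method can change which centroid a point is assigned to.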

The output is always a hash that maps each centroid to the points clustered around it.
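
As a sketch of how such a hash can be consumed, the snippet below walks over a result and counts the points per centroid. The hash is copied by hand from the k: 2 'euclidean' example in the KMeans section so that the snippet runs without the gem; in real use it would be the return value of classify. The variable names are illustrative assumptions.

```ruby
# Result copied from the k: 2 'euclidean' example in this README.
# In real use this would be the value returned by classify.
result = {
  { x: 1, y: 1 } => [{ x: 1, y: 1 }, { x: 2, y: 1 }, { x: 1, y: 2 }, { x: 2, y: 2 }],
  { x: 5, y: 6 } => [{ x: 4, y: 6 }, { x: 5, y: 7 }, { x: 5, y: 6 },
                     { x: 5, y: 5 }, { x: 6, y: 6 }, { x: 6, y: 5 }]
}

# Each key is a centroid; each value is the array of points assigned to it.
result.each do |centroid, members|
  puts "centroid (#{centroid[:x]}, #{centroid[:y]}): #{members.size} points"
end

# Cluster sizes keyed by centroid.
sizes = result.transform_values(&:size)
```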

##KMeans

To use it, follow Cluda with the 'Kmeans' class, as shown in the example below:

  require 'cluda'
  ...
  points = [ { x: 1, y: 1}, { x: 2, y: 1}, { x: 1, y: 2}, { x: 2, y: 2}, { x: 4, y: 6}, { x: 5, y: 7}, { x: 5, y: 6}, { x: 5, y: 5}, { x: 6, y: 6}, { x: 6, y: 5}]
  Cluda::Kmeans.classify( points, k: 1)
  ...

Output

=> {{:x=>4, :y=>5}=>
  [{:x=>1, :y=>1},
   {:x=>2, :y=>1},
   {:x=>1, :y=>2},
   {:x=>2, :y=>2},
   {:x=>4, :y=>6},
   {:x=>5, :y=>7},
   {:x=>5, :y=>6},
   {:x=>5, :y=>5},
   {:x=>6, :y=>6},
   {:x=>6, :y=>5}]}

Further examples, each followed by its output:

  Cluda::Kmeans.classify( points, k: 2, distance_method: 'euclidean' )

Output

=> {{:x=>1, :y=>1}=>
  [{:x=>1, :y=>1}, {:x=>2, :y=>1}, {:x=>1, :y=>2}, {:x=>2, :y=>2}],
   {:x=>5, :y=>6}=>
  [{:x=>4, :y=>6},
   {:x=>5, :y=>7},
   {:x=>5, :y=>6},
   {:x=>5, :y=>5},
   {:x=>6, :y=>6},
   {:x=>6, :y=>5}]}

  Cluda::Kmeans.classify( points, k: 2, distance_method: 'manhattan' )

Output

=> {{:x=>5, :y=>6}=>
  [{:x=>4, :y=>6},
   {:x=>5, :y=>7},
   {:x=>5, :y=>6},
   {:x=>5, :y=>5},
   {:x=>6, :y=>6},
   {:x=>6, :y=>5}],
   {:x=>1, :y=>1}=>
  [{:x=>1, :y=>1}, {:x=>2, :y=>1}, {:x=>1, :y=>2}, {:x=>2, :y=>2}]}

  Cluda::Kmeans.classify( points, k: 2, distance_method: 'chebyshev' )

Output

=> {{:x=>1, :y=>1}=>
  [{:x=>1, :y=>1}, {:x=>2, :y=>1}, {:x=>1, :y=>2}, {:x=>2, :y=>2}],
   {:x=>5, :y=>6}=>
  [{:x=>4, :y=>6},
   {:x=>5, :y=>7},
   {:x=>5, :y=>6},
   {:x=>5, :y=>5},
   {:x=>6, :y=>6},
   {:x=>6, :y=>5}]}
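
For readers who want to see the general idea behind the examples above, here is a minimal sketch of a generic k-means loop (Lloyd's algorithm) in plain Ruby. It is not CluDA's implementation: the method names, the use of squared Euclidean distance, and the use of the arithmetic mean as the new centroid are all assumptions made for illustration. CluDA's own centroid choice appears to differ, since the outputs above show integer centroids.

```ruby
# Generic k-means sketch on { x:, y: } points. Illustrative only:
# names and details are assumptions, not CluDA's code.

# Squared Euclidean distance (enough for nearest-centroid comparisons).
def squared_distance(a, b)
  (a[:x] - b[:x])**2 + (a[:y] - b[:y])**2
end

# Arithmetic mean of a non-empty array of points.
def mean_point(points)
  n = points.size.to_f
  { x: points.sum { |p| p[:x] } / n, y: points.sum { |p| p[:y] } / n }
end

# Returns a hash of centroid => points, keyed by the centroids used in
# the final assignment. Centroids that attract no points are dropped.
def simple_kmeans(points, centroids, max_iterations: 50)
  clusters = {}
  max_iterations.times do
    # Assignment step: attach every point to its nearest centroid.
    clusters = points.group_by do |point|
      centroids.min_by { |c| squared_distance(point, c) }
    end
    # Update step: move each centroid to the mean of its points.
    updated = clusters.values.map { |members| mean_point(members) }
    # Stop once the centroids no longer move.
    break if updated.sort_by(&:to_a) == centroids.sort_by(&:to_a)
    centroids = updated
  end
  clusters
end

points = [{ x: 1, y: 1 }, { x: 2, y: 1 }, { x: 1, y: 2 }, { x: 2, y: 2 },
          { x: 4, y: 6 }, { x: 5, y: 7 }, { x: 5, y: 6 }, { x: 5, y: 5 },
          { x: 6, y: 6 }, { x: 6, y: 5 }]

clusters = simple_kmeans(points, [{ x: 1, y: 1 }, { x: 5, y: 6 }])
clusters.each { |centroid, members| puts "#{centroid} => #{members.size} points" }
```

The max_iterations cap plays the same role as the parameter of the same name described earlier: it guarantees the loop ends even when the centroids keep moving.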