# Cluster Analysis

This notebook demonstrates cluster analysis using KMeans clustering, an unsupervised machine learning technique for finding clusters in data.

The training loop is as follows:

1. Select the number of clusters "K".
2. Choose K random points as the initial centroids.
3. Iterate:
   1. Assign each data point to its nearest centroid (by default, the minimum Euclidean distance is used).
   2. For each cluster formed, compute the average of its data points and move that cluster's centroid to that location.
   3. Repeat steps 1 and 2 until the centroids stop moving.
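The steps above can be sketched in plain C#, without ML.NET. This is a minimal illustration under stated assumptions, not the implementation behind `KMeansTrainer`, which may differ in how it initializes centroids and optimizes the loop. The names `KMeansSketch`, `Fit`, `Nearest`, and `SquaredDistance`, along with the `maxIterations` and `tolerance` parameters, are hypothetical and chosen only for this example.

```csharp
using System;
using System.Linq;

// Hypothetical helper class illustrating the loop described above.
public static class KMeansSketch
{
    // Squared Euclidean distance between two equal-length vectors.
    public static float SquaredDistance(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    public static int Nearest(float[] point, float[][] centroids)
    {
        int best = 0;
        for (int c = 1; c < centroids.Length; c++)
        {
            if (SquaredDistance(point, centroids[c]) < SquaredDistance(point, centroids[best]))
                best = c;
        }
        return best;
    }

    public static (float[][] Centroids, int[] Assignments) Fit(
        float[][] points, int k, int seed = 0, int maxIterations = 100, float tolerance = 1e-6f)
    {
        var random = new Random(seed);

        // Step 2: pick K distinct data points at random as the starting centroids.
        float[][] centroids = Enumerable.Range(0, points.Length)
            .OrderBy(_ => random.Next())
            .Take(k)
            .Select(i => (float[])points[i].Clone())
            .ToArray();
        var assignments = new int[points.Length];

        for (int iteration = 0; iteration < maxIterations; iteration++)
        {
            // Step 3.1: assign every point to its nearest centroid.
            for (int p = 0; p < points.Length; p++)
                assignments[p] = Nearest(points[p], centroids);

            // Step 3.2: move each centroid to the mean of its assigned points.
            float maxShift = 0f;
            for (int c = 0; c < k; c++)
            {
                var members = points.Where((_, p) => assignments[p] == c).ToArray();
                if (members.Length == 0) continue; // an empty cluster keeps its centroid

                var mean = new float[centroids[c].Length];
                foreach (var member in members)
                    for (int d = 0; d < mean.Length; d++)
                        mean[d] += member[d] / members.Length;

                maxShift = Math.Max(maxShift, SquaredDistance(mean, centroids[c]));
                centroids[c] = mean;
            }

            // Step 3.3: stop once no centroid moved by more than the tolerance
            // (measured here as a squared distance).
            if (maxShift <= tolerance) break;
        }

        // Assignments are those computed in the last assignment step.
        return (centroids, assignments);
    }
}
```

For example, calling `KMeansSketch.Fit(points, k: 2)` on six 2-D points that form two well-separated groups of three should place each group in its own cluster. Picking the starting centroids from the data points themselves is one common choice; other initialization strategies exist and can change which local optimum the loop reaches.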


Example adapted from: https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.kmeansclusteringextensions.kmeans?view=ml-dotnet

## Install Dependencies

In [11]:
#r "nuget: Newtonsoft.Json"
#r "nuget: XPlot.Plotly"
#r "nuget: XPlot.Plotly.Interactive"
#r "nuget: BenchmarkDotNet"
#r "nuget: Microsoft.ML"

using Newtonsoft.Json;
using XPlot.Plotly;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

## Helper Methods and Class Definitions

In [9]:
IEnumerable<DataPoint> GenerateRandomDataPoints(int count, int seed = 0)
{
    var random = new Random(seed);
    float randomFloat() => (float)random.NextDouble();
    for (int i = 0; i < count; i++)
    {
        int label = i < count / 2 ? 0 : 1;
        yield return new DataPoint
        {
            Label = (uint)label,
            // Create random features with two clusters.
            // The first half has feature values centered around 0.6, while
            // the second half has values centered around 0.4.
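            // The lambda parameter is unused: each of the 50 features is an
            // independent random draw shifted by the label's offset.
            Features = Enumerable.Repeat(label, 50)
                .Select(_ => label == 0 ? randomFloat() + 0.1f :
                    randomFloat() - 0.1f).ToArray()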
        };
    }
}

// Example with label and 50 feature values. A data set is a collection of
// such examples.
public sealed class DataPoint
{
    // The label is not used during training, just for comparison with the
    // predicted label.
    [KeyType(2)]
    public uint Label { get; set; }

    [VectorType(50)]
    public float[] Features { get; set; }
}

public sealed class Prediction
{
    // Original label (not used during training, just for comparison).
    public uint Label { get; set; }
    // Predicted label from the trainer.
    public uint PredictedLabel { get; set; }
}

## Setting Up and Training the Model

In [32]:
const int SEED = 1234; // for reproducibility.
var mlContext = new MLContext(seed: SEED);
IEnumerable<DataPoint> dataPoints = GenerateRandomDataPoints(count: 1000, seed: SEED);

// Convert the list of data points to an IDataView object, which the
// ML.NET API can consume.
IDataView trainingData = mlContext.Data.LoadFromEnumerable(dataPoints);

// Define trainer options.
var options = new KMeansTrainer.Options
{
    NumberOfClusters = 2,
    OptimizationTolerance = 1e-6f,
    NumberOfThreads = 1
};

// Define the trainer.
var pipeline = mlContext.Clustering.Trainers.KMeans(options);

// Train the model.
var model = pipeline.Fit(trainingData);

## Plotting the Clusters

In [41]:
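// Extract the learned centroids; k receives the number of clusters.
var modelParameters = model.Model;
VBuffer<float>[] centroids = default;
modelParameters.GetClusterCentroids(ref centroids, out int k);

// Each centroid is plotted as (feature position, centroid value) pairs,
// so the x-axis is the position and the y-axis is the value.
var layout = new Layout.Layout
{
    title = "Clusters Based on KMeans",
    xaxis = new Xaxis { title = "Position"},
    yaxis = new Yaxis { title = "Value"},
};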

List<Scatter> scatters = new();
List<string> colors = new() { "red", "blue", "green"};


for (int centroidIdx = 0; centroidIdx < centroids.Length; centroidIdx++)
{
    var scatter = new Scatter
    {
        x = centroids[centroidIdx].Items().Select(i => i.Key),
        y = centroids[centroidIdx].Items().Select(i => i.Value),
        marker = new Marker { color = colors[centroidIdx] },
        mode = "markers",
        name = $"Centroid: {centroidIdx}"
    };
    
    scatters.Add(scatter);
}

Chart.Plot(scatters, layout)

In [10]:
#!about

- .NET Interactive
- © 2020 Microsoft Corporation
- Version: 1.0.350406+612aa40cba7d6a1f734272f71657a65561394752
- Library version: 1.0.0-beta.22504.6+612aa40cba7d6a1f734272f71657a65561394752
- Build date: 2022-11-01T18:44:16.3455575Z
- https://github.com/dotnet/interactive
