# My First Classifier

Multi-class classification is the task of assigning an input vector to one of several classes, based on some measure of its similarity to other members of that class.

One of the simplest classifiers is the minimum distance classifier, and it is still an effective one. Given two or more classes, each represented by a single class vector, we assign an input vector to its closest class.
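As a rough illustration of the idea, here is a minimal sketch in plain JavaScript. The helper names `euclidean` and `nearestClass` and the example vectors are made up for this illustration and are not part of the notebook's utilities.

```javascript
// Euclidean distance between two vectors of equal length
function euclidean(a, b) {
    return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Return the index of the class vector closest to `input`
function nearestClass(classVectors, input) {
    var best = 0;
    for (var k = 1; k < classVectors.length; k++) {
        if (euclidean(classVectors[k], input) < euclidean(classVectors[best], input)) {
            best = k;
        }
    }
    return best;
}

// Hypothetical class vectors and input, for illustration only
var exampleClasses = [[0, 0], [10, 10]];
var nearest = nearestClass(exampleClasses, [2, 1]);
console.log(nearest);
```

If two classes are equally close, this sketch keeps the one that appears first in the list.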

Geometrically, if we plot two of our features from the Wine dataset, {Alcohol, ODxxx}, we get this scatter plot:

In [3]:
var {loadUnlabelledWine, grid2} = require('./utils')
var Plot = require('plotly-notebook-js');

var {features, dataset} = loadUnlabelledWine();

var ALCOHOL = 0;
var ODxxx = 11;

var x = dataset.map(d => d[ALCOHOL]).map(f => parseFloat(f));
var y = dataset.map(d => d[ODxxx]).map(f => parseFloat(f));
var trace = { x, y, mode: 'markers', marker: { size: 8, }, type: 'scatter' };
var layout = { xaxis: { title: features[ALCOHOL] }, yaxis: { title: features[ODxxx] }, width: 700, height: 700 };

$$html$$ = Plot.createPlot([trace], layout).render()

### Manually Decide on Class Centroids

To classify, we want to assign every point in the scatter plot to one of, say, three classes, which I am going to pick manually. For each class we are trying to pick its centroid, or central point.
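Here the centroids are picked by eye, but it may help to see what a centroid is numerically: the per-feature mean of the points in a class. The sketch below assumes you already know which points belong to a class; the `centroid` helper and the sample points are hypothetical and not taken from the Wine dataset.

```javascript
// Centroid = per-feature mean of the points in a class
function centroid(points) {
    var dims = points[0].length;
    var sums = new Array(dims).fill(0);
    points.forEach(p => p.forEach((v, i) => { sums[i] += v; }));
    return sums.map(s => s / points.length);
}

// Hypothetical points, not taken from the Wine dataset
var samplePoints = [[12, 3], [13, 3], [14, 3]];
var sampleCentroid = centroid(samplePoints);
console.log(sampleCentroid);
```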

Pick your own classes. I am picking:





In [9]:
var classes = [
    [12, 3.1],
    [13.75, 3.1],
    [13, 1.75]
];

var classTraces = classes.map(
    (c, idx) => ({ x: [c[0]], y: [c[1]], mode: 'markers', marker: { size: 14 }, type: 'scatter' }));

$$html$$ = Plot.createPlot([trace, ...classTraces], layout).render()

## Build a classifier

Let's build a simple "minimum distance" classifier using the Euclidean distance. In this classifier, points are assigned to a class based on their distance from a single central point representing the class.
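For reference, the rule can be written down compactly. The notation is introduced here for illustration: $\mathbf{x}$ is an input vector with $n$ features ($n = 2$ in this notebook) and $\mathbf{c}_k$ is the central point of class $k$.

$$
d(\mathbf{x}, \mathbf{c}_k) = \sqrt{\sum_{i=1}^{n} (x_i - c_{k,i})^2},
\qquad
\text{label}(\mathbf{x}) = \arg\min_{k} \; d(\mathbf{x}, \mathbf{c}_k)
$$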

![minimum distance classification](images/minimum_distance.png)


### Goals

1. Load the Wine dataset - DONE
1. Select class centers - DONE
1. Select / extract 2D feature vectors from it. I'm using the {Alcohol, ODxxx} attributes, but you might like a different two better ([dataset summary is here](./0.3_hello_datasets.ipynb)).
1. Build a classifier function `L = classify(classes, I)` where I (input) is an Nx2 array of arrays, L (labels) is a length-N array and `classes` (C) is a 3x2 array of arrays
    1. For each input vector in I
        1. Compute the Euclidean distance between the input vector and each class vector
        1. Select the class with the minimum distance
        1. Store the label (e.g. an index 1, 2, 3) of the selected class
    1. Return the list of labels L
1. Plot I as a scatter plot and color the points depending on their label.
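The classifier steps above can be sketched in plain JavaScript as follows. This is only one possible shape for the function: the name `classifySketch` and the input points are made up for illustration, and the distance is computed inline rather than through a library.

```javascript
// Sketch of L = classify(C, I): C is an array of class vectors,
// I is an array of input vectors, L is an array of 1-based labels
function classifySketch(C, I) {
    return I.map(input => {
        // Distance from this input vector to every class vector
        var dists = C.map(c =>
            Math.sqrt(c.reduce((sum, ci, k) => sum + (ci - input[k]) ** 2, 0)));
        // Index of the smallest distance, shifted so labels start at 1
        return dists.indexOf(Math.min(...dists)) + 1;
    });
}

// Centroids from this notebook; the input points are made up for illustration
var sketchLabels = classifySketch(
    [[12, 3.1], [13.75, 3.1], [13, 1.75]],
    [[12.1, 3.0], [13.9, 3.2], [13.0, 1.5]]);
console.log(sketchLabels);
```

Unlike the cell below, this version takes the class vectors as an argument and returns only the labels, which matches the signature described in the goals.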

In [31]:
var ml = require('ml-distance');

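// Compute distances, find the minimum and assign a label

// get the features I am interested in, parsed as numbers
var data = dataset.map(d => ([parseFloat(d[ALCOHOL]), parseFloat(d[ODxxx])]));


// Classify each feature vector against the class centroids in `classes`.
// Returns each input row with its 1-based class label appended.
function classify(data) {
    return data.map(row => {

        // distance from this row to each class centroid
        var dists = classes.map(c => {
            return ml.distance.euclidean(c, row);
        })

        // smallest of those distances
        var mindist = dists.reduce((a,d) => {
            return Math.min(a, d)
        }, Infinity)

        // index of the first class at the minimum distance
        var foundClass = -1;
        for (let i = 0; i < dists.length; i++) {
            if (mindist === dists[i]) {
                foundClass = i;
                break;
            }
        }

        return [...row, foundClass+1];
    })

}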

var results = classify(data)

'use strict'

In [26]:
// Plot the output
var classes = [
    [12, 3.1],
    [13.75, 3.1],
    [13, 1.75]
];
var color = results.map(d => d[2]);
var trace2 = { x, y, mode: 'markers', marker: { color, size: 8, }, type: 'scatter' };

var classTraces = classes.map(
    (c, idx) => ({ x: [c[0]], y: [c[1]], mode: 'markers', marker: { size: 14 }, type: 'scatter' }));

$$html$$ = Plot.createPlot([trace2, ...classTraces], layout).render()

### Plot the decision space

Repeat this process, but instead of classifying the input data, create a coordinate matrix over the valid range of your feature values and build a matching labels matrix.

Hint: Check back to how we generated the distance function plots in [distances_similarity_and_cost](./1.1_distances_similarity_and_cost.ipynb)

Plot that using a plotly heatmap!

(Note: to plot that you'll need to assign numeric values in your labels array rather than characters, e.g. [A,B,C] = [1,2,3].)
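If your labels are characters, one way to convert them is a small lookup table. The grid of labels below is hypothetical and only shows the mapping.

```javascript
// Hypothetical character labels for a tiny 2x3 grid
var charLabels = [['A', 'A', 'B'], ['C', 'B', 'B']];

// Map each character label to a number the heatmap can color
var labelToNumber = { A: 1, B: 2, C: 3 };
var numericLabels = charLabels.map(row => row.map(l => labelToNumber[l]));
console.log(numericLabels);
```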

In [37]:
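var step = 0.01;

// Axis values: xs covers the alcohol range, ys covers the ODxxx range.
// Integer counters avoid floating point drift from repeated `+= step`.
var xs = [];
for (var i = 0; i <= Math.round((15 - 11) / step); i++) {
    xs.push(11 + i * step);
}
var ys = [];
for (var j = 0; j <= Math.round((4 - 1.5) / step); j++) {
    ys.push(1.5 + j * step);
}

// build a 2d array of [alcohol, ODxxx] coordinates, one row per y value,
// so the feature order matches the class centroids
var X = ys.map(od => xs.map(alc => [alc, od]));

// classify() expects a flat list of points, so classify one grid row
// at a time and keep only the label (the last element of each result)
var decision = X.map(gridRow => classify(gridRow).map(r => r[2]));

var heatmap = {
    x: xs,
    y: ys,
    z: decision,
    type: "heatmap"
}

var heatmapLayout = {
    title: "decision space",
    xaxis: { title: features[ALCOHOL] },
    yaxis: { title: features[ODxxx] },
    width: 700,
    height: 700
}

$$html$$ = Plot.createPlot([heatmap], heatmapLayout).render()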


#### Stretch Goal
If you are feeling really lucky, plot the feature vectors as a scatter plot on top of the heatmap.
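One possible approach is to pass a heatmap trace and a scatter trace to the same plot. The sketch below only builds the trace objects, using a tiny made-up label grid and made-up points; the names `overlayHeatmap`, `overlayScatter` and `overlayTraces` are hypothetical.

```javascript
// Tiny made-up label grid, for illustration only
var overlayHeatmap = {
    x: [11, 13, 15],        // alcohol values for the grid columns
    y: [1.5, 2.75, 4],      // ODxxx values for the grid rows
    z: [[3, 3, 3], [1, 3, 2], [1, 2, 2]],
    type: 'heatmap'
};

// Made-up feature vectors to draw over the heatmap
var overlayScatter = {
    x: [12.1, 13.9, 13.0],
    y: [3.0, 3.2, 1.5],
    mode: 'markers',
    marker: { size: 8, color: 'black' },
    type: 'scatter'
};

// Heatmap first, scatter second, so the points should end up on top
var overlayTraces = [overlayHeatmap, overlayScatter];
console.log(overlayTraces.map(t => t.type));
```

In the notebook, the array would then be rendered the same way as the earlier plots, for example with `Plot.createPlot(overlayTraces, layout).render()`.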

# Decision Space Plots in Scikit Learn

![decision space plots](images/scikit-learn_decision-spaces.png)