#### Objectives

By the end of this notebook you will:

 - know what unsupervised learning is
 - know how to run k-means clustering
 - know how to perform minimum distance classification
 - be better at plotting
 - be able to compute confidence
 - be able to compute a confusion matrix and derive key metrics from it
 - be able to plot and interpret a confusion matrix
 - understand the importance and impact of initialization

In [None]:
// boring imports
var {loadUnlabelledWine, grid2} = require('./utils')
var Plot = require('plotly-notebook-js');
var table = require('text-table');

# Unsupervised Learning

![Unsupervised Learning slide](images/slide_unsupervised.png)


# K Means Clustering


![K-means algorithm](images/slide_kmeans.png)
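
As a rough picture of what a k-means implementation does internally, the algorithm alternates an assignment step and an update step until the labels stop changing. The sketch below is an illustration only, not the `ml-kmeans` source; the names `assign`, `update` and `sketchKMeans` and the toy data are made up for this example.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
    return a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0);
}

// Assignment step: index of the nearest centre for every point.
function assign(points, centres) {
    return points.map(p => {
        var best = 0;
        for (var k = 1; k < centres.length; k++) {
            if (squaredDistance(p, centres[k]) < squaredDistance(p, centres[best])) best = k;
        }
        return best;
    });
}

// Update step: move each centre to the mean of the points assigned to it.
// A centre with no points keeps its previous position.
function update(points, labels, centres) {
    return centres.map((c, k) => {
        var members = points.filter((_, i) => labels[i] === k);
        if (members.length === 0) return c;
        return c.map((_, j) => members.reduce((s, p) => s + p[j], 0) / members.length);
    });
}

// Hypothetical driver: repeat both steps until the labels stop changing
// or maxIterations is reached.
function sketchKMeans(points, initialCentres, maxIterations = 100) {
    var centres = initialCentres;
    var labels = assign(points, centres);
    for (var it = 0; it < maxIterations; it++) {
        centres = update(points, labels, centres);
        var next = assign(points, centres);
        var changed = next.some((l, i) => l !== labels[i]);
        labels = next;
        if (!changed) break;
    }
    return { labels, centres };
}

// Toy data (made up): two obvious groups.
var toy = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]];
var result = sketchKMeans(toy, [[0, 0], [10, 10]]);
console.log(result.labels, result.centres);
```

The result depends on `initialCentres`: starting from different positions can lead to a different final partition.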

### Setup

Load our dataset and pick out the two features of interest, the ones we were looking at on the previous page.

In [None]:
var {features, dataset} = loadUnlabelledWine({ verbose: true });

In [None]:
var input = dataset.map(d => [ d[0], d[11] ]);

### Clustering

Now run the algorithm. The docs are [here](https://mljs.github.io/kmeans/).

In [None]:
var KMEANS = require('ml-kmeans');

var K = 3;

var options = {
    maxIterations: 100,
    tolerance: 1e-6,
    withIterations: false,
    // distanceFunction: () => {}, // can specify our own distance but may not converge
    initialization: 'random'
}

var ans = KMEANS(input, K, options) // use K rather than a hard-coded 3

var {converged, clusters, centroids, iterations} = ans;

if (converged) {
    console.log("Converged after", iterations, "iterations")
}
else {
    console.log("Did not converge after", iterations, "iterations")
}

### Plot the results

We now have class labels in `ans.clusters` corresponding to each feature vector in our `input` array. Let's scatterplot the results but color these by class label.

#### TODO: now add the class centroids to the plot with larger markers

In [None]:
var trace = { 
    x: input.map(d => d[0]),
    y: input.map(d => d[1]),
    mode: 'markers',
    marker: { 
        color: clusters, // <- here are our results
        size: 8,
        colorbar: {
            xpad: 100
        }
    },
    type: 'scatter'
};

var centroidsTraces = centroids.map(d => ({
    x: [d.centroid[0]], y: [d.centroid[1]],
    mode: 'markers',
    marker: { 
        size: 20,
        line: { width: 2, color: '#000' },
        opacity: 1
    },
    opacity: 0.3,
    type: 'scatter',
}));

var layout = { width: 800, height: 700, xaxis: { title: features[0] }, yaxis: { title: features[11] }};

$$html$$ = Plot.createPlot([trace, ...centroidsTraces], layout).render()

### Plot the feature space partitioning

Now let's look at the decision boundaries in the feature space. 

K-means has already labelled our training set for us, but to determine the class of a datapoint that we have not seen yet, we need to use a classifier with the `centroids` that k-means gave us.

This is known as a `forward pass`. K-means has used the training set to *learn* the k class centroids, but to *generalise* to data we have not seen before we need to run a classifier.

Luckily we just created one in the last notebook. Either copy the function definition in here or create a new `.js` file, and `require` that (if you do, remember to restart the kernel otherwise the notebook won't see the new file).
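
As a reminder of the idea, here is a minimal sketch of minimum distance classification applied to a grid of points, giving a 2D array of labels that a heatmap can display. It assumes each centroid is an object with a `centroid` array (the shape the plotting code above reads via `d.centroid`). The names `nearestCentroid` and `makeGrid`, the grid ranges and the demo centroids are all made up for illustration; `makeGrid` is a stand-in, not the `grid2` helper from `./utils`, whose output format is not shown here.

```javascript
// Euclidean distance between two equal-length vectors.
function distance(a, b) {
    return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Minimum distance classification: label each input with the index of its
// nearest centroid. Assumes each centroid looks like { centroid: [x, y] }.
function nearestCentroid(centroids, inputs) {
    return inputs.map(p => {
        var best = 0;
        centroids.forEach((c, k) => {
            if (distance(p, c.centroid) < distance(p, centroids[best].centroid)) best = k;
        });
        return best;
    });
}

// Hypothetical grid helper: `steps` rows, each holding `steps` [x, y] points.
function makeGrid(xMin, xMax, yMin, yMax, steps) {
    var rows = [];
    for (var r = 0; r < steps; r++) {
        var y = yMin + (yMax - yMin) * r / (steps - 1);
        var row = [];
        for (var c = 0; c < steps; c++) {
            row.push([xMin + (xMax - xMin) * c / (steps - 1), y]);
        }
        rows.push(row);
    }
    return rows;
}

// Made-up centroids and ranges, just to show the shapes involved.
var demoCentroids = [{ centroid: [0, 0] }, { centroid: [10, 0] }, { centroid: [5, 10] }];
var demoGrid = makeGrid(0, 10, 0, 10, 5);

// One row of class labels per grid row: a 2D array suitable for a heatmap's `z`.
var demoH = demoGrid.map(row => nearestCentroid(demoCentroids, row));
console.log(demoH);
```

Ties go to the lower class index here because the comparison is strict; any consistent tie-break would do.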


In [None]:
function myClassifier(centroids, inputs) {
    throw new Error("You need to implement this")
}

var g = grid2(); // grid of points covering the feature space
var H = myClassifier(centroids, g); // classify the grid points, not the training inputs

var trace = {
    z: H,
    type: "heatmap"
}

var layout = {
    title: "feature space partitioning",
    width: 700,
    height: 700
}

$$html$$ = Plot.createPlot([trace], layout).render()

### Time to play around

Time to try a few different things out and see the effect on the clustering and classification:

 - try with different initialisations
   - `mostDistant`
   - use the class centers that we picked manually
 - try with different values of K. What happens? Why?
 - try with fewer points. How does restricting the training set affect class positions?
 - (time allowing) try with a different dataset

### Confidence

So we have assigned all of the samples in our training set to one of K classes, and we've used the resulting centroids to configure a classifier so that we can classify wines we've never seen before. We've generalised!


But are all wines in a class equal? How can we measure confidence of membership in any one class? (Hint: look back at our scatter plot.)


##### TODO Think of a confidence measure, compute it and display a new scatter plot with marker sizes adjusted for confidence

So for each of our N entries in `input` and `clusters`, we'll want a new N-entry list `confidence`.
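
If you want something to compare your own idea against, one possible measure (among many) is to compare the distance to the nearest centre, d1, with the distance to the second-nearest centre, d2, and use `1 - d1/d2`. This is 0 for a point equidistant from two centres and 1 for a point sitting exactly on its centre. The name `marginConfidence` and the demo data are made up; the sketch takes centres as plain arrays, so with the k-means output above you would first map each entry to its `centroid` array. It needs at least two centres.

```javascript
// Euclidean distance between two equal-length vectors.
function distance(a, b) {
    return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Hypothetical confidence measure: 1 - d1/d2, where d1 and d2 are the
// distances to the nearest and second-nearest centres.
function marginConfidence(point, centres) {
    var sorted = centres.map(c => distance(point, c)).sort((a, b) => a - b);
    if (sorted[1] === 0) return 0; // point lies on two coincident centres
    return 1 - sorted[0] / sorted[1];
}

// Made-up demo: a point on a centre, one on the boundary, one in between.
var demoCentres = [[0, 0], [10, 0]];
var demoPoints = [[0, 0], [5, 0], [2, 0]];
var demoConfidence = demoPoints.map(p => marginConfidence(p, demoCentres));
console.log(demoConfidence);
```

For plotting, a list like this could be scaled into marker sizes, for example `size: confidence.map(c => 4 + 16 * c)`.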


In [None]:
// derive a confidence measure and compute it

In [None]:
// grab the scatter plot code from above and customise it so that marker size reflects confidence

### Accuracy

Ok, we have classified some data and calculated how confident we are with each of the classifications. 

But how do we know whether our classification is right? Well, in many applications of unsupervised learning we don't know for sure, as we typically use this type of technique when we don't know the expected outputs beforehand.

However, in this example dataset, we do have training labels available but we've just not loaded them. So let's reload those and measure the accuracy.

In [None]:
var {loadLabelledWine} = require('./utils')
var labelledDataset = loadLabelledWine({ verbose: true }).dataset;

Notice the additional (last) column containing the known class index. Note that these indices count from `1`, whilst our class labels count from `0`, so we'll need to compensate for that.

### Confusion Matrix

With mljs we can use the confusion matrix package to compute a suite of different measures of the performance of our clustering and classifier.

Dig into [the docs](https://mljs.github.io/confusion-matrix/) and compute:

 - overall `accuracy`
 - the `F1 score` for each class label. 
 - the average `F1 score`

These metrics are in the range [0, 1], from appalling to a 100% match. Print them out on the console.

NB: with this library we can compute a full suite of diagnostic measures, see the table [here](https://en.wikipedia.org/wiki/F1_score#Diagnostic_testing)
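
To make these metrics concrete, here is a sketch that computes accuracy and per-class F1 by hand from a plain confusion matrix, with rows as actual classes and columns as predicted classes. It deliberately does not use the library (its own methods are described in the linked docs); the function names `accuracy` and `f1Score` and the example matrix are made up for illustration.

```javascript
// Accuracy: the fraction of all samples that lie on the main diagonal.
// Convention: rows are actual classes, columns are predicted classes.
function accuracy(matrix) {
    var total = 0, correct = 0;
    matrix.forEach((row, i) => row.forEach((n, j) => {
        total += n;
        if (i === j) correct += n;
    }));
    return correct / total;
}

// F1 for class k: 2*TP / (2*TP + FP + FN).
function f1Score(matrix, k) {
    var tp = matrix[k][k];
    var actualK = matrix[k].reduce((s, n) => s + n, 0);        // TP + FN (row sum)
    var predictedK = matrix.reduce((s, row) => s + row[k], 0); // TP + FP (column sum)
    if (actualK + predictedK === 0) return 0; // class never occurs and is never predicted
    return 2 * tp / (actualK + predictedK);
}

// Made-up 3-class confusion matrix.
var demoMatrix = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 3, 7]
];

var demoAccuracy = accuracy(demoMatrix);
var demoF1 = demoMatrix.map((_, k) => f1Score(demoMatrix, k));
var demoMeanF1 = demoF1.reduce((s, v) => s + v, 0) / demoF1.length;
console.log(demoAccuracy, demoF1, demoMeanF1);
```

The average here is the unweighted (macro) mean of the per-class scores; weighting by class size is another common choice.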

In [None]:
var ConfusionMatrix = require('ml-confusion-matrix');
var actuals = labelledDataset.map(d => d[13]-1);
var predicted = clusters.map(d => d);

var C = ConfusionMatrix.fromLabels(actuals, predicted)

// compute and print out some metrics here

So accuracy might not have been as good as we hoped for.

Why? Let's dig deeper and look at the confusion matrix.

In [None]:
var M = C.getMatrix();

var trace = { 
    x: [0,1,2],
    y: [0,1,2],
    z: M,
    type: 'heatmap',
    showscale: false,
    colorscale: [[0, '#3D9970'], [1, '#001f3f']] // colorscale stops are normalised to the range 0..1
};

var annotations = [];

M.map((a,y) => {
    a.map((b,x) => {
        annotations.push(
            {
                x: x,
                y: y,
                text: M[y][x],
                font: {
                    family: 'Arial',
                    size: 12,
                    color: 'white'
                  },
                showarrow: false
            }
        )
    })
})

var layout = { 
    xaxis: { title: "predicted", side: 'top' },
    yaxis: { title: "actuals", nticks: 6, autosize: false, autorange: 'reversed' },
    annotations,
    width: 500, height: 500};

$$html$$ = Plot.createPlot([trace], layout).render();

We can also plot the original labels:

In [None]:
var trace = { 
    x: input.map(d => d[0]),
    y: input.map(d => d[1]),
    mode: 'markers',
    marker: { 
        color: actuals, // <- here are the true labels
        size: 8,
        colorbar: {
            xpad: 100
        }
    },
    type: 'scatter'
};

var layout = { width: 800, height: 700, xaxis: { title: features[0] }, yaxis: { title: features[11] }};

$$html$$ = Plot.createPlot([trace, ...centroidsTraces], layout).render()

Depending on the attributes you chose, we can see some gross misclassifications here. In high accuracy cases the main diagonal contains the most weight.

So why do you think this has done so poorly? (if it has)

Try running the whole notebook again (Cell > Run All) a few times and watch the score and the confusion matrix. Does it change?

Any idea what is happening?

Any idea how to fix it?
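
One thing worth checking, offered as a possible contributor rather than the whole story: the cluster indices that k-means returns are arbitrary, so cluster `0` need not correspond to true class `0`, and with random initialisation the numbering can change between runs. The sketch below tries every relabelling of the clusters and keeps the one that agrees most with the true labels. The names `permutations` and `bestRelabelling` and the demo labels are made up, and it assumes the number of clusters equals the number of true classes. There are K! relabellings, so this is only sensible for small K.

```javascript
// All permutations of [0, 1, ..., k-1].
function permutations(k) {
    if (k === 0) return [[]];
    var result = [];
    permutations(k - 1).forEach(p => {
        for (var i = 0; i <= p.length; i++) {
            result.push(p.slice(0, i).concat([k - 1], p.slice(i)));
        }
    });
    return result;
}

// Hypothetical helper: find the cluster -> class mapping with the most matches.
// mapping[c] is the class assigned to cluster index c.
function bestRelabelling(actuals, predicted, k) {
    var best = { mapping: null, matches: -1 };
    permutations(k).forEach(mapping => {
        var matches = predicted.filter((c, i) => mapping[c] === actuals[i]).length;
        if (matches > best.matches) best = { mapping, matches };
    });
    return best;
}

// Made-up labels: the clustering is good but the cluster indices are shuffled.
var demoActuals   = [0, 0, 0, 1, 1, 2, 2, 2];
var demoPredicted = [2, 2, 2, 0, 0, 1, 1, 0];

var demoBest = bestRelabelling(demoActuals, demoPredicted, 3);
var demoAligned = demoPredicted.map(c => demoBest.mapping[c]);
console.log(demoBest, demoAligned);
```

Feeding relabelled predictions into the confusion matrix separates genuine clustering errors from mere numbering mismatches. It does not help when a poor initialisation has led k-means to a poor partition.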

#### Further Reading

More sophisticated techniques can produce different results:

 - Bayes learning and 2nd order Bayes classifiers
 - Gaussian mixture modelling
 - measuring performance in unsupervised learning