In [1]:
// boring imports
var {loadUnlabelledWine, grid2} = require('./utils')
var Plot = require('plotly-notebook-js');
var table = require('text-table');


### Accuracy

Ok, we have classified some data and calculated how confident we are with each of the classifications. 

But how do we know whether our classification is right? Well, in many applications of unsupervised learning we can't know for sure, because we typically reach for this type of technique precisely when the expected outputs aren't known beforehand.

However, this example dataset does come with class labels; we just haven't loaded them yet. So let's reload the data with its labels and measure the accuracy.

In [7]:
var {loadLabelledWine} = require('./utils')
var labelledDataset = loadLabelledWine({ verbose: true }).dataset;

our dataset has 178 rows and  14 columns
Alcohol | Malic Acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 of diluted wines | Proline | Class
---------------------------
14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8  | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 1
 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4  | 1050 | 1
13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8  | 3.24 | 0.3  | 2.81 | 5.68 | 1.03 | 3.17 | 1185 | 1
14.37 | 1.95 | 2.5  | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8  | 0.86 | 3.45 | 1480 | 1
13.24 | 2.59 | 2.87 | 21   | 118 | 2.8  | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735  | 1



Notice the additional (last) column containing the known class index. Note that these indices count from `1`, whilst our class labels count from `0`, so we'll need to compensate for that.

### Confusion Matrix

With mljs we can use the confusion matrix package to compute a suite of different measures of how well our clustering performs as a classifier.

Dig into [the docs](https://mljs.github.io/confusion-matrix/) and compute:

 - overall `accuracy`
 - the `F1 score` for each class label
 - the average `F1 score`
 
These metrics are in the range [0,1], or [appalling, 100% match]. Print them out on the console.

NB: with this library we can compute a full suite of diagnostic measures; see the table [here](https://en.wikipedia.org/wiki/F1_score#Diagnostic_testing).
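As a cross-check on whatever the library reports, the definitions themselves can be sketched in plain JavaScript. This is a minimal sketch, not the library's implementation: it assumes rows are actual classes and columns are predicted classes, and `exampleMatrix`, `accuracy`, `f1Score` and `macroF1` are hypothetical names with made-up counts.

```javascript
// Hypothetical 3x3 confusion matrix (made-up counts, NOT this notebook's result).
// Assumption: rows = actual class, columns = predicted class.
var exampleMatrix = [
    [50, 5, 4],
    [6, 60, 5],
    [2, 3, 43]
];

// Accuracy: samples on the main diagonal divided by all samples.
function accuracy(matrix) {
    const total = matrix.reduce(
        (sum, row) => sum + row.reduce((a, b) => a + b, 0), 0);
    const correct = matrix.reduce((sum, row, i) => sum + row[i], 0);
    return correct / total;
}

// F1 for class k: 2*TP / (2*TP + FP + FN).
// The row sum is TP + FN and the column sum is TP + FP.
function f1Score(matrix, k) {
    const tp = matrix[k][k];
    const rowSum = matrix[k].reduce((a, b) => a + b, 0);
    const colSum = matrix.reduce((sum, row) => sum + row[k], 0);
    return rowSum + colSum === 0 ? 0 : (2 * tp) / (rowSum + colSum);
}

// Average (macro) F1: the unweighted mean of the per-class F1 scores.
function macroF1(matrix) {
    const scores = matrix.map((_, k) => f1Score(matrix, k));
    return scores.reduce((a, b) => a + b, 0) / scores.length;
}

console.log('accuracy', accuracy(exampleMatrix));
console.log('macro F1', macroF1(exampleMatrix));
```

The macro average treats every class equally, whatever its size; weighting each class by its sample count is a common alternative.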

In [8]:
var ConfusionMatrix = require('ml-confusion-matrix');
var actuals = labelledDataset.map(d => d[13]-1);
var predicted = clusters.map(d => d);

var C = ConfusionMatrix.fromLabels(actuals, predicted)

// compute and print out some metrics here


So accuracy might not have been as good as we hoped for.

Why? Let's dig deeper and look at the confusion matrix.

In [9]:
var M = C.getMatrix();

var trace = { 
    x: [0,1,2],
    y: [0,1,2],
    z: M,
    type: 'heatmap',
    showscale: false,
    colorscale: [[0, '#3D9970'], [1, '#001f3f']] // colorscale stops are normalised to the range 0..1
};

var annotations = [];

// add one annotation per cell, showing the count in that cell
M.forEach((row, y) => {
    row.forEach((count, x) => {
        annotations.push({
            x: x,
            y: y,
            text: String(count),
            font: {
                family: 'Arial',
                size: 12,
                color: 'white'
            },
            showarrow: false
        });
    });
});

var layout = { 
    xaxis: { title: "predicted", side: 'top' },
    yaxis: { title: "actuals", nticks: 6, autosize: false, autorange: 'reversed' },
    annotations,
    width: 500, height: 500};

$$html$$ = Plot.createPlot([trace], layout).render();

In [10]:
console.log("Accuracy", C.getAccuracy())
console.log("F1 Class 1", C.getF1Score(0))
console.log("F1 Class 2", C.getF1Score(1))
console.log("F1 Class 3", C.getF1Score(2))


Accuracy 0.29775280898876405
F1 Class 1 0
F1 Class 2 0.08823529411764706
F1 Class 3 0.8867924528301887


We can also plot the original labels:

In [11]:
var trace = { 
    x: input.map(d => d[0]),
    y: input.map(d => d[1]),
    mode: 'markers',
    marker: { 
        color: actuals, // <- here are the true labels
        size: 8,
        colorbar: {
            xpad: 100
        }
    },
    type: 'scatter'
};

var layout = { width: 800, height: 700, xaxis: { title: features[0] }, yaxis: { title: features[11] }};

$$html$$ = Plot.createPlot([trace, ...centroidsTraces], layout).render()

Depending on the attributes you chose, we can see some gross misclassifications here. When accuracy is high, most of the weight in the confusion matrix sits on the main diagonal.

So why do you think this has done so poorly (if it has)?

Try running the whole notebook again (Cell > Run All) a few times and watch the score and the confusion matrix. Do they change?

Any idea what is happening?

Any idea how to fix it?
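One likely culprit, offered as a hint rather than the definitive answer: a centroid-based clusterer numbers its clusters arbitrarily, so cluster `0` need not correspond to class `0`, and the numbering can change from run to run. A minimal sketch of one possible fix is to try every cluster-to-class mapping and keep the one that agrees best with the known labels. `permutations` and `bestRelabelling` are hypothetical helpers and the toy labels are made up; brute force is only sensible for a handful of clusters (3 clusters gives 6 mappings).

```javascript
// All orderings of the items in `arr`.
function permutations(arr) {
    if (arr.length <= 1) return [arr];
    return arr.flatMap((x, i) =>
        permutations([...arr.slice(0, i), ...arr.slice(i + 1)])
            .map(rest => [x, ...rest]));
}

// Find the mapping (cluster id -> class id) with the most matching labels.
function bestRelabelling(actual, predicted, k) {
    let best = { mapping: null, correct: -1 };
    for (const mapping of permutations([...Array(k).keys()])) {
        const correct = predicted
            .filter((p, i) => mapping[p] === actual[i]).length;
        if (correct > best.correct) best = { mapping, correct };
    }
    return best;
}

// Made-up toy labels: the grouping is perfect, only the ids are shuffled.
var toyActual = [0, 0, 1, 1, 2, 2];
var toyPredicted = [2, 2, 0, 0, 1, 1];

var { mapping, correct } = bestRelabelling(toyActual, toyPredicted, 3);
var relabelled = toyPredicted.map(p => mapping[p]);

console.log('mapping', mapping);
console.log('accuracy after relabelling', correct / toyActual.length);
```

In the notebook, the same idea would be applied to `actuals` and `predicted` before building the confusion matrix from the relabelled predictions.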

#### Hey, What about preprocessing and normalisation?

We've gone ahead and worked on the raw values here, which is working for now, but:

 - Is it the best thing to do?
 - Do you think normalisation would affect our result in this case?
 - Why?
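To experiment with these questions, here is a minimal sketch of z-score standardisation, which rescales every column to zero mean and unit variance. It assumes the clustering compares points by Euclidean distance, in which case a large-valued column such as Proline (around 1000) can swamp a column such as Alcohol (around 13). `standardise` is a hypothetical helper, and the sample rows are the Alcohol and Proline values from the first three rows of the table above.

```javascript
// Rescale each column to zero mean and unit (population) variance.
function standardise(rows) {
    const n = rows.length;
    const dims = rows[0].length;
    const means = Array.from({ length: dims }, (_, j) =>
        rows.reduce((sum, r) => sum + r[j], 0) / n);
    const stds = Array.from({ length: dims }, (_, j) =>
        Math.sqrt(rows.reduce((sum, r) => sum + (r[j] - means[j]) ** 2, 0) / n));
    // A constant column has zero spread; map it to 0 to avoid dividing by zero.
    return rows.map(r =>
        r.map((v, j) => (stds[j] === 0 ? 0 : (v - means[j]) / stds[j])));
}

// [Alcohol, Proline] for the first three wines in the table above.
var sample = [[14.23, 1065], [13.2, 1050], [13.16, 1185]];
var scaled = standardise(sample);

console.log(scaled);
```

Clustering the standardised values instead of the raw ones, then comparing the two confusion matrices, is one way to see how much the scaling matters here.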

#### Further Reading

More sophisticated techniques can produce different results:

 - Bayes learning and 2nd order Bayes classifiers
 - Gaussian mixture modelling
 - measuring performance in unsupervised learning