In [1]:
// boring imports
var Plot = require('plotly-notebook-js');
var {loadLabelledWine, plotClustersWithLabels} = require('./utils');


# Supervised Learning


![Supervised learning](images/slide_supervised.png)

### Techniques available

 - **KNN - K Nearest Neighbors**
 - SVM - Support Vector Machines
 - Naive Bayes
 - Partial Least Squares [regression]



In [2]:
var {features, dataset} = loadLabelledWine({verbose: true});

our dataset has 178 rows and  14 columns
Alcohol | Malic Acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 of diluted wines | Proline | Class
---------------------------
14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8  | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 1
 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4  | 1050 | 1
13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8  | 3.24 | 0.3  | 2.81 | 5.68 | 1.03 | 3.17 | 1185 | 1
14.37 | 1.95 | 2.5  | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8  | 0.86 | 3.45 | 1480 | 1
13.24 | 2.59 | 2.87 | 21   | 118 | 2.8  | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735  | 1


## K Nearest Neighbors


##### KNN in mljs

 - docs for the KNN module are [here](https://mljs.github.io/knn/)
 - docs for the various kernel options are [here](https://github.com/mljs/kernel)

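Train a KNN classifier on the inputs and their labels. Only two features are used, Alcohol and Hue, which also makes the results easy to plot in 2D.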

In [3]:
var KNN = require('ml-knn');

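var inputs = dataset.map(d => [d[0], d[10]]); // two features per row: Alcohol (column 0) and Hue (column 10)
var labels = dataset.map(d => d[13]);         // class labels 1, 2 and 3 from the "Class" column (column 13)

var options = {
  k: 3 // number of nearest neighbours that vote on each prediction
};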

var knn = new KNN(inputs, labels, options);

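To make the idea behind `ml-knn` concrete, here is a minimal from-scratch sketch of k-nearest-neighbours classification: measure the Euclidean distance from a query point to every training point, keep the `k` closest, and return the label that most of them share. The helper names `euclidean` and `knnPredict` are hypothetical and the toy points are made up (they only loosely resemble Alcohol/Hue values). This is not ml-knn's internal implementation, which may use a spatial index to find neighbours faster.

```javascript
// Euclidean distance between two equal-length numeric vectors.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Predict the label of `point` by majority vote among its k closest training points.
// Hypothetical helper for illustration, not part of ml-knn.
function knnPredict(trainX, trainY, point, k) {
  // Pair every training point's distance with its label, closest first.
  const neighbours = trainX
    .map((x, i) => ({ dist: euclidean(x, point), label: trainY[i] }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Count votes per label; a Map keeps insertion order, i.e. closest-first.
  const votes = new Map();
  for (const { label } of neighbours) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }

  // Highest vote count wins; on a tie the label whose nearest neighbour is closest wins.
  let best = null;
  let bestCount = -1;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

// Toy data (made up): two points of class 1, two of class 2.
const trainX = [[13.0, 1.0], [13.2, 1.05], [12.3, 0.9], [12.4, 0.95]];
const trainY = [1, 1, 2, 2];
console.log(knnPredict(trainX, trainY, [13.1, 1.02], 3)); // 1 (two of the three nearest are class 1)
```

Sorting every distance is O(n log n) per query, which is fine for 178 rows; larger datasets are where tree-based neighbour search starts to pay off.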

Compare a few slices of `labels` with the predicted values returned by `knn.predict(inputs)`.
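One way to line up a slice of actual and predicted labels and list the rows that disagree is sketched here with a hypothetical `compareSlice` helper and made-up toy labels. In the notebook it could be called as `compareSlice(labels, knn.predict(inputs), 0, 10)`, assuming `predict` is given the whole array of inputs as in the sentence above.

```javascript
// Pair up actual and predicted labels for rows [start, end) and flag mismatches.
// Hypothetical helper; `actual` and `predicted` are equal-length arrays of class labels.
function compareSlice(actual, predicted, start, end) {
  const rows = [];
  for (let i = start; i < end && i < actual.length; i++) {
    rows.push({ index: i, actual: actual[i], predicted: predicted[i], match: actual[i] === predicted[i] });
  }
  return rows;
}

// Toy labels, made up for illustration (not the wine data).
const actualToy = [1, 1, 2, 2, 3, 3];
const predictedToy = [1, 2, 2, 2, 3, 1];
const sliceRows = compareSlice(actualToy, predictedToy, 0, 6);
console.table(sliceRows);
console.log(sliceRows.filter(r => !r.match).map(r => r.index)); // indices 1 and 5 disagree
```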

### Make some predictions and show some results

Let's plot the original dataset with its known labels, then plot our predictions.

In [5]:
var predictions = inputs.map(input => knn.predict(input))

In [6]:
$$html$$ = plotClustersWithLabels(inputs.map(d => d[0]), inputs.map(d => d[1]), labels, "Actual Labels");

In [7]:
$$html$$ = plotClustersWithLabels(inputs.map(d => d[0]), inputs.map(d => d[1]), predictions, "Predicted Labels");

### Measure Accuracy

Use the same confusion matrix approach as earlier to compute accuracy and F1 scores.
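Both metrics come straight from the confusion matrix. Accuracy is the number of correct predictions (the diagonal) divided by the total, and the F1 score for one class is 2·TP / (2·TP + FP + FN), where TP counts rows of that class predicted correctly, FN rows of that class predicted as something else, and FP rows of other classes predicted as that class. For example, an accuracy of 0.9269662921348315 on 178 wines equals 165/178, i.e. 165 correct predictions. The sketch assumes rows are actual classes and columns are predicted classes (the orientation the notebook's heatmap axes use); the helper names and the toy matrix are hypothetical, and the functions are meant to mirror what `C.getAccuracy()` and `C.getF1Score(label)` report, not to reproduce ml-confusion-matrix's code.

```javascript
// matrix[i][j] = number of samples whose actual class is i and predicted class is j.

// Fraction of samples on the diagonal (predicted class === actual class).
function accuracy(matrix) {
  let correct = 0;
  let total = 0;
  matrix.forEach((row, i) => row.forEach((count, j) => {
    total += count;
    if (i === j) correct += count;
  }));
  return correct / total;
}

// F1 for the class at index c: 2TP / (2TP + FP + FN).
function f1Score(matrix, c) {
  const tp = matrix[c][c];
  const fn = matrix[c].reduce((s, v) => s + v, 0) - tp;     // actually c, predicted as another class
  const fp = matrix.reduce((s, row) => s + row[c], 0) - tp; // predicted c, actually another class
  return (2 * tp) / (2 * tp + fp + fn);
}

// Toy 3-class matrix (made up): 12 of 14 samples are on the diagonal.
const toyMatrix = [
  [5, 1, 0],
  [0, 4, 1],
  [0, 0, 3],
];
console.log(accuracy(toyMatrix));   // 12/14
console.log(f1Score(toyMatrix, 1)); // 2*4 / (2*4 + 1 + 1) = 0.8
```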

In [9]:
var ConfusionMatrix = require('ml-confusion-matrix');

var actuals = labels;
var predicted = inputs.map(i => knn.predict(i))

var C = ConfusionMatrix.fromLabels(actuals, predicted)

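var M = C.getMatrix();           // rows = actual class, columns = predicted class
var classLabels = C.getLabels(); // class label for each row/column, here [1, 2, 3]
var trace = {
    x: classLabels,
    y: classLabels,
    z: M,
    type: 'heatmap',
    showscale: false,
    colorscale: [[0, '#3D9970'], [1, '#001f3f']] // colorscale stops must run from 0 to 1
};

console.log("Accuracy", C.getAccuracy())
console.log("F1 Class 1", C.getF1Score(1))
console.log("F1 Class 2", C.getF1Score(2))
console.log("F1 Class 3", C.getF1Score(3))

// Everything below is plotting code only; it could be moved into a utility function.
// One white text annotation per heatmap cell, showing that cell's count.
var annotations = [];
M.forEach((row, y) => {
    row.forEach((count, x) => {
        annotations.push({
            x: classLabels[x],
            y: classLabels[y],
            text: String(count),
            font: { family: 'Arial', size: 12, color: 'white' },
            showarrow: false
        });
    });
});

var layout = {
    xaxis: { title: "predicted", side: 'top', tickvals: classLabels },
    yaxis: { title: "actuals", tickvals: classLabels, autorange: 'reversed' },
    annotations,
    autosize: false, // autosize is a layout-level option, not an axis option
    width: 500, height: 500
};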

$$html$$ = Plot.createPlot([trace], layout).render();

Accuracy 0.9269662921348315
F1 Class 1 0.9193548387096774
F1 Class 2 0.9264705882352942
F1 Class 3 0.9375
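The plotting cell's comment suggests moving the annotation code into a utility function. A sketch of such a helper follows; `confusionAnnotations` is a hypothetical name and is not part of plotly-notebook-js or `./utils`. It builds one Plotly annotation object per matrix cell, positioned at that cell's class labels, and assumes rows are actual classes and columns are predicted classes.

```javascript
// Build Plotly heatmap annotations (one text label per cell) for a confusion matrix.
// matrix[y][x] = count for actual class labels[y] and predicted class labels[x].
// Hypothetical helper, not from plotly-notebook-js or ./utils.
function confusionAnnotations(matrix, labels) {
  const annotations = [];
  matrix.forEach((row, y) => {
    row.forEach((count, x) => {
      annotations.push({
        x: labels[x],
        y: labels[y],
        text: String(count),
        font: { family: 'Arial', size: 12, color: 'white' },
        showarrow: false,
      });
    });
  });
  return annotations;
}

// Toy 2x2 example: the annotation for row 0, column 1 shows "2".
const toyAnnotations = confusionAnnotations([[1, 2], [3, 4]], ['a', 'b']);
console.log(toyAnnotations.length); // 4
```

In the notebook this could replace the inline loop with `var annotations = confusionAnnotations(C.getMatrix(), C.getLabels());`, assuming the installed ml-confusion-matrix exposes `getLabels()`.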


#### Discussion

 - How does your accuracy compare with our unsupervised approaches?
 - Is this too good to be true, and should you be suspicious?
 - Can you think of how to get a better accuracy measurement?
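One reason to be suspicious: the accuracy above is measured on the same rows the model was trained on. For KNN this is especially optimistic, because every training row is its own nearest neighbour (at distance 0) and so always casts one vote for its own label. A common remedy is to hold out rows the model never sees during training and score only those. A minimal sketch of such a split, using a hypothetical `trainTestSplit` helper and a Fisher-Yates shuffle:

```javascript
// Randomly split rows into a training set and a held-out test set.
// Hypothetical helper; `rand` defaults to Math.random but can be swapped for a seeded generator.
function trainTestSplit(X, y, testFraction, rand = Math.random) {
  const idx = X.map((_, i) => i);
  // Fisher-Yates shuffle of the row indices.
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const nTest = Math.round(X.length * testFraction);
  const testIdx = idx.slice(0, nTest);
  const trainIdx = idx.slice(nTest);
  return {
    trainX: trainIdx.map(i => X[i]),
    trainY: trainIdx.map(i => y[i]),
    testX: testIdx.map(i => X[i]),
    testY: testIdx.map(i => y[i]),
  };
}
```

In the notebook this might look like `var split = trainTestSplit(inputs, labels, 0.3);`, then `new KNN(split.trainX, split.trainY, {k: 3})` and `ConfusionMatrix.fromLabels(split.testY, knn.predict(split.testX))`. Repeating the split several times (for example k-fold cross-validation) gives a steadier estimate than a single random split.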