## Classification using SVMs


In this notebook we'll explore how machine learning models perform classification tasks. In particular, we'll look at support vector machines (SVMs) and at how biased datasets can lead to different modelling situations. As with the regression module, we'll employ K-fold cross-validation to check that our results generalize well. We'll also look at evaluation methods for classification models, such as sensitivity, specificity, and receiver operating characteristic (ROC) curves along with the area under the curve (AUC).

## Creating your dataset

### Bivariate SVM

As usual, the first step when working with a new dataset is to visualize it. We won't have this luxury with high-dimensional data, which is probably the setting for most classification problems, but playing with a low-dimensional case is good for building intuition:
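As a minimal sketch of a low-dimensional dataset to play with, the snippet below generates two Gaussian clusters in two dimensions. The use of `make_blobs`, the cluster centers, and the sample size are assumptions chosen for illustration, not the notebook's original data.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Hypothetical stand-in data: two Gaussian clusters in 2D.
# The first center becomes class 0, the second becomes class 1.
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)

print(X.shape)        # (n_samples, n_features)
print(np.unique(y))   # the class labels

# Per-class feature means give a rough sense of the separation.
for label in np.unique(y):
    print(label, X[y == label].mean(axis=0))
```

The two columns of `X` can be passed to any scatter-plot routine, coloring points by `y`, to see the separation visually.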

As you can see from the data, there is some separation between the two classes. Our goal is to train a support vector machine classifier to model that separation. As with most machine learning tools, <code>sklearn</code> provides a support vector machine classifier:

First, let's fit the model on the entire dataset to get a quick hands-on look at what you can do after fitting an SVM model:
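A minimal sketch of such a fit, assuming `sklearn.svm.SVC` with a linear kernel and a synthetic two-cluster dataset (both are assumptions; the notebook's own data and settings may differ):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical stand-in data: two Gaussian clusters in 2D.
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)

# Linear-kernel SVM; C=1.0 is sklearn's default regularization strength.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# For a linear kernel the fitted model exposes the line's coefficients.
print(clf.coef_)        # shape (1, 2): the weights a and b
print(clf.intercept_)   # shape (1,): the offset c

# Accuracy on the same data used for fitting (an optimistic estimate).
train_accuracy = clf.score(X, y)
print(train_accuracy)
```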

Now we can inspect some properties of this model to get a better idea of how it performed on our full dataset. First we'll visualize the dividing line generated by this model:

Note that since this particular SVM is a linear model, fitting it produces a line, much like linear regression. The only difference is that this line is chosen to separate the two classes rather than to minimize the mean squared error, as we did with linear regression:

To plot the boundary, we write out the equation of the line found by the SVM model and rearrange it to solve for $x_2$ (solving for $x_1$ instead works equally well):

$$ax_1 + bx_2 + c = 0$$
$$x_2 = \frac{-ax_1 - c}{b}$$
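The rearranged equation can be evaluated directly from a fitted linear SVM. This is a sketch under assumptions: the data is a synthetic stand-in, and $a$, $b$, $c$ are read from `coef_` and `intercept_` of an `SVC` with a linear kernel.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical stand-in data: two Gaussian clusters in 2D.
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# a*x1 + b*x2 + c = 0, with a, b from coef_ and c from intercept_.
a, b = clf.coef_[0]
c = clf.intercept_[0]

# Solve for x2 over the observed range of x1.
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
x2 = (-a * x1 - c) / b

# Sanity check: points on the boundary should score (approximately) zero.
boundary_scores = clf.decision_function(np.column_stack([x1, x2]))
print(np.abs(boundary_scores).max())
```

Plotting `x2` against `x1` on top of a scatter plot of the data draws the boundary.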

Now that we've computed our linear boundary, let's visualize what it looks like!

Furthermore, we can visualize which data points were used as support vectors!
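A fitted `SVC` stores its support vectors, so they can be pulled out and highlighted. The sketch below assumes a linear-kernel `SVC` and synthetic stand-in data:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical stand-in data: two Gaussian clusters in 2D.
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# Coordinates of the support vectors, one row per vector.
support_vectors = clf.support_vectors_

# Indices of those same points in the training data.
support_idx = clf.support_

# Number of support vectors contributed by each class.
print(clf.n_support_)
print(support_vectors.shape)
```

Overlaying `support_vectors` on the scatter plot (for example with larger, hollow markers) shows which points determine the boundary.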

We can also plot the margins used in the SVM model. The margins of the SVM are described by the following equations:

For the top margin:
$$ ax_1 + bx_2 + c = 1 $$

For the bottom margin:
$$ ax_1 + bx_2 + c = -1 $$

Re-arranging the equations to solve for $x_2$ as usual (for the top margin):

$$x_2 = \frac{1- ax_1 - c}{b}$$
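Both margin lines follow from the same rearrangement, with $+1$ or $-1$ on the right-hand side. A sketch under assumptions (synthetic stand-in data, linear-kernel `SVC`):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical stand-in data: two Gaussian clusters in 2D.
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

a, b = clf.coef_[0]
c = clf.intercept_[0]
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)

# a*x1 + b*x2 + c = +1  ->  x2 = (1 - a*x1 - c) / b
x2_plus = (1 - a * x1 - c) / b
# a*x1 + b*x2 + c = -1  ->  x2 = (-1 - a*x1 - c) / b
x2_minus = (-1 - a * x1 - c) / b

# Sanity check: points on the margins should score +1 and -1.
plus_scores = clf.decision_function(np.column_stack([x1, x2_plus]))
minus_scores = clf.decision_function(np.column_stack([x1, x2_minus]))
print(plus_scores[:3], minus_scores[:3])
```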

Now we can visualize the full SVM result!

This visualization will become increasingly useful as we start thinking about regularization!

Finally, recall that SVMs judge a data point's class by calculating a score that depends on the signed distance of the point from the hyperplane. If we feed a new point to the model, we can evaluate the "decision function", which is the score given to that data point. The SVM model then applies a threshold (which we can modify to trade off sensitivity against specificity) to assign a class. Let's feed the model a point and see what score it gets!
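A sketch of scoring new points, assuming a linear-kernel `SVC` fit on synthetic stand-in data. The two query points are hypothetical and were picked to sit at the two cluster centers.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical stand-in data: class 0 around (-2, -2), class 1 around (2, 2).
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# Two hypothetical new points, one near each cluster center.
new_points = np.array([[2.0, 2.0], [-2.0, -2.0]])

# Signed score: positive leans toward class 1, negative toward class 0.
scores = clf.decision_function(new_points)
print(scores)

# The default prediction corresponds to a threshold of 0 on the score.
threshold = 0.0
manual_pred = (scores > threshold).astype(int)
print(manual_pred, clf.predict(new_points))
```

Raising `threshold` makes the model more reluctant to call a point class 1, which trades sensitivity for specificity.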

We can see that the SVM model assigns positive values to class 1 and negative values to class 0.

Now let's do things more properly: we'll use holdout validation (a single train/test split) to evaluate our model, then compute some important metrics such as **accuracy**, **sensitivity**, **specificity**, the **receiver operating characteristic (ROC)** curve, and the **area under the curve (AUC)**.

***

## Computing Classification Metrics

Let's do some (more) responsible model training. We'll split our dataset into training and test sets, train our model on the training set, then evaluate some metrics to get an idea of how well our model handles unseen cases. Let's also move to a higher-dimensional space that is slightly harder to deal with:
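A minimal sketch of a holdout split followed by a fit on the training portion only. The 20-feature dataset from `make_classification`, the 70/30 split, and the linear kernel are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical higher-dimensional stand-in data: 20 features, 5 informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Hold out 30% of the data; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Fit on the training set only; the test set stays untouched until evaluation.
clf = SVC(kernel="linear").fit(X_train, y_train)

print(X_train.shape, X_test.shape)
```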

Now that we've fit our model, we can start to calculate classification metrics on the test dataset. *Only metrics calculated on the test dataset are useful for evaluating the expected performance of your model on unseen data!*

An easy way to generate these metrics is to predict the classes for the test set, then use <code>sklearn.metrics.confusion_matrix</code> to generate our 2x2 table.

Recall that the confusion matrix lays out the four possible outcomes in a table, with true classes in rows and predicted classes in columns:

<table>
    <tr>
        <td> True Negatives </td>
        <td> False Positives </td>
    </tr>
    <tr>
        <td> False Negatives </td>
        <td> True Positives </td>
    </tr>
</table>
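A sketch of building this table with `sklearn.metrics.confusion_matrix`, assuming a linear-kernel `SVC` and a synthetic stand-in dataset. For binary labels 0 and 1, the returned array has the same layout as the table above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in data and split.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)

# Predict classes on the held-out test set.
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Flattening row by row yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)
```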



### Exercise:
Using the confusion matrix table calculate:

1. Accuracy on test set
2. Specificity on test set
3. Sensitivity on test set

### Solution:

The following equations are used:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$


$$\text{Specificity} = \frac{TN}{TN + FP}$$


$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
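These three formulas translate directly into code. The sketch below assumes a synthetic stand-in dataset and a linear-kernel `SVC`, and cross-checks the hand-computed values against `accuracy_score` and `recall_score` from `sklearn.metrics`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in data and split.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)
y_pred = clf.predict(X_test)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)   # true negative rate
sensitivity = tp / (tp + fn)   # true positive rate (recall)

print(accuracy, specificity, sensitivity)

# Cross-check: sensitivity is recall for class 1,
# specificity is recall for class 0.
sk_accuracy = accuracy_score(y_test, y_pred)
sk_sensitivity = recall_score(y_test, y_pred)
sk_specificity = recall_score(y_test, y_pred, pos_label=0)
```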

Looks like our model did pretty well! The final step is to explore the sensitivity/specificity trade-off and to plot the ROC curve. To build the ROC curve, we must first generate a score for each data point in the test set. Sweeping the threshold at which we classify a point as class 0 or class 1 then traces out the ROC curve:
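A sketch of computing the ROC curve and its AUC from decision-function scores, using `roc_curve` and `roc_auc_score` from `sklearn.metrics`. The dataset and the linear-kernel `SVC` are stand-in assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in data and split.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)

# Continuous scores for the test set (not hard class predictions).
scores = clf.decision_function(X_test)

# Each threshold gives one (false positive rate, true positive rate) pair.
# tpr is the sensitivity; fpr is 1 - specificity.
fpr, tpr, thresholds = roc_curve(y_test, scores)

# Area under the ROC curve, computed two equivalent ways.
auc_from_curve = auc(fpr, tpr)
auc_direct = roc_auc_score(y_test, scores)
print(auc_direct)
```

Plotting `tpr` against `fpr` gives the ROC curve; the diagonal from (0, 0) to (1, 1) corresponds to random guessing.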

As you can see, SVMs perform quite well in high-dimensional space. There are some theoretical reasons why this is the case, but that topic is too advanced for an intro course. We could potentially do better by applying dimensionality reduction techniques or regularization (which SVMs actually have built in; see the $C$ parameter).

Finally, you might have noticed that our SVM is a linear model. However, we can extend the SVM to non-linear cases using something called the **kernel trick**. We won't get into it in this course, but the kernel trick is an extraordinary property of the SVM that makes it widely applicable!