# Supervised Machine Learning

Two of the main types of machine learning are **supervised** and **unsupervised** learning. These notebooks give a crash course on some common models for **supervised learning**. In supervised learning, algorithms "learn" from **labeled data**: the algorithm recognizes patterns in the labeled examples and uses them to decide what label new, unlabeled data should be given.


Supervised learning can be divided into two categories: **classification** and **regression**.
- classification: predicts a **category** for a data point (e.g. malignant or benign, male or female)
- regression: predicts a **numerical value** for a data point (e.g. height, weight)

## Resource

https://medium.com/machine-learning-101
- really great series of articles on intro to machine learning, take a look if you have time!

# SVM (Support Vector Machine)

SVMs are algorithms that can be used for both classification and regression purposes, although they are more commonly used for **classification**.
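
To make the classification/regression distinction concrete, here is a minimal sketch using sklearn's `SVC` (classifier) and `SVR` (regressor). The toy arrays are made up for illustration and are not part of the dataset used later in this notebook.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Made-up toy data: one feature, six samples
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])

# Classification: the target is a category (0 or 1)
y_class = np.array([0, 0, 0, 1, 1, 1])
classifier = SVC(kernel="linear").fit(X, y_class)
class_pred = classifier.predict([[2.5], [7.5]])
print("predicted classes:", class_pred)

# Regression: the target is a numerical value
y_value = np.array([1.1, 1.9, 3.2, 5.8, 7.1, 8.0])
regressor = SVR(kernel="linear").fit(X, y_value)
value_pred = regressor.predict([[2.5], [7.5]])
print("predicted values:", value_pred)
```

The classifier returns one of the labels it was trained on, while the regressor returns a floating-point estimate.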

An SVM classifies data by finding the optimal **hyperplane** that best divides a dataset into classes, as in the example below. In two-dimensional space this hyperplane is a line that splits the plane into two parts, with each class lying on one side.

Each data item is a point (vector) in n-dimensional space, where n is the number of features.

<img src="https://cdn-images-1.medium.com/max/1200/0*0o8xIA4k3gXUDCFU.png">

**Support vectors** are the data points nearest to the hyperplane. Removing them would change the position of the dividing hyperplane, which makes them the most important elements of the dataset!

The **margin** is the spacing between the hyperplane and support vectors.

Intuitively, **the further from the hyperplane our support vectors lie (*aka the larger the margin is for both classes*), the more confident we are that the data points have been correctly classified**. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
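
As a rough sketch of these ideas, the snippet below fits a linear SVM on a tiny, made-up, linearly separable dataset and reads off the support vectors and the margin. It assumes the usual SVM scaling, in which the support vectors satisfy |w·x + b| = 1, so the distance from the hyperplane to the support vectors is 1/‖w‖. A very large `C` is used here to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (made-up toy data)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                   # normal vector of the hyperplane
margin = 1.0 / np.linalg.norm(w)   # distance from the hyperplane to the support vectors

print("support vectors:\n", clf.support_vectors_)
print("margin:", margin)
```

Only the points listed in `support_vectors_` determine the hyperplane; moving any of the other points (without crossing the margin) would leave it unchanged.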

<img src="https://miro.medium.com/max/720/1*fv8DDZLaR0t7SO-W6tdDAg.png">

So what happens when we have data that overlaps and *doesn't* have a clear hyperplane? We have these two options:

<img src="https://miro.medium.com/max/600/1*1dwut8cWQ-39POHV48tv4w.png">
<img src="https://miro.medium.com/max/600/1*gt_dkcA5p0ZTHjIpq1qnLQ.png"> 

And both options are correct! *However*, there is a tradeoff. If you have a huge dataset (*cough* challenge problem *cough*), then the second option may not be the best choice, as it will take a long time.

## Important Parameters

- Regularization
- Gamma
- Margin
- Kernel

## Regularization (C)
- tells the SVM optimization how much you want to avoid misclassifying each training example
> - a low value of C creates a smooth decision boundary
> - a high value of C encourages a decision boundary that correctly labels all training examples, even if the boundary becomes very specific to the training set
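
The effect of `C` can be explored with a short experiment. This is only a sketch on synthetic data from sklearn's `make_moons` (a stand-in, not the diabetes dataset); the specific `C` values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (a stand-in for real data)
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

results = {}
for C in [0.01, 1.0, 1000.0]:
    model = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    results[C] = {
        "train_accuracy": model.score(X, y),                  # fraction of training points labeled correctly
        "n_support_vectors": int(model.n_support_.sum()),     # total support vectors across both classes
    }
    print(C, results[C])
```

Comparing the rows gives a feel for how strongly each setting pushes the model to fit individual training points.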

## Gamma
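- defines how far the influence of a single training example reaches: with a low gamma, each point's influence reaches far; with a high gamma, it stays close to the point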

<img src="https://miro.medium.com/max/720/1*dGDQxV8j83VB90skHsXktw.png">
<img src="https://miro.medium.com/max/720/1*ClmsnU_yb1YtIwAAr7krmg.png">
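
A quick way to see what gamma does is to compare training and testing accuracy for a few values. The sketch below uses synthetic `make_moons` data as a stand-in, and the gamma values are arbitrary picks for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (a stand-in for real data)
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

gamma_scores = {}
for gamma in [0.01, 1.0, 100.0]:
    model = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    gamma_scores[gamma] = (train_acc, test_acc)
    print(f"gamma={gamma}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A large gap between the training and testing numbers for a given gamma is a hint that the boundary is following individual training points too closely.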

## Margin
- the separation between the dividing line and the support vectors (the closest points of each class)

<img src="https://miro.medium.com/max/720/1*Ftns0ebfUHJDdpWt3Wvp-Q.png">
<img src="https://miro.medium.com/max/720/1*NbGV1iEtNuklACNUv74w7A.png">

## Kernel
- defines whether we want a linear or a nonlinear separation

<img src="https://miro.medium.com/max/720/1*C3j5m3E3KviEApHKleILZQ.png"> <br><br>
<img src="https://miro.medium.com/max/720/1*FLolUnVUjqV0EGm3CYBPLw.png"> <br><br>
<img src="https://miro.medium.com/max/720/1*NN5VCpVg9gPCLYrDl0YFYw.png"> <br><br>
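
To see why the kernel choice matters, the sketch below compares a linear kernel with an RBF kernel on data that no straight line can separate. It uses sklearn's synthetic `make_circles` data (one class forms a ring around the other); the dataset and its settings are assumptions chosen purely for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# One class forms an inner circle, the other an outer ring
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

kernel_scores = {}
for kernel in ["linear", "rbf"]:
    model = SVC(kernel=kernel).fit(X_train, y_train)
    kernel_scores[kernel] = model.score(X_test, y_test)
    print(f"{kernel} kernel test accuracy: {kernel_scores[kernel]:.2f}")
```

On data shaped like this, a nonlinear kernel can draw a closed boundary around the inner class, which a single straight line cannot do.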

♥ sklearn library's SVM ♥

# Example SVM

Next we will see if we can use an SVM to predict whether or not a patient has diabetes given some medical information about them. Load and view the data in the cells below.

## Loading the Data

In [15]:
#imports
import numpy as np
import pandas as po
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC # "Support vector classifier"
# note: the sklearn.svm module itself is not imported as `svm`, because the name `svm` is used for the model below

In [16]:
#loading pima indians diabetes dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

#  'preg': number of pregnancies  
#  'plas': plasma glucose concentration 
#  'pres': blood pressure 
#  'skin': skin thickness
#  'test': Insulin
#  'mass': BMI
#  'pedi': diabetes pedigree function
#  'age': age
#  'class': '0' means does not have diabetes and '1' means has diabetes

data = po.read_csv(url, names=names)

data.head()

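|   | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|------|------|------|------|------|------|-------|-----|-------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |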


## Splitting Testing/Training Data

In [17]:
# columns we will use to make predictions with (features!)
x_cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']

# column that we want to predict
y_col = 'class'

# 80-20 split of dataset
test_size = 0.2
x_training, x_testing, y_training, y_testing = train_test_split(data[x_cols], data[y_col], test_size=test_size, random_state=0)

## Creating Model

In [25]:
# creating a model with sklearn's SVC, feel free to play around with the parameters!
svm = SVC(gamma = .01)

# training/fitting a model with training data
svm.fit(x_training, y_training)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.01, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

## Evaluating Model

In [26]:
# printing accuracy on the training & testing data
y_train_pred = svm.predict(x_training)
print("Training Accuracy is ", accuracy_score(y_training, y_train_pred) * 100)
y_test_pred = svm.predict(x_testing)
print("Testing Accuracy is ", accuracy_score(y_testing, y_test_pred) * 100)

Training Accuracy is  96.74267100977198
Testing Accuracy is  72.07792207792207
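
The gap between the training and testing accuracy above (about 97% vs. 72%) suggests the model fits the training set much more closely than it generalizes. One common way to "play around with the parameters" more systematically is a cross-validated grid search. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in so that it runs on its own, and the grid values are arbitrary. To try it on the diabetes data, `X_train` and `y_train` could be swapped for `x_training` and `y_training`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in with 8 features (same feature count as the diabetes data)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate values to try (arbitrary choices for illustration)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}

# 5-fold cross-validation on the training data for every C/gamma combination
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
held_out_accuracy = search.score(X_test, y_test)
print("best parameters:", best_params)
print("cross-validated accuracy:", search.best_score_)
print("held-out accuracy:", held_out_accuracy)
```

Because the parameters are chosen using cross-validation on the training data only, the held-out accuracy remains a fair estimate of how the tuned model behaves on unseen data.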


## Notes

**Advantages**
- Works well on smaller, cleaner datasets
- Can be quite accurate on such datasets
- Uses a subset of training points in the decision function (called support vectors), so it is also **memory efficient**.

**Disadvantages**
- Training time is long, so it isn't well suited to larger datasets
- Less effective on noisier datasets with overlapping classes
- If the number of features is much greater than the number of samples, choosing the kernel and regularization term carefully is crucial to **avoid over-fitting**.