# Support Vector Machine (SVM)   

A support vector machine (SVM) is a powerful supervised machine learning model used for classification.  
An SVM makes classifications by defining a **decision boundary** and then checking which side of the boundary an unclassified point falls on.

- When the data has two features, the decision boundary is a line.
- When the data has three features, the decision boundary is a plane.
- With more than three features, the decision boundary is called a "separating hyperplane".
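
For the two-feature case, a minimal sketch of what "the decision boundary is a line" means in scikit-learn. The synthetic `make_blobs` data and the names `w`, `b`, and `side` are assumptions chosen for illustration, not part of the original notes. With a linear kernel, the fitted line is `w[0]*x + w[1]*y + b = 0`, and the sign of the left-hand side tells which side a point falls on.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-feature data with two classes (illustrative only)
points, labels = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                            cluster_std=1.0, random_state=0)

classifier = SVC(kernel="linear")
classifier.fit(points, labels)

# For a linear kernel the boundary is the line w[0]*x + w[1]*y + b = 0
w = classifier.coef_[0]
b = classifier.intercept_[0]
print("w =", w, "b =", b)

# The sign of w.x + b says which side of the line each point is on
side = points @ w + b
manual_predictions = np.where(side > 0, classifier.classes_[1], classifier.classes_[0])
print((manual_predictions == classifier.predict(points)).all())
```

`coef_` is only available for the linear kernel; with other kernels the boundary is not a straight line in the original feature space.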

The **support vectors** are the points in the training set closest to the decision boundary (with n features, there are typically at least n+1 support vectors).  


The distance between a support vector and the decision boundary is called the **margin**. We want to make the margin as large as possible.  
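
As a rough illustration of support vectors and the margin, the sketch below fits a linear SVM on clearly separable synthetic data and measures the distance from each support vector to the boundary. The data, the very large `C` (used here to approximate a hard margin), and the formula `margin = 1 / ||w||` for a linear SVM are assumptions of this example rather than something stated in the notes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clearly separated clusters (synthetic, illustrative data)
points, labels = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                            cluster_std=0.8, random_state=0)

# A very large C approximates a hard margin on separable data
classifier = SVC(kernel="linear", C=1000)
classifier.fit(points, labels)

w = classifier.coef_[0]
margin = 1 / np.linalg.norm(w)  # margin of a linear SVM

# Distance from each support vector to the decision boundary
sv = classifier.support_vectors_
distances = np.abs(classifier.decision_function(sv)) / np.linalg.norm(w)

print("support vectors:\n", sv)
print("margin:", margin)
print("distances of support vectors to the boundary:", distances)
```

On separable data like this, every support vector sits (up to solver tolerance) exactly one margin away from the boundary, and no training point is closer.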


Because the support vectors are so critical in defining the decision boundary, many of the other training points can be ignored. This is one of the advantages of SVMs.  
Many supervised machine learning algorithms use every training point to make a prediction, even though many of those training points aren’t relevant. Once trained, an SVM is fast at prediction time because it only uses the support vectors.  
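
To make "only the support vectors are used" concrete, here is a small sketch that recomputes the decision value of a fitted linear SVM from its support vectors alone. The data and the `new_points` array are made up for illustration; `support_vectors_`, `dual_coef_`, and `intercept_` are attributes of scikit-learn's fitted `SVC`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic training data (illustrative only)
points, labels = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                            cluster_std=1.0, random_state=0)
classifier = SVC(kernel="linear").fit(points, labels)

print(len(classifier.support_vectors_), "support vectors out of", len(points), "training points")

# Hypothetical unlabeled points to classify
new_points = np.array([[0.0, 2.0], [3.0, 1.0], [-1.0, -0.5]])

# Decision value = sum_i dual_coef_i * K(sv_i, x) + intercept,
# where K is the plain dot product for a linear kernel
kernel_values = classifier.support_vectors_ @ new_points.T
manual = classifier.dual_coef_[0] @ kernel_values + classifier.intercept_[0]

print(np.allclose(manual, classifier.decision_function(new_points)))
```

None of the non-support training points appear in the computation, which is why they can be discarded after training.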

________________________
**Outliers**  
SVMs have a parameter **C** that determines how much error the SVM will allow.  
- If **C** is large, the SVM has a hard margin: it allows few or no misclassifications, and as a result the margin can be fairly small.
   - If C is too large, the model risks overfitting (it relies too heavily on the training data, including the outliers).

- If **C** is too small, the model risks underfitting.

> Python:  
`classifier = SVC(C = 0.01)`  
(the best value of C depends on the data)
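
A small sketch of the trade-off, assuming overlapping synthetic clusters and a linear kernel so the margin can be computed as `1 / ||w||` (both are choices made for this example). It fits the same data with a small, a medium, and a large `C` and reports the margin and the number of support vectors.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so some error is unavoidable (illustrative data)
points, labels = make_blobs(n_samples=200, centers=[[-1.5, -1.5], [1.5, 1.5]],
                            cluster_std=1.5, random_state=0)

margins = {}
for C in [0.01, 1, 100]:
    classifier = SVC(kernel="linear", C=C).fit(points, labels)
    margins[C] = 1 / np.linalg.norm(classifier.coef_[0])
    print(f"C = {C}: margin = {margins[C]:.3f}, "
          f"support vectors = {len(classifier.support_vectors_)}")
```

The small `C` should produce the widest margin (a soft margin that tolerates more errors), and the large `C` the narrowest.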

## scikit-learn
Calculating the parameters of the best decision boundary is a fairly complex optimization problem. Luckily, Python’s scikit-learn library has implemented an SVM that will do this for us.

In [1]:
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

#Makes concentric circles
points, labels = make_circles(n_samples=300, factor=.2, noise=.05, random_state = 1)


training_data, validation_data, training_labels, validation_labels = train_test_split(points,
                                                                                      labels,
                                                                                      train_size = 0.8, 
                                                                                      test_size = 0.2,
                                                                                      random_state = 100)
# degree = 2 sets the degree of the polynomial kernel; for 2D points this corresponds to a mapping into three dimensions
classifier  = SVC(kernel = "poly", degree = 2)    # degree-2 polynomial kernel
classifier1 = SVC(kernel = "poly")                # polynomial kernel with the default degree (3)
classifier2 = SVC(kernel = "linear", degree = 2)  # linear boundary; degree is ignored by non-polynomial kernels
classifier3 = SVC(kernel = "linear")              # linear boundary
classifier.fit(training_data, training_labels)
classifier1.fit(training_data, training_labels)
classifier2.fit(training_data, training_labels) 
classifier3.fit(training_data, training_labels)
print(classifier.score(validation_data, validation_labels))
print(classifier1.score(validation_data, validation_labels))
print(classifier2.score(validation_data, validation_labels))
print(classifier3.score(validation_data, validation_labels))

1.0
0.5833333333333334
0.5666666666666667
0.5666666666666667


In [18]:
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

#Makes concentric circles
points, labels = make_circles(n_samples=300, factor=.2, noise=.05, random_state = 1)

#Makes training set and validation set.
training_data, validation_data, training_labels, validation_labels = train_test_split(points,
                                                                                      labels,
                                                                                      train_size = 0.8,
                                                                                      test_size = 0.2,
                                                                                      random_state = 100)

# linear kernel 
classifier = SVC(kernel = "linear", random_state = 1)
classifier.fit(training_data, training_labels)
print(classifier.score(validation_data, validation_labels))
print(training_data[0])



# polynomial kernel (detailed just below); with degree = 2 and 2D points it implicitly maps points into three dimensions
classifier1 = SVC(kernel = "poly", degree = 2)  # degree = 2 is the degree of the polynomial
classifier1.fit(training_data, training_labels)
print(classifier1.score(validation_data, validation_labels))

# Important:
# what the degree-2 polynomial kernel does implicitly can be written out explicitly:
# each point (x, y) is mapped to (sqrt(2)*x*y, x^2, y^2)
new_training = [[2 ** 0.5 * pt[0] * pt[1], pt[0] ** 2, pt[1] ** 2] for pt in training_data]
new_validation = [[2 ** 0.5 * pt[0] * pt[1], pt[0] ** 2, pt[1] ** 2] for pt in validation_data]

# refit the linear classifier on the transformed points, where the two circles become linearly separable
classifier.fit(new_training, training_labels)
print(classifier.score(new_validation, validation_labels))

0.5666666666666667
[0.31860062 0.11705731]
1.0
1.0
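
The explicit three-dimensional mapping in the cell above works because its dot product equals the squared dot product of the original points. The following check is a minimal sketch of that identity; `feature_map`, `a`, and `b` are names made up for this example. scikit-learn's "poly" kernel is `(gamma * <x, x'> + coef0) ** degree` with `coef0 = 0` by default, so for `degree = 2` it matches this form up to the scaling factor `gamma`.

```python
import numpy as np

def feature_map(point):
    """Explicit mapping (x, y) -> (sqrt(2)*x*y, x^2, y^2) from the cell above."""
    x, y = point
    return np.array([np.sqrt(2) * x * y, x ** 2, y ** 2])

# Two arbitrary 2D points (illustrative values)
a = np.array([0.3, -0.5])
b = np.array([-1.2, 0.7])

explicit = feature_map(a) @ feature_map(b)  # dot product in three dimensions
kernel = (a @ b) ** 2                       # degree-2 polynomial kernel in two dimensions

print(explicit, kernel)
print(np.isclose(explicit, kernel))
```

This is the point of the kernel trick: the SVM never has to build the higher-dimensional points, it only needs the kernel value for each pair.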


### Radial Basis Function Kernel
The most commonly used kernel in SVMs is a radial basis function (**rbf**) kernel. This is the default kernel used in scikit-learn’s SVC object. If you don’t explicitly set the kernel (for example to "linear" or "poly"), the SVC object will use an rbf kernel.  
An rbf kernel transforms points into infinite dimensions.

> Python:  
`classifier = SVC(kernel = "rbf", gamma = 0.5, C = 2)`

**gamma** plays a role similar to the C parameter: it tunes how sensitive the model is to the training data, by controlling how far the influence of a single training point reaches.
- A higher gamma, say 100, can result in overfitting (it puts more importance on the individual training points).
- A lower gamma, like 0.01, can result in underfitting (it makes the individual training points less relevant).
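
To see what gamma does numerically, the sketch below evaluates the rbf kernel `exp(-gamma * ||a - b||^2)` for two points one unit apart and compares it with scikit-learn's `rbf_kernel` helper. The two points and the gamma values are chosen only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two points at distance 1 from each other (illustrative values)
a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 0.0]])

similarities = {}
for gamma in [0.01, 1, 100]:
    manual = np.exp(-gamma * np.sum((a - b) ** 2))  # rbf kernel written out by hand
    library = rbf_kernel(a, b, gamma=gamma)[0, 0]   # same value from scikit-learn
    similarities[gamma] = (manual, library)
    print(f"gamma = {gamma}: manual = {manual:.6f}, rbf_kernel = {library:.6f}")
```

With a small gamma the two points still look almost identical to the kernel, so distant training points influence each other; with a large gamma the similarity drops to practically zero, so each training point only affects its immediate neighborhood, which is what makes overfitting possible.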

In [2]:
from data import points, labels
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

training_data, validation_data, training_labels, validation_labels = train_test_split(points,
                                                                                      labels, 
                                                                                      train_size = 0.8, 
                                                                                      test_size = 0.2, 
                                                                                      random_state = 100)

# rbf kernel transforms points into infinite dimensions
# compare training and validation accuracy for several values of gamma
for gamma in [52, 10, 1, 0.1, 0.01]:
    print(f"gamma {gamma}")
    classifier = SVC(kernel = "rbf", gamma = gamma)
    classifier.fit(training_data, training_labels)
    print(classifier.score(training_data, training_labels))      # training accuracy
    print(classifier.score(validation_data, validation_labels))  # validation accuracy
    print()

gamma 52
1.0
0.7222222222222222

gamma 10
1.0
0.8333333333333334

gamma 1
0.9930555555555556
0.8888888888888888

gamma 0.1
0.8611111111111112
0.7777777777777778

gamma 0.01
0.7986111111111112
0.7222222222222222


# Tutorial

In [4]:
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from svm_visualization import draw_boundary
from players import aaron_judge

fig, ax = plt.subplots() # ax = axes of your graph

# 1/ Create the labels

#print(aaron_judge.columns)
#print(aaron_judge.description.unique())
#print(aaron_judge.type.unique())  # ['S' 'B' 'X']

# change every 'S' to a 1 and every 'B' to a 0
aaron_judge['type'] = aaron_judge['type'].map({'S': 1, 'B': 0})
#print(aaron_judge['type'])  #  check


# 2/ Plotting the pitches

#print(aaron_judge['plate_x'], aaron_judge['plate_z'])

# remove every row that has a NaN in any of those columns
aaron_judge = aaron_judge.dropna(subset = ['type', 'plate_x', 'plate_z'])

# Plot the pitches using Matplotlib
plt.scatter(x = aaron_judge['plate_x'], y = aaron_judge['plate_z'], c = aaron_judge['type'], cmap = plt.cm.coolwarm, alpha = 0.5)
# c = the type column, so that the points are colored by label
# cmap = plt.cm.coolwarm makes the strikes red and the balls blue
# alpha = 0.5 makes the points slightly transparent


# 3/ Building the SVM 

training_set, validation_set = train_test_split(aaron_judge, random_state = 1)
classifier = SVC(kernel = "rbf")
classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type']) # The trained SVM

# To visualize the SVM, call the draw_boundary function. This is a function that we wrote ourselves; you won’t find it in scikit-learn.
# It takes the axes of the graph (the ax variable defined at the top of the code) and the trained SVM (classifier).
# Make sure you’ve called .fit() before trying to visualize the decision boundary.
draw_boundary(ax, classifier)

plt.show()
print(classifier.score(validation_set[['plate_x', 'plate_z']], validation_set['type']))


# 4/ Optimizing the SVM

classifier = SVC(kernel = "rbf", C = 100, gamma = 100)
classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type']) # The trained SVM
draw_boundary(ax, classifier)
plt.show()
print(classifier.score(validation_set[['plate_x', 'plate_z']], validation_set['type']))



def best_para(training_set, validation_set, maximal):
  # Try every integer C and gamma in [1, maximal) and keep the pair with the best validation accuracy
  largest = {'value': 0, "C": 1, 'gamma': 1}
  best_classifier = None
  for gamma in range(1, maximal):
    for C in range(1, maximal):
      classifier = SVC(kernel = "rbf", C = C, gamma = gamma)
      classifier.fit(training_set[['plate_x', 'plate_z']], training_set['type']) # The trained SVM
      score = classifier.score(validation_set[['plate_x', 'plate_z']], validation_set['type'])
      if largest['value'] < score:
        largest['value'] = score
        largest['C'] = C
        largest['gamma'] = gamma
        best_classifier = classifier
  # Draw the boundary of the best classifier found, not of the last one trained
  draw_boundary(ax, best_classifier)
  plt.show()
  print(largest)

best_para(training_set, validation_set, 10)

UnpicklingError: invalid load key, '\xef'.
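
The hand-written search over C and gamma can also be expressed with scikit-learn's `GridSearchCV`, which scores each combination by cross-validation on the training set instead of a single validation split. This is a sketch under stated assumptions: the pitch data is not available here, so `make_moons` stands in for it, and the grid values are arbitrary.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data set with two features and two classes (not the pitch data)
points, labels = make_moons(n_samples=300, noise=0.25, random_state=1)
training_data, validation_data, training_labels, validation_labels = train_test_split(
    points, labels, train_size=0.8, random_state=100)

# Candidate values for C and gamma (arbitrary, illustrative grid)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}

# 5-fold cross-validation over every (C, gamma) pair
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(training_data, training_labels)

print(search.best_params_)
# Accuracy of the best model (refit on all training data) on the held-out set
validation_score = search.score(validation_data, validation_labels)
print(validation_score)
```

A logarithmic grid like this usually covers the useful range of C and gamma better than consecutive integers, at the same number of fits.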

# Review

- SVMs are supervised machine learning models used for classification.

- An SVM uses support vectors to define a decision boundary. Classifications are made by comparing unlabeled points to that decision boundary.

- Support vectors are the points of each class closest to the decision boundary. The distance between the support vectors and the decision boundary is called the margin.

- SVMs attempt to create the largest margin possible while staying within an acceptable amount of error.
    The C parameter controls how much error is allowed. A large C allows for little error and creates a hard margin. A small C allows for more error and creates a soft margin.

- SVMs use kernels to classify points that aren’t linearly separable.

- Kernels transform points into higher dimensional space. A degree-2 polynomial kernel applied to two-dimensional points corresponds to a mapping into three dimensions, while an rbf kernel corresponds to a mapping into infinitely many dimensions.

- An rbf kernel has a gamma parameter. If gamma is large, the training data is more relevant, and as a result overfitting can occur. If gamma is small, the training data is less relevant, and underfitting can occur.
