<a href="https://colab.research.google.com/github/scharnk/Linear-Classifiers-in-Python/blob/master/CH04_Support_Vector_Machines.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

What is an SVM?
* Linear classifiers (so far)
* Trained using the hinge loss and L2 regularization
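As a minimal sketch of what "hinge loss and L2 regularization" means numerically, the objective can be written out with NumPy. The toy data, the weights `w` and `b`, and the helper name `svm_objective` are assumptions made up for illustration; the formula is the common C-parameterized form, not code from the course.

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """L2 penalty plus C times the summed hinge loss; y must be in {-1, +1}."""
    raw_scores = X @ w + b
    hinge = np.maximum(0.0, 1.0 - y * raw_scores)
    return 0.5 * np.dot(w, w) + C * np.sum(hinge)

# Toy data (hypothetical): three points, labels in {-1, +1}
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])
b = 0.0

# Per-example hinge loss: zero for points beyond the margin (the "flat part"),
# positive for points inside the margin or misclassified
per_example = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(per_example)
print(svm_objective(w, b, X, y))
```

Only the second point, which sits inside the margin, contributes a non-zero loss here.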

What are support vectors?
* Support vector: a training example not in the flat part of the loss diagram
* Support vector: an example that is incorrectly classified or close to the boundary
* If an example is not a support vector, removing it has no effect on the model
* Kernel SVM prediction cost grows with the number of support vectors, so having few of them makes kernel SVMs fast

# Max-margin viewpoint
* The SVM maximizes the "margin" for linearly separable datasets
* Margin: distance from the boundary to the closest points

* SVM = hinge loss with L2 regularization
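The margin can be checked numerically on a small separable dataset. This is a sketch under assumptions: the four points are made up, and a very large `C` is used to approximate the hard-margin case. For a linear SVM the margin equals 1 / ||w||, which should match the distance from the boundary to the closest training points.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical): two columns of points at x=0 and x=2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates the hard-margin (max-margin) SVM
svm = SVC(kernel="linear", C=1e6)
svm.fit(X, y)

w = svm.coef_[0]
margin = 1.0 / np.linalg.norm(w)

# Distance of every training point to the decision boundary
distances = np.abs(svm.decision_function(X)) / np.linalg.norm(w)

print("margin:", margin)
print("closest distance:", distances.min())
```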

In [0]:
# Train a linear SVM
# (X, y and plot_classifier are provided by the course environment)
from sklearn.svm import SVC

svm = SVC(kernel="linear")
svm.fit(X, y)
plot_classifier(X, y, svm, lims=(11,15,0,6))

# Make a new data set keeping only the support vectors
print("Number of original examples", len(X))
print("Number of support vectors", len(svm.support_))
X_small = X[svm.support_]
y_small = y[svm.support_]

# Train a new SVM using only the support vectors
svm_small = SVC(kernel="linear")
svm_small.fit(X_small, y_small)
plot_classifier(X_small, y_small, svm_small, lims=(11,15,0,6))
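The same experiment can be run without the course's data or plotting helper. The sketch below assumes synthetic data from `make_blobs` as a stand-in, and compares the two models by how often their predictions agree rather than by plotting.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumption: synthetic two-class data stands in for the course dataset
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

# Train on all examples
svm = SVC(kernel="linear").fit(X, y)

# Keep only the support vectors and retrain
X_small, y_small = X[svm.support_], y[svm.support_]
svm_small = SVC(kernel="linear").fit(X_small, y_small)

# Fraction of original examples on which both models predict the same class
agreement = np.mean(svm.predict(X) == svm_small.predict(X))

print("Number of original examples", len(X))
print("Number of support vectors", len(X_small))
print("Prediction agreement", agreement)
```

Agreement should be at or very near 1, up to solver tolerance, since the dropped examples were not support vectors.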

# Transforming features
* transforming features can make points linearly separable
* e.g. transformed feature = (original feature)^2

* **fitting a linear model in a transformed space corresponds to a non-linear model in the original space**
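A small sketch of the squaring transform, using made-up one-dimensional data in which the outer points belong to one class and the inner points to the other. No single threshold separates them in the original feature, but one does after squaring.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data (hypothetical): class 1 on the outside, class 0 in the middle
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = (np.abs(x) > 1.5).astype(int)

X_orig = x.reshape(-1, 1)   # original feature
X_sq = X_orig ** 2          # transformed feature = (original feature)^2

# The same linear model, fit in each space
svm_orig = SVC(kernel="linear", C=1000).fit(X_orig, y)
svm_sq = SVC(kernel="linear", C=1000).fit(X_sq, y)

acc_orig = svm_orig.score(X_orig, y)
acc_sq = svm_sq.score(X_sq, y)
print("accuracy, original feature:", acc_orig)
print("accuracy, squared feature:", acc_sq)
```

The linear boundary in the squared space corresponds to two thresholds in the original space, which is a non-linear rule.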

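# Kernels and SVMs
* kernels implement feature transformations in a computationally efficient way
* **the default kernel in SVC is the radial basis function, or "RBF"**
# Gamma (hyperparameter)
* gamma default in scikit-learn 0.22 and later is "scale", i.e. 1 / (n_features * X.var()), not a fixed value of 1
* gamma controls the smoothness of the RBF decision boundary: smaller gamma gives a smoother boundary
* too large = overfitting
* too small = underfitting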
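The effect of gamma can be seen by comparing training and test accuracy at a few values. This sketch assumes scikit-learn's bundled digits dataset as a stand-in for the course data; the specific gamma values are chosen only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: the digits dataset stands in for the course data
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for gamma in [1e-5, 1e-3, 1.0]:
    svm = SVC(kernel="rbf", gamma=gamma)
    svm.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this gamma
    results[gamma] = (svm.score(X_train, y_train), svm.score(X_test, y_test))
    print(f"gamma={gamma}: train={results[gamma][0]:.3f}, test={results[gamma][1]:.3f}")
```

A very large gamma should fit the training set almost perfectly while doing much worse on the test set, which is the overfitting pattern described above; a very small gamma may underfit.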

In [0]:
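# Instantiate an RBF SVM (X and y are provided by the course environment)
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

svm = SVC()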

# Instantiate the GridSearchCV object and run the search
parameters = {'gamma':[0.00001, 0.0001, 0.001, 0.01, 0.1]}
searcher = GridSearchCV(svm, parameters)
searcher.fit(X,y)

# Report the best parameters
print("Best CV params", searcher.best_params_)

In [0]:
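# Instantiate an RBF SVM
# (X_train, y_train, X_test and y_test are provided by the course environment)
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

svm = SVC()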

# Instantiate the GridSearchCV object and run the search
parameters = {'C':[0.1, 1, 10], 'gamma':[0.00001, 0.0001, 0.001, 0.01, 0.1]}
searcher = GridSearchCV(svm, parameters)
searcher.fit(X_train, y_train)

# Report the best parameters and the corresponding score
print("Best CV params", searcher.best_params_)
print("Best CV accuracy", searcher.best_score_)

# Report the test accuracy using these best parameters
print("Test accuracy of best grid search hypers:", searcher.score(X_test, y_test))

## Pros and Cons (Logistic-Regression vs SVM)

### **Logistic regression:**

* Is a linear classifier
* Can be used with kernels, but slow
* Outputs meaningful probabilities
* Can be extended to multi-class
* All data points affect fit
* L2 or L1 regularization

### **Support vector machine (SVM):**

* Is a linear classifier
* Can be used with kernels, and fast
* Does not naturally output probabilities
* Can be extended to multi-class
* Only "support vectors" affect fit
* Conventionally just L2 regularization
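Two of the contrasts above (probabilities, and which examples affect the fit) can be illustrated briefly. The sketch assumes synthetic data from `make_classification`; it is not taken from the course.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Assumption: synthetic binary data for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

logreg = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear").fit(X, y)

# Logistic regression outputs class probabilities; each row sums to 1
proba = logreg.predict_proba(X[:3])

# By default an SVC exposes signed scores relative to the boundary, not probabilities
scores = svm.decision_function(X[:3])

print(proba)
print(scores)
print("support vectors:", len(svm.support_), "of", len(X), "examples")
```

`SVC(probability=True)` can produce probability estimates, but it does so through an extra calibration step fitted on top of the SVM scores rather than from the model itself.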

<br>

## Logistic regression in sklearn:

* linear_model.LogisticRegression

### Key hyperparameters in sklearn:

* C (inverse regularization strength)
* penalty (type of regularization)
* multi_class (type of multi-class; deprecated since scikit-learn 1.5)
<br>

## SVM in sklearn:

* svm.LinearSVC and svm.SVC

### Key hyperparameters in sklearn:

* C (inverse regularization strength)
* kernel (type of kernel)
* gamma (inverse RBF smoothness)

# SGDClassifier
### SGDClassifier: scales well to large datasets
In [1]: from sklearn.linear_model import SGDClassifier <br>
In [2]: logreg = SGDClassifier(loss='log_loss')  # named 'log' before scikit-learn 1.1 <br>
In [3]: linsvm = SGDClassifier(loss='hinge') <br>
* SGDClassifier hyperparameter alpha is like 1/C
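Since alpha plays the role of 1/C, a larger alpha means stronger regularization and therefore smaller coefficients. The sketch below checks this on synthetic data; the dataset and the alpha values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Assumption: synthetic binary data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

norms = {}
for alpha in [1e-4, 1e-2, 1.0]:
    # loss='hinge' makes this a linear SVM trained by stochastic gradient descent
    clf = SGDClassifier(loss="hinge", alpha=alpha, random_state=0)
    clf.fit(X, y)
    # Size of the learned weight vector under this regularization strength
    norms[alpha] = np.linalg.norm(clf.coef_)
    print(f"alpha={alpha}: ||w|| = {norms[alpha]:.3f}")
```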

In [0]:
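# We set random_state=0 for reproducibility
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

linear_classifier = SGDClassifier(random_state=0)

# Instantiate the GridSearchCV object and run the search
# ('log_loss' was named 'log' before scikit-learn 1.1)
parameters = {'alpha':[0.00001, 0.0001, 0.001, 0.01, 0.1, 1],
             'loss':['hinge','log_loss'], 'penalty':['l1','l2']}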
searcher = GridSearchCV(linear_classifier, parameters, cv=10)
searcher.fit(X_train, y_train)

# Report the best parameters and the corresponding score
print("Best CV params", searcher.best_params_)
print("Best CV accuracy", searcher.best_score_)
print("Test accuracy of best grid search hypers:", searcher.score(X_test, y_test))

### How does this course fit into Data Science?
* Data science
* --> Machine learning
* --> --> Supervised learning
* --> --> --> Classification
* --> --> --> --> Linear classifiers (this course)