In [4]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


## CHAPTER 5
# **SUPPORT VECTOR MACHINES**

A Support Vector Machine (SVM) is a very powerful and versatile Machine Learning
model, capable of performing linear or nonlinear classification, regression, and even
outlier detection. It is one of the most popular models in Machine Learning, and anyone
interested in Machine Learning should have it in their toolbox.

# **Synopsis:**
### 1. Linear SVM Classification
### 2. Nonlinear SVM Classification
### 3. SVM Regression

# 1) 🛳️ Linear SVM Classification

### Soft Margin Classification
The objective is to find a good balance between keeping the street as large as possible
and limiting the margin violations (i.e., instances that end up in the middle of the
street or even on the wrong side). This is called soft margin classification.<br>
In Scikit-Learn’s SVM classes, you can control this balance using the C hyperparameter:
a smaller C value leads to a wider street but more margin violations. Figure 5-4
shows the decision boundaries and margins of two soft margin SVM classifiers on a
dataset that is not linearly separable. On the left, using a low C value, the margin is
quite large, but many instances end up on the street. On the right, using a high C value,
the classifier makes fewer margin violations but ends up with a smaller margin. However,
it seems likely that the first classifier will generalize better: in fact, even on this
training set it makes fewer prediction errors, since most of the margin violations are
actually on the correct side of the decision boundary.
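
The trade-off described above can be checked numerically. The following is a minimal sketch, not the code behind Figure 5-4: it assumes the iris petal features used later in this notebook, uses SVC(kernel="linear") only because it exposes an exact solution on this small dataset, and the helper name `margin_report` and the two C values are illustrative choices.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_demo = iris["data"][:, (2, 3)]             # petal length, petal width
y_demo = (iris["target"] == 2).astype(int)   # 1 = Iris-Virginica, 0 = other

def margin_report(C):
    """Fit a linear soft margin SVM; return (street width, margin violations)."""
    model = Pipeline([
        ("scaler", StandardScaler()),
        ("svc", SVC(kernel="linear", C=C)),
    ])
    model.fit(X_demo, y_demo)
    w = model.named_steps["svc"].coef_.ravel()
    width = 2 / np.linalg.norm(w)            # street width in scaled feature space
    # Signed distance-like score: >= 1 means outside the street on the correct side
    signed = (2 * y_demo - 1) * model.decision_function(X_demo)
    violations = int(np.sum(signed < 1 - 1e-2))  # small tolerance for solver precision
    return width, violations

width_low, viol_low = margin_report(C=0.01)
width_high, viol_high = margin_report(C=100)
print(f"C=0.01: width={width_low:.2f}, violations={viol_low}")
print(f"C=100:  width={width_high:.2f}, violations={viol_high}")
```

The low C model should report the wider street and the larger number of instances on or inside it.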


### Iris Data Model
The following Scikit-Learn code loads the iris dataset, scales the features, and then
trains a linear SVM model (using the LinearSVC class with C = 1 and the hinge loss
function, described shortly) to detect Iris-Virginica flowers.

In [325]:
# Setting up Libraries and Data

from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # 1.0 if Iris-Virginica, else 0.0


In [326]:
# Plot the true labels: Iris-Virginica in red, all other flowers in blue
plt.scatter(X[y == 1, 0], X[y == 1, 1], c="r")
plt.scatter(X[y == 0, 0], X[y == 0, 1], c="b")
plt.title("Actual")
plt.show()

In [327]:
# Using LinearSVC inside a pipeline that scales the features first
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(X, y)

In [330]:
pred_linear_svc = svm_clf.predict(X)

# Plot the predicted labels: Iris-Virginica in red, all other flowers in blue
is_virginica = pred_linear_svc == 1
plt.scatter(X[is_virginica, 0], X[is_virginica, 1], c="r", alpha=0.7)
plt.scatter(X[~is_virginica, 0], X[~is_virginica, 1], c="b", alpha=0.7)
plt.title("Predicted")
plt.show()
# Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.
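
To illustrate the last comment, here is a small sketch of what a LinearSVC does offer in place of probabilities: a class label from predict() and a signed score from decision_function(). The flower measurements in `new_flower` are a made-up example, and `random_state` and `max_iter` are added here only for reproducibility.

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X_iris = iris["data"][:, (2, 3)]                        # petal length, petal width
y_iris = (iris["target"] == 2).astype(np.float64)       # 1.0 = Iris-Virginica

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=42, max_iter=10000)),
])
clf.fit(X_iris, y_iris)

new_flower = [[5.5, 1.7]]                  # hypothetical petal length and width (cm)
label = clf.predict(new_flower)            # predicted class, 0.0 or 1.0
score = clf.decision_function(new_flower)  # signed score; positive means class 1.0
has_proba = hasattr(clf.named_steps["linear_svc"], "predict_proba")

print("label:", label, "score:", score, "predict_proba available:", has_proba)
```

The sign of the score determines the predicted class, and its magnitude grows with the distance from the decision boundary, but it is not a probability.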

### Other Ways to Implement a Linear SVM Classifier
👉 Alternatively, you could use the SVC class, using SVC(kernel="linear", C=1), but it
is much slower, especially with large training sets, so it is not recommended.<br><br>
👉 Another option is to use the SGDClassifier class, with SGDClassifier(loss="hinge",
alpha=1/(m*C)), where m is the number of training instances. This applies regular
Stochastic Gradient Descent (see Chapter 4) to train a linear SVM classifier. It does
not converge as fast as the LinearSVC class, but it can be useful to handle huge
datasets that do not fit in memory (out-of-core training), or to handle online
classification tasks.
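
As a rough sketch of the three options side by side, the snippet below trains each one on the same scaled iris petal features and reports training accuracy. The `random_state` values, `max_iter`, and the dictionary layout are assumptions made for reproducibility, not part of the original text.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = datasets.load_iris()
X_raw = iris["data"][:, (2, 3)]                 # petal length, petal width
y_bin = (iris["target"] == 2).astype(int)       # 1 = Iris-Virginica
X_scaled = StandardScaler().fit_transform(X_raw)

m = len(X_scaled)   # number of training instances
C = 1

models = {
    "LinearSVC": LinearSVC(C=C, loss="hinge", random_state=42, max_iter=10000),
    "SVC(kernel='linear')": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=1 / (m * C), random_state=42),
}

# Fit each model and record its accuracy on the training set
accuracies = {}
for name, model in models.items():
    model.fit(X_scaled, y_bin)
    accuracies[name] = model.score(X_scaled, y_bin)
    print(f"{name}: training accuracy = {accuracies[name]:.3f}")
```

On a dataset this small the speed difference is not visible; the point is only that all three expose the same fit/predict interface.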

# 2) 🥴 Nonlinear SVM Classification

## a) Using Pipeline to add more features (polynomial)

In [71]:
from sklearn.datasets import make_moons
from sklearn.preprocessing import PolynomialFeatures

polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, loss="hinge")),
])

# make_moons returns an (X, y) tuple, so unpack it directly
X_moons, y_moons = make_moons()

# Add Gaussian noise (standard deviation 0.1) to both coordinates
X_moons += np.random.randn(*X_moons.shape) / 10

polynomial_svm_clf.fit(X_moons, y_moons)

# Note: on the noise-free data the model failed to converge at first
# (the plot below shows why); after adding noise the data is fitted,
# although slowly.

In [79]:
predicted = polynomial_svm_clf.predict(X_moons)

# Predicted class 1 in red, class 0 in blue
plt.scatter(X_moons[predicted == 1, 0], X_moons[predicted == 1, 1], c="r")
plt.scatter(X_moons[predicted == 0, 0], X_moons[predicted == 0, 1], c="b")
plt.title("Predicted")
plt.show()


## b) Polynomial Kernel
When using SVMs you can apply an almost miraculous mathematical
technique called the kernel trick. It makes it possible to get the same result as if
you had added many polynomial features, even with very high-degree
polynomials, without actually having to add them.
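
A tiny numerical sketch of the idea, assuming the simplest case of a degree-2 polynomial kernel on 2D vectors: the kernel value (a · b)² computed in the original space equals the dot product of explicitly expanded feature vectors. The helper `phi` and the vectors `a` and `b` are illustrative, not part of Scikit-Learn.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D vector (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = phi(a) @ phi(b)     # dot product after adding the polynomial features
kernel = (a @ b) ** 2          # same quantity computed without expanding anything

print("explicit features:", explicit)
print("kernel shortcut:  ", kernel)
```

Both values agree, so an algorithm that only needs dot products can work with the kernel and never build the expanded features.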

In [124]:
# applying polynomial kernel using sklearn

from sklearn.svm import SVC
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(X_moons, y_moons)

# The hyperparameter coef0 controls how much the model is
# influenced by high-degree polynomials versus low-degree polynomials.

In [125]:
# Predict on 500 random points to visualize the decision regions
all_data = np.random.randn(500, 2)
predicted = poly_kernel_svm_clf.predict(all_data)

# Predicted class 1 in red, class 0 in blue
plt.scatter(all_data[predicted == 1, 0], all_data[predicted == 1, 1], c="r", s=10)
plt.scatter(all_data[predicted == 0, 0], all_data[predicted == 0, 1], c="b", s=10)
plt.title("Predicted")
plt.show()

## c) Adding Similarity Features
Another technique to tackle nonlinear problems is to add features computed using a
similarity function that measures how much each instance resembles a particular
landmark. For example, let’s take the one-dimensional dataset discussed earlier and
add two landmarks to it at x1 = –2 and x1 = 1 (see the left plot in Figure 5-8). Next,
let’s define the similarity function to be the Gaussian Radial Basis Function (RBF)
with γ = 0.3:<br>
φ<sub>γ</sub>(x, ℓ) = exp(–γ ‖x – ℓ‖<sup>2</sup>)<br>
It is a bell-shaped function varying from 0 (very far away from the landmark) to 1 (at
the landmark). Now we are ready to compute the new features. For example, let’s look
at the instance x1 = –1: it is located at a distance of 1 from the first landmark, and 2
from the second landmark. Therefore its new features are x2 = exp(–0.3 × 1<sup>2</sup>) ≈ 0.74
and x3 = exp(–0.3 × 2<sup>2</sup>) ≈ 0.30. The plot on the right of Figure 5-8 shows the transformed
dataset (dropping the original features). As you can see, it is now linearly
separable.
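
The worked example above can be reproduced in a few lines. This is a minimal sketch; the function name `gaussian_rbf` is an illustrative choice.

```python
import numpy as np

def gaussian_rbf(x, landmark, gamma):
    """Gaussian RBF similarity between an instance and a landmark."""
    return np.exp(-gamma * np.linalg.norm(x - landmark) ** 2)

gamma = 0.3
instance = np.array([-1.0])            # the instance x1 = -1
landmark_1 = np.array([-2.0])          # first landmark, at distance 1
landmark_2 = np.array([1.0])           # second landmark, at distance 2

x2 = gaussian_rbf(instance, landmark_1, gamma)
x3 = gaussian_rbf(instance, landmark_2, gamma)
print(f"x2 = {x2:.2f}, x3 = {x3:.2f}")   # x2 = 0.74, x3 = 0.30
```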

#### Gaussian RBF Kernel
Just like the polynomial features method, the similarity features method can be useful
with any Machine Learning algorithm, but it may be computationally expensive to
compute all the additional features, especially on large training sets. However, once
again the kernel trick does its SVM magic: it makes it possible to obtain a similar
result as if you had added many similarity features, without actually having to add
them. Let’s try the Gaussian RBF kernel using the SVC class:

In [132]:
# making pipelines
rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001))
])
rbf_kernel_svm_clf.fit(X_moons, y_moons)

In [133]:
# Predicting and plotting

predicted = rbf_kernel_svm_clf.predict(X_moons)

# Predicted class 1 in red, class 0 in blue
plt.scatter(X_moons[predicted == 1, 0], X_moons[predicted == 1, 1], c="r")
plt.scatter(X_moons[predicted == 0, 0], X_moons[predicted == 0, 1], c="b")
plt.title("Predicted")
plt.show()


### 📌 TIP
> Which kernel should you use? As a rule of thumb, you should always try the linear
kernel first (remember that LinearSVC is much faster than SVC(kernel="linear")),
especially if the training set is very large or if it
has plenty of features. If the training set is not too large, you should
try the Gaussian RBF kernel as well; it works well in most cases.
Then if you have spare time and computing power, you can also
experiment with a few other kernels using cross-validation and grid
search, especially if there are kernels specialized for your training
set’s data structure.
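
One possible way to follow this tip is a cross-validated grid search over kernels. The sketch below is only an illustration: the moons dataset, the parameter grid, and the 5-fold setting are assumptions chosen to keep the search small.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = make_moons(n_samples=200, noise=0.1, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC()),
])

# One sub-grid per kernel, so each kernel is only paired with its own hyperparameters
param_grid = [
    {"svm_clf__kernel": ["linear"], "svm_clf__C": [0.1, 1, 10]},
    {"svm_clf__kernel": ["rbf"], "svm_clf__C": [0.1, 1, 10],
     "svm_clf__gamma": [0.1, 1, 5]},
    {"svm_clf__kernel": ["poly"], "svm_clf__degree": [3],
     "svm_clf__coef0": [1], "svm_clf__C": [0.1, 1, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_demo, y_demo)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```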

## Computational Complexity ⌛
<table>
    <tr><th>Class</th><th>Time complexity</th><th>Out-of-core support</th><th>Scaling required</th><th>Kernel trick</th></tr>
    <tr><td>LinearSVC</td><td>O(m × n)</td><td>No</td><td>Yes</td><td>No</td></tr>
    <tr><td>SGDClassifier</td><td>O(m × n)</td><td>Yes</td><td>Yes</td><td>No</td></tr>
    <tr><td>SVC</td><td>O(m² × n) to O(m³ × n)</td><td>No</td><td>Yes</td><td>Yes</td></tr>
</table>

# 3) 💹 SVM Regression
The trick is to reverse the objective: instead of trying to fit the largest possible
street between two classes while limiting margin violations, SVM Regression
tries to fit as many instances as possible on the street while limiting margin violations
(i.e., instances off the street). The width of the street is controlled by a hyperparameter
ϵ. Figure 5-10 shows two linear SVM Regression models trained on some random
linear data, one with a large margin (ϵ = 1.5) and the other with a small margin (ϵ =
0.5).
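
The role of ϵ can be sketched by counting how many training instances fall off the street for the two values mentioned above. This is not the code behind Figure 5-10: the synthetic linear data, the helper `off_street_count`, and the `random_state` and `max_iter` settings are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(42)
m_demo = 200
X_lin = 2 * rng.random((m_demo, 1))                               # one feature in [0, 2)
y_lin = 4 + 3 * X_lin.ravel() + rng.standard_normal(m_demo)       # linear trend plus noise

def off_street_count(epsilon):
    """Train a LinearSVR and count instances farther than epsilon from its prediction."""
    reg = LinearSVR(epsilon=epsilon, random_state=42, max_iter=10000)
    reg.fit(X_lin, y_lin)
    residuals = np.abs(y_lin - reg.predict(X_lin))
    return int(np.sum(residuals > epsilon))

off_wide = off_street_count(1.5)     # large margin
off_narrow = off_street_count(0.5)   # small margin
print(f"epsilon=1.5: {off_wide} instances off the street")
print(f"epsilon=0.5: {off_narrow} instances off the street")
```

With the wider street, far fewer instances count as margin violations, so fewer of them influence the fitted line.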

In [302]:
# Create noisy quadratic data to fit the regression models on
m = 200
X_t = 6 * np.random.randn(m, 1)
y_t = 0.5 * X_t**2 + X_t + 2 + np.random.randn(m, 1) * 10

In [311]:
plt.scatter(X_t, y_t, alpha=0.8)

### Using LinearSVR

In [304]:
from sklearn.svm import LinearSVR
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X_t, y_t.ravel())

In [309]:
prediction_svm = svm_reg.predict(X_t)
plt.scatter(X_t, prediction_svm)
plt.scatter(X_t, y_t, c="orange", alpha=0.4)

### Using SVR

In [306]:
from sklearn.svm import SVR
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=1)
svm_poly_reg.fit(X_t, y_t.ravel())


In [310]:
pred_svr = svm_poly_reg.predict(X_t)
plt.scatter(X_t, y_t, c="orange", alpha=0.7)
plt.scatter(X_t, pred_svr)


# 🗺️Completed !! 🚵‍ 🎢
# 🤪🥴