# **SVM Kernels**
In practice, SVMs are implemented using kernels. A kernel is a function that measures the similarity of two observations, equivalent to an inner product in some transformed feature space. With the kernel trick, the SVM behaves as if the low-dimensional input had been mapped into a higher-dimensional space, without ever computing that mapping explicitly. Adding dimensions in this way can turn a problem that is not linearly separable into one that is, so kernels are most useful for non-linear separation problems, where they can yield a more accurate classifier.
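As a minimal sketch of this idea, the snippet below uses four XOR-style points that no straight line can separate in two dimensions. The feature map `phi` is an illustrative choice, not something taken from the notebook: it lifts each point into three dimensions, where a single coordinate separates the classes. The last lines check that the squared dot product in the original space equals the ordinary dot product in the lifted space, which is the shortcut the kernel trick relies on.

```python
import math

def phi(p):
    """Illustrative degree-2 feature map for a 2-D point (an assumption for this sketch)."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(u * v for u, v in zip(a, b))

# XOR-style data: label 1 when the two coordinates have opposite signs.
points = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
labels = [0, 1, 1, 0]

# In the lifted space the middle coordinate (sqrt(2) * x1 * x2) is negative
# exactly for the label-1 points, so the plane "middle coordinate = 0" separates them.
predicted = [1 if phi(p)[1] < 0 else 0 for p in points]
print(predicted == labels)

# Kernel trick: (x . z)^2 in the input space equals phi(x) . phi(z) in the lifted space.
x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = dot(x, z) ** 2
lifted_value = dot(phi(x), phi(z))
print(math.isclose(kernel_value, lifted_value))
```

The kernel value is obtained from a single dot product in two dimensions, so the three-dimensional vectors never need to be built when training an SVM.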

- **Linear Kernel** The linear kernel is the ordinary dot product of two observations: the sum of the products of each pair of corresponding input values.

$$ K(x, x_i) = \sum_j x_j \, x_{i,j} $$

- **Polynomial Kernel** The polynomial kernel generalizes the linear kernel and can separate inputs along curved, nonlinear boundaries.

$$ K(x, x_i) = \left(1 + \sum_j x_j \, x_{i,j}\right)^d $$
Here $d$ is the degree of the polynomial. With $d = 1$ the kernel reduces to the linear kernel plus a constant. The degree is a hyperparameter that must be specified before training.

- **Radial Basis Function Kernel** The radial basis function (RBF) kernel is a popular choice for support vector machine classification. It corresponds to mapping the input space into an infinite-dimensional space.

$$ K(x, x_i) = \exp\left(-\gamma \sum_j (x_j - x_{i,j})^2\right) $$
Here $\gamma$ is a positive parameter; values between 0 and 1 are commonly tried. A higher value of gamma lets the model fit the training set more closely, which can lead to over-fitting. $\gamma = 0.1$ is often suggested as a reasonable starting value. Like the degree of the polynomial kernel, gamma is a hyperparameter that is set before training rather than learned.
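The three kernels above can be sketched in a few lines of plain Python. The function names are hypothetical, and the polynomial kernel is assumed to take the common form $(1 + x \cdot x_i)^d$. The final lines apply the RBF kernel to the first two rows of the dataset loaded later in this notebook, to show how quickly the kernel value shrinks when the features are on their raw scale.

```python
import math

def linear_kernel(x, xi):
    """Dot product: sum of the products of corresponding components."""
    return sum(a * b for a, b in zip(x, xi))

def polynomial_kernel(x, xi, d):
    """Polynomial kernel, assumed here in the form (1 + x . xi)^d."""
    return (1 + linear_kernel(x, xi)) ** d

def rbf_kernel(x, xi, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)

x, xi = [1.0, 2.0], [3.0, 4.0]
lin = linear_kernel(x, xi)           # 1*3 + 2*4 = 11
poly = polynomial_kernel(x, xi, 2)   # (1 + 11)^2 = 144
same = rbf_kernel(x, x, 0.1)         # distance 0, so the kernel is exactly 1
print(lin, poly, same)

# First two rows of the dataset (feature values copied from the preview table).
row0 = [140.562500, 55.683782, -0.234571, -0.699648, 3.199833, 19.110426, 7.975532, 74.242225]
row1 = [102.507812, 58.882430, 0.465318, -0.515088, 1.677258, 14.860146, 10.576487, 127.393580]
near_zero = rbf_kernel(row0, row1, 0.1)
print(near_zero)
```

With a squared distance in the thousands, even `gamma = 0.1` makes these two observations look almost completely dissimilar, which is one way to see why gamma and the scale of the features interact so strongly.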

In [17]:
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix

In [2]:
df = pd.read_csv("q5.csv")
df

Unnamed: 0,Mean of the integrated profile,Standard deviation of the integrated profile,Excess kurtosis of the integrated profile,Skewness of the integrated profile,Mean of the DM-SNR curve,Standard deviation of the DM-SNR curve,Excess kurtosis of the DM-SNR curve,Skewness of the DM-SNR curve,target_class
0,140.562500,55.683782,-0.234571,-0.699648,3.199833,19.110426,7.975532,74.242225,0
1,102.507812,58.882430,0.465318,-0.515088,1.677258,14.860146,10.576487,127.393580,0
2,103.015625,39.341649,0.323328,1.051164,3.121237,21.744669,7.735822,63.171909,0
3,136.750000,57.178449,-0.068415,-0.636238,3.642977,20.959280,6.896499,53.593661,0
4,88.726562,40.672225,0.600866,1.123492,1.178930,11.468720,14.269573,252.567306,0
...,...,...,...,...,...,...,...,...,...
17893,136.429688,59.847421,-0.187846,-0.738123,1.296823,12.166062,15.450260,285.931022,0
17894,122.554688,49.485605,0.127978,0.323061,16.409699,44.626893,2.945244,8.297092,0
17895,119.335938,59.935939,0.159363,-0.743025,21.430602,58.872000,2.499517,4.595173,0
17896,114.507812,53.902400,0.201161,-0.024789,1.946488,13.381731,10.007967,134.238910,0


In [3]:
X = df.loc[:, df.columns != 'target_class']
X

Unnamed: 0,Mean of the integrated profile,Standard deviation of the integrated profile,Excess kurtosis of the integrated profile,Skewness of the integrated profile,Mean of the DM-SNR curve,Standard deviation of the DM-SNR curve,Excess kurtosis of the DM-SNR curve,Skewness of the DM-SNR curve
0,140.562500,55.683782,-0.234571,-0.699648,3.199833,19.110426,7.975532,74.242225
1,102.507812,58.882430,0.465318,-0.515088,1.677258,14.860146,10.576487,127.393580
2,103.015625,39.341649,0.323328,1.051164,3.121237,21.744669,7.735822,63.171909
3,136.750000,57.178449,-0.068415,-0.636238,3.642977,20.959280,6.896499,53.593661
4,88.726562,40.672225,0.600866,1.123492,1.178930,11.468720,14.269573,252.567306
...,...,...,...,...,...,...,...,...
17893,136.429688,59.847421,-0.187846,-0.738123,1.296823,12.166062,15.450260,285.931022
17894,122.554688,49.485605,0.127978,0.323061,16.409699,44.626893,2.945244,8.297092
17895,119.335938,59.935939,0.159363,-0.743025,21.430602,58.872000,2.499517,4.595173
17896,114.507812,53.902400,0.201161,-0.024789,1.946488,13.381731,10.007967,134.238910


In [4]:
y = df["target_class"]
y

0        0
1        0
2        0
3        0
4        0
        ..
17893    0
17894    0
17895    0
17896    0
17897    0
Name: target_class, Length: 17898, dtype: int64

### b)

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

In [6]:
X_train.shape

(11991, 8)

In [7]:
X_test.shape

(5907, 8)
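The two shapes above can be checked by hand. The sketch below assumes that a fractional `test_size` is converted to a row count by rounding up and that the training set receives the remaining rows; the variable names are illustrative.

```python
import math

n_samples = 17898   # number of rows, taken from the length of y above
test_size = 0.33

# Assumption: the test count is rounded up, and the training set takes the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)
```

Under that assumption the counts agree with `X_train.shape` and `X_test.shape`.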

In [8]:
tests = {
    "rbf": [1, 100, 1000],
    "linear":[1, 100, 1000],
    "poly": [1, 100],
    "sigmoid": [1, 100],
}

In [9]:
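# Collect one row per (kernel, C) pair; a separate name avoids overwriting the dataset held in df
results = pd.DataFrame(columns=['kernel', 'C', 'accuracy'])
i = 0
for kernel in tqdm(tests):
  for c in tqdm(tests[kernel]):
    clf = SVC(kernel=kernel, C=c)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)  # accuracy on the held-out test set
    results.loc[i] = [kernel, c, acc]
    i = i + 1
results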

Unnamed: 0,kernel,C,accuracy
0,rbf,1,0.972575
1,rbf,100,0.977992
2,rbf,1000,0.9785
3,linear,1,0.978669
4,linear,100,0.978331
5,linear,1000,0.977992
6,poly,1,0.972236
7,poly,100,0.976299
8,sigmoid,1,0.925681
9,sigmoid,100,0.921788
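To read the table at a glance, the following sketch picks the best run for each kernel. The rows are copied from the table above; the variable names are illustrative, and plain Python is used so the snippet runs without the notebook's state.

```python
# (kernel, C, accuracy) rows copied from the results table above.
rows = [
    ("rbf", 1, 0.972575), ("rbf", 100, 0.977992), ("rbf", 1000, 0.9785),
    ("linear", 1, 0.978669), ("linear", 100, 0.978331), ("linear", 1000, 0.977992),
    ("poly", 1, 0.972236), ("poly", 100, 0.976299),
    ("sigmoid", 1, 0.925681), ("sigmoid", 100, 0.921788),
]

# Keep the highest-accuracy (C, accuracy) pair seen for each kernel.
best_per_kernel = {}
for kernel, c, acc in rows:
    if kernel not in best_per_kernel or acc > best_per_kernel[kernel][1]:
        best_per_kernel[kernel] = (c, acc)

best_kernel = max(best_per_kernel, key=lambda k: best_per_kernel[k][1])
print(best_per_kernel)
print(best_kernel)
```

The gap between the best linear run and the best RBF run is about 0.00017, roughly one sample out of the 5907 in the test set, so the ranking among the top three kernels should be read with caution. The sigmoid kernel is the only one that is clearly worse.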


### c)

In [10]:
params = [
  {'C': [1, 10, 100, 500], 'gamma':[0.1, 0.3, 0.5, 0.7, 0.9], 'kernel': ['rbf']},
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'degree':[2,3,4], 'gamma':[0.01, 0.03, 0.05], 'kernel':['poly'], 'max_iter':[10000]}
]
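Before launching a grid search it helps to know how many models it will train. The sketch below counts the candidates in the grid above by multiplying the number of options within each dictionary and summing across dictionaries; `cv_folds = 2` is the fold count used for the search in this notebook.

```python
from math import prod

params = [
    {'C': [1, 10, 100, 500], 'gamma': [0.1, 0.3, 0.5, 0.7, 0.9], 'kernel': ['rbf']},
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'degree': [2, 3, 4], 'gamma': [0.01, 0.03, 0.05], 'kernel': ['poly'], 'max_iter': [10000]},
]

# Each dictionary is a full Cartesian product of its own value lists.
per_grid = [prod(len(values) for values in grid.values()) for grid in params]
n_candidates = sum(per_grid)

cv_folds = 2
n_fits = n_candidates * cv_folds
print(per_grid, n_candidates, n_fits)
```

The RBF grid contributes 20 candidates, the linear grid 4 and the polynomial grid 9.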

In [12]:
# An initial run suggested the number of CV folds had little effect on the scores, so cv=2 is used to keep the search fast
clf = GridSearchCV(estimator=SVC(), param_grid=params, cv=2, verbose=5, scoring='accuracy')
%time clf.fit(X_train, y_train)
clf.cv_results_

Fitting 2 folds for each of 33 candidates, totalling 66 fits
[CV 1/2] END ........C=1, gamma=0.1, kernel=rbf;, score=0.908 total time=   8.3s
[CV 2/2] END ........C=1, gamma=0.1, kernel=rbf;, score=0.908 total time=   7.9s
[CV 1/2] END ........C=1, gamma=0.3, kernel=rbf;, score=0.907 total time=   8.6s
[CV 2/2] END ........C=1, gamma=0.3, kernel=rbf;, score=0.907 total time=   8.7s
[CV 1/2] END ........C=1, gamma=0.5, kernel=rbf;, score=0.907 total time=   8.9s
[CV 2/2] END ........C=1, gamma=0.5, kernel=rbf;, score=0.907 total time=   8.9s
[CV 1/2] END ........C=1, gamma=0.7, kernel=rbf;, score=0.907 total time=   8.9s
[CV 2/2] END ........C=1, gamma=0.7, kernel=rbf;, score=0.907 total time=   9.0s
[CV 1/2] END ........C=1, gamma=0.9, kernel=rbf;, score=0.907 total time=   9.1s
[CV 2/2] END ........C=1, gamma=0.9, kernel=rbf;, score=0.907 total time=   9.0s
[CV 1/2] END .......C=10, gamma=0.1, kernel=rbf;, score=0.908 total time=   8.0s
[CV 2/2] END .......C=10, gamma=0.1, kernel=rbf;



[CV 2/2] END degree=2, gamma=0.01, kernel=poly, max_iter=1000;, score=0.181 total time=   0.0s
[CV 1/2] END degree=2, gamma=0.03, kernel=poly, max_iter=1000;, score=0.490 total time=   0.0s
[CV 2/2] END degree=2, gamma=0.03, kernel=poly, max_iter=1000;, score=0.220 total time=   0.0s
[CV 1/2] END degree=2, gamma=0.05, kernel=poly, max_iter=1000;, score=0.170 total time=   0.0s
[CV 2/2] END degree=2, gamma=0.05, kernel=poly, max_iter=1000;, score=0.321 total time=   0.0s
[CV 1/2] END degree=3, gamma=0.01, kernel=poly, max_iter=1000;, score=0.372 total time=   0.0s
[CV 2/2] END degree=3, gamma=0.01, kernel=poly, max_iter=1000;, score=0.722 total time=   0.0s
[CV 1/2] END degree=3, gamma=0.03, kernel=poly, max_iter=1000;, score=0.165 total time=   0.0s
[CV 2/2] END degree=3, gamma=0.03, kernel=poly, max_iter=1000;, score=0.157 total time=   0.0s
[CV 1/2] END degree=3, gamma=0.05, kernel=poly, max_iter=1000;, score=0.154 total time=   0.0s
[CV 2/2] END degree=3, gamma=0.05, kernel=poly, max_iter=1000;, score=0.258 total time=   0.0s
[CV 1/2] END degree=4, gamma=0.01, kernel=poly, max_iter=1000;, score=0.182 total time=   0.0s
[CV 2/2] END degree=4, gamma=0.01, kernel=poly, max_iter=1000;, score=0.895 total time=   0.0s
[CV 1/2] END degree=4, gamma=0.03, kernel=poly, max_iter=1000;, score=0.166 total time=   0.0s
[CV 2/2] END degree=4, gamma=0.03, kernel=poly, max_iter=1000;, score=0.169 total time=   0.0s
[CV 1/2] END degree=4, gamma=0.05, kernel=poly, max_iter=1000;, score=0.116 total time=   0.0s
[CV 2/2] END degree=4, gamma=0.05, kernel=poly, max_iter=1000;, score=0.895 total time=   0.0s
Wall time: 22min 45s


{'mean_fit_time': array([3.40395987e+00, 3.30347121e+00, 3.41013670e+00, 3.46202064e+00,
        3.41897035e+00, 3.50600433e+00, 3.36752403e+00, 3.57396376e+00,
        3.60250533e+00, 3.67949820e+00, 3.52602315e+00, 3.68094587e+00,
        3.64252234e+00, 4.05904627e+00, 4.58667302e+00, 4.36824787e+00,
        4.13304758e+00, 4.58588278e+00, 4.46968627e+00, 3.85540378e+00,
        6.18312573e+00, 4.12070742e+01, 1.39522470e+02, 2.49472267e+02,
        6.89646006e-02, 7.04970360e-02, 7.15180635e-02, 6.55297041e-02,
        6.44949675e-02, 6.25166893e-02, 5.74972630e-02, 5.79980612e-02,
        6.00305796e-02]),
 'std_fit_time': array([7.89946318e-02, 6.53183460e-03, 5.18417358e-03, 1.01470947e-03,
        9.29706097e-02, 4.20000553e-02, 1.44802332e-02, 1.01053715e-03,
        3.05033922e-02, 5.94999790e-02, 2.49757767e-02, 1.77989006e-01,
        4.75192070e-02, 7.09087849e-02, 3.47787142e-01, 3.29125762e-01,
        1.71364069e-01, 3.79650712e-01, 1.17917538e-01, 2.09363103e-01,
     

In [13]:
clf.best_params_

{'C': 10, 'kernel': 'linear'}

### d)

In [15]:
clf = SVC(kernel='linear', C=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
acc

0.97866937531742

In [18]:
confusion_matrix(y_test, y_pred)

array([[5355,   26],
       [ 100,  426]], dtype=int64)

In [19]:
clf = SVC(kernel='linear', C=10)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
acc

0.9781615033011681

In [20]:
confusion_matrix(y_test, y_pred)

array([[5352,   29],
       [ 100,  426]], dtype=int64)
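Accuracy alone hides how the errors are distributed between the two classes. The sketch below derives accuracy, precision and recall for the positive class from the two confusion matrices above, assuming the usual scikit-learn layout in which rows are true labels and columns are predicted labels. The helper `summarize` is a hypothetical name introduced for this example.

```python
def summarize(cm):
    """Return accuracy, precision and recall for class 1 from a 2x2 confusion matrix.

    Assumes rows are true labels and columns are predicted labels,
    i.e. [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

cm_c1 = [[5355, 26], [100, 426]]    # linear kernel, C=1
cm_c10 = [[5352, 29], [100, 426]]   # linear kernel, C=10

metrics_c1 = summarize(cm_c1)
metrics_c10 = summarize(cm_c10)
print(metrics_c1)
print(metrics_c10)
```

Both models miss the same number of positive examples (100 out of 526, a recall of about 0.81); they differ only by three additional false positives at C=10. On this test split C=1 is therefore marginally ahead, even though the grid search selected C=10 from cross-validation on the training data.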