# SVM - Support Vector Machines
---
SVMs are frequently used to solve classification problems.
### How does it work?
The SVM algorithm looks for the hyperplane that separates the two classes with the largest possible margin, i.e. the largest distance between the hyperplane and the nearest points of each class.

***Note that the points that end up on the margin are known as support vectors.***
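With the usual SVM normalization, `w` and `b` are scaled so that the closest points satisfy $y_i(w \cdot x_i + b) = 1$, and the margin then has width $2 / \lVert w \rVert$. The minimal NumPy sketch below checks this for a hand-picked hyperplane on four hypothetical points (nothing here is fitted; `w`, `b`, `X_toy` and `y_toy` are illustrative assumptions):

```python
import numpy as np

# Hypothetical separating hyperplane w.x + b = 0 in 2-D (chosen by hand, not fitted)
w = np.array([1.0, 1.0])
b = -3.0

# Hypothetical toy points with labels -1 / +1
X_toy = np.array([[1.0, 1.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
y_toy = np.array([-1, -1, 1, 1])

# Functional margin y_i * (w.x_i + b): exactly 1 means the point lies on the margin
functional = y_toy * (X_toy @ w + b)
print(functional)

# Width of the band between the hyperplanes w.x + b = +1 and w.x + b = -1
margin_width = 2 / np.linalg.norm(w)
print(round(margin_width, 4))

# Points sitting exactly on the margin are the support vectors
support_vectors = X_toy[np.isclose(functional, 1.0)]
print(support_vectors)
```

Only the two points with functional margin 1, `[1, 1]` and `[2, 2]`, end up as support vectors; moving either of the other points further away would not change the hyperplane.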

### Soft-Margin
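Sometimes the two classes cannot be separated perfectly. In such cases a soft margin is used: points are allowed to fall inside the margin or even on the wrong side of the hyperplane. This is where the slack variables $\xi_i$ come in: each point gets a slack $\xi_i \ge 0$ that measures how far it violates the margin, and the regularization parameter C (the `C` argument of `SVC` in the code below) controls how heavily these violations are penalized.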
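A small NumPy sketch of the slack values and the primal soft-margin objective $\frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i$, evaluated for a hand-picked hyperplane on hypothetical points (this only evaluates the objective; it does not solve for the optimal `w`, `b`):

```python
import numpy as np

# Hypothetical hyperplane w.x + b = 0 (chosen by hand, not fitted) and toy labelled points
w = np.array([1.0, 1.0])
b = -3.0
X_toy = np.array([[0.0, 1.0], [2.0, 0.5], [2.0, 2.0], [1.0, 1.0]])
y_toy = np.array([-1, -1, 1, 1])

# Slack xi_i = max(0, 1 - y_i (w.x_i + b)):
# 0 on or outside the margin, between 0 and 1 inside the margin, > 1 when misclassified
xi = np.maximum(0.0, 1.0 - y_toy * (X_toy @ w + b))
print(xi)

def soft_margin_objective(w, xi, C):
    """Primal soft-margin objective: 0.5 * ||w||^2 + C * sum(xi)."""
    return 0.5 * np.dot(w, w) + C * xi.sum()

# A larger C makes the same margin violations much more expensive
for C in (1.0, 10.0):
    print(C, soft_margin_objective(w, xi, C))
```

Here the second point sits inside the margin ($\xi = 0.5$) and the last one is misclassified ($\xi = 2$). A large C pushes the optimizer towards fewer violations at the cost of a narrower margin; a small C tolerates more violations in exchange for a wider one.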

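### Kernel trick
A kernel is a function that returns the inner product of two points after they have been mapped into a higher-dimensional feature space, without ever computing that mapping explicitly. Classes that no hyperplane can separate in the original space may become linearly separable in the higher-dimensional space, so the SVM still finds a linear boundary there, which corresponds to a non-linear boundary in the original space.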
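To make the kernel trick concrete, here is a minimal NumPy sketch. It uses a degree-2 polynomial kernel rather than the RBF kernel described next, because its feature map $\phi$ has only three dimensions and can be written out by hand; `phi` and `poly_kernel` are illustrative names, not library functions.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both routes give the same value, but the kernel never builds the 3-D vectors
explicit = np.dot(phi(x), phi(z))
via_kernel = poly_kernel(x, z)
print(explicit, via_kernel)

# XOR-style data: no straight line separates the two classes in 2-D ...
X_xor = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
labels = np.array([1, 1, -1, -1])
# ... but in the mapped space the sign of the middle feature sqrt(2) x1 x2 separates them
mapped = np.array([phi(p) for p in X_xor])
print(np.sign(mapped[:, 1]) == labels)
```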

#### RBF (Radial Basis Function) kernel
RBF is the default kernel of sklearn's `SVC` classifier.

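$K(x, x') = e^{-\gamma \lVert x-x' \rVert^2}$

where $\gamma > 0$ controls how quickly the similarity decays as the squared Euclidean distance $\lVert x-x' \rVert^2$ grows.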
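A direct NumPy translation of the formula, as a sketch (the function name `rbf` is only for illustration; scikit-learn's own implementation is `sklearn.metrics.pairwise.rbf_kernel`):

```python
import numpy as np

def rbf(x, x_prime, gamma):
    """RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

origin = np.array([0.0, 0.0])
# Similarity is 1 for identical points and decays towards 0 with distance
for other in ([0.0, 0.0], [1.0, 0.0], [3.0, 4.0]):
    print(other, rbf(origin, other, gamma=0.5))
```

With `gamma=0.5`, a point at distance 5 already has a kernel value below $10^{-5}$, so it contributes almost nothing to the prediction at the origin.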

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.svm import SVC

import plotly.express as px
import plotly.graph_objects as go 

In [2]:
df = pd.read_csv('../data/games.csv', encoding='utf-8')

df['rating_difference'] = df['white_rating'] - df['black_rating']

# Target: 1 if white won the game, 0 for any other outcome
df['white_win'] = df['winner'].apply(lambda x: 1 if x == 'white' else 0)
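The two derived columns can be checked on a few hypothetical rows (invented for illustration, not taken from `games.csv`; the `'draw'` value is an assumption about what else `winner` may contain). The sketch also shows a vectorised alternative to the `apply`/`lambda`, which gives the same 0/1 column:

```python
import pandas as pd

# Hypothetical rows mimicking the columns used above (not taken from games.csv)
toy = pd.DataFrame({
    'white_rating': [1500, 1320, 1700],
    'black_rating': [1400, 1450, 1700],
    'winner': ['white', 'black', 'draw'],
})

toy['rating_difference'] = toy['white_rating'] - toy['black_rating']
# Vectorised equivalent of the apply/lambda: the boolean comparison is cast to 1/0
toy['white_win'] = (toy['winner'] == 'white').astype(int)
print(toy[['rating_difference', 'white_win']])
```

Any outcome other than `'white'` maps to 0, so the model below predicts "white wins" versus "white does not win", not white versus black.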

In [3]:
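def fitting(X, y, C, gamma):
    # test_size=0.8 holds out 80% of the rows for testing and trains on the remaining 20%
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)

    # probability=True enables predict_proba, which Plot_3D uses to draw the probability surface
    model = SVC(kernel='rbf', probability=True, C=C, gamma=gamma)
    clf = model.fit(X_train, y_train)  # fit() returns the fitted estimator itself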

    pred_labels_tr = model.predict(X_train)
    pred_labels_te = model.predict(X_test)

    print('----- Evaluation on Test Data -----')
    score_te = model.score(X_test, y_test)
    print('Accuracy Score: ', score_te)

    print(classification_report(y_test, pred_labels_te))
    print('--------------------------------------------------------')

    print('----- Evaluation on Training Data -----')
    score_tr = model.score(X_train, y_train)
    print('Accuracy Score: ', score_tr)
    
    print(classification_report(y_train, pred_labels_tr))
    print('--------------------------------------------------------')
    
    return X_train, X_test, y_train, y_test, clf
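Because `test_size=0.8`, only a fifth of the games are used for training. A quick check of the resulting sizes, assuming (as scikit-learn's implementation does for a float `test_size`) that the test count is rounded up and the remainder goes to training. The total of 20058 rows is inferred from the `support` columns of the reports below (4011 + 16047):

```python
import math

n_samples = 4011 + 16047  # total rows implied by the support counts in the reports
test_size = 0.8

# Test count rounded up for a float test_size; the rest goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)
```

This matches the 16047 test rows and 4011 training rows reported for every model below.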

In [4]:
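def Plot_3D(X, X_test, y_test, clf):
    mesh_size = 5  # grid step, in the units of each feature
    margin = 1

    # Bounds of the prediction grid, padded by `margin` (Series.min/max already skip NaN)
    x_min, x_max = X.iloc[:, 0].min() - margin, X.iloc[:, 0].max() + margin
    y_min, y_max = X.iloc[:, 1].min() - margin, X.iloc[:, 1].max() + margin
    xrange = np.arange(x_min, x_max, mesh_size)
    yrange = np.arange(y_min, y_max, mesh_size)
    xx, yy = np.meshgrid(xrange, yrange)

    # Predicted P(white wins) at every grid point; wrapping the grid in a DataFrame with the
    # training column names avoids the "X does not have valid feature names" warning
    grid = pd.DataFrame(np.c_[xx.ravel(), yy.ravel()], columns=X.columns)
    Z = clf.predict_proba(grid)[:, 1]
    Z = Z.reshape(xx.shape)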

    fig = px.scatter_3d(x=X_test['rating_difference'], y=X_test['turns'], z=y_test, 
                     opacity=0.8, color_discrete_sequence=['black'])

    fig.update_layout(paper_bgcolor = 'white',
                      scene = dict(xaxis=dict(backgroundcolor='white',
                                              color='black',
                                              gridcolor='#f0f0f0'),
                                   yaxis=dict(backgroundcolor='white',
                                              color='black',
                                              gridcolor='#f0f0f0'
                                              ),
                                   zaxis=dict(backgroundcolor='lightgrey',
                                              color='black', 
                                              gridcolor='#f0f0f0', 
                                              )))
    fig.update_traces(marker=dict(size=1))

    fig.add_traces(go.Surface(x=xrange, y=yrange, z=Z, name='SVM Prediction',
                              colorscale='RdBu', showscale=False, 
                              contours = {"z": {"show": True, "start": 0.2, "end": 0.8, "size": 0.05}}))
    fig.show()

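The grid logic in `Plot_3D` relies on how `np.meshgrid` lays out its output. A small NumPy sketch with hypothetical ranges (standing in for the `rating_difference` and `turns` grids, and using a simple sum in place of predicted probabilities) shows that rows follow `yrange` and columns follow `xrange`:

```python
import numpy as np

# Hypothetical small ranges standing in for the rating_difference and turns grids
xrange = np.arange(-10, 10, 5)   # 4 values
yrange = np.arange(0, 15, 5)     # 3 values
xx, yy = np.meshgrid(xrange, yrange)

# meshgrid puts y along the rows and x along the columns
print(xx.shape)

# Flatten into (n_points, 2) rows of (x, y), as passed to predict_proba ...
grid = np.c_[xx.ravel(), yy.ravel()]
# ... then reshape a per-point score back to xx.shape (the sum is a stand-in for probabilities)
Z = grid.sum(axis=1).reshape(xx.shape)
print(Z[1, 2] == xrange[2] + yrange[1])
```

Entry `Z[i, j]` therefore belongs to `(xrange[j], yrange[i])`, which is the row-by-y, column-by-x layout that `go.Surface` expects for `z` when given `x=xrange` and `y=yrange`.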
### SVM model 1 - default values for C and gamma

In [5]:
X = df[['rating_difference', 'turns']]
y = df['white_win'].values

X_train, X_test, y_train, y_test, clf = fitting(X, y, 1, 'scale')

----- Evaluation on Test Data -----
Accuracy Score:  0.6478469495855923
              precision    recall  f1-score   support

           0       0.63      0.73      0.67      8012
           1       0.68      0.57      0.62      8035

    accuracy                           0.65     16047
   macro avg       0.65      0.65      0.65     16047
weighted avg       0.65      0.65      0.65     16047

--------------------------------------------------------
----- Evaluation on Training Data -----
Accuracy Score:  0.6454749439042633
              precision    recall  f1-score   support

           0       0.64      0.72      0.67      2045
           1       0.66      0.57      0.61      1966

    accuracy                           0.65      4011
   macro avg       0.65      0.64      0.64      4011
weighted avg       0.65      0.65      0.64      4011

--------------------------------------------------------
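Per the scikit-learn `SVC` documentation, `gamma='scale'` uses $1 / (\text{n\_features} \cdot \operatorname{Var}(X))$, where the variance is taken over all entries of `X`. A sketch on a hypothetical feature matrix with the same two columns (the numbers are invented, not from the dataset):

```python
import numpy as np

# Hypothetical feature matrix with the same two columns: rating_difference, turns
X_demo = np.array([
    [ 120.0,  45.0],
    [-300.0,  80.0],
    [  50.0,  30.0],
    [ 400.0, 120.0],
])

# gamma='scale' -> 1 / (n_features * variance of all entries of X)
n_features = X_demo.shape[1]
gamma_scale = 1.0 / (n_features * X_demo.var())
print(gamma_scale)
```

Because `rating_difference` spans hundreds of points and the features are not standardized, the variance is large and the resulting gamma is small, much closer to 0.000001 than to 0.1 for data like this.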


In [6]:
Plot_3D(X, X_test, y_test, clf)

### SVM model 2 - gamma = 0.1

In [7]:
X = df[['rating_difference','turns']]
y = df['white_win'].values

X_train, X_test, y_train, y_test, clf = fitting(X, y, 1, 0.1)

Plot_3D(X, X_test, y_test, clf)

----- Evaluation on Test Data -----
Accuracy Score:  0.5643422446563221
              precision    recall  f1-score   support

           0       0.55      0.66      0.60      8012
           1       0.58      0.47      0.52      8035

    accuracy                           0.56     16047
   macro avg       0.57      0.56      0.56     16047
weighted avg       0.57      0.56      0.56     16047

--------------------------------------------------------
----- Evaluation on Training Data -----
Accuracy Score:  0.906008476689105
              precision    recall  f1-score   support

           0       0.91      0.91      0.91      2045
           1       0.91      0.90      0.90      1966

    accuracy                           0.91      4011
   macro avg       0.91      0.91      0.91      4011
weighted avg       0.91      0.91      0.91      4011

--------------------------------------------------------
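One plausible reading of the gap between training accuracy (0.91) and test accuracy (0.56) for gamma = 0.1 comes from the kernel values themselves. The features are used unscaled, so distances are measured in raw turns and rating points. The sketch below evaluates the RBF kernel for two hypothetical pairs of games (`rbf_from_sq_dist` is an illustrative helper):

```python
import numpy as np

def rbf_from_sq_dist(sq_dist, gamma):
    """RBF kernel value for a given squared Euclidean distance."""
    return np.exp(-gamma * sq_dist)

# Hypothetical pairs of games: 10 turns apart, or 100 rating points apart
sq_dists = {'10 turns apart': 10.0 ** 2, '100 rating points apart': 100.0 ** 2}

for gamma in (0.1, 1e-6):
    for label, d2 in sq_dists.items():
        print(f'gamma={gamma:g}, {label}: K = {rbf_from_sq_dist(d2, gamma):.3g}')
```

With gamma = 0.1 the similarity has essentially vanished at these distances, so each training game only influences predictions in its immediate neighbourhood and the model can memorize the training set. With gamma = 0.000001 the kernel stays close to 1 over the same distances, giving a much smoother surface.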

### SVM model 3 - gamma = 0.000001

In [8]:
X = df[['rating_difference','turns']]
y = df['white_win'].values

X_train, X_test, y_train, y_test, clf = fitting(X, y, 1, 0.000001)

Plot_3D(X, X_test, y_test, clf)

----- Evaluation on Test Data -----
Accuracy Score:  0.6489686545771796
              precision    recall  f1-score   support

           0       0.63      0.72      0.67      8012
           1       0.67      0.58      0.62      8035

    accuracy                           0.65     16047
   macro avg       0.65      0.65      0.65     16047
weighted avg       0.65      0.65      0.65     16047

--------------------------------------------------------
----- Evaluation on Training Data -----
Accuracy Score:  0.6449763151333832
              precision    recall  f1-score   support

           0       0.64      0.71      0.67      2045
           1       0.66      0.58      0.62      1966

    accuracy                           0.64      4011
   macro avg       0.65      0.64      0.64      4011
weighted avg       0.65      0.64      0.64      4011

--------------------------------------------------------
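Putting the three runs side by side makes the effect of gamma easier to see. This sketch does not refit anything; it only copies the accuracy scores printed above and computes the train minus test gap:

```python
# Accuracy scores copied from the three runs above: (train, test)
runs = {
    "C=1, gamma='scale'": (0.6454749439042633, 0.6478469495855923),
    'C=1, gamma=0.1': (0.906008476689105, 0.5643422446563221),
    'C=1, gamma=0.000001': (0.6449763151333832, 0.6489686545771796),
}

for name, (train_acc, test_acc) in runs.items():
    # A large positive gap (train much better than test) is a sign of overfitting
    gap = train_acc - test_acc
    print(f'{name:22s} train={train_acc:.3f} test={test_acc:.3f} gap={gap:+.3f}')
```

Only gamma = 0.1 shows a large gap (about 0.34); the default and the very small gamma perform almost identically on training and test data.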
