In [1]:
# Load libraries
import numpy as np
import pandas as pd

from timeit import default_timer

from sklearn.model_selection import train_test_split

# Support Vector Classifier

## Intuition

- It is a classifier whose goal is to build an optimal **hyperplane** that separates the classes.


- The optimal hyperplane is the one that maximizes the margin, i.e. the distance to the closest point of each class.


- This reduces the chance of misclassifying a new pattern.

![imagen-2.png](attachment:imagen-2.png)
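As a minimal sketch of the maximum-margin idea, the cell below fits a linear `SVC` on a tiny separable toy dataset and recovers the margin width as `2 / ||w||`. The data points and the very large `C` (used here to approximate a hard margin) are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative values, not from the notebook)
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin classifier
clf = SVC(kernel='linear', C=1e6).fit(X_toy, y_toy)

w = clf.coef_[0]                        # normal vector of the hyperplane
b = clf.intercept_[0]                   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin lines

print('w =', w, ' b =', b)
print('margin width =', margin_width)
print('support vectors:')
print(clf.support_vectors_)
```

Only the points closest to the other class end up as support vectors; moving any other point (without crossing the margin) leaves the hyperplane unchanged.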

- The solution to the SVM optimization problem is guaranteed to **exist** and to be **unique**.


- Existence is guaranteed because the function to minimize is quadratic.


- Uniqueness is also guaranteed because it is a convex optimization problem.


- This is one of the main advantages of SVMs over models such as Neural Networks, whose non-convex training can stop at a local minimum.
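For reference, the points above refer to the standard hard-margin formulation. The notation below is the usual textbook one and is an assumption, since the notebook does not write the problem out: $x_i$ are the training patterns, $y_i \in \{-1, +1\}$ their labels, and $(w, b)$ the hyperplane parameters.

```latex
\min_{w,\,b}\ \frac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

The objective is quadratic in $w$ and the constraints are linear, which is what makes this a convex quadratic program. The resulting margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin.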

## Soft Margin

When the data are noisy or the problem is complex, a perfect linear separator may not exist.

Even when one exists, using it may be inadvisable because of the risk of overfitting. A soft margin tolerates some margin violations, and the hyperparameter `C` sets how heavily they are penalized.

![imagen.png](attachment:imagen.png)
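The trade-off can be sketched on synthetic overlapping data: a small `C` accepts more violations and yields a wider margin, while a large `C` penalizes them and narrows it. The dataset, the cluster centers, and the two `C` values below are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters (synthetic, illustrative data)
X_soft, y_soft = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                            cluster_std=1.5, random_state=0)

results = {}
for C in (0.01, 10.0):
    clf = SVC(kernel='linear', C=C).fit(X_soft, y_soft)
    margin_width = 2.0 / np.linalg.norm(clf.coef_[0])
    n_support = int(clf.n_support_.sum())
    results[C] = (margin_width, n_support)
    print('C = %5.2f  margin width = %.2f  support vectors = %d'
          % (C, margin_width, n_support))
```

Every point on or inside the margin is a support vector, so wider margins usually come with more of them.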

## Kernel trick

It consists of mapping the data into a higher-dimensional space in which a hyperplane is built to separate them.

![imagen-2.png](attachment:imagen-2.png)

This yields non-linear separators in the original space.

![imagen.png](attachment:imagen.png)
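A minimal sketch of the idea on two concentric rings, which no straight line can separate. It compares (a) a linear separator in the original space, (b) a linear separator after an explicit map to 3-D that appends the squared radius, and (c) an RBF kernel, which does the mapping implicitly. The dataset and the hand-made feature map are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space
X_ring, y_ring = make_circles(n_samples=300, factor=0.3, noise=0.05,
                              random_state=0)

# (a) Linear separator in the original space
acc_linear = SVC(kernel='linear').fit(X_ring, y_ring).score(X_ring, y_ring)

# (b) Explicit map to 3-D: append the squared radius as a third feature
X_lifted = np.column_stack([X_ring, (X_ring ** 2).sum(axis=1)])
acc_lifted = SVC(kernel='linear').fit(X_lifted, y_ring).score(X_lifted, y_ring)

# (c) RBF kernel: the mapping is implicit, the data stay 2-D
acc_rbf = SVC(kernel='rbf').fit(X_ring, y_ring).score(X_ring, y_ring)

print('linear, original space : %.2f' % acc_linear)
print('linear, lifted space   : %.2f' % acc_lifted)
print('rbf kernel             : %.2f' % acc_rbf)
```

In (b) the separating plane in 3-D corresponds to a circle in the original 2-D space, which is the non-linear separator the text refers to.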

## Key Hyperparameters

![imagen-6.png](attachment:imagen-6.png)
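The grid search later in this notebook tunes `C` and `gamma`. As a rough sketch of what `gamma` controls, assuming the default RBF kernel of `SVC`, `K(a, b) = exp(-gamma * ||a - b||^2)`, the cell below evaluates the kernel between two fixed points for the three `gamma` values used in the grid. The two points are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])   # squared distance to `a` is 2

kernel_values = {}
for gamma in (1e-3, 1.0, 1e3):
    kernel_values[gamma] = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print('gamma = %8.3f  K(a, b) = %.4f' % (gamma, kernel_values[gamma]))
```

With a very large `gamma` every point looks similar only to itself, which tends to produce overfitting; with a very small `gamma` all points look alike and the model becomes almost linear.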

In [12]:
# [1] Import model
from sklearn.svm import SVC as model_constructor
?model_constructor

## Data

You already know it is going to be iris...

In [3]:
from sklearn.datasets import load_iris as load_data

In [4]:
data = load_data()
X = data.data
y = data.target

In [5]:
perc_values = [0.7, 0.15, 0.15]  # train, validation, test fractions
X_train, X_valtest, y_train, y_valtest = train_test_split(
    X, y, stratify = y, test_size = perc_values[1] + perc_values[2], random_state = 1)
X_val, X_test, y_val, y_test = train_test_split(
    X_valtest, y_valtest, stratify = y_valtest,
    test_size = perc_values[2] / (perc_values[1] + perc_values[2]), random_state = 1)
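A quick sanity check of the two-step split can be sketched as follows: it repeats the same calls in a self-contained form and prints the size and per-class counts of each subset, so the effect of `stratify` is visible. The checking loop is an addition and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
perc_values = [0.7, 0.15, 0.15]  # train, validation, test fractions

# First split: 70 % train, 30 % held out for validation + test
X_train, X_valtest, y_train, y_valtest = train_test_split(
    X, y, stratify=y, test_size=perc_values[1] + perc_values[2],
    random_state=1)

# Second split: the held-out 30 % is halved into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_valtest, y_valtest, stratify=y_valtest,
    test_size=perc_values[2] / (perc_values[1] + perc_values[2]),
    random_state=1)

# Size and class counts of every subset
for name, labels in (('train', y_train), ('val', y_val), ('test', y_test)):
    print(name, len(labels), np.bincount(labels))
```

Because 45 held-out samples cannot be halved exactly, validation and test differ by one sample.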

## Grid Search

<img src="../figures/grid.jpg">

<img src="../figures/grid.bmp">

Let's define the grid we are going to use.

In [31]:
# SVM
C_values = [1e-03, 1, 1e03]
gamma_values = [1e-03, 1, 1e03]

params_grid = {'C': C_values,
               'gamma': gamma_values}

Get the total number of combinations.

In [32]:
n = len(params_grid['C'])*len(params_grid['gamma'])
print(str(n)+ ' iterations of SVC')

9 iterations of SVC
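The same count can be obtained without multiplying the list lengths by hand. As a sketch, scikit-learn's `ParameterGrid` expands a dictionary of lists into every combination, which also scales to grids with more than two hyperparameters.

```python
from sklearn.model_selection import ParameterGrid

params_grid = {'C': [1e-03, 1, 1e03],
               'gamma': [1e-03, 1, 1e03]}

# Expand the dictionary into the list of all hyperparameter combinations
grid = ParameterGrid(params_grid)
n_combinations = len(grid)
first_combination = list(grid)[0]

print(str(n_combinations) + ' iterations of SVC')
print('First combination:', first_combination)
```

Iterating over `grid` yields one dictionary per combination, which can be passed to the model constructor with `**`.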


We will use **AUC** as the evaluation metric. This is a multiclass problem, so we have to pass the *multi_class* argument.

In [33]:
# 2) Import metric
from sklearn.metrics import roc_auc_score as metric
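As a small sketch of how the multiclass call works: `roc_auc_score` expects one column of probabilities per class (rows summing to 1), and `multi_class = 'ovo'` averages the AUC over every pair of classes. The labels and probabilities below are made up for illustration, chosen so that every pair of classes is ranked perfectly.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and class probabilities (each row sums to 1)
y_true = np.array([0, 1, 2, 0, 1, 2])
proba = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.2, 0.2, 0.6]])

# One-vs-one: average of the pairwise AUCs over all class pairs
auc_ovo = roc_auc_score(y_true, proba, multi_class='ovo')
print('AUC (ovo) = %.2f' % auc_ovo)
```

This is why the models below are built with `probability = True` and evaluated with `predict_proba` rather than `predict`.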

### Fixed Validation Set

In [34]:
num_iter = 1
grid_results = pd.DataFrame(columns = ('C',
                                       'gamma',
                                       'auc_train',
                                       'auc_val',
                                       'time'))

for C in params_grid['C']:
    for gamma in params_grid['gamma']:

        # Start time
        start_time = default_timer()

        # Print trace
        print('Iteration = ' + str(num_iter))

        # [3] Define model (probability = True is required for predict_proba)
        model = model_constructor(C = C,
                                  gamma = gamma,
                                  probability = True,
                                  random_state = 0)

        # [4] Train model
        model.fit(X_train, y_train)

        # [5] Predict class probabilities (AUC needs scores, not labels)
        pred_train = model.predict_proba(X_train)
        pred_val = model.predict_proba(X_val)

        # [6] Compute metric
        metric_train = metric(y_train, pred_train, multi_class = 'ovo')
        metric_val = metric(y_val, pred_val, multi_class = 'ovo')

        # Computational time
        time = default_timer() - start_time

        # Print AUC and elapsed time
        print('AUC train = %.2f - AUC validation = %.2f. Time spent = %.2f.'
              % (metric_train, metric_val, time))

        # Save iteration results
        grid_results.loc[num_iter] = [C,
                                      gamma,
                                      metric_train,
                                      metric_val,
                                      time]
        num_iter += 1

print('Grid Search Total Computational Time: ', np.sum(grid_results.time.values))

Iteration = 1
AUC train = 0.17 - AUC validation = 0.17. Time spent = 0.01.
Iteration = 2
AUC train = 0.15 - AUC validation = 0.17. Time spent = 0.01.
Iteration = 3
AUC train = 0.00 - AUC validation = 0.30. Time spent = 0.01.
Iteration = 4
AUC train = 0.82 - AUC validation = 0.83. Time spent = 0.01.
Iteration = 5
AUC train = 1.00 - AUC validation = 1.00. Time spent = 0.00.
Iteration = 6
AUC train = 0.33 - AUC validation = 0.47. Time spent = 0.01.
Iteration = 7
AUC train = 1.00 - AUC validation = 1.00. Time spent = 0.00.
Iteration = 8
AUC train = 1.00 - AUC validation = 0.99. Time spent = 0.00.
Iteration = 9
AUC train = 0.67 - AUC validation = 0.47. Time spent = 0.01.
Grid Search Total Computational Time:  0.05751540000073874


Let's look at the results.

In [35]:
grid_results

| | C | gamma | auc_train | auc_val | time |
|---|---|---|---|---|---|
| 1 | 0.001 | 0.001 | 0.167891 | 0.173469 | 0.007167 |
| 2 | 0.001 | 1.0 | 0.148844 | 0.166667 | 0.010006 |
| 3 | 0.001 | 1000.0 | 0.0 | 0.30102 | 0.006927 |
| 4 | 1.0 | 0.001 | 0.822857 | 0.829932 | 0.005857 |
| 5 | 1.0 | 1.0 | 0.998639 | 1.0 | 0.004586 |
| 6 | 1.0 | 1000.0 | 0.333333 | 0.474915 | 0.007366 |
| 7 | 1000.0 | 0.001 | 0.998367 | 1.0 | 0.004086 |
| 8 | 1000.0 | 1.0 | 1.0 | 0.993197 | 0.00451 |
| 9 | 1000.0 | 1000.0 | 0.666667 | 0.474915 | 0.00701 |


Large differences in the metric. **SVM is very sensitive to the choice of hyperparameters**.
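One way to read this sensitivity at a glance is to reshape the results into a `C` by `gamma` matrix. The sketch below rebuilds a small DataFrame from the validation AUC values reported in the results table above (so that it runs on its own) and pivots it. In the notebook itself, the same `pivot` call could be applied directly to `grid_results`.

```python
import pandas as pd

# Validation AUC values copied from the results table above
results = pd.DataFrame({
    'C':       [0.001, 0.001, 0.001, 1.0, 1.0, 1.0, 1000.0, 1000.0, 1000.0],
    'gamma':   [0.001, 1.0, 1000.0, 0.001, 1.0, 1000.0, 0.001, 1.0, 1000.0],
    'auc_val': [0.173469, 0.166667, 0.30102, 0.829932, 1.0, 0.474915,
                1.0, 0.993197, 0.474915],
})

# Rows: C, columns: gamma, cells: validation AUC
auc_matrix = results.pivot(index='C', columns='gamma', values='auc_val')
print(auc_matrix.round(3))
```

Read this way, the column for `gamma = 1000` stays below 0.5 for every `C`, and the row for `C = 0.001` is poor for every `gamma`.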

Now we select the best model.

In [37]:
grid_results = grid_results.sort_values(by = ['auc_val', 'auc_train', 'time'], ascending = [False, False, True])
grid_results

| | C | gamma | auc_train | auc_val | time |
|---|---|---|---|---|---|
| 5 | 1.0 | 1.0 | 0.998639 | 1.0 | 0.004586 |
| 7 | 1000.0 | 0.001 | 0.998367 | 1.0 | 0.004086 |
| 8 | 1000.0 | 1.0 | 1.0 | 0.993197 | 0.00451 |
| 4 | 1.0 | 0.001 | 0.822857 | 0.829932 | 0.005857 |
| 9 | 1000.0 | 1000.0 | 0.666667 | 0.474915 | 0.00701 |
| 6 | 1.0 | 1000.0 | 0.333333 | 0.474915 | 0.007366 |
| 3 | 0.001 | 1000.0 | 0.0 | 0.30102 | 0.006927 |
| 1 | 0.001 | 0.001 | 0.167891 | 0.173469 | 0.007167 |
| 2 | 0.001 | 1.0 | 0.148844 | 0.166667 | 0.010006 |


In [38]:
best_model = grid_results.iloc[0]
best_model

C            1.000000
gamma        1.000000
auc_train    0.998639
auc_val      1.000000
time         0.004586
Name: 5, dtype: float64

### Cross-Validation

In [39]:
from sklearn.model_selection import GridSearchCV
?GridSearchCV

In [40]:
# Define grid
# No `scoring` is passed, so the estimator's default score (accuracy for SVC) is used
grid_cv = GridSearchCV(model_constructor(),
                       param_grid=params_grid,
                       n_jobs=2, # Parallelization!
                       cv = 5) # Number of folds

In this case we do not need a fixed validation set, so we will combine training and validation.

In [41]:
# Run grid
start_time = default_timer()

grid_cv.fit(np.concatenate((X_train, X_val), axis = 0), np.concatenate((y_train, y_val), axis = 0))

stop_time = default_timer()
print('CV Grid Search Total Computational Time: ', stop_time - start_time)

CV Grid Search Total Computational Time:  0.7250468000002002


In [42]:
grid_cv.best_params_

{'C': 1, 'gamma': 1}

In [43]:
grid_cv.best_score_

0.9683076923076923
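Since `GridSearchCV` was built above without a `scoring` argument, this best score is the mean cross-validated accuracy (the default score of `SVC`), so it is not directly comparable with the AUC values of the fixed-validation search. A hedged sketch of a cross-validated search on the same metric follows. It assumes the built-in scorer name `'roc_auc_ovo'` and, for brevity, fits on the whole iris dataset rather than on the notebook's train + validation subset.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_all, y_all = load_iris(return_X_y=True)  # whole dataset, for brevity only
params_grid = {'C': [1e-03, 1, 1e03],
               'gamma': [1e-03, 1, 1e03]}

# probability=True is needed because the AUC scorer calls predict_proba
grid_cv_auc = GridSearchCV(SVC(probability=True, random_state=0),
                           param_grid=params_grid,
                           scoring='roc_auc_ovo',  # same metric as the manual loop
                           cv=5)
grid_cv_auc.fit(X_all, y_all)

print(grid_cv_auc.best_params_, grid_cv_auc.best_score_)

# Per-combination summary, best first
cv_table = pd.DataFrame(grid_cv_auc.cv_results_)[
    ['param_C', 'param_gamma', 'mean_test_score', 'rank_test_score']]
print(cv_table.sort_values('rank_test_score'))
```

The `cv_results_` table plays the same role as `grid_results` in the fixed-validation search, with one mean score per combination.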

## Final Model

Validation has served its purpose; let's merge it with training to get more training data.

In [44]:
print('Old train data size = ' + str(X_train.shape))
print('Old train target size = ' + str(y_train.shape))

# Combine train and validation
X_train = np.concatenate((X_train, X_val), axis = 0)
y_train = np.concatenate((y_train, y_val), axis = 0)

print('New train data size = ' + str(X_train.shape))
print('New train target size = ' + str(y_train.shape))

Old train data size = (105, 4)
Old train target size = (105,)
New train data size = (127, 4)
New train target size = (127,)


In [45]:
# [3] Define model (probability = True is required for predict_proba)
model = model_constructor(C = best_model.C,
                          gamma = best_model.gamma,
                          probability = True,
                          random_state = 0)

# [4] Train model
model.fit(X_train, y_train)

# [5] Predict
pred_train = model.predict_proba(X_train)
pred_test = model.predict_proba(X_test)

# [6] Compute metric
metric_train = metric(y_train, pred_train, multi_class = 'ovo')
metric_test = metric(y_test, pred_test, multi_class = 'ovo')

In [46]:
# Print AUC
print('AUC train = %.2f - AUC test = %.2f' 
      % (metric_train, metric_test))

AUC train = 1.00 - AUC test = 1.00
