# Hyperparameters for Support Vector Machine (SVM)

Python's scikit-learn implementation of the Support Vector Classification model (`SVC`) exposes a number of hyperparameters.
<br>
The complete list is in the scikit-learn documentation <a href="https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC">here</a>.
<br>
The most critical hyperparameters for SVM are `kernel`, `C`, and `gamma`.

 - **`kernel`** selects the kernel function, which implicitly maps the training data into a higher-dimensional space where it can become linearly separable.<br>
The default kernel for scikit-learn's support vector classifier is the Radial Basis Function, usually referred to as rbf.<br>
The `kernel` parameter accepts ***linear***, ***poly***, ***rbf***, ***sigmoid***, ***precomputed***, or a ***callable***.
<br><br>
- **`C`** is the regularization parameter; the penalty is a squared L2 penalty.<br>
The strength of the regularization is inversely proportional to `C`, which must be strictly positive.<br>
For more about regularization, see the <a href="https://pub.towardsai.net/lasso-l1-vs-ridge-l2-vs-elastic-net-regularization-for-classification-model-409c3d86f6e9">tutorial</a> on LASSO (L1) Vs Ridge (L2) Vs Elastic Net Regularization For Classification Model.<br>
    - **When `C` is small**, the penalty for misclassification is small, and the strength of the regularization is large. So a decision boundary with a large margin will be selected.
    - **When `C` is large**, the penalty for misclassification is large, and the strength of the regularization is small. A decision boundary with a small margin will be selected to reduce misclassifications.
<br><br>
- **`gamma`** is the kernel coefficient for *rbf*, *poly*, and *sigmoid*.<br>
It can be seen as the inverse of the radius of influence of the support vectors.<br>
Model performance is highly sensitive to gamma.<br>
Gamma can take the value ***scale***, ***auto***, or a ***float***.<br>
The default value in scikit-learn has been ***scale*** since version 0.22.<br>
    - **When `gamma` is small**, the support vector influence radius is large. If gamma is too small, the influence of each support vector covers the whole training dataset, and the model cannot capture the pattern of the data.
    - **When `gamma` is large**, the support vector influence radius is small. If gamma is too large, the influence radius shrinks to the support vector itself, and no amount of regularization with `C` can prevent overfitting.
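
To make the "influence radius" reading of gamma concrete, the sketch below evaluates the RBF kernel, exp(-gamma * ||x - x'||^2), for two points at a fixed distance and several gamma values. The helper name `rbf_kernel_value` and the sample points are illustrative assumptions, not part of the original notebook.

```python
import math

def rbf_kernel_value(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) for two equal-length sequences."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Two illustrative points at Euclidean distance 2 (squared distance 4)
point_a = (0.0, 0.0)
point_b = (2.0, 0.0)

# Small gamma: similarity stays high even for distant points (large radius)
# Large gamma: similarity collapses toward 0 unless the points nearly coincide
similarities = {
    gamma: rbf_kernel_value(point_a, point_b, gamma)
    for gamma in (0.01, 0.1, 1.0, 10.0)
}
for gamma, value in similarities.items():
    print(f"gamma={gamma:<5} similarity={value:.6f}")
```

The similarity falls monotonically as gamma grows, which is why a very large gamma leaves each support vector influencing only its immediate neighborhood.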

In [5]:
###### Step 1: Support Vector Machine (SVM) algorithm
# No code in this step

###### Step 2: Support Vector Machine (SVM) Hyperparameters
# No code in this step

###### Step 3: Import Libraries
# Dataset
from sklearn import datasets
# Data processing
import pandas as pd
import numpy as np
# Standardize the data
from sklearn.preprocessing import StandardScaler
# Modeling 
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Hyperparameter tuning
from sklearn.model_selection import StratifiedKFold, GridSearchCV, RandomizedSearchCV, cross_val_score
from hyperopt import tpe, STATUS_OK, Trials, hp, fmin, space_eval

In [6]:
###### Step 4: Read Data
# Load the breast cancer dataset
data = datasets.load_breast_cancer()
# Put the data in pandas dataframe format
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['target']=data.target
# Check the data information
df.info()
# Check the target value distribution
df['target'].value_counts(normalize=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 31 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

1    0.627417
0    0.372583
Name: target, dtype: float64

In [7]:
###### Step 5: Train Test Split
# Train test split
X_train, X_test, y_train, y_test = train_test_split(df[df.columns.difference(['target'])], df['target'], test_size=0.2, random_state=42)
# Check the number of records in training and testing dataset.
print(f'The training dataset has {len(X_train)} records.')
print(f'The testing dataset has {len(X_test)} records.')

The training dataset has 455 records.
The testing dataset has 114 records.


In [8]:
###### Step 6: Standardization
# Initiate scaler
sc = StandardScaler()
# Standardize the training dataset
X_train_transformed = pd.DataFrame(sc.fit_transform(X_train),index=X_train.index, columns=X_train.columns)
# Standardize the testing dataset
X_test_transformed = pd.DataFrame(sc.transform(X_test),index=X_test.index, columns=X_test.columns)
# Summary statistics after standardization
X_train_transformed.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| area error | 455.0 | 6.24653e-17 | 1.001101 | -0.705091 | -0.464164 | -0.325347 | 0.077435 | 10.641841 |
| compactness error | 455.0 | -2.395154e-15 | 1.001101 | -1.258102 | -0.694353 | -0.280607 | 0.358304 | 5.905671 |
| concave points error | 455.0 | 3.455112e-16 | 1.001101 | -1.891775 | -0.668493 | -0.126279 | 0.437566 | 6.504667 |
| concavity error | 455.0 | 2.479091e-16 | 1.001101 | -1.022218 | -0.55134 | -0.207836 | 0.303371 | 11.310294 |
| fractal dimension error | 455.0 | 5.085065e-16 | 1.001101 | -1.050856 | -0.573964 | -0.218908 | 0.24534 | 9.34587 |
| mean area | 455.0 | -2.537653e-16 | 1.001101 | -1.365036 | -0.660205 | -0.289597 | 0.319339 | 5.208312 |
| mean compactness | 455.0 | 1.011157e-15 | 1.001101 | -1.607228 | -0.777087 | -0.24134 | 0.528128 | 3.964311 |
| mean concave points | 455.0 | 5.817081e-16 | 1.001101 | -1.26991 | -0.734905 | -0.391123 | 0.673757 | 4.022271 |
| mean concavity | 455.0 | 9.857804e-16 | 1.001101 | -1.119899 | -0.750539 | -0.344646 | 0.547387 | 4.256736 |
| mean fractal dimension | 455.0 | -3.36727e-15 | 1.001101 | -1.776889 | -0.709792 | -0.177285 | 0.464223 | 4.815921 |
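
The `std` column above reads 1.001101 rather than exactly 1 because `StandardScaler` divides by the population standard deviation (ddof=0), while pandas `describe()` reports the sample standard deviation (ddof=1). The sketch below reproduces the effect with the standard library only. The feature values are made up; only the sample size of 455 mirrors the training set.

```python
import math
import random
import statistics

# Made-up feature values; only the sample size (455) mirrors the training set
rng = random.Random(0)
values = [rng.gauss(50.0, 12.0) for _ in range(455)]

# Standardize as StandardScaler does: subtract the mean, divide by the population std
mean = statistics.mean(values)
population_std = statistics.pstdev(values)
standardized = [(v - mean) / population_std for v in values]

# The population std of the result is 1; the sample std is sqrt(n / (n - 1))
n = len(standardized)
sample_std = statistics.stdev(standardized)
print(f"population std: {statistics.pstdev(standardized):.6f}")
print(f"sample std:     {sample_std:.6f}")
print(f"sqrt(n/(n-1)):  {math.sqrt(n / (n - 1)):.6f}")
```

The gap shrinks as the number of rows grows, so it is a reporting difference rather than a sign that standardization failed.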


In [9]:
###### Step 7: Support Vector Machine (SVM) Default Hyperparameters
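# Check default values (params_df.T below is displayed only when it is the last expression in a cell; wrap it in print() otherwise)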
svc = SVC()
params = svc.get_params()
params_df = pd.DataFrame(params, index=[0])
params_df.T
# Run model
svc.fit(X_train_transformed, y_train)
# Accuracy score
print(f'The accuracy score of the model is {svc.score(X_test_transformed, y_test):.4f}')

The accuracy score of the model is 0.9825
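
With the default `gamma='scale'`, scikit-learn documents the value as 1 / (n_features * X.var()), while `gamma='auto'` uses 1 / n_features. After standardization every column has zero mean and unit variance, so the two settings give essentially the same number. The sketch below checks this with NumPy on a synthetic stand-in array, not the breast cancer features.

```python
import numpy as np

# Synthetic stand-in for the training features: 455 rows, 30 columns
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(455, 30))

# Standardize each column with the population std, as StandardScaler does
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

n_features = X_std.shape[1]
gamma_scale = 1.0 / (n_features * X_std.var())  # documented formula for gamma='scale'
gamma_auto = 1.0 / n_features                   # documented formula for gamma='auto'
print(f"gamma='scale' -> {gamma_scale:.6f}")
print(f"gamma='auto'  -> {gamma_auto:.6f}")
```

On unscaled data the two values can differ by orders of magnitude, which is one reason to standardize before tuning gamma.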


In [10]:
###### Step 8: Hyperparameter Tuning Using Grid Search
# List of C values
C_range = np.logspace(-1, 1, 3)
print(f'The list of values for C are {C_range}')
# List of gamma values
gamma_range = np.logspace(-1, 1, 3)
print(f'The list of values for gamma are {gamma_range}')
# Define the search space
param_grid = { 
    # Regularization parameter.
    "C": C_range,
    # Kernel type
    "kernel": ['rbf', 'poly'],
    # Gamma is the Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
    "gamma": gamma_range.tolist()+['scale', 'auto']
    }
# Set up score
scoring = ['accuracy']
# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# Define grid search
grid_search = GridSearchCV(estimator=svc, 
                           param_grid=param_grid, 
                           scoring=scoring, 
                           refit='accuracy', 
                           n_jobs=-1, 
                           cv=kfold, 
                           verbose=0)
# Fit grid search
grid_result = grid_search.fit(X_train_transformed, y_train)
# Display the grid search summary (shown only when it is the last expression in a cell)
grid_result
# Print the best mean cross-validated accuracy on the training dataset
print(f'The best accuracy score for the training dataset is {grid_result.best_score_:.4f}')
# Print the hyperparameters for the best score
print(f'The best hyperparameters are {grid_result.best_params_}')
# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {grid_search.score(X_test_transformed, y_test):.4f}')

The list of values for C are [ 0.1  1.  10. ]
The list of values for gamma are [ 0.1  1.  10. ]
The best accuracy score for the training dataset is 0.9693
The best hyperparameters are {'C': 1.0, 'gamma': 'scale', 'kernel': 'rbf'}
The accuracy score for the testing dataset is 0.9825
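
Grid search evaluates every combination in the search space, so its cost grows multiplicatively with each added value. The sketch below enumerates the same grid with `itertools.product` to count candidates and fits. It mirrors the enumeration only and does not call scikit-learn.

```python
from itertools import product

# Same search space as the grid search step
param_grid = {
    "C": [0.1, 1.0, 10.0],
    "kernel": ["rbf", "poly"],
    "gamma": [0.1, 1.0, 10.0, "scale", "auto"],
}
n_splits = 3  # folds used by StratifiedKFold

# Cartesian product of all value lists, one dict per candidate
names = list(param_grid)
candidates = [dict(zip(names, combo)) for combo in product(*param_grid.values())]

n_candidates = len(candidates)
n_fits = n_candidates * n_splits
print(f"{n_candidates} candidates x {n_splits} folds = {n_fits} fits")
```

Because `refit` is set, the search also performs one final fit of the best candidate on the full training set.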


In [11]:
###### Step 9: Hyperparameter Tuning Using Random Search
# List of C values
C_range = np.logspace(-10, 10, 21)
print(f'The list of values for C are {C_range}')
# List of gamma values
gamma_range = np.logspace(-10, 10, 21)
print(f'The list of values for gamma are {gamma_range}')
# Define the search space
param_grid = { 
    # Regularization parameter.
    "C": C_range,
    # Kernel type
    "kernel": ['rbf', 'poly'],
    # Gamma is the Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
    "gamma": gamma_range
    }
# Set up score
scoring = ['accuracy']
# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# Define random search
random_search = RandomizedSearchCV(estimator=svc, 
                           param_distributions=param_grid, 
                           n_iter=100,
                           scoring=scoring, 
                           refit='accuracy', 
                           n_jobs=-1, 
                           cv=kfold, 
                           verbose=0)
# Fit random search
random_result = random_search.fit(X_train_transformed, y_train)
# Display the random search summary (shown only when it is the last expression in a cell)
random_result
# Print the best mean cross-validated accuracy on the training dataset
print(f'The best accuracy score for the training dataset is {random_result.best_score_:.4f}')
# Print the hyperparameters for the best score
print(f'The best hyperparameters are {random_result.best_params_}')
# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {random_search.score(X_test_transformed, y_test):.4f}')

The list of values for C are [1.e-10 1.e-09 1.e-08 1.e-07 1.e-06 1.e-05 1.e-04 1.e-03 1.e-02 1.e-01
 1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05 1.e+06 1.e+07 1.e+08 1.e+09
 1.e+10]
The list of values for gamma are [1.e-10 1.e-09 1.e-08 1.e-07 1.e-06 1.e-05 1.e-04 1.e-03 1.e-02 1.e-01
 1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05 1.e+06 1.e+07 1.e+08 1.e+09
 1.e+10]
The best accuracy score for the training dataset is 0.9736
The best hyperparameters are {'kernel': 'rbf', 'gamma': 0.001, 'C': 100.0}
The accuracy score for the testing dataset is 0.9825
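
Random search covers only part of this space: 21 values of C, 21 of gamma, and 2 kernels give 882 combinations, of which `n_iter=100` are tried. Since no `random_state` is passed to `RandomizedSearchCV` above, the sampled subset, and possibly the best parameters, can differ between runs. The sketch below imitates the sampling with the standard library and a fixed seed. It illustrates the idea and is not scikit-learn's internal sampler.

```python
import random
from itertools import product

# Same discrete values as the random search step: 10^-10 ... 10^10
exponents = range(-10, 11)
C_values = [10.0 ** e for e in exponents]
gamma_values = [10.0 ** e for e in exponents]
kernels = ["rbf", "poly"]

all_combinations = list(product(C_values, gamma_values, kernels))

# Draw 100 distinct candidates; a fixed seed makes the draw repeatable
sampler = random.Random(42)
sampled = sampler.sample(all_combinations, k=100)

coverage = len(sampled) / len(all_combinations)
print(f"Sampled {len(sampled)} of {len(all_combinations)} combinations ({coverage:.1%})")
```

Passing `random_state` to `RandomizedSearchCV` plays the same role as the fixed seed here.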


In [12]:
###### Step 10: Hyperparameter Tuning Using Bayesian Optimization
# Space
space = {
    'C' : hp.choice('C', C_range),
    'gamma' : hp.choice('gamma', gamma_range),
    'kernel' : hp.choice('kernel', ['rbf', 'poly'])
}
# Set up the k-fold cross-validation
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# Objective function
def objective(params):
    
    svc = SVC(**params)
    scores = cross_val_score(svc, X_train_transformed, y_train, cv=kfold, scoring='accuracy', n_jobs=-1)
    # Extract the best fold score (the mean across folds is a more common, less optimistic choice)
    best_score = max(scores)
    # Loss must be minimized
    loss = - best_score
    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'status': STATUS_OK}
# Trials to track progress
bayes_trials = Trials()
# Optimize
best = fmin(fn = objective, space = space, algo = tpe.suggest, max_evals = 100, trials = bayes_trials)
# Print the indices of the best parameters (hp.choice reports positions, not values)
print(best)
# Print the values of the best parameters
best_params = space_eval(space, best)
print(best_params)
# Train model using the best parameters
svc_bo = SVC(**best_params).fit(X_train_transformed, y_train)
# Print the best accuracy score for the testing dataset
print(f'The accuracy score for the testing dataset is {svc_bo.score(X_test_transformed, y_test):.4f}')

100%|██████████| 100/100 [00:03<00:00, 32.84trial/s, best loss: -0.9801324503311258]
{'C': 19, 'gamma': 2, 'kernel': 0}
{'C': 1000000000.0, 'gamma': 1e-08, 'kernel': 'rbf'}
The accuracy score for the testing dataset is 0.9737
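
The first dictionary printed above holds positions, not values, because `hp.choice` reports the index of the selected option. The sketch below maps those indices back by hand, using the index values from the output above and rebuilding the candidate lists in plain Python. In the notebook, `space_eval` performs this lookup.

```python
# Candidate lists equivalent to np.logspace(-10, 10, 21) and the kernel options
C_values = [10.0 ** e for e in range(-10, 11)]
gamma_values = [10.0 ** e for e in range(-10, 11)]
kernels = ["rbf", "poly"]

# Indices reported by fmin in the output above
best_indices = {"C": 19, "gamma": 2, "kernel": 0}

# Look up each index in its candidate list
best_values = {
    "C": C_values[best_indices["C"]],
    "gamma": gamma_values[best_indices["gamma"]],
    "kernel": kernels[best_indices["kernel"]],
}
print(best_values)
```

The recovered values agree with the second dictionary in the output, which is the one to pass to `SVC`.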


<a href="https://medium.com/grabngoinfo/support-vector-machine-svm-hyperparameter-tuning-in-python-a65586289bcb">Source</a>