### Support Vector Machines (PART 2)

#### Table of Contents <a name='top'></a>

- [Load Modules and Set Notebook Properties](#modules)
- [Define Path and Load Data](#load)
- [Inpsect Data](#inspect)
- [Prepare and Clean Data](#prepare)
- [Scale Values](#scale)
- [Fit Model](#fit)
- [Hyperparamter Tuning](#hyperparameter)
- [Evaluate Results](#evaluate)

[go to end](#end)

#### Load Modules and Set Notebook Properties <a name='modules'></a>

In [1]:
import os
import sys
import warnings
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [2]:
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns", None)

#### Define Path and Load Data  <a name='load'></a> 

In [3]:
INPUT_PATH = 'raw_data_source'
OUTPUT_PATH = 'outputs'

Insert note on the data source

In [4]:
train_data = pd.read_csv(os.path.join(INPUT_PATH, 'loan_train_data.csv'))
test_data =  pd.read_csv(os.path.join(INPUT_PATH, 'loan_test_data.csv'))

#### Inspect Data <a name='inspect'></a> 

In [5]:
print(f'Shape of training data : {train_data.shape}')
print(f'Shape of testing data : {test_data.shape}')

Shape of training data : (437, 11)
Shape of testing data : (177, 11)


In [6]:
train_data.sample(5)

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Property_Area,Loan_Status
245,334,1,1,0,1,63337,0,490,180,2,1
407,569,0,0,0,0,2378,0,9,360,2,0
157,214,1,1,3,1,5703,0,130,360,1,1
220,295,1,1,0,0,2383,3334,172,360,0,1
89,125,1,1,0,0,4300,2014,194,360,1,1


In [7]:
test_data.sample(5)

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Property_Area,Loan_Status
25,88,1,1,0,0,2500,2118.0,104,360,0,1
79,318,1,1,0,0,2058,2134.0,88,360,2,1
80,319,0,0,1,0,3541,0.0,112,360,0,1
84,329,0,1,0,0,4333,2451.0,110,360,2,0
37,131,1,0,0,1,20166,0.0,650,480,2,1


#### Prepare and Clean Data <a name='prepare'></a> 

In [8]:
# seperate the independent and target variable on training, and test data
X_train = train_data.drop(columns=['Loan_ID', 'Loan_Status'],axis=1)
y_train = train_data['Loan_Status']
X_test = test_data.drop(columns=['Loan_ID', 'Loan_Status'],axis=1)
y_test = test_data['Loan_Status']

#### Scale Values <a name='scale'></a> 


Insert explanation on why the fitting of the scaler should only be done on the training set. 

In [9]:
def scale_values(X_train, X_test, scaler='standard'):
    
    scaler_dict = {'standard': StandardScaler(), 
                    'minmax': MinMaxScaler(), 
                    'normal': Normalizer()}
    if scaler not in scaler_dict.keys():
        raise ValueError("Enter a valid value for scaler! Choose between 'standard', 'minmax', 'normal'.")
    else:
        scl = scaler_dict[scaler]
        X_train = scl.fit_transform(X_train)
        X_test = scl.transform(X_test) 
        return X_train, X_test

#### Fit a Model <a name='fit'></a> 

In [10]:
def svc_plain(X_train, y_train):
    clf = SVC()
    clf.fit(X_train, y_train)
    return clf

with Standard scaling

In [11]:
X_train_, X_test_ = scale_values(X_train, X_test, scaler='standard')
clf = svc_plain(X_train_, y_train)
y_pred = clf.predict(X_test_)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.5f}'.format(acc))

Accuracy score on train dataset : 0.68927


with Normal scaling

In [12]:
X_train_, X_test_ = scale_values(X_train, X_test, scaler='normal')
clf = svc_plain(X_train_, y_train)
y_pred = clf.predict(X_test_)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.5f}'.format(acc))

Accuracy score on train dataset : 0.69492


with MinMax scaling

In [13]:
X_train_, X_test_ = scale_values(X_train, X_test, scaler='minmax')
clf = svc_plain(X_train_, y_train)
y_pred = clf.predict(X_test_)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.5f}'.format(acc))

Accuracy score on train dataset : 0.69492


No scaling

In [14]:
clf = svc_plain(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.3f}'.format(acc))

Accuracy score on train dataset : 0.695


#### Hyperparameter Optimization Using GridSearch CV

[link](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)

In [15]:
def svc_with_hyperparameter_tuning(X_train, y_train):
    
    # defining parameter range
    param_grid = {'C': [0.1, 1, 10, 100, 1000], 
                  'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                  'kernel': ['rbf']} 
    grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)
    grid.fit(X_train, y_train)
    
    return grid

In [16]:
X_train_, X_test_ = scale_values(X_train, X_test, scaler='standard')
grid = svc_with_hyperparameter_tuning(X_train, y_train)
y_pred = grid.predict(X_test)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.682 total time=   0.0s
[CV 2/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.682 total time=   0.0s
[CV 3/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.690 total time=   0.0s
[CV 4/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.690 total time=   0.0s
[CV 5/5] END ........C=0.1, gamma=1, kernel=rbf;, score=0.678 total time=   0.0s
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.682 total time=   0.0s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.682 total time=   0.0s
[CV 3/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.690 total time=   0.0s
[CV 4/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.690 total time=   0.0s
[CV 5/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.678 total time=   0.0s
[CV 1/5] END .....C=0.1, gamma=0.01, kernel=rbf;, score=0.682 total time=   0.0s
[CV 2/5] END .....C=0.1, gamma=0.01, kernel=rbf

In [17]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.43      0.06      0.10        54
           1       0.70      0.97      0.81       123

    accuracy                           0.69       177
   macro avg       0.56      0.51      0.46       177
weighted avg       0.62      0.69      0.59       177



#### Evaluate Results <a name='evaluate'></a> 

[go to top](#top)

--end--
<a name='end'></a> 