### Support Vector Machines

#### Table of Contents <a name='top'></a>

- [Load Modules and Set Notebook Properties](#modules)
- [Define Path and Load Data](#load)
- [Inpsect Data](#inspect)
- [Prepare and Clean Data](#prepare)
- [Scale Values](#scale)
- [Fit Model](#fit)
- [Hyperparamter Tuning](#hyperparameter)
- [Evaluate Results](#evaluate)

[go to end](#end)

#### Load Modules and Set Notebook Properties <a name='modules'></a>

In [17]:
import os
import sys
import warnings
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer
from sklearn.metrics import accuracy_score

In [3]:
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns", None)

#### Define Path and Load Data  <a name='load'></a> 

In [4]:
INPUT_PATH = 'raw_data_source'
OUTPUT_PATH = 'outputs'

Insert note on the data source

In [5]:
train_data = pd.read_csv(os.path.join(INPUT_PATH, 'titanic_train_data.csv'))
test_data =  pd.read_csv(os.path.join(INPUT_PATH, 'titanic_test_data.csv'))

#### Inspect Data <a name='inspect'></a> 

In [6]:
print(f'Shape of training data : {train_data.shape}')
print(f'Shape of testing data : {test_data.shape}')

Shape of training data : (712, 25)
Shape of testing data : (179, 25)


In [8]:
train_data.sample(5)

Unnamed: 0,Survived,Age,Fare,Pclass_1,Pclass_2,Pclass_3,Sex_female,Sex_male,SibSp_0,SibSp_1,SibSp_2,SibSp_3,SibSp_4,SibSp_5,SibSp_8,Parch_0,Parch_1,Parch_2,Parch_3,Parch_4,Parch_5,Parch_6,Embarked_C,Embarked_Q,Embarked_S
160,1,29.699118,12.35,0,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0
685,0,16.0,8.05,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
705,1,16.0,7.7333,0,0,1,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0
512,0,19.0,7.8958,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
670,1,40.0,39.0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1


In [9]:
test_data.sample(5)

Unnamed: 0,Survived,Age,Fare,Pclass_1,Pclass_2,Pclass_3,Sex_female,Sex_male,SibSp_0,SibSp_1,SibSp_2,SibSp_3,SibSp_4,SibSp_5,SibSp_8,Parch_0,Parch_1,Parch_2,Parch_3,Parch_4,Parch_5,Parch_6,Embarked_C,Embarked_Q,Embarked_S
124,1,33.0,15.85,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1
144,0,29.699118,7.25,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
120,1,36.0,135.6333,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0
98,1,17.0,57.0,1,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
21,0,36.0,12.875,0,1,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0


#### Prepare and Clean Data <a name='prepare'></a> 

In [10]:
# seperate the independent and target variable on training, and test data
X_train = train_data.drop(columns=['Survived'],axis=1)
y_train = train_data['Survived']
X_test = test_data.drop(columns=['Survived'],axis=1)
y_test = test_data['Survived']

#### Scale Values <a name='scale'></a> 


Insert explanation on why the fitting of the scaler should only be done on the training set. 

In [27]:
def scale_values(X_train, X_test, scaler='standard'):
    
    scaler_dict = {'standard': StandardScaler(), 
                    'minmax': MinMaxScaler(), 
                    'normal': Normalizer()}
    if scaler not in scaler_dict.keys():
        raise ValueError("Enter a valid value for scaler! Choose between 'standard', 'minmax', 'normal'.")
    else:
        scl = scaler_dict[scaler]
        X_train = scl.fit_transform(X_train)
        X_test = scl.transform(X_test) 
        return X_train, X_test

#### Fit a Model <a name='fit'></a> 

In [31]:
def svc_plain(X_train, y_train):
    clf = SVC()
    clf.fit(X_train, y_train)
    return clf

with Standard scaling

In [40]:
X_train, X_test = scale_values(X_train, X_test, scaler='standard')
clf = svc_plain(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.3f}'.format(acc))

Accuracy score on train dataset : 0.821


with Normal scaling

In [37]:
X_train, X_test = scale_values(X_train, X_test, scaler='normal')
clf = svc_plain(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.3f}'.format(acc))

Accuracy score on train dataset : 0.821


with MinMax scaling

In [38]:
X_train, X_test = scale_values(X_train, X_test, scaler='minmax')
clf = svc_plain(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.3f}'.format(acc))

Accuracy score on train dataset : 0.810


No scaling

In [39]:
clf = svc_plain(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy score on train dataset : {:0.3f}'.format(acc))

Accuracy score on train dataset : 0.810


#### Hyperparameter Optimization

#### Predict <a name='predict'></a> 

[go to top](#top)

--end--
<a name='end'></a> 