### Training a Model Locally
Mico Ellerich M. Comia

---

- SELECT 2 MACHINE LEARNING ALGORITHMS 
- FOR EACH OF THE ALGORITHMS
    - PERFORM TRAINING ON THE TRAINING DATASET
    - EVALUATE ON THE VALIDATION DATASET
    - TEST THE TRAINED MODEL ON THE TEST SET
    - SAVE THE MODEL USING JOBLIB (OR ALTERNATIVE)
- COMPARE THE “PERFORMANCE” OF THE 2 MODELS USING THE EVALUATION METRICS

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import warnings
import joblib
import time

from sklearn import metrics

warnings.filterwarnings(action="ignore")

### I. Import dataset splits
---

In [2]:
# Load the pre-split features (X) and labels (y) from CSV files
X_train = pd.read_csv('data/X_train.csv')
X_test = pd.read_csv('data/X_test.csv')
X_val = pd.read_csv('data/X_val.csv')
y_train = pd.read_csv('data/y_train.csv')
y_test = pd.read_csv('data/y_test.csv')
y_val = pd.read_csv('data/y_val.csv')
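
Because `pd.read_csv` returns each label file as a DataFrame, a quick consistency check before training can catch mismatched splits. The sketch below uses small in-memory frames in place of the CSV files; the helper name `check_split` and the toy values are hypothetical and not part of the original notebook.

```python
import pandas as pd


def check_split(X: pd.DataFrame, y: pd.DataFrame):
    """Return the labels as a 1-D array after checking they line up with X."""
    if len(X) != len(y):
        raise ValueError(f"row mismatch: X has {len(X)}, y has {len(y)}")
    if y.shape[1] != 1:
        raise ValueError(f"expected one target column, got {y.shape[1]}")
    # Flatten the single-column DataFrame into the 1-D shape estimators expect.
    return y.iloc[:, 0].to_numpy()


# Toy stand-ins for X_train / y_train (illustrative values only).
X_demo = pd.DataFrame({"f1": [0.1, 0.4, 0.9], "f2": [1.0, 0.5, 0.2]})
y_demo = pd.DataFrame({"label": [0, 1, 1]})

labels = check_split(X_demo, y_demo)
print(labels.shape)
```

Applied to the real splits, the same check would be called once per pair, for example `check_split(X_train, y_train)`.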

### II. Select 2 ML Algorithms

For each algorithm:
- PERFORM TRAINING ON THE TRAINING DATASET
- EVALUATE ON THE VALIDATION DATASET
- TEST THE TRAINED MODEL ON THE TEST SET
- SAVE THE MODEL USING JOBLIB (OR ALTERNATIVE)

---

#### Training on the training dataset

In [4]:
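from sklearn.svm import SVC

# Support vector classifier with an RBF kernel and default hyperparameters
svm_model = SVC(kernel='rbf')
# y_train is a single-column DataFrame; flatten it to the 1-D array scikit-learn expects
svm_model.fit(X_train, y_train.values.ravel())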

SVC()
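
The saving cell and the comparison in Section III refer to a Logistic Regression model whose training cell does not appear here. A minimal sketch of how such a model could be fit with the same pattern follows. The synthetic data from `make_classification`, the `max_iter` setting, and the `_demo` names are assumptions for illustration, not the notebook's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the CSV splits (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# max_iter is raised above the default as a precaution against non-convergence.
logistic_demo = LogisticRegression(max_iter=1000)
logistic_demo.fit(X_train_demo, y_train_demo)

# Predict on the held-out rows, as is done for the SVM model.
val_pred_demo = logistic_demo.predict(X_val_demo)
print(val_pred_demo.shape)
```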

#### Evaluate on the validation dataset

In [6]:
# Predict labels for the validation split
svm_pred_val = svm_model.predict(X_val)

# Score the predictions against the true validation labels
svm_val_acc = metrics.accuracy_score(y_val, svm_pred_val)
svm_val_prec = metrics.precision_score(y_val, svm_pred_val)
svm_val_rec = metrics.recall_score(y_val, svm_pred_val)

print(f"Accuracy: {svm_val_acc}")
print(f"Precision: {svm_val_prec}")
print(f"Recall: {svm_val_rec}")

Accuracy: 0.955
Precision: 0.93
Recall: 0.9789473684210527
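
The same three metrics are computed again for the test split, so they could be gathered in one small helper. The sketch below shows one way to do that; `evaluate` is a hypothetical name and the labels are toy values chosen so the scores are easy to verify by hand.

```python
from sklearn import metrics


def evaluate(y_true, y_pred):
    """Collect accuracy, precision, and recall for binary labels in one dict."""
    return {
        "accuracy": metrics.accuracy_score(y_true, y_pred),
        "precision": metrics.precision_score(y_true, y_pred),
        "recall": metrics.recall_score(y_true, y_pred),
    }


# Toy labels (illustrative only): 3 true positives, 1 false positive,
# 1 false negative, 1 true negative.
y_true_demo = [1, 0, 1, 1, 0, 1]
y_pred_demo = [1, 0, 0, 1, 1, 1]

scores = evaluate(y_true_demo, y_pred_demo)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With the notebook's variables, the call would look like `evaluate(y_val, svm_model.predict(X_val))`.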


#### Testing on test set

In [8]:
# Predict labels for the held-out test split and report the same metrics
svm_pred_test = svm_model.predict(X_test)

print("Accuracy:", metrics.accuracy_score(y_test, svm_pred_test))
print("Precision:", metrics.precision_score(y_test, svm_pred_test))
print("Recall:", metrics.recall_score(y_test, svm_pred_test))

Accuracy: 0.965
Precision: 0.9468085106382979
Recall: 0.978021978021978
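
The printed test metrics can be cross-checked by hand from confusion-matrix counts. The counts below are inferred rather than reported by the notebook: assuming 200 test rows, they are one breakdown that reproduces all three printed values.

```python
# Inferred confusion-matrix counts (an assumption, not output from the notebook).
tp, fp, fn, tn = 89, 5, 2, 104

accuracy = (tp + tn) / (tp + fp + fn + tn)  # correct predictions / all predictions
precision = tp / (tp + fp)                  # share of predicted positives that are real
recall = tp / (tp + fn)                     # share of real positives that were found

print(accuracy, precision, recall)
```

Under the same assumption, the validation output is reproduced by 93 true positives, 7 false positives, 2 false negatives, and 98 true negatives.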


#### Saving the models

In [9]:
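# Timestamp (month, day, hour, minute) used to give each saved file a distinct name
timestr = time.strftime("%m%d-%H%M")

# Logistic Regression model saving
# (logistic_model is not defined in the cells shown here; it must be trained beforehand)
filename = 'model/logistic_' + timestr + '.sav'
joblib.dump(logistic_model, filename)

# SVM model saving
filename = 'model/svm_' + timestr + '.sav'
joblib.dump(svm_model, filename)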

['model/svm_0520-1547.sav']
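
A saved model is only useful if it can be loaded back and behaves the same way. The sketch below round-trips a model through `joblib.dump` and `joblib.load` and compares predictions. It uses synthetic data and a temporary directory instead of the notebook's `model/` folder, and the `_demo` names are illustrative.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data standing in for the real splits (assumption).
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)
model_demo = SVC(kernel="rbf").fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svm_demo.sav")
    joblib.dump(model_demo, path)   # serialize the fitted estimator
    restored = joblib.load(path)    # read it back from disk

# The restored model should reproduce the original predictions.
same_predictions = bool((restored.predict(X_demo) == model_demo.predict(X_demo)).all())
print(same_predictions)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.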

### III. Comparison of models
---

Based on the evaluation metrics, the SVM model outperforms the Logistic Regression model. On the test set, the SVM model's accuracy, precision, and recall are higher by 8%, 10.7%, and 5.5%, respectively.
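
A gap such as "higher by 8%" can be read either as percentage points or as a relative increase, so it may help to compute both explicitly. The sketch below does this for two hypothetical score dictionaries. The numbers are placeholders chosen for easy arithmetic, not the notebook's results, and `metric_gaps` is an illustrative helper.

```python
def metric_gaps(model_a: dict, model_b: dict) -> dict:
    """For each shared metric, report A minus B in percentage points
    and as a percentage of B's score."""
    gaps = {}
    for name in model_a:
        if name not in model_b:
            continue
        diff = model_a[name] - model_b[name]
        gaps[name] = {
            "points": round(diff * 100, 2),
            "relative_pct": round(diff / model_b[name] * 100, 2),
        }
    return gaps


# Placeholder scores (hypothetical, for illustration only).
scores_a = {"accuracy": 0.90, "precision": 0.75}
scores_b = {"accuracy": 0.80, "precision": 0.50}

gaps = metric_gaps(scores_a, scores_b)
for name, gap in gaps.items():
    print(name, gap)
```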