# Building Machine Learning Models

At this point, you should have cleaned and prepared data, ready to be ingested by the algorithms you selected to build your model.

Let's get started by importing the libraries we will be using.

In [12]:
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

Load the prepared data.

In [13]:
df = pd.read_csv('training_prepared.csv')
X = df.drop(['Survived'], axis=1)
y = df['Survived']

We will first use the holdout method, where we split the data set into a training set and a test set.  
A common split is 80% / 20%.

In [14]:
train_X, test_X, train_y, test_y = train_test_split(X, y, train_size=0.80, test_size=0.20, stratify=y)
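
To see what `stratify=y` buys us, here is a minimal sketch on made-up data. The arrays `X_demo` and `y_demo` are stand-ins invented for illustration (not the prepared Titanic file), and `random_state=42` is an added assumption to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data (an assumption, not the real file):
# 1000 rows, 4 random features, 38% positive labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 4))
y_demo = np.array([1] * 380 + [0] * 620)

# Same style of call as above, with a fixed random_state for repeatability.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.80, test_size=0.20,
    stratify=y_demo, random_state=42,
)

# With stratification, both subsets keep roughly the same class balance
# as the full data set.
print("positive rate, full: ", y_demo.mean())
print("positive rate, train:", y_tr.mean())
print("positive rate, test: ", y_te.mean())
```

Without `stratify`, the positive rate in a small test set can drift away from the full data set's rate, which makes the holdout score harder to interpret.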


Since we will build multiple models, let's define a function to avoid repetition.

In [15]:
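def classifier(model):
    # Instantiate the estimator class with its default hyperparameters.
    clf = model()
    # Fit on the training split, then print accuracy on the held-out test split.
    clf.fit(train_X, train_y)
    print(clf.score(test_X, test_y))
    return clf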

In [16]:
knc = classifier(KNeighborsClassifier)

0.7653631284916201


In [6]:
svc = classifier(SVC)

0.7988826815642458


As mentioned in the slides, cross-validation lets us use all of our data for both training and testing. The data is split into K folds; in each round, one fold serves as the test subset while the model is trained on the remaining K-1 folds, and the rounds repeat until every fold has been used for testing once.
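
The fold rotation can be sketched by hand to show what the library does for us. This is an illustrative sketch on synthetic data from `make_classification` (an assumption, not the prepared data set), using 5 folds to keep it short. For a classifier with an integer `cv`, scikit-learn uses stratified folds, so the manual loop uses `StratifiedKFold` to match.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (illustrative only).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

# Manual K-fold loop: each fold is the test subset exactly once.
skf = StratifiedKFold(n_splits=5)
manual_scores = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    model = KNeighborsClassifier()
    model.fit(X_demo[train_idx], y_demo[train_idx])      # train on K-1 folds
    manual_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))  # test on 1 fold
manual_scores = np.array(manual_scores)

# The library call performs the same rotation in one line.
library_scores = cross_val_score(KNeighborsClassifier(), X_demo, y_demo, cv=5)

print(manual_scores)
print(library_scores)
```

Both arrays should agree, since the default score for a classifier is accuracy and the folds are built the same way.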

Let's define another function, adding one more argument for the metric.

In [7]:
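def classifier_cv(model, metric):
    clf = model()
    # cross_validate fits clones of clf on each of the 10 folds; clf itself stays unfitted.
    scores = cross_validate(clf, X, y, cv=10, scoring=metric)
    print(scores['test_score'])
    print("Average: ", scores['test_score'].mean())
    # Fit on the full data so the returned estimator can be used for prediction.
    clf.fit(X, y)
    return clf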

In [8]:
knc_cv = classifier_cv(KNeighborsClassifier, 'accuracy')

[0.66666667 0.79775281 0.74157303 0.83146067 0.85393258 0.80898876
 0.83146067 0.83146067 0.87640449 0.78651685]
Average:  0.802621722846442


In [9]:
knc_cv = classifier_cv(KNeighborsClassifier, 'precision')

[0.56097561 0.73529412 0.72       0.73170732 0.81818182 0.81481481
 0.82758621 0.85185185 0.87096774 0.80769231]
Average:  0.7739071785849155


In [10]:
svc_cv = classifier_cv(SVC, 'accuracy')

[0.83333333 0.79775281 0.7752809  0.85393258 0.80898876 0.78651685
 0.78651685 0.79775281 0.86516854 0.7752809 ]
Average:  0.8080524344569288


In [11]:
svc_cv = classifier_cv(SVC, 'precision')

[0.91666667 0.83333333 0.79166667 0.81818182 0.77419355 0.77777778
 0.8        0.83333333 0.92307692 0.77777778]
Average:  0.8246007845201394
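
Each call above repeats the whole cross-validation for a single metric. As a possible shortcut, `cross_validate` also accepts a list of metric names and returns one `test_<name>` entry per metric. The sketch below uses synthetic data from `make_classification` (an assumption, not the prepared data set), so its numbers will not match the outputs above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

# Synthetic binary classification data (illustrative only).
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

# Evaluate several metrics in a single 10-fold run.
metrics = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(SVC(), X_demo, y_demo, cv=10, scoring=metrics)

# With multiple metrics, the result keys are 'test_<metric name>'.
for name in metrics:
    print(name, scores["test_" + name].mean())
```

Running all metrics on the same folds also makes the numbers directly comparable across metrics.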


In [None]:
# y_test_pred=svc_cv.predict(X_test_set)
# y_test_pred