### Running Multiple Models

The other notebooks each focus on a single algorithm for reference purposes, so much of the code is repeated from notebook to notebook. This notebook shows how to run multiple models in one pass, demonstrated on the Titanic data.

Another difference in this notebook is that all the imports occur first. The imports in the other notebooks were placed near their first use so that it would be clear exactly what to import.

In [1]:
# notebook imports

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


In [2]:
# load the data
df = pd.read_csv('data/titanic3.csv', usecols=['pclass', 'survived', 'sex', 'age'])

# convert columns to integer category codes (factors)
df.survived = df.survived.astype('category').cat.codes
df.pclass = df.pclass.astype('category').cat.codes
df.sex = df.sex.astype('category').cat.codes
df.head()

| | pclass | survived | sex | age |
|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 29.0 |
| 1 | 0 | 1 | 1 | 0.9167 |
| 2 | 0 | 0 | 0 | 2.0 |
| 3 | 0 | 0 | 1 | 30.0 |
| 4 | 0 | 0 | 0 | 25.0 |
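The integer codes in the output above come from pandas giving each category its position in the sorted list of distinct values. As a minimal sketch on a small made-up frame (the `toy` data below is hypothetical and not taken from `titanic3.csv`), the mapping can be inspected like this:

```python
import pandas as pd

# Hypothetical toy data, only to illustrate how category codes are assigned
toy = pd.DataFrame({'sex': ['female', 'male', 'female'],
                    'pclass': [3, 1, 2]})

sex_cat = toy['sex'].astype('category')
pclass_cat = toy['pclass'].astype('category')

# categories are sorted; each code is the index of a value in that sorted list
sex_categories = list(sex_cat.cat.categories)
sex_codes = list(sex_cat.cat.codes)
pclass_codes = list(pclass_cat.cat.codes)

print(sex_categories, sex_codes)
print(pclass_codes)
```

In this toy frame, `female` is coded 0 and `male` is coded 1, while classes 1, 2 and 3 are coded 0, 1 and 2. The same rule would apply to the Titanic columns, assuming they hold the same kinds of values.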


In [3]:
# fill missing values

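age_mean = df['age'].mean()
# assign the result back; calling fillna(inplace=True) on an attribute-accessed
# column is chained assignment and may not update df in recent pandas versions
df['age'] = df['age'].fillna(age_mean)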

In [4]:
# train test split

X = df.loc[:, ['pclass', 'age', 'sex']]
y = df.survived

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

### Setting up the models

Now that the data is ready, a list of models is created. Each model will have an associated acronym for later display purposes.

In [5]:
models = [
    ['LR', LogisticRegression()],
    ['NB', BernoulliNB()],
    ['kNN', KNeighborsClassifier(n_neighbors=5)],
    ['DT', DecisionTreeClassifier()]
]

In [6]:
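# map each model's acronym to its accuracy on the test set
clf_acc = {}

for clf, model in models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    clf_acc[clf] = accuracy_score(y_test, y_pred)

# print one line per model: acronym, tab, accuracy
for clf in clf_acc:
    print(clf, '\t', clf_acc[clf])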

LR 	 0.7977099236641222
NB 	 0.7786259541984732
kNN 	 0.7786259541984732
DT 	 0.7786259541984732
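
To compare the models at a glance, the accuracies can be collected into a pandas Series and sorted. The sketch below is self-contained, so it uses synthetic data from `make_classification` in place of the Titanic frame. The `random_state` values and the name `results` are assumptions for illustration, and the numbers it prints will not match the output above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the three Titanic features (an assumption, not the real data)
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = [
    ['LR', LogisticRegression()],
    ['NB', BernoulliNB()],
    ['kNN', KNeighborsClassifier(n_neighbors=5)],
    ['DT', DecisionTreeClassifier(random_state=0)],
]

# fit each model and record its test accuracy under its acronym
clf_acc = {}
for clf, model in models:
    model.fit(X_train, y_train)
    clf_acc[clf] = accuracy_score(y_test, model.predict(X_test))

# sort from most to least accurate for display
results = pd.Series(clf_acc, name='accuracy').sort_values(ascending=False)
print(results)
```

Keeping the acronyms as the Series index means the same `models` list drives both the fitting loop and the display, so adding another classifier only requires one more entry in the list.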
