This notebook provides an overview of some popular models for the supervised machine learning task of binary classification.
From https://en.wikipedia.org/wiki/Binary_classification (accessed 2/17/2021):

> "Binary classification is the task of classifying the elements of a set into two groups on the basis of a classification rule. Typical binary classification problems include:
>
> - Medical testing to determine if a patient has certain disease or not;
> - Quality control in industry, deciding whether a specification has been met;
> - In information retrieval, deciding whether a page should be in the result set of a search or not.
>
> Binary classification is dichotomization applied to a practical situation. In many practical binary classification problems, the two groups are not symmetric, and rather than overall accuracy, the relative proportion of different types of errors is of interest. For example, in medical testing, detecting a disease when it is not present (a false positive) is considered differently from not detecting a disease when it is present (a false negative)."
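
To make the point about asymmetric errors concrete, here is a minimal sketch on made-up labels (not the heart disease data). A rule that always predicts "no disease" scores well on accuracy while missing every positive case. The arrays and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical labels: 95 healthy patients (0) and 5 with the disease (1).
toy_true = np.array([0] * 95 + [1] * 5)

# A trivial rule that always predicts "no disease".
toy_pred = np.zeros_like(toy_true)

toy_accuracy = (toy_true == toy_pred).mean()

# False negatives: disease present but not detected.
false_negatives = int(((toy_true == 1) & (toy_pred == 0)).sum())
# False positives: disease reported but not present.
false_positives = int(((toy_true == 0) & (toy_pred == 1)).sum())

print(f"accuracy = {toy_accuracy:.2f}")
print(f"false negatives = {false_negatives}, false positives = {false_positives}")
```

Overall accuracy alone would hide the fact that this rule never detects the disease.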

We will be using a medical testing dataset to demonstrate techniques. The following code block imports some of the libraries we will be using and initializes bokeh to output to the notebook.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
pd.options.display.float_format = '{:,.4f}'.format
import seaborn as sns
sns.set_style('whitegrid')

from bokeh.layouts import gridplot, column
from bokeh.models import (BasicTicker, ColorBar, ColumnDataSource, 
                          HoverTool, LabelSet, LinearColorMapper, NumeralTickFormatter)
from bokeh.palettes import brewer, RdBu, Reds
from bokeh.plotting import figure, show, output_notebook
from bokeh.transform import transform

%config Completer.use_jedi = False
output_notebook()

The following code block defines a function that we will use to generate a confusion matrix for the various prediction models.

In [2]:
def plot_confusion_matrix(y_true, y_predicted):
    
    from sklearn import metrics
    
    accuracy = np.round(100*(y_true == y_predicted).astype(int).sum()/len(y_predicted), 2)
    
    confusion = pd.DataFrame(metrics.confusion_matrix(y_true, y_predicted))
    confusion.index.name = "True"
    confusion.columns.name = "Predicted"
    confusion = confusion.stack().rename("value").reset_index()
    confusion['True'] = confusion['True'].astype(str)
    confusion['Predicted'] = confusion['Predicted'].astype(str)

    source = ColumnDataSource(confusion)

    values = sorted(list(confusion['True'].unique()))

    palette = brewer['RdBu'][10]
    color_mapper = LinearColorMapper(
        palette = palette, 
    )

    p = figure(
        plot_width = 400, 
        plot_height = 400, 
        title = f'Confusion Matrix: Overall accuracy = {accuracy}%',
        x_range = ['0', '1'], 
        y_range = ['0', '1'],
        x_axis_label = 'Predicted',
        y_axis_label = 'True',
        tools = 'hover', 
        x_axis_location="below",
    )

    p.rect(
        x = 'Predicted', 
        y = 'True', 
        width = 1, 
        height = 1, 
        source = source,
        line_color = 'grey', 
        fill_color = transform('value', color_mapper),
    )

    hover = p.hover.tooltips = [
        ("True", "@{True}"),
        ("Predicted", "@{Predicted}"),
        ("Count", "@value"),
    ]

    p.axis.axis_line_color = None
    p.axis.major_tick_line_color = None
    p.axis.major_label_text_font_size = "14px"
    p.axis.major_label_standoff = 0
    p.xaxis.major_label_orientation = 1.0
    
    labels = LabelSet(x='Predicted', y='True', text='value',
                      render_mode='canvas', text_color = 'white',
                      x_offset = 70, y_offset = 70, source=source,)

    p.add_layout(labels)

    show(p)

We will use a subset of a popular dataset, often used for ML demonstrations, that focuses on predicting heart disease in patients. Details on the dataset can be found at https://archive.ics.uci.edu/ml/datasets/heart+disease.

The following code block reads the sample and creates a `heart_disease` column that captures whether or not heart disease was found in the patient.

In [3]:
data = pd.read_csv('cleveland.csv')

my_filter = data['diagnosis'] == 0
data.loc[my_filter, 'heart_disease'] = 0
data.loc[~my_filter, 'heart_disease'] = 1
data = data.drop(columns = ['diagnosis'])
data.columns = [col.replace(' ', '_') for col in data.columns]

for col in data.columns:
    data[col] = pd.to_numeric(data[col], errors = 'coerce')
    data[col] = data[col].fillna(data[col].median())

data.head()

Unnamed: 0,age,sex,chest_pain,blood_pressure,serum_cholestoral,fasting_blood_sugar,electrocardiographic,max_heart_rate,induced_angina,ST_depression,slope,vessels,thal,heart_disease
0,63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0.0
1,67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,1.0
2,67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,2.0,2.0,7.0,1.0
3,37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,3.0,0.0
4,41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4,1.0,0.0,3.0,0.0


The following code block uses bokeh to generate a heatmap showing the correlation between column values.

In [4]:
correlation_matrix = data.corr()
correlation_matrix = correlation_matrix.unstack().reset_index()
correlation_matrix.columns = ['Variable 1', 'Variable 2', 'Correlation']
variables = sorted(list(correlation_matrix['Variable 1'].unique()))

source = ColumnDataSource(correlation_matrix)

palette = brewer['RdBu'][10]
color_mapper = LinearColorMapper(
    palette = palette, 
    low = -1, 
    high = 1.0,
)

p = figure(
    plot_width = 550, 
    plot_height = 400, 
    title = f'Correlation Matrix',
    x_range = variables, 
    y_range = list(reversed(variables)),
    tools = 'hover', 
    x_axis_location="below",
)

p.rect(
    x = 'Variable 2', 
    y = 'Variable 1', 
    width = 1, 
    height = 1, 
    source = source,
    line_color = 'grey', 
    fill_color = transform('Correlation', color_mapper),
)

color_bar = ColorBar(
    color_mapper = color_mapper, 
    location = (0, 0),
    ticker = BasicTicker(desired_num_ticks = len(palette)),
)
color_bar.formatter = NumeralTickFormatter(format="0.0%")

p.add_layout(color_bar, 'right')

hover = p.hover.tooltips = [
    ("Variable 1", "@{Variable 1}"),
    ("Variable 2", "@{Variable 2}"),
    ("Correlation", "@Correlation{0.2f%}"),
]

p.axis.axis_line_color = None
p.axis.major_tick_line_color = None
p.axis.major_label_text_font_size = "12px"
p.axis.major_label_standoff = 0
p.xaxis.major_label_orientation = 1.0

color_bar.label_standoff = 4
color_bar.major_label_text_align = 'left'
color_bar.major_label_text_font_size = '12px'

show(p)

The following code block defines a list of `features`, the columns on which we want to base our predictions, and a `target` variable, which denotes the column that we aim to predict.

In [5]:
features = [
    'age', 
    'sex', 
    'chest_pain', 
    'blood_pressure', 
    'serum_cholestoral',
    'fasting_blood_sugar', 
    'electrocardiographic', 
    'max_heart_rate',
    'induced_angina', 
    'ST_depression', 
    'slope', 
    'vessels', 
    'thal',
]

target = 'heart_disease'

Before attempting any prediction, let's inspect the values in the various feature columns.

In [6]:
data.describe(percentiles = [0.5]).transpose()

Unnamed: 0,count,mean,std,min,50%,max
age,303.0,54.4389,9.0387,29.0,56.0,77.0
sex,303.0,0.6799,0.4673,0.0,1.0,1.0
chest_pain,303.0,3.1584,0.9601,1.0,3.0,4.0
blood_pressure,303.0,131.6898,17.5997,94.0,130.0,200.0
serum_cholestoral,303.0,246.6931,51.7769,126.0,241.0,564.0
fasting_blood_sugar,303.0,0.1485,0.3562,0.0,0.0,1.0
electrocardiographic,303.0,0.9901,0.995,0.0,1.0,2.0
max_heart_rate,303.0,149.6073,22.875,71.0,153.0,202.0
induced_angina,303.0,0.3267,0.4698,0.0,0.0,1.0
ST_depression,303.0,1.0396,1.1611,0.0,0.8,6.2


As you can see, the values for each feature are not similarly scaled. This can be an issue for some prediction models. The following code block uses the `StandardScaler` class from `scikit-learn` to scale the feature values, which are saved in a DataFrame named `scaled_data`. 

In [7]:
from sklearn import preprocessing

scaler = preprocessing.StandardScaler()
# fit_transform both fits the scaler and returns the scaled values
scaled_data = scaler.fit_transform(data[features])
scaled_data = pd.DataFrame(scaled_data, columns = features)
scaled_data[target] = data[target]

# scaler.mean_
# scaler.scale_

The following code block shows that the scaled data is much more comparable with respect to scaling.

In [8]:
scaled_data.describe(percentiles = [0.5]).transpose()

Unnamed: 0,count,mean,std,min,50%,max
age,303.0,-0.0,1.0017,-2.8191,0.173,2.5002
sex,303.0,-0.0,1.0017,-1.4573,0.6862,0.6862
chest_pain,303.0,-0.0,1.0017,-2.2518,-0.1653,0.878
blood_pressure,303.0,0.0,1.0017,-2.145,-0.0962,3.8877
serum_cholestoral,303.0,0.0,1.0017,-2.3349,-0.1101,6.1385
fasting_blood_sugar,303.0,-0.0,1.0017,-0.4176,-0.4176,2.3944
electrocardiographic,303.0,-0.0,1.0017,-0.9967,0.01,1.0167
max_heart_rate,303.0,-0.0,1.0017,-3.4421,0.1486,2.2942
induced_angina,303.0,-0.0,1.0017,-0.6966,-0.6966,1.4355
ST_depression,303.0,0.0,1.0017,-0.8969,-0.2067,4.4519
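
The `std` column above reads 1.0017 rather than exactly 1. A likely explanation, sketched below on synthetic data, is that `StandardScaler` divides by the population standard deviation (`ddof=0`) while pandas reports the sample standard deviation (`ddof=1`); with 303 rows the ratio is sqrt(303/302), roughly 1.0017. The `toy` frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_rows = 303  # same row count as the heart disease sample

# Synthetic columns on very different scales (illustrative only).
toy = pd.DataFrame({
    "large_scale": rng.normal(250, 50, n_rows),
    "small_scale": rng.normal(1, 0.5, n_rows),
})

# Standardize by hand: subtract the mean, divide by the population std.
manual = (toy - toy.mean()) / toy.std(ddof=0)

# Standardize with scikit-learn.
toy_scaled = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)

max_abs_diff = float((manual - toy_scaled).abs().to_numpy().max())
sample_std = toy_scaled.std(ddof=1)  # the statistic DataFrame.describe() reports
expected_ratio = np.sqrt(n_rows / (n_rows - 1))

print(f"largest difference between manual and StandardScaler: {max_abs_diff:.2e}")
print(sample_std.round(4).to_dict(), round(float(expected_ratio), 4))
```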


Typically, we measure the quality of prediction models using training and testing datasets. Before constructing such datasets, let's look at how the target variable is distributed.

In [9]:
pd.value_counts(scaled_data[target])/pd.value_counts(scaled_data[target]).sum()

0.0000   0.5413
1.0000   0.4587
Name: heart_disease, dtype: float64

In our dataset, we have a relatively even distribution of values in the target column. However, it is good practice to make sure that your training and testing datasets have the same distribution. The following code block demonstrates how we can end up with somewhat unrepresentative train/test splits if we are not careful: it prints every random seed for which class 1 outnumbers class 0 in the training split.

In [10]:
from sklearn.model_selection import train_test_split

for i in range(1000):
    train, test = train_test_split(scaled_data, random_state = i)
    temp = pd.value_counts(train[target])/pd.value_counts(train[target]).sum()
    if temp[0] < temp[1]:
        print(i)

297
736
790
809
839
952


We can use the `stratify` argument in scikit-learn's `train_test_split` function to ensure our splits are representative of the balance we observe in the data.

In [11]:
for i in range(1000):
    train, test = train_test_split(scaled_data, stratify = scaled_data[target], random_state = i)
    temp = pd.value_counts(train[target])/pd.value_counts(train[target]).sum()
    if temp[0] < temp[1]:
        print(i)

In [12]:
train, test = train_test_split(scaled_data, stratify = scaled_data[target], random_state = 809)
pd.value_counts(train[target])/pd.value_counts(train[target]).sum()

0.0000   0.5419
1.0000   0.4581
Name: heart_disease, dtype: float64
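
As a small, self-contained illustration of what `stratify` does, the sketch below splits a made-up target with a 60/40 class balance and checks the class shares in each split. The `toy_*` and `strat_*` names are assumptions chosen so that the notebook's own `train` and `test` objects are not overwritten.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical target with a 60/40 class balance.
toy_y = np.array([0] * 60 + [1] * 40)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)

# Stratifying on the target keeps the class shares equal in both splits.
_, _, strat_train_y, strat_test_y = train_test_split(
    toy_X, toy_y, test_size=0.25, stratify=toy_y, random_state=0
)

print(f"share of class 1 in train: {strat_train_y.mean():.2f}")
print(f"share of class 1 in test:  {strat_test_y.mean():.2f}")
```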

The following code block creates our various train/test objects.

In [13]:
train, test = train_test_split(scaled_data, stratify = scaled_data[target], random_state = 809)
x_train, y_train = train[features], train[target]
x_test, y_test = test[features], test[target]

# Question: Would linear regression work?

The following code block uses the `statsmodels` linear regression implementation to understand the effects of the features on the target.

In [14]:
import statsmodels.formula.api as smf

formula = f"{target} ~ {' + '.join(features)}"

model = smf.ols(
    formula = formula, 
    data = train)

fit_model = model.fit()

fit_model.summary()

0,1,2,3
Dep. Variable:,heart_disease,R-squared:,0.602
Model:,OLS,Adj. R-squared:,0.578
Method:,Least Squares,F-statistic:,24.81
Date:,"Mon, 22 Feb 2021",Prob (F-statistic):,8.3e-36
Time:,14:21:03,Log-Likelihood:,-59.302
No. Observations:,227,AIC:,146.6
Df Residuals:,213,BIC:,194.6
Df Model:,13,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.4602,0.022,21.178,0.000,0.417,0.503
age,-0.0017,0.026,-0.063,0.950,-0.054,0.050
sex,0.0754,0.025,3.076,0.002,0.027,0.124
chest_pain,0.0954,0.025,3.811,0.000,0.046,0.145
blood_pressure,0.0325,0.025,1.308,0.192,-0.016,0.082
serum_cholestoral,0.0165,0.025,0.666,0.506,-0.032,0.065
fasting_blood_sugar,-0.0210,0.023,-0.912,0.363,-0.066,0.024
electrocardiographic,0.0116,0.022,0.518,0.605,-0.033,0.056
max_heart_rate,-0.0495,0.027,-1.820,0.070,-0.103,0.004

0,1,2,3
Omnibus:,6.631,Durbin-Watson:,1.793
Prob(Omnibus):,0.036,Jarque-Bera (JB):,6.611
Skew:,0.33,Prob(JB):,0.0367
Kurtosis:,3.513,Cond. No.,3.02


The following code block shows how we can use the fit model to make predictions.

In [15]:
predictions = fit_model.predict(test[features])

predictions = (predictions > 0.5).astype(int)
 
predictions

92     1
190    0
215    0
294    0
35     0
      ..
280    1
191    1
213    1
126    1
268    0
Length: 76, dtype: int32

The following code block compares the predictions on our test data to the actual target values in our test data to generate a `confusion matrix`. This matrix allows us to inspect the accuracy of our model with respect to true/false positives and true/false negatives.

In [16]:
predictions = fit_model.predict(test[features])

predictions = (predictions > 0.5).astype(int)

plot_confusion_matrix(y_test.values, predictions.values)
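
The four cells of a confusion matrix can also be read off numerically. The following sketch uses made-up labels (not the notebook's test set) to show how the counts returned by scikit-learn's `confusion_matrix` map to true/false positives and negatives, and how sensitivity and specificity follow from them.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (1 = disease present).
toy_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
toy_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# For binary labels, rows are true classes and columns are predicted classes,
# so ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(toy_true, toy_pred).ravel()

sensitivity = tp / (tp + fn)  # share of actual positives that were detected
specificity = tn / (tn + fp)  # share of actual negatives that were cleared

print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}")
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```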

# What about other, more *sophisticated* approaches?

## Logistic Regression

https://en.wikipedia.org/wiki/Logistic_regression

In [17]:
from sklearn.linear_model import LogisticRegression

In [18]:
clf = LogisticRegression(random_state=0, solver = 'newton-cg')

clf = clf.fit(x_train, y_train)

clf.score(x_test, y_test)

0.7894736842105263

In [19]:
predictions = clf.predict(x_test)
plot_confusion_matrix(y_test, predictions)
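
Unlike the thresholded linear regression used earlier, logistic regression passes its linear score through the sigmoid function, so its outputs can be read as probabilities between 0 and 1. The sketch below checks this on synthetic data generated with `make_classification`; the `toy_*` names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only).
toy_X, toy_y = make_classification(n_samples=200, n_features=5, random_state=0)
toy_clf = LogisticRegression().fit(toy_X, toy_y)

# Linear score: intercept + coefficients . features
linear_score = toy_clf.decision_function(toy_X)

# Applying the sigmoid by hand should reproduce predict_proba for class 1.
sigmoid_of_score = 1.0 / (1.0 + np.exp(-linear_score))
proba_class_1 = toy_clf.predict_proba(toy_X)[:, 1]

print(np.allclose(sigmoid_of_score, proba_class_1))
print(proba_class_1.min(), proba_class_1.max())
```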

## Decision Trees

https://en.wikipedia.org/wiki/Decision_tree_learning

In [20]:
from sklearn import tree

In [21]:
clf = tree.DecisionTreeClassifier()

clf = clf.fit(x_train, y_train)

clf.score(x_test, y_test)

0.6578947368421053

In [22]:
for max_depth in range(1, 10):
    clf = tree.DecisionTreeClassifier(max_depth = max_depth, random_state = 0)

    clf = clf.fit(x_train, y_train)
    
    score = clf.score(x_test, y_test)
    
    print(f'Max depth = {max_depth}: {score}') 

Max depth = 1: 0.6973684210526315
Max depth = 2: 0.6842105263157895
Max depth = 3: 0.75
Max depth = 4: 0.7236842105263158
Max depth = 5: 0.6842105263157895
Max depth = 6: 0.6710526315789473
Max depth = 7: 0.6842105263157895
Max depth = 8: 0.7105263157894737
Max depth = 9: 0.6973684210526315


In [23]:
clf = tree.DecisionTreeClassifier(max_depth = 3, random_state = 0)

clf = clf.fit(x_train, y_train)

score = clf.score(x_test, y_test)

print(tree.export_text(clf, feature_names = features))

|--- thal <= -0.12
|   |--- chest_pain <= 0.36
|   |   |--- ST_depression <= 1.26
|   |   |   |--- class: 0.0
|   |   |--- ST_depression >  1.26
|   |   |   |--- class: 1.0
|   |--- chest_pain >  0.36
|   |   |--- vessels <= -0.18
|   |   |   |--- class: 0.0
|   |   |--- vessels >  -0.18
|   |   |   |--- class: 1.0
|--- thal >  -0.12
|   |--- chest_pain <= -0.69
|   |   |--- vessels <= -0.18
|   |   |   |--- class: 0.0
|   |   |--- vessels >  -0.18
|   |   |   |--- class: 1.0
|   |--- chest_pain >  -0.69
|   |   |--- ST_depression <= -0.29
|   |   |   |--- class: 1.0
|   |   |--- ST_depression >  -0.29
|   |   |   |--- class: 1.0
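
The split thresholds printed above are in standardized units because the tree was fit on `scaled_data`. Here is a minimal sketch of mapping such a threshold back to the original scale using the `mean_` and `scale_` attributes of a fitted `StandardScaler`. The toy column and the threshold value are made up for illustration and are not taken from the heart disease data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single feature column in its original units.
toy_column = np.array([[3.0], [3.0], [6.0], [7.0], [7.0], [3.0]])
toy_scaler = StandardScaler().fit(toy_column)

scaled_threshold = -0.12  # hypothetical threshold in standardized units

# Invert z = (x - mean) / scale  to get  x = z * scale + mean.
original_threshold = scaled_threshold * toy_scaler.scale_[0] + toy_scaler.mean_[0]

# Round trip: transforming the original-unit threshold recovers the scaled one.
round_trip = toy_scaler.transform([[original_threshold]])[0, 0]

print(f"threshold in original units: {original_threshold:.3f}")
```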



In [24]:
predictions = clf.predict(x_test)
plot_confusion_matrix(y_test, predictions)

## Random Forest

https://en.wikipedia.org/wiki/Random_forest

In [25]:
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state = 0)

clf = clf.fit(x_train, y_train)

clf.score(x_test, y_test)

0.7368421052631579

In [26]:
y_hat = clf.predict(x_test)
plot_confusion_matrix(y_test, y_hat)

## Gradient Boosted Trees

https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting

In [27]:
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(random_state = 0)

clf = clf.fit(x_train, y_train)

clf.score(x_test, y_test)

0.7368421052631579

In [28]:
predictions = clf.predict(x_test)
plot_confusion_matrix(y_test, predictions)

## AdaBoost

https://en.wikipedia.org/wiki/AdaBoost

In [29]:
from sklearn.ensemble import AdaBoostClassifier

clf = AdaBoostClassifier(random_state = 0)

clf = clf.fit(x_train, y_train)

clf.score(x_test, y_test)

0.75

In [30]:
predictions = clf.predict(x_test)
plot_confusion_matrix(y_test, predictions)

## Start here

In [31]:
clf = AdaBoostClassifier(n_estimators = 2500, 
                         learning_rate = 0.1, 
                         random_state = 0)

clf = clf.fit(x_train, y_train)

clf.score(x_test, y_test)

0.7631578947368421

In [32]:
predictions = clf.predict(x_test)
plot_confusion_matrix(y_test, predictions)

# Hyperparameter Tuning

In [33]:
from sklearn.model_selection import GridSearchCV

In [34]:
params = {
    'n_estimators': [5, 10, 50, 100],
    'learning_rate': [0.001, 0.01, 0.1, 1, 10],
}

In [35]:
adaboost = AdaBoostClassifier(random_state = 0)

clf = GridSearchCV(adaboost, params, error_score=0)
search = clf.fit(x_train, y_train)
best_params = search.best_params_
best_params

{'learning_rate': 0.01, 'n_estimators': 100}
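
Conceptually, a grid search scores every parameter combination with cross-validation and keeps the best one. The sketch below spells that loop out by hand on synthetic data and compares the result with `GridSearchCV`. The small grid, the `toy_*` names, and the explicit `cv=5` are assumptions made to keep the example quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score

# Synthetic data and a deliberately small grid (illustrative only).
toy_X, toy_y = make_classification(n_samples=150, n_features=6, random_state=0)
toy_params = {"n_estimators": [5, 20], "learning_rate": [0.1, 1.0]}

# Manual search: score every parameter combination with 5-fold CV.
manual_best_score, manual_best_params = -np.inf, None
for candidate in ParameterGrid(toy_params):
    toy_model = AdaBoostClassifier(random_state=0, **candidate)
    mean_score = cross_val_score(toy_model, toy_X, toy_y, cv=5).mean()
    if mean_score > manual_best_score:
        manual_best_score, manual_best_params = mean_score, candidate

# The same search using GridSearchCV.
toy_search = GridSearchCV(AdaBoostClassifier(random_state=0), toy_params, cv=5)
toy_search.fit(toy_X, toy_y)

print(manual_best_params, round(manual_best_score, 4))
print(toy_search.best_params_, round(toy_search.best_score_, 4))
```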

In [36]:
clf = AdaBoostClassifier(random_state = 0, **best_params)
clf = clf.fit(x_train, y_train)
clf.score(x_test, y_test)

0.7894736842105263

In [37]:
predictions = clf.predict(x_test)
plot_confusion_matrix(y_test, predictions)

In [38]:
summary = pd.DataFrame(search.cv_results_)
param_columns = [col for col in summary.columns if col.startswith('param') and (col != 'params')]

metric_col = 'mean_test_score'
summary = summary[param_columns + [metric_col]]
summary = summary.dropna()
for col in summary.columns:
    summary[col] = pd.to_numeric(summary[col])

formula = f"{metric_col} ~ {'*'.join(param_columns)}"

model = smf.ols(
    formula = formula, 
    data = summary)

fit_model = model.fit()

fit_model.summary()

0,1,2,3
Dep. Variable:,mean_test_score,R-squared:,0.786
Model:,OLS,Adj. R-squared:,0.746
Method:,Least Squares,F-statistic:,19.63
Date:,"Mon, 22 Feb 2021",Prob (F-statistic):,1.3e-05
Time:,14:56:22,Log-Likelihood:,19.31
No. Observations:,20,AIC:,-30.62
Df Residuals:,16,BIC:,-26.64
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.7988,0.039,20.465,0.000,0.716,0.882
param_learning_rate,-0.0289,0.009,-3.325,0.004,-0.047,-0.010
param_n_estimators,0.0004,0.001,0.537,0.598,-0.001,0.002
param_learning_rate:param_n_estimators,-0.0003,0.000,-2.233,0.040,-0.001,-1.75e-05

0,1,2,3
Omnibus:,4.069,Durbin-Watson:,1.875
Prob(Omnibus):,0.131,Jarque-Bera (JB):,2.368
Skew:,0.186,Prob(JB):,0.306
Kurtosis:,4.644,Cond. No.,433.0


In [39]:
run_cell = True
int_step = 4
float_delta = 0.1
float_steps = 8

if run_cell:
    params = {
        'n_estimators': [5, 10, 50, 100],
        'learning_rate': [0.001, 0.01, 0.1, 1, 10],
    }

    adaboost = AdaBoostClassifier(random_state = 0)

    print('Starting coarse search')
    clf = GridSearchCV(adaboost, params)
    search = clf.fit(x_train, y_train)
    print(f'Best params from coarse search: {search.best_params_}')

    fine_params = {}
    for param in params:
        if isinstance(search.best_params_[param], int):
            min_val = search.best_params_[param] - int_step
            max_val = search.best_params_[param] + int_step + 1
            fine_params[param] = [i for i in range(min_val, max_val)]
        else:
            min_val = search.best_params_[param]*(1 - float_delta)
            max_val = search.best_params_[param]*(1 + float_delta)
            fine_params[param] = np.linspace(min_val, max_val, float_steps)

    print('Starting fine search')
    clf = GridSearchCV(adaboost, fine_params, error_score=0)
    search = clf.fit(x_train, y_train)
    print(f'Best params from fine search: {search.best_params_}')

    clf = AdaBoostClassifier(random_state = 0, **search.best_params_)
    clf = clf.fit(x_train, y_train)
    clf.score(x_test, y_test)

    predictions = clf.predict(x_test)
    plot_confusion_matrix(y_test, predictions)

Starting coarse search
Best params from coarse search: {'learning_rate': 0.01, 'n_estimators': 100}
Starting fine search
Best params from fine search: {'learning_rate': 0.010714285714285716, 'n_estimators': 104}
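
To see which candidate values the fine search considers, the sketch below rebuilds the fine grid in isolation. It takes the best parameters printed above as its starting point and reuses the cell's step settings; `coarse_best` and `fine_grid` are names introduced here for illustration.

```python
import numpy as np

# Best parameters reported by the first search in the output above.
coarse_best = {"learning_rate": 0.01, "n_estimators": 100}
int_step, float_delta, float_steps = 4, 0.1, 8

fine_grid = {}
for name, value in coarse_best.items():
    if isinstance(value, int):
        # Integer parameters: every integer within +/- int_step of the best value.
        fine_grid[name] = list(range(value - int_step, value + int_step + 1))
    else:
        # Float parameters: float_steps evenly spaced values within +/- 10%.
        fine_grid[name] = np.linspace(
            value * (1 - float_delta), value * (1 + float_delta), float_steps
        )

n_candidates = len(fine_grid["n_estimators"]) * len(fine_grid["learning_rate"])

print(fine_grid["n_estimators"])
print(fine_grid["learning_rate"])
print(f"{n_candidates} parameter combinations")
```

The fine search's reported learning rate of about 0.0107 is one of the eight evenly spaced values in this grid, and 104 is the upper end of the integer range.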


## Neural Networks

https://en.wikipedia.org/wiki/Artificial_neural_network

https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/

In [40]:
train, test = train_test_split(scaled_data, stratify = scaled_data[target], random_state = 809)
x_train, y_train = train[features], train[target]
x_test, y_test = test[features], test[target]

In [41]:
x_train.shape

(227, 13)

In [42]:
import tensorflow as tf

from keras.models import Sequential
from keras.layers import Dense, Dropout, Input

ModuleNotFoundError: No module named 'tensorflow'

In [None]:
def plot_history(tf_history):

    width = 12
    height = width*0.4
    fig, ax = plt.subplots(1, 2, figsize = (width, height))

    # Use the function argument rather than a global `history` variable
    index = [i for i, _ in enumerate(tf_history.history['loss'], 1)]

    ax[0].plot(index, tf_history.history['loss'], label = 'Loss')
    ax[0].plot(index, tf_history.history['val_loss'], label = 'Validation Loss')
    ax[0].legend(loc = 0)
    ax[0].set_xlabel('Epoch')
    ax[0].set_ylabel('Value')

    ax[1].plot(index, tf_history.history['accuracy'], label = 'Accuracy')
    ax[1].plot(index, tf_history.history['val_accuracy'], label = 'Validation Accuracy')
    ax[1].legend(loc = 0)
    ax[1].set_xlabel('Epoch')
    ax[1].set_ylabel('Value')
    plt.show()

#### Case 1: Inputs mapped to a single output layer. MSE loss.

In [None]:
tf.random.set_seed(0)

model = Sequential()
model.add(Input(shape = (13,)))  # 13 input features
model.add(Dense(1))
model.compile(loss = 'mse', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 10, 
          batch_size = 10, 
          verbose = 1,
          validation_split = 0.2);

In [None]:
model.summary()

In [None]:
plot_history(history)

It seems that we stopped training prematurely.

#### Case 2: Adding epochs

In [None]:
tf.random.set_seed(0)

model = Sequential()
model.add(Input(shape = (13,)))  # 13 input features
model.add(Dense(1))
model.compile(loss = 'mse', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 100, 
          batch_size = 10, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)

In [None]:
predictions = (model.predict(test[features].values).flatten() > 0.5).astype(int)

plot_confusion_matrix(y_test, predictions)

#### Case 3: An *almost* Deep Neural Net

In [None]:
tf.random.set_seed(0)

model = Sequential()
model.add(Dense(6))
model.add(Dense(1))
model.compile(loss = 'mse', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 100, 
          batch_size = 10, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)

In [None]:
model.summary()

#### Case 4: A Deep neural net

In [None]:
tf.random.set_seed(0)

model = Sequential()
model.add(Dense(9))
model.add(Dense(6))
model.add(Dense(3))
model.add(Dense(1))
model.compile(loss = 'mse', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 100, 
          batch_size = 10, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)

In [None]:
model.summary()
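
Cases 3 and 4 stack `Dense` layers without specifying an activation, and Keras `Dense` layers apply no activation by default, so each layer is an affine map. The following NumPy sketch, using made-up weights rather than anything learned by the models above, illustrates that two such layers compose into a single affine map. Under that assumption, depth alone would not be expected to add expressive power over Case 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a 13 -> 6 -> 1 stack of linear layers.
W1, b1 = rng.normal(size=(13, 6)), rng.normal(size=6)
W2, b2 = rng.normal(size=(6, 1)), rng.normal(size=1)

toy_inputs = rng.normal(size=(5, 13))

# Two linear layers applied in sequence.
two_layer_output = (toy_inputs @ W1 + b1) @ W2 + b2

# The equivalent single linear layer.
W_combined = W1 @ W2
b_combined = b1 @ W2 + b2
one_layer_output = toy_inputs @ W_combined + b_combined

print(np.allclose(two_layer_output, one_layer_output))
```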

#### Case 5: Rectified Linear Unit (ReLU) Activation

https://en.wikipedia.org/wiki/Rectifier_(neural_networks)

In [None]:
tf.random.set_seed(0)

model = Sequential()
model.add(Dense(6, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(loss = 'binary_crossentropy', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 100, 
          batch_size = 20, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)
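
For reference, the rectifier is defined as f(x) = max(0, x). The sketch below evaluates it next to the sigmoid on a few made-up scores. The sigmoid is included only as added context: it is the activation more commonly placed on a single output unit trained with `binary_crossentropy` because it maps scores into the interval (0, 1), whereas ReLU outputs are unbounded above. The cells in this notebook use ReLU on the output layer.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic sigmoid: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

scores = np.array([-2.0, 0.0, 0.5, 3.0])  # hypothetical pre-activation values

relu_out = relu(scores)
sigmoid_out = sigmoid(scores)

print(relu_out)
print(sigmoid_out.round(3))
```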

#### Case 6: ReLU with additional epochs

In [None]:
tf.random.set_seed(0)

model = Sequential()
model.add(Dense(6, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(loss = 'binary_crossentropy', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 500, 
          batch_size = 10, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)

#### Case 7: ReLU with additional epochs and Dropout

In [None]:
tf.random.set_seed(0)

model = Sequential()
model.add(Dense(6, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='relu'))
model.compile(loss = 'binary_crossentropy', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 500, 
          batch_size = 10, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)

In [None]:
predictions = (model.predict(test[features].values).flatten() > 0.5).astype(int)

plot_confusion_matrix(y_test, predictions)

# What?

#### Case 8: ReLU with additional epochs and Dropout, different data

In [None]:
train, test = train_test_split(scaled_data, stratify = scaled_data[target], random_state = 0)
x_train, y_train = train[features], train[target]
x_test, y_test = test[features], test[target]

tf.random.set_seed(0)

model = Sequential()
model.add(Dense(6, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='relu'))
model.compile(loss = 'binary_crossentropy', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 500, 
          batch_size = 10, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)

#### Case 9: ReLU with additional epochs and increased Dropout (0.5), different data

In [None]:
train, test = train_test_split(scaled_data, stratify = scaled_data[target], random_state = 0)
x_train, y_train = train[features], train[target]
x_test, y_test = test[features], test[target]

tf.random.set_seed(0)

model = Sequential()
model.add(Dense(6, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='relu'))
model.compile(loss = 'binary_crossentropy', 
              optimizer = 'adam', 
              metrics = ['accuracy'])

X, y = train[features].values, train[target].values
history = model.fit(X, y, 
          epochs = 500, 
          batch_size = 10, 
          verbose = 0,
          validation_split = 0.2);

plot_history(history)

In [None]:
predictions = (model.predict(test[features].values).flatten() > 0.5).astype(int)

plot_confusion_matrix(y_test, predictions)