### Codio Activity 22.9: End to End Classification with `keras`

**Expected Time = 60 minutes**

**Total Points = 20**

This activity uses a dataset with a mix of feature types to build a binary classification model.  You will use familiar scikit-learn preprocessing operations to handle categorical features, then pass the transformed data to a `keras` model.  After you build a basic network, further examples demonstrate regularization strategies for neural networks.

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)
- [Problem 5](#-Problem-5)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import warnings
warnings.filterwarnings('ignore')

from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

### The Data

The dataset contains information gathered from the United States Census about individuals, including whether or not each individual earned over \$50,000 per year. ([more info here](https://archive.ics.uci.edu/ml/datasets/Adult))

In [None]:
df = pd.read_csv('data/adult.csv')

In [None]:
df.head()

In [None]:
df.info()

In [None]:
X = df.drop('income', axis = 1)
y = df['income']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

[Back to top](#-Index)

### Problem 1

#### Preparing the Data

**10 Points**

Below, use `make_column_transformer` to create `transformer`, which applies `OneHotEncoder` with `drop = 'if_binary'` to the categorical features and `StandardScaler` to the remaining numeric features.  Fit the transformer on the training data, and assign the transformed feature arrays to `X_train_num` and `X_test_num`.

For the target, assign `y_train_num` and `y_test_num` as binary arrays where 1 represents observations earning over \$50,000 per year and 0 represents all others.  Note that as long as you have a purely numeric array, the `keras` model will accept this as input.  While you can use a sparse array, here you should convert the sparse output to a dense array with the `.toarray()` method.
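The preprocessing described above can be sketched on a small, made-up DataFrame. The `toy` frame, its values, and the `toy_*` names are hypothetical and only illustrate the calls. One assumption to note: a column transformer returns a sparse matrix only when the combined output is sparse enough, so this sketch checks with `scipy.sparse.issparse` before calling `.toarray()`, since a tiny frame like this one typically comes back dense.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one binary categorical, one multi-level categorical, one numeric.
toy = pd.DataFrame({
    'sex': ['Male', 'Female', 'Female', 'Male'],
    'workclass': ['Private', 'State-gov', 'Private', 'Self-emp'],
    'age': [25, 38, 52, 41],
})
toy_income = pd.Series(['<=50K', '>50K', '<=50K', '>50K'])

# One-hot encode the categorical columns; scale whatever remains.
toy_transformer = make_column_transformer(
    (OneHotEncoder(drop='if_binary'), ['sex', 'workclass']),
    remainder=StandardScaler(),
)
encoded = toy_transformer.fit_transform(toy)

# Convert to a dense array only if the result is actually sparse.
if sparse.issparse(encoded):
    encoded = encoded.toarray()

# Binary target: 0 for '<=50K', 1 otherwise.
toy_target = np.where(toy_income == '<=50K', 0, 1)

print(encoded.shape)   # 1 column for sex, 3 for workclass, 1 for age
print(toy_target)
```

With `drop='if_binary'`, the two-level `sex` column becomes a single 0/1 column, while the three-level `workclass` column keeps all three indicator columns.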

In [None]:
df.select_dtypes('object').columns[:-1] #categorical features

In [None]:
y.unique() #unique values of y

In [None]:
y.value_counts(normalize=True) #baseline for model

In [None]:
### GRADED
transformer = ''
X_train_num = ''
X_test_num = ''

y_train_num = ''
y_test_num = ''
    
### BEGIN SOLUTION
transformer = make_column_transformer((OneHotEncoder(drop = 'if_binary'), ['workclass', 'education', 'marital.status', 'occupation',
       'relationship', 'race', 'sex', 'native.country']),
                                     remainder = StandardScaler())
X_train_num = transformer.fit_transform(X_train).toarray()
X_test_num = transformer.transform(X_test).toarray()

y_train_num = np.where(y_train == '<=50K', 0, 1)
y_test_num = np.where(y_test == '<=50K', 0, 1)
### END SOLUTION

### ANSWER CHECK
print(np.unique(y_train_num))
print(type(X_train_num))

In [None]:
### BEGIN HIDDEN TESTS
transformer_ = make_column_transformer((OneHotEncoder(drop = 'if_binary'), ['workclass', 'education', 'marital.status', 'occupation',
       'relationship', 'race', 'sex', 'native.country']),
                                     remainder = StandardScaler())
X_train_num_ = transformer_.fit_transform(X_train).toarray()
X_test_num_ = transformer_.transform(X_test).toarray()

y_train_num_ = np.where(y_train == '<=50K', 0, 1)
y_test_num_ = np.where(y_test == '<=50K', 0, 1)
#
#
#
np.testing.assert_array_equal(X_train_num, X_train_num_)
np.testing.assert_array_equal(y_train_num, y_train_num_)
### END HIDDEN TESTS

[Back to top](#-Index)

### Problem 2

#### Building the model

**10 Points**

Now, use `keras` to build a model with a single hidden layer of 50 units and a single-unit output layer, naming the model `model1`.  Compile the model, then fit it for 10 training epochs, passing the test data as `validation_data`, and assign the result to `history1`.  Note the accuracy on the validation set.
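To make the architecture concrete, here is a minimal NumPy sketch of what one forward pass through such a network computes: a 50-unit hidden layer with a ReLU activation, a one-unit sigmoid output, and the binary cross-entropy loss. It is not the `keras` implementation; the toy inputs, the random weights, and every variable name are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy batch: 4 observations, 3 numeric features.
X_toy = rng.normal(size=(4, 3))
y_toy = np.array([0, 1, 1, 0])

# Randomly initialized parameters for a 50-unit hidden layer and a 1-unit output.
W_hidden = rng.normal(scale=0.1, size=(3, 50))
b_hidden = np.zeros(50)
W_out = rng.normal(scale=0.1, size=(50, 1))
b_out = np.zeros(1)

# Hidden layer with ReLU, then a sigmoid output giving P(y = 1).
hidden = np.maximum(0.0, X_toy @ W_hidden + b_hidden)
logits = (hidden @ W_out + b_out).ravel()
probs = 1.0 / (1.0 + np.exp(-logits))

# Binary cross-entropy and accuracy at a 0.5 threshold.
bce = -np.mean(y_toy * np.log(probs) + (1 - y_toy) * np.log(1 - probs))
accuracy = np.mean((probs >= 0.5).astype(int) == y_toy)

print(probs.shape, float(bce), float(accuracy))
```

Training adjusts the weights and biases to reduce the loss; each epoch is one full pass over the training data.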

NOTE: This question is computationally expensive and may take a minute or two to complete calculations.

In [None]:
### GRADED
model1 = ''
history1 = ''
    
### BEGIN SOLUTION
model1 = Sequential()
model1.add(Dense(50, activation = 'relu'))
model1.add(Dense(1, activation = 'sigmoid'))
model1.compile(loss = 'bce', metrics = ['acc'])
history1 = model1.fit(X_train_num, y_train_num, validation_data = (X_test_num, y_test_num),
                     epochs = 10, verbose = 0)

### END SOLUTION

### ANSWER CHECK
print(history1.history['val_acc'][-1])

In [None]:
### BEGIN HIDDEN TESTS
model1_ = Sequential()
model1_.add(Dense(50, activation = 'relu'))
model1_.add(Dense(1, activation = 'sigmoid'))
model1_.compile(loss = 'bce', metrics = ['acc'])
history1_ = model1_.fit(X_train_num, y_train_num, validation_data = (X_test_num, y_test_num),
                     epochs = 10, verbose = 0)
#
#
#
assert type(history1) == type(history1_)
### END HIDDEN TESTS

### Exploring Regularization

The questions below introduce the notion of regularization in your `keras` models.  They are not graded, but are meant to offer an exploratory introduction to using regularization in a neural network.  These techniques may or may not improve our model for income, but they are important components of some advanced architectures, including those coming in the next module.

[Back to top](#-Index)

### Problem 3

#### Regularization in the Network

Similar to what we have seen in earlier models, there are some options for regularization in your neural network.  Below, `model2` uses `kernel_regularizer = 'l1'` to apply L1 regularization to the weights of the hidden layer.

Explore the other regularization options built in to the `Dense` layer, and see if applying either `l1` or `l2` regularization to the kernel and bias elements of the hidden layer improves performance on the validation data.

Assign your new model as `model2` below, and the fit information as `history2` below.
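As a rough illustration of what these penalties add to the loss, the NumPy sketch below computes an L1 and an L2 penalty for a small, made-up weight matrix. The factor of 0.01 is assumed here to match the default used when the regularizer is given as the string `'l1'` or `'l2'`; check the `keras` documentation for your version. The matrix `W` and all names are hypothetical.

```python
import numpy as np

# Hypothetical weight matrix standing in for a layer's kernel.
W = np.array([[1.0, -2.0],
              [0.5,  0.0]])

factor = 0.01  # assumed regularization strength

# L1 penalizes the sum of absolute values, which pushes weights toward exactly zero.
l1_penalty = factor * np.sum(np.abs(W))

# L2 penalizes the sum of squares, which shrinks large weights most strongly.
l2_penalty = factor * np.sum(W ** 2)

print(l1_penalty, l2_penalty)
```

The penalty is added to the training loss, so the optimizer trades off fit against weight size. In the `Dense` layer, `kernel_regularizer` applies the penalty to the weights and `bias_regularizer` applies it to the bias terms.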

In [None]:
model2 = Sequential()
model2.add(Dense(50, activation = 'relu', kernel_regularizer = 'l1'))

model2.add(Dense(1, activation = 'sigmoid'))
model2.compile(loss = 'bce', metrics = ['acc'])
history2 = model2.fit(X_train_num, y_train_num, validation_data=(X_test_num, y_test_num),
           epochs = 10, verbose = 0)

[Back to top](#-Index)

### Problem 4

#### Dropout

An alternative option for regularization is the `Dropout` layer.  This layer randomly "drops out" nodes in a given layer.  From the `keras` documentation:

```
The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.
```

To use this, the code below creates a model in which each unit of the hidden layer is dropped with probability 0.2 at each training step, so roughly 20% of the hidden units are zeroed for any given batch.  Experiment with this model and incorporate other regularization strategies from above to see if you can improve the performance of predictions on the validation data.
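The behavior quoted from the documentation can be sketched in NumPy as follows. This is an illustration of the idea, not the `keras` implementation; the all-ones `activations` array and the variable names are assumptions chosen to make the scaling easy to see.

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 0.2

# Hypothetical hidden-layer activations: 1000 observations, 50 units, all equal to 1.
activations = np.ones((1000, 50))

# Keep each unit with probability 1 - rate; zero it otherwise.
keep_mask = rng.random(activations.shape) >= rate

# Scale the surviving units by 1 / (1 - rate) so the expected sum is unchanged.
dropped = np.where(keep_mask, activations / (1 - rate), 0.0)

print(round(1 - keep_mask.mean(), 3))  # fraction of units dropped, close to rate
print(round(dropped.mean(), 3))        # mean activation, close to the original 1.0
```

Dropout is only active during training; at prediction time the layer passes its inputs through unchanged.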

In [None]:
from tensorflow.keras.layers import Dropout

In [None]:
model3 = Sequential()
model3.add(Dense(50, activation = 'relu'))
model3.add(Dropout(0.2))
model3.add(Dense(1, activation = 'sigmoid'))
model3.compile(loss = 'bce', metrics = ['acc'])
history3 = model3.fit(X_train_num, y_train_num, validation_data=(X_test_num, y_test_num),
           epochs = 10, verbose = 0)

[Back to top](#-Index)

### Problem 5

#### Early Stopping

A third option for regularization is *Early Stopping*.  Here, you can set the model to stop training when overfitting begins.  Overfitting shows up as a growing gap between performance on the training data and performance on the validation data: the training loss keeps decreasing while the validation loss stalls or starts to rise.

To use this in `keras` you use a *callback*.  This is an object that is created and passed as an argument during the fitting of the model.  Below, the code demonstrates the use of the `EarlyStopping` callback from `keras`. Use this to explore alternative settings with both `Dropout` and kernel regularization to see if you can improve the performance of the network.

Plot the training and validation loss stored in `history4.history` to explore how early stopping truncates training once the validation loss stops improving.
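The patience rule behind early stopping can be sketched in plain Python. This is a simplified illustration that assumes the monitored quantity is a validation loss and that any decrease counts as an improvement; the function name `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: best value at index 2, then no improvement.
toy_val_losses = [0.50, 0.42, 0.40, 0.41, 0.43, 0.42, 0.44, 0.45]
stop_epoch = early_stopping_epoch(toy_val_losses, patience=4)
print(stop_epoch)  # 6: four epochs without improvement after the best loss at index 2
```

A larger `patience` tolerates longer plateaus before stopping, and training must run for more than `patience` epochs before the rule can trigger at all.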

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

In [None]:
stopper = EarlyStopping(patience = 4)

In [None]:
model4 = Sequential()
model4.add(Dense(50, activation = 'relu'))
model4.add(Dropout(0.2))
model4.add(Dense(1, activation = 'sigmoid'))
model4.compile(loss = 'bce', metrics = ['acc'])
# epochs is an upper bound; it must exceed `patience` for early stopping to be able to trigger
history4 = model4.fit(X_train_num, y_train_num, validation_data=(X_test_num, y_test_num),
           epochs = 50, verbose = 0,
           callbacks = [stopper])