# Home Credit Loan Defaulter Prediction - Data Wrangling and EDA

## Capstone Three : Springboard Data Science career track
**Notebook by Md Saimoom Ferdous**

### Problem Statement

Home Credit Group is one of the largest non-banking financial institution headquartered in Netherlands. It focuses on handing out credits to the population with little or no credit history. Mjority of the population living in remote communities needs micor credit, but they do not have anough credit history that will build confidence in lending the credit. 

Other than the credit history, what are the other available characteristics  (social, demographical) can provide insights into the client groups who would be able to repay the credit. Does repaying credit correlates with age group, occupation, shopping habit or any other unseen traits which can be discovered by data analysis?

Once the features that lead to repaying credits are known, which are the dominant features in deterining credit repayment?


Machine learning algorithms can be trained to predict default vs non-default clients based on theri very little information available. These come with two problems. Clients who really in need of the credit may often be prioly misclassified as deafulter. This phenomena is called false positive. For Home Credit Group this would derail their purpose if anyone deserving does not get the credit.  On the other hand, clients with real tendency for loan defaulter may be given green light for the credit as the algorithm would misclassify as non-defaulter. These phenomena is called false negative. This is also higly discouraging as registering loan defaulters in disguise would cripple organizations financial health in the medium to longer terms. It is essential to design machine learning algorithm for loan defulter in such a way to minimize false positiove/negative whereas maximizing true positive/negative rate. 



### The Data

- Data has been sourced from a Kaggle competition, consisting of thousands of client home loan, credit records (https://www.kaggle.com/c/home-credit-default-risk/data).
- Data wrangling, EDA and pre-processing will be done to get the data training ready for modelling stage


### Question(s) of Interest

EDA will look to answer following questions:
- What (%) of the population is likely to repay loan
- What loan type/gender group more prone to loan defaulting
- Do most of the clients own car/house?
- What education/working background clients come from?
- What are their marital status?
- How many children most clients have and family members?
- What are their income/credit/annuity distribution
- What age distributiton the cliens have


### 1. Data Wrangling

This step consists of Data Collection, Data Definitions, and Data Cleaning.

  * Data Collection
      - Data loading
  * Data Definition
      - Column names
      - Data types (numeric, categorical, timestamp, etc.)
      - Description of the columns
      - Count or percent per unique values or codes (including NA)
      - The range of values or codes  
  * Data Cleaning
      - NA or missing data
      - Duplicates


### 2. Exploratory Data Analysis

Conducted EDA on Home Credit Group loan data to examine relationships between variables and other patterns in the data.
- Explore distribution of categorical variables
- Explore distribution of numerical variables
- Anomalies and outliers
- Finding correlated variables and feature removal
- Feature creation

### 3. Pre-processing

- Create dummy features for categorical variables
- Standardize the magnitude of numeric variables
- Balance the data 
- Split into training, validation and test data

### 4. Modelling

### 5. Conclusion


# 3. Pre-processing

In [132]:
#load python packages

import os
import pandas as pd
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import random
%matplotlib inline

# modelling
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
import tensorflow as tf

# to ignore warning message

import warnings
warnings.filterwarnings('ignore')

## 3.1 Load Data from the EDA Step
Dataframe with and without additional features will be loaded

In [133]:
# load data

# Original dataframe
df_base = pd.read_csv('data/homecredit_baseline_EDA.csv', index_col=0)

# With additional features
df = pd.read_csv('data/homecredit_extended_EDA.csv', index_col=0)

- We will work with the dataframe with extra features as of now
- Extra features will be saved and used to check model performance without the additional features

In [134]:
# Save extra features
additional_features = list(set(df.columns)-set(df_base.columns))

In [135]:
# look at the dataframe
df.head(3)

Unnamed: 0,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,NAME_TYPE_SUITE,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,OCCUPATION_TYPE,...,ANNUNITY_OVER_INCOME,CREDIT_OVER_INCOME,ANNUNITY_OVER_CREDIT,EMPLOYED_OVER_AGE,APARTMENT_OVER_INCOME,CARAGE_OVER_AGE,BUILD_OVER_AGE,EXT_3_2,EXT_3_1,EXT_2_1
0,Cash loans,M,N,Y,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,Laborers,...,0.121978,2.007889,0.060749,0.067329,1.219753e-07,0.0,0.023888,0.036649,0.011573,0.021834
1,Cash loans,F,N,N,Family,State servant,Higher education,Married,House / apartment,Core staff,...,0.132217,4.79075,0.027598,0.070862,3.551852e-07,0.0,0.01733,0.365301,0.182735,0.193685
2,Revolving loans,M,Y,Y,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,Laborers,...,0.1,2.0,0.05,0.011814,8.548148e-07,0.0,0.01442,0.405575,0.273626,0.208496


## 3.2 Convert Categorical Variables into Dummy Variables
Many machine learning models can not handle categorial variables during training. It is important to systematically convert categorical variables into meaningful numeric variables. At first we will determine for every categorical comuns how many unique categories are there. Depending on the number of unique categories in a column we will apply either label encoding or one hot encoding. Label encoding will be used for number of unique categories upto 2. This will allow the algorithm to equally weight on every categories. For more than 2 categories, one hot code encoding will be applied. Although this will crowd with high number of features, chances of bias introduction is very limited. 

In [136]:
# Find out number of unique categories for each categorical variables
print('Number of unique variables for the categorical columns:\n', df.select_dtypes(include=['object']).nunique())

Number of unique variables for the categorical columns:
 NAME_CONTRACT_TYPE             2
CODE_GENDER                    3
FLAG_OWN_CAR                   2
FLAG_OWN_REALTY                2
NAME_TYPE_SUITE                7
NAME_INCOME_TYPE               8
NAME_EDUCATION_TYPE            5
NAME_FAMILY_STATUS             6
NAME_HOUSING_TYPE              6
OCCUPATION_TYPE               18
WEEKDAY_APPR_PROCESS_START     7
ORGANIZATION_TYPE             58
FONDKAPREMONT_MODE             4
HOUSETYPE_MODE                 3
WALLSMATERIAL_MODE             7
EMERGENCYSTATE_MODE            2
dtype: int64


In [137]:
# Find out columns with maximum 2 unique categories for label encoding, more than 2 for one hot encoding

col_for_label_encode = ['NAME_CONTRACT_TYPE','FLAG_OWN_CAR','FLAG_OWN_REALTY','EMERGENCYSTATE_MODE']
col_for_one_hot_encode = ['CODE_GENDER','NAME_TYPE_SUITE','NAME_INCOME_TYPE','NAME_EDUCATION_TYPE',
                          'NAME_FAMILY_STATUS','NAME_HOUSING_TYPE','OCCUPATION_TYPE','WEEKDAY_APPR_PROCESS_START',
                          'ORGANIZATION_TYPE','FONDKAPREMONT_MODE','HOUSETYPE_MODE','WALLSMATERIAL_MODE']

In [138]:
# label encoding

df_label_encode = df[col_for_label_encode].apply(LabelEncoder().fit_transform)
df_label_encode.head(3)

Unnamed: 0,NAME_CONTRACT_TYPE,FLAG_OWN_CAR,FLAG_OWN_REALTY,EMERGENCYSTATE_MODE
0,0,0,1,0
1,0,0,0,0
2,1,1,1,0


In [139]:
# one hot encoding

df_one_hot_code = pd.get_dummies(df[col_for_one_hot_encode])
df_one_hot_code.head(3)

Unnamed: 0,CODE_GENDER_F,CODE_GENDER_M,CODE_GENDER_XNA,NAME_TYPE_SUITE_Children,NAME_TYPE_SUITE_Family,NAME_TYPE_SUITE_Group of people,NAME_TYPE_SUITE_Other_A,NAME_TYPE_SUITE_Other_B,"NAME_TYPE_SUITE_Spouse, partner",NAME_TYPE_SUITE_Unaccompanied,...,HOUSETYPE_MODE_block of flats,HOUSETYPE_MODE_specific housing,HOUSETYPE_MODE_terraced house,WALLSMATERIAL_MODE_Block,WALLSMATERIAL_MODE_Mixed,WALLSMATERIAL_MODE_Monolithic,WALLSMATERIAL_MODE_Others,WALLSMATERIAL_MODE_Panel,"WALLSMATERIAL_MODE_Stone, brick",WALLSMATERIAL_MODE_Wooden
0,0,1,0,0,0,0,0,0,0,1,...,1,0,0,0,0,0,0,0,1,0
1,1,0,0,0,1,0,0,0,0,0,...,1,0,0,1,0,0,0,0,0,0
2,0,1,0,0,0,0,0,0,0,1,...,1,0,0,1,0,0,0,0,0,0


In [140]:
# Arrange by this sequence: additional_features + categorical_features + numerical_features
df_additional_features = df[additional_features]
# Combine encoded categorical variables
df_cat_encoded = pd.concat([df_label_encode,df_one_hot_code], axis=1)
df_num = df.drop(columns=col_for_label_encode + col_for_one_hot_encode + additional_features)

# Combine encoded categorical features with numerical features
df_encoded = pd.concat([df_additional_features, df_cat_encoded, df_num ], axis=1)
df_encoded.head(3)

Unnamed: 0,CARAGE_OVER_AGE,EXT_2_1,EMPLOYED_OVER_AGE,CNT_FAM_MEMBERS,APARTMENT_OVER_INCOME,ANNUNITY_OVER_INCOME,BUILD_OVER_AGE,ANNUNITY_OVER_CREDIT,CREDIT_OVER_INCOME,CNT_CHILDREN_ANOM,...,AMT_REQ_CREDIT_BUREAU_DAY,AMT_REQ_CREDIT_BUREAU_WEEK,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,YEARS_BIRTH,YEARS_EMPLOYED,YEARS_REGISTRATION,YEARS_ID_PUBLISH,YEARS_LAST_PHONE
0,0.0,0.021834,0.067329,0,1.219753e-07,0.121978,0.023888,0.060749,2.007889,0,...,0.0,0.0,0.0,0.0,1.0,25.920548,1.745205,9.994521,5.808219,3.106849
1,0.0,0.193685,0.070862,0,3.551852e-07,0.132217,0.01733,0.027598,4.79075,0,...,0.0,0.0,0.0,0.0,0.0,45.931507,3.254795,3.249315,0.79726,2.268493
2,0.0,0.208496,0.011814,0,8.548148e-07,0.1,0.01442,0.05,2.0,0,...,0.0,0.0,0.0,0.0,0.0,52.180822,0.616438,11.671233,6.934247,2.232877


## 3.3 Standardize the Magnitude of Numerical Variables
This is applied to avoid bias when there are differences in magnitude of the numerical variables. In this case, we have seen stark differences between numeric variables. For example annual income, annuity and credits differ sharply in magnitude from other variables. We will scale every numeric variables except the booleans one.

In [141]:
# remove the target variable
df_num.drop(columns=['TARGET'], inplace = True)

# remove boolean variables
bool_cols = [col for col in df_num
             if np.isin(df[col].dropna().unique(), [0, 1]).all()]
df_num_only = df_num.drop(columns=bool_cols)
df_num_only.head(3)


Unnamed: 0,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,REGION_POPULATION_RELATIVE,REGION_RATING_CLIENT,HOUR_APPR_PROCESS_START,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,...,AMT_REQ_CREDIT_BUREAU_DAY,AMT_REQ_CREDIT_BUREAU_WEEK,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR,YEARS_BIRTH,YEARS_EMPLOYED,YEARS_REGISTRATION,YEARS_ID_PUBLISH,YEARS_LAST_PHONE
0,0,202500.0,406597.5,24700.5,0.018801,2,10,0.083037,0.262949,0.139376,...,0.0,0.0,0.0,0.0,1.0,25.920548,1.745205,9.994521,5.808219,3.106849
1,0,270000.0,1293502.5,35698.5,0.003541,1,11,0.311267,0.622246,0.587068,...,0.0,0.0,0.0,0.0,0.0,45.931507,3.254795,3.249315,0.79726,2.268493
2,0,67500.0,135000.0,6750.0,0.010032,2,9,0.375053,0.555912,0.729567,...,0.0,0.0,0.0,0.0,0.0,52.180822,0.616438,11.671233,6.934247,2.232877


In [142]:
# Scale all numerical columns except target and boolean variables
num_scaled = preprocessing.scale(df_num_only)

# Combine scaled features with boolean and categorical features
inputs_scaled = np.concatenate((df_additional_features.values, df_cat_encoded.values, num_scaled, df_num[bool_cols].values), axis=1)
targets = df['TARGET'].values

## 3.4 Balance the Data
From EDA step, we have seen deafulter vd non-defaulter percentage is around 92% vs 8% which is highly imbalanced. Imbalance data set introduces high percentage of False positives and False negatives. One of the most effective ways of combating data imbalance in balancing them to 50-50 by under smapling. 

In [143]:
# Take equal amount of defaulter and non-defaulter population

num_one_targets = int(np.sum(df['TARGET']))
zero_targets_counter = 0
indices_to_remove = []

for i in range(targets.shape[0]):
    if targets[i] == 0:
        zero_targets_counter +=1
        if zero_targets_counter > num_one_targets:
            indices_to_remove.append(i)
            
inputs_scaled_equal = np.delete(inputs_scaled, indices_to_remove, axis=0)
targets_equal = np.delete(targets, indices_to_remove, axis=0)

## 3.5 Shuffle the Data
In many cases data is collected in orderly fashion. This becomes particularly problemtic when running batch processing for model training. We will shuffle the data to be in the safe side to spread out target variables as much as possible.

In [144]:
# Shuffle the indices
shuffled_indices = np.arange(inputs_scaled_equal.shape[0])
np.random.shuffle(shuffled_indices)

# Shuffle the inputs and targets
inputs_shuffled = inputs_scaled_equal[shuffled_indices]
targets_shuffled = targets_equal[shuffled_indices]

## 3.6 Split Data into Train, Validation and Test Set
Split the scaled, balanced dataset into train, validation and test dataset. We will split by 80-10-10 ratio.

In [145]:
# Total samples
samples_count = inputs_shuffled.shape[0]

# Count number of train, validation and test samples to split by 80-10-10 ratio 
train_samples_count = int(0.8 * samples_count)
validation_samples_count = int(0.1 * samples_count)
test_samples_count = samples_count - train_samples_count - validation_samples_count

# Train set for inputs and targets
train_inputs = inputs_shuffled[:train_samples_count]
train_targets = targets_shuffled[:train_samples_count]

# Validation set for inputs and targets
validation_inputs = inputs_shuffled[train_samples_count:train_samples_count+validation_samples_count]
validation_targets = targets_shuffled[train_samples_count:train_samples_count+validation_samples_count]

# Test set for inputs and targets
test_inputs = inputs_shuffled[train_samples_count+validation_samples_count:]
test_targets = targets_shuffled[train_samples_count+validation_samples_count:]


- The data was made 50-50 balanced. After spliting into train, validation and test set, it is important to check the split data set is also balanced. Lets check on that.

In [146]:
# Lets check how many percentage of the three set contains targets with loan deafulter population

print('Loan defaulter percentage in train set: {:.2f}%'.format(100*(np.sum(train_targets) / train_samples_count)))
print('Loan defaulter percentage in validation set: {:.2f}%'.format(100*(np.sum(validation_targets) / validation_samples_count)))
print('Loan defaulter percentage in test set: {:.2f}%'.format(100*(np.sum(test_targets) / test_samples_count)))

Loan defaulter percentage in train set: 50.13%
Loan defaulter percentage in validation set: 49.67%
Loan defaulter percentage in test set: 49.33%


- The train, validation and test set are well balanced

## 3.7 Save Pre-processed Data
Save the balanced, scaled and train, validation, test split data for the modelling stage. The data are in numpy array format. So we will be saving as .npz format

In [147]:
# Save the datasets in .npz format

np.savez('data/homecredit_train', inputs=train_inputs, targets=train_targets)
np.savez('data/homecredit_validation', inputs=validation_inputs, targets=validation_targets)
np.savez('data/homecredit_test', inputs=test_inputs, targets=test_targets)

# 4. Modelling

## 4.1 Data Load

In [148]:
# Load the train data and cast inputs to float and targets to integer data type
npz = np.load('data/homecredit_train.npz')
train_inputs = npz['inputs'].astype(np.float)
train_targets = npz['targets'].astype(np.int)

# Load the validation data and cast inputs to float and targets to integer data type
npz = np.load('data/homecredit_validation.npz')
validation_inputs = npz['inputs'].astype(np.float)
validation_targets = npz['targets'].astype(np.int)

# Load the test data and cast inputs to float and targets to integer data type
npz = np.load('data/homecredit_test.npz')
test_inputs = npz['inputs'].astype(np.float)
test_targets = npz['targets'].astype(np.int)

In [149]:
train_inputs.shape

(39720, 216)

## 4.2 Base Deep Neural Net Model
**Optimizers**

Gradient Descent (GD): Iterates through whole training set. Updates once in a epoch. Slow in speed
Stochastic Gradient Descent (SGD): Updates weights multiple times in a epoch defined by batch size. Faster in speed
Adaptive Momentum (Adam): Adaptive learning rate coupled with momentum help the algorithm overcome any local peak and ensure reaching the minimum global peak
We will choose Adam for best performance.

**Loss Function**

Cross-entropy would be the choice for classification problem. Three options in Tensorflow 2.0. They are: binary, categorical and sparse categorical cross entropy. Binary expects the data is binary encoded. Categorical expects the data is one hot coded. Sparse can one hot code data during the training. To be in safe side we will apply sparse categorical cross entropy.

**Metrics**

For classification problem such as this, 'accuracy' metric is the most appropriate

**Validation Set**

Make sure the model parameters (weights, biases) do not overfit

**Test Set**

Make sure the model hyperparameters (width, height, batch size etc) do not overfit


In [150]:
# Input/output layers size
input_size = train_inputs.shape[0]
output_size = 2

# Hidden layer size
hidden_layer_size = 50
    
# Model structure
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax') # classifier so use 'softmax'
])


# Optimizer, loss function and metrics
 
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### Training

# Batch size
batch_size = 100

# Number of training epochs
max_epochs = 100

# Early stopping: Allow for 2 instances of training after minimum loss point reached for for train/validation loss
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)


# fit the model
model.fit(train_inputs, # train inputs
          train_targets, # train targets
          batch_size=batch_size, # batch size
          epochs=max_epochs, # number of iteration the training will run, unless early stopping kicks in
          callbacks=[early_stopping], # early stopping
          validation_data=(validation_inputs, validation_targets), # validation data
          verbose = 2 # print minimum information during training
          )  

Train on 39720 samples, validate on 4965 samples
Epoch 1/100
39720/39720 - 1s - loss: 0.2424 - accuracy: 0.9068 - val_loss: 0.2025 - val_accuracy: 0.9259
Epoch 2/100
39720/39720 - 1s - loss: 0.1827 - accuracy: 0.9357 - val_loss: 0.1777 - val_accuracy: 0.9398
Epoch 3/100
39720/39720 - 1s - loss: 0.1736 - accuracy: 0.9403 - val_loss: 0.1854 - val_accuracy: 0.9358
Epoch 4/100
39720/39720 - 1s - loss: 0.1686 - accuracy: 0.9411 - val_loss: 0.1757 - val_accuracy: 0.9414
Epoch 5/100
39720/39720 - 1s - loss: 0.1637 - accuracy: 0.9429 - val_loss: 0.1745 - val_accuracy: 0.9406
Epoch 6/100
39720/39720 - 1s - loss: 0.1583 - accuracy: 0.9443 - val_loss: 0.1700 - val_accuracy: 0.9416
Epoch 7/100
39720/39720 - 1s - loss: 0.1544 - accuracy: 0.9464 - val_loss: 0.1763 - val_accuracy: 0.9420
Epoch 8/100
39720/39720 - 1s - loss: 0.1490 - accuracy: 0.9474 - val_loss: 0.1721 - val_accuracy: 0.9426


<tensorflow.python.keras.callbacks.History at 0x282270856c8>

## Test the model

As we discussed in the lectures, after training on the training data and validating on the validation data, we test the final prediction power of our model by running it on the test dataset that the algorithm has NEVER seen before.

It is very important to realize that fiddling with the hyperparameters overfits the validation dataset. 

The test is the absolute final instance. You should not test before you are completely done with adjusting your model.

If you adjust your model after testing, you will start overfitting the test dataset, which will defeat its purpose.

In [151]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)



In [152]:
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.18. Test accuracy: 94.04%


## 4.3 Model Performance without Additional Features
In the EDA step, additional features were added based on observation. In this section, we will explore model performance without the additional features. Base deep neural net will be used for that.

### Form Dataset without Added Features

In [153]:
# Column indices for the additional features

additional_features_indices = (np.arange(len(additional_features))+1).tolist()

# Train data without added features
train_inputs_base = np.delete(train_inputs, additional_features_indices, axis=1)

# Validation data without added features
validation_inputs_base = np.delete(validation_inputs, additional_features_indices, axis=1)

# Test data without added features
test_inputs_base = np.delete(test_inputs, additional_features_indices, axis=1)

### Model Performance

In [169]:
# Input/output layers size
input_size = train_inputs_base.shape[0]
output_size = 2

# Hidden layer size
hidden_layer_size = 50
    
# Model structure
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax') # classifier so use 'softmax'
])


# Optimizer, loss function and metrics
 
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### Training

# Batch size
batch_size = 100

# Number of training epochs
max_epochs = 100

# Early stopping: Allow for 2 instances of training after minimum loss point reached for for train/validation loss
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)


# fit the model
model.fit(train_inputs_base, # train inputs
          train_targets, # train targets
          batch_size=batch_size, # batch size
          epochs=max_epochs, # number of iteration the training will run, unless early stopping kicks in
          callbacks=[early_stopping], # early stopping
          validation_data=(validation_inputs_base, validation_targets), # validation data
          verbose = 2 # print minimum information during training
          )  

Train on 39720 samples, validate on 4965 samples
Epoch 1/100
39720/39720 - 1s - loss: 0.2583 - accuracy: 0.8980 - val_loss: 0.1834 - val_accuracy: 0.9384
Epoch 2/100
39720/39720 - 1s - loss: 0.1807 - accuracy: 0.9354 - val_loss: 0.1795 - val_accuracy: 0.9412
Epoch 3/100
39720/39720 - 1s - loss: 0.1719 - accuracy: 0.9406 - val_loss: 0.1791 - val_accuracy: 0.9416
Epoch 4/100
39720/39720 - 1s - loss: 0.1657 - accuracy: 0.9423 - val_loss: 0.1781 - val_accuracy: 0.9404
Epoch 5/100
39720/39720 - 1s - loss: 0.1618 - accuracy: 0.9440 - val_loss: 0.1740 - val_accuracy: 0.9380
Epoch 6/100
39720/39720 - 1s - loss: 0.1555 - accuracy: 0.9455 - val_loss: 0.1743 - val_accuracy: 0.9414
Epoch 7/100
39720/39720 - 1s - loss: 0.1512 - accuracy: 0.9476 - val_loss: 0.1713 - val_accuracy: 0.9438
Epoch 8/100
39720/39720 - 1s - loss: 0.1468 - accuracy: 0.9483 - val_loss: 0.1722 - val_accuracy: 0.9422
Epoch 9/100
39720/39720 - 1s - loss: 0.1431 - accuracy: 0.9494 - val_loss: 0.1709 - val_accuracy: 0.9414
Epoch 

<tensorflow.python.keras.callbacks.History at 0x282451ed708>

In [170]:
test_loss, test_accuracy = model.evaluate(test_inputs_base, test_targets)



In [171]:
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.18. Test accuracy: 93.70%


- Added features improved the model performance by 0.8%

## 4.3 Hyperparameter Optimized Deep Neural Net
Hyperparameters are:
- Number of hidden units (width)
- Number of hidden layers (height)
- Combinations of width and height
- Activation function (Relu, tanh, leaky Relu, sigmoid)
- Batch size (1=SGD, 1000)
- Learning rates (high at beginning low at end)
  Try 0.0001, 0.02, 