#**Module 13: Neural Networks and Deep Learning--Classification**
In the previous module, you learned how a Deep Learning network works and how to build one that predicts a single numeric value.

**A Deep Neural Network--Regression**
<div>
<center>
<img src="https://raw.githubusercontent.com/shstreuber/Data-Mining/master/images/deepnn_regression.png" width="600">
</center>
</div>

In this module, we will start with a Deep Learning NN for classification. Instead of producing ONE numeric value, we will configure it to yield categorical output. This means that you'll see as many nodes in the output layer as there are levels in your target variable, like the two below.

**A Deep Neural Network--Classification**
<div>
<center>
<img src="https://raw.githubusercontent.com/shstreuber/Data-Mining/master/images/deepnn_classification2layers.png" width="600">
</center>
</div>

At the end of this module, you will be able to:

* Configure a deep learning Classification Network 
* Distinguish the activation functions in the output layer for classifications from those for regression
* Apply regular data classification techniques to image classification
* Describe the special cases of Convolutional Neural Networks
* Solve a simple classification problem

To get started, please watch this great video that shows you where we are going with this module. If you no longer remember the content of the instructor video, please [review that](https://youtu.be/RkiTL_T8VsY) as well:


In [1]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/vF21cC-8G1U" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

##**The Problem**
In the previous module, we used a regression Deep Learning network to predict incomeUSD and age. We will start this module by predicting a categorical variable in our adult dataset: race.

#**0. Preparation and Setup**
You will see that setting up a Deep Learning model for a simple classification is very similar to building a regression model. In fact, the only difference, really, is in the output layer. So, we'll step through the whole development process a bit more quickly than in the previous module.

In [2]:
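import tensorflow as tf # TensorFlow is the deep learning framework we use in this module

from tensorflow import keras # Keras is TensorFlow's high-level API for building neural networks
from tensorflow.keras import layers # We are building a Neural Network with several hidden layers
from tensorflow.keras.layers.experimental import preprocessing # Provides the Normalization layer used in section 2
from tensorflow.keras.models import Sequential # Sequential stacks layers one after another
from tensorflow.keras.layers import Dense # Dense is a fully connected layer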

print("Current TensorFlow version is", tf.__version__)

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns # for visualization
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore")
np.random.seed(42)

#Reading in the data as adult dataframe
adult = pd.read_csv("https://raw.githubusercontent.com/shstreuber/Data-Mining/master/data/adult.data.simplified.csv")
adult.head()

Current TensorFlow version is 2.8.2


Unnamed: 0,age,workclass,education,educationyears,maritalstatus,occupation,relationship,race,sex,hoursperweek,nativecountry,incomeUSD
0,39,State-gov,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,40,United-States,43747
1,50,Self-emp-not-inc,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,13,United-States,38907
2,38,Private,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,40,United-States,25055
3,53,Private,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,40,United-States,26733
4,28,Private,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,40,Cuba,23429


#**1. Exploratory Data Analysis**
Which is your favorite way of doing this? Practice it below.

#**2. Preprocessing**
No difference here from what we did before; in fact, in the previous module, we already prepared the "race" variable for the output layer by one-hot encoding it. Below is a summary of the code with comments:

In [3]:
# Downsizing the dataset to the four numeric attributes plus the categorical target 'race'
adult_dl = pd.DataFrame(adult, columns = ['age', 'educationyears', 'race','hoursperweek','incomeUSD'])

# Splitting into Training and Test Set
train_dataset = adult_dl.sample(frac=0.8, random_state=0)
test_dataset = adult_dl.drop(train_dataset.index)

# Splitting Features from Labels
train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('race')
test_labels = test_features.pop('race')

In [4]:
train_labels.head() # Let's see what the training labels look like now

22278                 White
8950                  White
7838                  White
16505    Amer-Indian-Eskimo
19140                 White
Name: race, dtype: object

In [5]:
# Encoding the output variable with pd.get_dummies
train_labels1 = pd.get_dummies(train_labels, columns=['race'], prefix='', prefix_sep='')

# Normalizing the input variables
normalizer = preprocessing.Normalization(axis=-1)
normalizer.adapt(np.array(train_features))

In [6]:
train_labels1.head() # Let's see what the training labels look like encoded

Unnamed: 0,Amer-Indian-Eskimo,Asian-Pac-Islander,Black,Other,White
22278,0,0,0,0,1
8950,0,0,0,0,1
7838,0,0,0,0,1
16505,1,0,0,0,0
19140,0,0,0,0,1
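
The encoded labels above can be turned back into category names, and a second set of labels can be encoded with the same column layout. The sketch below illustrates both ideas on a tiny made-up Series (the values in `train` and `test` are illustrative only, not rows from the adult dataset). It assumes that `reindex` with the training columns is an acceptable way to align the two encodings; other approaches exist.

```python
import pandas as pd

# Tiny made-up label sets (illustrative only, not taken from the adult data)
train = pd.Series(["White", "Black", "Other", "White"], name="race")
test = pd.Series(["White", "Black"], name="race")

# One-hot encode the training labels; astype(int) gives 0/1 instead of True/False
train_onehot = pd.get_dummies(train).astype(int)

# Encode the second set and force it onto the same columns, in the same order.
# A category missing from this set ("Other") becomes a column of zeros.
test_onehot = (
    pd.get_dummies(test)
    .reindex(columns=train_onehot.columns, fill_value=0)
    .astype(int)
)

# Decode: the column holding the 1 in each row is the original category
decoded = test_onehot.idxmax(axis=1)

print(test_onehot)
print(decoded.tolist())
```

Keeping the column order identical matters because each output node of the network corresponds to one column position.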


#**3. Build the Keras Model**
Now, we can build the Sequential model and add layers one at a time until we are happy with our network architecture.

* To build the **input layer**, we need to define the number of input features. We use the **input_dim** argument and set it to 4 for the 4 input variables ('age', 'educationyears','hoursperweek','incomeUSD').
* The **output layer** will be our race attribute with 5 levels (Amer-Indian-Eskimo, Asian-Pac-Islander, Black, Other, White)


###**How do we know the number and architecture of layers in the middle?** 

The short answer is: We don't. The longer answer is: We experiment until we get the best output the fastest. The even longer answer is: We can use various optimization strategies that can help us out somewhat. So, let's assume that trial and error has shown us that three layers is optimal. Furthermore, let's assume that we are going to build a **Dense Network**, aka a **fully connected** network structure, in which every node is connected with every node in the next layer. 

To define this architecture, we use the Dense class. We will specify the number of neurons or nodes in the layer as the first argument, and set up the activation function with the activation argument.

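Speaking of **activation functions**, we will use the **rectified linear unit** or ReLU activation function on the first two layers and the Softmax function in the output layer. Softmax turns the scores of the output nodes into probabilities that sum to 1 across all categories (if our target had only two classes, a single output node with the Sigmoid function would be enough).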
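
To make the three activation functions concrete, here is a minimal NumPy sketch of each. The `scores` array is made up for illustration; in a real network these values would come from the last layer's weighted sums. This is not how Keras implements the functions internally, only the same math written out.

```python
import numpy as np

def relu(z):
    # ReLU: negative values become 0, positive values pass through unchanged
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid: squashes a single score into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax: turns a vector of scores into probabilities that sum to 1.
    # Subtracting the max first keeps np.exp from overflowing.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Made-up scores for five output nodes (one per category)
scores = np.array([2.0, -1.0, 0.5, 0.0, 1.0])
probs = softmax(scores)

print("ReLU:   ", relu(scores))
print("Softmax:", probs.round(3), "sum =", probs.sum())
print("Predicted category index:", probs.argmax())
```

The node with the largest score also gets the largest probability, so the predicted category is the position of the maximum.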

###**Model Design**
So, our model looks like this:

* The model expects rows of data with 4 variables ('age', 'educationyears', 'hoursperweek', and 'incomeUSD' = the input_dim=4 argument)
* The first hidden layer has 12 nodes and uses the relu activation function.
* The second hidden layer has 8 nodes and uses the relu activation function.
* The output layer has five nodes and uses the Softmax activation function.
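
As a sanity check on the design listed above, the number of trainable parameters in each Dense layer can be worked out by hand: every node has one weight per incoming value plus one bias. The helper `dense_params` below is a hypothetical name introduced for this sketch, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    # Each of the n_units nodes has n_inputs weights plus 1 bias
    return n_inputs * n_units + n_units

# Layer widths for the design described above: 4 inputs -> 12 -> 8 -> 5 outputs
layer_sizes = [4, 12, 8, 5]

counts = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print("Parameters per layer:", counts)
print("Total parameters:", sum(counts))
```

Comparing these numbers with the "Param #" column of `model.summary()` is a quick way to confirm that a model was built with the intended layer sizes.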

In [None]:
# define the Keras model
model = Sequential()
model.add(Dense(12, input_dim=4, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='softmax'))
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_9 (Dense)              (None, 12)                60        
_________________________________________________________________
dense_10 (Dense)             (None, 8)                 104       
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 9         
Total params: 173
Trainable params: 173
Non-trainable params: 0
_________________________________________________________________


#**4. Compile the Keras Model**
Now that the model is defined, we can compile it. To do so, we must specify 
* the **loss function** to use to evaluate a set of weights. In our case, we will use **categorical_crossentropy**.
* the **optimizer**, which searches through different weights for the network in order to reduce the loss. In our case, we will use "**adam**", an efficient variant of stochastic gradient descent. It is popular because it adapts the learning rate for each weight automatically and gives good results on a wide range of problems.
* Finally, because this is a classification problem, we will collect and report the **classification accuracy**, defined via the **metrics** argument.
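
To see what the categorical cross-entropy loss measures, here is a minimal NumPy sketch. The arrays `y_true` and `y_pred` are made up for illustration, and the small `eps` clipping value is an assumption chosen here to avoid taking the log of 0.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so that log(0) can never occur
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Only the probability assigned to the true class contributes to the loss
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Two made-up rows with one-hot encoded true labels (five categories)
y_true = np.array([[0, 0, 0, 0, 1],
                   [1, 0, 0, 0, 0]], dtype=float)

# Made-up softmax outputs: row 0 is a confident correct guess, row 1 is wrong
y_pred = np.array([[0.05, 0.05, 0.05, 0.05, 0.80],
                   [0.10, 0.10, 0.10, 0.10, 0.60]])

losses = categorical_crossentropy(y_true, y_pred)
print("Loss per row:", losses.round(3))
print("Mean loss:", losses.mean().round(3))
```

The confident correct prediction gets a small loss and the wrong prediction gets a large one, which is what the optimizer tries to reduce. Note that this loss expects one-hot encoded labels with one column per category.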

In [None]:
# compile the Keras model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#**5. Train the Model**
Now that we have defined our model and compiled it, it is time to train the model on some data. We use the fit() function for this purpose. Training occurs over **epochs** and **each epoch is split into batches**.

* **Epoch**: One pass through all of the rows in the training dataset. The training process performs a set number of passes through the dataset, which we must specify using the 'epochs' argument.

* **Batch**: The number of dataset rows that are considered before the model weights are updated within each epoch. One epoch contains one or more batches, based on the defined 'batch_size' argument. 

For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10.
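
The relationship between rows, batch size, and epochs determines how often the weights are updated. The sketch below works this out for a hypothetical training set of 1,000 rows (an assumed round number, not the size of the adult training set). `updates_per_epoch` is a helper name introduced here for illustration.

```python
import math

def updates_per_epoch(n_rows, batch_size):
    # The last batch may be smaller than batch_size, so round up
    return math.ceil(n_rows / batch_size)

n_rows = 1000      # hypothetical number of training rows
batch_size = 10
epochs = 150

per_epoch = updates_per_epoch(n_rows, batch_size)
total_updates = per_epoch * epochs

print("Weight updates per epoch:", per_epoch)
print("Weight updates in total: ", total_updates)
```

A smaller batch size means more updates per epoch but also a longer training time, which is one reason these two settings are usually tuned together.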

###**How do we know the number of epochs and the batch size?**
Three words: Trial and error. Again. That's because we will be revising the model until we get the smallest value of the loss function (aka the smallest error). Now, the model will always have **some** error, but for a given model configuration, the amount of error will level out after some point. This is called model convergence.


In [None]:
# fit the Keras model on the dataset
model.fit(train_features, train_labels, epochs=150, batch_size=10)

#**6. Evaluate the Model**
We have trained our neural network, and we can now evaluate its performance on the test dataset. To evaluate your model on data it has not seen during training, use the evaluate() function on your model and pass it the test data.

This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.

The evaluate() function will return a list with two values. The first will be the loss of the model on the dataset and the second will be the accuracy of the model on the dataset. We are only interested in reporting the accuracy, so we will ignore the loss value.
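
For a multi-class model, classification accuracy is the share of rows where the category with the highest predicted probability matches the true category. The NumPy sketch below shows that calculation on made-up arrays; `pred_probs` stands in for softmax output and `true_onehot` for one-hot encoded labels, neither of which comes from the actual model.

```python
import numpy as np

# Made-up softmax outputs for three rows and five categories
pred_probs = np.array([[0.10, 0.10, 0.10, 0.10, 0.60],
                       [0.70, 0.10, 0.10, 0.05, 0.05],
                       [0.20, 0.20, 0.40, 0.10, 0.10]])

# Made-up one-hot encoded true labels for the same three rows
true_onehot = np.array([[0, 0, 0, 0, 1],
                        [0, 0, 1, 0, 0],
                        [0, 0, 1, 0, 0]])

# The predicted class is the column with the highest probability
pred_class = pred_probs.argmax(axis=1)
true_class = true_onehot.argmax(axis=1)

# Accuracy is the fraction of rows where the two agree
accuracy = float(np.mean(pred_class == true_class))
print("Predicted:", pred_class, "True:", true_class)
print("Accuracy: %.2f" % (accuracy * 100))
```

The same argmax step is what turns a row of predicted probabilities into a single predicted category.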

In [None]:
# evaluate the keras model
accuracy = model.evaluate(test_features, test_labels1)
print('Accuracy: %.2f' % (accuracy*100))

#**Your Turn**
As you can tell, steps 5 and 6 above don't work (yet!), and there is also a CRITICAL error in building the Keras model under steps 3 and 4. 

This exercise tests your ability to **research** and **debug**. You already know a lot about running this same network on the same data, but as a regression problem, so NOW your job is to translate the regression problem into a classification problem for 'race'. All the building blocks you need are in this workbook.

**Here is your job:**
1. Fix the Keras model in section 3.
2. Be sure it is compiled correctly in section 4.
3. Fix the code in sections 5 and 6 so that the model will run.
4. Research how to use the predict() function to run the model on the test_features and test_labels. Remember that you will have to encode the test labels in order to use them in the output layer!

Use the fields below to work on your code.