In [1]:
import keras
import pandas as pd

Using TensorFlow backend.


# What will our model do?

><font size="4" color="#00A2S3"  face="verdana"> <B> The goal of our model is to predict a person's hourly wage from factors such as their age, years of education, and gender. </B></font> 


><font size="4" color="#00A2S3"  face="verdana"> <B> The dataset is a table with one row per person and columns for hourly wage, age, years of education, and gender. We will train our model on this data so that it can predict the hourly wage of a new person. </B></font> 

# Reading The Training Data

## Important Things To Note

><font size="4" color="#00A2S3"  face="verdana"> <B> df stands for dataframe. </B></font> 

><font size="4" color="#ffab09"  face="verdana"> <B> Pandas reads in the csv file as a dataframe. </B></font> 

><font size="4" color="#90Z0B2"  face="verdana"> <B> The ‘head()’ function will show the first 5 rows of the dataframe so you can check that the data has been read in properly and can take an initial look at how the data is structured. </B></font> 

><font size="4" color="#00A0B2"  face="verdana"> <B> https://www.kaggle.com/c/predict-hourly-wage/data </B></font> 

In [2]:
## read in data using pandas

train_df = pd.read_csv("Income_training.csv")

# check data has been read in properly

train_df.head()

Unnamed: 0,compositeHourlyWages,age,yearsEducation,sex1M0F
0,21.38,58,10,1
1,25.15,42,16,1
2,8.57,31,12,0
3,12.07,43,13,0
4,10.97,46,12,0


# Split dataset into target and inputs

## The Next Steps

><font size="4" color="#90Z0B2"  face="verdana"> <B> Next, we need to split up our dataset into inputs (train_X) and our target (train_y). </B></font> 

><font size="4" color="#00A0B2"  face="verdana"> <B> Our input will be every column except ‘compositeHourlyWages’  because ‘compositeHourlyWages’ is what we will be attempting to predict. This makes it the target. </B></font> 

><font size="4" color="#00A0B2"  face="verdana"> <B> We will use pandas ‘drop’ function to drop the column ‘compositeHourlyWages’ from our dataframe and store it in the variable ‘train_X’. This will be our input.</B></font>

><font size="4" color="#00A0B2"  face="verdana"> <B> Next, we will store the column ‘compositeHourlyWages’ in our target variable (train_y). </B></font> 

In [3]:
#create a dataframe with all training data except the target column

train_X = train_df.drop(columns=["compositeHourlyWages"])

#check that the target variable has been removed

train_X.head()


Unnamed: 0,age,yearsEducation,sex1M0F
0,58,10,1
1,42,16,1
2,31,12,0
3,43,13,0
4,46,12,0


In [4]:
# create a dataframe with only the target column

train_y = train_df[["compositeHourlyWages"]]

# view dataframe

train_y.head()

Unnamed: 0,compositeHourlyWages
0,21.38
1,25.15
2,8.57
3,12.07
4,10.97


# Building The Model

## How Do We Build A Model?

## 1 

><font size="4" color="#90Z0B2"  face="verdana"> <B> The model type that we will be using is Sequential. Sequential is the easiest way to build a model in Keras. It allows you to build a model layer by layer. Each layer has weights that correspond to the layer that follows it. </B></font> 

In [5]:
from keras.models import Sequential

#create model
model = Sequential()

## 2

><font size="4" color="#90Z0B2"  face="verdana"> <B> We use the ‘add()’ function to add layers to our model. We will add two hidden layers and an output layer. ‘Dense’ is the layer type. Dense is a standard layer type that works for most cases. In a dense layer, every node in the previous layer connects to every node in the current layer. </B></font> 

In [6]:
from keras.layers import Dense
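
To make "every node connects to every node" concrete, here is a minimal plain-Python sketch of what one dense layer computes for a single row of data. It is not Keras's implementation; the function name `dense_forward` and all weight and bias values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def dense_forward(inputs, weights, biases):
    """Compute one dense layer's output for one row (before any activation).

    inputs:  list of n_in numbers (one row of data)
    weights: n_in x n_out nested list; weights[i][j] links input i to node j
    biases:  list of n_out numbers, one per node
    """
    outputs = []
    for j in range(len(biases)):
        total = biases[j]
        # Every input contributes to every node: that is what "dense" means.
        for i, x in enumerate(inputs):
            total += x * weights[i][j]
        outputs.append(total)
    return outputs


# Hypothetical example: 3 inputs (age, yearsEducation, sex1M0F) feeding 2 nodes.
row = [58.0, 10.0, 1.0]
weights = [[0.25, -0.5],
           [0.5, 0.25],
           [1.0, 0.0]]
biases = [0.5, 1.0]

layer_output = dense_forward(row, weights, biases)
print(layer_output)  # [21.0, -25.5]
```

A real layer with 10 nodes works the same way, just with 10 columns of weights and 10 biases that training adjusts.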

## 3

><font size="4" color="#ffab00"  face="verdana"> <B> We have 10 nodes in each of our first two layers. This number can also be in the hundreds or thousands. Increasing the number of nodes in each layer increases model capacity. I will go into further detail about the effects of increasing model capacity shortly. </B></font> 

## 4

><font size="4" color="#ffab00"  face="verdana"> <B> The first layer needs an input shape. The input shape describes a single row of input, so it only needs the number of columns, which is stored in ‘n_cols’. In ‘(n_cols,)’ there is nothing after the comma: the number of rows is left out, which means the model accepts any number of rows.</B></font> 

In [7]:
#get number of columns in training data
n_cols = train_X.shape[1]
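
The trailing comma in `(n_cols,)` matters in Python, so a quick sketch may help. The value `n_cols = 3` is an assumption based on the three input columns shown by `train_X.head()` (age, yearsEducation, sex1M0F).

```python
n_cols = 3  # assumed: age, yearsEducation, sex1M0F

input_shape = (n_cols,)  # one-element tuple: "each row has n_cols values"
not_a_tuple = (n_cols)   # without the comma this is just the integer 3

print(type(input_shape).__name__, len(input_shape))  # tuple 1
print(type(not_a_tuple).__name__)                    # int
```

Keras expects a tuple here, which is why the comma is kept even though only one number is given.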

## 5

><font size="4" color="#90Z0B2"  face="verdana"> <B> ‘activation’ sets the activation function for the layer. An activation function allows models to take into account nonlinear relationships. For example, if you are predicting diabetes in patients, going from age 10 to 11 is different from going from age 60 to 61. </B></font> 

## 6

><font size="4" color="#00A0B2"  face="verdana"> <B> The activation function we will be using is ReLU, the Rectified Linear Unit. Although it is made of just two linear pieces, it has been shown to work well in neural networks.</B></font> 
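
The "two linear pieces" are easy to see in code. This is a minimal plain-Python sketch of the ReLU formula, not the Keras implementation, and the sample values are made up for illustration.

```python
def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)


values = [-25.5, -1.0, 0.0, 2.5, 21.0]  # hypothetical node outputs
activated = [relu(v) for v in values]
print(activated)  # [0.0, 0.0, 0.0, 2.5, 21.0]
```

Negative values are flattened to zero (the first piece) and positive values pass through unchanged (the second piece).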

## 7

><font size="4" color="#00A0B2"  face="verdana"> <B>The last layer is the output layer. It only has one node, which is for our prediction.</B></font> 


In [8]:
#add model layers
model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))

## Compiling the Model

## How do we compile the model?

><font size="4" color="#00A0B2"  face="verdana"> <B>Next, we need to compile our model. Compiling the model takes two parameters: optimizer and loss.</B></font> 

### What is an optimizer?

><font size="4" color="#00A2S3"  face="verdana"> <B>The optimizer's goal is to minimize the loss function. We will be using ‘adam’ as our optimizer. Adam is generally a good optimizer to use for many cases. The adam optimizer adjusts the learning rate throughout training.</B></font> 

### What is Learning Rate?

><font size="4" color="#00A2S3"  face="verdana"> <B>The learning rate determines how fast the optimal weights for the model are calculated. A smaller learning rate may lead to more accurate weights (up to a certain point), but the time it takes to compute the weights will be longer.</B></font> 
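
To get a feel for the speed side of this trade-off, here is a small sketch using plain gradient descent on a one-weight toy problem. This is deliberately simpler than Adam, which adapts its step sizes during training; the toy loss `(w - 3)^2`, the function name `steps_to_converge`, and the two learning rates are all assumptions made for illustration.

```python
def steps_to_converge(learning_rate, target=3.0, tolerance=1e-3, max_steps=10_000):
    """Count gradient-descent steps until w is within `tolerance` of `target`.

    Minimizes the toy loss (w - target)**2, starting from w = 0.
    Returns None if the weight has not converged after max_steps.
    """
    w = 0.0
    for step in range(1, max_steps + 1):
        gradient = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= learning_rate * gradient  # move against the gradient
        if abs(w - target) < tolerance:
            return step
    return None


fast = steps_to_converge(0.1)
slow = steps_to_converge(0.01)
print(fast, slow)  # the smaller learning rate needs many more steps
```

On this toy problem the smaller learning rate reaches the same answer but takes roughly ten times as many updates.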

### What is Loss?

><font size="4" color="#00A2S3"  face="verdana"> <B> Machines learn by means of a loss function. It is a method of evaluating how well a specific model fits the given data. If the predictions deviate too much from the actual results, the loss function returns a very large number.</B></font> 

><font size="4" color="#99YBD2"  face="verdana"> <B>For our loss function, we will use ‘mean_squared_error’. It is calculated by taking the average squared difference between the predicted and actual values. It is a popular loss function for regression problems. The closer to 0 this is, the better the model performed.</B></font>
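
The calculation itself is short enough to write out by hand. In this sketch the actual wages are the first three values shown by `train_df.head()`, while the predicted values are hypothetical numbers invented for the example.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


actual = [21.38, 25.15, 8.57]   # first three wages from the training data
predicted = [20.0, 22.0, 10.0]  # hypothetical model predictions

mse = mean_squared_error(actual, predicted)
print(round(mse, 2))  # 4.62
```

Because the errors are squared, the 3.15 miss on the second row contributes far more to the loss than the two smaller misses.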

In [9]:
#compile model using mse as a measure of model performance

model.compile(optimizer='adam', loss='mean_squared_error')

## Training the Model

## How do we train the model?

><font size="4" color="#00A0B2"  face="verdana"> <B>Now we will train our model. To train, we will use the ‘fit()’ function on our model with the following five parameters: training data (train_X), target data (train_y), validation split, the number of epochs and callbacks.</B></font> 



## What is Validation Split?

><font size="4" color="#00A2S3"  face="verdana"> <B>‘validation_split’ is a parameter of the Keras ‘fit()’ function. It is a number between 0 and 1: the fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.</B></font> 

><font size="4" color="#99YBD2"  face="verdana"> <B>Keras holds out the validation rows from the end of the data we pass in, before any shuffling, and trains on the rest. During training, we will be able to see the validation loss, which gives the mean squared error of our model on the validation set. We will set the validation split at 0.2, which means that 20% of the training data we provide to the model will be set aside for checking model performance.</B></font> 
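
The arithmetic behind a 0.2 split can be sketched in a few lines. The total of 3197 rows is inferred from the training log in this notebook (2557 + 640), and the truncating formula below is an assumption about how the split point is computed that happens to reproduce those numbers. The helper name `split_counts` is hypothetical.

```python
def split_counts(n_samples, validation_split):
    """Return (training rows, validation rows) for a given split fraction."""
    # Assumed rule: truncate the split index down to a whole number of rows.
    split_at = int(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


n_train, n_val = split_counts(3197, 0.2)
print(n_train, n_val)  # 2557 640
```

These counts match the "Train on 2557 samples, validate on 640 samples" line that `fit()` prints.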


## What are epochs?

><font size="4" color="#00A2S3"  face="verdana"> <B>The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point. After that point, the model will stop improving during each epoch. In addition, the more epochs, the longer the model will take to run. To monitor this, we will use ‘early stopping’.</B></font> 


In [10]:
from keras.callbacks import EarlyStopping
#set early stopping monitor so the model stops training when it won't improve anymore

### Early stopping will stop the model from training before the number of epochs is reached if the model stops improving. 

### We will set the ‘patience’ of our early stopping monitor to 3. This means that after 3 epochs in a row in which the model doesn’t improve, training will stop. 

### Training loss is the error on the training set of data. Validation loss is the error after running the validation set of data through the trained network.

### Sometimes, the validation loss can stop improving then improve in the next epoch, but after 3 epochs in which the validation loss doesn’t improve, it usually won’t improve again.


In [11]:
early_stopping_monitor = EarlyStopping(patience=3)
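
The patience rule can be sketched in plain Python. This illustrates the idea only and is not the Keras source; the function name `stopping_epoch` and the list of validation losses are hypothetical.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the (1-based) epoch at which training would stop.

    Training stops once the validation loss has failed to beat its best
    value for `patience` epochs in a row. If that never happens, all
    epochs run and the last epoch number is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss  # new best: reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses: the best value (40.0) is reached at epoch 3.
losses = [50.0, 42.0, 40.0, 41.0, 40.5, 40.2, 39.0]
print(stopping_epoch(losses))  # 6
```

Training stops at epoch 6, so the lower loss at epoch 7 is never reached. That is the trade-off of a small patience value.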

In [12]:
#train model
model.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])

Train on 2557 samples, validate on 640 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.callbacks.History at 0x1acb2de8fc8>

## Making Prediction On New Data

><font size="4" color="#00A0B2"  face="verdana"> <B>If we want to use this model to make predictions on new data, we use the ‘predict()’ function, passing in our new data. The output is a predicted hourly wage (‘compositeHourlyWages’) for each row.</B></font> 
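
The predictions come back as one single-value row per input row, which is why the printed output further down is full of nested brackets. Here is a small sketch of unpacking that shape, using a plain nested list as a stand-in for the real output. The three values are the first three predictions printed in this notebook, and pairing them with IDs 1 to 3 assumes the rows stay in the same order as the test file.

```python
# Stand-in for model.predict(test_X): one inner list per input row.
raw_predictions = [[19.165485], [18.596682], [11.546017]]
ids = [1, 2, 3]  # assumed to line up with the 'ID' column of the test file

# Pull the single value out of each row to get a flat list of wages.
flat = [row[0] for row in raw_predictions]

# Pair each ID with its predicted hourly wage, rounded to cents.
wage_by_id = {i: round(p, 2) for i, p in zip(ids, flat)}
print(wage_by_id)  # {1: 19.17, 2: 18.6, 3: 11.55}
```

With the real NumPy array returned by `predict()`, calling `.ravel()` on it flattens it in the same way.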


In [13]:
# Use our newly trained model to make predictions on unseen data

# read in new data using pandas

test_df = pd.read_csv("Income_testing.csv")

# check data has been read in properly

test_df.head()

Unnamed: 0,ID,age,yearsEducation,sex1M0F
0,1,36,20,0
1,2,38,17,0
2,3,24,10,0
3,4,39,12,1
4,5,50,12,0


In [14]:
# create a dataframe with all test data except the ID column,
# so the columns match the ones the model was trained on

test_X = test_df.drop(columns=["ID"])

# check that the ID column has been removed

test_X.head()

# predict the hourly wage for each row of the test data

test_y_predictions = model.predict(test_X)

In [15]:
print(test_y_predictions)

[[19.165485 ]
 [18.596682 ]
 [11.546017 ]
 [16.971369 ]
 [16.819584 ]
 [17.312239 ]
 [17.370226 ]
 [17.487032 ]
 [12.538034 ]
 [15.524353 ]
 [10.668041 ]
 [11.263518 ]
 [11.923805 ]
 [14.769675 ]
 [15.131422 ]
 [21.341192 ]
 [19.541666 ]
 [12.196407 ]
 [14.290527 ]
 [20.553171 ]
 [16.275593 ]
 [13.21536  ]
 [14.736309 ]
 [20.508745 ]
 [15.613754 ]
 [20.024    ]
 [17.912943 ]
 [11.923805 ]
 [ 8.950714 ]
 [14.435908 ]
 [11.14501  ]
 [16.637854 ]
 [18.290066 ]
 [14.071308 ]
 [12.409339 ]
 [16.217606 ]
 [16.940168 ]
 [13.21155  ]
 [18.418364 ]
 [16.142561 ]
 [14.700412 ]
 [18.418364 ]
 [17.049112 ]
 [11.382027 ]
 [18.048498 ]
 [15.363766 ]
 [14.74556  ]
 [20.027338 ]
 [16.786432 ]
 [11.542616 ]
 [11.542616 ]
 [20.416948 ]
 [15.180959 ]
 [ 9.611001 ]
 [11.000836 ]
 [15.131422 ]
 [23.09286  ]
 [18.202671 ]
 [15.3576145]
 [17.83042  ]
 [21.8226   ]
 [17.453695 ]
 [14.1016865]
 [15.372591 ]
 [15.975503 ]
 [10.898744 ]
 [20.416948 ]
 [16.322279 ]
 [14.987751 ]
 [13.638253 ]
 [14.890257 ]
 [16.3

><font size="5" ><B>Congrats! You have built a deep learning model in Keras!</B></font> 
    
><font size="4" ><B>It is not very accurate yet, but that can improve with a larger amount of training data and more ‘model capacity’.</B></font> 
    
    

## Model Capacity

><font size="4" color="#00A2S3"  face="verdana"> <B>As you increase the number of nodes and layers in a model, the model capacity increases. Increasing model capacity can lead to a more accurate model, up to a certain point, at which the model will stop improving. Generally, the more training data you provide, the larger the model should be. We are only using a tiny amount of data, so our model is pretty small. The larger the model, the more computational capacity it requires and it will take longer to train.</B></font> 

><font size="4" color="#99YBD2"  face="verdana"> <B>Let’s create a new model using the same training data as our previous model. This time, we will add a layer and increase the nodes in each layer to 100. We will train the model to see if increasing the model capacity will improve our validation score.</B></font> 
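
One rough way to put a number on model capacity is to count trainable weights and biases. The sketch below does this for a stack of dense layers. The helper name `dense_param_count` is hypothetical, and `n_cols = 3` is an assumption based on the three input columns (age, yearsEducation, sex1M0F).

```python
def dense_param_count(n_inputs, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    previous = n_inputs
    for size in layer_sizes:
        # One weight per (input, node) pair, plus one bias per node.
        total += previous * size + size
        previous = size
    return total


n_cols = 3  # assumed number of input columns

small = dense_param_count(n_cols, [10, 10, 1])         # two layers of 10 nodes
large = dense_param_count(n_cols, [100, 100, 100, 1])  # three layers of 100 nodes
print(small, large)  # 161 20701
```

Under these assumptions, the larger network has over a hundred times as many values to learn from the same data, which is why it takes longer to train.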


In [16]:
#training a new model on the same data to show the effect of increasing model capacity

#create model
model_2 = Sequential()

#add model layers
model_2.add(Dense(100, activation='relu', input_shape=(n_cols,)))
model_2.add(Dense(100, activation='relu'))
model_2.add(Dense(100, activation='relu'))
model_2.add(Dense(1))

#compile model using mse as a measure of model performance
model_2.compile(optimizer='adam', loss='mean_squared_error')
#train model
model_2.fit(train_X, train_y, validation_split=0.2, epochs=30, callbacks=[early_stopping_monitor])

Train on 2557 samples, validate on 640 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30


<keras.callbacks.callbacks.History at 0x1acb3ec2348>

### We can see that by increasing our model capacity, we have improved our validation loss from 39.06 in our old model to 36.09 in our new model.

### By continuing to change the number of neurons in each layer, we can get a different validation loss. You don't want too many neurons or too few.

### Try testing out different numbers of neurons to see if you can get a lower validation loss!

In [17]:
test_y_predictions = model_2.predict(test_X)

In [18]:
print(test_y_predictions)

[[18.68613  ]
 [18.14392  ]
 [10.2356205]
 [16.559536 ]
 [12.421405 ]
 [17.945156 ]
 [16.951225 ]
 [17.788654 ]
 [11.923449 ]
 [13.605982 ]
 [10.543438 ]
 [ 9.28663  ]
 [10.463318 ]
 [12.747608 ]
 [12.602343 ]
 [22.326134 ]
 [15.847848 ]
 [11.69287  ]
 [14.598878 ]
 [21.098122 ]
 [15.725738 ]
 [12.312199 ]
 [14.150507 ]
 [21.07153  ]
 [12.414058 ]
 [15.668151 ]
 [15.451301 ]
 [10.463318 ]
 [ 7.1242604]
 [13.588498 ]
 [ 8.451209 ]
 [15.401782 ]
 [17.957714 ]
 [14.20553  ]
 [ 8.873706 ]
 [16.348808 ]
 [12.49686  ]
 [11.446181 ]
 [16.42591  ]
 [15.96304  ]
 [11.484476 ]
 [16.42591  ]
 [16.323572 ]
 [10.137994 ]
 [13.487632 ]
 [13.98909  ]
 [11.613588 ]
 [20.83979  ]
 [16.481318 ]
 [ 9.775039 ]
 [ 9.775039 ]
 [20.076529 ]
 [13.916423 ]
 [ 8.307534 ]
 [ 9.477603 ]
 [12.602343 ]
 [21.015526 ]
 [17.331123 ]
 [14.217453 ]
 [14.859984 ]
 [22.5523   ]
 [16.526299 ]
 [11.108426 ]
 [12.5081   ]
 [12.275693 ]
 [ 9.28638  ]
 [20.076529 ]
 [14.216068 ]
 [11.089751 ]
 [ 9.962049 ]
 [12.700082 ]
 [14.2

## Congrats! You are now well on your way to building amazing deep learning models in Keras!