# Agenda
1. About the Dataset
2. Objective
3. Loading Libraries
4. Loading Data
5. View Data
6. Separate Input Features and Output Features
7. Split The Data into Train and Test Set
8. Train the model (The five step model life cycle)
  1. Define the model.
  2. Compile the model.
  3. Fit the model.
  4. Evaluate the model
    * Hyperparameter Tuning
  5. Prediction

## About the Dataset
We will be working on a data set that comes from the real estate industry in Boston (US). This database contains 14 attributes. The target variable refers to the median value of owner-occupied homes in 1000 USD's.

* CRIM: per capita crime rate by town
* ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
* INDUS: proportion of non-retail business acres per town
* CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
* NOX: nitric oxides concentration (parts per 10 million)
* RM: average number of rooms per dwelling
* AGE: proportion of owner-occupied units built prior to 1940
* DIS: weighted distances to five Boston employment centres
* RAD: index of accessibility to radial highways
* TAX: full-value property-tax rate per 10,000 USD
* PTRATIO: pupil-teacher ratio by town
* B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
* LSTAT: lower status of the population (%)
* MEDV: Median value of owner-occupied homes in 1000 USD's (Target)


## Objective
The objective is to use linear regression to find the median value of owner-occupied homes in 1000 USD's.

We will build a Machine learning model (i.e. Linear Regression) using `tensorflow.keras` (in short `tf.keras`) API.

## Loading Libraries
Not all Python capabilities are loaded into the working environment by default, even if they are already installed on your system. So we import each library that we want to use.

In data science, numpy and pandas are among the most commonly used libraries. Numpy is required for calculations like means, medians, square roots, etc. Pandas is used for data processing and data frames. Matplotlib is used for data visualization. We use alias names for our libraries for convenience (numpy --> np, pandas --> pd, matplotlib.pyplot --> plt).

**pyplot:** pyplot is matplotlib's plotting framework. It is the most used module of matplotlib.

In [1]:
# importing packages
import numpy as np # to perform calculations 
import pandas as pd # to read data
import matplotlib.pyplot as plt # to visualise

## Loading Data
The pandas module is used for reading files. Our data is in '.csv' format, so we will use the 'read_csv()' function to load it.

In [2]:
# read_csv() is given the URL of the file on the DPhi official GitHub page
boston_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Training_set_boston.csv" )

## View Data

In [3]:
boston_data.head()

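| Unnamed: 0 | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15.0234 | 0.0 | 18.1 | 0.0 | 0.614 | 5.304 | 97.3 | 2.1007 | 24.0 | 666.0 | 20.2 | 349.48 | 24.91 | 12.0 |
| 1 | 0.62739 | 0.0 | 8.14 | 0.0 | 0.538 | 5.834 | 56.5 | 4.4986 | 4.0 | 307.0 | 21.0 | 395.62 | 8.47 | 19.9 |
| 2 | 0.03466 | 35.0 | 6.06 | 0.0 | 0.4379 | 6.031 | 23.3 | 6.6407 | 1.0 | 304.0 | 16.9 | 362.25 | 7.83 | 19.4 |
| 3 | 7.05042 | 0.0 | 18.1 | 0.0 | 0.614 | 6.103 | 85.1 | 2.0218 | 24.0 | 666.0 | 20.2 | 2.52 | 23.29 | 13.4 |
| 4 | 0.7258 | 0.0 | 8.14 | 0.0 | 0.538 | 5.727 | 69.5 | 3.7965 | 4.0 | 307.0 | 21.0 | 390.95 | 11.28 | 18.2 |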


# Separating Input Features and Output Features
Before building any machine learning model, we separate the input variables from the output variable. Input variables are the quantities the model uses to make a prediction. They are also known as independent variables, because their values do not depend on any other quantity. The output variable is the one whose value depends on the input variables, so it is also known as the dependent variable. In this dataset we are trying to predict the price of a house, so our target column is 'MEDV'.

By convention input variables are represented with 'X' and output variables are represented with 'y'.

In [4]:
X = boston_data.drop('MEDV', axis = 1)    # Input Variables/features
y = boston_data.MEDV      # output variables/features

# Splitting the data

We want to check the performance of the model that we build. For this purpose, we split the given data (both inputs and outputs) into a training set, which is used to train the model, and a test set, which is used to check how accurately the model predicts outcomes.

For this purpose we have a function called 'train_test_split' in the 'sklearn.model_selection' module.

We assign 80% of the data to the training set and 20% of the data to the test set using the code below.
The test_size argument is where we specify the proportion of the test set.

By passing our X and y variables into the train_test_split function, we capture the splits by assigning the result to 4 variables.

In [5]:
# import train_test_split
from sklearn.model_selection import train_test_split 

# Assign variables to capture train test split output
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# X_train: independent/input feature data for training the model
# y_train: dependent/output feature data for training the model
# X_test: independent/input feature data for testing the model; will be used to predict the output values
# y_test: original dependent/output values of X_test; we will compare these values with our predicted values to check the performance of our built model.
 
# test_size = 0.20: 20% of the data will go to the test set and 80% of the data will go to the train set
# random_state = 42: this fixes the split, i.e. you get the same split each time you run the code
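To make the 80/20 split more concrete, here is a minimal pure-Python sketch of a shuffled split. It is only an illustration: the helper name `simple_split` and the 100 stand-in rows are hypothetical, and scikit-learn's `train_test_split` has its own implementation, so it will not produce the same ordering.

```python
import random

def simple_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of `rows` and cut it into (train, test) lists."""
    rng = random.Random(seed)            # local generator, so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)   # number of rows held out for testing
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))                  # hypothetical stand-in for 100 data rows
train, test = simple_split(rows)
print(len(train), len(test))             # 80 20
```

Fixing the seed plays the same role as `random_state=42` above: running the split again gives the same train and test rows.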

In [6]:
# find the number of input features
n_features = X.shape[1]
print(n_features)

13


# Training our model


After splitting the data into training and testing sets, it's time to train our first deep learning model. Wait! Before training the deep learning model, let's understand the **Deep Learning Model Life-Cycle**.

## Neural Network: Architecture
Here we give just an overview of the architecture of a neural network. You will learn more about it in the next module.

A neural network consists of an input layer and an output layer, with one or more hidden layers in between.

![neural network architecture](https://dphi-courses.s3.ap-south-1.amazonaws.com/Deep+Learning+Bootcamp/nn+arch.png)

## The 5 Step Model Life-Cycle

A model has a life-cycle, and this very simple knowledge provides the backbone for both modeling a dataset and understanding the tf.keras API.

The five steps in the life-cycle are as follows:

1. Define the model.
2. Compile the model.
3. Fit the model.
4. Evaluate the model.
5. Make predictions.

We will take a closer look at each of the steps and build the deep learning model in parallel.

### 1. Define the model
Defining the model requires that you first select the type of model that you need and then choose the architecture or network topology.

From an API perspective, this involves defining the layers of the model, configuring each layer with a number of nodes and activation function, and connecting the layers together into a cohesive model.

Models can be defined either with the Sequential API or the Functional API (you will know this in later modules). Here we will define the model with Sequential API. Now **what is Sequential API?**

**Sequential API**
The sequential API is the simplest API to get started with Deep Learning. 

It is referred to as “sequential” because it involves defining a Sequential class and adding layers to the model one by one in a linear manner, from input to output.

The example below defines a Sequential MLP model that accepts `n_features` inputs (13 here), has two hidden layers with 10 and 8 nodes, and then an output layer with one node to predict a numerical value.



In [7]:
from tensorflow.keras import Sequential    # import Sequential from tensorflow.keras
from tensorflow.keras.layers import Dense  # import Dense from tensorflow.keras.layers
from numpy.random import seed     # seed helps you to fix the randomness in the neural network.  
import tensorflow

In [8]:
# define the model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

Note that the visible layer of the network is defined by the “input_shape” argument on the first hidden layer. That means in the above example, the model expects the input for one sample to be a vector of n_features (i.e. 13) numbers.

The sequential API is easy to use because you keep calling model.add() until you have added all of your layers.
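As a rough check on the size of the network defined above, the sketch below counts its trainable parameters by hand. It assumes the standard fully connected layer, where each unit has one weight per input plus one bias; the helper name `dense_params` is made up for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 13              # number of input features in this notebook
layer_units = [10, 8, 1]     # units in the three Dense layers defined above

total = 0
n_in = n_features
for units in layer_units:
    total += dense_params(n_in, units)
    n_in = units             # this layer's outputs feed the next layer

print(total)                 # 140 + 88 + 9 = 237
```

Calling `model.summary()` on the Keras model should report the same total.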

The activation function we have chosen is **ReLU**, which stands for **rectified linear unit**. The activation function decides whether a neuron should be activated or not.

ReLU is defined mathematically as **F(x) = max(0,x)**. In other words, the output is x, if x is greater than 0, and the output is 0 if x is 0 or negative.
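The definition translates almost directly into code. This is a plain-Python sketch for single numbers; Keras applies the same rule element-wise to whole arrays.

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, x)

for value in (-2.0, 0.0, 3.5):
    print(value, "->", relu(value))
```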

### 2. Compile the model
Compiling the model requires that you first select a loss function that you want to optimize, such as mean squared error or cross-entropy.

It also requires that you select an algorithm to perform the optimization procedure. We’re using **RMSprop** as our optimizer here. RMSprop stands for **Root Mean Square Propagation**. It’s one of the most popular gradient descent optimization algorithms for deep learning networks. RMSprop is an optimizer that’s reliable and fast. 

**Note:** For the time being understand gradient descent as just an optimization algorithm. You will know more about it in the next module.
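For the curious, here is a minimal sketch of the textbook RMSprop update on a one-variable problem. It is an illustration under stated assumptions, not the Keras implementation: the function name `rmsprop_minimize`, the toy loss $(w - 3)^2$, and the values `rho=0.9` and `eps=1e-7` are choices made for this example.

```python
import math

def rmsprop_minimize(grad_fn, w=0.0, learning_rate=0.01, rho=0.9, eps=1e-7, steps=2000):
    """Minimize a 1-D function given its gradient, using the textbook RMSprop rule."""
    avg_sq_grad = 0.0                      # running average of squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g * g
        # Divide the step by the root of the running average, so the step
        # size stays on the order of the learning rate whatever the gradient scale.
        w -= learning_rate * g / (math.sqrt(avg_sq_grad) + eps)
    return w

# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0))
print(w_final)                             # ends close to 3.0
```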

It may also require that you select any performance metrics to keep track of during the model training process. The loss function used here is **mean squared error.** (don't worry if you don't know about the loss function mean squared error, for the time being just know it's a function that helps you know the error or loss your model is giving. You will learn more about loss functions in the coming modules)

From an API perspective, this involves calling a function to compile the model with the chosen configuration, which will prepare the appropriate data structures required for the efficient use of the model you have defined.

In [9]:
# import RMSprop optimizer
from tensorflow.keras.optimizers import RMSprop
optimizer = RMSprop(0.01)    # 0.01 is the learning rate

**Why learning rate = 0.01?**

It is important to find a good value for the learning rate for your model on your training dataset. We cannot analytically calculate the optimal learning rate for a given model on a given dataset. Instead, a good (or good enough) learning rate must be discovered via trial and error.

The range of values to consider for the learning rate is less than 1.0 and greater than $10^{-6}$.

A traditional default value for the learning rate is 0.1 or 0.01, and this may represent a good starting point on your problem.

In [10]:
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model

### 3. Fitting the model
Fitting the model requires that you first select the training configuration, such as the number of epochs (full passes through the training dataset) and the batch size (number of samples used for each estimate of the model error, i.e. for each weight update).

Training applies the chosen optimization algorithm to minimize the chosen loss function and updates the model using the backpropagation of error algorithm (don't worry if you don't know this term; you will learn about it in the next module).

Fitting the model is the slow part of the whole process and can take seconds to hours to days, depending on the complexity of the model, the hardware you’re using, and the size of the training dataset.

From an API perspective, this involves calling a function to perform the training process. This function will block (not return) until the training process has finished.

While fitting the model, a progress bar will summarize the status of each epoch and the overall training process.

In [11]:
seed_value = 42
seed(seed_value)        # Seeding NumPy's random generator helps you get the same result on repeated runs with the same parameters


# Recommended by Keras -------------------------------------------------------------------------------------
# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
import os
os.environ['PYTHONHASHSEED']=str(seed_value)

# 2. Set `python` built-in pseudo-random generator at a fixed value
import random
random.seed(seed_value)

# 3. Set `numpy` pseudo-random generator at a fixed value
import numpy as np
np.random.seed(seed_value)
# Recommended by Keras -------------------------------------------------------------------------------------


# 4. Set the `tensorflow` pseudo-random generator at a fixed value
tensorflow.random.set_seed(seed_value) 
model.fit(X_train, y_train, epochs=10, batch_size=30, verbose = 1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fa11051fcd0>

What is **verbose**?

By setting verbose to 0, 1 or 2 you choose how you want to see the training progress for each epoch.

`verbose=0` will show you nothing (silent)

`verbose=1` will show you an animated progress bar like this:

![progres_bar](https://dphi-courses.s3.ap-south-1.amazonaws.com/Deep+Learning+Bootcamp/progress+bar.png)

`verbose=2` will just mention the number of epoch like this:

![verbose = 2](https://dphi-courses.s3.ap-south-1.amazonaws.com/Deep+Learning+Bootcamp/epoch.png)

### 4. Evaluate the model
Evaluating the model requires that you first choose a holdout dataset used to evaluate the model. This should be data not used in the training process i.e. the X_test.

The speed of model evaluation is proportional to the amount of data you want to use for the evaluation, although it is much faster than training as the model is not changed.

From an API perspective, this involves calling a function with the holdout dataset and getting a loss and perhaps other metrics that can be reported.

In [12]:
model.evaluate(X_test, y_test)



54.9976921081543

The mean squared error we got here is about 55.0. Now, **what does it mean?**

When you subtract the predicted values (for the X_test data) from the actual values (y_test), square each difference, and take the mean (i.e. average) of those squares, the result you get is about 55.0 in this case.

evaluate() does this task automatically. If you want to get the predictions for X_test you can call **`model.predict(X_test)`**
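The calculation can be sketched by hand in a few lines. The numbers below are made up purely for illustration and are not taken from the Boston data; the function name is also just for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

y_true = [10.0, 20.0, 30.0]    # made-up actual values
y_pred = [12.0, 19.0, 30.0]    # made-up predictions
# Squared errors are 4, 1 and 0, so the mean is 5 / 3.
print(mean_squared_error(y_true, y_pred))
```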

#### Hyperparameter Tuning
The hyperparameters here in this notebook are:
1. Learning Rate
2. Epochs
3. Batch Size

We can change the values of these hyperparameters and see how the model performs (evaluate the model) on the X_test data.

**Learning Rate**

A scalar used to train a model via gradient descent. During each iteration, the **gradient descent** algorithm multiplies the learning rate by the gradient. The resulting product is called the **gradient step**.

Learning rate is a key **hyperparameter**.
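To see the gradient step in action, here is a small sketch of plain gradient descent on a one-variable toy loss, $(w - 3)^2$, whose minimum is at $w = 3$. The toy loss, the function names, and the three learning rates are assumptions chosen for illustration; this is not what RMSprop does internally.

```python
def gradient(w):
    """Derivative of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

def run_gradient_descent(learning_rate, steps=20, w=0.0):
    for _ in range(steps):
        gradient_step = learning_rate * gradient(w)   # learning rate times gradient
        w = w - gradient_step                         # move against the gradient
    return w

for lr in (0.01, 0.1, 1.1):
    print(lr, run_gradient_descent(lr))
```

With 20 steps, the smallest rate is still far from 3, the middle one gets close, and the largest one overshoots further on every step and moves away from the minimum.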

In [13]:
####################### Complete example to check the performance of the model with different learning rates #######################################
# define the model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

optimizer = RMSprop(0.1)    # 0.1 is the learning rate
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model

# fit the model 
model.fit(X_train, y_train, epochs=10, batch_size=30, verbose = 1)

# evaluate the model
print('The MSE value is: ', model.evaluate(X_test, y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
The MSE value is:  122.83949279785156


As you can see above, the loss (cost), i.e. the MSE, changed just by changing the learning rate.

### Exercise 1

Test several learning rate values to see the impact of varying this value when defining your model.

In [14]:
# Play with learning rate
learning_rate = ?          # Replace ? with a floating-point number
epochs = 10
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=30)     # fit the model
model.evaluate(X_test, y_test)       # Evaluate the model

SyntaxError: invalid syntax (<ipython-input-14-d055f907dbf2>, line 2)

**Epochs**

A full training pass over the entire dataset such that each example has been seen once. Thus, an epoch represents N/batch size training iterations, where N is the total number of examples.
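The relationship between examples, batch size, and iterations can be checked with a quick calculation. The training-set size of 320 below is a hypothetical number used only for this example. When N is not a multiple of the batch size, the last batch is simply smaller, which is why the count is rounded up.

```python
import math

def iterations_per_epoch(n_examples, batch_size):
    """Number of weight updates needed to see every example once."""
    return math.ceil(n_examples / batch_size)

n_examples = 320    # hypothetical training-set size
for batch_size in (10, 30, 40):
    print(batch_size, iterations_per_epoch(n_examples, batch_size))
# 10 -> 32, 30 -> 11, 40 -> 8
```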

In [15]:
####################### Complete example to check the performance of the model with different epochs and learning rate = 0.1 #######################################
# define the model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

optimizer = RMSprop(0.1)    # 0.1 is the learning rate
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model

# fit the model 
model.fit(X_train, y_train, epochs=100, batch_size=30, verbose = 1)

# evaluate the model
print('The MSE value is: ', model.evaluate(X_test, y_test))

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

You can see above how the loss (cost), i.e. the MSE, changes just by changing the number of epochs while keeping the learning rate at 0.1 (i.e. the same as in the previous example).

### Exercise 2

Test several epoch values to see the impact of varying this value when defining your model.

In [16]:
# Play with epochs
learning_rate = 0.01         
epochs = ?             # Replace ? with an integer
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=30)     # fit the model
model.evaluate(X_test, y_test)       # Evaluate the model

SyntaxError: invalid syntax (<ipython-input-16-ac2bf0d4f1e0>, line 3)

### Exercise 3

Find the best possible combination of *learning rate* and *epochs* while testing some combinations

In [17]:
# play with learning rate and epochs
learning_rate = ?        # Replace ? with a floating-point number
epochs = ?             # Replace ? with an integer
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=30)     # fit the model
model.evaluate(X_test, y_test)       # Evaluate the model

SyntaxError: invalid syntax (<ipython-input-17-fd774d61b041>, line 2)

**Batch Size**

The number of examples in a batch.

In [18]:
####################### Complete example to check the performance of the model with different batch size while keeping epochs as 10 and learning rate as 0.1 #######################################
# define the model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))

optimizer = RMSprop(0.1)    # 0.1 is the learning rate
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model

# fit the model 
model.fit(X_train, y_train, epochs=10, batch_size=40, verbose = 1)

# evaluate the model
print('The MSE value is: ', model.evaluate(X_test, y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
The MSE value is:  441.9665222167969


You can see above the cost (loss) value, i.e. the MSE, for batch size 40 while keeping epochs as 10 and learning rate as 0.1.

### Exercise 4

Test several batch size values to see the impact of varying this value when defining your model.

In [19]:
# play with batch size
learning_rate = 0.01        
epochs = 150         
batch = ?      # Replace ? with an integer    
optimizer = RMSprop(learning_rate)
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model
model.fit(X_train, y_train, epochs=epochs, batch_size=batch)     # fit the model
model.evaluate(X_test, y_test)       # Evaluate the model

SyntaxError: invalid syntax (<ipython-input-19-c8c597e9060a>, line 4)

#### **Summary of hyperparameter tuning**
Most machine learning problems require a lot of hyperparameter tuning. Unfortunately, we can't provide concrete tuning rules for every model. Lowering the learning rate can help one model converge efficiently but make another model converge much too slowly. You must experiment to find the best set of hyperparameters for your dataset. That said, here are a few rules of thumb:

*  Training loss should steadily decrease, steeply at first, and then more slowly until the slope of the curve reaches or approaches zero.
*  If the training loss does not converge, train for more epochs.
*  If the training loss decreases too slowly, increase the learning rate. Note that setting the learning rate too high may also prevent training loss from converging.
*  If the training loss varies wildly (that is, the training loss jumps around), decrease the learning rate.
*  Lowering the learning rate while increasing the number of epochs or the batch size is often a good combination.
*  Setting the batch size to a very small batch number can also cause instability. First, try large batch size values. Then, decrease the batch size until you see degradation.
*  For real-world datasets consisting of a very large number of examples, the entire dataset might not fit into memory. In such cases, you'll need to reduce the batch size to enable a batch to fit into memory.

Remember: the ideal combination of hyperparameters is data dependent, so you must always experiment and verify.

We can do a hyperparameter tuning procedure in two ways:
1. Implementing hyperparameter tuning with Sklearn
2. Implementing hyperparameter tuning with Keras

#### **Implementing hyperparameter tuning with Sklearn**
We can automate hyperparameter tuning using **GridSearchCV**. GridSearchCV is a hyperparameter search procedure that is done over a defined grid of hyperparameters. Each one of the hyperparameter combinations is used for training a new model, while a cross-validation process is executed to measure the performance of the provisional models. Once the process is done, the hyperparameters and the model with the best performance are chosen.
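To get a feel for how much work a grid search involves, the sketch below enumerates a grid by hand. It only counts combinations and does not train anything; the candidate values and the 5 folds are the ones used in the KerasRegressor cell of this notebook, and the variable names are chosen for this illustration.

```python
from itertools import product

batch_sizes = [10, 20, 30, 40, 60, 80, 100]
epoch_values = [10, 50, 100]
cv_folds = 5

# Every (batch_size, epochs) pair is one candidate configuration.
grid = [{"batch_size": b, "epochs": e} for b, e in product(batch_sizes, epoch_values)]

print(len(grid))               # 21 combinations
print(len(grid) * cv_folds)    # 105 model fits during cross-validation
```

With its default `refit=True` setting, GridSearchCV then trains one more model on the whole training set using the best combination.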


Let's first take a look at the implementation of GridSearchCV with Sklearn, following the steps:
1. Define the general architecture of the model
2. Define the hyperparameters grid to be validated
3. Run the GridSearchCV process
4. Print the results of the best model

In [20]:
# Import the GridSearchCV class
from sklearn.model_selection import GridSearchCV

# 1. Define the model's architecture
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(n_features,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
optimizer = RMSprop(0.1)    # 0.1 is the learning rate
model.compile(loss='mean_squared_error',optimizer=optimizer)    # compile the model

# 2. Define the hyperparameters grid to be validated
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error', n_jobs=-1)

# 3. Run the GridSearchCV process
grid_result = grid.fit(X_train, y_train)

# 4. Print the results of the best model
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

TypeError: Cannot clone object '<tensorflow.python.keras.engine.sequential.Sequential object at 0x7fa1000c3cd0>' (type <class 'tensorflow.python.keras.engine.sequential.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.

We can observe an error in the hyperparameter tuning procedure using native Sklearn, because the defined model is a Sequential model implemented by Keras, not a scikit-learn estimator. To correct this error, we will integrate Sklearn and Keras properly, by (a) creating a `create_model` function that builds and compiles the model on demand, and (b) defining a `KerasRegressor` model, which is an implementation of the scikit-learn regressor API for Keras.

In [21]:
# ----------------------------- Functional Tuning - Option 1: using Sklearn  ------------------------------
# Goal: tune the batch size and epochs

# Import KerasRegressor class
from keras.wrappers.scikit_learn import KerasRegressor

# Define the model through a user-defined function
def create_model(optimizer=RMSprop(0.01)):
  model = Sequential()
  model.add(Dense(10, activation='relu', input_shape=(n_features,)))
  model.add(Dense(8, activation='relu'))
  model.add(Dense(1))
  model.compile(loss='mse', metrics=['mse'], optimizer=optimizer)    # compile the model
  return model
model = KerasRegressor(build_fn=create_model, verbose=1)

# Define the hyperparameters grid to be validated
batch_size = [10, 20, 30, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, nb_epoch=epochs)
model = KerasRegressor(build_fn=create_model, verbose=1)
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, n_jobs=-1)

# Run the GridSearchCV process
grid_result = grid.fit(X_train, y_train, verbose = 1)

# Print the results of the best model
print('Best params: ' + str(grid_result.best_params_))

Best params: {'batch_size': 40, 'nb_epoch': 100}


In [22]:
# Import the cross validation evaluator
from sklearn.model_selection import cross_val_score

# Measure the model's performance
results = cross_val_score(grid.best_estimator_, X_test, y_test, cv=5)
print('Results: \n  * Mean:', -results.mean(), '\n  * Std:', results.std())

Results: 
  * Mean: 364.5093505859375 
  * Std: 177.47323450474244


#### **Implementing hyperparameter tuning with Keras**
Now we will go through the process of automating hyperparameter tuning using **Random Search** and **Keras**. Random Search is a hyperparameter search procedure that is performed over a defined set of candidate hyperparameter values. Unlike grid search, not every hyperparameter combination is used to train a new model; only some randomly selected combinations are tried, and each provisional model is scored on validation data to measure its performance. Once the process is complete, the hyperparameters and the best performing model are chosen.
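The sampling idea can be sketched in plain Python. This is not how `keras-tuner` is implemented; the helper name `sample_trials` is hypothetical, and the sketch only shows how a random search picks distinct candidates and stops early when the search space has fewer options than the trial budget.

```python
import random

def sample_trials(candidates, max_trials, seed=42):
    """Pick up to `max_trials` distinct candidates in random order."""
    rng = random.Random(seed)
    n_trials = min(max_trials, len(candidates))   # cannot try more than exist
    return rng.sample(candidates, n_trials)

learning_rates = [1e-1, 1e-2, 1e-3, 1e-4]
trials = sample_trials(learning_rates, max_trials=5)
print(len(trials))    # 4: only four distinct values are available
print(trials)         # the order depends on the seed
```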

Let's take a look at the implementation of Random Search with Keras, following the steps:

0. Install and import all the packages needed
1. Define the general architecture of the model through a creation function
2. Define the hyperparameters grid to be validated
3. Run the Random Search process
4. Print the results of the best model

To execute the hyperparameter tuning procedure we will use the `keras-tuner`, a library that helps you pick the optimal set of hyperparameters for your TensorFlow model.

In [23]:
# ----------------------------- Functional Tuning - Option 2: using Keras Tuner ------------------------------
# Goal: tune the learning rate

# 0. Install and import all the packages needed
!pip install -q -U keras-tuner
import kerastuner as kt

# 1. Define the general architecture of the model through a creation user-defined function
def model_builder(hp):
  model = Sequential()
  model.add(Dense(10, activation='relu', input_shape=(n_features,)))
  model.add(Dense(8, activation='relu'))
  model.add(Dense(1))
  hp_learning_rate = hp.Choice('learning_rate', values = [1e-1, 1e-2, 1e-3, 1e-4]) # Tuning the learning rate (four different values to test: 0.1, 0.01, 0.001, 0.0001)
  optimizer = RMSprop(learning_rate = hp_learning_rate)                            # Defining the optimizer
  model.compile(loss='mse',metrics=['mse'], optimizer=optimizer)                   # Compiling the model 
  return model                                                                     # Returning the defined model

# 2. Define the hyperparameters grid to be validated
tuner_rs = kt.RandomSearch(
              model_builder,                # Takes hyperparameters (hp) and returns a Model instance
              objective = 'mse',            # Name of model metric to minimize or maximize
              seed = 42,                    # Random seed for replication purposes
              max_trials = 5,               # Total number of trials (model configurations) to test at most. Note that the oracle may interrupt the search before max_trial models have been tested.
              directory='random_search')    # Path to the working directory (relative).

# 3. Run the Random Search process
tuner_rs.search(X_train, y_train, epochs=10, validation_split=0.2, verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


The hyperparameter optimization process went through the four defined learning rate values (0.1, 0.01, 0.001, and 0.0001), running a 10-epoch training process per learning rate, with 20% of the training data held out as a validation set.

Let's see the summary of the hyperparameter optimization process:

In [24]:
# 4.1. Print the summary results of the hyperparameter tuning procedure
tuner_rs.results_summary()

The summary shows the performance, measured in MSE, for each of the learning rate variations in the hyperparameter tuning process. The variations are sorted from best to worst, which is why we see that the model with the best performance is the one with a learning rate of 0.01, and the worst model is the one with a learning rate of 0.0001. This pattern comes from how gradient descent behaves: a learning rate that is too small makes too little progress in 10 epochs, while one that is too large can keep the loss from converging. A balance must be sought in the learning rate value, one that allows training to find the weights for which the loss is lowest.

Let's now look at the performance of the best model, evaluated with our testing set. We access it with `get_best_models(num_models = 1)[0]`, where `num_models` is the number of models to return and the `[0]` index picks the first model in the returned list, which is the best one. Then we evaluate the model using the `evaluate()` function and our testing set (features `X_test` and real target values `y_test`):

In [25]:
# 4.2. Print the results of the best model
best_model = tuner_rs.get_best_models(num_models=1)[0]
best_model.evaluate(X_test, y_test)



[54.85664367675781, 54.85664367675781]

### 5. Make a Prediction
Making a prediction is the final step in the life-cycle. It is why we wanted the model in the first place.

It requires you have new data for which a prediction is required, e.g. where you do not have the target values.

From an API perspective, you simply call a function to make a prediction of a class label, probability, or numerical value: whatever you designed your model to predict.

We have our new test data located at the given github location:

https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Testing_set_boston.csv



In [26]:
# Load new test data
new_test_data = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/Boston_Housing/Testing_set_boston.csv')

In [27]:
# make a prediction with the best model found by the tuner
# (`model` was reassigned to an unfitted KerasRegressor wrapper earlier, so it cannot predict)
best_model.predict(new_test_data)


**Congratulations! You have successfully built your first deep learning model and predicted the output (i.e. MEDV) of new test data.**

#### Resources
*  [https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/](https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/)
*  [https://heartbeat.fritz.ai/linear-regression-using-keras-and-python-7cee2819a60c](https://heartbeat.fritz.ai/linear-regression-using-keras-and-python-7cee2819a60c)
*  Google Machine Learning Crash Course