# TensorFlow 2 Tutorial: Get Started in Deep Learning With tf.keras
    
By Jason Brownlee on June 21, 2020, in [Deep Learning](https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/). Original tutorial: [Here](https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/).

Using tf.keras allows you to design, fit, evaluate, and use deep learning models to make predictions in just a few lines of code. It makes common deep learning tasks, such as classification and regression predictive modeling, accessible to average developers looking to get things done.

After completing this tutorial, you will know:

- The `difference between Keras and tf.keras` and how to install and confirm TensorFlow is working.
- The `5-step life-cycle of tf.keras models` and how to use the sequential and functional APIs.
- How to `develop MLP, CNN, and RNN models` with tf.keras for regression, classification, and time series forecasting.
- How to use the `advanced features` of the tf.keras API to inspect and diagnose your model.
- How to `improve the performance` of your tf.keras model by reducing overfitting and accelerating training.

## Overview 

1. Install TensorFlow and tf.keras
    - What Are Keras and tf.keras?
    - How to Install TensorFlow
    - How to Confirm TensorFlow Is Installed
2. Deep Learning Model Life-Cycle
    - The 5-Step Model Life-Cycle
    - Sequential Model API (Simple)
    - Functional Model API (Advanced)
3. How to Develop Deep Learning Models
    - Develop Multilayer Perceptron Models
        - MLP for Binary Classification
        - MLP for Multiclass Classification
        - MLP for Regression
    - Develop Convolutional Neural Network Models
    - Develop Recurrent Neural Network Models
4. How to Use Advanced Model Features
    - How to Visualize a Deep Learning Model
    - How to Plot Model Learning Curves
    - How to Save and Load Your Model
5. How to Get Better Model Performance
    - How to Reduce Overfitting With Dropout
    - How to Accelerate Training With Batch Normalization
    - How to Halt Training at the Right Time With Early Stopping

### 1. Install TensorFlow and tf.keras

#### 1.1 What Are Keras and tf.keras?

In 2019, Google released a new version of their TensorFlow deep learning library (TensorFlow 2) that integrated the Keras API directly and promoted this interface as the default or standard interface for deep learning development on the platform.

This integration is commonly referred to as the tf.keras interface or API (“tf” is short for “TensorFlow“). This is to distinguish it from the so-called standalone Keras open source project.

- __Standalone Keras__. The standalone open source project that supports TensorFlow, Theano and CNTK backends.
- __tf.keras__. The Keras API integrated into TensorFlow 2.

The Keras API implementation in TensorFlow is referred to as “tf.keras” because this is the Python idiom used when referencing the API. First, the TensorFlow module is imported and named “tf”; then, Keras API elements are accessed via calls to tf.keras, for example `tf.keras.Sequential()`.

#### 1.2 How to Install TensorFlow and Confirm It Is Installed

In [1]:
# validate tf.keras
import tensorflow as tf
model = tf.keras.Sequential()
print(tf.__version__)

2.1.0
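Before running the check above, it can help to confirm that the package is visible to Python at all. The sketch below uses only the standard library. The helper names `is_installed` and `is_version_2` are hypothetical and are not part of TensorFlow.

```python
# check whether a package can be found without importing it
from importlib.util import find_spec


def is_installed(package_name):
    """Return True if the named top-level package is on the import path."""
    return find_spec(package_name) is not None


def is_version_2(version_string):
    """Return True if a version string such as '2.1.0' has major version 2 or later."""
    major = int(version_string.split('.')[0])
    return major >= 2


print('tensorflow installed:', is_installed('tensorflow'))
print('2.1.0 is TensorFlow 2:', is_version_2('2.1.0'))
```

If the first line prints False, the `import tensorflow as tf` statement in the cell above will fail, and the installation should be repeated.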


### 2. Deep Learning Model Life-Cycle
You will discover the life-cycle for a deep learning model and the two tf.keras APIs that you can use to define models.

#### 2.1 The 5-Step Model Life-Cycle

The five steps in the life-cycle are as follows:

- Define the model.
- Compile the model.
- Fit the model.
- Evaluate the model.
- Make predictions.

##### Define the model.
Defining the model requires that you `first select the type of model that you need and then choose the architecture or network topology`.

__From an API perspective__, this involves defining the layers of the model, configuring each layer with a number of nodes and activation function, and connecting the layers together into a cohesive model.

Models can be defined either with the __Sequential API__ or the __Functional API__, and we will take a look at this in the next section.

##### Compile the model.
Compiling the model requires that you `first select a loss function that you want to optimize, such as mean squared error or cross-entropy`.

It also requires that you select an algorithm to perform the optimization procedure, typically __stochastic gradient descent__, or a modern variation, such as __Adam__. It may also require that you `select any performance metrics to keep track of during the model training process`.

__From an API perspective__, this involves calling a function to compile the model with the chosen configuration, which will prepare the appropriate data structures required for the efficient use of the model you have defined.

The __optimizer__ can be specified as a string for a known optimizer class, e.g. ‘sgd‘ for stochastic gradient descent, or you can configure an instance of an optimizer class and use that. [Here](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)
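To make the idea of an optimizer concrete, here is a minimal, framework-free sketch of plain stochastic gradient descent fitting a single weight. It is not how tf.keras implements 'sgd' internally. The data, the learning rate, and the helper name `sgd_step` are assumptions chosen for illustration.

```python
# minimal stochastic gradient descent on a one-weight least-squares problem
def sgd_step(weight, gradient, learning_rate):
    """Apply one plain gradient descent update to a single weight."""
    return weight - learning_rate * gradient


# toy data generated by y = 3 * x (hypothetical values)
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]

w = 0.0
for epoch in range(100):
    for x, y in zip(xs, ys):
        # gradient of the squared error (w * x - y) ** 2 with respect to w
        grad = 2.0 * (w * x - y) * x
        w = sgd_step(w, grad, learning_rate=0.05)

print('learned weight: %.3f' % w)
```

Each sample triggers one update here, which corresponds to a batch size of 1. Adam follows the same loop but adapts the step size for each weight.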

The three most common loss functions are [Here](https://www.tensorflow.org/api_docs/python/tf/keras/losses):

- `binary_crossentropy` for binary classification.
- `sparse_categorical_crossentropy` for multi-class classification.
- `mse` (mean squared error) for regression.
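As a rough guide to what these three losses measure, the sketch below computes each one in plain Python on tiny hand-made examples. It is an illustration under simplifying assumptions, not the tf.keras implementation, which also clips probabilities to avoid taking the log of zero. All input values are made up.

```python
# plain-Python versions of the three common loss functions
from math import log


def binary_crossentropy(y_true, p_pred):
    """Mean binary cross-entropy; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(y_true)


def sparse_categorical_crossentropy(y_true, p_pred):
    """Mean cross-entropy for integer labels; p_pred holds one probability row per sample."""
    total = 0.0
    for label, probs in zip(y_true, p_pred):
        total += -log(probs[label])
    return total / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error between true and predicted numerical values."""
    return sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred)) / len(y_true)


print('binary: %.4f' % binary_crossentropy([1, 0], [0.9, 0.2]))
print('sparse categorical: %.4f' % sparse_categorical_crossentropy([0, 2], [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]))
print('mse: %.4f' % mse([3.0, 5.0], [2.0, 7.0]))
```

In all three cases a lower value means the predictions are closer to the targets.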

__Metrics__ are defined as a list of strings for known metric functions or a list of functions to call to evaluate predictions. List of supported metrics [Here](https://www.tensorflow.org/api_docs/python/tf/keras/metrics).

##### Fit the model.
Fitting the model requires that you first `select the training configuration`, such as the __number of epochs__ (loops through the training dataset) and the __batch size__ (number of samples used to estimate model error before each weight update).

__From an API perspective__, this `involves calling a function to perform the training process`. This function will block (not return) until the training process has finished.

- Batch size Tutorial. [How to Control the Stability of Training Neural Networks With the Batch Size](https://machinelearningmastery.com/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent-batch-size/)
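The relationship between epochs, batch size, and the number of weight updates can be worked out with simple arithmetic. The sketch below assumes the final, smaller batch of each epoch is still used, and borrows the sample count, batch size, and epoch count from the binary classification example later in this tutorial. The helper name `updates_per_epoch` is hypothetical.

```python
# how many weight updates a training run performs
from math import ceil


def updates_per_epoch(n_samples, batch_size):
    """Number of batches (and so weight updates) in one pass through the data."""
    return ceil(n_samples / batch_size)


n_samples, batch_size, epochs = 235, 32, 150
per_epoch = updates_per_epoch(n_samples, batch_size)
last_batch = n_samples - (per_epoch - 1) * batch_size

print('updates per epoch:', per_epoch)
print('samples in last batch:', last_batch)
print('total updates:', per_epoch * epochs)
```

A smaller batch size gives more updates per epoch, each based on a noisier estimate of the error.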

##### Evaluate the model.
Evaluating the model requires that you `first choose a holdout dataset used to evaluate the model`. This should be data not used in the training process so that we can get an unbiased estimate of the performance of the model when making predictions on new data.

__From an API perspective__, this involves calling a function with the holdout dataset and getting a loss and perhaps other metrics that can be reported.

##### Make predictions.
__From an API perspective__, you simply call a function to `make a prediction of a class label, probability, or numerical value`: whatever you designed your model to predict.

#### 2.2 Sequential Model API (Simple)
The sequential model API is the simpler of the two APIs. It is referred to as “__sequential__” because `it involves defining a Sequential class and adding layers to the model one by one in a linear manner, from input to output` [Here](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential).

The example below defines a __Sequential MLP__ model that accepts __eight inputs__, has __one hidden layer with 10 nodes__ and then an __output layer with 1 node__ to predict a numerical value.

In [2]:
# example of a model defined with the sequential api
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# define the model
model = Sequential()
model.add(Dense(10, input_shape=(8,)))
model.add(Dense(1))

Note that the visible layer of the network is defined by the “input_shape” argument on the first hidden layer. That means in the above example, the model expects the input for one sample to be a vector of eight numbers.
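One way to check that a stack of Dense layers is wired as intended is to count its trainable parameters by hand. Each Dense layer has one weight per input-output pair plus one bias per node. The sketch below does this arithmetic in plain Python. The helper name `dense_param_count` is hypothetical, and the layer sizes are taken from the example above.

```python
# count the weights and biases in a stack of fully connected layers
def dense_param_count(layer_sizes):
    """layer_sizes lists the input size followed by the node count of each Dense layer."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # n_in * n_out weights plus n_out biases
        total += n_in * n_out + n_out
    return total


# eight inputs, one hidden layer with 10 nodes, one output node
print(dense_param_count([8, 10, 1]))
```

The hidden layer contributes 8 * 10 + 10 = 90 parameters and the output layer 10 * 1 + 1 = 11, so the model has 101 in total.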

The sequential API is easy to use because you keep calling model.add() until you have added all of your layers.

For example, here is a deep MLP with five hidden layers.

In [3]:
# example of a model defined with the sequential api
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# define the model
model = Sequential()
model.add(Dense(100, input_shape=(8,)))
model.add(Dense(80))
model.add(Dense(30))
model.add(Dense(10))
model.add(Dense(5))
model.add(Dense(1))

#### 2.3 Functional Model API (Advanced)

The functional model API involves explicitly connecting the output of one layer to the input of another layer. Each connection is specified.

- First, an input layer must be defined via the Input class, and the shape of an input sample is specified.
- Next, a fully connected layer can be connected to the input by calling the layer and passing the input layer.
- We can then connect this to an output layer in the same manner.
- Once connected, we define a Model object and specify the input and output layers. 

In [4]:
# example of a model defined with the functional api
from tensorflow.keras import Model
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense

# define the input layer
x_in = Input(shape=(8,))

# connect a hidden layer to the input by calling the layer
x = Dense(10)(x_in)

# connect the hidden layer to the output layer
x_out = Dense(1)(x)

# define the model
model = Model(inputs=x_in, outputs=x_out)

As such, it allows for more complicated model designs, such as models that may have multiple input paths (separate vectors) and models that have multiple output paths (e.g. a word and a number).
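The `Dense(10)(x_in)` syntax can look odd at first: the first pair of parentheses constructs the layer, and the second pair calls it on an input. The toy sketch below imitates that pattern in plain Python. The `ToyTensor` and `ToyDense` classes are hypothetical stand-ins that only record connections; they are not part of tf.keras and do no computation.

```python
# a toy imitation of the "construct a layer, then call it" pattern
class ToyTensor:
    """Stands in for a symbolic tensor: it remembers its size and how it was produced."""

    def __init__(self, size, history):
        self.size = size
        self.history = history


class ToyDense:
    """Stands in for a layer: constructing it sets the node count, calling it connects it."""

    def __init__(self, units):
        self.units = units

    def __call__(self, previous):
        # the output records every step that led to it
        return ToyTensor(self.units, previous.history + ['Dense(%d)' % self.units])


x_in = ToyTensor(8, ['Input(8)'])
x = ToyDense(10)(x_in)   # same as: layer = ToyDense(10); x = layer(x_in)
x_out = ToyDense(1)(x)

print(' -> '.join(x_out.history))
```

Because every call returns a new object, one output can be passed to several layers, which is what makes branching and merging paths possible.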

The functional API can be a lot of fun when you get used to it. [Here](https://www.tensorflow.org/guide/keras/functional)

### 3. How to Develop Deep Learning Models

Discover how to `develop`, `evaluate`, and `make predictions` with standard deep learning models, including __Multilayer Perceptrons__ (MLP), __Convolutional Neural Networks__ (CNNs), and __Recurrent Neural Networks__ (RNNs).

#### 3.1 Develop Multilayer Perceptron Models (MLP)

A Multilayer Perceptron, or MLP for short, is a standard fully connected neural network model.

It is comprised of `layers of nodes where each node is connected to all outputs from the previous layer and the output of each node is connected to all inputs` for nodes in the next layer.

An MLP is `created with one or more Dense layers`. This model is __appropriate for tabular data__, that is, data as it looks in a table or spreadsheet with one column for each variable and one row for each sample. There are three `predictive modeling problems` you may want to explore with an MLP: __binary classification__, __multiclass classification__, and __regression__.

#### 3.1.1 MLP for Binary Classification

We will use the Ionosphere binary (two-class) classification dataset to demonstrate an MLP for binary classification.

The dataset can be downloaded from the links below (the code that follows reads a local copy):
- Ionosphere Dataset ([csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv)).
- Ionosphere Dataset Description ([names](https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.names)).

We will use a LabelEncoder to encode the string labels to integer values 0 and 1. The model will be fit on 67 percent of the data, and the remaining 33 percent will be used for evaluation, split using the train_test_split() function.
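The two preparation steps can be sketched without scikit-learn to show what they do. The snippet below mimics integer label encoding (sorted class names mapped to 0, 1, ...) and works out the split sizes, assuming the test share is rounded up. The helper names and the short label list are assumptions for illustration only.

```python
# what label encoding and a 67/33 split amount to
from math import ceil


def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order of the label names."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = ceil(test_size * n_samples)
    return n_samples - n_test, n_test


encoded, classes = encode_labels(['g', 'b', 'g', 'g', 'b'])
print(classes, encoded)
print(split_sizes(351, 0.33))
```

For the 351 rows of the Ionosphere dataset, this gives 235 training rows and 116 test rows.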


It is a good practice to use ‘__relu__‘ activation with a ‘__he_normal__‘ weight initialization. This `combination goes a long way to overcome the problem of vanishing gradients when training deep neural network models`.

- A Gentle Introduction to the Rectified Linear Unit ([ReLU](https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/)) 
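Both pieces of this advice are simple formulas. ReLU passes positive values through and sets negative values to zero, and He normal initialization draws weights from a zero-centered distribution whose standard deviation is sqrt(2 / fan_in), where fan_in is the number of inputs to the layer. The sketch below is a plain-Python illustration with hypothetical helper names, not the tf.keras code.

```python
# the ReLU activation and the spread used by He normal initialization
from math import sqrt


def relu(x):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def he_stddev(fan_in):
    """Standard deviation of He normal initial weights for a layer with fan_in inputs."""
    return sqrt(2.0 / fan_in)


print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])
print('stddev for 34 inputs: %.4f' % he_stddev(34))
```

Layers with more inputs get smaller initial weights, which helps keep the scale of the activations roughly constant from layer to layer.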

The model predicts the probability of class 1 and uses the sigmoid activation function. 
- The model is optimized using the [adam version of stochastic gradient descent](https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/).
- The model seeks to minimize the [cross-entropy loss](https://machinelearningmastery.com/cross-entropy-for-machine-learning/).
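The sigmoid function squashes any real number into the range 0 to 1, which is why the output can be read as the probability of class 1. A crisp class label is then usually obtained by thresholding. The sketch below shows both steps in plain Python. The 0.5 threshold and the helper name `to_label` are assumptions, not something the model enforces.

```python
# from a raw output value to a probability and then to a class label
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def to_label(probability, threshold=0.5):
    """Turn a predicted probability of class 1 into a 0/1 class label."""
    return 1 if probability >= threshold else 0


for z in [-4.0, 0.0, 4.0]:
    p = sigmoid(z)
    print('z=%5.1f  probability=%.3f  label=%d' % (z, p, to_label(p)))
```

A predicted probability such as 0.98 would therefore be reported as class 1.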

In [3]:
# mlp for binary classification
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# load the dataset
# path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
path = '..\\..\\data\\ionosphere.data.csv'
df = read_csv(path, header=None)
df.shape

(351, 35)

In [6]:
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]

# ensure all data are floating point values
X = X.astype('float32')

In [8]:
# encode strings to integer
y = LabelEncoder().fit_transform(y)

array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

In [9]:
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(235, 34) (116, 34) (235,) (116,)


In [10]:
# determine the number of input features
n_features = X_train.shape[1]
n_features

34

In [11]:
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense( 8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense( 1, activation='sigmoid'))

In [12]:
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

In [13]:
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x2ab51774348>

In [15]:
# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test loss: %.3f' % loss)
print('Test Accuracy: %.3f' % acc)

Test loss: 0.135
Test Accuracy: 0.931


In [18]:
# make a prediction
row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]
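# wrap the row in a 2D array (one sample) before predicting
from numpy import asarray
yhat = model.predict(asarray([row]))

print(yhat)
print('Predicted: %.3f' % yhat[0][0])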

[[0.9805202]]
Predicted: 0.981


#### 3.1.2 MLP for Multiclass Classification
We will use the Iris flowers multiclass classification dataset to demonstrate an MLP for multiclass classification.

The dataset can be downloaded from the links below (the code that follows reads a local copy):
- Iris Dataset ([csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv)).
- Iris Dataset Description ([names](https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.names)).

Given that it is a multiclass classification, the model __must have one node for each class in the output layer and use the softmax activation function__. The loss function is the ‘__sparse_categorical_crossentropy__‘, which is appropriate for integer encoded class labels (e.g. 0 for one class, 1 for the next class, etc.)
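Softmax turns one raw score per class into probabilities that sum to one, and the predicted class is the index of the largest probability. The sketch below implements both steps in plain Python for three made-up scores. The helper names are hypothetical, and subtracting the maximum score is a common numerical-stability trick rather than something stated in this tutorial.

```python
# softmax over raw class scores, then argmax to pick the class
from math import exp


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [exp(s - largest) for s in scores]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


probs = softmax([2.0, 1.0, 0.1])
print(['%.3f' % p for p in probs])
print('sum=%.3f class=%d' % (sum(probs), argmax(probs)))
```

With integer encoded labels, the sparse categorical cross-entropy loss only looks at the probability assigned to the true class index.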

In [19]:
# mlp for multiclass classification
from numpy import argmax
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# load the dataset
##path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
path = '..\\..\\data\\iris.data.csv'
df = read_csv(path, header=None)
df.shape

(150, 5)

In [31]:
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]

# ensure all data are floating point values
X = X.astype('float32')
y[0:10]

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa'], dtype=object)

In [32]:
# encode strings to integer
y = LabelEncoder().fit_transform(y)
y[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [33]:
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(100, 4) (50, 4) (100,) (50,)


In [34]:
# determine the number of input features
n_features = X_train.shape[1]
n_features

4

In [35]:
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense( 8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense( 3, activation='softmax'))

In [36]:
# compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [37]:
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x2ab532483c8>

In [38]:
# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)

print('Test Loss: %.3f' % loss)
print('Test Accuracy: %.3f' % acc)

Test Loss: 0.624
Test Accuracy: 0.760


In [39]:
# make a prediction (the row is wrapped in a 2D array holding one sample)
from numpy import asarray
row = [5.1,3.5,1.4,0.2]
yhat = model.predict(asarray([row]))

print('Predicted: %s (class=%d)' % (yhat, argmax(yhat)))

Predicted: [[0.78086895 0.21446277 0.00466831]] (class=0)


#### 3.1.3 MLP for Regression
We will use the Boston housing regression dataset to demonstrate an MLP for regression predictive modeling.

The dataset can be downloaded from the links below (the code that follows reads a local copy):
- Boston Housing Dataset ([csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv)).
- Boston Housing Dataset Description ([names](https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.names)).

This is a regression problem that involves predicting a single numerical value. As such, the output layer has a single node and uses the default or linear activation function (no activation function). The mean squared error (mse) loss is minimized when fitting the model.

__Recall that this is a regression, not classification; therefore, we cannot calculate classification accuracy__.
- Tutorial. [Difference Between Classification and Regression in Machine Learning](https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/)

In [40]:
# mlp for regression
from numpy import sqrt
from pandas import read_csv
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# load the dataset
#path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
path = '..\\..\\data\\housing.data.csv'
df = read_csv(path, header=None)
df.shape

(506, 14)

In [41]:
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]

# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(339, 13) (167, 13) (339,) (167,)


In [42]:
# determine the number of input features
n_features = X_train.shape[1]
n_features

13

In [43]:
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense( 8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense( 1))

In [44]:
# compile the model
model.compile(optimizer='adam', loss='mse')

In [45]:
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x2ab53541ec8>

In [46]:
# evaluate the model
error = model.evaluate(X_test, y_test, verbose=0)
print('MSE: %.3f, RMSE: %.3f' % (error, sqrt(error)))

MSE: 49.971, RMSE: 7.069


In [47]:
# make a prediction
row = [0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98]
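# wrap the row in a 2D array (one sample) before predicting
from numpy import asarray
yhat = model.predict(asarray([row]))
print('Predicted: %.3f' % yhat[0][0])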

Predicted: 29.706


In this case, we can see that the model achieved an MSE of about 50, which is an RMSE of about `7 (units are thousands of dollars)`. A value of about 30 is then predicted for the single example.
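The RMSE is reported alongside the MSE because taking the square root puts the error back in the units of the target variable. The short sketch below applies that conversion to the MSE printed in the evaluation output above. The helper name `rmse_from_mse` is hypothetical.

```python
# convert a mean squared error into a root mean squared error
from math import sqrt


def rmse_from_mse(mse_value):
    """Square root of the MSE, expressed in the units of the target variable."""
    return sqrt(mse_value)


reported_mse = 49.971
print('RMSE: %.3f' % rmse_from_mse(reported_mse))
```

An RMSE of about 7 means the predictions are typically off by roughly 7 thousand dollars.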

#### 3.2 Develop Convolutional Neural Network Models (CNN)
Convolutional Neural Networks, or CNNs for short, are a type of network `designed for image input`.

They are comprised of models with [convolutional layers](https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/) that extract features (called feature maps) and [pooling layers](https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/) that distill features down to the most salient elements.

A popular image classification task is the __MNIST handwritten digit classification__. It involves tens of thousands of handwritten digits that must be classified as a number between 0 and 9.

In [1]:
# example of a cnn for image classification
from numpy import asarray
from numpy import unique
from numpy import argmax
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPool2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout

# MNIST handwritten digit classification
from tensorflow.keras.datasets.mnist import load_data

In [2]:
# load dataset
(x_train, y_train), (x_test, y_test) = load_data()

# reshape data to have a single channel
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], x_train.shape[2], 1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], x_test.shape[2], 1))

# determine the shape of the input images
in_shape = x_train.shape[1:]

# determine the number of classes
n_classes = len(unique(y_train))
print(in_shape, n_classes)

(28, 28, 1) 10


In [3]:
# normalize pixel values
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

In [4]:
# define model
model = Sequential()
model.add(Conv2D(32, (3,3), activation='relu', kernel_initializer='he_uniform', input_shape=in_shape))
model.add(MaxPool2D((2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))
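It can be useful to trace how the image shrinks as it passes through the layers defined above. The sketch below does the arithmetic in plain Python, assuming the default settings for the convolutional and pooling layers (no padding, a stride of 1 for the convolution, and a pooling stride equal to the pool size). The helper names are hypothetical.

```python
# trace output shapes and parameter counts for the small CNN
def conv_output(size, kernel, stride=1):
    """Output width or height of a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_output(size, pool):
    """Output width or height of pooling with stride equal to the pool size."""
    return size // pool


filters, channels, hidden_nodes, n_classes = 32, 1, 100, 10

side = conv_output(28, 3)          # 28x28 image, 3x3 kernel
pooled = pool_output(side, 2)      # 2x2 max pooling
flat = pooled * pooled * filters   # length of the flattened vector

conv_params = (3 * 3 * channels + 1) * filters
dense_params = flat * hidden_nodes + hidden_nodes
out_params = hidden_nodes * n_classes + n_classes

print('after conv: %dx%dx%d' % (side, side, filters))
print('after pool: %dx%dx%d' % (pooled, pooled, filters))
print('flattened:', flat)
print('total parameters:', conv_params + dense_params + out_params)
```

Almost all of the parameters sit in the first Dense layer, because it connects every one of the flattened values to each of its 100 nodes.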

In [5]:
# define loss and optimizer
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [7]:
# evaluate the model
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print('Loss: %.3f' % loss)
print('Accuracy: %.3f' % acc)

Loss: 2.450
Accuracy: 0.128


In [8]:
# make a prediction
image = x_train[0]
yhat = model.predict(asarray([image]))
print('Predicted: class=%d' % argmax(yhat))

Predicted: class=2


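Note that the listing above never calls model.fit(), so the output shown comes from an untrained model: the accuracy of 0.128 is close to chance for 10 classes, and class 2 is predicted for the first image in the training set. Once the model is fit on the training data before evaluation, a classification accuracy of about 98 percent on the test dataset and a prediction of class 5 for the first image in the training set can be expected.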

#### 3.3 Develop Recurrent Neural Network Models (RNN)