In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import models

### Layers: the building blocks of deep learning

The fundamental data structure in neural networks is the layer. A layer is a data-processing module that takes as input one or more tensors and that outputs one or more tensors. Some layers are stateless, but more frequently layers have a state: the layer’s weights, one or several tensors learned with stochastic gradient descent, which together contain the network’s knowledge.

Different types of layers are appropriate for different tensor formats and different types of data processing. For instance, simple vector data, stored in 2D tensors of shape (samples, features), is often processed by densely connected layers, also called fully connected or dense layers (the Dense class in Keras). Sequence data, stored in 3D tensors of shape (samples, timesteps, features), is typically processed by recurrent layers, such as an LSTM layer, or 1D convolution layers (Conv1D). Image data, stored in 4D tensors, is usually processed by 2D convolution layers (Conv2D).
You can think of layers as the LEGO bricks of deep learning, a metaphor that is made explicit by Keras. Building deep-learning models in Keras is done by clipping together compatible layers to form useful data-transformation pipelines.

#### The base **Layer** class in Keras

A simple API should have a single abstraction around which everything is centered. In Keras, that’s the Layer class. Everything in Keras is either a Layer or something that closely interacts with a **Layer**.

A **Layer** is an object that encapsulates some state (**weights**) and some computation (**a forward pass**). The **weights** are typically defined in a **build()** method (although they could also be created in the constructor, **\_\_init\_\_()**), and the computation is defined in the **call()** method.

In the previous chapter, we implemented a NaiveDense class that contained two weights W and b and applied the computation output = activation(dot(input, W) + b). This is what the same layer would look like in Keras:

In [None]:
class SimpleDense(keras.layers.Layer):  # All Keras layers inherit from the base Layer class.
  def __init__(self, units, activation=None):
    super(SimpleDense, self).__init__()
    self.units = units
    self.activation = activation

  def build(self, input_shape): # Weight creation takes place in the build() method.
    input_dim = input_shape[-1]
    # add_weight() is a shortcut method for creating weights. It is also possible to create standalone
    # variables and assign them as layer attributes, like: self.W = tf.Variable(tf.random.uniform(w_shape)).
    self.W = self.add_weight(shape=(input_dim, self.units), initializer="random_normal")
    self.b = self.add_weight(shape=(self.units,), initializer="zeros")  # The shape must be a tuple, hence the trailing comma.

  def call(self, inputs): # We define the forward pass computation in the call() method.
    y = tf.matmul(inputs, self.W) + self.b
    if self.activation is not None:
      y = self.activation(y)
    return y

Once instantiated, a layer like this can be used just like a function, taking as input a TensorFlow tensor:

In [None]:
my_dense = SimpleDense(units=32, activation=tf.nn.relu) # Instantiate our layer, defined above
input_tensor = tf.ones(shape=(2, 784))  # Create some test inputs
output_tensor = my_dense(input_tensor)  # Call the layer on the inputs, just like a function
print(output_tensor.shape)

(2, 32)


#### Automatic shape inference: building layers on the fly

Just like with LEGO bricks, you can only “clip” together layers that are compatible. The notion of layer compatibility here refers specifically to the fact that every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape. Consider the following example:

In [None]:
layer = layers.Dense(units=32, activation="relu")

This layer will return a tensor where the last dimension (the feature axis) has been transformed to be 32. It can only be connected to a downstream layer that expects 32-dimensional vectors as its input.

When using Keras, you don’t have to worry about size compatibility most of the time, because the layers you add to your models are dynamically built to match the shape of the incoming layer. For instance, suppose you write the following:

In [None]:
model = models.Sequential([
                           layers.Dense(32, activation="relu"),
                           layers.Dense(32)])
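
As a minimal sketch of what "dynamically built" means here (assuming TensorFlow 2.x with its bundled Keras; the input width of 784 and the second layer's width of 16 are arbitrary choices for illustration), the layers below receive no information about their input shape when they are created. They stay unbuilt until the model first sees data, and each one then infers its weight shapes from the tensor it receives:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

first = layers.Dense(32, activation="relu")
second = layers.Dense(16)
model = models.Sequential([first, second])

# No input shape is known yet, so neither layer has created its weights.
built_before = (first.built, second.built)

# Calling the model on a batch of 784-dimensional vectors builds every layer.
outputs = model(tf.ones(shape=(2, 784)))

# Each kernel shape was inferred from the shape of the incoming tensor.
kernel_shapes = [tuple(first.kernel.shape), tuple(second.kernel.shape)]

print(built_before)          # (False, False)
print(kernel_shapes)         # [(784, 32), (32, 16)]
print(tuple(outputs.shape))  # (2, 16)
```

The second layer never had to be told that its input has 32 features; it picked that up from the output of the first layer.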

### From layers to models

A deep-learning model is a graph of layers. In Keras, that’s the Model class. So far, you’ve only seen Sequential models (a subclass of Model), which are simple stacks of layers, mapping a single input to a single output. But as you move forward, you’ll be exposed to a much broader variety of network topologies. Some common ones are:

- Two-branch networks
- Multihead networks
- Residual connections
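
To give a rough idea of what a non-sequential topology looks like, here is a small sketch of a residual connection. It assumes the Keras Functional API (keras.Input, keras.Model, and layers.add), which this section has not introduced, and the layer sizes are arbitrary:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32,))
x = layers.Dense(32, activation="relu")(inputs)
x = layers.Dense(32)(x)

# Residual connection: add the block's input back onto its output.
# Both tensors must have the same shape for the addition to be valid.
outputs = layers.add([inputs, x])

model = keras.Model(inputs=inputs, outputs=outputs)

# The graph now has two paths from input to output, which a Sequential stack cannot express.
result_shape = tuple(model(np.zeros((4, 32), dtype="float32")).shape)
print(result_shape)  # (4, 32)
```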


### The "compile" step: configuring the learning process

Once the model architecture is defined, you still have to choose three more things:

**Loss function (objective function)**  — The quantity that will be minimized during training. It represents a measure of success for the task at hand.

**Optimizer**  — Determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).

**Metrics** — The measures of success you want to monitor during training and validation, such as classification accuracy. Unlike the loss, training will not optimize directly for these metrics. As such, metrics don’t need to be differentiable.
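
The division of labor between these three pieces can be sketched without Keras at all. The NumPy example below is an illustration under stated assumptions, not how Keras implements training: it uses plain full-batch gradient descent rather than a stochastic variant, and the toy data, learning rate, and function names are all made up for the example. The loss drives the weight updates, the update rule plays the role of the optimizer, and the accuracy is only computed for monitoring.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1D data: class 0 centered at -2, class 1 centered at +2.
x = np.concatenate([rng.normal(-2.0, 1.0, size=(50, 1)),
                    rng.normal(2.0, 1.0, size=(50, 1))])
y = np.concatenate([np.zeros((50, 1)), np.ones((50, 1))])

w, b = np.zeros((1, 1)), np.zeros(1)
learning_rate = 0.1

def loss_fn(pred, target):
    # Loss: differentiable, so its gradient can drive the updates.
    return np.mean((pred - target) ** 2)

def accuracy(pred, target):
    # Metric: a thresholded count, not differentiable, only monitored.
    return np.mean((pred > 0.5) == (target > 0.5))

losses = []
for step in range(50):
    pred = x @ w + b
    losses.append(loss_fn(pred, y))
    # Gradient of the mean squared error with respect to w and b.
    grad_pred = 2.0 * (pred - y) / len(x)
    grad_w = x.T @ grad_pred
    grad_b = grad_pred.sum(axis=0)
    # "Optimizer": the rule that turns gradients into weight updates.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_accuracy = accuracy(x @ w + b, y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, accuracy {final_accuracy:.2f}")
```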

Once you’ve picked your loss, optimizer, and metrics, you can use the built-in compile() and fit() methods to start training your model.

The compile() method configures the training process; you were already introduced to it in your very first neural network example in chapter 2. It takes the arguments optimizer, loss, and metrics (a list):

In [17]:
model = keras.Sequential([keras.layers.Dense(units=1)]) # Define a linear classifier
model.compile(optimizer="rmsprop",  # Specify the optimizer by name: RMSprop (it’s case-insensitive)
              loss="mean_squared_error",  # Specify the loss by name: mean squared error
              metrics=["accuracy"]) # Specify a list of metrics: in this case, only accuracy

In the above call to compile(), we passed the optimizer, loss, and metrics as strings (such as 'rmsprop'). These strings are actually shortcuts that get converted to Python objects. For instance, 'rmsprop' becomes keras.optimizers.RMSprop(). Importantly, it’s also possible to specify these arguments as object instances, like this:

In [18]:
model.compile(optimizer=keras.optimizers.RMSprop(),
              loss=keras.losses.MeanSquaredError(),
              metrics=[keras.metrics.BinaryAccuracy()])

This is useful if you want to pass your own custom losses or metrics, or if you want to further configure the objects you’re using — for instance, by passing a learning_rate argument to the optimizer:

In [8]:
# model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
#               loss=my_custom_loss,
#               metrics=[my_custom_metric_1, my_custom_metric_2])
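
The cell above is commented out because my_custom_loss and the two metrics are not defined anywhere. As a hedged sketch of what they could look like, the functions below are hypothetical stand-ins (a mean absolute error loss, a "within 0.5 of the target" metric, and a mean absolute error metric). The only assumption about Keras is that a loss or metric can be any callable taking (y_true, y_pred) and returning one value per sample:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

def my_custom_loss(y_true, y_pred):
    # Hypothetical loss: mean absolute error per sample.
    return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

def my_custom_metric_1(y_true, y_pred):
    # Hypothetical metric: fraction of predictions within 0.5 of the target.
    close = tf.cast(tf.abs(y_true - y_pred) < 0.5, tf.float32)
    return tf.reduce_mean(close, axis=-1)

def my_custom_metric_2(y_true, y_pred):
    # Hypothetical metric: mean absolute error, reported for monitoring only.
    return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

model = keras.Sequential([keras.layers.Dense(units=1)])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
              loss=my_custom_loss,
              metrics=[my_custom_metric_1, my_custom_metric_2])

# Quick check of the functions on hand-written values.
y_true = tf.constant([[1.0], [0.0]])
y_pred = tf.constant([[0.8], [0.0]])
loss_values = my_custom_loss(y_true, y_pred).numpy()
metric_values = my_custom_metric_1(y_true, y_pred).numpy()
print(loss_values, metric_values)
```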

Keras offers a wide range of built-in options that is likely to include what you need:

**Optimizers:**

- SGD() (with or without momentum)
- RMSprop()
- Adam()
- Adagrad()
- Etc.

**Losses:**

- CategoricalCrossentropy()
- SparseCategoricalCrossentropy()
- BinaryCrossentropy()
- MeanSquaredError()
- KLDivergence()
- CosineSimilarity()
- Etc.

**Metrics:**

- CategoricalAccuracy()
- SparseCategoricalAccuracy()
- BinaryAccuracy()
- AUC()
- Precision()
- Recall()
- Etc.

### Picking a loss function

Choosing the right loss function for the right problem is extremely important: your network will take any shortcut it can to minimize the loss; so if the objective doesn’t fully correlate with success for the task at hand, your network will end up doing things you may not have wanted. Imagine a stupid, omnipotent AI trained via SGD, with this poorly chosen objective function: “maximizing the average well-being of all humans alive.” To make its job easier, this AI might choose to kill all humans except a few and focus on the well-being of the remaining ones — because average well-being isn’t affected by how many humans are left. That might not be what you intended! Just remember that all neural networks you build will be just as ruthless in lowering their loss function — so choose the objective wisely, or you’ll have to face unintended side effects.

Fortunately, when it comes to common problems such as classification, regression, and sequence prediction, there are simple guidelines you can follow to choose the correct loss. For instance, you’ll use binary crossentropy for a two-class classification problem, categorical crossentropy for a many-class classification problem, and so on.
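
To make the two crossentropy losses mentioned above concrete, here is a small NumPy sketch of the standard formulas. It is an illustration only, not the Keras implementation; the function names and the clipping constant eps are choices made for this example:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true holds 0/1 labels, y_pred holds predicted probabilities of class 1.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    per_sample = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(per_sample))

def categorical_crossentropy(y_true_onehot, y_pred, eps=1e-7):
    # y_true_onehot is one-hot encoded, y_pred holds one probability per class.
    y_pred = np.clip(y_pred, eps, 1.0)
    per_sample = -np.sum(y_true_onehot * np.log(y_pred), axis=-1)
    return float(np.mean(per_sample))

labels = np.array([1.0, 0.0])
good = binary_crossentropy(labels, np.array([0.9, 0.1]))  # confident and right
bad = binary_crossentropy(labels, np.array([0.1, 0.9]))   # confident and wrong
three_class = categorical_crossentropy(np.array([[0.0, 1.0, 0.0]]),
                                       np.array([[0.1, 0.8, 0.1]]))
print(good, bad, three_class)
```

Confident wrong predictions are penalized far more heavily than confident right ones are rewarded, which is what makes these losses a good fit for classification.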

### Understanding the "fit" method

After compile() comes fit(). The fit() method implements the training loop itself. Its key arguments are:

The **data (inputs and targets)** to train on. It will typically be passed either in the form of NumPy arrays or as a TensorFlow Dataset object.

The number of **epochs** to train for: how many times the training loop should iterate over the data passed.

The **batch size** to use within each epoch of mini-batch gradient descent: the number of training examples considered to compute the gradients for one weight update step.
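
The relationship between these arguments is simple arithmetic. As a sketch, using the figures from the example in this section (2 classes of 1,000 samples each, batches of 128, 5 epochs), and relying on the fact that with NumPy inputs the last batch of an epoch may be smaller than the others:

```python
import math

num_samples = 2 * 1000   # two classes of 1,000 samples each
batch_size = 128
epochs = 5

# One weight update per batch; the final batch holds whatever samples are left over.
steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 16
print(last_batch_size)  # 80
print(total_updates)    # 80
```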

In [19]:

num_samples_per_class = 1000
negative_samples = np.random.multivariate_normal(
    mean=[0, 3], cov=[[1, 0.5],[0.5, 1]], size=num_samples_per_class)
positive_samples = np.random.multivariate_normal(
    mean=[3, 0], cov=[[1, 0.5],[0.5, 1]], size=num_samples_per_class)
inputs = np.vstack((negative_samples, positive_samples)).astype(np.float32)
targets = np.vstack((np.zeros((num_samples_per_class, 1), dtype='float32'),
                     np.ones((num_samples_per_class, 1), dtype='float32')))

In [20]:
# Calling fit with NumPy data
history = model.fit(
  inputs, # The input examples, as a NumPy array
  targets,  # The corresponding training targets, as a NumPy array
  epochs=5, # The training loop will iterate over the data 5 times.
  batch_size=128  # The training loop will iterate over the data in batches of 128 examples.
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [21]:
history.history

{'binary_accuracy': [0.9279999732971191,
  0.9259999990463257,
  0.9235000014305115,
  0.921999990940094,
  0.9210000038146973],
 'loss': [7.846403121948242,
  7.52724027633667,
  7.257632255554199,
  6.998773097991943,
  6.745824337005615]}

### Monitoring loss & metrics on validation data

To keep an eye on how the model does on new data, it’s standard practice to reserve a subset of the training data as “validation data”: you won’t be training the model on this data, but you will use it to compute a loss value and metrics value. You do this using the **validation_data** argument in **fit()**. Like the training data, the validation data could be passed as NumPy arrays or as a TensorFlow Dataset object.

In [23]:
#  Using the validation data argument
model = keras.Sequential([keras.layers.Dense(1)])
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.1),
              loss=keras.losses.MeanSquaredError(),
              metrics=[keras.metrics.BinaryAccuracy()])

indices_permutation = np.random.permutation(len(inputs))  # To avoid having samples from only one class in the validation data,
shuffled_inputs = inputs[indices_permutation]             # shuffle the inputs and targets using a random indices permutation
shuffled_targets = targets[indices_permutation]

num_validation_samples = int(0.3 * len(inputs))           # Reserve 30% of the training inputs and targets for “validation”
val_inputs = shuffled_inputs[-num_validation_samples:]    # (we’ll exclude these samples from training and reserve them
val_targets = shuffled_targets[-num_validation_samples:]  # to compute the “validation loss” and metrics)
training_inputs = shuffled_inputs[:-num_validation_samples]    # Train on all the remaining samples (the other 70%)
training_targets = shuffled_targets[:-num_validation_samples]

history = model.fit(
  training_inputs,  # Training data, used to update the weights of the model
  training_targets,
  epochs=5,
  batch_size=16,
  validation_data=(val_inputs, val_targets) # Validation data, used only to monitor the “validation loss” and metrics
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


The value of the loss on the validation data is called the “validation loss”, to distinguish it from the “training loss”. Note that it’s essential to keep the training data and validation data strictly separate: the purpose of validation is to monitor whether what the model is learning is actually useful on new data. If any of the validation data has been seen by the model during training, your validation loss and metrics will be flawed.
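
One way to keep the two sets strictly separate is to wrap the split in a small helper that checks for overlap. The function below is a hypothetical sketch (its name, the seed argument, and the default fraction are assumptions for this example, not part of Keras):

```python
import numpy as np

def train_validation_split(inputs, targets, validation_fraction=0.3, seed=0):
    """Shuffle once, then cut the data into disjoint training and validation parts."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(len(inputs))
    num_val = int(validation_fraction * len(inputs))
    val_idx, train_idx = permutation[:num_val], permutation[num_val:]
    # Guard against leakage: no sample index may appear in both sets.
    assert np.intersect1d(val_idx, train_idx).size == 0
    return (inputs[train_idx], targets[train_idx]), (inputs[val_idx], targets[val_idx])

# Tiny demonstration on 10 made-up samples with 2 features each.
demo_inputs = np.arange(20, dtype="float32").reshape(10, 2)
demo_targets = np.arange(10, dtype="float32").reshape(10, 1)
(train_x, train_y), (val_x, val_y) = train_validation_split(demo_inputs, demo_targets)
print(len(train_x), len(val_x))  # 7 3
```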

Note that if you want to compute the validation loss and metrics after training is complete, you can call the evaluate method:

In [40]:
loss_and_metrics = model.evaluate(val_inputs, val_targets, batch_size=128)



evaluate() will iterate in batches (of size batch_size) over the data passed, and return a list of scalars, where the first entry is the validation loss and the following entries are the validation metrics. If the model has no metrics, only the validation loss is returned (rather than a list).

### Inference: using a model after training

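Once you’ve trained your model, you’re going to want to use it to make predictions on new data. This is called "inference". A naive approach is simply to *call* the model, which processes all the inputs at once and may not be feasible for large amounts of data. The predict() method instead iterates over the data in small batches and returns a NumPy array of predictions: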

In [None]:
predictions = model(new_inputs) # Takes a NumPy array or TensorFlow tensor and returns a TensorFlow tensor
predictions = model.predict(new_inputs, batch_size=128) # Takes a NumPy array or a Dataset and returns a NumPy array

For instance, if we use predict() on some of our validation data with the linear model we trained earlier, we get one scalar score per point. Because the model has no output activation, the scores are not confined to the range 0 to 1, but they cluster around the two target values: a score below 0.5 indicates that the model considers the corresponding point to belong to class 0, and a score above 0.5 indicates that the model considers it to belong to class 1.

In [44]:
predictions = model.predict(val_inputs, batch_size=128)
print(predictions[:10])

[[1.3338068 ]
 [0.68316877]
 [0.45047593]
 [0.17697117]
 [1.401538  ]
 [0.4708088 ]
 [0.42412478]
 [1.0589097 ]
 [1.368237  ]
 [1.2308586 ]]
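
Turning these raw scores into class labels is a matter of applying the 0.5 threshold. As a small sketch, the ten scores printed above are copied by hand into a NumPy array here so that the example does not depend on the trained model:

```python
import numpy as np

# The first ten scores from the output above, copied by hand.
scores = np.array([[1.3338068], [0.68316877], [0.45047593], [0.17697117], [1.401538],
                   [0.4708088], [0.42412478], [1.0589097], [1.368237], [1.2308586]],
                  dtype="float32")

# Scores above 0.5 are assigned to class 1, all others to class 0.
predicted_classes = (scores > 0.5).astype("int32").ravel()
print(predicted_classes)  # [1 1 0 0 1 0 0 1 1 1]
```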
