<a href="https://colab.research.google.com/github/isaacsemerson/deeplearning-python-fchollet/blob/main/fchollet_chapter3_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
# Run these imports before running anything else.
# I have not tested the code in this file. These were individual examples in the book.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [4]:
class SimpleDense(keras.layers.Layer):
  def __init__(self, units, activation=None):
    super().__init__()
    self.units = units
    self.activation = activation

  def build(self, input_shape):
    input_dim = input_shape[-1]
    self.W = self.add_weight(shape=(input_dim, self.units), initializer="random_normal")
    self.b = self.add_weight(shape=(self.units,), initializer="zeros")

  def call(self, inputs):
    y = tf.matmul(inputs, self.W) + self.b
    if self.activation is not None:
      y = self.activation(y)
    return y

my_dense = SimpleDense(units=32, activation=tf.nn.relu)
input_tensor = tf.ones(shape=(2, 784))
output_tensor = my_dense(input_tensor)
print("Shape of output tensor from SimpleDense implementation:", output_tensor.shape)

Shape of output tensor from SimpleDense implementation: (2, 32)


Listing 3.22 - Here we build a "Dense" layer by subclassing `keras.layers.Layer`. The `build` method creates the weights, and the `call` method performs the actual computation (the forward pass). We never call either one directly: the base class's `__call__` method invokes them when we write `my_dense(input_tensor)`.

The `tf.matmul(inputs, self.W)` part of the calculation determines the output shape. As you can see where `self.W` is created, the shape of W combines the last dimension of the input tensor (`input_dim`) with the output size (`units`, as passed to the constructor). The matrix product can be visualized as two rectangles combining into one: an input of shape (samples, input_dim) times a W of shape (input_dim, units) gives a result of shape (samples, units). Chapter 2 has a great diagram for this.
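To make the shape rule concrete, here is a minimal sketch in plain Python. It uses nested lists instead of tensors, so it only illustrates the rule and says nothing about how TensorFlow computes the product internally. The helpers `shape` and `matmul` are made up for this note.

```python
# Plain-Python illustration of the shape rule:
# (samples, input_dim) x (input_dim, units) -> (samples, units).
# `shape` and `matmul` are hypothetical helpers, not TensorFlow functions.

def shape(matrix):
    """Return (rows, cols) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))

def matmul(a, b):
    """Naive matrix product; the inner dimensions must match."""
    a_rows, a_cols = shape(a)
    b_rows, b_cols = shape(b)
    if a_cols != b_rows:
        raise ValueError(f"inner dimensions differ: {a_cols} vs {b_rows}")
    return [
        [sum(a[i][k] * b[k][j] for k in range(a_cols)) for j in range(b_cols)]
        for i in range(a_rows)
    ]

inputs = [[1.0] * 784 for _ in range(2)]   # same shape as tf.ones((2, 784))
W = [[0.01] * 32 for _ in range(784)]      # (input_dim, units) = (784, 32)
out = matmul(inputs, W)

print(shape(inputs), "x", shape(W), "->", shape(out))
# prints: (2, 784) x (784, 32) -> (2, 32)
```

The shared inner dimension (784) disappears, and the two outer dimensions survive, which matches the (2, 32) output printed by the SimpleDense cell.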

In [None]:
def __call__(self, inputs):
  if not self.built:
    self.build(inputs.shape)
    self.built = True
  return self.call(inputs)

Listing 3.22b - This is the general idea of how the base layer's `__call__` method works (the real one does more). As expected above, it calls both `build` and `call`. This means the weights are created just in time, the first time the layer is called, rather than in the constructor.

One benefit of creating state just in time is automatic inference of the layer's input shape. Chapter 2 showed a basic Dense layer implementation where the weights were initialized within the constructor (NaiveDense). The problem with that approach is that you need to know each layer's input size (that is, the previous layer's output size) when you construct it, which can be difficult in complex scenarios. The book tells us to think of layers as LEGO bricks: each layer's input needs to match the previous layer's output, and with lazy building Keras works that out for us.
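Here is a rough sketch of that idea, assuming a made-up `LazyDense` class that only tracks shapes and does no real math. It is not Keras code. It just shows that when each layer is given only its output size, the weight shapes can still be worked out on the first call.

```python
# Sketch of deferred weight creation. `LazyDense` is a hypothetical class for
# illustration; it passes shapes around instead of real tensors.

class LazyDense:
    def __init__(self, units):
        self.units = units      # the output size is all we know up front
        self.built = False
        self.w_shape = None     # filled in on the first call

    def build(self, input_shape):
        # The input size is read from the incoming data, not given by the user.
        self.w_shape = (input_shape[-1], self.units)

    def __call__(self, input_shape):
        if not self.built:
            self.build(input_shape)
            self.built = True
        # The output keeps the leading (batch) dimension and replaces the last one.
        return input_shape[:-1] + (self.units,)

stack = [LazyDense(32), LazyDense(64), LazyDense(10)]
current = (2, 784)              # shape of the first batch of inputs
for layer in stack:
    current = layer(current)

print([layer.w_shape for layer in stack])   # prints: [(784, 32), (32, 64), (64, 10)]
print(current)                              # prints: (2, 10)
```

Each brick snaps onto the previous one without us ever writing 784, 32, or 64 as an input size.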

In [None]:
model = keras.Sequential([keras.layers.Dense(1)])
model.compile(optimizer="rmsprop",
              loss="mean_squared_error",
              metrics=["accuracy"])

Listing 3.22c - Here is an example of the compile step for a Keras-backed model. We pass in three parameters:
- Optimizer: this is what moves the weights in the direction that reduces the loss (the main part of the backward pass, training the model). So far this has been variants of SGD.
- Loss: this measures how far the predictions are from the targets. In the case above, we take the difference between each prediction and its target, square it, and average the results.
- Metrics: measures of how successful the model is, such as accuracy. The biggest difference between a metric and the loss is that the model does not optimize for the metric directly.

These strings are shortcuts that Keras resolves to the corresponding functions and objects. You can pass in your own functions or object instances instead, which also lets you set parameters on the Keras ones (such as `keras.optimizers.RMSprop(learning_rate=1e-4)`).
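As a hand check of what the `"mean_squared_error"` string stands for, here is a small plain-Python version. The sample numbers are made up, and a loss you actually hand to Keras would need to operate on tensors rather than lists. Only the `(y_true, y_pred)` argument order follows the Keras convention for loss functions.

```python
# Plain-Python mean squared error, for checking the arithmetic by hand.
# The values below are invented sample data, not from the book.

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Squared errors: 0.25, 0.25, 0.0, 1.0 -> mean = 1.5 / 4
loss = mean_squared_error(y_true, y_pred)
print(loss)   # prints: 0.375
```

A perfect prediction gives a loss of 0, and because the errors are squared, one large miss costs more than several small ones.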

In [None]:
# This will not work without inputs and targets (need sample data)
history = model.fit(
    inputs,
    targets,
    epochs=5,
    batch_size=128
)

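Listing 3.22d - This is the training loop, the last step before the model can be used for predictions. We pass in the input and target arrays, set the number of epochs (how many full passes over the data), and set the batch size (how many samples go into each gradient update). `fit` returns a History object, stored here in `history`.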
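To see how `epochs` and `batch_size` interact, here is a small sketch that only counts batches. The sample count of 1000 is made up, `iterate_batches` is a hypothetical helper, and the sketch ignores shuffling (which `fit` does by default), so treat it as bookkeeping rather than a picture of what Keras does internally.

```python
# Counting gradient updates for a given batch size and number of epochs.
# `iterate_batches` is a hypothetical helper; num_samples is an assumed value.

def iterate_batches(num_samples, batch_size):
    """Yield (start, stop) index pairs that cover the data exactly once."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)

num_samples = 1000   # assumed dataset size
batch_size = 128     # same as the fit() call above
epochs = 5           # same as the fit() call above

batches = list(iterate_batches(num_samples, batch_size))
steps_per_epoch = len(batches)            # one weight update per batch
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, batches[-1], total_updates)
# prints: 8 (896, 1000) 40
```

The last batch is smaller than the rest (104 samples instead of 128) because 1000 is not a multiple of 128.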