
# Keras

The most convenient way to use TensorFlow with neural networks is through [Keras](http://keras.io). It provides a high-level interface that is a compromise between very high-level abstractions like scikit-learn and the complete control over every detail that you get when directly using the low-level APIs of libraries like TensorFlow. There is a separate [Keras Documentation](https://keras.io), as well as [Guides](https://www.tensorflow.org/guide/keras), [Tutorials](https://www.tensorflow.org/tutorials/keras), and the [Keras section of the TensorFlow API Documentation](https://www.tensorflow.org/api_docs/python/tf/keras).

In the past, besides TensorFlow, Keras also supported [Theano](http://www.deeplearning.net/software/theano/) and [CNTK](https://docs.microsoft.com/en-us/cognitive-toolkit/). At the end of 2019, the multi-backend version of Keras was deprecated and development continued only in the version that is included in TensorFlow. Keras is now also the recommended/default way to work with neural networks in TensorFlow.

## Build a model in Keras

In [None]:
import tensorflow as tf

As a quick example, let's again build a model to classify the "Moons" dataset.

In [None]:
from sklearn.datasets import make_moons

In [None]:
x, y = make_moons(n_samples=10000, noise=0.2)

There are three ways to use Keras: via the Sequential API, via the Functional API, or by creating layers and models through subclassing. Let's start with `Sequential`. This is convenient for all models that have just one input and one output tensor with stacked layers in between. Here we use the `Dense` layer, which is the fully connected NN layer that applies the operation $\sigma(W\mathbf{x} + \mathbf{b})$.

In [None]:
from tensorflow.keras.layers import Dense

model = tf.keras.models.Sequential([
    # Hidden layer with 2 inputs, 16 outputs
    Dense(16, activation="relu", input_shape=(2,)),
    # Output layer with 16 inputs (determined automatically) and 1 output
    Dense(1, activation="sigmoid")
])

How many parameters will our model have? The answer:

In [None]:
model.summary()
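
As a cross-check of the summary, the count can be worked out by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The helper `dense_param_count` below is a hypothetical name used only for this illustration; it is not part of Keras.

```python
def dense_param_count(n_in, n_out):
    """Number of trainable parameters of a fully connected layer with bias."""
    return n_in * n_out + n_out

hidden_params = dense_param_count(2, 16)   # 2*16 weights + 16 biases
output_params = dense_param_count(16, 1)   # 16*1 weights + 1 bias
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 48 17 65
```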

We can also access the underlying Tensors if needed:

In [None]:
model.inputs

In [None]:
model.outputs

In [None]:
model.weights

In [None]:
model.layers

In [None]:
model.layers[0].input

In [None]:
model.layers[0].output

Both models and layers are callables, so you can feed them tensors to get transformed outputs. This can be very useful to experiment and understand what transformations are done:

In [None]:
inputs = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)

In [None]:
model(inputs)

In [None]:
layer = Dense(10)

In [None]:
layer(inputs)

In [None]:
layer.weights

In [None]:
tf.matmul(inputs, layer.weights[0])
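
The last cell reproduces only the matrix product of the layer. As a minimal NumPy sketch, the full operation of a dense layer is the matrix product plus the bias, followed by the activation function. The weights below are made up (they are not the ones Keras initializes), and the names `dense_forward` and `relu` are assumptions for this illustration.

```python
import numpy as np

def dense_forward(x, kernel, bias, activation=None):
    """Compute activation(x @ kernel + bias) for a batch of row vectors x."""
    z = x @ kernel + bias
    return z if activation is None else activation(z)

def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

x_demo = np.array([[1.0, 2.0], [3.0, 4.0]])                    # 2 samples, 2 features
kernel_demo = np.array([[1.0, -1.0, 0.5], [0.0, 1.0, -1.0]])   # shape (2 inputs, 3 outputs)
bias_demo = np.array([0.0, 0.5, 0.0])                          # one bias per output

dense_out = dense_forward(x_demo, kernel_demo, bias_demo, activation=relu)
print(dense_out)
```

Note that the kernel has shape `(n_inputs, n_outputs)`, so each row of the input batch is multiplied from the left, which matches the `tf.matmul(inputs, layer.weights[0])` call above.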

## Train the model

Before we can run the training, we have to "compile" the model. This configures the loss function and the optimization algorithm. You can also pass any loss from [`keras.losses`](https://keras.io/losses) and any optimizer from [`keras.optimizers`](https://keras.io/optimizers) as a string with its name if you want to use it with default parameters. Here we want to use the "Adam" optimizer with an adjusted initial learning rate, so we pass the optimizer object directly.

We could also pass some metrics that we want to monitor during training (in addition to the Loss value).

In [None]:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1), loss="binary_crossentropy")
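
For reference, the binary cross-entropy for labels $y_i \in \{0, 1\}$ and predicted probabilities $p_i$ is the mean of $-[y_i \log p_i + (1 - y_i) \log(1 - p_i)]$. Below is a plain-Python sketch of this formula. The function name and the clipping constant `eps` are choices made for this illustration, and the Keras implementation may differ in numerical details.

```python
import math

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    loss_sum = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        loss_sum += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss_sum / len(y_true)

# confident and correct predictions give a small loss ...
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
# ... confident but wrong predictions are penalized heavily
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(confident_right, confident_wrong)
```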

The API for fitting looks similar to scikit-learn, but has additional options. There is also a [scikit-learn API wrapper](https://github.com/adriangb/scikeras) for Keras if you need that in some context.

In [None]:
history = model.fit(x, y, epochs=3, batch_size=128)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(history.epoch, history.history['loss'])

## Run the model

The model can be run using `model.predict` or by simply calling it like a function on an input. The main difference is that `model.predict` supports several parameters (like `batch_size`) and returns a NumPy array, whereas calling the model like a function returns a Tensor.

In [None]:
import numpy as np

In [None]:
grid = np.meshgrid(
    np.arange(x[:,0].min(), x[:,0].max(), 0.1),
    np.arange(x[:,1].min(), x[:,1].max(), 0.1),
)

In [None]:
xy = np.stack([grid[0].ravel(), grid[1].ravel()], axis=1)
xy

In [None]:
model(xy)

In [None]:
model.predict(xy)

In [None]:
scores = model(xy).numpy()
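
Conceptually, the `batch_size` parameter of `model.predict` means that the input is processed in chunks and the results are concatenated. Below is a rough NumPy sketch of that idea. `predict_in_batches` and the stand-in function `toy_model` are hypothetical names for this illustration and not Keras APIs.

```python
import numpy as np

def predict_in_batches(fn, data, batch_size=32):
    """Apply fn to consecutive slices of data and concatenate the results."""
    outputs = [fn(data[start:start + batch_size])
               for start in range(0, len(data), batch_size)]
    return np.concatenate(outputs, axis=0)

def toy_model(batch):
    # stand-in for a trained model: sigmoid of the sum of the features
    return 1.0 / (1.0 + np.exp(-batch.sum(axis=1, keepdims=True)))

data = np.random.default_rng(0).normal(size=(100, 2))
batched = predict_in_batches(toy_model, data, batch_size=32)

print(batched.shape)  # (100, 1)
```

Processing in chunks keeps the memory footprint bounded for large inputs, while the result is the same as applying the function to the whole array at once.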

In [None]:
plt.contourf(grid[0], grid[1], scores.reshape(grid[0].shape), cmap="Spectral_r")
plt.colorbar(label="NN output")
opts = dict(alpha=0.1, marker=".", edgecolors="black")
plt.scatter(x[y==0][:,0], x[y==0][:,1], color="blue", **opts)
plt.scatter(x[y==1][:,0], x[y==1][:,1], color="red", **opts)
plt.xlim(grid[0].min(), grid[0].max())
plt.ylim(grid[1].min(), grid[1].max())

## Functional API

The functional API is very similar to the old (TensorFlow 1) low-level API. That means the computation graph is first built in an abstract way (just specifying input/output shapes, but no data yet). Each layer can be called as a function on an input tensor and returns an output tensor. One can then build arbitrary computation graphs and finally create a model by passing the input and output tensors. This is especially useful when we want to organize the processing into several inputs and several outputs, or when we want to build computation graphs that have branches.

Suppose we want to do some strangely complicated processing of the "California housing dataset":

In [None]:
from sklearn.datasets import fetch_california_housing

In [None]:
data = fetch_california_housing()

In [None]:
print(data.DESCR)

For convenience, let's put it into a DataFrame:

In [None]:
import pandas as pd
df_housing = pd.DataFrame(data.data)
df_housing.columns = data.feature_names
df_housing['MedHouseVal'] = data.target

In [None]:
df_housing.head()

In [None]:
df_housing.describe()

Let's do the following funny exercise:
* Feed the Latitude and Longitude through a separate NN layer
* Combine the output of this layer with the other inputs (except for the median income)
* Add another hidden layer
* Add a target where we first try to predict the median income
* Feed this predicted median income back, together with the outputs of the NN, into another hidden layer
* Finally predict the median house value

In [None]:
from tensorflow.keras.layers import Input, Dense, concatenate

# For such more complicated structures it is often useful to give the layers names

inp_feat = Input((5,), name="Features") # ['HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup']
inp_coord = Input((2,), name="Coordinates") # ['Latitude', 'Longitude']
hl_coord = Dense(64, activation="relu")(inp_coord)
joined_inp = concatenate([inp_feat, hl_coord])
hl = Dense(64, activation="relu")(joined_inp)
# no activation function here, this will be a regression target
out_MedInc = Dense(1, name="MedIncOutput")(hl)
joined_inp2 = concatenate([hl, out_MedInc])
hl2 = Dense(64, activation="relu")(joined_inp2)
out_HouseValue = Dense(1, name="HouseValueOutput")(hl2)

So we now have a model with 2 inputs and 2 outputs. We can use `keras.models.Model` to create models with arbitrarily many inputs and outputs.

In [None]:
housing_model = tf.keras.Model(
    inputs={
        "Features" : inp_feat,
        "Coordinates" : inp_coord
    },
    outputs={
        "MedIncOutput" : out_MedInc,
        "HouseValueOutput" : out_HouseValue
    }
)
housing_model.summary()

Keras comes with [plotting utilities](https://keras.io/api/utils/model_plotting_utils) that allow a graph visualization for models created with the functional API. Let's check if we stitched the layers together as planned:

In [None]:
tf.keras.utils.plot_model(housing_model)

Since we named the inputs and outputs, we can give input and target data as dictionaries, but before that we want to standardize both the inputs and the targets!

In [None]:
features = ['HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup']

In [None]:
coordinates = ['Latitude', 'Longitude']

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scaler = StandardScaler().fit(df_housing.values)

In [None]:
df_trf = df_housing.copy()
df_trf[:] = scaler.transform(df_housing.values)
df_trf.describe()
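
`StandardScaler` subtracts the per-column mean and divides by the per-column standard deviation. Below is a NumPy sketch of the same transformation, including the inverse step that maps standardized values back to the original units. The array `values` is made up for this illustration.

```python
import numpy as np

values = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

col_mean = values.mean(axis=0)
col_std = values.std(axis=0)      # population standard deviation (ddof=0)

standardized = (values - col_mean) / col_std
restored = standardized * col_std + col_mean   # inverse transformation

print(standardized.mean(axis=0), standardized.std(axis=0))
```

For the fitted scaler above, the corresponding per-column values are available as `scaler.mean_` and `scaler.scale_`, and `scaler.inverse_transform` undoes the scaling.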

In [None]:
x_housing = {
    "Features" : df_trf[features].values,
    "Coordinates" : df_trf[coordinates].values,
}
y_housing = {
    "MedIncOutput" : df_trf["MedInc"].values.reshape(-1, 1),
    "HouseValueOutput" : df_trf["MedHouseVal"].values.reshape(-1, 1),
}

We need to specify loss functions for all outputs. If different outputs should be trained with different loss functions, you need to pass one loss per output (as a list or as a dictionary keyed by output name). The total loss will be the sum of the individual losses. One could also pass `loss_weights` to weight them relative to each other, but we don't do that here.

In [None]:
housing_model.compile(loss='mean_squared_error', optimizer='Adam')
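
How the total loss is composed can be written out by hand. The sketch below uses made-up targets, predictions and weights to show the unweighted sum used here and what a `loss_weights`-style weighting would change. It does not call Keras.

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# made-up targets and predictions for the two outputs
loss_income = mean_squared_error([0.5, -1.0], [0.0, -1.0])
loss_value = mean_squared_error([1.0, 2.0], [2.0, 0.0])

total_loss = loss_income + loss_value                  # unweighted sum
weighted_loss = 1.0 * loss_income + 0.2 * loss_value   # hypothetical weights

print(loss_income, loss_value, total_loss, weighted_loss)
```

Down-weighting one output, as in the second variant, reduces its influence on the gradients relative to the other output.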

In [None]:
housing_model.fit(x_housing, y_housing, epochs=10, shuffle=True, batch_size=128)

Did we predict the median income and finally the house price correctly? Let's have a look at the distributions for true and predicted values.

In [None]:
predictions = housing_model(x_housing)

In [None]:
predictions

In [None]:
opt = dict(alpha=0.5, bins=100, range=(-3, 5))
plt.hist(df_trf["MedInc"], label="True", **opt)
plt.hist(predictions["MedIncOutput"].numpy().reshape(-1), label="Predicted", **opt)
plt.xlabel("Median income (rescaled)")
plt.legend()

In [None]:
opt = dict(alpha=0.5, bins=100, range=(-3, 5))
plt.hist(df_trf["MedHouseVal"], label="True", **opt)
plt.hist(predictions["HouseValueOutput"].numpy().reshape(-1), label="Predicted", **opt)
plt.xlabel("Median House value (rescaled)")
plt.legend()

## Subclass API

For maximum flexibility you can also inherit from `tf.keras.models.Model` or `tf.keras.layers.Layer` and implement your own forward pass. This is very similar to how [PyTorch models are commonly built](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).

For both models and layers, the minimum set of methods you have to implement is `__init__`, where you typically define parameters and any state, and `call`, which contains the forward pass:

In [None]:
class MyDenseReluLayer(tf.keras.layers.Layer):
    
    def __init__(self, n_inputs, n_outputs):
        # call the base class constructor
        super().__init__()
        
        # initialize weights
        self.kernel = tf.Variable(tf.random.uniform((n_inputs, n_outputs)))
        self.biases = tf.Variable(tf.zeros(n_outputs))
        
    def call(self, inputs):
        return tf.nn.relu(tf.matmul(inputs, self.kernel) + self.biases)

Custom layers can be arbitrarily combined with existing layers, e.g.:

In [None]:
composed_model = tf.keras.models.Sequential([
        MyDenseReluLayer(2, 5),
        Dense(1, activation="sigmoid")
])

In [None]:
inputs = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)

In [None]:
composed_model(inputs)

In [None]:
composed_model.summary()

Models can also be used as layers for new models and you can use existing layers as members of custom layers etc.

More information can be found at https://keras.io/guides/making_new_layers_and_models_via_subclassing. 

## Generators

Sometimes the whole training data might not fit into memory or you might want to do some live pre-processing. The simplest way to do this is via [python generators](https://wiki.python.org/moin/Generators). For maximum efficiency it's worth having a look at [tf.data](https://www.tensorflow.org/guide/data).

Let's write a generator that yields an infinite number of mini-batches for our "moon" dataset. The generator should yield batches of (x, y).

In [None]:
def moon_generator(batch_size=128, buffer_size=10000):
    # let's make an infinite generator
    # - in each pass of the loop we will generate `buffer_size` training examples
    while True:
        x, y = make_moons(n_samples=buffer_size, noise=0.4)
        # this is the loop over mini-batches
        for start in range(0, buffer_size, batch_size):
            yield x[start : start + batch_size], y[start : start + batch_size]

Let's make an overly complicated model and train it with "infinite data"

In [None]:
stupid_model = tf.keras.models.Sequential([
    Dense(1024, activation="relu", input_shape=(2,)),
    Dense(1024, activation="relu"),
    Dense(1024, activation="relu"),
    Dense(1, activation="sigmoid")
])

In [None]:
stupid_model.summary()

In [None]:
stupid_model.compile(optimizer="Adam", loss="binary_crossentropy")

Since our generator is infinite, we have to pass the `steps_per_epoch` argument to `fit`, which defines how many batches are used before an epoch is declared finished. Finite generators can be created by inheriting from [`tf.keras.utils.Sequence`](https://keras.io/api/utils/python_utils/#sequence-class) or by using [`tf.data`](https://www.tensorflow.org/guide/data).

In [None]:
stupid_model.fit(moon_generator(), steps_per_epoch=200, epochs=1)
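
To see what `steps_per_epoch` means for an infinite generator, the plain-Python sketch below mimics only the slicing logic of `moon_generator` and yields the size of each batch instead of data. The generator `batch_sizes` is a hypothetical helper for this illustration.

```python
from itertools import islice

def batch_sizes(batch_size=128, buffer_size=10000):
    """Infinite generator yielding the size of each mini-batch."""
    while True:
        for start in range(0, buffer_size, batch_size):
            # the last slice of a buffer may be shorter than batch_size
            yield min(batch_size, buffer_size - start)

steps_per_epoch = 200
sizes = list(islice(batch_sizes(), steps_per_epoch))  # one "epoch" worth of batches

batches_per_buffer = len(range(0, 10000, 128))
print(batches_per_buffer, sizes[batches_per_buffer - 1], sum(sizes))
```

The last batch of each pass over the buffer is smaller (16 instead of 128 samples) because 10000 is not a multiple of 128, so 200 steps correspond to slightly fewer than 200 * 128 training examples.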

In [None]:
def validate_with_generator(model, generator, steps=5):

    # for plotting, just draw a few examples from the generator
    x = []
    y = []
    for i in range(steps):
        data = next(generator)
        x.append(data[0])
        y.append(data[1])
    x = np.concatenate(x)
    y = np.concatenate(y)
    
    grid = np.meshgrid(
        np.arange(x[:,0].min(), x[:,0].max(), 0.1),
        np.arange(x[:,1].min(), x[:,1].max(), 0.1),
    )
    
    xy = np.stack([grid[0].ravel(), grid[1].ravel()], axis=1)    
    scores = model.predict(xy)

    plt.contourf(grid[0], grid[1], scores.reshape(grid[0].shape), cmap="Spectral_r")
    plt.colorbar(label="NN output")
    opts = dict(alpha=0.2, marker=".", edgecolors="black")
    plt.scatter(x[y==0][:,0], x[y==0][:,1], color="blue", **opts)
    plt.scatter(x[y==1][:,0], x[y==1][:,1], color="red", **opts)
    plt.xlim(grid[0].min(), grid[0].max())
    plt.ylim(grid[1].min(), grid[1].max())

In [None]:
validate_with_generator(stupid_model, moon_generator(), steps=20)

## Visualize hidden layers

For models created with the Sequential or functional API it is easy to create new models that evaluate only part of the computation graph.
Let's use this to visualize the hidden layers of our first neural network in this notebook.

In [None]:
model.summary()

In [None]:
model.layers[0].output

In [None]:
model.input

In [None]:
hidden_output = tf.keras.Model(inputs=[model.input], outputs=[model.layers[0].output])

Let's feed it with a regular grid again for visualization.

In [None]:
step = 0.1
grid = np.meshgrid(
    np.arange(x[:,0].min(), x[:,0].max()+step, step),
    np.arange(x[:,1].min(), x[:,1].max()+step, step)
)

In [None]:
xp = np.stack([grid[0].ravel(), grid[1].ravel()], axis=-1)

In [None]:
hl_out = hidden_output(xp).numpy()

In [None]:
fig, axs = plt.subplots(nrows=4, ncols=4, figsize=(10, 10))
for i in range(16):
    axs.ravel()[i].contourf(grid[0], grid[1], hl_out[:,i].reshape(grid[0].shape))

In [None]:
# convert to NumPy arrays so they can be combined with `hl_out` below
weights = model.layers[1].weights[0].numpy()
bias = model.layers[1].weights[1].numpy()
weights, bias

In [None]:
fig, axs = plt.subplots(nrows=16, ncols=2, figsize=(2 * 2, 2 * 16))
# running weighted sum of the hidden-layer outputs (bias and sigmoid are not applied)
total = np.zeros_like(hl_out[:, 0])
for i in range(16):
    total += weights[i, 0] * hl_out[:, i]
    axs[i, 0].contourf(grid[0], grid[1], hl_out[:,i].reshape(grid[0].shape))
    axs[i, 0].set_title(f"+ {weights[i, 0]:.3f} *")
    axs[i, 1].contourf(grid[0], grid[1], total.reshape(grid[0].shape))
    axs[i, 1].set_title("=")
    axs[i, 0].set_axis_off()
    axs[i, 1].set_axis_off()
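
The running sum in the plot above is the weighted sum of the hidden-layer outputs. The actual network output additionally adds the bias and applies the sigmoid. Below is a NumPy sketch of this relation with random stand-in values. `hidden_demo`, `w_demo` and `b_demo` are made up here and not taken from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_demo = rng.uniform(size=(5, 16))   # stand-in for hidden-layer outputs of 5 points
w_demo = rng.normal(size=(16, 1))         # stand-in for the output layer kernel
b_demo = np.array([0.1])                  # stand-in for the output layer bias

# accumulate one hidden unit at a time, as in the plotting loop
running = np.zeros(5)
for i in range(16):
    running += w_demo[i, 0] * hidden_demo[:, i]

pre_activation = running + b_demo[0]
nn_output = 1.0 / (1.0 + np.exp(-pre_activation))   # sigmoid

# the accumulated sum equals the matrix product of hidden outputs and kernel
print(np.allclose(running, (hidden_demo @ w_demo)[:, 0]))  # True
```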

In [None]:
hl_out.shape

This gives a nice idea of how a NN composes its output by combining the outputs of the previous layer. A nice visualization of this can be seen at https://playground.tensorflow.org/