# Keras
Keras is well known in the Python deep learning community. It started out as a framework-agnostic high-level API that made backends such as CNTK, Theano and TensorFlow easier to use: you only had to set the backend for processing, everything else was abstracted. A few years ago, Keras was migrated to the TF repository and dropped support for other backends. It is now the de facto high-level API for TF.


### Layers and models
The basic component of a Keras model is a layer. A layer consists of one or more operations and is meant to be easily reusable. A model is a graph of layers.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import models

tf.random.set_seed(1024)

We will use the well-known Boston housing dataset. It is a small dataset of 506 examples (areas around Boston) with 13 features, where the target variable is the median price of a home in that area, in thousands of dollars. Keras makes this dataset easily accessible.

In [None]:
# Boston housing dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data()

print(f'x_train shape: {x_train.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'x_test shape: {x_test.shape}')
print(f'y_test shape: {y_test.shape}')

# Preprocess the data (these are NumPy arrays)
# Standardize by subtracting the mean and dividing by the standard deviation
# The statistics come from the training set and are reused for the test set
x_mean = np.mean(x_train, axis=0)
x_std = np.std(x_train, axis=0)
x_train = (x_train - x_mean) / x_std
x_test = (x_test - x_mean) / x_std

n_features = x_train.shape[1]
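
The standardization step can be checked on made-up data. The sketch below uses synthetic arrays (not the Boston data) and hypothetical `demo_*` names; it only illustrates that the statistics are computed on the training split and then reused for the test split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real arrays: 100 training and 20 test rows, 3 features
demo_train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
demo_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))

# Statistics come from the training split only
demo_mean = demo_train.mean(axis=0)
demo_std = demo_train.std(axis=0)

demo_train_scaled = (demo_train - demo_mean) / demo_std
demo_test_scaled = (demo_test - demo_mean) / demo_std

# Training columns now have mean ~0 and standard deviation ~1;
# test columns are only approximately standardized, which is expected
print(demo_train_scaled.mean(axis=0).round(6))
print(demo_train_scaled.std(axis=0).round(6))
```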

#### Building a linear regression model
The examples in the TF guide start with neural networks right away. We will begin with a linear regression model and then move on to a simple neural network, showing that it is very easy to construct both.

The simplest way to construct a Keras model is a sequential model, which is just a sequence of layers (or operations). For simple models, this works well. Our linear regression model has just one layer - a [dense layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense), which multiplies its input by a weight matrix and adds a bias. This is just the equation $y = Wx + b$.

This layer can also be used for neural networks, where we would be multiplying matrices. We choose `units = 1` (output a vector of size 1, i.e. a scalar for each example) and `input_shape = (n_features, )`, which means that we have 13 features for each example.
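
As a rough illustration of what a dense layer computes, here is a minimal NumPy sketch of $y = Wx + b$ applied to a whole batch at once. The array names and the batch size of 4 are made up for this example; the real layer creates and initializes `W` and `b` itself.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, n_inputs, units = 4, 13, 1

x = rng.normal(size=(batch_size, n_inputs))  # 4 examples, 13 features each
W = rng.normal(size=(n_inputs, units))       # weight matrix (Keras calls it the kernel)
b = np.zeros(units)                          # bias vector

# Row-wise version of y = Wx + b for the whole batch
y = x @ W + b

print(y.shape)  # (4, 1): one scalar prediction per example
```

With `units` larger than 1, the same matrix product yields a vector per example, which is how the layer is used inside neural networks.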

In [None]:
linear_layer = layers.Dense(
    units=1,
    input_shape=(n_features,),
    name='linear_layer'
)

linear_model = models.Sequential([
    linear_layer
], name='linear_model')

# View the weights (parameters) of a layer
linear_layer.weights

Let's look at a very useful method which gives us the summary of a model. It shows us all the layers, their output shapes and the number of parameters of the model.

In [None]:
linear_model.summary()
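
The parameter count in the summary can be verified by hand. The helper below is a hypothetical function written for this sketch, not part of Keras; it assumes a dense layer with a bias term.

```python
def dense_param_count(n_inputs, units):
    """Number of trainable parameters in a dense layer with a bias."""
    weights = n_inputs * units  # one weight per (input, unit) pair
    biases = units              # one bias per unit
    return weights + biases


print(dense_param_count(13, 1))  # 14
```

For our linear model this gives 13 weights plus 1 bias.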

#### Compiling and training the model
Before training, we need to specify the optimizer we will use and which loss we want to minimize. We'll pick the simplest options: stochastic gradient descent for the optimizer and mean squared error for the loss function. We do this by "compiling" the model, which prepares it for training. After that, we can call `.fit`, which is similar to scikit-learn. However, here we have to specify the number of epochs (passes over the training data) we want to run.

In [None]:
linear_model.compile(
    # Optimizer: Stochastic gradient descent
    optimizer=keras.optimizers.SGD(),  
    # Loss function to minimize: mean squared error
    loss=keras.losses.MeanSquaredError()
)

linear_model.fit(
    x_train, 
    y_train, 
    batch_size=len(x_train),  # Use all data as one batch
    verbose=0,                # Don't print progress
    epochs=20                 # 20 passes over the data, i.e. 20 gradient updates here
)

y_hat = linear_model.predict(x_test)

mse = keras.metrics.MeanSquaredError()(y_test, y_hat)
print(f'Test MSE: {mse:.4f}')

We see that we also managed to get a decent fit (MSE around 20 should be OK for this dataset). Observe how we managed to do this by avoiding all optimization-related mathematics - we only had to specify how the model computes predictions (the forward pass). Since our model and optimization procedure were very simple, the code isn't really much more complicated than if we used lower-level tools, but for more advanced approaches, it's usually much simpler to use Keras.
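
To make concrete what the optimizer takes off our hands, here is a minimal sketch of full-batch gradient descent for linear regression with an MSE loss, written in plain NumPy. The data is synthetic and the learning rate and step count are arbitrary choices for this illustration, not the settings Keras uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: 200 examples, 3 standardized features
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
b = 0.0
learning_rate = 0.1

for step in range(200):
    error = X @ w + b - y                # forward pass and residuals
    grad_w = 2.0 * X.T @ error / len(y)  # gradient of the MSE w.r.t. w
    grad_b = 2.0 * error.mean()          # gradient of the MSE w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

mse = np.mean((X @ w + b - y) ** 2)
print(w.round(2), round(float(b), 2), round(float(mse), 4))
```

The gradients here had to be derived by hand; with Keras, only the forward pass is specified and the gradients are obtained automatically.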

### More details for layers and models

A layer holds state and performs computations. Layers should be easily reusable and composable - you can create more complex layers from simpler ones.
A model is a graph of layers and offers train / predict / evaluate APIs. It is easily serializable.
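
A minimal sketch of a layer that holds state and performs a computation, assuming the standard `keras.layers.Layer` subclassing pattern. The `ScaleAndShift` layer is a hypothetical example made up for illustration.

```python
import numpy as np
from tensorflow import keras


class ScaleAndShift(keras.layers.Layer):
    """Hypothetical layer computing y = x * scale + shift, per feature."""

    def build(self, input_shape):
        n = input_shape[-1]
        # State: two trainable vectors, created once the input shape is known
        self.scale = self.add_weight(name='scale', shape=(n,), initializer='ones')
        self.shift = self.add_weight(name='shift', shape=(n,), initializer='zeros')

    def call(self, inputs):
        # Computation: uses the state created in build()
        return inputs * self.scale + self.shift


# Layers compose: the custom layer sits in front of a built-in one
block = keras.Sequential([ScaleAndShift(), keras.layers.Dense(1)])
out = block(np.ones((2, 4), dtype='float32'))
print(out.shape)  # (2, 1)
```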

### Functional API example
A core principle of Keras is the progressive increase of complexity. You should always be able to get into lower-level workflows in a gradual way. You shouldn't fall off a cliff if the high-level functionality doesn't exactly match your use case. You should be able to gain more control over the small details while keeping some high-level convenience.

For more sophisticated use cases (recurrent models, multiple inputs and outputs...), we might want a more flexible way to construct models. We'll move on to the Functional API, which offers us just that. Instead of specifying a sequence of layers, we create each layer individually and apply it to the outputs of the previous ones. This is the most popular way of creating models in Keras and is recommended. Each layer is an instance of a Python class, which you apply by calling it with its inputs. The inputs to each layer are tensors (`tf.Tensor`). Once you have applied all the layers, you create a model by telling it what its inputs and outputs are. TensorFlow will find the path through the graph of layers you have created and note down which layers are part of the graph (unused parts will not be included).
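
As a small illustration of the added flexibility, here is a sketch of a model with two inputs. The input names and sizes are made up for this example. It also includes a layer that is applied to an input but does not lead to the output, to show that such a layer is left out of the model.

```python
import numpy as np
from tensorflow.keras import layers, models

# Two separate inputs (names and sizes are made up for this sketch)
numeric_in = layers.Input(shape=(5,), name='numeric')
extra_in = layers.Input(shape=(2,), name='extra')

# Each layer object is called on the tensor(s) it should consume
hidden = layers.Dense(4, activation='relu', name='hidden')(numeric_in)
merged = layers.Concatenate(name='merged')([hidden, extra_in])
output = layers.Dense(1, name='output')(merged)

# Applied to an input, but not on the path to `output`,
# so it does not become part of the model
unused = layers.Dense(3, name='unused')(extra_in)

two_input_model = models.Model(
    inputs=[numeric_in, extra_in], outputs=output, name='two_input_model'
)

layer_names = [layer.name for layer in two_input_model.layers]
print(layer_names)

preds = two_input_model([np.zeros((3, 5), 'float32'), np.zeros((3, 2), 'float32')])
print(preds.shape)  # (3, 1)
```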


#### Neural networks
Now, we'll move on to a neural network example. We will apply a neural network with a small hidden layer to the same dataset. Due to the simplicity of Keras, this won't be much more difficult than the linear regression example! 

We'll use only dense layers and the ReLU activation function. In this example, our inputs will have 13 features. With matrix multiplication in the dense layers, we will transform them first to a 3-dimensional space (the hidden layer) and then to a 1-dimensional space, which will be the output.

We need to start with the `Input` class, which is a placeholder for the actual data we'll send in. The shape we specify is the number of features, i.e. the length of each vector (example). Even though we can (and should) feed in matrices of data, the batch size (the number of rows in the matrix) is omitted from `shape`; Keras adds it as a leading dimension of unspecified size.

In [None]:
inputs = layers.Input(shape=(n_features,), name='inputs')

# After we create a layer object, we get its output by calling it and passing the input tensor.
# The activation function can also be added by creating a `layers.ReLU` or `layers.Activation` layer.
layer1 = layers.Dense(3, activation='relu', name='dense_1')
x1 = layer1(inputs)

# Our outputs will be prices (a single value for each example)
layer2 = layers.Dense(1, name='dense_2')
predictions = layer2(x1)

nn_model = models.Model(inputs=inputs, outputs=predictions, name='nn_model')

nn_model.summary()
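
For comparison, the same forward pass can be sketched in plain NumPy. The weights below are random placeholders made up for this example, not the ones Keras initializes; the point is the shapes and the parameter count.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(5, 13))   # a batch of 5 examples with 13 features

W1 = rng.normal(size=(13, 3))  # hidden layer: 13 * 3 weights + 3 biases = 42 parameters
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))   # output layer: 3 * 1 weights + 1 bias = 4 parameters
b2 = np.zeros(1)

hidden = np.maximum(x @ W1 + b1, 0.0)  # dense layer followed by ReLU
y_pred = hidden @ W2 + b2              # linear output, one value per example

n_params = W1.size + b1.size + W2.size + b2.size
print(hidden.shape, y_pred.shape, n_params)  # (5, 3) (5, 1) 46
```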

You can access input / output / weights attributes on both layers and models to see what they contain.

In [None]:
print(f'Layer 1 inputs: {layer1.input}\n')
print(f'Layer 1 outputs: {layer1.output}\n')
print(f'Model inputs: {nn_model.input}\n')
print(f'Model outputs: {nn_model.output}\n')
print(f'Layer 1 weights: {layer1.weights}\n')

Again, we'll need to compile the model before training. We'll use a different optimizer this time. 

In [None]:
tf.random.set_seed(999)

nn_model.compile(
    # Adam optimizer (adaptive per-parameter step sizes) with a learning rate of 0.15
    optimizer=keras.optimizers.Adam(learning_rate=0.15),
    # Loss function to minimize: mean squared error
    loss=keras.losses.MeanSquaredError()
)

We will also use a validation set (we don't train on it) to monitor the loss during training. Although training is typically done in minibatches, we will use the whole dataset here as it is fairly small.

In [None]:
history = nn_model.fit(
    x_train,
    y_train,
    batch_size=len(x_train),  # The minibatch size (tradeoff between speed and quality) - use all data
    epochs=20,
    validation_split=0.1      # Fraction of the training data held out for validation
)
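
According to the Keras documentation, `validation_split` holds out the last samples of the provided arrays, before any shuffling. The sketch below estimates the resulting sizes; `split_sizes` is a hypothetical helper, the rounding rule is an assumption, and 404 is the number of training rows in the default split of this dataset.

```python
import math


def split_sizes(n_samples, validation_split):
    """Sizes of the (training, validation) parts when the last fraction is held out."""
    split_at = math.floor(n_samples * (1.0 - validation_split))
    return split_at, n_samples - split_at


n_train, n_val = split_sizes(404, 0.1)
print(n_train, n_val)  # 363 41
```

Because the held-out part is always the tail of the arrays, data that is sorted in some way should be shuffled before calling `.fit` with `validation_split`.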

The `.fit` method returns an object which contains the losses for each epoch during training. You can use it to save them or plot them, but we will show an easier and more powerful tool for this in the next notebook.

In [None]:
# Show the loss and metrics history
history.history
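
`history.history` is a plain dictionary that maps each tracked quantity (here `loss` and `val_loss`) to a list with one value per epoch. A minimal sketch of working with it follows; the numbers are hypothetical and not taken from an actual training run.

```python
# Hypothetical per-epoch values, shaped like the dictionary in `history.history`
example_history = {
    'loss': [520.0, 310.0, 150.0, 60.0, 35.0],
    'val_loss': [480.0, 280.0, 140.0, 70.0, 75.0],
}

val_losses = example_history['val_loss']

# Index of the epoch with the lowest validation loss
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)

print(f'Best epoch (0-based): {best_epoch}, val_loss: {val_losses[best_epoch]}')
```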

The validation results seem good. To make sure, we'll also evaluate our model on the test set.

In [None]:
# Evaluate the model on the test data
mse = nn_model.evaluate(x_test, y_test, batch_size=len(x_test))
print(f'Test MSE: {mse}')

We see that our test loss is higher than our validation loss, but overall still OK, even lower than the linear regression loss. Since we're dealing with a small amount of data, overfitting can easily happen - we'll demonstrate some useful techniques to combat it in the next notebook.

### Going further
The [Keras API docs](https://www.tensorflow.org/api_docs/python/tf/keras) are pretty nice and contain a lot more material - definitely check out the list of layers. And even if you can't find something you need - it's easy to create your own layer, metric or loss function with plain TF operations.
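
As one example of writing your own component, here is a sketch of a custom loss built from plain TF operations. The function name is made up for this example; it assumes the usual `(y_true, y_pred)` signature that Keras expects from a loss callable.

```python
import tensorflow as tf


def root_mean_squared_error(y_true, y_pred):
    """Custom loss built from plain TF operations."""
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))


loss_value = root_mean_squared_error(
    tf.constant([[1.0], [2.0], [3.0]]),
    tf.constant([[1.0], [2.0], [5.0]]),
)
print(float(loss_value))
```

A function like this can be passed to `.compile` as `loss=root_mean_squared_error` in place of a built-in loss.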