# Keras

This is a set of notes on `keras` based on either Prof. Andrew Ng's Deeplearning.ai course, or Francois Chollet's book Deep Learning with Python.

**Inputs** in `keras` are also represented as layers: `keras.layers.Input()`.

The **first** layer needs an `input_shape=` parameter. The shape given here should be the input data shape **without** the batch dimension.

**Activations** can also be standalone layers: `keras.layers.Activation()`. Some layers, such as `Dense()`, take an optional `activation=` parameter that builds the activation into the layer.

Two ways to encode labels for multi-class classification:
1. one-hot labels: use the `categorical_crossentropy` loss.
2. integer labels: use the `sparse_categorical_crossentropy` loss.
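The two label encodings above should give the same per-sample loss. A minimal sketch, assuming the TensorFlow-bundled Keras (`from tensorflow import keras`) and made-up probabilities:

```python
import numpy as np
from tensorflow import keras

# Predicted class probabilities for two samples over three classes (made-up values).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]], dtype="float32")

int_labels = np.array([0, 2])                             # integer labels
one_hot_labels = np.eye(3, dtype="float32")[int_labels]   # same labels, one-hot

# Per-sample losses computed from each label encoding.
loss_one_hot = keras.losses.categorical_crossentropy(one_hot_labels, probs)
loss_sparse = keras.losses.sparse_categorical_crossentropy(int_labels, probs)

print(np.asarray(loss_one_hot), np.asarray(loss_sparse))
```

Both calls reduce to `-log(p)` of the probability assigned to the true class, so the choice between them is only about how the labels are stored.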

**Avoid** shrinking layer dimensions below the input dimension too quickly; doing so loses information early in the chain.

When calling `model.compile()`, you can specify a loss function with the `loss=` parameter and metrics to monitor with the `metrics=` parameter.

Evaluation on held-out data: `model.evaluate()`
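The pieces above (input layer, standalone and built-in activations, `compile()`, `evaluate()`) might fit together as in the sketch below. It assumes the TensorFlow-bundled Keras; the layer sizes and the random data are arbitrary placeholders, not values from the course or the book.

```python
import numpy as np
from tensorflow import keras

# Random stand-in data: 64 samples, 20 features, 3 classes (integer labels).
x = np.random.rand(64, 20).astype("float32")
y = np.random.randint(0, 3, size=(64,))

inputs = keras.Input(shape=(20,))              # shape excludes the batch dimension
h = keras.layers.Dense(16)(inputs)             # no built-in activation here
h = keras.layers.Activation("relu")(h)         # activation as a standalone layer
outputs = keras.layers.Dense(3, activation="softmax")(h)  # built-in activation
model = keras.Model(inputs, outputs)

model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=1, batch_size=16, verbose=0)

# evaluate() returns the loss followed by the compiled metrics.
loss, acc = model.evaluate(x, y, verbose=0)
print(loss, acc)
```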

## Custom Layers

Extend `keras.layers.Layer` and implement `call(self, inputs)`, which should always return a value.
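A minimal sketch of a custom layer, assuming the TensorFlow-bundled Keras. The `Scale` layer and its `factor` argument are hypothetical names used only for illustration.

```python
import tensorflow as tf
from tensorflow import keras

class Scale(keras.layers.Layer):
    """Hypothetical layer that multiplies its inputs by a fixed factor."""

    def __init__(self, factor=2.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        # call() holds the forward computation and must return the result.
        return inputs * self.factor

out = Scale(factor=3.0)(tf.ones((2, 4)))
print(out.shape)
```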

## Preprocessing

Always **remove** redundancy in your data. 

**One Hot Encoding**: `keras.utils.to_categorical()` (older releases also expose it as `keras.utils.np_utils.to_categorical()`)

**Sequence Padding**: `keras.preprocessing.sequence.pad_sequences()`
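A short sketch of both helpers, assuming the TensorFlow-bundled Keras and toy inputs. By default `pad_sequences()` pads with zeros at the front of each sequence.

```python
from tensorflow import keras

# One-hot encode integer class labels.
labels = [0, 2, 1]
one_hot = keras.utils.to_categorical(labels, num_classes=3)
# one_hot -> [[1, 0, 0], [0, 0, 1], [0, 1, 0]] (as floats)

# Pad variable-length sequences to a common length of 4.
sequences = [[1, 2, 3], [4, 5], [6]]
padded = keras.preprocessing.sequence.pad_sequences(sequences, maxlen=4)
# padded -> [[0, 1, 2, 3], [0, 0, 4, 5], [0, 0, 0, 6]]

print(one_hot)
print(padded)
```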

## Modeling Tips

Be aware of **nonstationary** problems. Because such problems change over time, the right move is to:
* constantly retrain on recent data, or
* gather data at a timescale where the problem is stationary.



## Ensembles

One style that has had recent success is the **wide and deep** category of models. Such models consist of jointly training a deep neural network with a large linear model. 
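One possible way to sketch a wide and deep model with the functional API is shown below. It assumes the TensorFlow-bundled Keras; the input names, feature sizes, and layer widths are hypothetical, and real wide and deep models typically feed engineered sparse features to the linear part.

```python
import numpy as np
from tensorflow import keras

# Hypothetical feature groups: 10 "wide" features, 6 "deep" features.
wide_in = keras.Input(shape=(10,), name="wide_features")
deep_in = keras.Input(shape=(6,), name="deep_features")

# Wide part: a single linear layer.
wide_out = keras.layers.Dense(1)(wide_in)

# Deep part: a small stack of nonlinear layers.
h = keras.layers.Dense(32, activation="relu")(deep_in)
h = keras.layers.Dense(16, activation="relu")(h)
deep_out = keras.layers.Dense(1)(h)

# Sum both logits and train the two parts jointly with one loss.
logit = keras.layers.add([wide_out, deep_out])
prob = keras.layers.Activation("sigmoid")(logit)
model = keras.Model([wide_in, deep_in], prob)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Random stand-in data.
x_wide = np.random.rand(32, 10).astype("float32")
x_deep = np.random.rand(32, 6).astype("float32")
y = np.random.randint(0, 2, size=(32, 1)).astype("float32")
model.fit([x_wide, x_deep], y, epochs=1, verbose=0)
pred = model.predict([x_wide, x_deep], verbose=0)
```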

## RNN 

`keras` RNN layers take input in the shape of `(batch_size, timesteps, input_features)`.

Recurrent layers all have **dropout**-related params:
* `dropout=` float, dropout rate for the layer inputs
* `recurrent_dropout=` float, dropout rate for the recurrent unit

Per Yarin Gal's PhD thesis work (2015), recurrent layer dropout should use the **same** dropout mask at every timestep.

`keras.layers.LSTM()` has a boolean parameter `return_sequences=` that selects between returning the full sequence of outputs (`True`) and only the last output (`False`, the default).
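The effect of `return_sequences=` is easiest to see in the output shapes. A minimal sketch, assuming the TensorFlow-bundled Keras and arbitrary sizes:

```python
import tensorflow as tf
from tensorflow import keras

# Input shape: (batch_size, timesteps, input_features)
x = tf.random.normal((2, 5, 3))

# Default: only the output at the last timestep is returned.
last_only = keras.layers.LSTM(4)(x)

# Full sequence of outputs, with input and recurrent dropout configured
# (dropout is only active during training).
full_seq = keras.layers.LSTM(
    4, return_sequences=True, dropout=0.2, recurrent_dropout=0.2
)(x)

print(last_only.shape)  # (2, 4)
print(full_seq.shape)   # (2, 5, 4)
```

Setting `return_sequences=True` is needed when stacking recurrent layers, since the next recurrent layer expects a 3D input.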

The `implementation=` parameter (either 1 or 2) controls how the computations are organized. Mode 1 uses a larger number of smaller operations, while mode 2 batches them into fewer, larger ones. See the code [here](https://github.com/keras-team/keras/blob/d9f26a92f4fdc1f1e170a4203a7c13abf3af86e8/keras/layers/recurrent.py#L1821).

`keras.layers.Bidirectional()` wraps a recurrent layer to build a bidirectional RNN.
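A small sketch of the wrapper, assuming the TensorFlow-bundled Keras. With the default `merge_mode="concat"`, the forward and backward outputs are concatenated, so the feature dimension doubles.

```python
import tensorflow as tf
from tensorflow import keras

x = tf.random.normal((2, 5, 3))  # (batch_size, timesteps, input_features)

# Wrap an LSTM so it processes the sequence in both directions.
bi_lstm = keras.layers.Bidirectional(keras.layers.LSTM(4))
out = bi_lstm(x)

print(out.shape)  # (2, 8): 4 forward units + 4 backward units
```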

### Load Model Weights

Once you build a model, you can use `model.load_weights()` to load previously saved weights.
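A minimal round trip, assuming the TensorFlow-bundled Keras. `build_model()` and the file name are hypothetical; the `.weights.h5` suffix is chosen because newer Keras releases require it for weight files.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

def build_model():
    """Hypothetical helper: the architecture must match when reloading weights."""
    return keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(3, activation="relu"),
        keras.layers.Dense(1),
    ])

original = build_model()
path = os.path.join(tempfile.mkdtemp(), "demo.weights.h5")
original.save_weights(path)

# Build a fresh model with the same architecture, then load the saved weights.
restored = build_model()
restored.load_weights(path)

x = np.random.rand(5, 4).astype("float32")
same = np.allclose(original.predict(x, verbose=0),
                   restored.predict(x, verbose=0))
print(same)
```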

### Layer Weights

To **freeze** layer weights, set `trainable=False` when instantiating the layer. 

Use `set_weights()` to set layer weights to pre-trained values. Example below, thanks to Andrew Ng's Deeplearning.ai Coursera course:


```python
import numpy as np
from tensorflow.keras.layers import Embedding

# Placeholder values for illustration; in practice these come from pre-trained vectors.
vocab_len, emb_dim = 1000, 50
emb_matrix = np.zeros((vocab_len, emb_dim))

embedding_layer = Embedding(input_dim=vocab_len, output_dim=emb_dim, trainable=False)
# or set embedding_layer.trainable = False

# Build the embedding layer; this is required before setting its weights.
# Do not modify the "None".
embedding_layer.build((None,))

# Set the weights of the embedding layer to the embedding matrix.
# Your layer is now pretrained.
embedding_layer.set_weights([emb_matrix])
```

## Training

### Regularization

Use `keras.regularizers.*`. Regularizer instances are passed to layers through the `kernel_regularizer=` param.
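For example, an L2 weight penalty on a `Dense` layer could look like the sketch below. It assumes the TensorFlow-bundled Keras; the coefficient `0.001` and the layer sizes are arbitrary.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(
        16,
        activation="relu",
        # Adds 0.001 * sum(kernel ** 2) to the training loss.
        kernel_regularizer=keras.regularizers.l2(0.001),
    ),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Each regularized weight contributes one penalty term to model.losses.
print(len(model.losses))
```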

### Metrics

* **Balanced classification**: ROC AUC
* **Imbalanced classification**: precision and recall, F1 score
* **Ranking / multi-label classification**: mean average precision

### `model.fit_generator()`

[docs](https://keras.io/models/model/#fit_generator)

The first argument is expected to be a Python generator that yields **batches** of inputs and targets **indefinitely**, i.e. each step yields a `(samples, targets)` tuple where `len(samples) == batch_size`.

The number of batches drawn per epoch is set by the `steps_per_epoch=` param.

`validation_data=` can be either a generator or numpy arrays. `validation_steps=` should be specified if a generator is given.

Example: `keras.preprocessing.image.ImageDataGenerator`
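A hand-written generator of this kind might look like the sketch below. `batch_generator` is a hypothetical helper and the data is random. The sketch assumes the TensorFlow-bundled Keras, where `fit()` accepts a generator directly; on older releases the same arguments would go to `fit_generator()`.

```python
import numpy as np
from tensorflow import keras

def batch_generator(x, y, batch_size):
    """Yield (samples, targets) batches forever, reshuffling on every pass."""
    n = len(x)
    while True:
        order = np.random.permutation(n)
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

# Random stand-in data.
x = np.random.rand(100, 8).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = keras.Sequential([keras.Input(shape=(8,)), keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

# 10 batches of 10 samples cover the 100 samples once per epoch.
gen = batch_generator(x, y, batch_size=10)
history = model.fit(gen, steps_per_epoch=10, epochs=2, verbose=0)
```

Because the generator never terminates, `steps_per_epoch=` is what tells Keras where one epoch ends.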

### Saving Trained Model

Models can be saved by calling `model.save('path.h5')`.
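A saved model can then be restored with `keras.models.load_model()`. A minimal round trip, assuming the TensorFlow-bundled Keras and a temporary directory for the file:

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(2)])

# Save architecture and weights to a single HDF5 file.
path = os.path.join(tempfile.mkdtemp(), "path.h5")
model.save(path)

# Reload without rebuilding the architecture by hand.
restored = keras.models.load_model(path)

x = np.random.rand(3, 4).astype("float32")
same = np.allclose(model.predict(x, verbose=0),
                   restored.predict(x, verbose=0))
print(same)
```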

### Multiple Inputs

Jointly train multiple networks with a combined loss function. The Keras functional API provides flexible ways to achieve this.

A `Model` can be created with multiple inputs, specified as a **list** of inputs. When calling `fit()`, the input data can be either:
* a list, or
* a dict with keys matching `Input` layer names.

Outputs from different sources are combined with `keras.layers.concatenate()` before being fed to the next layer.
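A two-input model along these lines could be sketched as follows, assuming the TensorFlow-bundled Keras. The input names `text` and `meta`, the shapes, and the data are hypothetical.

```python
import numpy as np
from tensorflow import keras

# Two named inputs, each with its own branch.
text_in = keras.Input(shape=(10,), name="text")
meta_in = keras.Input(shape=(3,), name="meta")
text_branch = keras.layers.Dense(8, activation="relu")(text_in)
meta_branch = keras.layers.Dense(4, activation="relu")(meta_in)

# Combine the branches before the final layer.
merged = keras.layers.concatenate([text_branch, meta_branch])
output = keras.layers.Dense(1)(merged)

model = keras.Model([text_in, meta_in], output)
model.compile(optimizer="rmsprop", loss="mse")

# Random stand-in data.
x_text = np.random.rand(32, 10).astype("float32")
x_meta = np.random.rand(32, 3).astype("float32")
y = np.random.rand(32, 1).astype("float32")

# Inputs passed as a dict keyed by Input layer names...
model.fit({"text": x_text, "meta": x_meta}, y, epochs=1, batch_size=8, verbose=0)
# ...or as a list in the same order as the model's inputs.
pred = model.predict([x_text, x_meta], verbose=0)
```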

### Multiple Outputs / Loss functions

Each output can have its own loss function, but training ultimately needs a single scalar loss. `keras` produces one by summing the individual losses. Specify the per-output losses in the `model.compile()` call with either:

* a list of loss functions, e.g. `['mse', 'categorical_crossentropy', 'binary_crossentropy']`, or
* a dictionary with keys matching the output layer names, e.g. `{'age': 'mse', 'income': 'categorical_crossentropy'}`.

Use `loss_weights=` to turn the plain sum into a weighted sum.
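A two-output sketch using the list form, assuming the TensorFlow-bundled Keras. The output names `age` and `income` follow the example above; the shapes, loss weights, and data are made up.

```python
import numpy as np
from tensorflow import keras

inputs = keras.Input(shape=(16,))
h = keras.layers.Dense(32, activation="relu")(inputs)
age = keras.layers.Dense(1, name="age")(h)                            # regression head
income = keras.layers.Dense(5, activation="softmax", name="income")(h)  # 5-class head
model = keras.Model(inputs, [age, income])

# One loss per output, in output order; loss_weights scales each term in the sum.
model.compile(optimizer="rmsprop",
              loss=["mse", "categorical_crossentropy"],
              loss_weights=[0.25, 1.0])

# Random stand-in data.
x = np.random.rand(64, 16).astype("float32")
age_y = np.random.rand(64, 1).astype("float32")
income_y = np.eye(5, dtype="float32")[np.random.randint(0, 5, size=64)]

history = model.fit(x, [age_y, income_y], epochs=1, batch_size=16, verbose=0)
print(history.history["loss"])
```

Weighting matters when the losses live on different scales, since otherwise the largest one tends to dominate the combined gradient.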

### Layer Weight Sharing

A layer instance can be used multiple times; the weights are then shared across all calls. An example use is a **Siamese** network.

Models can also be used as layers. Example: dual camera inputs.
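A minimal Siamese-style sketch, assuming the TensorFlow-bundled Keras and arbitrary sizes. The same `Dense` instance encodes both inputs, so it contributes only one set of weights.

```python
from tensorflow import keras

# One layer instance, reused for both branches.
shared_encoder = keras.layers.Dense(8, activation="relu")

left_in = keras.Input(shape=(5,))
right_in = keras.Input(shape=(5,))
left_code = shared_encoder(left_in)
right_code = shared_encoder(right_in)

merged = keras.layers.concatenate([left_code, right_code])
output = keras.layers.Dense(1, activation="sigmoid")(merged)
model = keras.Model([left_in, right_in], output)

# Shared kernel + bias, plus the output layer's kernel + bias.
print(len(model.trainable_weights))  # 4
```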

### Plot Models

Use `keras.utils.plot_model()`. If `show_shapes=True` then layer shapes are also shown. [docs](https://keras.io/utils/#plot_model)

### Callbacks

Built-in callbacks: `keras.callbacks`, [docs](https://keras.io/callbacks/)

Examples of callbacks:

* Model checkpointing
* Early stopping
* Dynamically adjusting certain parameters during training, such as the learning rate
* Logging training and validation metrics
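Several of these can be combined in one `fit()` call. The sketch below assumes the TensorFlow-bundled Keras; the monitored quantity, patience values, and checkpoint file name are illustrative choices, and the data is random.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Random stand-in data.
x = np.random.rand(80, 6).astype("float32")
y = np.random.rand(80, 1).astype("float32")

model = keras.Sequential([keras.Input(shape=(6,)), keras.layers.Dense(1)])
model.compile(optimizer="rmsprop", loss="mse")

ckpt_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")
callbacks = [
    # Stop when validation loss has not improved for 2 epochs.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
    # Keep only the weights from the best epoch so far.
    keras.callbacks.ModelCheckpoint(ckpt_path, monitor="val_loss",
                                    save_best_only=True,
                                    save_weights_only=True),
    # Halve the learning rate when validation loss plateaus.
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1),
]

history = model.fit(x, y, validation_split=0.25, epochs=5,
                    callbacks=callbacks, verbose=0)
```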

### Tensorboard

Use `keras.callbacks.TensorBoard`, [docs](https://keras.io/callbacks/#tensorboard).

Steps:
1. Create a logging directory, e.g. `my_log_dir`.
2. Create a TensorBoard callback with `log_dir='my_log_dir'`; see the docs for other params such as `histogram_freq=` and `embeddings_freq=`.
3. Pass the callback to `fit()` with the `callbacks=` param.
4. At the command prompt, run: `tensorboard --logdir=my_log_dir`
5. Open `http://localhost:6006` in a browser.
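The first three steps might look like the sketch below. It assumes the TensorFlow-bundled Keras with TensorBoard installed; a temporary directory stands in for the logging directory, and the model and data are placeholders.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

# Step 1: logging directory (a temporary one here).
log_dir = tempfile.mkdtemp()

# Step 2: TensorBoard callback pointing at that directory.
tensorboard_cb = keras.callbacks.TensorBoard(log_dir=log_dir)

# Random stand-in data and a tiny model.
x = np.random.rand(40, 4).astype("float32")
y = np.random.rand(40, 1).astype("float32")
model = keras.Sequential([keras.Input(shape=(4,)), keras.layers.Dense(1)])
model.compile(optimizer="rmsprop", loss="mse")

# Step 3: pass the callback to fit().
model.fit(x, y, validation_split=0.25, epochs=2,
          callbacks=[tensorboard_cb], verbose=0)

# Event files now exist under log_dir; point `tensorboard --logdir` at it.
n_files = sum(len(files) for _, _, files in os.walk(log_dir))
print(log_dir, n_files)
```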


### Hyperparameter Optimization

Hyperparameter space is usually made up of discrete decisions, so it is neither continuous nor differentiable. Gradient descent therefore doesn't apply, and gradient-free methods are needed, which are far less efficient.

* Hyperopt
* Hyperas (integrates Hyperopt with Keras)
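To make the idea of a gradient-free search concrete, here is a plain random search in standard-library Python. This is not Hyperopt's algorithm or API. `validation_loss` is a made-up stand-in for "train a model with these hyperparameters and return its validation loss", and the search space values are arbitrary.

```python
import math
import random

# Hypothetical discrete search space.
search_space = {
    "units": [16, 32, 64, 128],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
}

def validation_loss(params):
    """Toy objective standing in for a real train-and-validate run."""
    return (abs(math.log2(params["units"]) - 6)
            + abs(math.log10(params["learning_rate"]) + 2))

def random_search(objective, space, n_trials, seed=0):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(choices) for name, choices in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = random_search(validation_loss, search_space, n_trials=20)
print(best_params, best_loss)
```

Every trial requires a full evaluation of the objective and nothing is learned from gradients, which is why such searches are expensive when each evaluation means training a network.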
