## Usage of initializers

Initializers define how the initial random weights of Keras layers are set.

The keyword arguments used to pass initializers to a layer depend on the layer. Usually they are simply kernel_initializer and bias_initializer:

In [1]:
from keras.layers import Dense

Dense?

Using TensorFlow backend.


In [2]:
layer = Dense(10, kernel_initializer='lecun_uniform', bias_initializer='ones')

In [3]:
from keras.initializers import Constant

layer = Dense(10, kernel_initializer='he_normal', bias_initializer=Constant(7))

As you can see, there are plenty of initializers, and you can even write your own:

In [4]:
from keras import backend as K

def my_init(shape, dtype=None):
    return K.random_normal(shape, dtype=dtype)

Dense(64, kernel_initializer=my_init)

<keras.layers.core.Dense at 0x113292750>
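To make the string names above less opaque, here is a minimal pure-Python sketch of the scaling behind two of them. It assumes the usual Keras 2 definitions (lecun_uniform samples uniformly from [-limit, limit] with limit = sqrt(3 / fan_in), and he_normal uses a standard deviation of sqrt(2 / fan_in)). The helper names and the seed argument are hypothetical and exist only for this illustration; this is not the Keras implementation.

```python
import math
import random


def lecun_uniform(shape, seed=None):
    """Sample a (fan_in, fan_out) kernel uniformly from [-limit, limit].

    Assumption: limit = sqrt(3 / fan_in), as in the Keras 2 definition.
    """
    fan_in, fan_out = shape
    limit = math.sqrt(3.0 / fan_in)
    rng = random.Random(seed)  # hypothetical seed argument, for reproducibility
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def he_normal_stddev(fan_in):
    """Standard deviation that he_normal is assumed to use: sqrt(2 / fan_in)."""
    return math.sqrt(2.0 / fan_in)


# A Dense kernel has shape (input_dim, units), so fan_in is the first entry.
kernel = lecun_uniform((3, 4), seed=0)
```

A custom initializer such as my_init above follows the same contract: it receives the shape of the weight and returns values of that shape.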

## Usage of activations

Activations can either be used through an Activation layer, or through the activation argument supported by all forward layers:

In [5]:
from keras.layers import Activation, Dense, Input

x = Input((1,))
x = Dense(64)(x)
x = Activation('tanh')(x)

This is equivalent to:

In [6]:
x = Input((1,))
x = Dense(64, activation='tanh')(x)

You can also pass an element-wise TensorFlow/Theano function as an activation:

In [7]:
from keras import backend as K

x = Input((1,))
x = Dense(64, activation=K.tanh)(x)
x = Activation(K.tanh)(x)
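The equivalence between a separate Activation layer and the activation argument can be sketched without Keras. The functions dense and activation_layer below are hypothetical stand-ins written for this example only; they mimic an affine map followed by an element-wise function on a single input vector.

```python
import math


def dense(inputs, kernel, bias, activation=None):
    """Affine map of one input vector, then an optional element-wise activation."""
    outputs = [sum(x * kernel[i][j] for i, x in enumerate(inputs)) + bias[j]
               for j in range(len(bias))]
    if activation is not None:
        outputs = [activation(v) for v in outputs]
    return outputs


def activation_layer(values, fn):
    """Stand-in for an Activation layer: apply fn to every element."""
    return [fn(v) for v in values]


inputs = [0.5]
kernel = [[1.0, -2.0]]  # shape (input_dim=1, units=2)
bias = [0.0, 0.1]

# Dense followed by a separate activation step ...
separate = activation_layer(dense(inputs, kernel, bias), math.tanh)
# ... gives the same result as passing the activation to the dense step.
fused = dense(inputs, kernel, bias, activation=math.tanh)
```

Any callable that maps a value to a value can play the role of math.tanh here, which mirrors passing K.tanh instead of the string 'tanh'.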

## Usage of regularizers

Regularizers allow you to apply penalties on layer parameters or layer activity during optimization. These penalties are incorporated into the loss function that the network optimizes.

The penalties are applied on a per-layer basis. The exact API will depend on the layer, but the layers Dense, Conv1D, Conv2D and Conv3D have a unified API.

These layers expose 3 keyword arguments:

* kernel_regularizer: instance of keras.regularizers.Regularizer
* bias_regularizer: instance of keras.regularizers.Regularizer
* activity_regularizer: instance of keras.regularizers.Regularizer

In [8]:
from keras import regularizers
Dense(64, input_dim=64,
      kernel_regularizer=regularizers.l2(0.01),
      activity_regularizer=regularizers.l1(0.01))

<keras.layers.core.Dense at 0x113326090>

In [9]:
# available regularizers
regularizers.l1(0.)
regularizers.l2(0.)
regularizers.l1_l2(0.)

<keras.regularizers.L1L2 at 0x113292350>

In [10]:
# Custom regularizer
from keras import backend as K

def l1_reg(weight_matrix):
    return 0.01 * K.sum(K.abs(weight_matrix))

Dense(64, input_dim=64,
      kernel_regularizer=l1_reg)

<keras.layers.core.Dense at 0x113351cd0>
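As a rough illustration of what these penalties compute, the sketch below evaluates L1 and L2 penalties on a small weight matrix and adds one of them to a data loss. It assumes the common definitions (factor times the sum of absolute values, and factor times the sum of squares). The function names and the example numbers are made up for this illustration.

```python
def l1_penalty(weights, factor=0.01):
    """factor * sum(|w|) over all entries of a 2-D list of weights."""
    return factor * sum(abs(w) for row in weights for w in row)


def l2_penalty(weights, factor=0.01):
    """factor * sum(w^2) over all entries of a 2-D list of weights."""
    return factor * sum(w * w for row in weights for w in row)


def l1_l2_penalty(weights, l1=0.01, l2=0.01):
    """Sum of the two penalties above."""
    return l1_penalty(weights, l1) + l2_penalty(weights, l2)


def total_loss(data_loss, penalties):
    """The optimizer minimizes the data loss plus every per-layer penalty."""
    return data_loss + sum(penalties)


kernel = [[1.0, -2.0], [0.5, 0.0]]  # hypothetical layer weights
loss = total_loss(0.25, [l2_penalty(kernel, 0.01)])
```

The custom l1_reg function above has the same shape as l1_penalty: it takes the weight tensor and returns a scalar that is added to the loss.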

In addition, there is an ActivityRegularization layer, which adds a penalty to the loss based on the activations that pass through it:

In [11]:
from keras.layers import ActivityRegularization

ActivityRegularization?
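Conceptually, such a layer passes its input through unchanged and only contributes a penalty term. A minimal sketch, assuming the penalty is a plain L1/L2 sum over the activations (the exact scaling in Keras may differ between versions), with a hypothetical function name:

```python
def activity_regularization(activations, l1=0.0, l2=0.0):
    """Return the activations unchanged, plus the penalty they incur."""
    penalty = (l1 * sum(abs(a) for a in activations)
               + l2 * sum(a * a for a in activations))
    return activations, penalty


outputs, penalty = activity_regularization([1.0, -3.0], l1=0.1, l2=0.01)
```

Unlike a kernel regularizer, this penalty depends on the data fed through the network, not only on the weights.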

## Usage of constraints

Functions from the constraints module allow setting constraints (e.g., non-negativity) on network parameters during optimization.

The constraints are applied on a per-layer basis. The exact API will depend on the layer, but the layers Dense, Conv1D, Conv2D and Conv3D have a unified API.

These layers expose 2 keyword arguments:

* kernel_constraint for the main weights matrix
* bias_constraint for the bias.

In [12]:
from keras.constraints import max_norm

Dense(64, kernel_constraint=max_norm(2.))

<keras.layers.core.Dense at 0x113292990>

Available constraints

* max_norm(max_value=2, axis=0): maximum-norm constraint
* non_neg(): non-negativity constraint
* unit_norm(axis=0): unit-norm constraint, constrains the weights incident to each hidden unit to have unit norm
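The three constraints listed above can be sketched as simple projections applied to a weight matrix after an update. This pure-Python version works on a 2-D list with axis=0 (one norm per column, i.e. per output unit). The helper names and the EPSILON value are assumptions for illustration, not the library code.

```python
import math

EPSILON = 1e-7  # small constant to avoid division by zero (assumed value)


def column_norms(w):
    """L2 norm of each column of a 2-D list (axis=0)."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def max_norm(w, max_value=2.0):
    """Rescale columns whose norm exceeds max_value; leave the others alone."""
    scales = [min(n, max_value) / (EPSILON + n) for n in column_norms(w)]
    return [[v * s for v, s in zip(row, scales)] for row in w]


def non_neg(w):
    """Set negative weights to zero."""
    return [[v if v >= 0 else 0.0 for v in row] for row in w]


def unit_norm(w):
    """Rescale every column to (approximately) unit norm."""
    norms = column_norms(w)
    return [[v / (EPSILON + n) for v, n in zip(row, norms)] for row in w]


w = [[3.0, 0.3], [4.0, -0.4]]  # column norms are 5.0 and 0.5
clipped = max_norm(w, max_value=2.0)
```

In this example only the first column is rescaled by max_norm, because the second column already has a norm below 2.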

## Putting it all together

You can apply all of these concepts to a core layer, or use them to build your own. Below I show how to write a layer that uses all of the above:

In [13]:
from keras import backend as K
from keras import regularizers
from keras.activations import hard_sigmoid
from keras.constraints import unit_norm
from keras.engine.topology import Layer


class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(MyLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create a trainable weight variable for this layer.
        # The constraint is passed as an instance, since add_weight may not
        # resolve string names for constraints.
        self.kernel = self.add_weight(name='kernel',
                                      shape=(input_shape[1], self.output_dim),
                                      initializer='uniform',
                                      constraint=unit_norm(),
                                      regularizer=regularizers.l1(1.),
                                      trainable=True)

        # Another way to enable this regularization is the add_loss method:
        # self.add_loss(regularizers.l1(1.)(self.kernel))

        super(MyLayer, self).build(input_shape)  # marks the layer as built

    def call(self, x):
        return hard_sigmoid(K.dot(x, self.kernel))

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.output_dim)

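Notice how most of this is applied in the add_weight call. Both the initializer and the constraint can only be set there! Regularization, as you can see, can be applied at several points in the pipeline: on the weights in add_weight, through add_loss, or on the outputs with activity regularization. Activations are simply applied to the result in call.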
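To see what the call and compute_output_shape methods of such a layer do, here is a Keras-free sketch of the forward pass. It assumes hard_sigmoid is the piecewise-linear function clip(0.2 * x + 0.5, 0, 1). The input batch and kernel values are made up for illustration.

```python
def hard_sigmoid(v):
    """Piecewise-linear sigmoid approximation: clip(0.2 * v + 0.5, 0, 1)."""
    return max(0.0, min(1.0, 0.2 * v + 0.5))


def call(x, kernel):
    """Forward pass: hard_sigmoid(x . kernel) for a batch of input rows."""
    output_dim = len(kernel[0])
    return [[hard_sigmoid(sum(row[i] * kernel[i][j] for i in range(len(row))))
             for j in range(output_dim)]
            for row in x]


def compute_output_shape(input_shape, output_dim):
    """The batch dimension is kept; the feature dimension becomes output_dim."""
    return (input_shape[0], output_dim)


x = [[1.0, 2.0], [10.0, 0.0]]                  # batch of 2 samples, input_dim=2
kernel = [[1.0, -1.0, 0.0], [0.5, -2.0, 0.0]]  # shape (input_dim=2, output_dim=3)
y = call(x, kernel)
```

The result has one row per sample and output_dim columns, which is the shape that compute_output_shape reports to the rest of the model.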