## L1 and L2 Regularization

Another way to prevent neural networks from overfitting is to use L1 and L2 regularization to constrain a network's connection weights (but typically not its biases).
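In symbols (the notation here is ours, not the original's): let $J_0$ be the unregularized cost (here, the average cross-entropy), $\alpha$ the regularization strength (called `scale` in the code below), and $w^{(l)}_{ij}$ the connection weights of layer $l$. The regularized costs then look like this, with bias terms left out of the sums:

$$
J_{\text{L1}} = J_0 + \alpha \sum_{l} \sum_{i,j} \bigl\lvert w^{(l)}_{ij} \bigr\rvert
\qquad\text{and}\qquad
J_{\text{L2}} = J_0 + \alpha \sum_{l} \sum_{i,j} \bigl(w^{(l)}_{ij}\bigr)^2
$$

Some implementations, including TensorFlow's `l2_regularizer`, multiply the L2 sum by an extra factor of $\tfrac{1}{2}$, which only rescales $\alpha$.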

In TensorFlow, we can simply add the appropriate regularization terms to the cost function, like so:

In [1]:
%%script false
# (This cell is skipped by %%script false; it only illustrates the idea.)
# Suppose that our network has only one hidden layer and an output layer.
# We can construct the loss using L1 regularization as follows:
weight1 = tf.get_default_graph().get_tensor_by_name("hidden1/kernel:0")
weight2 = tf.get_default_graph().get_tensor_by_name("outputs/kernel:0")

scale = 0.001  # L1 regularization hyperparameter

with tf.name_scope("loss"):
    xen = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    base_loss = tf.reduce_mean(xen, name="avg_xentropy")
    reg_loss = tf.reduce_sum(tf.abs(weight1)) + tf.reduce_sum(tf.abs(weight2))
    loss = tf.add(base_loss, scale * reg_loss, name="loss")
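To make the arithmetic concrete outside TensorFlow, here is a plain-Python sketch of the same L1-regularized loss. The nested lists stand in for the two kernel tensors, and the toy weights and base loss are made-up values for illustration only; `l1_penalty` and `regularized_loss` are hypothetical helper names.

```python
def l1_penalty(weight_matrices):
    """Sum of |w| over every entry of every weight matrix (biases are not included)."""
    return sum(abs(w) for matrix in weight_matrices for row in matrix for w in row)


def regularized_loss(base_loss, weight_matrices, scale=0.001):
    """base_loss + scale * L1 penalty, mirroring the `loss` node built above."""
    return base_loss + scale * l1_penalty(weight_matrices)


# Hypothetical toy kernels standing in for "hidden1/kernel" and "outputs/kernel".
weight1 = [[0.5, -1.0], [2.0, 0.0]]
weight2 = [[-0.5], [1.5]]

print(l1_penalty([weight1, weight2]))  # → 5.5
print(regularized_loss(0.3, [weight1, weight2]))
```

Only the kernels enter the penalty, so the total grows with the absolute size of the weights while the biases stay free.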

However, if we're training a very deep neural network, adding the regularization term for every layer by hand, as in the example above, quickly becomes tedious. Fortunately, TensorFlow offers a better option.

In TensorFlow, many functions that create variables accept a regularizer argument for the variables they create: `get_variable()` takes `regularizer`, and `tf.layers.dense()` takes `kernel_regularizer` and `bias_regularizer`. We can pass any function that takes the weights as its argument and returns the corresponding regularization loss. In particular, we may use `tf.contrib.layers.l1_regularizer()`, `tf.contrib.layers.l2_regularizer()`, or `tf.contrib.layers.l1_l2_regularizer()`.
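Conceptually, each of these is a factory: given a scale, it returns a function from weights to a scalar penalty. The sketch below is our own plain-Python stand-in, not TensorFlow's code, and works on a flat list of floats instead of a tensor. To our understanding, `l1_regularizer(scale)` computes `scale * tf.reduce_sum(tf.abs(w))`, `l2_regularizer(scale)` computes `scale * tf.nn.l2_loss(w)` (which includes a factor of 1/2), and `l1_l2_regularizer(scale_l1, scale_l2)` adds the two.

```python
def l1_regularizer(scale):
    """Return a function mapping weights to scale * sum(|w|)."""
    def regularizer(weights):
        return scale * sum(abs(w) for w in weights)
    return regularizer


def l2_regularizer(scale):
    """Return a function mapping weights to scale * sum(w**2) / 2.

    The 1/2 factor mirrors tf.nn.l2_loss, which (to our understanding)
    tf.contrib.layers.l2_regularizer uses internally.
    """
    def regularizer(weights):
        return scale * sum(w * w for w in weights) / 2
    return regularizer


def l1_l2_regularizer(scale_l1, scale_l2):
    """Return a function that adds the L1 and L2 penalties."""
    l1 = l1_regularizer(scale_l1)
    l2 = l2_regularizer(scale_l2)

    def regularizer(weights):
        return l1(weights) + l2(weights)
    return regularizer


kernel = [1.0, -2.0, 0.5]  # hypothetical flattened kernel
for penalty in (l1_regularizer(0.1), l2_regularizer(0.1), l1_l2_regularizer(0.1, 0.1)):
    print(penalty(kernel))
```

Because the layer only ever calls the returned function on its weights, any callable with that shape can be passed as a regularizer.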

In [2]:
import tensorflow as tf
import pprint
from functools import partial
from tensorflow.examples.tutorials.mnist import input_data

tf.reset_default_graph()

printer = pprint.PrettyPrinter(indent=4)

n_inputs = 784
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10
learning_rate = 0.01

scale = 0.001

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

In [3]:
# Construct fully connected layers with L1 regularization on their kernels (weights).
# Note the kernel_regularizer argument of tf.layers.dense(); the biases are not regularized.
regularized_dense_layer = partial(tf.layers.dense,
                                  activation=tf.nn.relu,
                                  kernel_regularizer=tf.contrib.layers.l1_regularizer(scale))
with tf.name_scope("dnn"):
    hidden1 = regularized_dense_layer(X, n_hidden1, name="hidden1")
    hidden2 = regularized_dense_layer(hidden1, n_hidden2, name="hidden2")
    # activation=None overrides the ReLU default set by partial(): the output layer emits raw logits
    logits = regularized_dense_layer(hidden2, n_outputs, activation=None, name="outputs")
    
with tf.name_scope("loss"):
    xen = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    base_loss = tf.reduce_mean(xen, name="avg_xentropy")
    # Each kernel_regularizer automatically adds its L1 regularization node to a
    # special collection containing all the regularization losses
    reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    printer.pprint(reg_losses)
    # Sum the base loss and all regularization losses into a single scalar
    loss = tf.add_n([base_loss] + reg_losses, name="loss")

[   <tf.Tensor 'dnn/hidden1/kernel/Regularizer/l1_regularizer:0' shape=() dtype=float32>,
    <tf.Tensor 'dnn/hidden2/kernel/Regularizer/l1_regularizer:0' shape=() dtype=float32>,
    <tf.Tensor 'dnn/outputs/kernel/Regularizer/l1_regularizer:0' shape=() dtype=float32>]
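The mechanism can be mimicked in plain Python: each regularized layer appends its penalty to a shared list, and the training loss is the base loss plus the sum of that list. `REGULARIZATION_LOSSES`, `make_layer`, and `l1` below are hypothetical names for illustration, and the kernels and base loss are made-up values. The last lines show why the terms should be combined with a sum such as `tf.add_n` rather than an element-wise, broadcasting add: broadcasting a one-element list against three penalties yields three entries, each containing the base loss.

```python
# Hypothetical stand-in for the REGULARIZATION_LOSSES graph collection.
REGULARIZATION_LOSSES = []


def l1(scale):
    """Plain-Python L1 regularizer factory: weights -> scale * sum(|w|)."""
    return lambda weights: scale * sum(abs(w) for w in weights)


def make_layer(kernel, kernel_regularizer=None):
    """Hypothetical layer builder: if a regularizer is given, record its penalty."""
    if kernel_regularizer is not None:
        REGULARIZATION_LOSSES.append(kernel_regularizer(kernel))
    return kernel


scale = 0.001
make_layer([0.2, -0.4], kernel_regularizer=l1(scale))  # "hidden1"
make_layer([1.0, 0.5], kernel_regularizer=l1(scale))   # "hidden2"
make_layer([-2.0], kernel_regularizer=l1(scale))       # "outputs"

base_loss = 0.25
# One scalar: the base loss plus every recorded penalty.
total = base_loss + sum(REGULARIZATION_LOSSES)
# Element-wise broadcasting instead gives one entry per penalty, each including
# base_loss, so summing it counts the base loss three times.
broadcast = [base_loss + r for r in REGULARIZATION_LOSSES]
print(total, sum(broadcast))
```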


In [4]:
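# Fetch the kernel (weight) tensors so we can inspect them after training.
# Variables created by tf.layers ignore tf.name_scope, hence no "dnn/" prefix.
W1 = tf.get_default_graph().get_tensor_by_name("hidden1/kernel:0")
W2 = tf.get_default_graph().get_tensor_by_name("hidden2/kernel:0")
W3 = tf.get_default_graph().get_tensor_by_name("outputs/kernel:0")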

In [5]:
with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    training_op = optimizer.minimize(loss)
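Regularization changes what the optimizer sees: the gradient of the penalty is added to the data gradient. The sketch below uses plain Python and a single scalar weight, and assumes the data gradient is zero so that only the penalty acts. It applies the momentum update in the form we understand `tf.train.MomentumOptimizer` to use without Nesterov (`accumulation = momentum * accumulation + gradient`, then `w -= learning_rate * accumulation`). It illustrates the usual contrast: the L1 gradient has constant magnitude `scale`, so it pulls small weights toward zero as hard as large ones, while the L2 gradient (using the `w**2 / 2` convention) is proportional to `w`. The helper names are hypothetical.

```python
def sign(x):
    """Subgradient of |x|, choosing 0 at x == 0."""
    return (x > 0) - (x < 0)


def penalty_grad(w, scale, kind):
    """Gradient of the regularization term alone for a single weight w.

    kind="l1": d/dw [scale * |w|]      = scale * sign(w)  (constant magnitude)
    kind="l2": d/dw [scale * w**2 / 2] = scale * w        (proportional to w)
    """
    if kind == "l1":
        return scale * sign(w)
    if kind == "l2":
        return scale * w
    raise ValueError("kind must be 'l1' or 'l2'")


def momentum_step(w, accumulation, grad, learning_rate=0.01, momentum=0.9):
    """Non-Nesterov momentum update: acc = momentum * acc + grad; w -= lr * acc."""
    accumulation = momentum * accumulation + grad
    return w - learning_rate * accumulation, accumulation


def shrink(w, scale, kind, steps, learning_rate=0.01, momentum=0.9):
    """Apply only the penalty gradient for `steps` updates (data gradient assumed 0)."""
    accumulation = 0.0
    for _ in range(steps):
        grad = penalty_grad(w, scale, kind)
        w, accumulation = momentum_step(w, accumulation, grad, learning_rate, momentum)
    return w


# Compare how far each penalty moves a small and a large weight in 100 steps.
for start in (0.05, 5.0):
    print(start, shrink(start, 0.001, "l1", 100), shrink(start, 0.001, "l2", 100))
```

With plain (sub)gradient steps like these, L1 tends to push weights close to zero rather than to exactly zero.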

In [6]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
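`tf.nn.in_top_k(logits, y, 1)` checks, for each instance, whether the true class is among the top-1 predictions, and the mean of those booleans is the accuracy. Here is a plain-Python sketch of the same computation, assuming small nested lists instead of tensors; `in_top_1` and `accuracy` are hypothetical helpers. The `>=` comparison reflects our reading of the `in_top_k` documentation, which counts classes tied at the top-k boundary as inside the top k.

```python
def in_top_1(logit_row, label):
    """True if the true label's logit is (tied for) the largest in the row."""
    return logit_row[label] >= max(logit_row)


def accuracy(batch_logits, labels):
    """Fraction of rows whose true label is a top-1 prediction."""
    hits = [in_top_1(row, label) for row, label in zip(batch_logits, labels)]
    return sum(hits) / len(hits)


toy_logits = [[0.1, 2.0, -1.0], [3.0, 0.0, 1.0]]  # hypothetical outputs for 2 images
toy_labels = [1, 2]
print(accuracy(toy_logits, toy_labels))  # → 0.5
```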

In [7]:
init = tf.global_variables_initializer()

n_epochs = 50
batch_size = 100

mnist = input_data.read_data_sets("/tmp/data/")

def print_weights(sess, W, name):
    print()
    print(name, "weights:")
    data = sess.run(W)
    print("Shape:", data.shape)
    print(data)
    print()
    

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
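        # Note: each "epoch" runs len(mnist.test.labels) // batch_size = 100 mini-batches,
        # i.e. fewer than one full pass over the 55,000 training images
        for iteration in range(len(mnist.test.labels) // batch_size):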
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
        print("Epoch:", epoch, "--", "Test Accuracy:", acc_test)
    
    print_weights(sess, W1, "Hidden 1")
    print_weights(sess, W2, "Hidden 2")
    print_weights(sess, W3, "Outputs")

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch: 0 -- Test Accuracy: 0.8921
Epoch: 1 -- Test Accuracy: 0.9212
Epoch: 2 -- Test Accuracy: 0.9278
Epoch: 3 -- Test Accuracy: 0.9359
Epoch: 4 -- Test Accuracy: 0.9421
Epoch: 5 -- Test Accuracy: 0.943
Epoch: 6 -- Test Accuracy: 0.9417
Epoch: 7 -- Test Accuracy: 0.9491
Epoch: 8 -- Test Accuracy: 0.9479
Epoch: 9 -- Test Accuracy: 0.9524
Epoch: 10 -- Test Accuracy: 0.9565
Epoch: 11 -- Test Accuracy: 0.9513
Epoch: 12 -- Test Accuracy: 0.9526
Epoch: 13 -- Test Accuracy: 0.9554
Epoch: 14 -- Test Accuracy: 0.9574
Epoch: 15 -- Test Accuracy: 0.9601
Epoch: 16 -- Test Accuracy: 0.9602
Epoch: 17 -- Test Accuracy: 0.9553
Epoch: 18 -- Test Accuracy: 0.9639
Epoch: 19 -- Test Accuracy: 0.9646
Epoch: 20 -- Test Accuracy: 0.9652
Epoch: 21 -- Test Accuracy: 0.9618
Epoch: 22 -- Test Accuracy: 0.9569
Epoch: 23 -- Tes