# Regularization
## 3. Max-norm regularization

Another way to regularize a model is **max-norm regularization**. For each neuron, this method constrains the **L2 norm of the weight vector w** to be at most r, where r is a hyperparameter.

After each training step, compute the L2 norm of w and, if it exceeds r, rescale w as w = w*r/(L2 norm of w).
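
To make the clipping rule concrete, here is a minimal sketch in plain Python for a single neuron's weight vector. The function name `clip_max_norm` and the sample values are illustrative assumptions, not part of the notebook; only the rule itself (rescale by r divided by the L2 norm when the norm exceeds r) comes from the text above.

```python
import math

def clip_max_norm(w, r):
    """Return w rescaled so that its L2 norm is at most r.

    w is one neuron's weight vector (a list of floats); r is the max-norm
    hyperparameter. Both names are illustrative.
    """
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= r:
        return list(w)  # norm already within the limit: leave the weights unchanged
    return [x * r / norm for x in w]  # w = w * r / (L2 norm of w)

print(clip_max_norm([3.0, 4.0], r=1.0))  # norm is 5, so the vector is scaled down to norm 1
print(clip_max_norm([0.3, 0.4], r=1.0))  # norm is 0.5, so the vector is returned as-is
```

Note that the direction of w is preserved; only its length is capped.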

**It also helps reduce the vanishing and exploding gradient problems when batch normalization is not used.**

##### Additional note

Its appeal is that it can not only improve the model's performance but also, because the weight norms are bounded by r, help keep gradients from exploding.

##### In TensorFlow
The code below shows how to implement max-norm regularization with the TensorFlow 1.x API.

The helper returns a function that clips a layer's weights with the clip_by_norm() function and assigns the clipped values back to the weight variable.

In [1]:
import tensorflow as tf

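def max_norm_regularizer(threshold, axes=1, name="max_norm", collection="max_norm"):
    """Build a function that registers a max-norm clipping op for a weight variable."""
    def max_norm(weights):
        # Rescale any weight vector (taken along `axes`) whose L2 norm exceeds threshold.
        clipped = tf.clip_by_norm(weights, clip_norm=threshold, axes=axes)
        # Op that writes the clipped values back into the variable.
        clip_weights = tf.assign(weights, clipped, name=name)
        # Store the op so the training loop can run it after each step.
        tf.add_to_collection(collection, clip_weights)
        return None  # no regularization loss term
    return max_norm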

In [2]:
max_norm_reg = max_norm_regularizer(threshold=1.0)
'''
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu,
                              kernel_regularizer=max_norm_reg, name="hidden1")
    
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu,
                              kernel_regularizer=max_norm_reg, name="hidden2")
    
    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")
'''


Max-norm regularization does not add a penalty term to the loss function, which is why max_norm() returns None.

However, the clipping operation still has to be run after every training step, which is why max_norm() adds clip_weights to the 'max_norm' collection.

Finally, fetch the clipping operations from the collection and run them after each training step:

In [3]:
'''
clip_all_weights = tf.get_collection("max_norm")

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, Y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, Y: Y_batch})
            sess.run(clip_all_weights)
'''

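
The same control flow can be sketched without TensorFlow, which may make the roles of the `None` return value and the collection easier to see. This is only an analogy in plain Python under stated assumptions: the list `clip_ops` stands in for the 'max_norm' collection, the simplified `max_norm_regularizer` below takes only a threshold, and `training_step` with its target values is a made-up gradient-descent update on a quadratic loss. None of these appear in the notebook.

```python
import math

clip_ops = []  # hypothetical stand-in for the "max_norm" collection

def max_norm_regularizer(threshold):
    """Simplified analog of the notebook's helper for a plain list of weights."""
    def max_norm(weights):
        def clip():
            norm = math.sqrt(sum(w * w for w in weights))
            if norm > threshold:
                scale = threshold / norm
                weights[:] = [w * scale for w in weights]  # update the list in place
        clip_ops.append(clip)  # register the clipping step for later use
        return None            # nothing is added to the loss
    return max_norm

def training_step(weights, lr=0.1):
    # Made-up update: one gradient-descent step on sum((w - target)^2).
    target = [3.0, 4.0]
    for i, t in enumerate(target):
        weights[i] -= lr * 2.0 * (weights[i] - t)

weights = [0.0, 0.0]
penalty = max_norm_regularizer(threshold=1.0)(weights)

for step in range(5):
    training_step(weights)   # plays the role of sess.run(training_op, ...)
    for clip in clip_ops:    # plays the role of sess.run(clip_all_weights)
        clip()

final_norm = math.sqrt(sum(w * w for w in weights))
print(penalty, final_norm)
```

Without the clipping loop the weights would move toward the target, whose norm is 5; with it, the norm stays at or below the threshold after every step.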