## Regularization

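As noted in the previous section, the main problem with the current network is overfitting. The ways we can currently deal with overfitting include: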

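- More data: collect more training data
- Shallow model: use a shallower network
- Regularization: constrain a complex network so that it behaves like a simpler one
- Dropout
- Data augmentation
- Early stopping: stop training before the network starts to overfit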
- ...

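So what is regularization? Regularization is a way to reduce overfitting by reducing the complexity of the model. It works by adding a constraint (a penalty term) to the loss function. The penalty grows with the size of the parameters, so the optimizer is discouraged from making them too large, and the trained parameters stay within a reasonable range.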
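
To make the idea concrete, here is a minimal sketch in plain Python of adding an L2 penalty to a data loss. The helper names `l2_penalty`, `regularized_loss` and `penalty_gradient`, the weight values, and the data loss of 0.30 are illustrative assumptions and do not come from the notebook. The factor of one half follows the convention of `tf.nn.l2_loss`, which the training loop later in this section uses.

```python
def l2_penalty(weights):
    """Half the sum of squared weights (same convention as tf.nn.l2_loss)."""
    return 0.5 * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam=1e-4):
    """Data loss plus the L2 penalty scaled by the coefficient lam."""
    return data_loss + lam * l2_penalty(weights)


def penalty_gradient(weights, lam=1e-4):
    """Gradient of lam * l2_penalty(weights): each weight contributes lam * w."""
    return [lam * w for w in weights]


# Hypothetical weight vectors: same data loss, different magnitudes.
small = [0.1, -0.2, 0.05]
large = [3.0, -4.0, 2.5]

print(regularized_loss(0.30, small))
print(regularized_loss(0.30, large))
print(penalty_gradient(large))
```

With the same data loss, the larger weights give the larger total loss, and the gradient of the penalty is proportional to each weight, so the biggest weights are pushed toward zero the hardest.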

In [1]:
import tensorflow as tf
from tensorflow.keras import datasets,layers,optimizers,Sequential,metrics

def preprocess(x, y):
    # scale pixel values to [0, 1] and cast labels to int32
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y

In [2]:
batchsz = 128
(x, y), (x_val, y_val) = datasets.mnist.load_data()
print('datasets:', x.shape, y.shape, x.min(), x.max())



db = tf.data.Dataset.from_tensor_slices((x,y))
db = db.map(preprocess).shuffle(60000).batch(batchsz).repeat(10)

ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
ds_val = ds_val.map(preprocess).batch(batchsz) 

datasets: (60000, 28, 28) (60000,) 0 255
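
The pipeline above batches the data before repeating it, so the number of training steps can be worked out by hand. The sketch below does that arithmetic in plain Python. The variable names are illustrative, and the validation size of 10000 is an assumption based on the standard MNIST test split, since the notebook does not print it.

```python
import math

train_samples = 60000   # from the printed shape (60000, 28, 28)
val_samples = 10000     # assumed: size of the standard MNIST test split
batch_size = 128        # batchsz in the notebook
epochs = 10             # repeat(10) in the notebook

# batch() comes before repeat(), so every epoch ends with one partial batch
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * epochs
val_batches = math.ceil(val_samples / batch_size)

print(steps_per_epoch, last_batch_size, total_steps, val_batches)
# prints: 469 96 4690 79
```

Under that assumption the validation set has 79 batches, so the index of the last one is 78. This is the number printed in front of `Evaluate Acc` in the log further down, because the evaluation loop enumerates `ds_val` with the same variable name `step`.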


In [3]:
network = Sequential([layers.Dense(256, activation='relu'),
                     layers.Dense(128, activation='relu'),
                     layers.Dense(64, activation='relu'),
                     layers.Dense(32, activation='relu'),
                     layers.Dense(10)])
network.build(input_shape=(None, 28*28))
network.summary()

optimizer = optimizers.Adam(learning_rate=0.01)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 256)               200960    
_________________________________________________________________
dense_1 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
Total params: 244,522
Trainable params: 244,522
Non-trainable params: 0
_________________________________________________________________
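
The `Param #` column in the summary can be reproduced by hand, which helps show how many values the L2 penalty will cover. The sketch below is plain Python; `dense_param_count` is a hypothetical helper name, and the layer sizes are taken from the `Sequential` definition above.

```python
# Input size followed by the width of each Dense layer in the network.
layer_sizes = [28 * 28, 256, 128, 64, 32, 10]


def dense_param_count(n_in, n_out):
    """Parameters of a fully connected layer: an n_in x n_out kernel plus n_out biases."""
    return n_in * n_out + n_out


per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)
# prints: [200960, 32896, 8256, 2080, 330] 244522
```

All 244,522 parameters, biases included, are part of `network.trainable_variables`, so each of them enters the L2 penalty computed in the training loop.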


In [9]:
for step,(x,y) in enumerate(db):

    with tf.GradientTape() as tape:
        x = tf.reshape(x,(-1,28*28))
        out = network(x)

        y_onehot = tf.one_hot(y,depth=10)

        # compute the cross-entropy loss
        loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_onehot,out,from_logits=True))
        # add regularization
        loss_regularization = []

        # compute the regularization term
        # step 1: compute the L2 loss (sum of squares / 2) of each trainable variable
        for p in network.trainable_variables:
            loss_regularization.append(tf.nn.l2_loss(p))
        # step 2: sum the per-variable terms (reduce_sum gives a sum, not a mean)
        loss_regularization = tf.reduce_sum(tf.stack(loss_regularization))
        # step 3: add the weighted penalty to the loss function
        loss = loss + 0.0001 * loss_regularization

    grads = tape.gradient(loss,network.trainable_variables)
    optimizer.apply_gradients(zip(grads,network.trainable_variables))


    if step % 100 == 0:
        print(step, 'loss:', float(loss), 'loss_regularization:', float(loss_regularization)) 

    # evaluate
    if step % 500 == 0:
        total, total_correct = 0., 0

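        # note: this loop reuses the names step, x and y from the training loop, so the
        # print after it shows the index of the last validation batch, not the training step
        for step, (x, y) in enumerate(ds_val):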
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28*28))
            # [b, 784] => [b, 10]
            out = network(x) 
            # [b, 10] => [b] 
            pred = tf.argmax(out, axis=1) 
            pred = tf.cast(pred, dtype=tf.int32)
            # bool type 
            correct = tf.equal(pred, y)
            # bool tensor => int tensor => numpy
            total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
            total += x.shape[0]

        print(step, 'Evaluate Acc:', total_correct/total)

        


0 loss: 2.3405680656433105 loss_regularization: 349.9940490722656
78 Evaluate Acc: 0.1824
100 loss: 0.32610294222831726 loss_regularization: 563.30615234375
200 loss: 0.19031131267547607 loss_regularization: 643.8613891601562
300 loss: 0.16033883392810822 loss_regularization: 720.5924682617188
400 loss: 0.33136171102523804 loss_regularization: 791.5477905273438
500 loss: 0.2671230733394623 loss_regularization: 855.9791259765625
78 Evaluate Acc: 0.9542
600 loss: 0.26694703102111816 loss_regularization: 887.657958984375
700 loss: 0.25909659266471863 loss_regularization: 927.4588012695312
800 loss: 0.22581970691680908 loss_regularization: 964.7063598632812
900 loss: 0.16775313019752502 loss_regularization: 1002.3997802734375
1000 loss: 0.1411725878715515 loss_regularization: 1011.0069580078125
78 Evaluate Acc: 0.9566
1100 loss: 0.18427859246730804 loss_regularization: 1039.88720703125
1200 loss: 0.24137620627880096 loss_regularization: 1056.3935546875
1300 loss: 0.16228419542312622 loss_r