# Kernel Regularization

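* Deeper networks have more parameters and are therefore more prone to overfitting.
* **Kernel regularization** adds a penalty on a layer's weights (its kernel) to the loss, so that no single node affects the result too much.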

### Ridge Regression (L2 Regularization)

![Regularization.png](Regularization.png)

To maximize training accuracy alone, it is enough to minimize the SSE, $\sum_{i} (y_i-{\hat{y}}_i)^2$.

To also get higher accuracy on unseen data, the parameters $\beta_j$ need to be kept small as well.

Ridge regression therefore adds the penalty term $\lambda \sum_{j} {{\beta}_j}^2$ to the SSE, which keeps the parameters $\beta$ from growing abnormally large:

$$L_2(\beta) = \sum_{i} (y_i-{\hat{y}}_i)^2 + \lambda \sum_{j}{\beta_{j}}^2$$
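As a rough illustration of the ridge objective, the sketch below evaluates the SSE plus the L2 penalty in plain Python. The function name `ridge_loss` and the sample numbers are made up for this example; `lam` stands for $\lambda$.

```python
def ridge_loss(y, y_hat, beta, lam):
    """SSE plus the L2 penalty lam * sum(beta_j ** 2)."""
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    penalty = lam * sum(b ** 2 for b in beta)
    return sse + penalty


# Hypothetical targets, predictions, and parameters.
y = [1.0, 2.0, 3.0]
y_hat = [1.5, 2.0, 2.5]
beta = [3.0, -4.0]

print(ridge_loss(y, y_hat, beta, lam=0.0))  # SSE only
print(ridge_loss(y, y_hat, beta, lam=0.1))  # SSE plus the L2 penalty
```

With `lam=0.0` only the fit to the data counts. Any positive `lam` makes large parameters cost extra, even when the predictions stay the same.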

### Lasso Regression (L1 Regularization)


$$L_1(\beta) = \sum_{i} (y_i-{\hat{y}}_i)^2 + \lambda \sum_{j}|\beta_{j}|$$


Lasso regression uses the same idea, but its penalty term $\lambda \sum_{j} |\beta_{j}|$ sums the absolute values of the parameters rather than their squares.
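The following sketch evaluates the lasso objective and compares the two penalty sums on a small and a large parameter vector. The names `lasso_loss`, `beta_small`, and `beta_large` and all numbers are hypothetical, chosen only to show how differently the absolute and squared penalties scale.

```python
def lasso_loss(y, y_hat, beta, lam):
    """SSE plus the L1 penalty lam * sum(|beta_j|)."""
    sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    penalty = lam * sum(abs(b) for b in beta)
    return sse + penalty


# Hypothetical parameter vectors.
beta_small = [0.5, -0.5]
beta_large = [3.0, -4.0]

for name, beta in [("small", beta_small), ("large", beta_large)]:
    l1_sum = sum(abs(b) for b in beta)   # sum of |beta_j|
    l2_sum = sum(b ** 2 for b in beta)   # sum of beta_j ** 2
    print(name, "sum|beta| =", l1_sum, " sum beta^2 =", l2_sum)

print(lasso_loss([1.0, 2.0, 3.0], [1.5, 2.0, 2.5], beta_large, lam=0.1))
```

The squared sum grows much faster than the absolute sum once parameters are large, so the L2 penalty punishes large parameters more heavily than the L1 penalty does.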

## Kernel Regularization Analysis

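For $Y=f(x)+\epsilon$ and $\widehat{Y} = \hat{f}(x)$, where the noise $\epsilon$ has $E(\epsilon)=0$ and $V(\epsilon)=\sigma^2$ and is independent of $\hat{Y}$, the expected squared error decomposes as follows: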

$E\big[ (\hat{Y}-Y)^2 \big] = E(\hat{Y}^2 + Y^2 -2\hat{Y}Y)$<br>
$\qquad= E(\hat{Y}^2) + E(Y^2) - 2E(\hat{Y}\cdot Y)$<br>
$\qquad=V(\hat Y) + E(\hat Y)^2 + V(Y) + E(Y)^2 - 2E(\hat{Y} \cdot Y) \quad (\because E(X^2)=E(X)^2+V(X))$<br>
$\qquad=V(\hat Y)+V(Y)+E(\hat Y)^2 + E(Y)^2-2f(x)E(\hat Y)$<br>
$\qquad(\because 2E(\hat{Y}\cdot Y)=2E(\hat{Y}(f(x)+\epsilon))=2f(x)E(\hat{Y})+2E(\hat{Y})E(\epsilon)=2f(x)E(\hat Y))$<br>
$\qquad=V(\hat Y)+(E(\hat Y)-f(x))^2+V(Y) \quad (\because E(Y)=f(x))$<br>
$\qquad=\text{(model estimate variance)}+\text{bias}^2+\sigma^2 \quad (\because V(Y)=V(\epsilon)=\sigma^2)$
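The decomposition can be checked numerically. The sketch below is a small Monte Carlo simulation under made-up assumptions: a single fixed input with true value `F_X`, Gaussian noise, and a deliberately biased estimator (a shrunken sample mean). None of these choices come from the text above; they only provide an estimator with both bias and variance.

```python
import random

random.seed(0)

F_X = 2.0         # true value f(x) at one fixed input (hypothetical)
SIGMA = 1.0       # standard deviation of the noise epsilon
N_SAMPLES = 5     # noisy observations used to build each estimate
SHRINK = 0.8      # shrinkage factor; values below 1 introduce bias
N_TRIALS = 20000  # number of simulated (Y_hat, Y) pairs


def estimate():
    """Shrunken sample mean of N_SAMPLES noisy observations of f(x)."""
    obs = [F_X + random.gauss(0.0, SIGMA) for _ in range(N_SAMPLES)]
    return SHRINK * sum(obs) / N_SAMPLES


y_hats = [estimate() for _ in range(N_TRIALS)]
# Fresh targets whose noise is independent of the estimates.
ys = [F_X + random.gauss(0.0, SIGMA) for _ in range(N_TRIALS)]

mse = sum((yh - y) ** 2 for yh, y in zip(y_hats, ys)) / N_TRIALS
mean_y_hat = sum(y_hats) / N_TRIALS
variance = sum((yh - mean_y_hat) ** 2 for yh in y_hats) / N_TRIALS
bias_sq = (mean_y_hat - F_X) ** 2

print("E[(Y_hat - Y)^2]           :", round(mse, 3))
print("variance + bias^2 + sigma^2:", round(variance + bias_sq + SIGMA ** 2, 3))
```

The two printed numbers should agree up to sampling noise. Moving `SHRINK` toward 0 lowers the variance term and raises the bias term.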

![Regularization_graph.png](Regularization_graph.png)

When the model is overfitted,
* Model estimate variance: increases
* Bias: decreases

Suppressing the parameters $\beta$ accepts slightly more bias in exchange for lower variance, which helps prevent overfitting.
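To see the suppression at work, consider the simplest ridge problem: one slope $\beta$, no intercept. Minimizing $\sum_i (y_i-\beta x_i)^2 + \lambda\beta^2$ gives $\beta = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The sketch below evaluates this closed form for growing $\lambda$; the helper `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Minimizer of sum((y - beta * x)^2) + lam * beta^2 (one slope, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical data lying roughly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in [0.0, 1.0, 10.0, 100.0]:
    print("lambda =", lam, " beta =", round(ridge_slope(xs, ys, lam), 4))
```

A larger `lam` pulls the fitted slope toward zero, which is the suppression effect described above.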

## Kernel Regularization using ```tensorflow.keras```
```python
import tensorflow

tensorflow.keras.regularizers.l1_l2(l1=0.01, l2=0.01)  # L1 and L2 penalties combined
tensorflow.keras.regularizers.l1(l1=0.01)              # L1 penalty only
tensorflow.keras.regularizers.l2(l2=0.01)              # L2 penalty only
```

For example:
```python
h = keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_regularizer=keras.regularizers.l2(1.E-04), name='conv1')(h)
```

Note that the parameters ```l1``` and ```l2``` mean the following:
* ```l1```: the L1 regularization factor, which corresponds to $\lambda$ in the L1 equation above.
* ```l2```: the L2 regularization factor, which corresponds to $\lambda$ in the L2 equation above.
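To make the role of the two factors concrete, here is a plain-Python sketch of the penalty that the formulas above describe. It is not the Keras implementation; the function `l1_l2_penalty` and the `kernel` values are hypothetical.

```python
def l1_l2_penalty(weights, l1=0.0, l2=0.0):
    """Penalty l1 * sum(|w|) + l2 * sum(w^2) over a flat list of weights."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


kernel = [0.5, -1.0, 2.0]  # hypothetical flattened kernel weights

print(l1_l2_penalty(kernel, l1=0.01))           # L1 term only
print(l1_l2_penalty(kernel, l2=0.01))           # L2 term only
print(l1_l2_penalty(kernel, l1=0.01, l2=0.01))  # both terms
```

The penalty is added to the training loss, so a larger factor pushes the optimizer harder toward small weights.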

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras import datasets
from tensorflow.keras.utils import to_categorical

# load CIFAR10 Dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
Y_train = to_categorical(y_train)
Y_test = to_categorical(y_test)

print("Length of train set:", len(Y_train))
print("Shape of x_train:", x_train.shape[1:])
```

```
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Length of train set: 50000
Shape of x_train: (32, 32, 3)
```


```python
img_rows, img_cols, channel = x_train.shape[1:]

# Unifying image size (reshape X_train, X_test)
X_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, channel)
X_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, channel)
input_shape = (img_rows, img_cols, channel)

# Normalize pixel value in image
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# Label is already one-hot encoded
print(Y_train[0])
num_classes = 10
batch_size = 32
print(input_shape)
```

```
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
(32, 32, 3)
```
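The label vector printed above has a single 1 at index 6, which is the one-hot encoding of class 6 out of 10. A minimal plain-Python sketch of that encoding follows; `one_hot` is a hypothetical helper written for this note, not the `to_categorical` implementation.

```python
def one_hot(label, num_classes):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


print(one_hot(6, 10))
```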


```python
x = layers.Input(shape=input_shape, name='input')
h = layers.BatchNormalization()(x)
# Block 1: two 3x3 convolutions (default 'valid' padding), each with an L2 kernel penalty
h = layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
                  kernel_regularizer=tf.keras.regularizers.l2(1.E-04), name='conv1')(h)
h = layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
                  kernel_regularizer=tf.keras.regularizers.l2(1.E-04), name='conv2')(h)
h = layers.BatchNormalization()(h)
h = layers.MaxPooling2D(pool_size=(2, 2), name='pool1')(h)
h = layers.Dropout(0.2)(h)

# Block 2: 64 filters, 'same' padding
h = layers.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same',
                  kernel_regularizer=tf.keras.regularizers.l2(1.E-04), name='conv3')(h)
h = layers.BatchNormalization()(h)
h = layers.Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same',
                  kernel_regularizer=tf.keras.regularizers.l2(1.E-04), name='conv4')(h)
h = layers.BatchNormalization()(h)
h = layers.MaxPooling2D(pool_size=(2, 2), name='pool2')(h)
h = layers.Dropout(0.3)(h)

# Block 3: 128 filters, 'same' padding
h = layers.Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same',
                  kernel_regularizer=tf.keras.regularizers.l2(1.E-04), name='conv5')(h)
h = layers.BatchNormalization()(h)
h = layers.Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same',
                  kernel_regularizer=tf.keras.regularizers.l2(1.E-04), name='conv6')(h)
h = layers.BatchNormalization()(h)
h = layers.MaxPooling2D(pool_size=(2, 2), name='pool3')(h)
h = layers.Dropout(0.4)(h)

# Classifier head
h = layers.Flatten()(h)
y = layers.Dense(num_classes, activation='softmax', name='output')(h)

model = models.Model(x, y)
model.summary()  # summary() prints the table itself; wrapping it in print() would also print "None"
```

```
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
input (InputLayer)           [(None, 32, 32, 3)]       0
_________________________________________________________________
batch_normalization (BatchNo (None, 32, 32, 3)         12
_________________________________________________________________
conv1 (Conv2D)               (None, 30, 30, 32)        896
_________________________________________________________________
conv2 (Conv2D)               (None, 28, 28, 32)        9248
_________________________________________________________________
batch_normalization_1 (Batch (None, 28, 28, 32)        128
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 14, 14, 32)        0
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 32)        0
```
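The output shapes and parameter counts for `conv1` and `conv2` in the summary can be reproduced by hand. The sketch below uses two hypothetical helpers, `conv_output_size` and `conv_params`. It assumes stride 1 and the usual Conv2D rules: `'valid'` padding removes `kernel - 1` pixels per dimension, `'same'` padding keeps the size, and each filter has one bias.

```python
def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Number of trainable weights: kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


print(conv_output_size(32, 3))    # conv1 on a 32x32 input
print(conv_output_size(30, 3))    # conv2 on the 30x30 output of conv1
print(conv_params(3, 3, 3, 32))   # conv1: 3 input channels, 32 filters
print(conv_params(3, 3, 32, 32))  # conv2: 32 input channels, 32 filters
```

Only the kernel weights counted here are subject to `kernel_regularizer`. The biases are not penalized unless a separate `bias_regularizer` is set.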

```python
epochs = 25
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, Y_train, batch_size=batch_size,
                    epochs=epochs, validation_split=0.1, verbose=2)

score = model.evaluate(X_test, Y_test)
print()
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```

Epoch 1/25
1407/1407 - 40s - loss: 1.7927 - accuracy: 0.4387 - val_loss: 1.1565 - val_accuracy: 0.6130
Epoch 2/25
1407/1407 - 7s - loss: 1.0964 - accuracy: 0.6312 - val_loss: 0.9966 - val_accuracy: 0.6844
Epoch 3/25
1407/1407 - 7s - loss: 0.9275 - accuracy: 0.6989 - val_loss: 0.8007 - val_accuracy: 0.7444
Epoch 4/25
1407/1407 - 7s - loss: 0.8517 - accuracy: 0.7301 - val_loss: 0.7570 - val_accuracy: 0.7618
Epoch 5/25
1407/1407 - 7s - loss: 0.7984 - accuracy: 0.7576 - val_loss: 0.7626 - val_accuracy: 0.7742
Epoch 6/25
1407/1407 - 7s - loss: 0.7652 - accuracy: 0.7748 - val_loss: 0.7829 - val_accuracy: 0.7794
Epoch 7/25
1407/1407 - 7s - loss: 0.7404 - accuracy: 0.7888 - val_loss: 0.7962 - val_accuracy: 0.7756
Epoch 8/25
1407/1407 - 7s - loss: 0.7188 - accuracy: 0.8028 - val_loss: 0.7970 - val_accuracy: 0.7802
Epoch 9/25
1407/1407 - 7s - loss: 0.7060 - accuracy: 0.8116 - val_loss: 0.7135 - val_accuracy: 0.8156
Epoch 10/25
1407/1407 - 7s - loss: 0.6917 - accuracy: 0.8196 - val_loss: 0.7211 -