## top-5 error, top-1 error

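For a new test image, the classifier's top-1 class is the class it predicts with the highest probability, and its top-5 classes are the five classes with the highest probabilities. An image counts as correct under top-1 if the top-1 class equals the true class, and as correct under top-5 if the true class is among the top-5 classes. The top-1 error and top-5 error are the fractions of test images that are wrong under each criterion.    
With only five classes the top-5 error would always be 0%, but there are usually far more than five classes.    
A classifier trained on 1000 classes is considered to classify well if its top-5 error is below 5%. (Among 1000 classes, the five classes predicted with the highest probabilities are likely to be similar to one another.)    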
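
The two error rates can be computed directly from a table of predicted scores. The following is a minimal sketch in plain Python; the function name `top_k_error` and the scores are made up for illustration and do not come from any real model.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)

# Hypothetical scores for 3 images over 6 classes.
scores = [
    [0.50, 0.20, 0.12, 0.08, 0.06, 0.04],  # true class 0: ranked first
    [0.04, 0.11, 0.40, 0.30, 0.09, 0.06],  # true class 3: ranked second
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 5: ranked last
]
labels = [0, 3, 5]

print(top_k_error(scores, labels, k=1))  # 2 of 3 images are wrong
print(top_k_error(scores, labels, k=5))  # 1 of 3 images is wrong
```

The second image shows why top-5 error is always at most top-1 error: it is a miss under top-1 but a hit under top-5.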

https://bskyvision.com/422

https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures

### VGG
VGG was originally designed to study how network depth affects performance. For that reason, the convolution kernel size was fixed at 3x3.

![image](https://user-images.githubusercontent.com/63278762/127593483-1f5cad2d-7216-4a20-a9e3-5f0ef4d65e44.png)

* D : VGG16
* E : VGG19

VGG16 has 16 weight layers (13 convolutional and 3 fully connected). The network is built by simply stacking layers.
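
Fixing the kernel at 3x3 does not cap the receptive field, because stacked 3x3 convolutions see a progressively larger window. The sketch below compares three stacked 3x3 layers with a single 7x7 layer; the helper names and the channel width of 64 are assumptions chosen for illustration, and biases are ignored.

```python
def stacked_receptive_field(kernel, n_layers):
    """Receptive field of n stacked stride-1 convolutions with the same kernel size."""
    return 1 + n_layers * (kernel - 1)

def conv_weights(kernel, channels, n_layers=1):
    """Weight count (no biases) when every layer maps `channels` to `channels`."""
    return n_layers * kernel * kernel * channels * channels

C = 64  # hypothetical channel width

print(stacked_receptive_field(3, 3))  # 7, the same window as one 7x7 layer
print(conv_weights(3, C, n_layers=3))  # 110592
print(conv_weights(7, C))              # 200704
```

Under these assumptions the 3x3 stack covers the same 7x7 window with roughly half the weights and two extra nonlinearities in between.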

### GoogLeNet
The key point here is the use of the inception module.    
![image](https://user-images.githubusercontent.com/63278762/127593797-168ed8e7-16c5-474c-a43c-936e139d49f6.png)

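It has 22 layers and extends the design by using inception blocks.    
* inception block : a single layer that applies several operations, such as convolution filters of different sizes and pooling, in parallel and concatenates their outputs.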
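
To make the "several operations in one layer" idea concrete, here is a NumPy sketch of a naive inception block: 1x1, 3x3 and 5x5 convolutions plus a 3x3 max pool run on the same input, and their outputs are concatenated along the channel axis. All function names, shapes and random weights are assumptions for illustration. The real GoogLeNet module also puts 1x1 reduction convolutions in front of the 3x3 and 5x5 branches, which this sketch omits.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' convolution. x: (H, W, C_in), w: (k, k, C_in, C_out), k odd."""
    k = w.shape[0]
    p = k // 2
    h, wd, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(k):
        for j in range(k):
            # Each kernel offset contributes a shifted input times a (C_in, C_out) matrix.
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def maxpool3x3_same(x):
    """3x3 max pooling with stride 1 and 'same' padding."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    windows = [xp[i:i + h, j:j + wd, :] for i in range(3) for j in range(3)]
    return np.max(windows, axis=0)

def relu(t):
    return np.maximum(t, 0.0)

def inception_block(x, w1, w3, w5, w_pool):
    """Run four parallel branches and concatenate them along the channel axis."""
    branches = [
        relu(conv2d_same(x, w1)),                       # 1x1 convolution
        relu(conv2d_same(x, w3)),                       # 3x3 convolution
        relu(conv2d_same(x, w5)),                       # 5x5 convolution
        relu(conv2d_same(maxpool3x3_same(x), w_pool)),  # pooling, then 1x1 projection
    ]
    return np.concatenate(branches, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
w1 = rng.normal(size=(1, 1, 16, 4))
w3 = rng.normal(size=(3, 3, 16, 8))
w5 = rng.normal(size=(5, 5, 16, 2))
w_pool = rng.normal(size=(1, 1, 16, 2))

y = inception_block(x, w1, w3, w5, w_pool)
print(y.shape)  # (8, 8, 16): spatial size kept, channels are 4 + 8 + 2 + 2
```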

## Vanishing Gradient

As a model gets deeper, its gradients can shrink toward zero during backpropagation. This is the vanishing gradient problem.    
https://www.youtube.com/watch?v=qhXZsFVxGKo&feature=youtu.be


The problem with very deep neural networks is vanishing and exploding gradients.
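
A rough numeric illustration: by the chain rule, the gradient that reaches an early layer is a product of one factor per layer, so factors below 1 shrink it exponentially and factors above 1 blow it up. The sketch below uses a scalar chain `a = sigmoid(weight * a_prev)` and assumes, as a simplification, that every layer has the same weight and the same pre-activation `z`; the function names are hypothetical.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def backprop_factor(depth, weight, z=0.0):
    """Gradient scale after `depth` identical layers a = sigmoid(weight * a_prev)."""
    return (weight * sigmoid_grad(z)) ** depth

for depth in (5, 20, 50):
    vanishing = backprop_factor(depth, weight=1.0)  # per-layer factor 0.25
    exploding = backprop_factor(depth, weight=8.0)  # per-layer factor 2.0
    print(depth, vanishing, exploding)
```

With a per-layer factor of 0.25 the gradient is already below 1e-12 after 20 layers, while a factor of 2.0 has grown past one million.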

### ResNet

ResNet is a network designed to address the vanishing gradient problem.    
It does so with a skip connection structure, which adds a block's input directly to its output.

![image](https://user-images.githubusercontent.com/63278762/127600932-64718ecd-8466-45b4-ab69-f73c7efd2a53.png)

https://www.youtube.com/watch?v=ZILIbUvp5lk&feature=youtu.be    

You can think of a skip connection as a shortcut.    
Networks that use skip connections: https://theaisummer.com/skip-connections/
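
Why the shortcut helps can be seen in a toy scalar model. If a plain layer is `x -> w * x`, its derivative is `w`; with a skip connection the layer becomes `x -> x + w * x` and its derivative is `1 + w`, so the gradient always has a direct path even when `w` is tiny. The function names and the weight value below are assumptions for illustration only.

```python
def plain_gradient(weights):
    """d(output)/d(input) for a chain of scalar layers x -> w * x."""
    grad = 1.0
    for w in weights:
        grad *= w
    return grad

def residual_gradient(weights):
    """Same chain, but each layer is x -> x + w * x (a skip connection)."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w
    return grad

weights = [0.01] * 20  # 20 layers with very small weights

print(plain_gradient(weights))     # about 1e-40: the gradient has vanished
print(residual_gradient(weights))  # about 1.22: the gradient survives
```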

# VGG-16
Let's switch the CIFAR-100 image classifier to VGG.

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers  # public API; tensorflow.python.* is private

cifar100 = keras.datasets.cifar100

(X_train, y_train), (X_test, y_test) = cifar100.load_data()
X_train, X_test = X_train/255.0, X_test/255.0

In [4]:
X_train.shape

(50000, 32, 32, 3)

In [7]:
img_input = layers.Input(shape=(32,32, 3))
# Block 1
x = layers.Conv2D(64, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block1_conv1')(img_input)
x = layers.Conv2D(64, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block1_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
# Block 2
x = layers.Conv2D(128, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block2_conv1')(x)
x = layers.Conv2D(128, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block2_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
# Block 3
x = layers.Conv2D(256, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block3_conv1')(x)
x = layers.Conv2D(256, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block3_conv2')(x)
x = layers.Conv2D(256, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block3_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

# Block 4
x = layers.Conv2D(512, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block4_conv1')(x)
x = layers.Conv2D(512, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block4_conv2')(x)
x = layers.Conv2D(512, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block4_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

# Block 5
x = layers.Conv2D(512, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block5_conv1')(x)
x = layers.Conv2D(512, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block5_conv2')(x)
x = layers.Conv2D(512, (3, 3),
                  activation='relu',
                  padding='same',
                  name='block5_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

# Classification block
x = layers.Flatten(name='flatten')(x)
x = layers.Dense(4096, activation='relu', name='fc1')(x)
x = layers.Dense(4096, activation='relu', name='fc2')(x)
classes = 100  # CIFAR-100 has 100 classes
output = layers.Dense(classes, activation='softmax', name='predictions')(x)
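
Since the input here is 32x32 rather than the 224x224 images VGG was designed for, it helps to trace how the five pooling layers shrink the feature map before `Flatten`. The helper below is a hypothetical sketch that only does the shape arithmetic; it does not build a model.

```python
def vgg16_pool_shapes(input_size, widths=(64, 128, 256, 512, 512)):
    """Feature map shape (H, W, C) after each of the five pooling layers."""
    shapes = []
    size = input_size
    for width in widths:
        size //= 2  # 'same' convolutions keep the size; each 2x2 pool halves it
        shapes.append((size, size, width))
    return shapes

def flatten_size(shape):
    h, w, c = shape
    return h * w * c

cifar_shapes = vgg16_pool_shapes(32)
imagenet_shapes = vgg16_pool_shapes(224)

print(cifar_shapes)                       # ends at (1, 1, 512)
print(flatten_size(cifar_shapes[-1]))     # 512 inputs to fc1
print(flatten_size(imagenet_shapes[-1]))  # 25088 inputs to fc1 for 224x224 images
```

With 32x32 inputs the last pooling layer already reduces the map to 1x1, so this architecture cannot take inputs any smaller without removing a pooling stage.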

In [9]:
model = keras.Model(name='VGG16', inputs=img_input, outputs=output)
model.summary()

Model: "VGG16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0     
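
The `Param #` column in the summary above can be reproduced by hand: a `Conv2D` layer has one weight per kernel position, input channel and output channel, plus one bias per output channel. The helper name below is an assumption for illustration.

```python
def conv2d_params(kernel, c_in, c_out, use_bias=True):
    """Trainable parameters of a square-kernel Conv2D layer."""
    return kernel * kernel * c_in * c_out + (c_out if use_bias else 0)

print(conv2d_params(3, 3, 64))     # block1_conv1: 1792
print(conv2d_params(3, 64, 64))    # block1_conv2: 36928
print(conv2d_params(3, 64, 128))   # block2_conv1: 73856
print(conv2d_params(3, 128, 128))  # block2_conv2: 147584
```

Pooling layers have no trainable parameters, which is why their rows show 0.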

In [10]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=1)



<tensorflow.python.keras.callbacks.History at 0x7f3020cd7cd0>

# ResNet50
Let's switch the CIFAR-100 image classifier to ResNet.

In [13]:
# Import from the public tensorflow.keras namespace rather than the
# private tensorflow.python.keras one.
from tensorflow.keras import backend
from tensorflow.keras import regularizers
from tensorflow.keras import initializers
from tensorflow.keras import models

# L2 regularizer
def _gen_l2_regularizer(use_l2_regularizer=True, l2_weight_decay=1e-4):
    return regularizers.l2(l2_weight_decay) if use_l2_regularizer else None
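
For reference, an L2 kernel regularizer adds a penalty of `l2_weight_decay * sum(w ** 2)` over the layer's weights to the training loss. The plain-Python sketch below only illustrates that arithmetic; `l2_penalty` and the example weights are hypothetical and are not part of Keras.

```python
def l2_penalty(weights, l2_weight_decay=1e-4):
    """L2 penalty contributed by one layer's weights."""
    return l2_weight_decay * sum(w * w for w in weights)

weights = [1.0, -2.0, 3.0]   # hypothetical kernel values
print(l2_penalty(weights))   # 1e-4 * (1 + 4 + 9)
print(l2_penalty(weights, l2_weight_decay=0.0))  # 0.0, i.e. no regularization
```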

In [21]:
def identity_block(input_tensor,
                   kernel_size,
                   filters,
                   stage,
                   block,
                   use_l2_regularizer=True,
                   batch_norm_decay=0.9,
                   batch_norm_epsilon=1e-5):
    # The shortcut is the unmodified input, so this block keeps the spatial
    # size and channel count unchanged and takes no strides argument.
    filters1, filters2, filters3 = filters
    if backend.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = layers.Conv2D(
        filters1, (1, 1),
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name=conv_name_base + '2a')(
            input_tensor)
    x = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name=bn_name_base + '2a')(
            x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(
        filters2,
        kernel_size,
        padding='same',
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name=conv_name_base + '2b')(
            x)
    x = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name=bn_name_base + '2b')(
            x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(
        filters3, (1, 1),
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name=conv_name_base + '2c')(
            x)
    x = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name=bn_name_base + '2c')(
            x)

    x = layers.add([x, input_tensor])
    x = layers.Activation('relu')(x)
    return x

In [22]:
def conv_block(input_tensor,
               kernel_size,
               filters,
               stage,
               block,
               strides=(2, 2),
               use_l2_regularizer=True,
               batch_norm_decay=0.9,
               batch_norm_epsilon=1e-5):

    filters1, filters2, filters3 = filters
    if backend.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = layers.Conv2D(
        filters1, (1, 1),
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name=conv_name_base + '2a')(
            input_tensor)
    x = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name=bn_name_base + '2a')(
            x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(
        filters2,
        kernel_size,
        strides=strides,
        padding='same',
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name=conv_name_base + '2b')(
            x)
    x = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name=bn_name_base + '2b')(
            x)
    x = layers.Activation('relu')(x)

    x = layers.Conv2D(
        filters3, (1, 1),
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name=conv_name_base + '2c')(
            x)
    x = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name=bn_name_base + '2c')(
            x)

    shortcut = layers.Conv2D(
        filters3, (1, 1),
        strides=strides,
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name=conv_name_base + '1')(
            input_tensor)
    shortcut = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name=bn_name_base + '1')(
            shortcut)

    x = layers.add([x, shortcut])
    x = layers.Activation('relu')(x)
    return x
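
The difference between the two blocks comes down to shapes: `layers.add` needs both inputs to have the same shape, so the identity shortcut only works when the block changes neither the spatial size nor the channel count. The sketch below checks this with shape arithmetic alone; both helper functions are hypothetical and assume `channels_last` shapes.

```python
def block_output_shape(input_shape, filters, strides=1):
    """Output (H, W, C) of a bottleneck block whose 3x3 conv uses the given stride."""
    h, w, _ = input_shape
    # Ceiling division matches 'same' padding for the strided convolution.
    return (-(-h // strides), -(-w // strides), filters[2])

def can_use_identity_shortcut(input_shape, filters, strides=1):
    """True if the input can be added to the block output without a projection."""
    return block_output_shape(input_shape, filters, strides) == tuple(input_shape)

# First block of stage 2: channels grow from 64 to 256, so a projection is needed.
print(can_use_identity_shortcut((8, 8, 64), [64, 64, 256]))               # False
# Later blocks of stage 2: shape is unchanged, so identity_block suffices.
print(can_use_identity_shortcut((8, 8, 256), [64, 64, 256]))              # True
# First block of stage 3: stride 2 and more channels, so conv_block again.
print(can_use_identity_shortcut((8, 8, 256), [128, 128, 512], strides=2)) # False
```

This is why each stage starts with one `conv_block`, whose 1x1 shortcut convolution reshapes the input, and continues with `identity_block`s.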


In [27]:
def resnet50(num_classes,
             batch_size=None,
             use_l2_regularizer=True,
             batch_norm_decay=0.9,
             batch_norm_epsilon=1e-5):
    input_shape = (32, 32, 3)
    img_input = layers.Input(shape=input_shape, batch_size=batch_size)

    # Branch on the backend data format, the same setting the blocks use to
    # pick bn_axis. CIFAR images arrive as channels_last.
    if backend.image_data_format() == 'channels_first':
        x = layers.Lambda(
              lambda x: backend.permute_dimensions(x, (0, 3, 1, 2)),
              name='transpose')(img_input)
        bn_axis = 1
    else:  # channels_last
        x = img_input
        bn_axis = 3

    x = layers.ZeroPadding2D(padding=(3, 3), name='conv1_pad')(x)
    x = layers.Conv2D(
        64, (7, 7),
        strides=(2, 2),
        padding='valid',
        use_bias=False,
        kernel_initializer='he_normal',
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name='conv1')(
            x)
    x = layers.BatchNormalization(
        axis=bn_axis,
        momentum=batch_norm_decay,
        epsilon=batch_norm_epsilon,
        name='bn_conv1')(
            x)
    x = layers.Activation('relu')(x)
    x = layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)

    x = conv_block(
        x,
        3, [64, 64, 256],
        stage=2,
        block='a',
        strides=(1, 1),
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [64, 64, 256],
        stage=2,
        block='b',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [64, 64, 256],
        stage=2,
        block='c',
        use_l2_regularizer=use_l2_regularizer)

    x = conv_block(
        x,
        3, [128, 128, 512],
        stage=3,
        block='a',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [128, 128, 512],
        stage=3,
        block='b',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [128, 128, 512],
        stage=3,
        block='c',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [128, 128, 512],
        stage=3,
        block='d',
        use_l2_regularizer=use_l2_regularizer)

    x = conv_block(
        x,
        3, [256, 256, 1024],
        stage=4,
        block='a',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [256, 256, 1024],
        stage=4,
        block='b',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [256, 256, 1024],
        stage=4,
        block='c',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [256, 256, 1024],
        stage=4,
        block='d',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [256, 256, 1024],
        stage=4,
        block='e',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [256, 256, 1024],
        stage=4,
        block='f',
        use_l2_regularizer=use_l2_regularizer)

    x = conv_block(
        x,
        3, [512, 512, 2048],
        stage=5,
        block='a',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [512, 512, 2048],
        stage=5,
        block='b',
        use_l2_regularizer=use_l2_regularizer)
    x = identity_block(
        x,
        3, [512, 512, 2048],
        stage=5,
        block='c',
        use_l2_regularizer=use_l2_regularizer)

    rm_axes = [1, 2] if backend.image_data_format() == 'channels_last' else [2, 3]
    x = layers.Lambda(lambda x: backend.mean(x, rm_axes), name='reduce_mean')(x)
    x = layers.Dense(
        num_classes,
        kernel_initializer=initializers.RandomNormal(stddev=0.01),
        kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        bias_regularizer=_gen_l2_regularizer(use_l2_regularizer),
        name='fc1000')(x)

    # A softmax that is followed by the model loss cannot be done in float16
    # due to numeric issues, so we pass dtype=float32.
    x = layers.Activation('softmax', dtype='float32')(x)

    # Create model.
    return models.Model(img_input, x, name='resnet50')
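
As a sanity check on the structure above, the "50" in ResNet50 and the feature map sizes for a 32x32 input can be worked out with a few lines of arithmetic. The variable names are made up for illustration; the count follows the usual convention of ignoring the shortcut projection convolutions.

```python
# Bottleneck blocks per stage, as built in resnet50() above.
blocks_per_stage = {2: 3, 3: 4, 4: 6, 5: 3}
convs_per_block = 3  # 1x1, 3x3, 1x1 on the main path

# conv1 + main-path convolutions + the final Dense layer.
weight_layers = 1 + convs_per_block * sum(blocks_per_stage.values()) + 1
print(weight_layers)  # 50

# Spatial size for a 32x32 input: conv1 and the max pool each halve it,
# stage 2 uses strides=(1, 1), and stages 3 to 5 each halve it again.
size = 32
trace = [size]
for stride in (2, 2, 1, 2, 2, 2):
    size = -(-size // stride)  # ceiling division
    trace.append(size)
print(trace)  # [32, 16, 8, 8, 4, 2, 1]
```

The map is already 1x1 when it reaches the `reduce_mean` layer, so the global average there is a no-op for 32x32 inputs.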


In [28]:
model = resnet50(num_classes=100)
model.summary()

Model: "resnet50"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_7 (InputLayer)            [(None, 32, 32, 3)]  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 38, 38, 3)    0           input_7[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 16, 16, 64)   9408        conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 16, 16, 64)   256         conv1[0][0]                      
___________________________________________________________________________________________
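
The first rows of this summary can be checked by hand as well. The sketch below assumes the settings used in `resnet50()` above (padding of 3, a 7x7 kernel with stride 2, `use_bias=False`); the variable names are hypothetical.

```python
input_size, channels = 32, 3
pad, kernel, stride, filters = 3, 7, 2, 64

padded = input_size + 2 * pad               # conv1_pad output size
conv_out = (padded - kernel) // stride + 1  # 'valid' convolution output size
conv1_params = kernel * kernel * channels * filters  # no bias term
# BatchNormalization stores gamma, beta, moving mean and moving variance per channel.
bn_params = 4 * filters

print(padded)        # 38
print(conv_out)      # 16
print(conv1_params)  # 9408
print(bn_params)     # 256
```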

In [29]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=1)



<tensorflow.python.keras.callbacks.History at 0x7f3090876810>