Title: Dense Net
Date: 2020-03-17 13:40
Category: CNN
Tags: Implementations
Slug: Dense Net
Author: Jordan Chen
Summary: Implementations of DneseNet


## 前言

ResNet模型的出現是CNN史上的一個里程碑，ResNet可以訓練出更深的CNN模型，從而實現更高的準確度。 ResNet模型的核心是通過建立前面層與後面層之間的“短路連接”（shortcuts，skip connection），這有助於訓練過程中梯度的反向傳播，從而能訓練出更深的CNN網絡。以下要介紹的是DenseNet模型，它的基本思路與ResNet一致，但是它建立的是前面所有層與後面層的密集連接（dense connection），它的名稱也是由此而來。 DenseNet的另一大特色是通過特徵在channel上的連接來實現特徵重用（feature reuse）。這些特點讓DenseNet在參數和計算成本更少的情形下實現比ResNet更優的性能，DenseNet也因此斬獲CVPR 2017的最佳論文獎。


## DenseNet 架構
DenseNets分為DenseBlocks，其中要素圖的尺寸在一個塊內保持不變，但是過濾器的數量在它們之間變化。 它們之間的這些層稱為過渡層，並應用批量歸一化，1x1卷積和2x2池化層來進行下採樣。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_1.png?raw=true' width=800px>

### Dense Blocks

相比ResNet，DenseNet提出了一個更激進的密集連接機制：即互相連接所有的層，具體來說就是每個層都會接受其前面所有層作為其額外的輸入。以下為ResNet與DenseNet網絡的連接機制，作為對比，可以看到，ResNet是每個層與前面的某層（一般是2~3層）短路連接在一起，連接方式是通過 element-wise 相加。而在DenseNet中，每個層都會與前面所有層在channel維度上連接（concat）在一起（這裡各個層的 feature map 大小是相同的，後面會有說明），並作為下一層的輸入。對於一個 L 層的網絡，DenseNet共包含 L（L + 1）/ 2 個連接，相比ResNet，這是一種密集連接。而且DenseNet是直接連接來自不同層的特徵圖，這可以實現特徵重用，提升效率，這一特點是DenseNet與ResNet最主要的區別。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_2.png?raw=true' width=500px>

下圖為公式表示之比較 

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_4.png?raw=true' width=800px> 

DenseNet的前向過程如下圖所示，可以更直觀地理解其密集連接方式，比如 h3 的輸入不僅包括來自 h2 的 x2 ，還包括前面兩層的 x0 和 x1 ，它們是在channel維度上連接在一起的。
<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_3.gif?raw=true' width=800px> 

#### Growth Rate

由於連接 feature map ，因此 channek 的尺寸在每一層都在增加。 如果我們使 H_1 每次生成 k 個特徵圖，那麼我們可以歸納為第l層為：

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/growth_rate_formula.png?raw=true' width=200px> 

該超參數 k 即是 growth rate。 growth rate調節每層網絡中添加了多少訊息。
我們可以將 feature map 視為網絡的訊息。 每個層都可以訪問其先前的 feature map ，因此可以訪問 collective knowledge。 然後，每層都在具體的 k 個 feature map 中將新訊息添加到此 collective knowledge 中

### Compression Rate

對於我們希望減少輸出 feature map 數量的情況。 compression rate 決定了減少的幅度。假設才特定layer上具有m個 feature map ，經過壓縮後的大小為 compression_rate * m。 Compression rate 的範圍是 [0-1]。 因此，當 compression_rate = 1時，DenseNets將保持不變。

### Bottleneck Layers

在進行BN-ReLU-3×3轉換之前，先完成BN-ReLU-1×1轉換。
bottleneck 的用意在於減少計算量，能使最後輸出的總資料量為 growth rate 乘上最後輸出的量(不含前面重覆連接的部分)，因為密集連接的方式會使得最後一個 channel 的輸出為 n 平方，當堆疊越多時這樣的計算量會照成很大的影響。以 DenseNet_121 第三個Dense Block 為例，若不經過 bottleneck layer 最後一層的輸出為 32\*32。而bottleneck layer 可以把輸出轉為 4*growth_rate = 4*32 = 128，大大降低模型的複雜性和大小。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/bottleneck.png?raw=true' width=800px> 

### Transition Layers
1×1 Conv 和 2×2 Average Pooling 被用作兩個連續 Dense Block 之間的 transition layer。在 Dense Block 內的 Feature Map 大小相同，因此可以輕鬆地將它們串聯在一起。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/multipledenseblocks.png?raw=true' width=800px> 

### Global Average Pooling

在最後一個密集塊的末尾，執行全局平均池化，然後附加softmax分類器。

## Advantages of DenseNet

### Strong Gradient Flow
誤差信號可以很容易地更直接地傳播到較早的層。這是一種隱式的深度監管，因為較早的層可以直接從最終分類層獲得回饋。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_5.png?raw=true' width=800px> 


### Parameter & Computational Efficiency
對於每一層，ResNet 中的參數數量與 C×C 成正比，而 DenseNet 中的參數數量與 l×k×k 成正比。 由於 k << C，DenseNet 的大小比 ResNet 小得多。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_6.png?raw=true' width=800px> 


### More Diversified Features
由於 DenseNet 中的每一層都接收所有先前的層作為輸入，因此特徵更加多樣化。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_7.png?raw=true' width=800px> 


### Maintains Low Complexity Features
在標準ConvNet中，分類器使用最複雜的功能。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_8.png?raw=true' width=800px> 

在 DenseNet 中，分類器使用所有復雜度級別的功能。它傾向於給出更平滑的決策邊界。這也解釋了為什麼訓練數據不足時 DenseNet 表現良好。

<img src='https://github.com/jordanchenml/DL_Implementations_with_Tensorflow/blob/master/models/DenseNet/assets/densenet_9.png?raw=true' width=800px> 


In [0]:
try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
except Exception:
    pass

In [1]:
# !pip install tensorflow==2.0.0
import tensorflow as tf
tf.config.experimental_run_functions_eagerly(True)
print(tf.__version__)

2.0.0


In [0]:
# Parameters
ACCURACY_THRESHOLD = 0.99
NUM_CLASSES = 10
IMAGE_HEIGHT = 32
IMAGE_WIDTH = 32
CHANNELS = 3

In [0]:
class BottleNeck(tf.keras.layers.Layer):
    def __init__(self, growth_rate, drop_rate):
        super(BottleNeck, self).__init__()
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv1 = tf.keras.layers.Conv2D(filters=4 * growth_rate,
                                            kernel_size=(1, 1),
                                            strides=1,
                                            padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters=growth_rate,
                                            kernel_size=(3, 3),
                                            strides=1,
                                            padding="same")
        self.dropout = tf.keras.layers.Dropout(rate=drop_rate)

    def call(self, inputs, training=None, **kwargs):
        x = self.bn1(inputs, training=training)
        x = tf.nn.relu(x)
        x = self.conv1(x)
        x = self.bn2(x, training=training)
        x = tf.nn.relu(x)
        x = self.conv2(x)
        x = self.dropout(x, training=training)
        return x

In [0]:
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_layers, growth_rate, drop_rate):
        super(DenseBlock, self).__init__()
        self.num_layers = num_layers
        self.growth_rate = growth_rate
        self.drop_rate = drop_rate
        self.features_list = []

    def _make_layer(self, x, training):
        y = BottleNeck(growth_rate=self.growth_rate, drop_rate=self.drop_rate)(x, training=training)
        self.features_list.append(y)
        y = tf.concat(self.features_list, axis=-1)
        return y

    def call(self, inputs, training=None, **kwargs):
        self.features_list.append(inputs)
        x = self._make_layer(inputs, training=training)
        for i in range(1, self.num_layers):
            x = self._make_layer(x, training=training)
        self.features_list.clear()
        return x

In [0]:
class TransitionLayer(tf.keras.layers.Layer):
    def __init__(self, out_channels):
        super(TransitionLayer, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.conv = tf.keras.layers.Conv2D(filters=out_channels,
                                           kernel_size=(1, 1),
                                           strides=1,
                                           padding="same")
        self.pool = tf.keras.layers.MaxPool2D(pool_size=(2, 2),
                                              strides=2,
                                              padding="same")

    def call(self, inputs, training=None, **kwargs):
        x = self.bn(inputs, training=training)
        x = tf.nn.relu(x)
        x = self.conv(x)
        x = self.pool(x)
        return x

In [0]:
class DenseNet(tf.keras.Model):
    def __init__(self, num_init_features, growth_rate, block_layers, compression_rate, drop_rate):
        super(DenseNet, self).__init__()
        self.conv = tf.keras.layers.Conv2D(filters=num_init_features,
                                           kernel_size=(7, 7),
                                           strides=2,
                                           padding="same")
        self.bn = tf.keras.layers.BatchNormalization()
        self.pool = tf.keras.layers.MaxPool2D(pool_size=(3, 3),
                                              strides=2,
                                              padding="same")
        self.num_channels = num_init_features
        self.dense_block_1 = DenseBlock(num_layers=block_layers[0], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[0]
        self.num_channels = compression_rate * self.num_channels
        self.transition_1 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_2 = DenseBlock(num_layers=block_layers[1], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[1]
        self.num_channels = compression_rate * self.num_channels
        self.transition_2 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_3 = DenseBlock(num_layers=block_layers[2], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[2]
        self.num_channels = compression_rate * self.num_channels
        self.transition_3 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_4 = DenseBlock(num_layers=block_layers[3], growth_rate=growth_rate, drop_rate=drop_rate)

        self.avgpool = tf.keras.layers.GlobalAveragePooling2D()
        self.fc = tf.keras.layers.Dense(units=NUM_CLASSES,
                                        activation=tf.keras.activations.softmax)

    def call(self, inputs, training=None, mask=None):
        x = self.conv(inputs)
        x = self.bn(x, training=training)
        x = tf.nn.relu(x)
        x = self.pool(x)

        x = self.dense_block_1(x, training=training)
        x = self.transition_1(x, training=training)
        x = self.dense_block_2(x, training=training)
        x = self.transition_2(x, training=training)
        x = self.dense_block_3(x, training=training)
        x = self.transition_3(x, training=training)
        x = self.dense_block_4(x, training=training)

        x = self.avgpool(x)
        x = self.fc(x)

        return x
    
    def build_graph(self, input_shape): 
        input_shape_nobatch = input_shape[1:]
        self.build(input_shape)
        inputs = tf.keras.Input(shape=input_shape_nobatch)
        
        if not hasattr(self, 'call'):
            raise AttributeError("User should define 'call' method in sub-class model!")
        
        _ = self.call(inputs)

In [0]:
def densenet_121():
    return DenseNet(num_init_features=64, growth_rate=32,
                    block_layers=[6, 12, 24, 16], compression_rate=0.5,
                    drop_rate=0.5)


def densenet_169():
    return DenseNet(num_init_features=64, growth_rate=32,
                    block_layers=[6, 12, 32, 32], compression_rate=0.5,
                    drop_rate=0.5)


def densenet_201():
    return DenseNet(num_init_features=64, growth_rate=32,
                    block_layers=[6, 12, 48, 32], compression_rate=0.5,
                    drop_rate=0.5)


def densenet_264():
    return DenseNet(num_init_features=64, growth_rate=32,
                    block_layers=[6, 12, 64, 48], compression_rate=0.5,
                    drop_rate=0.5)


In [0]:
def prepare_dataset():
    cifar10 = tf.keras.datasets.cifar10
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()

    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')

    x_train = tf.convert_to_tensor(x_train)
    x_test = tf.convert_to_tensor(x_test)

    y_train = tf.convert_to_tensor(y_train)
    y_test = tf.convert_to_tensor(y_test)

    print("### Dataset Shape ###")
    print('x_train shape:', x_train.shape)
    print('x_test shape:', x_test.shape)
    print('y_train shape:', y_train.shape)
    print('y_test shape:', y_test.shape)

    return x_train, x_test, y_train, y_test

In [0]:
class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        if logs.get('accuracy'):
            if (logs.get('accuracy') > 0.99):
                print("\nReached 99% accuracy so cancelling training!")
                self.model.stop_training = True

In [0]:
def print_model_summary(network):
    network.build_graph((1, IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS))
    network.summary()

In [0]:
# GPU settings
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    print("### Using GPU ###")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

x_train, x_test, y_train, y_test = prepare_dataset()

model = densenet_121()
print_model_summary(model)

initial_learning_rate = 0.1
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=100000,
    decay_rate=0.96,
    staircase=True)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule,
                                                momentum=0.0,
                                                nesterov=False),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

print('\n### Fit model on training data ###')
callbacks = myCallback()
history = model.fit(x_train, y_train, epochs=1500, validation_split=0.1, batch_size=1024, callbacks=[callbacks])
print('\nhistory dict:', history.history)

model.evaluate(x_test, y_test, verbose=2)

print('\nhistory dict:', history.history)

# Evaluate the model on the test data using `evaluate`
print('\n# Evaluate on test data')
results = model.evaluate(x_test, y_test, batch_size=128)
print('test loss, test acc:', results)

# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
print('\n# Generate predictions for 3 samples')
predictions = model.predict(x_test[:3])
print('predictions:', predictions.argmax(axis=1))
print('Answer:', y_test[:3])


### Using GPU ###
### Dataset Shape ###
x_train shape: (50000, 32, 32, 3)
x_test shape: (10000, 32, 32, 3)
y_train shape: (50000, 1)
y_test shape: (10000, 1)
Model: "dense_net"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 16, 16, 64)        9472      
_________________________________________________________________
batch_normalization (BatchNo (None, 16, 16, 64)        256       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 8, 8, 64)          0         
_________________________________________________________________
dense_block (DenseBlock)     (None, 8, 8, 256)         0         
_________________________________________________________________
transition_layer (Transition (None, 4, 4, 128)         33920     
_________________________________________________________________
dense_block_1 (DenseBlock)   (N

Epoch 41/1500
Epoch 42/1500
Epoch 43/1500
Epoch 44/1500
Epoch 45/1500
Epoch 46/1500
Epoch 47/1500
Epoch 48/1500
Epoch 49/1500
Epoch 50/1500
Epoch 51/1500
Epoch 52/1500
Epoch 53/1500
Epoch 54/1500
Epoch 55/1500
Epoch 56/1500
Epoch 57/1500
Epoch 58/1500
Epoch 59/1500
Epoch 60/1500
Epoch 61/1500
Epoch 62/1500
Epoch 63/1500
Epoch 64/1500
Epoch 65/1500
Epoch 66/1500
Epoch 67/1500
Epoch 68/1500
Epoch 69/1500
Epoch 70/1500
Epoch 71/1500
Epoch 72/1500
Epoch 73/1500
Epoch 74/1500
Epoch 75/1500
Epoch 76/1500
Epoch 77/1500
Epoch 78/1500
Epoch 79/1500
Epoch 80/1500
Epoch 81/1500
Epoch 82/1500
Epoch 83/1500
Epoch 84/1500
Epoch 85/1500
Epoch 86/1500
Epoch 87/1500
Epoch 88/1500
Epoch 89/1500
Epoch 90/1500
Epoch 91/1500
Epoch 92/1500
Epoch 93/1500


Epoch 94/1500
Epoch 95/1500
Epoch 96/1500
Epoch 97/1500
Epoch 98/1500
Epoch 99/1500
Epoch 100/1500
Epoch 101/1500
Epoch 102/1500
Epoch 103/1500
Epoch 104/1500
Epoch 105/1500
Epoch 106/1500
Epoch 107/1500
Epoch 108/1500
Epoch 109/1500
Epoch 110/1500
Epoch 111/1500
Epoch 112/1500
Epoch 113/1500
Epoch 114/1500
Epoch 115/1500
Epoch 116/1500
Epoch 117/1500
Epoch 118/1500
Epoch 119/1500
Epoch 120/1500
Epoch 121/1500
Epoch 122/1500
Epoch 123/1500
Epoch 124/1500
Epoch 125/1500
Epoch 126/1500
Epoch 127/1500
Epoch 128/1500
Epoch 129/1500
Epoch 130/1500
Epoch 131/1500
Epoch 132/1500
Epoch 133/1500
Epoch 134/1500
Epoch 135/1500
Epoch 136/1500
Epoch 137/1500
Epoch 138/1500
Epoch 139/1500
Epoch 140/1500
Epoch 141/1500
Epoch 142/1500
Epoch 143/1500
Epoch 144/1500
Epoch 145/1500
Epoch 146/1500
Epoch 147/1500


Epoch 148/1500
Epoch 149/1500
Epoch 150/1500
Epoch 151/1500
Epoch 152/1500
Epoch 153/1500
Epoch 154/1500
Epoch 155/1500
Epoch 156/1500
Epoch 157/1500
Epoch 158/1500
Epoch 159/1500
Epoch 160/1500
Epoch 161/1500
Epoch 162/1500
Epoch 163/1500
Epoch 164/1500
Epoch 165/1500
Epoch 166/1500
Epoch 167/1500
Epoch 168/1500
Epoch 169/1500
Epoch 170/1500
Epoch 171/1500
Epoch 172/1500
Epoch 173/1500
Epoch 174/1500
Epoch 175/1500
Epoch 176/1500
Epoch 177/1500
Epoch 178/1500
Epoch 179/1500
Epoch 180/1500
Epoch 181/1500
Epoch 182/1500
Epoch 183/1500
Epoch 184/1500
Epoch 185/1500
Epoch 186/1500
Epoch 187/1500
Epoch 188/1500
Epoch 189/1500
Epoch 190/1500
Epoch 191/1500
Epoch 192/1500
Epoch 193/1500
Epoch 194/1500
Epoch 195/1500
Epoch 196/1500
Epoch 197/1500
Epoch 198/1500
Epoch 199/1500
Epoch 200/1500


Epoch 201/1500
Epoch 202/1500
Epoch 203/1500
Epoch 204/1500
Epoch 205/1500
Epoch 206/1500
Epoch 207/1500
Epoch 208/1500
Epoch 209/1500
Epoch 210/1500
Epoch 211/1500
Epoch 212/1500
Epoch 213/1500
Epoch 214/1500
Epoch 215/1500
Epoch 216/1500
Epoch 217/1500
Epoch 218/1500
Epoch 219/1500
Epoch 220/1500
Epoch 221/1500
Epoch 222/1500
Epoch 223/1500
Epoch 224/1500
Epoch 225/1500
Epoch 226/1500
Epoch 227/1500
Epoch 228/1500
Epoch 229/1500
Epoch 230/1500
Epoch 231/1500
Epoch 232/1500
Epoch 233/1500
Epoch 234/1500
Epoch 235/1500
Epoch 236/1500
Epoch 237/1500
Epoch 238/1500
Epoch 239/1500
Epoch 240/1500
Epoch 241/1500
Epoch 242/1500
Epoch 243/1500
Epoch 244/1500
Epoch 245/1500
Epoch 246/1500
Epoch 247/1500
Epoch 248/1500
Epoch 249/1500
Epoch 250/1500
Epoch 251/1500
Epoch 252/1500
Epoch 253/1500


Epoch 254/1500
Epoch 255/1500
Epoch 256/1500
Epoch 257/1500
Epoch 258/1500
Epoch 259/1500
Epoch 260/1500
Epoch 261/1500
Epoch 262/1500
Epoch 263/1500
Epoch 264/1500
Epoch 265/1500
Epoch 266/1500
Epoch 267/1500
Epoch 268/1500
Epoch 269/1500
Epoch 270/1500
Epoch 271/1500
Epoch 272/1500
Epoch 273/1500
Epoch 274/1500
Epoch 275/1500
Epoch 276/1500
Epoch 277/1500
Epoch 278/1500
Epoch 279/1500
Epoch 280/1500
Epoch 281/1500
Epoch 282/1500
Epoch 283/1500
Epoch 284/1500
Epoch 285/1500
Epoch 286/1500
Epoch 287/1500
Epoch 288/1500
Epoch 289/1500
Epoch 290/1500
Epoch 291/1500
Epoch 292/1500
Epoch 293/1500
Epoch 294/1500
Epoch 295/1500
Epoch 296/1500
Epoch 297/1500
Epoch 298/1500
Epoch 299/1500
Epoch 300/1500
Epoch 301/1500
Epoch 302/1500
Epoch 303/1500
Epoch 304/1500
Epoch 305/1500
Epoch 306/1500


Epoch 307/1500
Epoch 308/1500
Epoch 309/1500
Epoch 310/1500
Epoch 311/1500
Epoch 312/1500
Epoch 313/1500
Epoch 314/1500
Epoch 315/1500
Epoch 316/1500
Epoch 317/1500
Epoch 318/1500
Epoch 319/1500
Epoch 320/1500
Epoch 321/1500
Epoch 322/1500
Epoch 323/1500
Epoch 324/1500
Epoch 325/1500
Epoch 326/1500
Epoch 327/1500
Epoch 328/1500
Epoch 329/1500
Epoch 330/1500
Epoch 331/1500
Epoch 332/1500
Epoch 333/1500
Epoch 334/1500
Epoch 335/1500
Epoch 336/1500
Epoch 337/1500
Epoch 338/1500
Epoch 339/1500
Epoch 340/1500
Epoch 341/1500
Epoch 342/1500
Epoch 343/1500
Epoch 344/1500
Epoch 345/1500
Epoch 346/1500
Epoch 347/1500
Epoch 348/1500
Epoch 349/1500
Epoch 350/1500
Epoch 351/1500
Epoch 352/1500
Epoch 353/1500
Epoch 354/1500
Epoch 355/1500
Epoch 356/1500
Epoch 357/1500
Epoch 358/1500
Epoch 359/1500


Epoch 360/1500
Epoch 361/1500
Epoch 362/1500
Epoch 363/1500
Epoch 364/1500
Epoch 365/1500
Epoch 366/1500
Epoch 367/1500
Epoch 368/1500
Epoch 369/1500
Epoch 370/1500
Epoch 371/1500
Epoch 372/1500
Epoch 373/1500
Epoch 374/1500
Epoch 375/1500
Epoch 376/1500
Epoch 377/1500
Epoch 378/1500
Epoch 379/1500
Epoch 380/1500
Epoch 381/1500
Epoch 382/1500
Epoch 383/1500
Epoch 384/1500
Epoch 385/1500
Epoch 386/1500
Epoch 387/1500
Epoch 388/1500
Epoch 389/1500
Epoch 390/1500
Epoch 391/1500
Epoch 392/1500
Epoch 393/1500
Epoch 394/1500
Epoch 395/1500
Epoch 396/1500
Epoch 397/1500
Epoch 398/1500
Epoch 399/1500
Epoch 400/1500
Epoch 401/1500
Epoch 402/1500
Epoch 403/1500
Epoch 404/1500
Epoch 405/1500
Epoch 406/1500
Epoch 407/1500
Epoch 408/1500
Epoch 409/1500
Epoch 410/1500
Epoch 411/1500
Epoch 412/1500


Epoch 413/1500
Epoch 414/1500
Epoch 415/1500
Epoch 416/1500
Epoch 417/1500
Epoch 418/1500
Epoch 419/1500
Epoch 420/1500
Epoch 421/1500
Epoch 422/1500
Epoch 423/1500
Epoch 424/1500
Epoch 425/1500
Epoch 426/1500
Epoch 427/1500
Epoch 428/1500
Epoch 429/1500
Epoch 430/1500
Epoch 431/1500
Epoch 432/1500
Epoch 433/1500
Epoch 434/1500
Epoch 435/1500
Epoch 436/1500
Epoch 437/1500
Epoch 438/1500
Epoch 439/1500
Epoch 440/1500
Epoch 441/1500
Epoch 442/1500
Reached 99% accuracy so cancelling training!

history dict: {'loss': [2.737706036843194, 2.365289028167725, 2.2838134432474773, 2.1918446654425727, 2.0953566450330947, 2.0108596756193373, 1.937411028289795, 1.878830226220025, 1.835526297484504, 1.802133373260498, 1.7713647003809612, 1.7421015484703912, 1.7174761914147272, 1.6967490636401705, 1.6703651131100126, 1.65090719487932, 1.628482229338752, 1.6140626823213364, 1.5942828046586779, 1.5759410623338488, 1.5555738201989069, 1.5432473668840196, 1.5264959496604071, 1.5076922081629436, 1.4994235

10000/1 - 355s - loss: 2.0228 - accuracy: 0.6987

history dict: {'loss': [2.737706036843194, 2.365289028167725, 2.2838134432474773, 2.1918446654425727, 2.0953566450330947, 2.0108596756193373, 1.937411028289795, 1.878830226220025, 1.835526297484504, 1.802133373260498, 1.7713647003809612, 1.7421015484703912, 1.7174761914147272, 1.6967490636401705, 1.6703651131100126, 1.65090719487932, 1.628482229338752, 1.6140626823213364, 1.5942828046586779, 1.5759410623338488, 1.5555738201989069, 1.5432473668840196, 1.5264959496604071, 1.5076922081629436, 1.4994235100852118, 1.4819844037585788, 1.4737261624654134, 1.4531545957989163, 1.4439668891482884, 1.423618915939331, 1.420724436039395, 1.4054122832404243, 1.393440122795105, 1.3785895584742227, 1.364434857643975, 1.3628378510369195, 1.3471690735287136, 1.3357195006052653, 1.3215956913630167, 1.312903962855869, 1.3179113380855985, 1.3003098608228896, 1.2919891833411323, 1.2832869088490804, 1.2765205810758802, 1.2690983471340602, 1.2551142485512627, 



test loss, test acc: [1.755268715286255, 0.699]

# Generate predictions for 3 samples
predictions: [3 8 8]
Answer: tf.Tensor(
[[3]
 [8]
 [8]], shape=(3, 1), dtype=uint8)
