# Сверточные нейронные сети

## Что такое сверточный слой?

Сверточный слой - более упрощенный слой, который позволяет сократить количество параметров сети. Важной особенностью слоя является то, что производится операция свертки слоя c набором весов. 

Формально, математическая модель определена следующим образом:

рассмотрим двумерный канал изображения размером $W \times H$, определим сверточный слой с ядром размера $K \times K$ как операцию свертки для каждого квадрата размером $K \times K$ с матрицей весов.

Параметры сверточного слоя:

* количество входных фильтров - $in$
* количество выходных фильтров - $out$.

Для каждого фильтра для пикселя выход определяется следующим образом:

$$
   out_{i,j} = \sum_{s = -[(k-1)/2]}^{[k/2]} \sum_{t=-[(k-1)/2]}^{[k/2]} W_{s, t} I_{i +s, j + t} + b,
$$

где $W$ - матрица весов для фильтра, $b$ - смещение (bias), $I$ - фильтр (двумерный массив размера $W \times H$), к которому применяется свертка.

**Вопрос**. Какое количество тренируемых параметров используется в сверточном слое?

**Ответ** $(K \times K + 1) \times in \times out$.



## Дополнительные параметры сверточного слоя

Дополнительно необходимо определить следующие параметры сверточного слоя:
* stride - шаг, с каким производится свертка
* padding - начальное и конечное положение, с которого начинается свертка.

**Вопрос.** Какой будет размер выходного фильтра, если используется свертка с ядром $K \times K$, stride - (1, 1), начало и конец находятся в вершинах изображения?

**Ответ.** $ (W - K + 1) \times (H - K + 1)$.

Чтобы размер фильтра не менялся, применяется следующая стратегия: входной фильтр дополняется нулями таким образом, чтобы размер выходного слоя был $W \times H$. Такая стратегия называется same padding. Изначальная стратегия называется valid padding.

Приступим к реализации сверточного слоя

In [0]:
import numpy as np
def conv2d_one_filter(X, W, padding='same', stride=(1, 1)):
    """
        @param X: input image, [w \times h]
        @param W: weights, [K \times K]
        @param padding: padding type - same or full
    """
    
    kernel_y, kernel_x = W.shape[:2]
    
    # Calculating shape of new pad
    
    if padding == 'same':
        y_shape = X.shape[0] + kernel_y - 1
        x_shape = X.shape[1] + kernel_x - 1
    else:
        y_shape, x_shape = X.shape[:2]
    
    x_padded = np.zeros((y_shape, x_shape), dtype=X.dtype)
    print(x_padded.shape)
    
    if padding == 'valid':
        padding_left = 0
        padding_top = 0
    else:
        padding_left = (kernel_x - 1) // 2
        padding_top = (kernel_y - 1) // 2
    
    x_padded[
        padding_top:padding_top + X.shape[0],
        padding_left:padding_left + X.shape[1]
    ] = X

    result = np.zeros((x_padded.shape[0] - kernel_y + 1, x_padded.shape[1] - kernel_x + 1))
    
    for y in range(x_padded.shape[0]):
        for x in range(x_padded.shape[1]):
            if y + kernel_y > x_padded.shape[0] or x + kernel_x > x_padded.shape[1]:
                continue
            result[y, x] = np.sum(x_padded[y:y + kernel_y, x:x + kernel_x] * W)
    return result
  

In [0]:
import scipy.signal

In [0]:
!nproc

4


In [0]:
conv2d_one_filter(np.array([
   [1, 2, 3],
   [4, 5, 6],
   [7, 8, 9]
]), np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])
)

(5, 5)


array([[ 94., 154., 106.],
       [186., 285., 186.],
       [106., 154.,  94.]])

In [0]:
import tensorflow as tf

In [0]:
sess = tf.InteractiveSession()
a = tf.placeholder(tf.float32, [1, 3, 3, 1])
w = tf.placeholder(tf.float32, [3, 3, 1, 1])
out_same = tf.nn.conv2d(a, w, padding='SAME')
out_valid = tf.nn.conv2d(a, w, padding='VALID')

In [0]:
sess.run(out_same, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
    w: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((3, 3, 1, 1))
})

array([[[[ 94.],
         [154.],
         [106.]],

        [[186.],
         [285.],
         [186.]],

        [[106.],
         [154.],
         [ 94.]]]], dtype=float32)

In [0]:
sess.run(out_valid, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
    w: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((3, 3, 1, 1))
})

array([[[[285.]]]], dtype=float32)

##  Pooling

Pooling - операция, которая позволяет аггрегировать информацию по некоторому фильтру. Примеры pooling-слоев:
* max pooling - вычисляет максимальное значение в ядре
* average pooling - вычисляет среднее значение в ядре

Является одним из способов снижения размерности:

Если взять max pooling с ядром $2 \times 2$ и с шагом по каждой оси $2 \times 2$, то как изменится размер выходного тензора?

In [0]:
a = tf.placeholder(tf.float32, (1, 3, 3, 1))

In [0]:
pool_valid = tf.nn.max_pool2d(a, ksize=(2, 2), strides=(1, 1), padding='VALID')
pool_same = tf.nn.max_pool2d(a, ksize=(2, 2), strides=(1, 1), padding='SAME')

In [0]:
sess.run(pool_valid, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
})

array([[[[5.],
         [6.]],

        [[8.],
         [9.]]]], dtype=float32)

In [0]:
sess.run(pool_same, feed_dict={
    a: np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]).reshape((1, 3, 3, 1)),
})

array([[[[5.],
         [6.],
         [6.]],

        [[8.],
         [9.],
         [9.]],

        [[8.],
         [9.],
         [9.]]]], dtype=float32)

In [0]:
import math

In [0]:
def max_pooling(X, kernel_size=(2, 2), padding='same', strides=(2, 2)):
    height, width = X.shape[:2]
    if padding == 'same':
        out_height = math.ceil(height / strides[0])
        out_width = math.ceil(width / strides[1])
    else:
        out_height = (height - kernel_size[0] + 1) // strides[0]
        out_width = (width - kernel_size[1] + 1) // strides[1]
    
    result = np.zeros((out_height, out_width), dtype=X.dtype)
    
    for y in range(out_height):
        for x in range(out_width):
            start_y = y * strides[0]
            start_x = x * strides[1]
            
            result[y, x] = np.max(
                X[
                    start_y:start_y + kernel_size[0],
                    start_x:start_x + kernel_size[1]
                ]
            )
    return result

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ])
)

array([[5, 6],
       [8, 9]])

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]),
    padding='valid'
)

array([[5]])

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]),
    padding='valid',
    strides=(1, 1)
)

array([[5, 6],
       [8, 9]])

In [0]:
max_pooling(
    X=np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]),
    padding='same',
    strides=(1, 1)
)

array([[5, 6, 6],
       [8, 9, 9],
       [8, 9, 9]])

## Базовые блоки

In [0]:
import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

def conv_layer(
        input_tensor,
        output_channels,
        name='conv',
        kernel_size=(3, 3),
        strides=(1, 1),
        padding='SAME'
    ):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        input_shape = input_tensor.get_shape().as_list()
        
        input_channels = input_shape[-1]
        
        print(input_channels, output_channels)
        
        weights = tf.get_variable(name='weights', shape=[
            kernel_size[0], kernel_size[1], input_channels, output_channels
        ])
        
        bias = tf.get_variable(
            name='bias',
            shape=[output_channels],
            initializer=tf.zeros_initializer()
        )
        
        conv = tf.nn.conv2d(
            input=input_tensor,
            filter=weights,
            strides=strides,
            padding=padding,
            name='conv'
        )
        
        output = tf.nn.bias_add(conv, bias, name='output')
    return output

In [3]:
a = tf.placeholder(tf.float32, (1, 3, 3, 1))
b = conv_layer(a, 3)
example = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).reshape((1, 3, 3, 1))
sess.run(tf.global_variables_initializer())
sess.run(b, feed_dict={
    a: example
})

1 3
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


array([[[[-0.76100045, -2.3675907 ,  1.2371047 ],
         [-0.49668634, -1.8237138 ,  0.9400004 ],
         [-1.4553041 , -0.72431755,  1.2220502 ]],

        [[-1.4469042 , -5.3037224 ,  2.486132  ],
         [-0.28262454, -3.467764  ,  2.2337024 ],
         [-1.0618773 , -2.0336287 ,  3.0898426 ]],

        [[ 0.5349655 , -5.1479397 ,  1.8384641 ],
         [ 2.3510103 , -4.6678867 ,  4.5698705 ],
         [ 2.1267679 , -2.318857  ,  4.3377833 ]]]], dtype=float32)

In [0]:
def max_pool(
    input_tensor,
    kernel_size=(2, 2),
    strides=(2, 2),
    padding='SAME',
    name='pool'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        output = tf.nn.max_pool2d(input_tensor, ksize=kernel_size, strides=strides, padding=padding, name='pool')
    return output

In [0]:
def avg_pool(
    input_tensor,
    kernel_size=(2, 2),
    strides=(2, 2),
    padding='SAME',
    name='pool'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        output = tf.nn.avg_pool2d(input_tensor, ksize=kernel_size, strides=strides, padding=padding, name='pool')
    return output

In [6]:
a = tf.placeholder(tf.float32, (1, 3, 3, 1))
b = max_pool(a)
example = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).reshape((1, 3, 3, 1))
sess.run(tf.global_variables_initializer())
sess.run(b, feed_dict={
    a: example
})

array([[[[5.],
         [6.]],

        [[8.],
         [9.]]]], dtype=float32)

In [0]:
def flatten(
    input_tensor,
    name='flatten'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        shape = input_tensor.get_shape().as_list()[1:]
        num_elements = np.prod(shape)
        return tf.reshape(input_tensor, [-1, num_elements], name='reshape')

In [8]:
a = tf.zeros([1, 3, 3, 1])
b = flatten(a)
sess.run(b)

array([[0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)

In [0]:
def dense(
    input_tensor,
    output_neurons,
    name='fc'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        input_neurons = input_tensor.get_shape().as_list()[1]
        
        weights = tf.get_variable(
            name='weights',
            shape=[input_neurons, output_neurons]
        )
        
        bias = tf.get_variable(
            name='bias',
            shape=[output_neurons]
        )
        
        product = tf.matmul(input_tensor, weights, name='product')
        
        output = tf.nn.bias_add(product, bias, name='output')
    return output

In [0]:
a = tf.zeros([1, 9])
b = dense(a, 18, name='fc')

In [11]:
sess.run(tf.global_variables_initializer())
sess.run(b)

array([[-0.07266536,  0.04110315, -0.06682327, -0.08272639,  0.18806207,
         0.033252  ,  0.4061237 ,  0.15455014,  0.35857058, -0.31745896,
         0.09687263, -0.15119013,  0.16809756, -0.25666726,  0.05608675,
         0.25709748,  0.07112962, -0.39342153]], dtype=float32)

##Архитектуры сетей

### LeNet

In [0]:
def conv_block(
    x,
    output_channels,
    name,
    strides=(1, 1),
    kernel_size=(3, 3),
    padding='SAME'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        conv_out = conv_layer(x, output_channels,
            kernel_size=kernel_size,
            padding=padding,
            strides=strides
        )
        activation = tf.nn.relu(conv_out, name='relu')
    return activation

In [13]:
a = tf.zeros((1, 3, 3, 1))
b = conv_block(a, 6, 'conv1')

1 6


In [14]:
sess.run(tf.global_variables_initializer())
sess.run(b).shape

(1, 3, 3, 6)

In [0]:
def le_net(input_tensor):
    with tf.variable_scope('le_net', reuse=tf.AUTO_REUSE):
        conv1_out = conv_block(
            input_tensor, 6,
            name='conv1',
            kernel_size=(5, 5),
            padding='VALID'
        )
        pool1_out = max_pool(conv1_out, name='pool1')
        conv2_out = conv_block(
            pool1_out, 16,
            name='conv2',
            kernel_size=(5, 5),
            padding='VALID'
        )
        
        pool2_out = max_pool(conv2_out, name='pool2')
        
        flatten_out = flatten(pool2_out)
        
        fc1_out = dense(flatten_out, 120, name='fc1')
        fc2_out = dense(fc1_out, 84, name='fc2')
        
        output = dense(fc2_out, 10, name='fc3')
    return output
        

In [16]:
digits_placeholder = tf.placeholder(tf.float32, [None, 32, 32, 3])
logits = le_net(digits_placeholder)

3 6
6 16


In [0]:
labels_placeholder = tf.placeholder(tf.float32, [None, 10], name='le_net_labels')

In [18]:
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels_placeholder,
    logits=logits
)
)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


In [0]:
le_net_trainable_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='le_net')
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss, var_list=le_net_trainable_variables)


In [0]:
le_net_predictions = tf.argmax(logits, axis=1)
le_net_target = tf.argmax(labels_placeholder, axis=1)

In [0]:
def define_metrics(scope_name, variables: dict):
    with tf.name_scope(f'{scope_name}train/'):
        accuracy_train, accuracy_train_op = tf.metrics.accuracy(
            labels=variables['target'],
            predictions=variables['predictions']
        )
        loss_train, loss_train_op = tf.metrics.mean(
            values=variables['loss'],
            name='loss'
        )
    with tf.name_scope(f'{scope_name}val/'):
        accuracy_val, accuracy_val_op = tf.metrics.accuracy(
            labels=variables['target'],
            predictions=variables['predictions']
        )
        loss_val, loss_val_op = tf.metrics.mean(
            values=variables['loss'],
            name='loss'
        )
    
    return {
        'train_acc': accuracy_train,
        'train_update_acc': accuracy_train_op,
        'val_acc': accuracy_val,
        'val_update_acc': accuracy_val_op,
        'train_loss': loss_train,
        'val_loss': loss_val,
        'train_update_loss': loss_train_op,
        'val_update_loss': loss_val_op
    }

In [0]:
le_net_metrics = define_metrics(
    'le_net/metrics/',
    variables={
        'target': le_net_target,
        'predictions': le_net_predictions,
        'loss': loss
    }
)

In [23]:
tf.local_variables()

[<tf.Variable 'le_net/metrics/train/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/loss/count:0' shape=() dtype=float32_ref>]

In [24]:
from keras.datasets import cifar10
from keras.utils import to_categorical

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
y_train_labels = to_categorical(y_train)
y_test_labels = to_categorical(y_test)

Using TensorFlow backend.


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [0]:
def reset_metrics(scope):
    stream_variables = [v for v in tf.local_variables() if scope in v.name]
    sess.run(tf.variables_initializer(stream_variables))

In [26]:
# Check that data is ready

sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
sess.run([loss, le_net_metrics['train_acc']], feed_dict={
    digits_placeholder: X_train[:10],
    labels_placeholder: y_train_labels[:10]
})

[32.925728, 0.0]

In [27]:
le_net_trainable_variables

[<tf.Variable 'le_net/conv1/conv/weights:0' shape=(5, 5, 3, 6) dtype=float32_ref>,
 <tf.Variable 'le_net/conv1/conv/bias:0' shape=(6,) dtype=float32_ref>,
 <tf.Variable 'le_net/conv2/conv/weights:0' shape=(5, 5, 6, 16) dtype=float32_ref>,
 <tf.Variable 'le_net/conv2/conv/bias:0' shape=(16,) dtype=float32_ref>,
 <tf.Variable 'le_net/fc1/weights:0' shape=(400, 120) dtype=float32_ref>,
 <tf.Variable 'le_net/fc1/bias:0' shape=(120,) dtype=float32_ref>,
 <tf.Variable 'le_net/fc2/weights:0' shape=(120, 84) dtype=float32_ref>,
 <tf.Variable 'le_net/fc2/bias:0' shape=(84,) dtype=float32_ref>,
 <tf.Variable 'le_net/fc3/weights:0' shape=(84, 10) dtype=float32_ref>,
 <tf.Variable 'le_net/fc3/bias:0' shape=(10,) dtype=float32_ref>]

In [0]:
def iterate_batches(X, y, batch_size, shuffle=True):
    assert len(X) == len(y)
    
    indices = np.arange(len(X))
    if shuffle:
        np.random.shuffle(indices)
    
    for index in range(0, len(X), batch_size):
        yield X[index:index + batch_size], y[index:index + batch_size]

In [29]:
for epoch_num in range(50):
    reset_metrics('le_net/metrics/train')
    reset_metrics('le_net/metrics/val')
    for X_batch, y_batch in iterate_batches(X_train, y_train_labels, 500):
        _, _, _ = sess.run([
            optimizer,
            le_net_metrics['train_update_loss'], le_net_metrics['train_update_acc']
        ], feed_dict={
            digits_placeholder: X_batch,
            labels_placeholder: y_batch
        })
        # print(loss_value, accuracy)
    
    print(f'Epoch {epoch_num + 1} train [acc, loss]:', sess.run([
        le_net_metrics['train_acc'],
        le_net_metrics['train_loss']
    ]))
    
    for X_batch, y_batch in iterate_batches(X_test, y_test_labels, 500, shuffle=False):
        _, _ = sess.run([
            le_net_metrics['val_update_loss'],
            le_net_metrics['val_update_acc']
        ], feed_dict = {
            digits_placeholder: X_batch,
            labels_placeholder: y_batch
        })
    print(
        f'Epoch {epoch_num + 1} val [acc, loss]:',
        sess.run([
            le_net_metrics['val_acc'],
            le_net_metrics['val_loss']
        ])
    )

Epoch 1 train [acc, loss]: [0.14222, 1.957949]
Epoch 1 val [acc, loss]: [0.166, 0.43585435]
Epoch 2 train [acc, loss]: [0.18926, 0.39143124]
Epoch 2 val [acc, loss]: [0.1932, 0.35415822]
Epoch 3 train [acc, loss]: [0.2346, 0.32454842]
Epoch 3 val [acc, loss]: [0.2621, 0.30565962]
Epoch 4 train [acc, loss]: [0.27802, 0.2972584]
Epoch 4 val [acc, loss]: [0.2879, 0.29335552]
Epoch 5 train [acc, loss]: [0.30456, 0.28565913]
Epoch 5 val [acc, loss]: [0.3146, 0.28203708]
Epoch 6 train [acc, loss]: [0.32786, 0.2790289]
Epoch 6 val [acc, loss]: [0.3349, 0.27817523]
Epoch 7 train [acc, loss]: [0.34214, 0.2748662]
Epoch 7 val [acc, loss]: [0.3436, 0.27464357]
Epoch 8 train [acc, loss]: [0.355, 0.27109167]
Epoch 8 val [acc, loss]: [0.3559, 0.2714793]
Epoch 9 train [acc, loss]: [0.36774, 0.2676653]
Epoch 9 val [acc, loss]: [0.3641, 0.26994246]
Epoch 10 train [acc, loss]: [0.37606, 0.2645841]
Epoch 10 val [acc, loss]: [0.3761, 0.2671972]
Epoch 11 train [acc, loss]: [0.3839, 0.26192906]
Epoch 11 val

### Batch Norm

Идея была высказана в 2014 году. Говорится,  что из-за смещенности градиентов нарушаются общие правила нормальности, применимые после входного слоя. Поэтому предлагается производить смещение в новый масштаб.

Иными словами,

$$
    out = \gamma \cdot \frac{x - \mathrm{E}x}{\sqrt{\mathrm{D}x + \varepsilon}} + \beta,
$$

где $\gamma$ и $\beta$ являются обучаемыми параметрами. 


**Вопрос** Как вычислять значение $\mathrm{E}x$, $\mathrm{D}x$?

**Ответ** Во время обучения: вычислять по batch-у, во время валидации - вычислять скользящее среднее по $\mathrm{E}$ и $\mathrm{D}$.

In [0]:
def batch_norm(input_tensor, is_training, momentum=0.99, name='batch_norm'):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        shapes = input_tensor.get_shape().as_list()
        
        gamma = tf.get_variable('gamma', shape=shapes[1:], initializer=tf.ones_initializer())
        beta = tf.get_variable('beta', shape=shapes[1:], initializer=tf.zeros_initializer())
        
        moving_mean = tf.get_variable('moving_mean', shape=shapes[1:], initializer=tf.zeros_initializer())
        moving_var = tf.get_variable('moving_var', shape=shapes[1:], initializer=tf.ones_initializer())
        
        
        updates = []
        
        def training_fn():
            current_mean, current_var = tf.nn.moments(input_tensor, axes=0)
            x_norm = (input_tensor - current_mean) / tf.sqrt(current_var + 1e-3)
            updates.append(
                tf.assign(moving_mean, moving_mean * momentum + current_mean * (1 - momentum))
            )
            updates.append(
                tf.assign(moving_var, moving_var * momentum + current_var * (1 - momentum))
            )
            
            return x_norm, updates
        
        def test_fn():
            x_norm = (input_tensor - moving_mean) / tf.sqrt(moving_var + 1e-3)
            return x_norm, updates
        
        
        x_norm, updates = tf.cond(
            is_training,
            true_fn=training_fn,
            false_fn=test_fn
        )
        
        return gamma * x_norm + beta, updates
            

In [0]:
def conv_block_with_bn(
    x,
    output_channels,
    is_training,
    name,
    kernel_size=(3, 3),
    strides=(1, 1),
    padding='SAME'
):
    with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
        conv_out = conv_layer(
            x, output_channels,
            kernel_size=kernel_size,
            padding=padding,
            strides=strides
        )
        bn_out, updates = batch_norm(conv_out, is_training)
        activation = tf.nn.relu(conv_out, name='relu')
    return activation, updates

In [0]:
def le_net_with_bn(input_tensor, is_training):
    with tf.variable_scope('le_net_with_bn', reuse=tf.AUTO_REUSE):
        conv1_out, updates_conv1 = conv_block_with_bn(
            input_tensor, 6,
            is_training=is_training,
            name='conv1',
            kernel_size=(5, 5),
            padding='VALID'
        )
        pool1_out = max_pool(conv1_out, name='pool1')
        conv2_out, updates_conv2 = conv_block_with_bn(
            pool1_out, 16,
            is_training=is_training,
            name='conv2',
            kernel_size=(5, 5),
            padding='VALID'
        )
        
        pool2_out = max_pool(conv2_out, name='pool2')
        
        flatten_out = flatten(pool2_out)
        
        fc1_out = dense(flatten_out, 120, name='fc1')
        fc2_out = dense(fc1_out, 84, name='fc2')
        
        output = dense(fc2_out, 10, name='fc3')
    return output, updates_conv1 + updates_conv2
        

In [35]:
digits_placeholder = tf.placeholder(tf.float32, [None, 32, 32, 3], name='digits_bn')
le_net_is_training = tf.placeholder(tf.bool, [])

logits, le_net_bn_updates = le_net_with_bn(digits_placeholder, is_training=le_net_is_training)
labels_placeholder = tf.placeholder(tf.float32, [None, 10], name='le_net_labels_bn')


loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        labels=labels_placeholder,
        logits=logits
    ),
    name='le_net_loss_bn'
)

3 6
6 16


In [36]:
le_net_bn_updates

[<tf.Tensor 'le_net_with_bn/conv1/batch_norm/cond/Merge_1:0' shape=(28, 28, 6) dtype=float32_ref>,
 <tf.Tensor 'le_net_with_bn/conv1/batch_norm/cond/Merge_2:0' shape=(28, 28, 6) dtype=float32_ref>,
 <tf.Tensor 'le_net_with_bn/conv2/batch_norm/cond/Merge_1:0' shape=(10, 10, 16) dtype=float32_ref>,
 <tf.Tensor 'le_net_with_bn/conv2/batch_norm/cond/Merge_2:0' shape=(10, 10, 16) dtype=float32_ref>]

In [0]:
le_net_predictions = tf.argmax(logits, axis=1)
le_net_target = tf.argmax(labels_placeholder, axis=1)

In [0]:
le_net_trainable_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='le_net_with_bn')
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss, var_list=le_net_trainable_variables)


In [39]:
le_net_trainable_variables

[<tf.Variable 'le_net_with_bn/conv1/conv/weights:0' shape=(5, 5, 3, 6) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv1/conv/bias:0' shape=(6,) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv1/batch_norm/gamma:0' shape=(28, 28, 6) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv1/batch_norm/beta:0' shape=(28, 28, 6) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv1/batch_norm/moving_mean:0' shape=(28, 28, 6) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv1/batch_norm/moving_var:0' shape=(28, 28, 6) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv2/conv/weights:0' shape=(5, 5, 6, 16) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv2/conv/bias:0' shape=(16,) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv2/batch_norm/gamma:0' shape=(10, 10, 16) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv2/batch_norm/beta:0' shape=(10, 10, 16) dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/conv2/batch_norm/moving_mean:0' shape=(10, 10, 16) 

In [0]:
le_net_bn_metrics = define_metrics(
    'le_net_with_bn/metrics/',
    variables={
        'target': le_net_target,
        'predictions': le_net_predictions,
        'loss': loss
    }
)

In [41]:
le_net_bn_metrics

{'train_acc': <tf.Tensor 'le_net_with_bn/metrics/train/accuracy/value:0' shape=() dtype=float32>,
 'train_loss': <tf.Tensor 'le_net_with_bn/metrics/train/loss/value:0' shape=() dtype=float32>,
 'train_update_acc': <tf.Tensor 'le_net_with_bn/metrics/train/accuracy/update_op:0' shape=() dtype=float32>,
 'train_update_loss': <tf.Tensor 'le_net_with_bn/metrics/train/loss/update_op:0' shape=() dtype=float32>,
 'val_acc': <tf.Tensor 'le_net_with_bn/metrics/val/accuracy/value:0' shape=() dtype=float32>,
 'val_loss': <tf.Tensor 'le_net_with_bn/metrics/val/loss/value:0' shape=() dtype=float32>,
 'val_update_acc': <tf.Tensor 'le_net_with_bn/metrics/val/accuracy/update_op:0' shape=() dtype=float32>,
 'val_update_loss': <tf.Tensor 'le_net_with_bn/metrics/val/loss/update_op:0' shape=() dtype=float32>}

In [42]:
# Check that data is ready

sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
sess.run([loss, le_net_bn_metrics['train_acc']], feed_dict={
    digits_placeholder: X_train[:10],
    labels_placeholder: y_train_labels[:10]
})

[22.364988, 0.0]

In [43]:
sess.run([le_net_bn_updates], feed_dict={
    digits_placeholder: X_train[:10],
    labels_placeholder: y_train_labels[:10],
    le_net_is_training: True
})

[[array([[[-0.16407284,  1.0582566 ,  0.49639064, -0.57666016,
           -1.404595  ,  0.15376146],
          [-0.14000908,  1.0887748 ,  0.42709774, -0.48115355,
           -1.431537  ,  0.19883224],
          [-0.1307742 ,  1.0461471 ,  0.35974836, -0.502143  ,
           -1.5157933 ,  0.25849476],
          ...,
          [-0.10115059,  1.1696378 ,  0.4968646 , -0.47575235,
           -1.6587679 ,  0.22849438],
          [-0.13130185,  1.1651713 ,  0.4736112 , -0.5471004 ,
           -1.6478977 ,  0.21327946],
          [-0.11243341,  1.189932  ,  0.44842914, -0.5278385 ,
           -1.6744676 ,  0.28889567]],
  
         [[-0.15556505,  1.0896169 ,  0.5028192 , -0.50595677,
           -1.4080948 ,  0.11055938],
          [-0.19068243,  1.1225442 ,  0.44356486, -0.49531278,
           -1.4199865 ,  0.13103385],
          [-0.1498605 ,  1.0961143 ,  0.41902775, -0.5402847 ,
           -1.4916962 ,  0.22528479],
          ...,
          [-0.08607972,  1.1639233 ,  0.40153524, -0.4513

In [44]:
tf.local_variables()

[<tf.Variable 'le_net/metrics/train/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/train/loss/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net/metrics/val/loss/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/metrics/train/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/metrics/train/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/metrics/train/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'le_net_with_bn/metrics/train/loss/count:0' shape=() dtype=float32_ref>,
 <tf

In [0]:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

In [46]:
for epoch_num in range(50):
    reset_metrics('le_net_with_bn/metrics/train')
    reset_metrics('le_net_with_bn/metrics/val')
    for X_batch, y_batch in iterate_batches(X_train, y_train_labels, 500):
        _, _, _, _ = sess.run([
            optimizer,
            le_net_bn_metrics['train_update_acc'],
            le_net_bn_metrics['train_update_loss'],
            le_net_bn_updates
        ], feed_dict={
            digits_placeholder: X_batch,
            labels_placeholder: y_batch,
            le_net_is_training: True
        })
        # print(loss_value, accuracy)
    
    print(f'Epoch {epoch_num + 1} train [acc, loss]:', sess.run([
        le_net_bn_metrics['train_acc'],
        le_net_bn_metrics['train_loss']
    ]))
    
    for X_batch, y_batch in iterate_batches(X_test, y_test_labels, 500, shuffle=False):
        _, _ = sess.run([
            le_net_bn_metrics['val_update_acc'],
            le_net_bn_metrics['val_update_loss']
        ], feed_dict = {
            digits_placeholder: X_batch,
            labels_placeholder: y_batch,
            le_net_is_training: False
        })
    print(
        f'Epoch {epoch_num + 1} val [acc, loss]:',
        sess.run([
            le_net_bn_metrics['val_acc'], 
            le_net_bn_metrics['val_loss']
        ])
    )

Epoch 1 train [acc, loss]: [0.12042, 1.5332496]
Epoch 1 val [acc, loss]: [0.1422, 0.3861373]
Epoch 2 train [acc, loss]: [0.15902, 0.343173]
Epoch 2 val [acc, loss]: [0.1616, 0.33117783]
Epoch 3 train [acc, loss]: [0.17058, 0.32882544]
Epoch 3 val [acc, loss]: [0.1645, 0.32951722]
Epoch 4 train [acc, loss]: [0.17964, 0.32515132]
Epoch 4 val [acc, loss]: [0.184, 0.32179418]
Epoch 5 train [acc, loss]: [0.18896, 0.32161912]
Epoch 5 val [acc, loss]: [0.1825, 0.32318428]
Epoch 6 train [acc, loss]: [0.1944, 0.3194045]
Epoch 6 val [acc, loss]: [0.1919, 0.3188106]
Epoch 7 train [acc, loss]: [0.20412, 0.31675336]
Epoch 7 val [acc, loss]: [0.2055, 0.31677675]
Epoch 8 train [acc, loss]: [0.2202, 0.31325755]
Epoch 8 val [acc, loss]: [0.2145, 0.317038]
Epoch 9 train [acc, loss]: [0.25834, 0.30268973]
Epoch 9 val [acc, loss]: [0.2886, 0.30004504]
Epoch 10 train [acc, loss]: [0.28802, 0.2937021]
Epoch 10 val [acc, loss]: [0.2981, 0.29480597]
Epoch 11 train [acc, loss]: [0.3013, 0.28903592]
Epoch 11 va

### AlexNet

In [0]:
def dropout(input_tensor, rate, name='dropout'):
    return tf.nn.dropout(input_tensor, rate=rate, name=name)

In [0]:
tf.reset_default_graph()

In [35]:
sess = tf.InteractiveSession()



In [0]:
def alex_net(input_tensor, dropout_rate, num_classes):
    with tf.variable_scope('alex_net', reuse=tf.AUTO_REUSE):
        conv1 = conv_block(
            input_tensor,
            output_channels=96,
            name='conv1',
            kernel_size=(11, 11),
            strides=(4, 4),
            padding='VALID'
        )
        pool1 = max_pool(
            conv1,
            kernel_size=(3, 3),
            strides=(2, 2),
            padding='VALID',
            name='pool1'
        )
        conv2 = conv_block(
            pool1,
            name='conv2',
            kernel_size=(5, 5),
            output_channels=128,
            strides=(1, 1),
            padding='SAME'
        )
        pool2 = max_pool(
            conv2,
            kernel_size=(3, 3),
            strides=(2, 2),
            padding='VALID',
            name='pool2'
        )
        conv3 = conv_block(
            pool2,
            name='conv3',
            kernel_size=(3, 3),
            output_channels=128,
            strides=(1, 1),
            padding='SAME'
        )
        conv4 = conv_block(
            conv3,
            name='conv4',
            kernel_size=(3, 3),
            output_channels=128,
            strides=(1, 1),
            padding='SAME'
        )
        conv5 = conv_block(
            conv4,
            name='conv5',
            kernel_size=(3, 3),
            output_channels=128,
            strides=(1, 1),
            padding='SAME'
        )
        pool5 = max_pool(
            conv5,
            kernel_size=(3, 3),
            strides=(2, 2),
            padding='VALID',
            name='pool5'
        )
        
        flattened = flatten(pool5)
        
        dropout1 = dropout(flattened, dropout_rate, name='dropout1')
        
        dense1 = dense(dropout1, 4096, name='fc1')
        
        dropout2 = dropout(dense1, dropout_rate, name='dropout2')
        dense2 = dense(dropout2, 4096, name='fc2')
        
        logits = dense(dense2, num_classes, name='logits')
    return logits

In [37]:
alex_net_input = tf.placeholder(tf.float32, [None, 224, 224, 3])
alex_net_dropout_rate = tf.placeholder(tf.float32, [])
alex_net_logits = alex_net(alex_net_input, alex_net_dropout_rate, num_classes=5)

3 96
96 128
128 128
128 128
128 128


In [38]:
%%time
sess.run(tf.global_variables_initializer())
sess.run(alex_net_logits, feed_dict={
    alex_net_input: np.zeros((1, 224, 224, 3)),
    alex_net_dropout_rate: 0.5
}).shape

CPU times: user 52.3 ms, sys: 9.78 ms, total: 62 ms
Wall time: 58 ms


In [39]:
alex_net_placeholder = tf.placeholder(tf.float32, [None, 224, 224, 3], name='alex_net_placeholder')
alex_net_is_training = tf.placeholder(tf.bool, [])
alex_net_dropout_rate = tf.placeholder(tf.float32, [])

alex_net_logits = alex_net(
    alex_net_placeholder,
    dropout_rate=alex_net_dropout_rate,
    num_classes=5
)

alex_net_labels_placeholder = tf.placeholder(tf.float32, [None, 5], name='alex_net_labels')


alex_net_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(
        labels=alex_net_labels_placeholder,
        logits=alex_net_logits
    ),
    name='alex_net_loss'
)

3 96
96 128
128 128
128 128
128 128


In [0]:
alex_net_predictions = tf.argmax(alex_net_logits, axis=1)
alex_net_target = tf.argmax(alex_net_labels_placeholder, axis=1)

In [0]:
alex_net_metrics = define_metrics(
    'alex_net/metrics/',
    variables={
        'target': alex_net_target,
        'predictions': alex_net_predictions,
        'loss': alex_net_loss
    }
)

In [0]:
alex_net_trainable_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='alex_net')
optimizer = tf.train.AdamOptimizer(
    learning_rate=0.001
).minimize(
    alex_net_loss,
    var_list=alex_net_trainable_variables
)



In [43]:
alex_net_trainable_variables

[<tf.Variable 'alex_net/conv1/conv/weights:0' shape=(11, 11, 3, 96) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv1/conv/bias:0' shape=(96,) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv2/conv/weights:0' shape=(5, 5, 96, 128) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv2/conv/bias:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv3/conv/weights:0' shape=(3, 3, 128, 128) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv3/conv/bias:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv4/conv/weights:0' shape=(3, 3, 128, 128) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv4/conv/bias:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv5/conv/weights:0' shape=(3, 3, 128, 128) dtype=float32_ref>,
 <tf.Variable 'alex_net/conv5/conv/bias:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'alex_net/fc1/weights:0' shape=(3200, 4096) dtype=float32_ref>,
 <tf.Variable 'alex_net/fc1/bias:0' shape=(4096,) dtype=float32_ref>,
 <tf.Variable 'alex_net/fc2/we

### Загрузка данных

In [0]:
from google.colab import drive

In [45]:
drive.mount('/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /drive


In [0]:
DATASET_FOLDER = "/drive/My Drive/Datasets/flowers_resized"

In [0]:
import os

CLASS_INDICES = {
    class_name: index for index, class_name in enumerate(os.listdir(DATASET_FOLDER))
}

In [48]:
CLASS_INDICES

{'daisy': 2, 'dandelion': 0, 'rose': 4, 'sunflower': 1, 'tulip': 3}

In [0]:
def get_flower_paths():
    paths = []
    indices = []
    
    for class_name in sorted(os.listdir(DATASET_FOLDER)):
        class_folder = os.path.join(DATASET_FOLDER, class_name)
        
        for filename in sorted(os.listdir(class_folder)):
            if not filename.endswith('jpg'):
                continue
            path = os.path.join(class_folder, filename)
            indices.append(CLASS_INDICES[class_name])
            paths.append(path)
    
    return paths, indices

In [0]:
flower_paths, flower_indices = get_flower_paths()

In [0]:
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

In [0]:
flower_y = to_categorical(flower_indices)

In [53]:
flower_y.shape

(4323, 5)

In [0]:
flower_paths_train, flower_paths_val, flower_y_train, flower_y_val = train_test_split(
    flower_paths,
    flower_y,
    random_state=42,
    test_size=0.2
)

In [0]:
flower_paths_train = list(filter(lambda x: os.path.exists(x), flower_paths_train))

In [0]:
flower_paths_val = list(filter(lambda x: os.path.exists(x), flower_paths_val))

In [57]:
len(flower_paths_train)

3458

In [0]:
from tqdm import tqdm

In [0]:
from joblib import Parallel, delayed

In [0]:
import cv2
%matplotlib inline
import matplotlib.pyplot as plt

def process_image(filename):
    img = cv2.imread(filename)
    return img / 127.5 - 1.0

In [61]:
flowers_x_train = Parallel(n_jobs=8, verbose=10)(delayed(process_image)(path) for path in flower_paths_train)
flowers_x_train = np.concatenate([np.expand_dims(img, 0) for img in flowers_x_train])

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   2 tasks      | elapsed:    1.3s
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    1.6s
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:    1.8s
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    2.1s
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    2.4s
[Parallel(n_jobs=8)]: Done  45 tasks      | elapsed:    2.8s
[Parallel(n_jobs=8)]: Done  56 tasks      | elapsed:    3.1s
[Parallel(n_jobs=8)]: Done  69 tasks      | elapsed:    3.5s
[Parallel(n_jobs=8)]: Done  82 tasks      | elapsed:    3.9s
[Parallel(n_jobs=8)]: Done  97 tasks      | elapsed:    4.4s
[Parallel(n_jobs=8)]: Done 112 tasks      | elapsed:    4.9s
[Parallel(n_jobs=8)]: Done 129 tasks      | elapsed:    5.5s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    6.0s
[Parallel(n_jobs=8)]: Done 165 tasks      | elapsed:    6.8s
[Parallel(n_jobs=8)]: Done 184 tasks      | elapsed:    7.3s
[Parallel(

In [62]:
flowers_x_val = Parallel(n_jobs=8, verbose=10)(delayed(process_image)(path) for path in flower_paths_val)
flowers_x_val = np.concatenate([np.expand_dims(img, 0) for img in flowers_x_val])

[Parallel(n_jobs=8)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done   2 tasks      | elapsed:    0.3s
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.5s
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:    0.7s
[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    0.9s
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    1.3s
[Parallel(n_jobs=8)]: Done  45 tasks      | elapsed:    1.6s
[Parallel(n_jobs=8)]: Done  56 tasks      | elapsed:    2.0s
[Parallel(n_jobs=8)]: Done  69 tasks      | elapsed:    2.5s
[Parallel(n_jobs=8)]: Done  82 tasks      | elapsed:    2.8s
[Parallel(n_jobs=8)]: Done  97 tasks      | elapsed:    3.3s
[Parallel(n_jobs=8)]: Done 112 tasks      | elapsed:    3.8s
[Parallel(n_jobs=8)]: Done 129 tasks      | elapsed:    4.5s
[Parallel(n_jobs=8)]: Done 146 tasks      | elapsed:    5.0s
[Parallel(n_jobs=8)]: Done 165 tasks      | elapsed:    5.5s
[Parallel(n_jobs=8)]: Done 184 tasks      | elapsed:    6.3s
[Parallel(

In [0]:
def iterate_batches(X, y, batch_size, shuffle=True):
    assert len(X) == len(y)
    
    indices = np.arange(len(X))
    if shuffle:
        np.random.shuffle(indices)
    
    for index in range(0, len(X), batch_size):
        yield X[indices[index:index + batch_size]], y[indices[index:index + batch_size]]

In [0]:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

In [65]:
tf.local_variables()

[<tf.Variable 'alex_net/metrics/train/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'alex_net/metrics/train/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'alex_net/metrics/train/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'alex_net/metrics/train/loss/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'alex_net/metrics/val/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'alex_net/metrics/val/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'alex_net/metrics/val/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'alex_net/metrics/val/loss/count:0' shape=() dtype=float32_ref>]

In [98]:
for epoch_num in range(30):
    reset_metrics('alex_net/metrics/train')
    reset_metrics('alex_net/metrics/val')
    for X_batch, y_batch in iterate_batches(flowers_x_train, flower_y_train, 64):
        
        _, _, _, ll = sess.run([
            optimizer,
            alex_net_metrics['train_update_loss'],
            alex_net_metrics['train_update_acc'],
            alex_net_loss
        ], feed_dict={
            alex_net_placeholder: X_batch,
            alex_net_labels_placeholder: y_batch,
            alex_net_dropout_rate: 0.5
        })
#         print(ll)
#         print(loss_value, accuracy)
    
    print(f'Epoch {epoch_num + 1} train [acc, loss]:', sess.run([
        alex_net_metrics['train_acc'],
        alex_net_metrics['train_loss']
    ]))
    
    for X_batch, y_batch in iterate_batches(flowers_x_val, flower_y_val, 64, shuffle=False):
        _, _ = sess.run([
            alex_net_metrics['val_update_loss'],
            alex_net_metrics['val_update_acc']
        ], feed_dict = {
            alex_net_placeholder: X_batch,
            alex_net_labels_placeholder: y_batch,
            alex_net_dropout_rate: 0.0
        })
    print(
        f'Epoch {epoch_num + 1} val [acc, loss]:',
        sess.run([
            alex_net_metrics['val_acc'],
            alex_net_metrics['val_loss']
        ])
    )

Epoch 1 train [acc, loss]: [0.23163678, 0.65364194]
Epoch 1 val [acc, loss]: [0.2531792, 0.50296575]
Epoch 2 train [acc, loss]: [0.23019086, 0.50425833]
Epoch 2 val [acc, loss]: [0.28208092, 0.48282388]
Epoch 3 train [acc, loss]: [0.26980913, 0.49187708]
Epoch 3 val [acc, loss]: [0.28323698, 0.4893869]
Epoch 4 train [acc, loss]: [0.2987276, 0.49007678]
Epoch 4 val [acc, loss]: [0.30404624, 0.4807075]
Epoch 5 train [acc, loss]: [0.29728165, 0.47487703]
Epoch 5 val [acc, loss]: [0.33872834, 0.47995716]
Epoch 6 train [acc, loss]: [0.3857721, 0.45324466]
Epoch 6 val [acc, loss]: [0.39537573, 0.5010353]
Epoch 7 train [acc, loss]: [0.4537305, 0.43031052]
Epoch 7 val [acc, loss]: [0.3549133, 0.47957882]
Epoch 8 train [acc, loss]: [0.45951417, 0.42228708]
Epoch 8 val [acc, loss]: [0.50289017, 0.42467302]
Epoch 9 train [acc, loss]: [0.47310585, 0.4144768]
Epoch 9 val [acc, loss]: [0.47861272, 0.41729823]
Epoch 10 train [acc, loss]: [0.52718335, 0.38885385]
Epoch 10 val [acc, loss]: [0.5283237, 

# ResNet 

In [0]:
from typing import List


def resnet_block(
    input_tensor,
    kernel_size,
    filters: List[int],
    is_training,
    strides=(2, 2),
    name='resnet_block'
):
    with tf.variable_scope(f'{name}', reuse=tf.AUTO_REUSE):
        conv1 = conv_layer(
            input_tensor,
            filters[0],
            name=f'conv1',
            kernel_size=kernel_size,
            strides=strides,
        )
        
        bn1, bn1_updates = batch_norm(conv1, is_training, name='bn1')
        
        relu1 = tf.nn.relu(bn1, name='relu1')
        
        
        conv2 = conv_layer(
            relu1,
            filters[1],
            name='conv2',
            kernel_size=kernel_size,
            strides=(1, 1)
        )
        
        bn2, bn2_updates = batch_norm(conv2, is_training, name='bn2')
        
        
        shortcut = conv_layer(
            input_tensor,
            filters[1],
            name='shortcut',
            kernel_size=(1, 1),
            strides=strides
        )
        
        bn_shortcut, bn_shortcut_updates = batch_norm(shortcut, is_training, name='bn_shortcut')
        
        
        output = bn2 + bn_shortcut
        
        
        output_activated = tf.nn.relu(output, name='relu')
    
    return output_activated, bn1_updates + bn2_updates + bn_shortcut_updates
        

In [0]:
class BatchNormNetwork:
    
    def __init__(self):
        self.updates = []
        self.is_training = tf.placeholder(tf.bool, shape=[])
        
    def add_block(self, build_fn):
        output_tensor, output_updates = build_fn()
        
        self.updates.extend(output_updates)
        return output_tensor



In [0]:
def global_average_pooling(input_tensor, name):
    return tf.reduce_mean(input_tensor, axis=(1, 2), name=name)

In [0]:
a = tf.placeholder(tf.float32, [1, 32, 32, 8])

b = global_average_pooling(a, name='test_global_avg_pooling')

In [70]:
sess.run(b, feed_dict={a: np.zeros((1, 32, 32, 8))}).shape

(1, 8)

In [0]:
class ResNet18(BatchNormNetwork):
    
    def __init__(self, input_tensor, num_classes):
        super().__init__()
        
        self.input_tensor = input_tensor
        self.num_classes = num_classes
        
        self.build_network()
    
        
    def build_network(self):
        with tf.variable_scope('resnet18', reuse=tf.AUTO_REUSE):

            self.block1 = self.add_block(
                lambda: conv_block_with_bn(
                    self.input_tensor,
                    output_channels=64,
                    is_training=self.is_training,
                    name='conv1',
                    strides=(2, 2)
                )
            )
            
            self.resnet_block1 = self.build_resnet_block(
                self.block1,
                output_filters=64,
                name='resnet_block1'
            )
            
            self.resnet_block2 = self.build_resnet_block(
                self.resnet_block1,
                output_filters=128,
                name='resnet_block2'
            )
            
            self.resnet_block3 = self.build_resnet_block(
                self.resnet_block2,
                output_filters=256,
                name='resnet_block3'
            )
            
            self.resnet_block4 = self.build_resnet_block(
                self.resnet_block3,
                output_filters=512,
                name='resnet_block4'
            )
            
            self.avg_pool = global_average_pooling(
                self.resnet_block4,
                name='global_avg_pool'
            )
            
            self.fc = dense(
                self.avg_pool,
                self.num_classes,
                name='fc'
            )
            
    def build_resnet_block(self, input_tensor, output_filters, name):
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
            block1 = self.add_block(
                lambda: resnet_block(
                    input_tensor,
                    filters=[output_filters, output_filters],
                    name='branch1',
                    is_training=self.is_training,
                    strides=(2,2),
                    kernel_size=(3, 3)
                )
            )
            
            block2 = self.add_block(
                lambda: resnet_block(
                    block1,
                    filters=[output_filters, output_filters],
                    name='branch2',
                    is_training=self.is_training,
                    strides=(1, 1),
                    kernel_size=(3, 3)
                )
            )
        return block2
    
    def get_trainable_variables(self):
        return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='resnet18')
    
    def get_logits(self):
        return self.fc

In [0]:
class LossMeter:
    def __init__(self, network, inputs, labels, name):
        self.network = network
        self.inputs = inputs
        self.labels = labels
        
        self.logits = self.network.get_logits()
        self.name = name
        
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
            self.loss = tf.reduce_mean(
                tf.nn.sigmoid_cross_entropy_with_logits(
                    labels=self.labels,
                    logits=self.logits
                )
            )
            
            self.predictions = tf.argmax(self.logits, axis=1)
            self.target = tf.argmax(self.labels, axis=1)
            
            self.metrics = define_metrics(
                f'{name}/metrics/',
                variables={
                    'target': self.target,
                    'predictions': self.predictions,
                    'loss': self.loss
                }
            )
    
            self.optimizer = tf.train.AdamOptimizer(
                learning_rate=0.0001
            ).minimize(
                self.loss,
                var_list=self.network.get_trainable_variables()
            )
        
    def reset_metrics(self):
        reset_metrics(f'{self.name}/metrics/train')
        reset_metrics(f'{self.name}/metrics/val')

In [73]:
tf.reset_default_graph()
sess = tf.InteractiveSession()



In [0]:
a = tf.placeholder_with_default(np.zeros((10, 224, 224, 3), dtype=np.float32), shape=(None, 224, 224, 3))
b = tf.placeholder_with_default(np.zeros((10, 5), dtype=np.float32), shape=(None, 5))

In [75]:
net = ResNet18(a, num_classes=5)

3 64
64 64
64 64
64 64
64 64
64 64
64 64
64 128
128 128
64 128
128 128
128 128
128 128
128 256
256 256
128 256
256 256
256 256
256 256
256 512
512 512
256 512
512 512
512 512
512 512


In [0]:
loss_meter = LossMeter(net, a, b, name='resnet18')

In [77]:
tf.local_variables()

[<tf.Variable 'resnet18/metrics/train/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'resnet18/metrics/train/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'resnet18/metrics/train/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'resnet18/metrics/train/loss/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'resnet18/metrics/val/accuracy/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'resnet18/metrics/val/accuracy/count:0' shape=() dtype=float32_ref>,
 <tf.Variable 'resnet18/metrics/val/loss/total:0' shape=() dtype=float32_ref>,
 <tf.Variable 'resnet18/metrics/val/loss/count:0' shape=() dtype=float32_ref>]

In [78]:
!nproc

4


In [79]:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

for epoch_num in range(30):
    loss_meter.reset_metrics()
    for X_batch, y_batch in iterate_batches(flowers_x_train, flower_y_train, 64):
        _, _, _, _ = sess.run([
            loss_meter.optimizer,
            loss_meter.metrics['train_update_acc'],
            loss_meter.metrics['train_update_loss'],
            net.updates
        ], feed_dict={
            loss_meter.inputs: X_batch,
            loss_meter.labels: y_batch,
            net.is_training: True
        })
        
        loss_value, accuracy = sess.run([
            loss_meter.metrics['train_acc'],
            loss_meter.metrics['train_loss']
        ])
#         print(loss_value, accuracy)
    
    print(f'Epoch {epoch_num + 1} train [acc, loss]:', sess.run([
        loss_meter.metrics['train_acc'],
        loss_meter.metrics['train_loss']
    ]))
    
    for X_batch, y_batch in iterate_batches(flowers_x_val, flower_y_val, 64, shuffle=False):
        _, _ = sess.run([
            loss_meter.metrics['val_update_acc'],
            loss_meter.metrics['val_update_loss']
        ], feed_dict = {
            loss_meter.inputs: X_batch,
            loss_meter.labels: y_batch,
            net.is_training: False
        })
    print(
        f'Epoch {epoch_num + 1} val [acc, loss]:',
        sess.run([
            loss_meter.metrics['val_acc'], 
            loss_meter.metrics['val_loss']
        ])
    )

Epoch 1 train [acc, loss]: [0.47715443, 0.42476192]
Epoch 1 val [acc, loss]: [0.2786127, 0.5649623]
Epoch 2 train [acc, loss]: [0.5902256, 0.3478902]
Epoch 2 val [acc, loss]: [0.2786127, 0.86159575]
Epoch 3 train [acc, loss]: [0.6194332, 0.33121774]
Epoch 3 val [acc, loss]: [0.2786127, 0.8962528]
Epoch 4 train [acc, loss]: [0.63042223, 0.32702613]
Epoch 4 val [acc, loss]: [0.2786127, 1.0483978]
Epoch 5 train [acc, loss]: [0.6341816, 0.32974702]
Epoch 5 val [acc, loss]: [0.2797688, 0.8316002]
Epoch 6 train [acc, loss]: [0.64083284, 0.32595542]
Epoch 6 val [acc, loss]: [0.2867052, 1.064349]
Epoch 7 train [acc, loss]: [0.64285713, 0.32021117]
Epoch 7 val [acc, loss]: [0.3017341, 0.89284617]
Epoch 8 train [acc, loss]: [0.6474841, 0.320752]
Epoch 8 val [acc, loss]: [0.3479769, 0.6585222]
Epoch 9 train [acc, loss]: [0.6376518, 0.32781297]
Epoch 9 val [acc, loss]: [0.4647399, 0.4748876]
Epoch 10 train [acc, loss]: [0.65442455, 0.31295115]
Epoch 10 val [acc, loss]: [0.5653179, 0.39214614]
Epoc

KeyboardInterrupt: ignored

## Inception

Основные концепции:

* Используем вместо residual слоев набор слоев разного вида
* Используем дополнительные классификаторы с меньшей функцией потерь для быстрого прохода градиентов
* (v2+, упражнение) Даем каждому классу вероятность $\alpha$, чтобы кросс-энтропия не скатывалась в нуль 

In [0]:
from collections import namedtuple

FilterSpec = namedtuple('FilterSpec', [
    'filters_1x1',
    'filters_3x3',
    'filters_3x3_reduce',
    'filters_5x5',
    'filters_5x5_reduce',
    'filters_pool'
])


class InceptionV1Network:
    def __init__(self, input_tensor, num_classes):
        self.input_tensor = input_tensor
        
        self.num_classes = num_classes
        
        self.build_network()
        
    def build_network(self):
        with tf.variable_scope('inception_v1', reuse=tf.AUTO_REUSE):
            self.conv1 = conv_block(
                self.input_tensor,
                output_channels=64,
                name='conv1',
                kernel_size=(7, 7),
                strides=(2, 2)
            )
            
            self.pool1 = max_pool(
                self.conv1,
                kernel_size=(3, 3),
                strides=(2, 2),
                name='pool1'
            )
            
            self.conv2a = conv_block(
                self.pool1,
                output_channels=64,
                kernel_size=(3, 3),
                strides=(1, 1),
                name='conv2a'
            )
            
            self.conv2b = conv_block(
                self.conv2a,
                output_channels=192,
                kernel_size=(3, 3),
                strides=(1, 1),
                name='conv2b'
            )
            
            self.pool2 = max_pool(
                self.conv2b,
                kernel_size=(3, 3),
                strides=(2, 2),
                name='pool2'
            )
            
            self.block3a = self.inception_block(
                self.pool2,
                FilterSpec(
                    filters_1x1=64,
                    filters_3x3_reduce=96,
                    filters_3x3=128,
                    filters_5x5_reduce=16,
                    filters_5x5=32,
                    filters_pool=32,
                ),
                name='block3a'
            )
            
            self.block3b = self.inception_block(
                self.block3a,
                FilterSpec(
                    filters_1x1=128,
                    filters_3x3_reduce=128,
                    filters_3x3=192,
                    filters_5x5_reduce=32,
                    filters_5x5=96,
                    filters_pool=64
                ),
                name='block3b'
            )
            
            self.pool3 = max_pool(
                self.block3b,
                kernel_size=(3, 3),
                strides=(2, 2),
                name='pool3'
            )
            
            
            self.block4a = self.inception_block(
                self.pool3,
                FilterSpec(
                    filters_1x1=192,
                    filters_3x3_reduce=96,
                    filters_3x3=208,
                    filters_5x5_reduce=16,
                    filters_5x5=48,
                    filters_pool=64
                ),
                name='block4a'
            )
            
            self.block4b = self.inception_block(
                self.block4a,
                FilterSpec(
                    filters_1x1=160,
                    filters_3x3_reduce=112,
                    filters_3x3=224,
                    filters_5x5_reduce=24,
                    filters_5x5=64,
                    filters_pool=64
                ),
                name='block4b'
            )
            
            self.block4c = self.inception_block(
                self.block4b,
                FilterSpec(
                    filters_1x1=128,
                    filters_3x3_reduce=128,
                    filters_3x3=256,
                    filters_5x5_reduce=24,
                    filters_5x5=64,
                    filters_pool=64
                ),
                name='block4c'
            )
            
            self.block4d = self.inception_block(
                self.block4c,
                FilterSpec(
                    filters_1x1=112,
                    filters_3x3_reduce=144,
                    filters_3x3=288,
                    filters_5x5_reduce=32,
                    filters_5x5=64,
                    filters_pool=64
                ),
                name='block4d'
            )
            
            self.block4e = self.inception_block(
                self.block4d,
                FilterSpec(
                    filters_1x1=256,
                    filters_3x3_reduce=160,
                    filters_3x3=320,
                    filters_5x5_reduce=32,
                    filters_5x5=128,
                    filters_pool=128
                ),
                name='block4e'
            )
            
            self.pool4 = max_pool(
                self.block4e,
                kernel_size=(3, 3),
                strides=(2, 2),
                name='pool4'
            )
            
            self.block5a = self.inception_block(
                self.pool4,
                FilterSpec(
                    filters_1x1=256,
                    filters_3x3_reduce=160,
                    filters_3x3=320,
                    filters_5x5_reduce=32,
                    filters_5x5=128,
                    filters_pool=128
                ),
                name='block5a'
            )
            
            self.block5b = self.inception_block(
                self.block5a,
                FilterSpec(
                    filters_1x1=384,
                    filters_3x3_reduce=192,
                    filters_3x3=384,
                    filters_5x5_reduce=48,
                    filters_5x5=128,
                    filters_pool=128
                ),
                name='block5b'
            )
            
            self.fc = self.build_classifier(
                self.block5b,
                kernel_size=(7, 7),
                strides=(7, 7),
                name='classifier',
                aux=False
            )
            
            self.aux1 = self.build_classifier(
                self.block4a,
                kernel_size=(5, 5),
                strides=(3, 3),
                name='aux1',
                aux=True
            )
            
            self.aux2 = self.build_classifier(
                self.block4d,
                kernel_size=(5, 5),
                strides=(3, 3),
                name='aux2',
                aux=True
            )
            
    def inception_block(self, input_tensor, filter_spec: FilterSpec, name):
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
            conv_1x1 = conv_block(
                input_tensor,
                output_channels=filter_spec.filters_1x1,
                kernel_size=(1, 1),
                name='conv_1x1',
            )
            
            conv_3x3_reduce = conv_block(
                input_tensor,
                output_channels=filter_spec.filters_3x3_reduce,
                kernel_size=(1, 1),
                name='conv_3x3_reduce',
            )
            
            conv_3x3 = conv_block(
                conv_3x3_reduce,
                output_channels=filter_spec.filters_3x3,
                kernel_size=(3, 3),
                name='conv_3x3'
            )
            
            conv_5x5_reduce = conv_block(
                input_tensor,
                output_channels=filter_spec.filters_5x5_reduce,
                kernel_size=(1, 1),
                name='conv_5x5_reduce'
            )
            
            conv_5x5 = conv_block(
                conv_5x5_reduce,
                output_channels=filter_spec.filters_5x5,
                name='conv_5x5'
            )
            
            pool_reduce = max_pool(
                input_tensor,
                kernel_size=(3, 3),
                strides=(1, 1),
                name='pool_reduce'
            )
            
            pool = conv_block(
                pool_reduce,
                output_channels=filter_spec.filters_pool,
                name='pool'
            )
            
            output = tf.concat(
                [conv_1x1, conv_3x3, conv_5x5, pool],
                axis=3, 
                name='concat'
            )
        return output
    
    def build_classifier(self, input_tensor, kernel_size, strides, name, aux=True):
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
            pool = avg_pool(
                input_tensor,
                kernel_size=kernel_size,
                strides=strides,
                name='pool'
            )
            
            if aux:
                conv = conv_block(
                    pool,
                    output_channels=128,
                    kernel_size=(1, 1),
                    strides=(1, 1),
                    name='conv'
                )
                
                flatten_out = flatten(conv)
                
                fc1 = dense(flatten_out, 1024, name='fc1')
                relu1 = tf.nn.relu(fc1, name='relu1')
                
                fc = dense(relu1, self.num_classes, name='fc')
                
            else:
                flatten_out = flatten(pool)
                fc = dense(flatten_out, self.num_classes, name='fc')
            
            return fc
     
    def get_logits(self):
        return self.fc
    
    def get_trainable_variables(self):
        return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='inception_v1')

In [114]:
tf.reset_default_graph()

sess = tf.InteractiveSession()



In [0]:
a = tf.placeholder_with_default(np.zeros((10, 224, 224, 3), dtype=np.float32), shape=(None, 224, 224, 3))
b = tf.placeholder_with_default(np.zeros((10, 5), dtype=np.float32), shape=(None, 5))

In [116]:
net = InceptionV1Network(a, num_classes=5)

3 64
64 64
64 192
192 64
192 96
96 128
192 16
16 32
192 32
256 128
256 128
128 192
256 32
32 96
256 64
480 192
480 96
96 208
480 16
16 48
480 64
512 160
512 112
112 224
512 24
24 64
512 64
512 128
512 128
128 256
512 24
24 64
512 64
512 112
512 144
144 288
512 32
32 64
512 64
528 256
528 160
160 320
528 32
32 128
528 128
832 256
832 160
160 320
832 32
32 128
832 128
832 384
832 192
192 384
832 48
48 128
832 128
512 128
528 128


In [0]:
sess.run(tf.global_variables_initializer())

In [0]:
class LossMeterInception:
    def __init__(self, network, inputs, labels, name):
        self.network = network
        self.inputs = inputs
        self.labels = labels
        
        self.logits = self.network.get_logits()
        self.name = name
        
        with tf.variable_scope(name, reuse=tf.AUTO_REUSE):
            self.loss = tf.reduce_mean(
                tf.nn.sigmoid_cross_entropy_with_logits(
                    labels=self.labels,
                    logits=self.logits
                )
            ) + 0.3 * tf.reduce_mean(
                tf.nn.sigmoid_cross_entropy_with_logits(
                    labels=self.labels,
                    logits=self.network.aux1
                )
            ) + 0.3 * tf.reduce_mean(
                tf.nn.sigmoid_cross_entropy_with_logits(
                    labels=self.labels,
                    logits=self.network.aux2
                )
            )
            
            self.predictions = tf.argmax(self.logits, axis=1)
            self.target = tf.argmax(self.labels, axis=1)
            
            self.metrics = define_metrics(
                f'{name}/metrics/',
                variables={
                    'target': self.target,
                    'predictions': self.predictions,
                    'loss': self.loss
                }
            )
    
            self.optimizer = tf.train.AdamOptimizer(
                learning_rate=0.0001
            ).minimize(
                self.loss,
                var_list=self.network.get_trainable_variables()
            )
        
    def reset_metrics(self):
        reset_metrics(f'{self.name}/metrics/train')
        reset_metrics(f'{self.name}/metrics/val')

In [0]:
loss_meter = LossMeterInception(net, a, b, name='inception_v1')

In [120]:
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

for epoch_num in range(30):
    loss_meter.reset_metrics()
    for X_batch, y_batch in iterate_batches(flowers_x_train, flower_y_train, 64):
        _, _, _ = sess.run([
            loss_meter.optimizer,
            loss_meter.metrics['train_update_acc'],
            loss_meter.metrics['train_update_loss']
        ], feed_dict={
            loss_meter.inputs: X_batch,
            loss_meter.labels: y_batch
        })
        
        loss_value, accuracy = sess.run([
            loss_meter.metrics['train_acc'],
            loss_meter.metrics['train_loss']
        ])
#         print(loss_value, accuracy)
    
    print(f'Epoch {epoch_num + 1} train [acc, loss]:', sess.run([
        loss_meter.metrics['train_acc'],
        loss_meter.metrics['train_loss']
    ]))
    
    for X_batch, y_batch in iterate_batches(flowers_x_val, flower_y_val, 64, shuffle=False):
        _, _ = sess.run([
            loss_meter.metrics['val_update_acc'],
            loss_meter.metrics['val_update_loss']
        ], feed_dict = {
            loss_meter.inputs: X_batch,
            loss_meter.labels: y_batch,
        })
    print(
        f'Epoch {epoch_num + 1} val [acc, loss]:',
        sess.run([
            loss_meter.metrics['val_acc'], 
            loss_meter.metrics['val_loss']
        ])
    )

Epoch 1 train [acc, loss]: [0.22151533, 0.84626544]
Epoch 1 val [acc, loss]: [0.20462428, 0.80351466]
Epoch 2 train [acc, loss]: [0.3039329, 0.7399973]
Epoch 2 val [acc, loss]: [0.3583815, 0.70096]
Epoch 3 train [acc, loss]: [0.43001735, 0.637465]
Epoch 3 val [acc, loss]: [0.4231214, 0.6336546]
Epoch 4 train [acc, loss]: [0.4612493, 0.6044609]
Epoch 4 val [acc, loss]: [0.38034683, 0.7307488]
Epoch 5 train [acc, loss]: [0.4569115, 0.6113557]
Epoch 5 val [acc, loss]: [0.46127167, 0.6256698]
Epoch 6 train [acc, loss]: [0.5127241, 0.5815573]
Epoch 6 val [acc, loss]: [0.48901734, 0.59670895]
Epoch 7 train [acc, loss]: [0.5433777, 0.5421281]
Epoch 7 val [acc, loss]: [0.50057805, 0.64253205]
Epoch 8 train [acc, loss]: [0.61538464, 0.5027179]
Epoch 8 val [acc, loss]: [0.48901734, 0.65638846]
Epoch 9 train [acc, loss]: [0.60179293, 0.5070995]
Epoch 9 val [acc, loss]: [0.5919075, 0.5451559]
Epoch 10 train [acc, loss]: [0.6486408, 0.46504346]
Epoch 10 val [acc, loss]: [0.66589594, 0.47146264]
Epo

# Упражнения

* Примените методику аугментаций для тренировочного сета. Изменилось ли качество полученных сетей?

* batch normalization не дает четкой картины сходимости, попробуйте построить в tensorboard гистограммы сходимостей распределения векторов `moving_mean` и `moving_var`.