
# Batch Normalization

## Imports and settings

In [52]:
%matplotlib inline
import matplotlib.pyplot as plot
from IPython import display

import sys
import numpy as np
import numpy.random as nr

import keras
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization
from keras.initializers import Constant

from keras import backend as K

print('Keras ', keras.__version__)

np.set_printoptions(precision=2, suppress=True)
nr.seed(23456)


Keras  2.0.4


## The algorithm

For a description of the algorithm, see the original paper:
[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf) 

### Equations
\begin{align*} 
\boldsymbol{\mu} &= \frac{1}{m} \sum_{i = 1}^{m} \boldsymbol{x_i} &&
\boldsymbol{\sigma^{2}} = \frac{1}{m} \sum_{i = 1}^{m} (\boldsymbol{x_i} - \boldsymbol{\mu})^{2}
\\[3mm]
\hat{\boldsymbol{x_i}} &= \frac{\boldsymbol{x_i} - \boldsymbol{\mu}}{\sqrt{\boldsymbol{\sigma^{2}} + \epsilon}} &&
\boldsymbol{y_i} = \gamma \hat{\boldsymbol{x_i}} + \beta
\end{align*}
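
As a minimal sketch of the four equations above, the forward pass can be written directly in NumPy. The function name `batch_norm_forward` and the sample values for `gamma` and `beta` are illustrative assumptions, not part of Keras.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, epsilon=1e-4):
    """Apply the equations above to a mini-batch x of shape (m, features)."""
    mu = x.mean(axis=0)                        # per-feature mean over the m samples
    var = ((x - mu) ** 2).mean(axis=0)         # per-feature (biased) variance
    x_hat = (x - mu) / np.sqrt(var + epsilon)  # normalized activations
    return gamma * x_hat + beta                # scale by gamma, shift by beta

rng = np.random.RandomState(0)
x = rng.rand(10, 5) * 20                       # illustrative mini-batch
y = batch_norm_forward(x, gamma=np.full(5, 2.0), beta=np.full(5, 3.0))

print(y.mean(axis=0))  # close to beta
print(y.std(axis=0))   # close to gamma (slightly smaller because of epsilon)
```

After normalization, each output column has a mean equal to `beta` and a standard deviation close to `gamma`, which is the pattern the Keras cells later in this notebook reproduce.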

### Operations graph
<table align='left'>
<tr><td> <img src="https://drive.google.com/uc?id=0By1KMDFVxsI2ZGN6eWhCeTJSMDg"> </td></tr>
</table>

## Keras: parameters

### Two-dimensional input

Usually, when the layer input has two dimensions (samples and features), normalization is applied along the feature dimension, *axis=1*. That is, the statistics (mean and variance) are computed for each column of the data matrix, in each *mini-batch*.
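
In Keras, `axis` names the feature axis, the one that keeps its own statistics; the reduction runs over the remaining axes. The NumPy sketch below illustrates this for a small made-up matrix and shows where the parameter count of the layer comes from. The data and variable names are assumptions for illustration only.

```python
import numpy as np

# Illustrative data: 4 samples (rows) and 5 features (columns).
x = np.arange(20, dtype=float).reshape(4, 5)

# Keeping axis=1 (features) means reducing over axis 0 (samples):
# one mean and one variance per column.
mean = x.mean(axis=0)
var = x.var(axis=0)
print(mean.shape, var.shape)

# The layer stores four vectors of that length:
# gamma, beta, moving_mean and moving_variance.
n_params = 4 * x.shape[1]
print(n_params)
```

With 5 features this gives 20 parameters, half of them trainable (`gamma` and `beta`).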

In [53]:
model = Sequential()
model.add(BatchNormalization(axis=1, input_shape=(5,), momentum=0.9, epsilon=0.0001, 
                             gamma_initializer=Constant(10), 
                             beta_initializer=Constant(11), 
                             moving_mean_initializer=Constant(12), 
                             moving_variance_initializer=Constant(13)))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_29 (Batc (None, 5)                 20        
Total params: 20
Trainable params: 10
Non-trainable params: 10
_________________________________________________________________


In [54]:
for layer in model.layers:
    print('\nConfiguration:')
    print('--------------')
    for k, v in layer.get_config().items():
        print('  {:30s}: {}'.format(k, v))
    print('\nParameters:')
    print('-----------')
    for p in layer.weights:
        if p in layer.trainable_weights:
            print('  Trainable:', p.name)
        else:
            print('            ', p.name)
print('\nmodel.get_weights():')
print('--------------------')
for w in model.get_weights():
    print('  ', w, w.shape)


Configuration:
--------------
  name                          : batch_normalization_29
  trainable                     : True
  batch_input_shape             : (None, 5)
  dtype                         : float32
  axis                          : 1
  momentum                      : 0.9
  epsilon                       : 0.0001
  center                        : True
  scale                         : True
  beta_initializer              : {'class_name': 'Constant', 'config': {'value': 11}}
  gamma_initializer             : {'class_name': 'Constant', 'config': {'value': 10}}
  moving_mean_initializer       : {'class_name': 'Constant', 'config': {'value': 12}}
  moving_variance_initializer   : {'class_name': 'Constant', 'config': {'value': 13}}
  beta_regularizer              : None
  gamma_regularizer             : None
  beta_constraint               : None
  gamma_constraint              : None

Parameters:
-----------
  Trainable: batch_normalization_29/gamma:0
  Trainable: batch_normal

### Four-dimensional input

Usually, when the layer input has four dimensions (samples, filters, height and width, i.e. a channels-first layout), normalization is applied along the second dimension, *axis=1*, which corresponds to the filters (channels). That is, the statistics (mean and variance) are computed for each *feature map* of the tensor. This preserves translation invariance, an important property of convolutions.
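
A short NumPy sketch of the per-channel statistics, assuming a channels-first tensor. The batch size of 8 and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(8, 3, 10, 10)          # (samples, channels, height, width)

# Keeping axis=1 (channels) means reducing over samples, height and width:
# one mean and one variance per feature map.
mean = x.mean(axis=(0, 2, 3))
var = x.var(axis=(0, 2, 3))
print(mean.shape)

# Reshape the statistics so they broadcast along the channel axis.
x_hat = (x - mean.reshape(1, 3, 1, 1)) / np.sqrt(var.reshape(1, 3, 1, 1) + 1e-3)
print(x_hat.mean(axis=(0, 2, 3)))   # approximately zero for every channel
```

Because every spatial position of a feature map shares the same mean and variance, the layer needs only 4 vectors of length 3, i.e. 12 parameters for 3 channels.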

In [62]:
model = Sequential()
model.add(BatchNormalization(axis=1, input_shape=(3, 10, 10), 
                             gamma_initializer=Constant(0), 
                             beta_initializer=Constant(1), 
                             moving_mean_initializer=Constant(2), 
                             moving_variance_initializer=Constant(3)))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_34 (Batc (None, 3, 10, 10)         12        
Total params: 12
Trainable params: 6
Non-trainable params: 6
_________________________________________________________________


In [56]:
for layer in model.layers:
    print('\nConfiguration:')
    print('--------------')
    for k, v in layer.get_config().items():
        print('  {:30s}: {}'.format(k, v))
    print('\nParameters:')
    print('-----------')
    for p in layer.weights:
        if p in layer.trainable_weights:
            print('  Trainable:', p.name)
        else:
            print('            ', p.name)
print('\nmodel.get_weights():')
print('--------------------')
for w in model.get_weights():
    print('  ', w, w.shape)


Configuration:
--------------
  name                          : batch_normalization_30
  trainable                     : True
  batch_input_shape             : (None, 3, 10, 10)
  dtype                         : float32
  axis                          : 1
  momentum                      : 0.99
  epsilon                       : 0.001
  center                        : True
  scale                         : True
  beta_initializer              : {'class_name': 'Constant', 'config': {'value': 1}}
  gamma_initializer             : {'class_name': 'Constant', 'config': {'value': 0}}
  moving_mean_initializer       : {'class_name': 'Constant', 'config': {'value': 2}}
  moving_variance_initializer   : {'class_name': 'Constant', 'config': {'value': 3}}
  beta_regularizer              : None
  gamma_regularizer             : None
  beta_constraint               : None
  gamma_constraint              : None

Parameters:
-----------
  Trainable: batch_normalization_30/gamma:0
  Trainable: batch_no

## Keras: execution in the training phase

In [57]:
K.set_learning_phase(1)

model = Sequential()
model.add(BatchNormalization(axis=1, input_shape=(5,), momentum=0.99, epsilon=0.0001))

model.set_weights([[ 5., 1., 1., 1., 1.],     # gamma
                   [ 3., 0., 0., 0., 0.],     # beta
                   [ 0., 0., 0., 0., 0.],     # moving_mean
                   [ 1., 1., 1., 1., 1.]])    # moving_variance

x = nr.random(size=(10, 5)) * 20
print(x, x.mean(0), x.std(0))
print()

y = model.predict(x, batch_size=10)
print(y, y.mean(0), y.std(0))


[[  6.44   6.55  18.55   6.23   3.24]
 [  7.28  10.58  15.78  17.51  12.73]
 [ 19.79  16.32   0.83   4.06   3.05]
 [ 15.06   9.1   19.09  18.75   2.71]
 [  4.72   6.3   13.45  12.41  12.44]
 [ 16.76  12.62  11.22  19.74  13.66]
 [  2.03  18.8   13.03   4.55   8.64]
 [  7.8   15.12   2.85   0.2   16.76]
 [  2.18   4.98   9.52  17.68   3.55]
 [ 15.72   0.7   12.17   6.17  15.4 ]] [  9.78  10.11  11.65  10.73   9.22] [ 6.14  5.37  5.7   6.91  5.35]

[[  0.28  -0.66   1.21  -0.65  -1.12]
 [  0.97   0.09   0.73   0.98   0.66]
 [ 11.16   1.16  -1.9   -0.97  -1.15]
 [  7.3   -0.19   1.31   1.16  -1.22]
 [ -1.12  -0.71   0.32   0.24   0.6 ]
 [  8.68   0.47  -0.08   1.3    0.83]
 [ -3.31   1.62   0.24  -0.89  -0.11]
 [  1.39   0.93  -1.54  -1.52   1.41]
 [ -3.19  -0.95  -0.37   1.01  -1.06]
 [  7.84  -1.75   0.09  -0.66   1.16]] [ 3.  0.  0. -0. -0.] [ 5.  1.  1.  1.  1.]


In [58]:
gamma, beta, mv_mean, mv_var = model.get_weights()

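x2 = x - x.mean(0)      # subtract the per-column batch mean
x2 /= x2.std(0)         # divide by the per-column batch standard deviation (epsilon omitted)

x3 = x2 * gamma + beta  # scale and shift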

print(x3, x3.mean(0), x3.std(0))

[[  0.28  -0.66   1.21  -0.65  -1.12]
 [  0.97   0.09   0.73   0.98   0.66]
 [ 11.16   1.16  -1.9   -0.97  -1.15]
 [  7.3   -0.19   1.31   1.16  -1.22]
 [ -1.12  -0.71   0.32   0.24   0.6 ]
 [  8.68   0.47  -0.08   1.3    0.83]
 [ -3.31   1.62   0.24  -0.89  -0.11]
 [  1.39   0.93  -1.54  -1.52   1.41]
 [ -3.19  -0.95  -0.37   1.01  -1.06]
 [  7.84  -1.75   0.09  -0.66   1.16]] [ 3. -0.  0. -0.  0.] [ 5.  1.  1.  1.  1.]
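
In the training phase the output depends only on the batch statistics, so `moving_mean` and `moving_variance` do not affect the result above. The sketch below shows how such moving statistics are commonly maintained while fitting, assuming a plain exponential moving average controlled by `momentum` and the biased batch variance. The function `update_moving_stats` is hypothetical and is not a Keras API.

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, batch, momentum=0.99):
    """One exponential-moving-average step (assumed update rule)."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)   # biased variance, as in the equations above
    new_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    new_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return new_mean, new_var

rng = np.random.RandomState(0)
mv_mean, mv_var = np.zeros(5), np.ones(5)   # same starting values as the cell above
for _ in range(2000):
    batch = rng.rand(10, 5) * 20            # same distribution as x above
    mv_mean, mv_var = update_moving_stats(mv_mean, mv_var, batch)

print(mv_mean)  # drifts toward the per-feature mean of the data
print(mv_var)   # drifts toward the average per-feature batch variance
```

A `momentum` close to 1 makes the moving statistics change slowly, so many mini-batches are needed before they move away from their initial values.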


## Keras: execution in the test phase

In [59]:
K.set_learning_phase(0)

model = Sequential()
model.add(BatchNormalization(axis=1, input_shape=(5,), momentum=0.999, epsilon=0.0001))

model.set_weights([[ 1., 1., 1., 1., 1.],     # gamma
                   [ 0., 0., 0., 0., 0.],     # beta
                   [ 2., 0., 0., 0., 0.],     # moving_mean
                   [ 4., 1., 1., 1., 1.]])    # moving_variance

x = nr.random(size=(10, 5)) * 20
print(x, x.mean(0), x.std(0))
print()

y = model.predict(x, batch_size=10)
print(y, y.mean(0), y.std(0))


[[ 11.93   1.89   8.87  16.09  13.22]
 [  4.34  18.26  11.8   16.71  11.83]
 [  5.66   1.13   5.83   6.64  17.66]
 [ 16.51  13.89  11.76   4.87   9.9 ]
 [ 16.43  18.43  10.69  11.26   0.42]
 [  6.33   0.14  11.56  11.57  13.15]
 [  2.91   9.98   9.79   2.48   9.13]
 [  0.92  12.94   7.76  11.34   7.56]
 [ 16.73  11.3   19.73  15.88   9.73]
 [  3.71  17.7    2.68  17.62  18.92]] [  8.55  10.57  10.05  11.45  11.15] [ 5.9   6.8   4.26  5.04  4.97]

[[  4.97   1.89   8.87  16.09  13.22]
 [  1.17  18.26  11.8   16.71  11.83]
 [  1.83   1.13   5.83   6.63  17.65]
 [  7.25  13.89  11.76   4.87   9.9 ]
 [  7.22  18.43  10.69  11.26   0.42]
 [  2.17   0.14  11.56  11.57  13.15]
 [  0.46   9.98   9.79   2.48   9.13]
 [ -0.54  12.94   7.76  11.34   7.56]
 [  7.37  11.3   19.73  15.88   9.73]
 [  0.86  17.7    2.68  17.62  18.92]] [  3.27  10.57  10.05  11.44  11.15] [ 2.95  6.8   4.26  5.04  4.97]


In [60]:
gamma, beta, mv_mean, mv_var = model.get_weights()

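x2 = x - mv_mean        # subtract the stored moving mean
x2 /= np.sqrt(mv_var)   # divide by the stored moving std (epsilon omitted, so the last digit may differ)

x3 = x2 * gamma + beta  # scale and shift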

print(x3, x3.mean(0), x3.std(0))

[[  4.97   1.89   8.87  16.09  13.22]
 [  1.17  18.26  11.8   16.71  11.83]
 [  1.83   1.13   5.83   6.64  17.66]
 [  7.25  13.89  11.76   4.87   9.9 ]
 [  7.22  18.43  10.69  11.26   0.42]
 [  2.17   0.14  11.56  11.57  13.15]
 [  0.46   9.98   9.79   2.48   9.13]
 [ -0.54  12.94   7.76  11.34   7.56]
 [  7.37  11.3   19.73  15.88   9.73]
 [  0.86  17.7    2.68  17.62  18.92]] [  3.27  10.57  10.05  11.45  11.15] [ 2.95  6.8   4.26  5.04  4.97]
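
The manual check above leaves out `epsilon`, which may explain why a few entries differ from the Keras output in the last printed digit (for example 6.63 versus 6.64). A minimal sketch of the test-phase computation with `epsilon` included follows. The function name `batch_norm_inference` is an illustrative assumption, and the input row is the first row of `x` as printed above, rounded to two decimals.

```python
import numpy as np

def batch_norm_inference(x, gamma, beta, moving_mean, moving_var, epsilon=1e-4):
    """Normalize with the stored moving statistics instead of batch statistics."""
    x_hat = (x - moving_mean) / np.sqrt(moving_var + epsilon)
    return gamma * x_hat + beta

gamma = np.ones(5)
beta = np.zeros(5)
moving_mean = np.array([2., 0., 0., 0., 0.])
moving_var = np.array([4., 1., 1., 1., 1.])

x_row = np.array([[11.93, 1.89, 8.87, 16.09, 13.22]])  # first row of x above (rounded)

with_eps = batch_norm_inference(x_row, gamma, beta, moving_mean, moving_var)
without_eps = batch_norm_inference(x_row, gamma, beta, moving_mean, moving_var, epsilon=0.0)

print(with_eps - without_eps)  # small negative offsets that grow with the input value
```

Only the first column changes visibly, because it is the only one whose moving mean and variance differ from 0 and 1; the other columns pass through almost unchanged.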
