In [1]:
from IPython.display import Math
import numpy as np

# Introducing DenseNet
  
  
  
Ming Li  
Data Scientist  
Contributor to pandas, scikit-learn.

<figure>
<center><img src="./images/f838717a-6ad1-11e6-9391-f0906c80bc1d.jpg" width="640">
<figcaption>A single Dense Block of 5 layers.</figcaption></center></figure>

* Novel and simple connectivity pattern that increases the number of direct connections in a block of $L$ layers to $\frac{L(L+1)}{2}$.
* Feature reuse.
* Parameter- and computation-efficient.
* Outperforms the current state-of-the-art results across various benchmarks.
* Easy and efficient* implementation.
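The connection count in the first bullet is plain arithmetic: if layer $\ell$ (1-indexed) receives the block input $x_0$ plus every earlier output, it has $\ell$ incoming connections, and summing over the block gives $\frac{L(L+1)}{2}$. The helper name `direct_connections` below is illustrative only.

```python
def direct_connections(num_layers):
    """Direct connections in a Dense Block: layer l (1-indexed) receives the
    block input x_0 and every earlier output x_1 .. x_{l-1}, i.e. l inputs."""
    return sum(range(1, num_layers + 1))  # 1 + 2 + ... + L = L(L+1)/2

L = 5                                     # the 5-layer block in the figure above
print(direct_connections(L))              # 15
print(L * (L + 1) // 2)                   # 15, closed form
print(L)                                  # 5: a plain chain has one connection per layer
```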

## Convolution
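In the continuous domain, the convolution of $f$ and $g$ is defined as:  
$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$
which reduces to $\int_{0}^{t} f(\tau)\, g(t - \tau)\, d\tau$ when $f$ and $g$ are zero for negative arguments.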
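In discrete 2-D coordinates $[h, w]$, the equivalent definition is:
$$(f * g)[h, w] = \sum_{i}\sum_{j} f[i, j]\, g[h - i, w - j]$$
Convolution layers in deep-learning frameworks, like the code below, slide the kernel without flipping it, i.e. they compute the cross-correlation $\sum_{i}\sum_{j} f[h + i, w + j]\, g[i, j]$. For the kernel $g$ used here, which is unchanged by a 180° rotation, the two operations give identical results.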
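To make the kernel flip concrete, here is a small NumPy sketch (the function name `slide2d` and the test arrays are illustrative, not part of the notebook) that computes the "valid" region either as a true convolution (kernel rotated by 180°) or as a cross-correlation (kernel used as is), and checks when the two agree.

```python
import numpy as np

def slide2d(f, g, flip):
    """'Valid' 2-D sliding-window sum of f against kernel g.
    flip=True  -> true convolution (kernel rotated 180 degrees)
    flip=False -> cross-correlation, as computed by most CNN layers"""
    k = g[::-1, ::-1] if flip else g
    kh, kw = k.shape
    out_h, out_w = f.shape[0] - kh + 1, f.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(f, k))
    for h in range(out_h):
        for w in range(out_w):
            out[h, w] = np.sum(f[h:h + kh, w:w + kw] * k)
    return out

f = np.arange(16).reshape(4, 4)
g_sym = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])   # unchanged by 180-degree rotation
g_asym = np.array([[1, 2, 0], [0, 0, 0], [0, 0, 0]])  # changes under rotation

print(np.array_equal(slide2d(f, g_sym, True), slide2d(f, g_sym, False)))    # True
print(np.array_equal(slide2d(f, g_asym, True), slide2d(f, g_asym, False)))  # False
```

With a symmetric kernel such as the `g` used in this notebook, the distinction is invisible; with a learned, asymmetric kernel it matters only for interpretation, since the network simply learns the flipped weights.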

In [2]:
# 5x5 binary input image
f = np.array([[1, 1, 1, 0, 0], [0, 1, 1, 1, 0], [0, 0, 1, 1, 1], [0, 0, 1, 1, 0], [0, 1, 1, 0, 0]])
# 3x3 kernel (filter)
g = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]])

<figure>
<center><img src="./images/Convolution_schematic.gif" width="640">
<figcaption>Convolution during Forward Propagation, <a href=http://ufldl.stanford.edu/wiki/index.php/Feature_extraction_using_convolution>source of image</a></figcaption></center>
</figure>

In [32]:
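def element_conv():
    """Slide the 3x3 kernel g over the 5x5 input f ('valid' region, stride 1)."""
    kh, kw = g.shape
    out_h, out_w = f.shape[0] - kh + 1, f.shape[1] - kw + 1  # 3x3 output
    out = np.zeros((out_h, out_w), dtype=f.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = f[i:i+kh, j:j+kw]        # receptive field under the kernel
            out[i, j] = np.sum(patch * g)    # element-wise product, then sum
    return out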

%timeit -n 1000 element_conv()

97.3 µs ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [33]:
element_conv()

array([[4, 3, 4],
       [2, 4, 3],
       [2, 3, 4]])

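Implementation as matrix multiplication (im2col): each receptive field of `f` is flattened into one row of a matrix, so the whole convolution becomes a single matrix-vector product with the flattened kernel: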

In [34]:
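def matmul_conv():
    """Same operation as element_conv, expressed as one matrix product (im2col)."""
    kh, kw = g.shape
    out_h, out_w = f.shape[0] - kh + 1, f.shape[1] - kw + 1
    # Each row of `col` is one flattened patch of f (one output position)
    col = np.zeros((out_h * out_w, kh * kw))
    for i in range(out_h):
        for j in range(out_w):
            col[i * out_w + j] = f[i:i+kh, j:j+kw].flatten()
    # (positions x patch_size) @ (patch_size,) -> one value per output position
    return (col @ g.flatten()).reshape(out_h, out_w)
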
%timeit -n 1000 matmul_conv()

30.1 µs ± 5.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Both implementations produce the same result:

In [35]:
assert np.allclose(element_conv(), matmul_conv())

## Dense Block

A Dense Block in DenseNet is a block of hidden layers in which each layer reuses the features of all preceding layers: its input is the concatenation of their feature maps along the depth (channel) axis.  

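More concretely, for a block with input $x_{0}$:
$$\begin{align*}
x_{1} &= f_{composite}(x_{0})\\
x_{2} &= f_{composite}([x_{0}, x_{1}])\\
x_{3} &= f_{composite}([x_{0}, x_{1}, x_{2}])\\
x_{\ell} &= f_{composite}([x_{0}, x_{1}, \ldots, x_{\ell-1}])
\end{align*}$$
where $[\cdot]$ denotes concatenation along the depth axis and each layer has its own $f_{composite}$ with separate weights.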
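The recurrence above can be sketched in NumPy as follows. The composite function is a hypothetical toy stand-in (per-channel standardisation in place of a trained BN, then ReLU, then a random 1×1 convolution written as a per-pixel matrix product), not the notebook's `DenseNet` implementation; `k` plays the role of the growth rate, i.e. the number of feature maps each layer adds.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_composite(x, k):
    """Hypothetical stand-in for f_composite: BN -> ReLU -> 1x1 conv with k outputs.
    x has shape (H, W, C); returns shape (H, W, k)."""
    # Simplified BN: standardise each channel over spatial positions (no gamma/beta)
    x_hat = (x - x.mean(axis=(0, 1))) / np.sqrt(x.var(axis=(0, 1)) + 1e-5)
    x_relu = np.maximum(0.0, x_hat)
    # A 1x1 convolution is a per-pixel matrix product over the channel axis
    w = rng.standard_normal((x.shape[-1], k))
    return x_relu @ w

def dense_block(x0, num_layers, k):
    features = [x0]
    for _ in range(num_layers):
        # Layer l sees [x_0, x_1, ..., x_{l-1}] concatenated along depth
        x_in = np.concatenate(features, axis=-1)
        features.append(toy_composite(x_in, k))
    # The block output is again the concatenation of all feature maps
    return np.concatenate(features, axis=-1)

x0 = rng.standard_normal((8, 8, 16))   # 8x8 feature map with 16 channels
out = dense_block(x0, num_layers=4, k=12)
print(out.shape)                        # (8, 8, 64): 16 + 4 * 12 channels
```

Depth grows linearly: after $\ell$ layers the block carries $k_0 + \ell k$ channels, which is why a small growth rate keeps the block cheap.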

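Inspired by the identity shortcut connection of the residual block in ResNet, a Dense Block enhances connectivity and avoids loss of information by replacing summation with concatenation: a residual block adds a layer's output to its input, so the two can no longer be separated, whereas concatenation keeps every earlier feature map intact. The enhanced connectivity lets gradients flow more freely through the layers during backpropagation, since each layer's output has a direct path to every later layer and to the block output.  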
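A tiny NumPy illustration of the difference, using made-up values for an earlier feature `x` and a new feature `h` at a single spatial position:

```python
import numpy as np

x = np.array([[1.0, 2.0]])             # features from an earlier layer (1 position, 2 channels)
h = np.array([[3.0, -1.0]])            # new features computed by the current layer

residual = x + h                        # ResNet-style: element-wise sum, shape stays (1, 2)
dense = np.concatenate([x, h], axis=-1) # DenseNet-style: concatenation, shape grows to (1, 4)

print(residual)                         # [[4. 1.]]: x and h can no longer be told apart
print(dense.shape)                      # (1, 4): channels of x followed by channels of h
print(np.array_equal(dense[:, :2], x))  # True: earlier features are kept verbatim
```

Many different pairs `(x, h)` give the same sum, whereas the concatenation can always be split back into its parts; the cost is that the channel count grows with every layer.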
  
<figure>
<center><img src="./images/denseblock.png" width="960">
<figcaption>An illustration of a Dense Block of 2 layers</figcaption></center>
</figure>



## Composite Function

$f_{composite}$ consists of three consecutive operations, following the 'pre-activation' design of ResNet: Batch Normalization (BN), then ReLU, then Convolution.

<figure>
<center><img src="./images/pre-activation.png" width="360">
<figcaption>Full pre-activation as in ResNet; note that "weight" denotes the convolution.</figcaption></center>
</figure>

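**Batch Normalizing Transform** over a mini-batch $B = \{x_{1}, \ldots, x_{m}\}$ is defined as:  
$$\begin{align}
\mu_{B} &= \frac{1}{m}\sum_{i=1}^{m} x_{i}\\
\sigma_{B}^2 &= \frac{1}{m}\sum_{i=1}^{m} (x_{i} - \mu_{B})^2\\
\hat{x}_{i} &= \frac{x_{i} - \mu_{B}}{\sqrt{\sigma_{B}^2 + \epsilon}}\\
BN(x_{i}; \gamma, \beta) &= \gamma \hat{x}_{i} + \beta
\end{align}$$  
where $\epsilon$ is a small constant for numerical stability and $\gamma$, $\beta$ are learned scale and shift parameters.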
  
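Because each layer applies its own BN to its concatenated input, a feature map is re-normalized, with that layer's own $\gamma$ and $\beta$, every time it is reused; the noise in mini-batch statistics also acts as a mild regularizer, loosely comparable to data augmentation.  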
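A minimal NumPy sketch of the transform above, assuming a 2-D input of shape `(m, C)` with statistics taken per column; for convolutional feature maps the mean and variance would instead be taken over the batch and both spatial axes. The name `batch_norm` is illustrative, and the running averages that frameworks use at inference time are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalizing Transform over a mini-batch.
    x: (m, C) array of m examples; statistics are computed per channel."""
    mu = x.mean(axis=0)                    # mu_B
    var = x.var(axis=0)                    # sigma_B^2, biased (1/m) estimate
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta            # scale and shift

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(y.mean(axis=0))  # approximately [0, 0]
print(y.std(axis=0))   # approximately [1, 1], slightly below because of eps
```

With `gamma=1` and `beta=0` the output is simply standardised; learned values of `gamma` and `beta` let the network undo the normalization where that helps.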
  

**Rectified Linear Unit (ReLU)** is defined as: $$f(x_{i}) = \max(0, x_{i})$$  
  
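Apart from being cheaper to evaluate than the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$, its derivative $\frac{\partial{f(x)}}{\partial{x}} \in \{0, 1\}$ is also preferable to $\frac{\partial{\sigma(x)}}{\partial{x}} = \sigma(x)(1 - \sigma(x)) \in (0, 0.25]$ for reducing vanishing gradients: backpropagation multiplies one such factor per layer, so, ignoring the weights, the activation factors alone shrink the gradient to at most $0.25^{L}$ after $L$ sigmoid layers, whereas active ReLU units pass it through unchanged.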
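A short NumPy check of the derivative ranges quoted above, plus the best-case gradient factor after a stack of 10 activations (weights ignored, as in the argument):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)               # largest value 0.25, reached at z = 0

def d_relu(z):
    return (z > 0).astype(float)       # 1 where the unit is active, 0 elsewhere

z = np.linspace(-6.0, 6.0, 1001)
print(d_sigmoid(0.0))                  # 0.25
print(np.isclose(d_sigmoid(z).max(), 0.25))  # True
print(np.unique(d_relu(z)))            # [0. 1.]

# Best-case gradient factor after 10 stacked activations (chain rule)
print(0.25 ** 10)                      # 9.5367431640625e-07
print(1.0 ** 10)                       # 1.0 for ReLU units that stay active
```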

## Dense Connectivity

In [6]:
from DenseNet.DenseNet import DenseNet as Constructer

In [7]:
IMAGE_SHAPE = (128, 128, 3)
KEEP_RATE = .80

In [8]:
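dn = Constructer(IMAGE_SHAPE,
                 num_classes=17,
                 keep_prob=KEEP_RATE,   # dropout keep probability
                 growth=32,             # growth rate k: feature maps added per layer
                 bottleneck=4,          # 1x1 bottleneck conv outputs bottleneck * k maps
                 compression=.5)        # transition layers keep half of the feature maps
is_train = dn.is_train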
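As a worked example, the sketch below tracks feature-map depth and layer count for the DenseNet-BC 121 configuration built in the `densenet` function (blocks of 6, 12, 24 and 16 layers). It assumes that `growth=32` is the growth rate $k$, that `compression=.5` means each transition layer keeps half of its input maps, and that the initial convolution outputs $2k$ maps (as in `init_conv`); the helper names are hypothetical and are not part of the `DenseNet` package.

```python
def densenet_bc_channels(blocks=(6, 12, 24, 16), k=32, theta=0.5):
    """Feature-map depth after each dense block and each transition layer.
    Assumes a stem with 2k output maps and transitions keeping floor(theta * m) maps."""
    channels = 2 * k                       # stem: 7x7 conv with 2k filters
    history = []
    for i, num_layers in enumerate(blocks):
        channels += num_layers * k         # each layer concatenates k new maps
        history.append(channels)
        if i < len(blocks) - 1:            # no transition after the last block
            channels = int(theta * channels)
            history.append(channels)
    return history

def densenet_depth(blocks=(6, 12, 24, 16)):
    # stem conv + 2 convs per bottleneck layer (1x1 and 3x3)
    # + 1 conv per transition layer + final fully connected layer
    return 1 + 2 * sum(blocks) + (len(blocks) - 1) + 1

print(densenet_bc_channels())  # [256, 128, 512, 256, 1024, 512, 1024]
print(densenet_depth())        # 121
```

Only convolutional and fully connected layers are counted toward the "121"; BN, ReLU and pooling layers are not.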

In [9]:
def densenet(class_balance=False, l2_norm=False):
    """DenseNet-BC 121"""
    global prediction, loss, train_step, accuracy, saver, is_train

    init_conv = dn.add_conv_layer(
                            image_feed,
                            [[7, 7, IMAGE_SHAPE[-1], 2 * dn._k], [2 * dn._k]],
                            bn=False)
    init_pool = dn.add_pooling_layer(init_conv, kernel_size=[1, 3, 3, 1])
    dense_block_1 = dn.add_dense_block(init_pool, L=6)
    transition_layer_1 = dn.add_transition_layer(dense_block_1)
    dense_block_2 = dn.add_dense_block(transition_layer_1, L=12)
    transition_layer_2 = dn.add_transition_layer(dense_block_2)
    dense_block_3 = dn.add_dense_block(transition_layer_2, L=24)
    transition_layer_3 = dn.add_transition_layer(dense_block_3)
    dense_block_4 = dn.add_dense_block(transition_layer_3, L=16)
    global_pool = dn.add_global_average_pool(dense_block_4)
    dim = int(global_pool.get_shape()[-1])
    dense_layer_1 = dn.add_dense_layer(global_pool, [[dim, 1000], [1000]],
                                       bn=False)
    drop_out_1 = dn.add_drop_out_layer(dense_layer_1)
    logits = dn.add_read_out_layer(drop_out_1)