Convolutional Neural Networks are very similar to ordinary Neural Networks from the previous chapter: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function: from the raw image pixels on one end to class scores at the other. And they still have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer and all the tips/tricks we developed for learning regular Neural Networks still apply.

<img src="images/cover.png">

So what does change? ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the number of parameters in the network.

### Architecture Overview
Recall: Regular Neural Nets. As we saw in the previous chapter, Neural Networks receive an input (a single vector), and transform it through a series of hidden layers. Each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer, and where neurons in a single layer function completely independently and do not share any connections. The last fully-connected layer is called the “output layer” and in classification settings it represents the class scores.

Regular Neural Nets don’t scale well to full images. In CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully-connected neuron in a first hidden layer of a regular Neural Network would have 32*32*3 = 3072 weights. This amount still seems manageable, but clearly this fully-connected structure does not scale to larger images. For example, an image of more respectable size, e.g. 200x200x3, would lead to neurons that have 200*200*3 = 120,000 weights. Moreover, we would almost certainly want to have several such neurons, so the parameters would add up quickly! Clearly, this full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.
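
The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `fc_weights_per_neuron` and the layer width of 1000 neurons are assumptions made for the example, not values from the text.

```python
def fc_weights_per_neuron(width, height, channels):
    """Number of weights one fully-connected neuron needs to see the whole image."""
    return width * height * channels

cifar = fc_weights_per_neuron(32, 32, 3)
larger = fc_weights_per_neuron(200, 200, 3)
print(cifar)   # 3072
print(larger)  # 120000

# Hypothetical hidden layer width, chosen only to show how quickly weights add up.
hidden_neurons = 1000
print(larger * hidden_neurons)  # 120000000
```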

3D volumes of neurons. Convolutional Neural Networks take advantage of the fact that the input consists of images and they constrain the architecture in a more sensible way. In particular, unlike a regular Neural Network, the layers of a ConvNet have neurons arranged in 3 dimensions: width, height, depth. (Note that the word depth here refers to the third dimension of an activation volume, not to the depth of a full Neural Network, which can refer to the total number of layers in a network.) For example, the input images in CIFAR-10 are an input volume of activations, and the volume has dimensions 32x32x3 (width, height, depth respectively). As we will soon see, the neurons in a layer will only be connected to a small region of the layer before it, instead of all of the neurons in a fully-connected manner. Moreover, the final output layer would for CIFAR-10 have dimensions 1x1x10, because by the end of the ConvNet architecture we will reduce the full image into a single vector of class scores, arranged along the depth dimension. Here is a visualization:


### Layers used to build ConvNets
As we described above, a simple ConvNet is a sequence of layers, and every layer of a ConvNet transforms one volume of activations to another through a differentiable function. We use three main types of layers to build ConvNet architectures: Convolutional Layer, Pooling Layer, and Fully-Connected Layer (exactly as seen in regular Neural Networks). We will stack these layers to form a full ConvNet architecture.

Example Architecture: Overview. We will go into more details below, but a simple ConvNet for CIFAR-10 classification could have the architecture [INPUT - CONV - RELU - POOL - FC]. In more detail:

- INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.
- CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as [32x32x12] if we decided to use 12 filters.
- RELU layer will apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves the size of the volume unchanged ([32x32x12]).
- POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12].
- FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.

In this way, ConvNets transform the original image layer by layer from the original pixel values to the final class scores. Note that some layers contain parameters and others don’t. In particular, the CONV/FC layers perform transformations that are a function of not only the activations in the input volume, but also of the parameters (the weights and biases of the neurons). On the other hand, the RELU/POOL layers will implement a fixed function. The parameters in the CONV/FC layers will be trained with gradient descent so that the class scores that the ConvNet computes are consistent with the labels in the training set for each image.
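
To make the distinction between layers with and without parameters concrete, the sketch below traces the example architecture and counts learnable parameters. The text does not fix the CONV filter size, so a 3x3 filter with stride 1 and zero-padding 1 (which preserves the 32x32 size) is assumed here; the helper names are hypothetical.

```python
def conv_params(in_depth, num_filters, filter_size):
    """Each filter has filter_size * filter_size * in_depth weights plus one bias."""
    return num_filters * (filter_size * filter_size * in_depth + 1)

def fc_params(in_volume, num_outputs):
    """Every output neuron sees every number in the input volume, plus one bias."""
    width, height, depth = in_volume
    return num_outputs * (width * height * depth + 1)

# (layer name, output volume as (width, height, depth), learnable parameters)
trace = [
    ("INPUT", (32, 32, 3), 0),
    # Assumed: 12 filters of size 3x3, stride 1, zero-padding 1.
    ("CONV", (32, 32, 12), conv_params(in_depth=3, num_filters=12, filter_size=3)),
    ("RELU", (32, 32, 12), 0),   # fixed function, no parameters
    ("POOL", (16, 16, 12), 0),   # fixed function, no parameters
    ("FC", (1, 1, 10), fc_params((16, 16, 12), num_outputs=10)),
]

for name, volume, params in trace:
    print(f"{name:5s} {str(volume):14s} params={params}")

total = sum(params for _, _, params in trace)
print(total)  # 31066
```

Under these assumptions almost all of the parameters sit in the FC layer, even though the CONV layer produces the larger volume.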

### Spatial arrangement
We have explained the connectivity of each neuron in the Conv Layer to the input volume, but we haven’t yet discussed how many neurons there are in the output volume or how they are arranged. Three hyperparameters control the size of the output volume: the depth, stride and zero-padding. We discuss these next:

1. First, the depth of the output volume is a hyperparameter: it corresponds to the number of filters we would like to use, each learning to look for something different in the input. For example, if the first Convolutional Layer takes as input the raw image, then different neurons along the depth dimension may activate in presence of various oriented edges, or blobs of color. We will refer to a set of neurons that are all looking at the same region of the input as a depth column (some people also prefer the term fibre).
2. Second, we must specify the stride with which we slide the filter. When the stride is 1, we move the filters one pixel at a time. When the stride is 2 (or, rarely in practice, 3 or more), the filters jump 2 pixels at a time as we slide them around. This will produce smaller output volumes spatially.
3. As we will soon see, sometimes it will be convenient to pad the input volume with zeros around the border. The size of this zero-padding is a hyperparameter. The nice feature of zero padding is that it will allow us to control the spatial size of the output volumes (most commonly as we’ll see soon we will use it to exactly preserve the spatial size of the input volume so the input and output width and height are the same).

We can compute the spatial size of the output volume as a function of the input volume size (W), the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. You can convince yourself that the correct formula for calculating how many neurons “fit” is given by (W−F+2P)/S+1. For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output. 

$$
\text{Output} = \frac{W - F + 2P}{S} + 1
$$
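
As a quick check of the formula, here is a minimal sketch that evaluates it and also rejects settings where the filter does not tile the input evenly. The function name `conv_output_size` and the choice to raise an error are assumptions made for this example; libraries such as PyTorch round down instead of raising.

```python
def conv_output_size(W, F, S=1, P=0):
    """Spatial output size (W - F + 2P) / S + 1 for input W, filter F, stride S, padding P."""
    span = W - F + 2 * P
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    if span % S != 0:
        # The neurons would not "fit" neatly across the input.
        raise ValueError(f"stride {S} does not fit: {span} is not divisible by {S}")
    return span // S + 1

print(conv_output_size(7, 3, S=1, P=0))  # 5
print(conv_output_size(7, 3, S=2, P=0))  # 3
print(conv_output_size(7, 3, S=1, P=1))  # 7 (padding preserves the input size)

try:
    conv_output_size(7, 3, S=3, P=0)
except ValueError as err:
    print("invalid setting:", err)
```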

### Convolutions

Let's assume a grayscale image (single channel). A convolution is a function that takes an input 2D image $I(x,y)$ and outputs a filtered image $I'(x,y)$ such that $I'(x, y) = f(\text{neighborhood of } I(x, y))$. This means that in order to compute the value of the output pixel $I'(x, y)$ we will need not only the input pixel value $I(x, y)$ (as we did for brightness) but also its neighboring pixel values. This is not a strict definition but this is the basic idea.

Most convolution operations we will be using during the course will use an $n \times n$ neighborhood of pixels, and will be of the form:

$$I'(u,v) = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1}{w_{i,j}I(u - \lfloor n / 2 \rfloor + i, v-\lfloor n / 2 \rfloor + j)} + b$$

For the purpose of this lab tutorial $b$ will be zero, so the only parameters of our convolution operations will be the size of our neighborhood region $n$ and the weights $w_{i,j}$. Moreover, for the first few examples here the neighborhood size will be $3 \times 3$, thus we will be dealing with the following operation to compute the output pixels $I'(u, v)$.

\begin{equation}
\begin{split}
I'(u,v) = {} & w_{0,0}I(u - 1, v - 1) + w_{0,1}I(u - 1, v) + w_{0,2}I(u - 1, v + 1) \\
             & + w_{1,0}I(u, v - 1) + w_{1,1}I(u, v) + w_{1,2}I(u, v + 1) \\
             & + w_{2,0}I(u + 1, v - 1) + w_{2,1}I(u + 1, v) + w_{2,2}I(u + 1, v + 1)
\end{split}
\end{equation}

We are effectively computing a sliding window as illustrated here:

<img src="images/animation.gif" style="width:520px"/>

In practice, there are various ways of making convolution operations faster: by exploiting the fact that some computations are shared by consecutive windows, or by sacrificing memory and expressing the convolution as a single matrix multiplication. GPU acceleration is also possible, and fortunately all of these are already implemented in PyTorch. Convolutions are essential for extracting information from images as well as for applying many common effects such as blurring and sharpening. In PyTorch we can apply the `F.conv2d` function to an input image.

Most image processing libraries implement a convolution operation. Some libraries, including PIL, refer to these as image filtering operations.
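
The double sum above can be written directly as nested loops. The following is a slow, illustrative NumPy sketch rather than how any library implements it: the name `filter2d` is hypothetical, pixels outside the image are assumed to be zero, and the kernel is applied without flipping (which is also what deep learning libraries compute under the name "convolution").

```python
import numpy as np

def filter2d(image, weights, bias=0.0):
    """I'(u,v) = sum_ij w[i,j] * I(u - n//2 + i, v - n//2 + j) + b, with zeros outside the image."""
    n = weights.shape[0]
    assert weights.shape == (n, n) and n % 2 == 1, "expects an odd, square kernel"
    half = n // 2
    # Zero-pad so that every output pixel has a full n x n neighborhood.
    padded = np.pad(image.astype(float), half, mode="constant")
    out = np.empty(image.shape, dtype=float)
    for u in range(image.shape[0]):
        for v in range(image.shape[1]):
            window = padded[u:u + n, v:v + n]   # neighborhood centered on (u, v)
            out[u, v] = np.sum(weights * window) + bias
    return out

image = np.arange(16).reshape(4, 4)
box = np.full((3, 3), 1.0 / 9.0)   # simple averaging (blur) kernel
blurred = filter2d(image, box)
print(blurred.shape)   # (4, 4)
print(blurred[1, 1])   # approximately 5.0, the mean of the top-left 3x3 block
```

A kernel that is 1 at the center and 0 elsewhere returns the image unchanged, which is a convenient sanity check for the indexing.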

### Filters
Filters are frequently applied to images for different purposes. The human visual system is thought to rely on edge-detection-like filtering when recognizing objects.

<div class="imgcap">
<img src="images/ppl.jpg" style="border:none;width:30%">
</div>

<div class="imgcap">
<img src="images/edge.png" style="border:none;">
</div>

For example, to blur an image, we can apply a filter with patch size 3x3 over every pixel in the image:
<div class="imgcap">
<img src="images/filter_b.png" style="border:none;">
</div>

To apply the filter to an image, we move the filter 1 pixel at a time from left to right and top to bottom until we process every pixel.
<div class="imgcap">
<img src="images/stride.png" style="border:none;width:50%">
</div>

#### Stride and padding
However, we run into a problem at the edges. For example, at the top-left corner the filter extends beyond the edge of the image. For a filter with patch size 3x3, we may ignore the edge and generate an output whose width and height are each reduced by 2 pixels. Otherwise, we can pad the image with extra zeros or replicate the edge pixels of the original image. All these settings are possible and configurable as "padding" in a CNN.
<div class="imgcap">
<img src="images/padding.png" style="border:none;width:50%">
</div>

> Padding with extra zeros is more popular because it maintains the spatial dimensions and better preserves information at the edges.
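
The two padding options mentioned above (extra zeros versus replicating the edge) can be compared on a tiny array. This sketch uses NumPy's `np.pad` purely for illustration; the 3x3 input values are made up.

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# One pixel of padding on every side, as needed for a 3x3 filter.
zero_padded = np.pad(image, 1, mode="constant", constant_values=0)
edge_padded = np.pad(image, 1, mode="edge")

print(zero_padded.shape)  # (5, 5)
print(zero_padded[0])     # [0 0 0 0 0]
print(edge_padded[0])     # [1 1 2 3 3]
```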

For a CNN, we do not always move the filter by just 1 pixel. If we move the filter 2 pixels to the right at each step, we say the "X stride" is equal to 2.
<div class="imgcap">
<img src="images/stride2.png" style="border:none;width:50%">
</div>

Notice that both padding and stride may change the spatial dimension of the output. A stride of 2 in the X direction roughly halves the X-dimension. With stride 1 and no padding, the output loses N pixels on each side (2N in total per dimension), where:

$$
N = \frac {\text{filter patch size} - 1} {2}
$$

In [68]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

def conv_out(W, K, S, P):
    # Output size for input width W, kernel size K, stride S and padding P
    return (W - K + 2 * P) / (S) + 1

In [85]:
W = 7
K = 5
S = 1
P = 2
m = nn.Conv2d(3, 10, kernel_size=K, stride=S, padding=P)
# Random integers 0-9 stored as floats; a plain tensor replaces the deprecated Variable wrapper
inp = torch.empty(1, 3, W, W).random_(10)
out = m(inp)
out.size()

torch.Size([1, 10, 7, 7])

In [86]:
conv_out(W, K, S, P)

7.0

In [87]:
inp

tensor([[[[1., 9., 4., 5., 3., 7., 5.],
          [0., 2., 1., 9., 8., 7., 2.],
          [7., 5., 4., 5., 2., 9., 8.],
          [5., 3., 5., 8., 1., 4., 4.],
          [0., 9., 9., 2., 1., 4., 7.],
          [5., 8., 2., 4., 1., 7., 4.],
          [9., 5., 2., 1., 4., 6., 2.]],

         [[4., 1., 6., 0., 3., 2., 0.],
          [8., 9., 5., 4., 6., 5., 1.],
          [1., 0., 2., 7., 9., 3., 4.],
          [0., 6., 7., 3., 0., 0., 3.],
          [1., 7., 4., 0., 6., 4., 9.],
          [7., 9., 7., 2., 6., 9., 2.],
          [5., 3., 6., 5., 3., 0., 6.]],

         [[4., 7., 6., 8., 4., 1., 3.],
          [4., 7., 4., 9., 4., 7., 3.],
          [4., 6., 5., 5., 0., 4., 7.],
          [7., 0., 1., 1., 3., 4., 7.],
          [1., 5., 3., 6., 1., 4., 4.],
          [2., 0., 4., 6., 3., 0., 4.],
          [6., 1., 3., 0., 3., 9., 1.]]]])

### Convolutional neural network (CNN)
A convolutional neural network is composed of convolution layers, pooling layers and fully connected (FC) layers.

<div class="imgcap">
<img src="images/conv_layer.png" style="border:none;width:70%">
</div>

When we process the image, we apply filters, each of which generates an output that we call a **feature map**. If k feature maps are created, the output has depth k.

<div class="imgcap">
<img src="images/filter_m.png" style="border:none;width:70%">
</div>



#### Pooling

To reduce the spatial dimension of a feature map, we apply max pooling. A 2x2 max pool replaces each 2x2 area by its maximum. After applying a 2x2 pool, we reduce the spatial dimension for the example below from 4x4 to 2x2. (Filter size = 2, stride = 2)
<div class="imgcap">
<img src="images/pooling.png" style="border:none;width:50%">
</div>

Here, we construct a CNN using convolution and pooling:
<div class="imgcap">
<img src="images/conv_layer2.png" style="border:none;width:50%">
</div>

Pooling is often used together with a convolution layer, so we often consider it part of the convolution layer rather than a separate layer. The most common configuration is max pooling with filter size 2 and stride 2. A filter size of 3 with stride 2 is less common. Other kinds of pooling, such as average pooling, have been used but have fallen out of favor lately. As a side note, some researchers prefer using strided convolutions rather than pooling to reduce the spatial dimension.
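
A 2x2 max pool with stride 2 can be sketched in NumPy by grouping the feature map into non-overlapping 2x2 blocks. The function name `max_pool_2x2` and the 4x4 values are made up for illustration, and the sketch assumes even width and height.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single (H, W) feature map."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even height and width"
    # blocks[a, i, b, j] == feature_map[2*a + i, 2*b + j]
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

pooled = max_pool_2x2(x)
print(pooled)
# [[6 8]
#  [3 4]]
```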

In [91]:
m = nn.MaxPool2d(kernel_size=3, stride=1)
output = m(inp)
output

tensor([[[[9., 9., 9., 9., 9.],
          [7., 9., 9., 9., 9.],
          [9., 9., 9., 9., 9.],
          [9., 9., 9., 8., 7.],
          [9., 9., 9., 7., 7.]],

         [[9., 9., 9., 9., 9.],
          [9., 9., 9., 9., 9.],
          [7., 7., 9., 9., 9.],
          [9., 9., 7., 9., 9.],
          [9., 9., 7., 9., 9.]],

         [[7., 9., 9., 9., 7.],
          [7., 9., 9., 9., 7.],
          [7., 6., 6., 6., 7.],
          [7., 6., 6., 6., 7.],
          [6., 6., 6., 9., 9.]]]])

In [92]:
m = nn.AvgPool2d(kernel_size=3, stride=1)
m(inp)

tensor([[[[3.6667, 4.8889, 4.5556, 6.1111, 5.6667],
          [3.5556, 4.6667, 4.7778, 5.8889, 5.0000],
          [5.2222, 5.5556, 4.1111, 4.0000, 4.4444],
          [5.1111, 5.5556, 3.6667, 3.5556, 3.6667],
          [5.4444, 4.6667, 2.8889, 3.3333, 4.0000]],

         [[4.0000, 3.7778, 4.6667, 4.3333, 3.6667],
          [4.2222, 4.7778, 4.7778, 4.1111, 3.4444],
          [3.1111, 4.0000, 4.2222, 3.5556, 4.2222],
          [5.3333, 5.0000, 3.8889, 3.3333, 4.3333],
          [5.4444, 4.7778, 4.3333, 3.8889, 5.0000]],

         [[5.2222, 6.3333, 5.0000, 4.6667, 3.6667],
          [4.2222, 4.2222, 3.5556, 4.1111, 4.3333],
          [3.5556, 3.5556, 2.7778, 3.1111, 3.7778],
          [2.5556, 2.8889, 3.1111, 3.1111, 3.3333],
          [2.7778, 3.1111, 3.2222, 3.5556, 3.2222]]]])

### Multiple convolution layers

As with other deep networks, the depth of the network increases the complexity of the model. A CNN is usually composed of many convolution layers.
<div class="imgcap">
<img src="images/convolution_b1.png" style="border:none;width:70%">
</div>

The CNN above is composed of 3 convolution layers. We start with a 32x32 pixel image with 3 channels (RGB). We apply a 3x4 filter and a 2x2 max pooling, which convert the image to 16x16x4 feature maps. The following table walks through the filter and layer shape at each layer:
<div class="imgcap">
<img src="images/cnn_chanl.png" style="border:none">
</div>

### Fully connected (FC) layers
After using convolution layers to extract the spatial features of an image, we apply fully connected layers for the final classification. First, we flatten the output of the convolution layers. For example, if the final feature maps have a dimension of 4x4x512, we flatten them to an array of 8192 elements (4 × 4 × 512). We apply 2 more hidden layers here before we perform the final classification. The techniques needed are no different from those of an FC network in deep learning.

<div class="imgcap">
<img src="images/convolution_b2.png" style="border:none;width:50%">
</div>

### Tips

Here are some tips for constructing a CNN:
* Use smaller filters like 3x3 or 5x5 with more convolution layers.
* Convolution filters with a small stride tend to work better.
* If GPU memory is limited, compromise on the first layer by using a larger filter, such as 7x7, with stride 2.
* Use zero padding.
* Use filter size 2 and stride 2 for max pooling if needed.

For the network design:
1. Start with 2-3 convolution layers with small filters (3x3 or 5x5) and no pooling.
2. Add a 2x2 max pool to reduce the spatial dimension.
3. Repeat 1-2 until the desired spatial dimension is reached for the fully connected layer. This can be a trial-and-error process.
4. Use 2-3 hidden layers for the fully connected layers.
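
Steps 1-3 above can be sketched as a small planning loop that tracks only the spatial size. Everything here is an assumption for illustration: the function name `plan_stages`, the use of 3x3 filters with stride 1 and zero padding 1 (so convolutions keep the size), a 32x32 input, and a target size of 4x4.

```python
def plan_stages(input_size, target_size, convs_per_stage=2):
    """List (layer, spatial size after layer) pairs until target_size is reached."""
    size = input_size
    plan = []
    while size > target_size:
        for _ in range(convs_per_stage):
            # 3x3 conv, stride 1, zero padding 1: (size - 3 + 2) / 1 + 1 == size
            plan.append(("conv3x3", size))
        size = size // 2  # 2x2 max pool with stride 2 halves the size
        plan.append(("maxpool2x2", size))
    return plan

plan = plan_stages(input_size=32, target_size=4)
for layer, size in plan:
    print(f"{layer:10s} -> {size}x{size}")
print(len(plan))  # 9 layers: three stages of two convs and one pool
```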

### Convolutional pyramid

For each convolution layer, we reduce the spatial dimension while increasing the depth of the feature maps. Because of the shape, we call this a convolutional pyramid.

<div class="imgcap">
<img src="images/cnn3d.png" style="border:none;">
</div>

Here, we reduce the spatial dimension at each convolution layer through pooling, or sometimes by applying a filter with stride > 1.
<div class="imgcap">
<img src="images/cnn3d4.png" style="border:none;width:50%">
</div>

The depth of the feature map can be increased by applying more filters.
<div class="imgcap">
<img src="images/cnn3d2.png" style="border:none;">
</div>

The core idea of a CNN is to apply small filters to explore spatial features. The spatial dimension gradually decreases as we go deeper into the network, while the depth of the feature maps increases. Eventually we reach a stage where spatial locality is less important and we can apply an FC network for the final analysis.

#### Google Inception with 1x1 convolution

In our previous discussion, the convolution filters in each layer all have the same patch size, say 3x3. To increase the depth of the feature maps, we can apply more filters of the same patch size. GoogLeNet, however, takes a different approach to increasing the depth: it uses filters with different patch sizes in the same layer. Here we can have filters with patch sizes 3x3 and 1x1. Do not mistake a 1x1 filter for one that does nothing. It does not explore the spatial dimensions, but it does explore the depth of the feature maps. For example, with the 1x1 filter below, we convert the RGB channels (depth 3) into two output feature maps. The first set of filters generates 8 feature maps while the second one generates two. We can concatenate them to form feature maps of depth 10. The inception idea is to increase the depth of the feature maps by concatenating feature maps produced by convolution filters of different patch sizes and by pooling.
<div class="imgcap">
<img src="images/inception.png" style="border:none;width:60%">
</div>

The inception module can also be viewed as one way to introduce additional non-linearity into the system.
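
To see why a 1x1 filter is not a no-op, the NumPy sketch below treats it as a per-pixel linear map across channels and then concatenates the result with a second branch along the depth axis. The 5x5 spatial size and all values are made up, and the 8-map branch is a random stand-in rather than a real 3x3 convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((3, 5, 5))   # depth 3 (RGB), toy 5x5 spatial size

# 1x1 convolution with 2 filters: one weight per (output map, input channel).
w_1x1 = rng.random((2, 3))
maps_1x1 = np.einsum("oc,chw->ohw", w_1x1, image)   # mixes channels at each pixel
print(maps_1x1.shape)   # (2, 5, 5)

# Stand-in for the 8 feature maps of the 3x3 branch (assumed, not computed here).
maps_3x3 = rng.random((8, 5, 5))

# Concatenate along the depth axis to get feature maps of depth 10.
combined = np.concatenate([maps_3x3, maps_1x1], axis=0)
print(combined.shape)   # (10, 5, 5)
```

Each output pixel of `maps_1x1` depends only on the three channel values at the same location, which is what "exploring the depth" means here.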

#### Fully connected network

After exploring the spatial relationship, we flatten the convolution layer output and connect it to a fully connected network:

<div class="imgcap">
<img src="images/cnn3d5.png" style="border:none;width:70%">
</div>

<div class="imgcap">
<img src="images/cnn3d6.png" style="border:none;width:70%">
</div>

### Example of a simple CNN
<div class="imgcap">
<img src="images/simple_cnn.png" style="border:none;width:50%">
</div>

In [95]:
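class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Two 5x5 convolutions: 1 -> 10 -> 20 feature maps
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # 2x2 max pooling (stride defaults to the kernel size)
        self.mp = nn.MaxPool2d(kernel_size=2)
        # 320 = 20 feature maps * 4 * 4, which assumes 1x28x28 inputs
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        in_size = x.size(0)  # batch size
        x = F.relu(self.mp(self.conv1(x)))
        x = F.relu(self.mp(self.conv2(x)))
        x = x.view(in_size, -1)  # flatten to (batch, 320)
        x = self.fc(x)
        # dim=1 normalizes over the class scores of each sample
        return F.log_softmax(x, dim=1)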
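
The input size of the final linear layer (320) follows from the output-size formula applied layer by layer. The trace below is a plain-Python sketch that assumes a single-channel 28x28 input (for example MNIST-sized images); the text does not state the input size, so treat it as an assumption. The helper `conv_size` is hypothetical.

```python
def conv_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution: (W - F + 2P) / S + 1, rounded down."""
    return (size - kernel + 2 * padding) // stride + 1

size = 28                    # assumed input: 1 x 28 x 28
size = conv_size(size, 5)    # conv1, 5x5 kernel      -> 24
size = size // 2             # 2x2 max pool, stride 2 -> 12
size = conv_size(size, 5)    # conv2, 5x5 kernel      -> 8
size = size // 2             # 2x2 max pool, stride 2 -> 4

flat = 20 * size * size      # 20 feature maps of 4x4
print(flat)  # 320
```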

#### Visualization
A CNN uses filters to extract features from an image. It is interesting to see what kinds of filters a CNN eventually learns, since this gives us some insight into what the CNN is trying to learn.

Here are the 96 filters learned in the first convolution layer of AlexNet. Many filters turn out to be edge detection filters similar to those found in human visual systems. (Source: Krizhevsky et al.)
<div class="imgcap">
<img src="images/cnnfilter.png" style="border:none;width:50%">
</div>

The right side shows the images with the highest activations in some feature maps at layer 4. We then reconstruct the images based on the activations in the feature maps. This gives us some understanding of what our model is looking for.
(Source: Matthew D. Zeiler et al.)
<div class="imgcap">
<img src="images/cnnlayer_4.png" style="border:none;width:70%">
</div>

> If the visualization of the filters looks noisy, it may indicate that we need more training iterations or that we are overfitting.

#### Batch normalization & ReLU

After applying filters to the input, we apply batch normalization followed by a ReLU for non-linearity. Batch normalization renormalizes the data to make learning with gradient descent faster.

Batch normalization applies this equation to the input:

$$
z = \frac{x - \mu}{\sigma}
$$

For a feature map with spatial dimension 10x10, we compute 100 means and 100 variances from the batch samples. For example, if the batch size is 16, the mean for the feature at location (i, j) is computed by:

$$
\mu_{i, j} = \frac{o^{(1)}_{i, j} + o^{(2)}_{i, j} + \dots + o^{(16)}_{i, j}}{16} \quad \text{where } o^{(k)} \text{ is the output from batch sample } k \in \{1, \dots, 16\}
$$


We feed $z$ to a linear equation with the trainable scalar values $ \gamma $ and $ \beta$ (1 pair for each normalized layer). 

$$
out = \gamma z + \beta
$$

The normalization can be undone if $\gamma = \sigma$ and $\beta = \mu$. We initialize $\gamma = 1$ and $\beta = 0$, so the input starts out normalized, which speeds up learning; both parameters are then learned during training.
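
The per-location computation described above can be sketched in NumPy for one 10x10 feature map and a batch of 16 samples. The function name, the random test data, and the small `eps` added for numerical stability are assumptions of this sketch and not part of the equations above. Note also that PyTorch's `nn.BatchNorm2d` pools its statistics per channel over the batch and all spatial positions, rather than per location as written here.

```python
import numpy as np

def batch_norm_per_location(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """batch has shape (N, H, W): one feature map for each of N samples."""
    mu = batch.mean(axis=0)               # (H, W): one mean per location
    var = batch.var(axis=0)               # (H, W): one variance per location
    z = (batch - mu) / np.sqrt(var + eps)
    return gamma * z + beta               # gamma=1, beta=0 leaves z unchanged

rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(16, 10, 10))

out = batch_norm_per_location(batch)
print(out.shape)               # (16, 10, 10)
print(out.mean(axis=0).shape)  # (10, 10): 100 per-location means, all close to 0
```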

In [97]:
# Without Learnable Parameters
m = nn.BatchNorm2d(3, affine=False)
m(inp)

tensor([[[[-1.3074,  1.6045, -0.2154,  0.1486, -0.5794,  0.8766,  0.1486],
          [-1.6714, -0.9434, -1.3074,  1.6045,  1.2405,  0.8766, -0.9434],
          [ 0.8766,  0.1486, -0.2154,  0.1486, -0.9434,  1.6045,  1.2405],
          [ 0.1486, -0.5794,  0.1486,  1.2405, -1.3074, -0.2154, -0.2154],
          [-1.6714,  1.6045,  1.6045, -0.9434, -1.3074, -0.2154,  0.8766],
          [ 0.1486,  1.2405, -0.9434, -0.2154, -1.3074,  0.8766, -0.2154],
          [ 1.6045,  0.1486, -0.9434, -1.3074, -0.2154,  0.5126, -0.9434]],

         [[-0.0284, -1.0737,  0.6684, -1.4221, -0.3768, -0.7252, -1.4221],
          [ 1.3652,  1.7136,  0.3200, -0.0284,  0.6684,  0.3200, -1.0737],
          [-1.0737, -1.4221, -0.7252,  1.0168,  1.7136, -0.3768, -0.0284],
          [-1.4221,  0.6684,  1.0168, -0.3768, -1.4221, -1.4221, -0.3768],
          [-1.0737,  1.0168, -0.0284, -1.4221,  0.6684, -0.0284,  1.7136],
          [ 1.0168,  1.7136,  1.0168, -0.7252,  0.6684,  1.7136, -0.7252],
          [ 0.3200, -0.

### Dropout
Dropout is an extremely effective and simple regularization technique introduced by Srivastava et al. in Dropout: A Simple Way to Prevent Neural Networks from Overfitting that complements the other methods (L1, L2, maxnorm). While training, dropout is implemented by only keeping a neuron active with some probability p (a hyperparameter), or setting it to zero otherwise.

<div class="imgcap">
<img src="images/dropout.jpeg" style="border:none;width:70%">
</div>
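
A minimal NumPy sketch of dropout at training time follows. It uses the "inverted" variant, which rescales the surviving activations so that their expected value is unchanged; the function name and input values are made up. One naming caveat: the description above uses p for the probability of keeping a neuron, whereas PyTorch's `nn.Dropout(p)` takes the probability of zeroing an element and scales the rest by 1/(1-p), which is why surviving values with `p=0.2` appear multiplied by 1.25.

```python
import numpy as np

def dropout_train(x, drop_prob, rng):
    """Zero each element with probability drop_prob and rescale the survivors."""
    keep_prob = 1.0 - drop_prob
    mask = rng.random(x.shape) < keep_prob   # True where the element is kept
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)
y = dropout_train(x, drop_prob=0.2, rng=rng)
print(y)  # each entry is either 0 or 1.25 times the corresponding input
```

At test time no elements are dropped and, with this inverted scaling, no further rescaling is needed.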

In [98]:
m = nn.Dropout(p=0.2)
m(inp)

tensor([[[[ 0.0000, 11.2500,  5.0000,  0.0000,  3.7500,  8.7500,  6.2500],
          [ 0.0000,  2.5000,  1.2500,  0.0000, 10.0000,  8.7500,  2.5000],
          [ 8.7500,  0.0000,  5.0000,  6.2500,  2.5000, 11.2500, 10.0000],
          [ 6.2500,  0.0000,  6.2500, 10.0000,  1.2500,  5.0000,  0.0000],
          [ 0.0000, 11.2500, 11.2500,  2.5000,  1.2500,  5.0000,  8.7500],
          [ 6.2500, 10.0000,  2.5000,  5.0000,  1.2500,  8.7500,  0.0000],
          [11.2500,  0.0000,  2.5000,  0.0000,  5.0000,  7.5000,  2.5000]],

         [[ 5.0000,  1.2500,  0.0000,  0.0000,  3.7500,  2.5000,  0.0000],
          [10.0000, 11.2500,  6.2500,  5.0000,  7.5000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  8.7500, 11.2500,  0.0000,  5.0000],
          [ 0.0000,  7.5000,  8.7500,  3.7500,  0.0000,  0.0000,  3.7500],
          [ 1.2500,  8.7500,  5.0000,  0.0000,  7.5000,  5.0000, 11.2500],
          [ 8.7500, 11.2500,  8.7500,  2.5000,  0.0000, 11.2500,  0.0000],
          [ 6.2500,  3.