# Evaluating dimensions

In this notebook, we work out the layer dimensions of a DeepAnt-type neural network. The input parameters are the following:
- `n_ts` is the number of variables in the time series. For instance, if the time series is univariate, we will have `n_ts = 1`. For a bivariate time series, we will have `n_ts = 2`, and so on.
- `w` is the size of the window we will use as input to the algorithm. If we decide to predict using the past 15 steps, then `w = 15`. 
- `p_w` is the number of steps in the future we are trying to predict. If we try to predict 3 steps in the future, then `p_w = 3`.
- `bs` is the batch size. To understand how the dimensions fit together, we will leave it aside, since it has no impact on the dimension calculations.
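To make the roles of `n_ts`, `w` and `p_w` concrete, here is a minimal sketch that cuts a multivariate series into input windows and prediction targets. The helper name `make_windows` and the list-of-lists representation are hypothetical, and the sketch assumes the target is the `p_w` steps immediately following each window.

```python
def make_windows(series, w, p_w):
    """Split a multivariate series into (input, target) pairs.

    `series` is a list of n_ts lists, one per variable, all of equal length.
    Each input has shape n_ts x w; each target has shape n_ts x p_w.
    Assumption: the target is the p_w steps right after the input window.
    """
    n_steps = len(series[0])
    pairs = []
    # The last usable start index leaves room for w inputs and p_w targets.
    for start in range(n_steps - w - p_w + 1):
        x = [var[start:start + w] for var in series]
        y = [var[start + w:start + w + p_w] for var in series]
        pairs.append((x, y))
    return pairs


# Bivariate toy series (n_ts = 2) with 20 time steps.
series = [list(range(20)), list(range(100, 120))]
pairs = make_windows(series, w=15, p_w=3)
x, y = pairs[0]
print(len(x), len(x[0]))  # input shape: n_ts, w
print(len(y), len(y[0]))  # target shape: n_ts, p_w
```

Each pair then corresponds to one sample of shape `n_ts` * `w` and one target of shape `n_ts` * `p_w`; the batch size only adds a leading dimension.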

The DeepAnt algorithm is a succession of convolution, activation and pooling layers. Given the shape of the datasets, we have two ways to specify the neural network:

- With one-dimension convolutional layers
- With two-dimension convolutional layers


## One-dimension convolutional layers

If we use one-dimension convolutional layers, the most obvious way to structure the input data is with `n_ts` channels of length `w`, as shown in the picture below.

![Image of input dimensions](Images/DeepAnt-OneDimension-Input.png)

Let's now define the parameters of the different layers:
- Convolution layers:
    - `n_fs`: number of filters (for DeepAnt, this will be 32)
    - `f_size`: filter size. We'll use filters of size 3 (one dimensional)
    - `padding`: padding size in the convolution. We'll use padding of one
    - `c_stride`: size of stride. We'll use stride of one (default)
- Maximum pooling layers:
    - `k_size`: kernel size. We'll use kernels of size 2 (one dimensional)
    - `padding`: padding size in the maximum pooling. We'll use padding of one
    - `p_stride`: size of stride. We'll use stride of 2 (default equal to size of kernel)
- Linear (fully connected) layer:
    - `p_w`: number of steps to predict in the future
    - `n_ts`: number of variables in the time series
    
The output must be structured with the following dimensions: `n_ts` * `p_w`, as shown below. 

![Image of output dimensions](Images/DeepAnt-OneDimension-Output.png)

### Computing dimensions in the case of one dimension convolution

Let's assume our input has size `n_ts` * `w`. After our first convolution, the size of the output should be the following: `n_c_out` * `c_w_out` where:

$$ n\_c\_out = n\_fs $$

$$ c\_w\_out = \lfloor\frac{w + 2 * padding + f\_size - 2}{c\_stride} +1\rfloor $$

After our first maximum pooling layer, our dimensions will be `n_fs` * `p_w_out` where:

$$ n\_c\_out = n\_fs $$ (unchanged)

$$ p\_w\_out = \lfloor\frac{c\_w\_out + 2 * padding + k\_size - 2}{p\_stride} + 1\rfloor $$
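
As a cross-check, the PyTorch documentation for `nn.Conv1d` and `nn.MaxPool1d` states the output length as $\lfloor (L_{in} + 2 \cdot padding - dilation \cdot (kernel\_size - 1) - 1) / stride + 1 \rfloor$. The sketch below implements that documented expression in plain Python; the helper name `documented_out_length` is hypothetical, and `dilation = 1` is assumed. Because the kernel size is subtracted there, it can return shorter lengths than the two expressions above for the same parameters, so it is worth comparing both against the shapes reported by an actual model.

```python
from math import floor


def documented_out_length(l_in, kernel_size, padding=0, stride=1, dilation=1):
    # Output length as documented for torch.nn.Conv1d and torch.nn.MaxPool1d
    # (max pooling with the default ceil_mode=False).
    return floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)


# Convolution: window of 25, filter size 3, padding 1, stride 1.
conv_len = documented_out_length(25, kernel_size=3, padding=1, stride=1)
# Max pooling: kernel size 2, padding 1, stride 2.
pool_len = documented_out_length(conv_len, kernel_size=2, padding=1, stride=2)
print(conv_len, pool_len)  # 25 13
```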

Let's now define a function that will calculate the dimensions of a CNN with the following structure:

- 1D Convolution
- Max pooling layer

In [18]:
from math import floor

In [19]:
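def calculateDimensions(window, c_in, n_fs, f_size, padding, c_stride, k_size, p_stride):
    # c_in (number of input channels) is not used in the calculation
    # The number of output channels equals the number of convolution filters
    n_c_out = n_fs
    
    # Length after the convolution, using the c_w_out expression above
    c_w_out = floor((window + 2*padding + f_size - 2) / c_stride + 1)
    
    # Length after max pooling, using the p_w_out expression above
    p_w_out = floor((c_w_out + 2*padding + k_size - 2) / p_stride + 1)
    
    return (n_c_out, p_w_out)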

In [20]:
# for a window of size 25, a quadrivariate time series, convolutions with
# 32 kernels (as stated in the DeepAnt paper), convolution filters
# of size 3, kernel for max pooling of size 2, convolution stride of 1,
# pooling stride equal to kernel size (default in PyTorch) and
# padding of 1

window = 25
c_in = 4
n_fs = 32
f_size = 3
padding = 1
c_stride = 1
k_size = 2
p_stride = 2

(new_c_in, size) = calculateDimensions(window, c_in, n_fs, f_size, padding, c_stride, k_size, p_stride)
(new_c_in, size)

(32, 16)

In [21]:
(dim1, dim2) = calculateDimensions(size, new_c_in, n_fs, f_size, padding, c_stride, k_size, p_stride)
(dim1, dim2)

(32, 12)

### Computing the size of the fully connected layer

PyTorch's `nn.Linear()` class transforms an input of size $(N,*,H_{in})$ into $(N,*,H_{out})$, where $N$ is the batch size, $*$ is any number of dimensions that are left **unchanged**, and $H_{in}$ and $H_{out}$ are the sizes of the input and output features (see https://pytorch.org/docs/stable/nn.html?highlight=linear#torch.nn.Linear).

If we want to output predictions for our time series, our output dimensions must be `n_ts` * `p_w` (leaving batch size aside). Therefore, we must first reshape our max pooling output into something that will fit the linear layer.

One way to do that is to use `Tensor.view()` to set one of the dimensions to `n_ts`, so that the input to the linear layer has dimensions `n_ts` * `redim`, where `redim` is inferred automatically by `Tensor.view()` (by passing `-1` for that dimension). The tricky part is that the product of the max pool output dimensions **must** be divisible by `n_ts`, which will not always be the case.

In the previous case, this gives the following.

In [22]:
product = dim1 * dim2

In [23]:
# c_in is actually n_ts
redim = product / c_in
redim

96.0
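
The same check can be wrapped in a small helper that fails early when the reshape is impossible. This is a sketch in plain Python rather than part of the network; the function name `linear_layer_shapes` is hypothetical, and it only tracks shapes (batch size aside) instead of calling `Tensor.view()`.

```python
def linear_layer_shapes(dim1, dim2, n_ts, p_w):
    """Return (input_shape, output_shape) of the linear layer, batch size aside."""
    product = dim1 * dim2
    # The reshape to n_ts rows only works if the product divides evenly.
    redim, remainder = divmod(product, n_ts)
    if remainder != 0:
        raise ValueError(f"{product} elements cannot be viewed as {n_ts} rows")
    # A linear layer with redim input features and p_w output features
    # changes only the last dimension.
    return (n_ts, redim), (n_ts, p_w)


print(linear_layer_shapes(32, 12, n_ts=4, p_w=3))  # ((4, 96), (4, 3))
```

Using integer division with an explicit remainder check avoids silently carrying a fractional `redim` into the layer definition.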

#### Let's break stuff...

In [24]:
window = 25
c_in = 7
n_fs = 32
f_size = 3
padding = 1
c_stride = 1
k_size = 2
p_stride = 2

(new_c_in, size) = calculateDimensions(window, c_in, n_fs, f_size, padding, c_stride, k_size, p_stride)
(new_c_in, size)

(32, 16)

In [25]:
(dim1, dim2) = calculateDimensions(size, new_c_in, n_fs, f_size, padding, c_stride, k_size, p_stride)
(dim1, dim2)

(32, 12)

In [26]:
product = dim1 * dim2
# c_in is actually n_ts
redim = product / c_in
redim

54.857142857142854

With these dimensions, the case above (7 time series) won't work: 32 * 12 = 384 is not divisible by 7, so `redim` is not an integer.

## Two-dimension convolutional layers