# 7.4 Multiple Input and Multiple Output Channels

In [2]:
import torch
from d2l import torch as d2l

### 7.4.1 Multiple Input Channels

In [3]:
# Cross-correlate each input channel with its kernel channel, then add up the results
def corr2d_multi_in(X, K):
  # Iterate over the channel dimension (dim 0) of X and K together,
  # then sum the per-channel results
  return sum(d2l.corr2d(x, k) for x, k in zip(X, K))

In [4]:
X = torch.tensor([[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
               [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]])
K = torch.tensor([[[0.0, 1.0], [2.0, 3.0]], [[1.0, 2.0], [3.0, 4.0]]])

corr2d_multi_in(X, K)

tensor([[ 56.,  72.],
        [104., 120.]])

### 7.4.2 Multiple Output Channels

In [5]:
def corr2d_multi_in_out(X, K):
  # Iterate through the 0th dimension (output channels) of K, each time performing
  # a cross-correlation with the input X. All of the results are stacked together
  return torch.stack([corr2d_multi_in(X, k) for k in K], 0)

In [7]:
K = torch.stack((K, K+1, K+2),0)
K.shape

torch.Size([3, 2, 2, 2])

In [8]:
#Now the output contains three channels
corr2d_multi_in_out(X, K)

tensor([[[ 56.,  72.],
         [104., 120.]],

        [[ 76., 100.],
         [148., 172.]],

        [[ 96., 128.],
         [192., 224.]]])

### 7.4.3 1x1 Convolutional Layer

In [9]:
#We need to make some adjustments to the data shape before and after the matrix multiplication
def corr2d_multi_in_out_1x1(X,K):
  c_i, h, w = X.shape
  c_o = K.shape[0]
  X = X.reshape((c_i, h * w))
  K = K.reshape((c_o, c_i))
  #Matrix multiplication in the fully connected layer
  Y = torch.matmul(K, X)
  return Y.reshape((c_o, h, w))

In [10]:
X = torch.normal(0, 1, (3, 3, 3))
K = torch.normal(0, 1, (2, 3, 1, 1))
Y1 = corr2d_multi_in_out_1x1(X,K)
Y2 = corr2d_multi_in_out(X, K)
assert float(torch.abs(Y1 - Y2).sum()) < 1e-6

## Discussions

- When we add channels into the mix, our inputs and hidden representations both become three-dimensional tensors.
- For example, an RGB input image has shape $3 \times h \times w$. The axis of size 3 is the *channel* dimension.

7.4.1 Multiple Input Channels
- When the input data contains multiple channels, we need to construct a convolution kernel with the same number of input channels as the input data, so that it can perform cross-correlation with the input data.
- Since the input and the convolution kernel each have $c_i$ channels, we can cross-correlate, for each channel, the two-dimensional slice of the input with the two-dimensional slice of the kernel, and then add the $c_i$ results together (summing over the channels) to yield a single two-dimensional tensor. This is the result of a two-dimensional cross-correlation between a multi-channel input and a multi-input-channel convolution kernel.
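
The per-channel sum described above can be cross-checked against PyTorch's built-in `torch.nn.functional.conv2d`, which also computes cross-correlation. This is a minimal sketch, not part of the original notebook cells; the extra batch and output-channel axes of size 1 are there only because `conv2d` expects 4-D tensors.

```python
import torch
import torch.nn.functional as F

# Same two-channel input and kernel as in the 7.4.1 example
X = torch.tensor([[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
                  [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]])
K = torch.tensor([[[0.0, 1.0], [2.0, 3.0]], [[1.0, 2.0], [3.0, 4.0]]])

# conv2d expects input (batch, c_i, h, w) and weight (c_o, c_i, k_h, k_w)
Y = F.conv2d(X.unsqueeze(0), K.unsqueeze(0))

# Drop the batch and output-channel axes to recover the 2-D result
Y = Y.squeeze(0).squeeze(0)
print(Y)
```

The printed tensor should agree with the output of `corr2d_multi_in(X, K)` shown earlier.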

7.4.2 Multiple Output Channels
- Regardless of the number of input channels, it is essential to have multiple channels at each layer.
- In most popular neural network architectures, the channel dimension increases as we go deeper in the network, typically while downsampling, trading off spatial resolution for greater *channel depth*.
- Denote by $c_i$ and $c_o$ the number of input and output channels, respectively, and by $k_h$ and $k_w$ the height and width of the kernel. To get an output with multiple channels, we can create a kernel tensor of shape $c_i \times k_h \times k_w$ for every output channel. We concatenate them on the output channel dimension, so that the shape of the convolution kernel is $c_o \times c_i \times k_h \times k_w$. In cross-correlation operations, the result on each output channel is calculated from the convolution kernel corresponding to that output channel and takes input from all channels in the input tensor.
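
To make the shape bookkeeping concrete, the sketch below builds a kernel of shape $c_o \times c_i \times k_h \times k_w$ and applies it with `torch.nn.functional.conv2d`. It is an illustrative cross-check, not the notebook's own implementation; the name `K0` for the single-output-channel kernel is introduced here only for the example.

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
                  [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]])
# K0 (hypothetical name): one kernel of shape (c_i, k_h, k_w) = (2, 2, 2)
K0 = torch.tensor([[[0.0, 1.0], [2.0, 3.0]], [[1.0, 2.0], [3.0, 4.0]]])

# Stack three such kernels along a new leading axis: (c_o, c_i, k_h, k_w) = (3, 2, 2, 2)
K = torch.stack((K0, K0 + 1, K0 + 2), 0)

# Each output channel uses its own (c_i, k_h, k_w) kernel and reads all input channels
Y = F.conv2d(X.unsqueeze(0), K).squeeze(0)
print(K.shape, Y.shape)
```

The output has one $2 \times 2$ map per output channel, so its shape is $3 \times 2 \times 2$.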

7.4.3 1x1 Convolutional Layer
- $1 \times 1$ convolutions are popular operations that are sometimes included in the designs of complex deep networks.
- The $1 \times 1$ convolution loses the ability of larger convolutional layers to recognize patterns consisting of interactions among adjacent elements in the height and width dimensions. The only computation of the $1 \times 1$ convolution occurs on the channel dimension.
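
One way to see that a 1x1 convolution only mixes channels is to compare it with a plain matrix product applied at every pixel. The sketch below is illustrative: the sizes, the seed, and the names `W`, `Y_conv`, `Y_pixelwise` are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
c_i, c_o, h, w = 3, 2, 4, 5
X = torch.randn(c_i, h, w)
W = torch.randn(c_o, c_i)  # 1x1 kernel weights, one row per output channel

# 1x1 convolution through the built-in operator
Y_conv = F.conv2d(X.unsqueeze(0), W.reshape(c_o, c_i, 1, 1)).squeeze(0)

# The same computation written as a channel-mixing product at every pixel
Y_pixelwise = torch.einsum('oi,ihw->ohw', W, X)

# A single output pixel depends only on the input channels at that location
r, c = 1, 2
pixel = W @ X[:, r, c]
print(torch.allclose(Y_conv, Y_pixelwise, atol=1e-5))
print(torch.allclose(pixel, Y_conv[:, r, c], atol=1e-5))
```

Neighbouring pixels never enter the computation of `pixel`, which is the sense in which the layer acts like a fully connected layer applied independently at each location.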

- Channels allow the CNN to reason with multiple features, such as edge and shape detectors, at the same time.
- Channels offer a practical trade-off between the drastic parameter reduction arising from translation invariance and locality, and the need for expressive and diverse models in computer vision.
- Given an image of size $h \times w$, the cost of computing a $k \times k$ convolution is $O(h \cdot w \cdot k^2)$. For $c_i$ input and $c_o$ output channels, this increases to $O(h \cdot w \cdot k^2 \cdot c_i \cdot c_o)$.
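
As a rough illustration of the cost formula, the helper below counts multiplications under the simplifying assumption that the output has the same $h \times w$ size as the input (for example, with padding). The function name and the sizes 224, 3, 64, and 128 are hypothetical values chosen for this sketch.

```python
def conv_mult_count(h, w, k, c_i=1, c_o=1):
    """Count multiplications for a k x k convolution producing an h x w output."""
    # One k*k window per output pixel, per (input channel, output channel) pair
    return h * w * k * k * c_i * c_o


single = conv_mult_count(224, 224, 3)
multi = conv_mult_count(224, 224, 3, c_i=64, c_o=128)

# The ratio is exactly c_i * c_o, the factor added by the channel dimensions
print(single, multi, multi // single)
```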

### Exercises

1. Assume that we have two convolution kernels of size $k_1$ and $k_2$, respectively (with no nonlinearity in between).

Prove that the result of the operation can be expressed by a single convolution.

Let $h_1$ and $h_2$ be two convolution kernels of sizes $k_1$ and $k_2$, and let $x$ be the input.
The first convolution is:
$$y_1 = h_1 * x$$
The second convolution is:
$$y_2 = h_2 * y_1 = h_2 * (h_1 * x)$$
Because convolution is associative, $h_2 * (h_1 * x) = (h_2 * h_1) * x$, so the two consecutive convolutions can be replaced by a single convolution whose kernel is the convolution of $h_1$ and $h_2$:
$$h_{eq} = h_2 * h_1$$


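What is the dimensionality of the equivalent single convolution?

Convolving a kernel of size $k_1$ with a kernel of size $k_2$ extends its support by $k_2 - 1$ along each axis, so the equivalent kernel has size
$$k_{eq} = k_1 + k_2 - 1$$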
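
The claim can be checked numerically. The sketch below is one possible verification, under the assumption of single-channel inputs, no padding, and stride 1. Since `torch.nn.functional.conv2d` computes cross-correlation, the merged kernel is built by letting every pair of taps at offsets $a$ and $b$ contribute to offset $a + b$; all sizes and names are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
k1, k2 = 3, 2
x = torch.randn(1, 1, 8, 8, dtype=torch.float64)
h1 = torch.randn(k1, k1, dtype=torch.float64)
h2 = torch.randn(k2, k2, dtype=torch.float64)

# Build the equivalent kernel of size k1 + k2 - 1
k_eq = k1 + k2 - 1
h_eq = torch.zeros(k_eq, k_eq, dtype=torch.float64)
for a_r in range(k1):
    for a_c in range(k1):
        for b_r in range(k2):
            for b_c in range(k2):
                h_eq[a_r + b_r, a_c + b_c] += h1[a_r, a_c] * h2[b_r, b_c]

# Two consecutive layers (no nonlinearity) versus one layer with the merged kernel
y_two = F.conv2d(F.conv2d(x, h1[None, None]), h2[None, None])
y_one = F.conv2d(x, h_eq[None, None])
print(h_eq.shape, torch.allclose(y_two, y_one))
```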

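2. Are the variables Y1 and Y2 in the final example of this section exactly the same? Why?

Mathematically, yes: for a 1x1 convolution, `corr2d_multi_in_out_1x1` and `corr2d_multi_in_out` perform the identical operation. The only difference is how the computation is structured. `corr2d_multi_in_out_1x1` uses a single matrix multiplication, while `corr2d_multi_in_out` iterates over the output channels and sums per-channel cross-correlations over the input channels. Because floating-point addition is not associative, the different order of operations can produce tiny rounding differences, so the two results are not guaranteed to match bit for bit. This is why the check uses a small tolerance rather than exact equality.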

In [13]:
assert float(torch.abs(Y1 - Y2).sum()) < 1e-6
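
Whether two mathematically identical computations agree bit for bit depends on the order of the floating-point operations. The sketch below illustrates this independently of the notebook's variables; the tensor sizes, the seed, and the names `y_matmul` and `y_loop` are assumptions made for this example.

```python
import torch

# Floating-point addition is not associative: grouping changes the rounding
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)

# Two summation orders over the channel dimension of the same random data
torch.manual_seed(0)
X = torch.randn(64, 16)  # 64 channels, 16 flattened pixels
k = torch.randn(64)      # one weight per input channel

y_matmul = k @ X                              # single vector-matrix product
y_loop = sum(k[i] * X[i] for i in range(64))  # channel-by-channel accumulation

# Exact equality may or may not hold; a tolerance-based check is the robust test
print(torch.equal(y_matmul, y_loop))
print(torch.allclose(y_matmul, y_loop, atol=1e-4))
```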