# Multiple input and output channels in convolutional layers

So far we have considered only single-channel inputs and single-channel outputs. In practice, an RGB image has 3 channels (red, green, blue), so a convolutional kernel applied to it must also have 3 input channels.
Similarly, a convolutional layer usually produces N feature maps, each of shape H × W, one per output channel. So we also need to support multiple output channels.
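
In PyTorch, both channel counts appear directly in the weight of `nn.Conv2d`, which is stored as a tensor of shape (out_channels, in_channels, kernel_height, kernel_width). The sketch below assumes an RGB input, and its choices of 16 output channels, a 3 × 3 kernel and a 32 × 32 image are arbitrary illustrative values:

```python
import torch
import torch.nn as nn

# 3 input channels (RGB); 16 output channels is an arbitrary illustrative choice
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# One full-depth 3-channel 3x3 kernel per output channel
print(conv.weight.shape)  # torch.Size([16, 3, 3, 3])

# A batch containing one 32x32 RGB image: (batch, channels, height, width)
img = torch.randn(1, 3, 32, 32)
out = conv(img)

# 16 feature maps, each (32 - 3 + 1) x (32 - 3 + 1), since padding defaults to 0
print(out.shape)  # torch.Size([1, 16, 30, 30])
```

Each output channel therefore owns its own kernel spanning all input channels, which is the structure the functions in this notebook build by hand.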

Let's consider each case in turn.

```python
import torch
import torch.nn as nn
```

## Multiple input channels

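```python
# Re-define the single-channel conv2d function from before
def conv2d(X, K):
    """Single-channel 2D cross-correlation (what deep learning calls "convolution").

    X has shape (n_h, n_w) and K has shape (k_h, k_w);
    the output has shape (n_h - k_h + 1, n_w - k_w + 1).
    """
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Multiply the current window by the kernel elementwise and sum
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y
```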
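
Written as a formula, `conv2d` computes the following, for an input $\mathbf{X}$ of shape $n_h \times n_w$ and a kernel $\mathbf{K}$ of shape $k_h \times k_w$ (this notation is ours, chosen to match the code):

$$
Y_{i,j} = \sum_{a=0}^{k_h - 1} \sum_{b=0}^{k_w - 1} X_{i+a,\, j+b}\, K_{a,b},
\qquad 0 \le i \le n_h - k_h,\quad 0 \le j \le n_w - k_w .
$$

The output therefore has shape $(n_h - k_h + 1) \times (n_w - k_w + 1)$, which is exactly the size passed to `torch.zeros`. The kernel is not flipped, so strictly speaking this is cross-correlation rather than convolution. PyTorch's `nn.Conv2d` follows the same convention.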

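```python
def conv2d_multi_in(X, K):
    # X: (c_i, n_h, n_w), K: (c_i, k_h, k_w). Cross-correlate each input
    # channel with its matching kernel channel, then add the results together.
    return sum(conv2d(x, k) for x, k in zip(X, K))
```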
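
With $c_i$ input channels, let $\mathbf{X}_c$ and $\mathbf{K}_c$ denote channel $c$ of the input and of the kernel, and let $\star$ denote the single-channel cross-correlation computed by `conv2d`. Then `conv2d_multi_in` computes

$$
\mathbf{Y} = \sum_{c=1}^{c_i} \mathbf{X}_c \star \mathbf{K}_c .
$$

The kernel must have exactly as many channels as the input, and the result is a single 2D output channel no matter how many input channels there are. Python's `zip` silently stops at the shorter sequence, so a mismatched channel count would not raise an error here. It would just drop channels.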


```python
# Input with 2 channels of shape 4 x 3, and a 2-channel 2 x 2 kernel
X = torch.tensor([[[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]],
                  [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]])
K = torch.tensor([[[0, 1], [1, 2]],
                  [[0, 1], [1, 2]]])
```

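```python
conv2d_multi_in(X, K).shape  # torch.Size([3, 2])
```

The two input channels collapse into a single output channel of shape (4 - 2 + 1) × (3 - 2 + 1) = 3 × 2.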
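
Summing the per-channel results gives the same answer as multiplying a full $c_i \times k_h \times k_w$ window of the input by the whole kernel and summing over all three axes at once. Below is a minimal sketch of that equivalent formulation. The name `conv2d_multi_in_window` is hypothetical and not part of the original notebook. It slices every channel together instead of looping over them:

```python
import torch

def conv2d_multi_in_window(X, K):
    # Hypothetical helper, equivalent to conv2d_multi_in.
    # X: (c_i, n_h, n_w), K: (c_i, k_h, k_w)
    _, h, w = K.shape
    Y = torch.zeros((X.shape[1] - h + 1, X.shape[2] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Sum over channel, row and column of the 3D window in one step
            Y[i, j] = (X[:, i:i + h, j:j + w] * K).sum()
    return Y

X = torch.tensor([[[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]],
                  [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]])
K = torch.tensor([[[0, 1], [1, 2]],
                  [[0, 1], [1, 2]]])

# Expected values: [[18, 26], [34, 42], [50, 58]]
print(conv2d_multi_in_window(X, K))
```

Both channels of `X` and `K` are identical in this example, so each entry is twice the single-channel value. For instance, the top-left entry is 2 × (1 + 2 + 2·3) = 18.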

## Multiple input and output channels

```python
def corr2d_multi_in_out(X, K):
    # K has shape (c_o, c_i, k_h, k_w). Traverse along its 0th dimension (the
    # output channels): each slice k of shape (c_i, k_h, k_w) is cross-correlated
    # with the whole input X. The c_o results are merged with torch.stack into
    # an output of shape (c_o, n_h - k_h + 1, n_w - k_w + 1).
    return torch.stack([conv2d_multi_in(X, k) for k in K], dim=0)
```
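
For $c_o$ output channels, the kernel gains a leading axis and has shape $c_o \times c_i \times k_h \times k_w$. Let $\star$ again denote the single-channel cross-correlation computed by `conv2d`. Output channel $o$ is then

$$
\mathbf{Y}_o = \sum_{c=1}^{c_i} \mathbf{X}_c \star \mathbf{K}_{o,c}, \qquad o = 1, \dots, c_o .
$$

This is one multi-input cross-correlation per output channel, which is what the list comprehension over `K` computes. `torch.stack` then joins the results into a $c_o \times (n_h - k_h + 1) \times (n_w - k_w + 1)$ tensor.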

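Passing the 3D kernel `K` from above would fail here. Iterating over it yields 2D slices, and zipping such a slice with `X` hands `conv2d` 1D rows, so `h, w = K.shape` raises an error. The kernel needs a leading output-channel dimension. We can build one with 3 output channels by stacking `K`, `K + 1` and `K + 2`:

```python
K_multi = torch.stack((K, K + 1, K + 2), dim=0)  # shape (3, 2, 2, 2)
corr2d_multi_in_out(X, K_multi)                  # shape (3, 3, 2)
```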
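
As a sanity check, the hand-written loops can be compared with PyTorch's built-in `torch.nn.functional.conv2d`. That function also computes cross-correlation. It expects an input of shape (batch, c_i, n_h, n_w) and a weight of shape (c_o, c_i, k_h, k_w). The sketch below is self-contained: it repeats the notebook's functions so it runs on its own, and it creates the tensors as floats, the dtype convolution layers normally work with. The 3-output-channel kernel `K_multi` is an illustrative construction:

```python
import torch
import torch.nn.functional as F

def conv2d(X, K):
    # Single-channel cross-correlation (same as the notebook's conv2d)
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def conv2d_multi_in(X, K):
    # Sum of per-channel cross-correlations
    return sum(conv2d(x, k) for x, k in zip(X, K))

def corr2d_multi_in_out(X, K):
    # One multi-input cross-correlation per output channel
    return torch.stack([conv2d_multi_in(X, k) for k in K], dim=0)

X = torch.tensor([[[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]],
                  [[0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8]]],
                 dtype=torch.float32)
K = torch.tensor([[[0, 1], [1, 2]],
                  [[0, 1], [1, 2]]], dtype=torch.float32)
K_multi = torch.stack((K, K + 1, K + 2), dim=0)  # (c_o=3, c_i=2, 2, 2)

manual = corr2d_multi_in_out(X, K_multi)                # (3, 3, 2)
# F.conv2d needs a batch axis: add it to the input, then remove it from the output
builtin = F.conv2d(X.unsqueeze(0), K_multi).squeeze(0)

print(manual.shape)                     # torch.Size([3, 3, 2])
print(torch.allclose(manual, builtin))  # True
```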