# Summary table of the most frequently used neural network layers


| PyTorch name  | Formula   | Trainable parameters  | Element-wise | Used as last layer | Used as inner layer |
|----|----|----|----|----|----|
| `torch.nn.Linear(in_features=10, out_features=5)`  | $ f_i(x) = \sum_j W_{i,j} x_j + b_i $ or $ f(x) = W x + b $ | `W.shape ==  (5, 10)`, `b.shape == (5,)`  | No | Yes, for regression. | Yes, alternated with non-linear activation function layers. |
| `torch.nn.Embedding(embedding_dim=256, num_embeddings=10_000)`  | $ f_i(j) = E_{j,i} $ | `E.shape == (10_000, 256)`  | No | No, only as input layer for integer identifiers. | Very rarely. |
| `torch.nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5)` | See slides | `W.shape == (32, 16, 5)`, `b.shape == (32,)` | No | Rarely | Yes, for sequence transformation. |
| `torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3, 3))` | See slides | `W.shape == (32, 16, 3, 3)`, `b.shape == (32,)` | No | Rarely | Yes, for image transformation. |
| `torch.nn.ReLU()`| $ f_i(x) = \max(0, x_i) $ | None | Yes | No | Yes, as a non-linear activation function between parametrized layers. |
| `torch.nn.Sigmoid()` | $ f_i(x) = \frac{1}{1 + e^{-x_i}} $ | None | Yes | Yes, for binary classifiers. | Not in modern architectures. |
| `torch.nn.Softmax(dim=-1)` | $ f_i(x) = \frac{e^{x_i}}{\sum_j e^{x_j}} $ | None | No | Yes, for multiclass classifiers. | Sometimes, e.g. for attention mechanisms in transformers. |
| `torch.nn.Dropout(p=0.2)`| In training mode: $ f_i(x) = 0 $ with probability $p$ or $ f_i(x) = \frac{x_i}{1 - p} $ otherwise. In evaluation mode: $ f_i(x) = x_i $ | None | Yes | No | Yes, mostly to prevent overfitting while training. |
| `torch.nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))`| See slides. | None | No | No | Yes, mostly to reduce spatial dimensionality in vision networks. |
| `torch.nn.AvgPool2d(kernel_size=(2, 2), stride=(2, 2))`| See slides. | None | No | No | Yes, mostly to reduce spatial dimensionality in vision networks. |
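
The "Trainable parameters" column can be checked by listing each layer's parameters. The following is a minimal sketch, assuming a recent PyTorch version; the `param_shapes` helper and the `layers` dictionary are illustrative names that do not come from the course material. Note that PyTorch stores the embedding matrix with shape `(num_embeddings, embedding_dim)`.

```python
import torch


# Illustrative helper (not part of the original notebook):
# map each parameter name of a layer to its shape.
def param_shapes(layer: torch.nn.Module) -> dict:
    return {name: tuple(p.shape) for name, p in layer.named_parameters()}


layers = {
    "linear": torch.nn.Linear(in_features=10, out_features=5),
    "embedding": torch.nn.Embedding(num_embeddings=10_000, embedding_dim=256),
    "conv1d": torch.nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5),
    "conv2d": torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3, 3)),
    "relu": torch.nn.ReLU(),
    "dropout": torch.nn.Dropout(p=0.2),
    "maxpool2d": torch.nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
}

shapes = {name: param_shapes(layer) for name, layer in layers.items()}
for name, layer_shapes in shapes.items():
    # Activation, dropout and pooling layers print an empty dict: nothing to train.
    print(name, layer_shapes)
```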



In [1]:
import torch
from torchinfo import summary


In [37]:
linear = torch.nn.Linear(in_features=10, out_features=5)
input_data = torch.randn(1, 10)
output_data = linear(input_data)
output_data

tensor([[ 0.2350,  0.7306, -0.3485,  0.2325, -0.3516]],
       grad_fn=<AddmmBackward0>)

In [39]:
summary(linear, input_size=input_data.shape)

Layer (type:depth-idx)                   Output Shape              Param #
Linear                                   [1, 5]                    55
Total params: 55
Trainable params: 55
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
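
To connect the layer to the formula $ f(x) = W x + b $, the output can be recomputed by hand from the stored weight and bias. This is a small sketch rather than part of the original notebook; the seed and the variable names `manual`, `matches` and `n_params` are arbitrary choices.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
linear = torch.nn.Linear(in_features=10, out_features=5)
x = torch.randn(1, 10)

# PyTorch stores W with shape (out_features, in_features), so a batch of
# row vectors is transformed as x @ W.T + b.
manual = x @ linear.weight.T + linear.bias
matches = torch.allclose(linear(x), manual, atol=1e-6)

# 5 * 10 weights + 5 biases = 55 trainable parameters.
n_params = sum(p.numel() for p in linear.parameters())
print(matches, n_params)
```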

In [6]:
embedding = torch.nn.Embedding(embedding_dim=256, num_embeddings=10_000)
input_data = torch.randint(low=0, high=10_000, size=(10,))
output_data = embedding(input_data)
output_data

tensor([[-0.7314,  0.1365,  0.8833,  ..., -0.4714,  0.2999, -0.8848],
        [-1.4625,  1.5284, -0.1809,  ..., -0.2909,  0.8182, -1.4778],
        [ 0.0916, -1.8457,  0.2169,  ..., -0.7263,  0.7409, -1.1889],
        ...,
        [-0.6754,  0.8027, -1.0114,  ...,  0.4772,  0.9615, -0.2119],
        [ 1.5572, -1.0840,  1.3718,  ...,  0.4971,  0.5859, -2.3298],
        [-0.2463, -1.7743,  1.2568,  ..., -1.0989, -1.1573,  1.3765]],
       grad_fn=<EmbeddingBackward0>)

In [None]:
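# torchinfo.summary(..., input_size=...) feeds random float inputs by default, which
# Embedding rejects (it expects integer indices); passing dtypes=[torch.long] should
# avoid this. Here we only check the output shape directly.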
output_data.shape

torch.Size([10, 256])
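
An embedding layer is a lookup table: each integer identifier selects one row of the weight matrix. The sketch below illustrates this under the assumption that the layer has no `padding_idx` or `max_norm` set; the identifiers in `ids` are arbitrary example values.

```python
import torch

embedding = torch.nn.Embedding(num_embeddings=10_000, embedding_dim=256)
ids = torch.tensor([3, 42, 3])  # arbitrary example identifiers

# Calling the layer is equivalent to indexing rows of its weight matrix.
looked_up = embedding(ids)
indexed = embedding.weight[ids]
same = torch.equal(looked_up, indexed)

print(embedding.weight.shape)  # (num_embeddings, embedding_dim)
print(looked_up.shape, same)
```

Because identifier `3` appears twice in `ids`, the first and last output rows are identical.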

In [28]:
conv1d = torch.nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5, padding="same")
input_data = torch.randn(1, 16, 100)  # (batch_size, in_channels, input_length)
output_data = conv1d(input_data)
output_data

tensor([[[-0.4152,  0.3532,  0.5414,  ..., -0.1657,  0.4355, -0.5072],
         [-0.2555, -0.0250,  0.0893,  ..., -0.2941, -0.2616,  0.1319],
         [ 0.2198, -0.2883, -0.0334,  ..., -0.4723,  0.2367,  0.2849],
         ...,
         [ 0.5063,  0.7722,  0.8691,  ...,  0.5755, -0.5086,  0.1494],
         [-0.1414,  0.1739,  0.0394,  ..., -0.7839, -0.7505, -0.2803],
         [ 0.0523, -0.5452,  0.2403,  ..., -0.1316,  0.9275, -0.7144]]],
       grad_fn=<ConvolutionBackward0>)

In [31]:
conv1d.weight.shape, conv1d.bias.shape

(torch.Size([32, 16, 5]), torch.Size([32]))

In [29]:
summary(conv1d, input_size=input_data.shape)

Layer (type:depth-idx)                   Output Shape              Param #
Conv1d                                   [1, 32, 100]              2,592
Total params: 2,592
Trainable params: 2,592
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.26
Input size (MB): 0.01
Forward/backward pass size (MB): 0.03
Params size (MB): 0.01
Estimated Total Size (MB): 0.04
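
The parameter count of 2,592 follows from the weight and bias shapes: 32 × 16 × 5 weights plus 32 biases. The sketch below recomputes it and also compares `padding="same"` with the default (no padding), where the output length shrinks to `length - kernel_size + 1` for stride 1. The names `conv_same` and `conv_valid` are illustrative, not from the original notebook.

```python
import torch

in_channels, out_channels, kernel_size, length = 16, 32, 5, 100

# One kernel of shape (in_channels, kernel_size) per output channel, plus one bias each.
expected_params = out_channels * in_channels * kernel_size + out_channels

conv_same = torch.nn.Conv1d(in_channels, out_channels, kernel_size, padding="same")
conv_valid = torch.nn.Conv1d(in_channels, out_channels, kernel_size)  # default: no padding

x = torch.randn(1, in_channels, length)  # (batch_size, in_channels, input_length)
n_params = sum(p.numel() for p in conv_same.parameters())
same_len = conv_same(x).shape[-1]    # padding="same" keeps the input length
valid_len = conv_valid(x).shape[-1]  # length - kernel_size + 1 with stride 1
print(n_params, same_len, valid_len)
```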

In [34]:
conv2d = torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, padding="same")
input_data = torch.randn(1, 16, 100, 100)  # (batch_size, in_channels, input_height, input_width)
output_data = conv2d(input_data)
output_data

tensor([[[[-0.1282,  0.3398, -0.0484,  ...,  0.1026,  0.1006,  0.3596],
          [-0.5153, -0.1646, -0.1448,  ..., -0.1040, -0.4692,  0.3613],
          [ 0.2917,  0.3874, -0.8717,  ..., -0.6438,  0.4476, -0.4428],
          ...,
          [ 0.5198,  0.9007, -0.1777,  ...,  0.5318, -1.5554,  0.2853],
          [ 0.3252,  0.4171, -0.0967,  ...,  0.2620, -0.4453, -0.4608],
          [ 0.3799,  0.2737,  0.1409,  ..., -0.0295, -0.1304, -0.2311]],

         [[-0.0871,  0.0276, -0.4660,  ..., -0.8063,  0.3991,  0.7173],
          [ 0.6543,  0.6663,  0.3561,  ..., -0.3361,  0.4271,  0.9505],
          [ 0.2346,  0.3326,  0.1194,  ...,  0.4064, -0.2679, -0.7752],
          ...,
          [-0.2633,  0.1063,  0.0273,  ...,  0.6872, -0.3092, -0.0640],
          [ 0.0101,  0.0037, -0.1582,  ..., -0.5729, -0.4625,  0.1141],
          [ 0.1168, -0.0920,  0.4790,  ..., -0.3043, -0.2561,  0.3524]],

         [[ 0.4074, -0.3504,  0.2201,  ...,  0.3430,  0.5963, -0.1674],
          ... (output truncated)

In [35]:
summary(conv2d, input_size=input_data.shape)

Layer (type:depth-idx)                   Output Shape              Param #
Conv2d                                   [1, 32, 100, 100]         12,832
Total params: 12,832
Trainable params: 12,832
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 128.32
Input size (MB): 0.64
Forward/backward pass size (MB): 2.56
Params size (MB): 0.05
Estimated Total Size (MB): 3.25
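
The remaining layers from the table (activations, dropout and pooling) have no trainable parameters, so the interesting part is how they transform their input. The following sketch is not part of the original notebook; the input values and tensor sizes are arbitrary. Note that PyTorch's `Dropout` rescales the surviving entries by `1 / (1 - p)` in training mode and is the identity in evaluation mode.

```python
import torch

x = torch.tensor([[-1.0, 0.0, 2.0]])

relu_out = torch.nn.ReLU()(x)              # negative entries become 0
sigmoid_out = torch.nn.Sigmoid()(x)        # each entry is squashed into (0, 1)
softmax_out = torch.nn.Softmax(dim=-1)(x)  # each row sums to 1

# Dropout is only active in training mode; survivors are scaled by 1 / (1 - p).
dropout = torch.nn.Dropout(p=0.2)
dropout.train()
dropped = dropout(torch.ones(1000))
dropout.eval()
kept = dropout(torch.ones(1000))  # identity in evaluation mode

# 2x2 pooling with stride 2 halves both height and width.
image = torch.randn(1, 16, 100, 100)  # (batch_size, channels, height, width)
max_shape = torch.nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))(image).shape
avg_shape = torch.nn.AvgPool2d(kernel_size=(2, 2), stride=(2, 2))(image).shape
print(relu_out, softmax_out.sum(), max_shape, avg_shape)
```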