In [1]:
import torch
import torch.nn as nn
import numpy as np
from rich.console import Console

In [2]:
console = Console()

# Frequently used modules

- [`nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear)
- [`nn.Dropout`](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html#torch.nn.Dropout)
- [`nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d)

## `nn.Linear`

In [3]:
console.rule("Linear w/ bias")
console.print(nn.Linear(2, 3).state_dict())

console.rule("Linear w/o bias")
console.print(nn.Linear(2, 3, bias=False).state_dict())
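
The state dicts printed above hold a `weight` of shape `(3, 2)` and, when enabled, a `bias` of shape `(3,)`. As a minimal sketch of how these parameters are used, the cell below checks that the layer output matches `x @ W.T + b`. The batch of 4 samples and the variable names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is reproducible

linear = nn.Linear(2, 3)   # in_features=2, out_features=3
x = torch.randn(4, 2)      # hypothetical batch of 4 samples

# The layer computes y = x @ W^T + b, with W of shape (3, 2) and b of shape (3,)
manual = x @ linear.weight.T + linear.bias
linear_matches = torch.allclose(linear(x), manual, atol=1e-6)

weight_shape = tuple(linear.weight.shape)
bias_shape = tuple(linear.bias.shape)

# Without a bias, the state dict only contains the weight
no_bias_keys = list(nn.Linear(2, 3, bias=False).state_dict().keys())
```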

## `nn.Dropout`

In [4]:
nn.Dropout(0.2).state_dict()

OrderedDict()

In [5]:
test_in = torch.randn(6)
console.print("Initial data: {}".format(test_in))
console.print("First dropout operation: {}".format(nn.Dropout(0.5)(test_in)))
console.print("Second dropout operation: {}".format(nn.Dropout(0.5)(test_in)))

During training, `nn.Dropout` randomly zeroes elements of the input tensor, each with probability `p`, using samples from a Bernoulli distribution. Each element is zeroed independently on every forward call, which is why the two calls above produce different results for the same input.

**Furthermore, the outputs are scaled by a factor of $\frac{1}{1-p}$ during training, which keeps the expected value of each element unchanged. This means that during evaluation the module simply computes an identity function.**
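
The difference between training and evaluation mode can be checked directly. The sketch below assumes an all-ones input of 1000 elements (an arbitrary choice), so that every surviving element in training mode should equal $\frac{1}{1-p}$ and the evaluation output should equal the input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is reproducible

p = 0.5
dropout = nn.Dropout(p)
x = torch.ones(1000)  # hypothetical input: all ones makes the scaling easy to see

# Training mode: elements are either dropped (0) or scaled by 1 / (1 - p)
dropout.train()
y_train = dropout(x)
kept = y_train[y_train != 0]
scaled_correctly = torch.allclose(kept, torch.full_like(kept, 1.0 / (1.0 - p)))

# Evaluation mode: the module returns its input unchanged
dropout.eval()
y_eval = dropout(x)
is_identity = torch.equal(y_eval, x)
```

Because `nn.Dropout` has no parameters, switching between the two behaviours depends only on the module's mode, which is set with `.train()` and `.eval()`.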




## `nn.Conv2d`

In the simplest case, the output value of the layer with input size $(N, C_{\text{in}}, H, W)$ and output $(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$ can be precisely described as:

$$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
    \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$$

where $\star$ is the valid 2D cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, $H$ is the height of the input planes in pixels, and $W$ is their width in pixels.
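
The sum over input channels in the formula above can be reproduced by hand. The following is a minimal sketch, not how the layer is implemented internally: it uses `torch.nn.functional.conv2d` on single-channel planes as the $\star$ operator and accumulates the result per output channel. The layer sizes (3 input channels, 4 output channels, 8×8 input) are assumptions chosen to keep the loop small.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so the run is reproducible

conv = nn.Conv2d(3, 4, kernel_size=3)  # default stride=1, padding=0, dilation=1
x = torch.randn(2, 3, 8, 8)

with torch.no_grad():
    reference = conv(x)
    manual = torch.zeros_like(reference)
    for j in range(conv.out_channels):
        for k in range(conv.in_channels):
            plane = x[:, k:k + 1]                   # shape (N, 1, H, W)
            kernel = conv.weight[j:j + 1, k:k + 1]  # shape (1, 1, 3, 3)
            # Cross-correlate one input plane with one kernel slice
            manual[:, j] += F.conv2d(plane, kernel)[:, 0]
        manual[:, j] += conv.bias[j]                # add bias(C_out_j)

formula_matches = torch.allclose(manual, reference, atol=1e-4)
```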

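Shape conversion:

$$H_{\text{out}} = \left\lfloor\frac{H_{\text{in}} + 2 \times \text{padding}[0] - \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor$$

$$W_{\text{out}} = \left\lfloor\frac{W_{\text{in}} + 2 \times \text{padding}[1] - \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor$$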
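
The shape formula can be written as a small helper and compared against a real layer. `conv2d_out_size` is a hypothetical name introduced here for illustration. It handles one spatial axis at a time and assumes integer arguments.

```python
import math

import torch
import torch.nn as nn


def conv2d_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis, following the shape formula."""
    return math.floor(
        (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )


# Same hyperparameters as the 3x3, stride-2 convolution used in this notebook
h_out = conv2d_out_size(64, kernel_size=3, stride=2, padding=1)
w_out = conv2d_out_size(64, kernel_size=3, stride=2, padding=1)

layer = nn.Conv2d(16, 128, 3, stride=2, padding=1)
actual = tuple(layer(torch.randn(2, 16, 64, 64)).shape)
```

For a 64-pixel axis this evaluates to $\lfloor 63 / 2 + 1 \rfloor = 32$, so the layer should produce an output of shape `(2, 128, 32, 32)`.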

Attributes:
- weight (Tensor): the learnable weights of the module, of shape $(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])$.
- bias (Tensor): the learnable bias of the module, of shape $(\text{out\_channels})$.
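
The effect of `groups` and a non-square kernel on these shapes is easiest to see by inspecting a few layers. The channel counts and kernel sizes below are assumptions chosen for illustration.

```python
import torch.nn as nn

standard = nn.Conv2d(16, 128, kernel_size=3)
grouped = nn.Conv2d(16, 128, kernel_size=(3, 5), groups=4)

# weight: (out_channels, in_channels / groups, kernel_size[0], kernel_size[1])
standard_weight = tuple(standard.weight.shape)  # groups=1, so in_channels stays 16
grouped_weight = tuple(grouped.weight.shape)    # 16 / 4 = 4 input channels per group

# bias: (out_channels,), independent of groups
grouped_bias = tuple(grouped.bias.shape)

# With bias=False the attribute exists but is None
no_bias = nn.Conv2d(16, 128, 3, bias=False).bias
```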

In [6]:
input_tensor = torch.randn(2, 16, 64, 64)
conv2d_3x3 = nn.Conv2d(16, 128, 3, stride=2, padding=1)
output_tensor = conv2d_3x3(input_tensor)

In [7]:
console.print("Input tensor shape: {}".format(input_tensor.shape))
console.print("Output tensor shape: {}".format(output_tensor.shape))
console.print("Convolution weight shape: {}".format(conv2d_3x3.weight.shape))
console.print("Convolution bias shape: {}".format(conv2d_3x3.bias.shape))