## Initializing a Layer

In [1]:
import torch
from torch import Tensor
import numpy as np

A simple linear layer, where *x* is the input, *w* the weight matrix, and *b* the bias:

In [2]:
def linear(x, w, b) -> Tensor:
    return x @ w + b

In [3]:
def kaiming_init(w):
    return w * np.sqrt(2 / w.shape[0])
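As a quick sanity check, the sketch below applies the same scaling to a random matrix and compares the resulting standard deviation with $\sqrt{2 / \text{fan\_in}}$. It is not part of the original notebook; the seed and matrix size are illustrative assumptions, and `math.sqrt` stands in for `np.sqrt` so the block only needs `torch`.

```python
import math
import torch

def kaiming_init(w):
    # Same scaling as the cell above: multiply by sqrt(2 / fan_in),
    # where fan_in is the number of rows of w.
    return w * math.sqrt(2 / w.shape[0])

torch.manual_seed(0)  # seed chosen only for reproducibility (assumption)
w = kaiming_init(torch.randn(100, 50))

expected_std = math.sqrt(2 / 100)  # fan_in = 100
print(w.std().item(), expected_std)
```

The two printed numbers should agree to about two decimal places, since the unscaled weights are drawn with unit standard deviation.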

**ReLU:**

$$f(x) = x^+ = \max(0, x)$$

In [4]:
def relu(x):
    return x.clamp_min(0.)
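A minimal illustration of the formula on a hand-picked tensor (the values are arbitrary and chosen only for this example): negative entries become zero and non-negative entries pass through unchanged.

```python
import torch

def relu(x):
    # clamp_min(0.) replaces every value below zero with zero
    return x.clamp_min(0.)

t = torch.tensor([-2.0, -0.5, 0.0, 1.5])
out = relu(t)
print(out.tolist())  # [0.0, 0.0, 0.0, 1.5]
```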

Sample inputs *x* and targets *y*:

In [5]:
x = torch.randn(200, 100)
y = torch.randn(200)

Weights and biases for the two layers:

In [6]:
w1 = torch.randn(100, 50)
b1 = torch.randn(50)

w2 = torch.randn(50, 1)
b2 = torch.randn(1)

Initializing the weights with Kaiming initialization, from ["Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"](https://oreil.ly/-_quA):

In [7]:
w1 = kaiming_init(w1)
w2 = kaiming_init(w2)

In [8]:
for i in range(50):
    x = relu(x @ kaiming_init(torch.randn(100, 100)))
x.std()

tensor(0.4289)
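The cell above passes *x* through 50 ReLU layers with Kaiming-scaled weights, and the standard deviation of the activations stays of order one. For contrast, here is a hedged sketch that runs the same weights with and without the scaling. It uses only 10 layers (an assumption made to keep the unscaled activations within float32 range) and a fixed seed.

```python
import math
import torch

def relu(x):
    return x.clamp_min(0.)

torch.manual_seed(0)  # illustrative seed
x_plain = torch.randn(200, 100)
x_scaled = x_plain.clone()
scale = math.sqrt(2 / 100)  # Kaiming factor for fan_in = 100

for _ in range(10):
    w = torch.randn(100, 100)
    x_plain = relu(x_plain @ w)              # unscaled weights
    x_scaled = relu(x_scaled @ (w * scale))  # Kaiming-scaled weights

# ReLU commutes with positive scaling, so the unscaled activations are
# larger by roughly (1 / scale) ** 10 = 50 ** 5.
print(x_plain.std().item(), x_scaled.std().item())
```

Without the scaling, each layer multiplies the activation scale by about $\sqrt{50}$, which is why deeper stacks overflow quickly.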

Then, the result of the first layer is:

In [9]:
l1 = linear(x, w1, b1)
l1.shape

torch.Size([200, 50])

## Defining the model

In [10]:
def model(x):
    l1 = linear(x, w1, b1)
    l2 = relu(l1)
    l3 = linear(l2, w2, b2)
    return l3

In [11]:
out = model(x)
out.shape

torch.Size([200, 1])

## MSE Loss Function

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$$

In [25]:
def mse(output, targ):
    return (output.squeeze() - targ).pow(2).mean()
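One way to check this definition is to compare it with PyTorch's built-in `torch.nn.functional.mse_loss`. The sketch below does that on random data; the shapes mirror the model output and targets used in this notebook, and the seed is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

def mse(output, targ):
    return (output.squeeze() - targ).pow(2).mean()

torch.manual_seed(0)  # illustrative seed
output = torch.randn(200, 1)  # same shape as the model output
targ = torch.randn(200)       # same shape as y

ours = mse(output, targ)
# mse_loss expects matching shapes, so drop the trailing dimension first
ref = F.mse_loss(output.squeeze(-1), targ)
print(torch.allclose(ours, ref))
```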

## Backward

In [13]:
y.shape, out.shape, out.squeeze().shape, y.shape

(torch.Size([200]), torch.Size([200, 1]), torch.Size([200]), torch.Size([200]))
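The shapes above show why the loss squeezes the output before subtracting. As a small sketch with freshly sampled tensors of the same shapes, subtracting a `[200]` tensor from a `[200, 1]` tensor broadcasts to `[200, 200]` instead of producing one difference per sample.

```python
import torch

out = torch.randn(200, 1)
y = torch.randn(200)

wrong = out - y            # broadcasts (200, 1) against (200,) -> (200, 200)
right = out.squeeze() - y  # elementwise difference -> (200,)
print(wrong.shape, right.shape)
```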

In [14]:
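def mse_grad(inputs, targets):
    # d(MSE)/d(inputs) = 2 * (inputs - targets) / n
    inputs.g = 2. * (inputs.squeeze() - targets).unsqueeze(-1) / inputs.shape[0]

def lin_grad(inputs, outputs, w, b):
    # outputs = inputs @ w + b, so the upstream gradient is right-multiplied by w.t()
    inputs.g = outputs.g @ w.t()
    w.g = inputs.t() @ outputs.g
    b.g = outputs.g.sum(0)

def relu_grad(inputs, outputs):
    # The gradient passes through only where the input was positive
    inputs.g = (inputs > 0).float() * outputs.g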
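To see how these pieces chain together, the following sketch runs a forward pass, applies the gradient functions in reverse order, and compares the results with PyTorch autograd. It is an illustration rather than the notebook's own cell: the helpers are redefined so the block runs on its own, `mse_grad` divides by the batch size, `lin_grad` uses `outputs.g @ w.t()`, and the zero biases and seed are assumptions.

```python
import math
import torch

def linear(x, w, b): return x @ w + b
def relu(x): return x.clamp_min(0.)
def mse(output, targ): return (output.squeeze() - targ).pow(2).mean()

def mse_grad(inputs, targets):
    inputs.g = 2. * (inputs.squeeze() - targets).unsqueeze(-1) / inputs.shape[0]

def lin_grad(inputs, outputs, w, b):
    inputs.g = outputs.g @ w.t()
    w.g = inputs.t() @ outputs.g
    b.g = outputs.g.sum(0)

def relu_grad(inputs, outputs):
    inputs.g = (inputs > 0).float() * outputs.g

torch.manual_seed(0)  # illustrative seed
x, y = torch.randn(200, 100), torch.randn(200)
w1, b1 = torch.randn(100, 50) * math.sqrt(2 / 100), torch.zeros(50)
w2, b2 = torch.randn(50, 1) * math.sqrt(2 / 50), torch.zeros(1)

# Forward pass, keeping the intermediate activations
l1 = linear(x, w1, b1)
l2 = relu(l1)
out = linear(l2, w2, b2)

# Backward pass: apply the gradients in reverse order of the forward pass
mse_grad(out, y)
lin_grad(l2, out, w2, b2)
relu_grad(l1, l2)
lin_grad(x, l1, w1, b1)

# Reference gradients from autograd on copies of the parameters
w1a, b1a, w2a, b2a = (t.clone().requires_grad_(True) for t in (w1, b1, w2, b2))
loss = mse(linear(relu(linear(x, w1a, b1a)), w2a, b2a), y)
loss.backward()

pairs = ((w1, w1a), (b1, b1a), (w2, w2a), (b2, b2a))
checks = [torch.allclose(m.g, a.grad, rtol=1e-3, atol=1e-5) for m, a in pairs]
print(checks)
```

Storing gradients in a `.g` attribute keeps them separate from autograd's `.grad`, which makes the side-by-side comparison straightforward.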

In [20]:
x = torch.randn(1, 50)

In [24]:
x.shape, x.squeeze().shape

(torch.Size([1, 50]), torch.Size([50]))