# `006` Weights inspection

Requirements: 005 Multilayer perceptron.

What makes training neural networks tricky is that a universal approximator will always make its best effort to solve the problem, even when the code around it is wrong. In many cases your code will appear correct: most errors are silent and only show up as slower learning or slightly worse results. This is a well-known problem, and even state-of-the-art networks released by leading deep learning teams, such as Google's Gemma, often ship with minor errors that the community only discovers later.
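As a minimal illustration of such a silent error, here is a hypothetical example (the toy tensors below are made up and are not taken from the MLP notebook). If the predictions have shape `(N, 1)` and the targets have shape `(N,)`, the subtraction broadcasts to `(N, N)` without raising anything, and the loss is still a perfectly valid scalar:

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data: 8 samples, one target each.
Ypred = torch.randn(8, 1)  # model output, shape (8, 1)
Y = torch.randn(8)         # targets accidentally left with shape (8,)

# Bug: (8, 1) - (8,) broadcasts to (8, 8), so every prediction is
# compared against every target. No error is raised.
wrong_diff = Ypred - Y
wrong_loss = (wrong_diff ** 2).mean()

# Fix: make the shapes match before subtracting.
right_diff = Ypred - Y.view(-1, 1)
right_loss = (right_diff ** 2).mean()

print(wrong_diff.shape, right_diff.shape)
print(wrong_loss.item(), right_loss.item())
```

Both losses are scalars that can be backpropagated, so nothing in the training loop would complain; only checking the intermediate shapes reveals the problem.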

This is why we have to inspect every single piece of the network, pay close attention, visualize as much as possible, and understand the principles behind the technologies we are using. We have to be REALLY thorough, more so than someone inspecting a bomb detonator. And we will probably still introduce some errors.

That is why it is important to know a few of the tools at our disposal for testing networks. We'll start from the example in the MLP notebook and apply the most typical inspection tools. Note that this inspection is still very generic, as we have only worked with the most basic networks so far.

Let's start with a condensed version of the code from the MLP notebook:

In [10]:
from math import pi
from matplotlib import pyplot as plt
import torch

f = lambda x: torch.atan(x**3 - 2*x + 3) - 5

X = torch.arange(-pi * 2, pi * 2, 0.01)
X = X[torch.randperm(X.size(0))]
X = X.view(-1, 1)
Y = f(X)
Y = (Y - Y.mean()) / Y.std()
test_cut = int(X.size(0) * 0.8)
# first 80% of the shuffled data for training, remaining 20% for testing
Xtr, Ytr = X[:test_cut], Y[:test_cut]
Xts, Yts = X[test_cut:], Y[test_cut:]

def get_model():
	W1 = torch.randn(1, 100) * 0.3
	b1 = torch.zeros(100)
	W2 = torch.randn(100, 100) * 0.3
	b2 = torch.zeros(100)
	W3 = torch.randn(100, 1) * 0.3
	b3 = torch.zeros(1)
	parameters = W1, b1, W2, b2, W3, b3
	for p in parameters: p.requires_grad = True
	return parameters

def forward(model, x):
	W1, b1, W2, b2, W3, b3 = model
	out = x @ W1 + b1
	out = out.relu()
	out = out @ W2 + b2
	out = out.relu()
	out = out @ W3 + b3
	return out

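def train(model, step_size=0.001, iterations=1000):
	losses = []
	for _ in range(iterations):
		# note: the full dataset (X, Y) is used here, not only the training split
		Ypred = forward(model, X)
		# mean squared error
		loss = ((Ypred - Y)**2).mean()
		# reset gradients, then backpropagate
		for p in model: p.grad = None
		loss.backward()
		# plain gradient descent step
		for p in model: p.data -= p.grad * step_size
		losses.append(loss.item())
	return losses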

train(get_model())

[19.916460037231445,
 5.775614261627197,
 1.8617140054702759,
 0.6870419383049011,
 0.3319641649723053,
 0.22466222941875458,
 0.1920347809791565,
 0.18194711208343506,
 0.17869077622890472,
 0.17739583551883698,
 0.17666161060333252,
 0.17609041929244995,
 0.17556962370872498,
 0.17506682872772217,
 0.17457297444343567,
 0.17408545315265656,
 0.1736031472682953,
 0.17312560975551605,
 0.17265339195728302,
 0.17218616604804993,
 0.17172332108020782,
 0.171265110373497,
 0.17081178724765778,
 0.1703629195690155,
 0.1699187308549881,
 0.16947923600673676,
 0.16904385387897491,
 0.16861242055892944,
 0.1681855022907257,
 0.167763352394104,
 0.1673455834388733,
 0.16693159937858582,
 0.16652174293994904,
 0.16611605882644653,
 0.16571417450904846,
 0.16531608998775482,
 0.1649215966463089,
 0.1645294576883316,
 0.1641409695148468,
 0.16375625133514404,
 0.16337530314922333,
 0.1629980206489563,
 0.16262483596801758,
 0.16225552558898926,
 0.1618896871805191,
 0.16152755916118622,
 0.161169