
In [None]:
import torch
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

Create a random process that we can make noisy observations of: a quadratic function whose three coefficients are (pseudo-)random numbers. Then create a quadratic model with trainable (autograd) parameters, which we will fit so that it matches the process's coefficients.

In [None]:
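torch.manual_seed(42)

class quadratic_model(torch.nn.Module):
    """Model y = a*x^2 + b*x + c with trainable scalar parameters."""
    def __init__(self):
        super().__init__()
        torch.manual_seed(43)  # fixed seed: every new instance starts from the same values
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        return self.a * torch.pow(x, 2) + self.b * x + self.c

class observed_process():
    """The 'true' quadratic, which we can only observe through noisy samples."""
    def __init__(self):
        torch.manual_seed(42)
        self.a = torch.randn(1)
        self.b = torch.randn(1)
        self.c = torch.randn(1)

    def sample(self, x):
        # Noise-free value of the process at x
        return self.a * torch.pow(x, 2) + self.b * x + self.c

    def sample_noisy(self, x):
        # Perturb each coefficient with Gaussian noise; std is a standard deviation.
        # One draw per call, shared by every element of x.
        std = 0.001
        a = self.a + std * torch.randn(1)
        b = self.b + std * torch.randn(1)
        c = self.c + std * torch.randn(1)
        return a * torch.pow(x, 2) + b * x + c

model = quadratic_model()
process = observed_process()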
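Note that `sample_noisy` adds noise to the coefficients rather than to the output, so the spread of an observation depends on x. Writing σ for the 0.001 scale factor (it multiplies a unit-variance draw, so it acts as a standard deviation), a single observation is y = (a + σε₁)x² + (b + σε₂)x + (c + σε₃) with independent standard normal εᵢ, so its standard deviation is σ·√(x⁴ + x² + 1). A minimal NumPy check of that formula, assuming illustrative coefficients rather than the seeded values above:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 0.3, -1.2, 0.8  # illustrative coefficients, not the notebook's seeded values
sigma = 0.001             # same scale factor as in sample_noisy
x = 2.0
n = 200_000

# One independent coefficient perturbation per observation
eps = rng.standard_normal((n, 3))
y = (a + sigma * eps[:, 0]) * x**2 + (b + sigma * eps[:, 1]) * x + (c + sigma * eps[:, 2])

empirical_std = y.std()
predicted_std = sigma * np.sqrt(x**4 + x**2 + 1)
print(empirical_std, predicted_std)
```

At x = 2 the noise is about √21 ≈ 4.6 times larger than at x = 0, so large-|x| samples are noisier.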

Check that the actual parameters (those of the process we observe) differ from our randomly initialized model's parameters. We will train the model's parameters to fit the noisy process.

In [None]:
print("Actual:")
print(process.a)
print(process.b)
print(process.c)

print("Before:")
print(model.a)
print(model.b)
print(model.c)


The following cell:

1. samples from the noisy process
2. runs a forward pass of the model
3. computes the loss (mean-squared error)
4. runs the backward pass
5. updates model parameters by using gradients

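These steps repeat in a loop, continually updating the model, and the squared error is recorded at every step so it can be plotted. Try changing the **learning rate** and the **batch size** (the cell draws one sample per step; pass a larger count to `torch.randn` to use a bigger batch).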
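Under the hood, step 4 computes the derivative of the mean-squared error L with respect to each coefficient. For this model they are ∂L/∂a = mean(2(ŷ − y)x²), ∂L/∂b = mean(2(ŷ − y)x) and ∂L/∂c = mean(2(ŷ − y)). As a minimal sketch, assuming NumPy instead of PyTorch, noise-free targets, and illustrative true coefficients, learning rate, and batch size (none of these come from the notebook), the same five steps with hand-written gradients look like this:

```python
import numpy as np

# Illustrative "true" coefficients for this sketch only
a_true, b_true, c_true = 1.5, -0.7, 0.3
rng = np.random.default_rng(0)

a, b, c = 0.0, 0.0, 0.0  # model parameters start at arbitrary values
lr, batch_size = 0.01, 16

for step in range(5000):
    # 1. sample from the (noise-free) process
    x = rng.standard_normal(batch_size)
    y_target = a_true * x**2 + b_true * x + c_true
    # 2. forward pass
    y_pred = a * x**2 + b * x + c
    # 3. mean-squared-error loss
    residual = y_pred - y_target
    loss = np.mean(residual**2)
    # 4. backward pass: gradient of the MSE w.r.t. each parameter
    grad_a = np.mean(2 * residual * x**2)
    grad_b = np.mean(2 * residual * x)
    grad_c = np.mean(2 * residual)
    # 5. gradient-descent update
    a -= lr * grad_a
    b -= lr * grad_b
    c -= lr * grad_c

print(a, b, c, loss)
```

Autograd derives exactly these expressions for you, which is what makes it practical once the model has more than a handful of parameters.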

In [None]:
error = []
lr = 0.001  # learning rate
for _ in range(20000):
    x = torch.randn(1)                  # batch of one random input
    y_target = process.sample_noisy(x)  # 1. noisy observation
    y_pred = model(x)                   # 2. forward pass
    model.zero_grad()                   # clear gradients left over from the last step

    squared_error = torch.mean(torch.square(y_target - y_pred))  # 3. MSE loss

    squared_error.backward()            # 4. backward pass
    error.append(squared_error.item())

    # 5. plain gradient-descent update, done outside autograd
    with torch.no_grad():
        for p in (model.a, model.b, model.c):
            p -= lr * p.grad

print("After:")
print(model.a)
print(model.b)
print(model.c)

plt.figure()
plt.plot(error)
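With a batch size of 1 the per-step squared error fluctuates strongly, which makes the trend hard to read. A moving average is one simple way to smooth it. The helper below (`moving_average` and its `window` default are our own choices, not part of the original notebook) is a sketch using NumPy:

```python
import numpy as np

def moving_average(values, window=200):
    """Smooth a noisy per-step loss curve with a simple moving average."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits
    return np.convolve(values, kernel, mode="valid")

smoothed = moving_average([1.0] * 1000, window=200)
print(len(smoothed))  # 801
```

For example, `plt.plot(moving_average([float(e) for e in error]))` after the training loop plots the smoothed curve.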

There are more sophisticated rules for turning gradients into parameter updates. Deep learning toolkits package these rules as *optimizers* behind a consistent API. AdamW is usually a good, safe default. The cells below repeat the same training loop, but let AdamW perform the update step.
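For intuition, here is a simplified pure-Python sketch of the AdamW update rule for a single scalar parameter, following the algorithm as described in the PyTorch `torch.optim.AdamW` documentation and using its default betas, eps and weight decay. The function name and the toy objective (p − 3)² are our own, and real optimizers work on whole tensors rather than one float:

```python
import math

def adamw_minimize(grad_fn, p, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                   weight_decay=1e-2, steps=1000):
    """Sketch of the AdamW update rule applied to one scalar parameter."""
    beta1, beta2 = betas
    m, v = 0.0, 0.0  # running averages of the gradient and squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(p)
        p -= lr * weight_decay * p           # decoupled weight decay
        m = beta1 * m + (1 - beta1) * g      # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g  # second moment
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        p -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p

# Minimize f(p) = (p - 3)^2, whose gradient is 2 * (p - 3)
p_final = adamw_minimize(lambda p: 2 * (p - 3), p=0.0, lr=0.05, steps=2000)
print(p_final)
```

Dividing by √v̂ rescales each step by the recent gradient magnitude, and the "W" refers to applying weight decay directly to the parameter instead of folding it into the gradient. The small weight decay pulls the result very slightly below 3.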

In [None]:
# Reinitialize the model. quadratic_model seeds the RNG in __init__, so it
# starts from the same initial parameters as the previous run.
model = quadratic_model()

In [None]:
optim = torch.optim.AdamW(model.parameters(), lr=1e-3)
error = []
for _ in range(10000):
    x = torch.randn(1)
    y_target = process.sample_noisy(x)
    y_pred = model(x)
    optim.zero_grad()  # clear gradients from the previous step

    squared_error = torch.mean(torch.square(y_target - y_pred))

    squared_error.backward()
    error.append(squared_error.item())

    optim.step()  # AdamW computes and applies the parameter update

print("After:")
print(model.a)
print(model.b)
print(model.c)

plt.figure()
plt.plot(error)

A very common neural network primitive (layer type) is the Linear layer. It goes by several names:
* fully-connected layer
* dense layer
* linear layer

It performs a matrix multiplication followed by adding a bias: `y = x @ W.T + b`. The number of parameters is in_features × out_features + out_features (the last term only when a bias is used). The following cell builds a small linear layer to demonstrate what is happening.
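As a sketch of the arithmetic in NumPy, using the weight and bias values quoted in the seeded demo cell below (the helper names `linear_param_count` and `linear_forward` are hypothetical), the following computes `x @ W.T + b` and the parameter count. W has shape (out_features, in_features), as in `torch.nn.Linear`:

```python
import numpy as np

def linear_param_count(in_features, out_features, bias=True):
    # Weight matrix is (out_features, in_features); bias adds one value per output
    return in_features * out_features + (out_features if bias else 0)

def linear_forward(x, weight, bias):
    # Same computation as a Linear layer: y = x @ W^T + b
    return x @ weight.T + bias

weight = np.array([[0.4414, 0.4792, -0.1353]])  # shape (out_features=1, in_features=3)
bias = np.array([0.5304])
x = np.array([[1.0, 2.0, 3.0]])                 # a batch of one 3-feature input

print(linear_forward(x, weight, bias))  # approx. [[1.5243]]
print(linear_param_count(3, 1))         # 4
# The two layers of the nn_quad network defined later: 1 -> 5 and 5 -> 1
print(linear_param_count(1, 5) + linear_param_count(5, 1))  # 16
```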

In [None]:
torch.manual_seed(42)
# With this seed, the weight should be [[0.4414, 0.4792, -0.1353]] and the bias [0.5304]
layer_demo = torch.nn.Linear(in_features=3, out_features=1)
print(list(layer_demo.parameters()))

# The result should be (1 * .4414 + 2 * .4792 + 3 * -.1353) + 0.5304
print(layer_demo.forward(torch.tensor([[1.,2.,3.]])))
print((1 * .4414 + 2 * .4792 + 3 * -.1353) + 0.5304)

In deep learning, multi-layer (deep) neural networks serve as highly general function approximators. In PyTorch, these building blocks live in `torch.nn`. The following 2-layer network can be fit to the quadratic function. Run the fit and observe how well it matches. Try changing the number of hidden features, the activation function, the batch size, or the learning rate, or try adding layers.
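Because ReLU is piecewise linear, a 1 -> 5 -> 1 ReLU network like the one below computes a piecewise-linear function of x with at most 5 kinks (one per hidden unit, at x = −b/w), so it can only approximate the parabola with straight segments. A NumPy sketch with random, untrained weights (an illustrative assumption, not a trained model) makes this visible through second differences, which are zero wherever the output is a straight line:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 5  # same width as nn_quad's hidden layer

# Random weights standing in for a 1 -> 5 -> 1 network (illustrative only)
w1, b1 = rng.standard_normal((hidden, 1)), rng.standard_normal(hidden)
w2, b2 = rng.standard_normal((1, hidden)), rng.standard_normal(1)

def two_layer_relu(x):
    h = np.maximum(x @ w1.T + b1, 0.0)  # Linear(1, 5) followed by ReLU
    return h @ w2.T + b2                # Linear(5, 1)

x = np.linspace(-5, 5, 2001).reshape(-1, 1)
y = two_layer_relu(x).ravel()

# Nonzero second differences mark kinks; a kink falling between two grid
# points touches at most two entries, so there are at most 2 * hidden of them.
second_diff = np.diff(y, n=2)
kink_positions = np.flatnonzero(np.abs(second_diff) > 1e-9)
print(len(kink_positions))
```

Adding hidden units adds more possible kinks, which is one way to see why wider or deeper networks can trace the curve more closely.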

In [None]:
class nn_quad(torch.nn.Module):
    """Two-layer network: Linear(1 -> 5), ReLU, Linear(5 -> 1)."""
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(in_features=1, out_features=5)
        self.l2 = torch.nn.Linear(in_features=5, out_features=1)

    def forward(self, x):
        x = torch.relu(self.l1(x))  # hidden layer with ReLU activation
        x = self.l2(x)              # linear output layer
        return x

In [None]:
model = nn_quad()
optim = torch.optim.AdamW(model.parameters(), lr=1e-3)
error = []
for _ in range(50000):
    x = (torch.randn(64) * 5).unsqueeze(-1)  # batch of 64 inputs, shape (64, 1)
    y_target = process.sample_noisy(x)
    y_pred = model(x)
    optim.zero_grad()

    squared_error = torch.mean(torch.square(y_target - y_pred))

    squared_error.backward()
    error.append(squared_error.item())

    optim.step()

print(f"Final batch MSE: {error[-1]:.4f}")

plt.figure()
plt.plot(error)



In [None]:
x_range = torch.arange(-5, 5, .025).unsqueeze(-1)  # evaluation grid on [-5, 5)
with torch.no_grad():  # inference only, so no gradients are needed
    y_pred = model(x_range)
    y_actual = process.sample(x_range)

plt.figure()
plt.plot(x_range.numpy(), y_pred.numpy())
plt.plot(x_range.numpy(), y_actual.numpy())
plt.legend(["Inferred", "Actual"])