The following program solves the simplest possible machine learning task:

Find $A$ such that $f(x) = Ax$ satisfies $f(1) = 1$.

import torch
import torch.utils.data
import numpy as np

lr = 1.9 # learning rate
mom = 0.0 # momentum

class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.A = torch.nn.Parameter(torch.zeros((1), requires_grad=True))
    def forward(self, input):
        output  = self.A * input
        return(output)

device = 'cpu'

input  = torch.Tensor([[1]])
target = torch.Tensor([[1]])

slope_dataset = torch.utils.data.TensorDataset(input,target)
train_loader  = torch.utils.data.DataLoader(slope_dataset,batch_size=1)

# create neural network according to model specification
net = MyModel().to(device) # CPU or GPU

# choose between SGD, Adam or other optimizer
optimizer = torch.optim.SGD(net.parameters(),lr=lr,momentum=mom)

epochs = 1000

for epoch in range(1, epochs):
    for batch_id, (data,target) in enumerate(train_loader):
        optimizer.zero_grad() # zero the gradients
        output = net(data)    # apply network
        loss = 0.5*torch.mean((output-target)*(output-target))
        if net.A.grad is None:
            print('Ep%3d: zero_grad(): A.grad=  None  A.data=%7.4f loss=%7.4f' \
                      % (epoch, net.A.data, loss))
        else:
            print('Ep%3d: zero_grad(): A.grad=%7.4f A.data=%7.4f loss=%7.4f' \
                      % (epoch, net.A.grad, net.A.data, loss))
        loss.backward()       # compute gradients
        optimizer.step()      # update weights
        print('            step(): A.grad=%7.4f A.data=%7.4f' \
                      % (net.A.grad, net.A.data))
        if loss < 0.000000001 or np.isnan(loss.data):
            exit(0)

Ep  1: zero_grad(): A.grad=  None  A.data= 0.0000 loss= 0.5000
            step(): A.grad=-1.0000 A.data= 1.9000
Ep  2: zero_grad(): A.grad= 0.0000 A.data= 1.9000 loss= 0.4050
            step(): A.grad= 0.9000 A.data= 0.1900
Ep  3: zero_grad(): A.grad= 0.0000 A.data= 0.1900 loss= 0.3280
            step(): A.grad=-0.8100 A.data= 1.7290
Ep  4: zero_grad(): A.grad= 0.0000 A.data= 1.7290 loss= 0.2657
            step(): A.grad= 0.7290 A.data= 0.3439
Ep  5: zero_grad(): A.grad= 0.0000 A.data= 0.3439 loss= 0.2152
            step(): A.grad=-0.6561 A.data= 1.5905
Ep  6: zero_grad(): A.grad= 0.0000 A.data= 1.5905 loss= 0.1743
            step(): A.grad= 0.5905 A.data= 0.4686
Ep  7: zero_grad(): A.grad= 0.0000 A.data= 0.4686 loss= 0.1412
            step(): A.grad=-0.5314 A.data= 1.4783
Ep  8: zero_grad(): A.grad= 0.0000 A.data= 1.4783 loss= 0.1144
            step(): A.grad= 0.4783 A.data= 0.5695
Ep  9: zero_grad(): A.grad= 0.0000 A.data= 0.5695 loss= 0.0927
            step(): A.grad=-0.430

### Question:
Run the code above and look at the output.

Change the learning rate `lr` to each of the following values by editing line 5 in the above code.
`0.01, 0.1, 0.5, 1.0, 1.5, 1.9, 2.0, 2.1`

Run the code for each value of `lr` and describe what happens, in terms of the success and speed of the algorithm.

### Answer:
At `lr=0.01`, `A.data` converges to 1 at epoch 988.

At `lr=0.1`, `A.data` converges to 1 at epoch 97.

At `lr=0.5`, `A.data` converges to 1 at epoch 16.

At `lr=1.0`, `A.data` converges to 1 at epoch 2.

At `lr=1.5`, `A.data` converges to 1 at epoch 16. It overshoots substantially at first, then oscillates around 1 with shrinking amplitude for a few epochs before settling.

At `lr=1.9`, `A.data` converges to 1 at epoch 97, with the same overshoot-and-oscillate pattern.

At `lr=2.0`, `A.data` does not converge to 1. It alternates indefinitely between overshooting (2) and undershooting (0).

At `lr=2.1`, `A.data` does not converge to 1. The oscillations grow every epoch until the values overflow to infinity and then turn into NaN.
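The pattern in these answers follows from the update rule. With a single training pair (1, 1) the loss is $\frac{1}{2}(A-1)^2$ and its gradient is $A-1$, so one SGD step gives $A - 1 \leftarrow (1 - lr)(A - 1)$. The error shrinks only when $|1 - lr| < 1$, which is why `lr=0.5` and `lr=1.5` take the same number of epochs and why `lr=2.0` is the boundary. The sketch below replays that recurrence without PyTorch. The function name `epochs_to_converge` and its parameters are made up for illustration, and it uses Python's double-precision floats while PyTorch defaults to float32, so counts near the threshold could differ slightly from the notebook.

```python
import math

def epochs_to_converge(lr, max_epochs=1000, tol=1e-9):
    """Return the epoch at which the loss first drops below tol, else None.

    Hypothetical helper: mirrors the loop in the notebook for the single
    training pair (input=1, target=1), using plain Python floats.
    """
    A = 0.0                          # same starting value as the model parameter
    for epoch in range(1, max_epochs):
        err = A - 1.0                # output - target, since input = 1
        loss = 0.5 * err * err
        if not math.isfinite(loss):
            return None              # diverged to inf or NaN
        if loss < tol:
            return epoch             # same stopping test as the notebook
        A -= lr * err                # SGD step: the gradient of the loss is err
    return None                      # no convergence within max_epochs

for lr in (0.5, 1.0, 1.5, 2.0):
    print(lr, epochs_to_converge(lr))
```

A return value of `None` means the loss never dropped below the tolerance within the epoch budget, which is what happens at `lr=2.0`, where the error just flips sign on every step.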

### Question 2:
Now keep the learning rate at `1.9`, but try each of the following values for momentum by changing the value of `mom` on line 6.
`0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9`

For which value of momentum is the task solved in the fewest epochs?

What happens when the momentum is `1.0`? What happens when it is `1.1`?

### Answer:
At `mom=0.1`, `A.data` converges to 1 at epoch 25.

At `mom=0.2`, `A.data` converges to 1 at epoch 14.

At `mom=0.3`, `A.data` converges to 1 at epoch 13.

At `mom=0.4`, `A.data` converges to 1 at epoch 24.

At `mom=0.5`, `A.data` converges to 1 at epoch 30.

At `mom=0.6`, `A.data` converges to 1 at epoch 37.

At `mom=0.7`, `A.data` converges to 1 at epoch 59.

At `mom=0.8`, `A.data` converges to 1 at epoch 92.

At `mom=0.9`, `A.data` converges to 1 at epoch 190.

The task is solved in the fewest epochs at `mom=0.3` (13 epochs).

At `mom=1.0`, `A.data` does not converge.

At `mom=1.1`, `A.data` explodes.
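To see where these numbers come from, the momentum experiment can be replayed in plain Python. This is a minimal sketch, assuming the update rule given in the `torch.optim.SGD` documentation for zero dampening and no Nesterov correction: the velocity buffer becomes `mom * v + grad`, and the parameter moves by `lr * v`. The name `epochs_with_momentum` is hypothetical, and the sketch uses double-precision floats rather than PyTorch's default float32, so treat the counts as approximate checks rather than exact reproductions.

```python
import math

def epochs_with_momentum(lr, mom, max_epochs=1000, tol=1e-9):
    """Return the epoch at which the loss first drops below tol, else None.

    Hypothetical helper for the single training pair (input=1, target=1).
    """
    A = 0.0                          # parameter, initialised to zero
    v = 0.0                          # velocity (momentum buffer)
    for epoch in range(1, max_epochs):
        err = A - 1.0                # output - target, also the gradient
        loss = 0.5 * err * err
        if not math.isfinite(loss):
            return None              # diverged to inf or NaN
        if loss < tol:
            return epoch
        v = mom * v + err            # accumulate gradient into the velocity
        A -= lr * v                  # step along the velocity
    return None                      # no convergence within max_epochs

for mom in (0.0, 0.2, 0.3):
    print(mom, epochs_with_momentum(1.9, mom))
```

With `lr=1.9`, a small amount of momentum damps the oscillation that plain SGD produces, which is why convergence speeds up at first and then slows again as the momentum grows.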