# Concise Implementation of Linear Regression

## Generating the Dataset

In [1]:
from d2l import torch as d2l
import numpy as np
import torch
from torch.utils import data

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = d2l.synthetic_data(true_w, true_b, 1000)

## Reading the Dataset

In [2]:
def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)

In [3]:
next(iter(data_iter))

[tensor([[ 0.2866, -0.0803],
         [ 1.4815, -1.2206],
         [ 0.8566,  0.2019],
         [ 0.5517, -0.2840],
         [-1.0887, -0.3544],
         [ 0.7117, -1.2348],
         [-0.2045, -1.3529],
         [-0.6090, -0.9271],
         [ 0.6146,  0.1506],
         [ 0.0794, -0.4185]]),
 tensor([[ 5.0411],
         [11.3007],
         [ 5.2221],
         [ 6.2766],
         [ 3.2419],
         [ 9.8318],
         [ 8.4023],
         [ 6.1110],
         [ 4.9303],
         [ 5.7686]])]

## Defining the Model

Recall the architecture of a single-layer network.
The layer is said to be *fully-connected*
because each of its inputs is connected to each of its outputs
by means of a matrix-vector multiplication.

In PyTorch, the fully-connected layer is defined in the `Linear` class.

In [4]:
# `nn` is an abbreviation for neural networks
from torch import nn
net = nn.Sequential(nn.Linear(2, 1))
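
The matrix-vector view of a fully-connected layer described above can be checked directly. The following is a minimal sketch, assuming a standalone layer named `layer` and a made-up minibatch `X` (both are illustrative names, not part of the notebook); it compares the layer's output with an explicit `X @ W^T + b`.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the check is reproducible

layer = nn.Linear(2, 1)  # 2 inputs, 1 output, same shape as the model above
X = torch.randn(4, 2)    # hypothetical minibatch of 4 examples

# nn.Linear stores its weight with shape (out_features, in_features),
# so the layer computes X @ W^T + b.
manual = X @ layer.weight.T + layer.bias
same_as_layer = torch.allclose(layer(X), manual)

print(tuple(layer.weight.shape), tuple(layer.bias.shape), same_as_layer)
```

The weight has shape `(1, 2)` and the bias has shape `(1,)`, which is why the notebook later reshapes `net[0].weight.data` before comparing it with `true_w`.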

## Initializing Model Parameters

In [5]:
net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)

tensor([0.])
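
The cell above relies on tensor methods whose names end in `_`, which modify a tensor in place. As a hedged sketch of an alternative, the `torch.nn.init` module offers in-place initializers that take the parameter itself; the standalone `layer` below is an illustrative stand-in for `net[0]`.

```python
import torch
from torch import nn

layer = nn.Linear(2, 1)  # illustrative stand-in for net[0]

# Tensor methods ending in `_` overwrite the tensor's values in place.
with torch.no_grad():
    layer.weight.normal_(0, 0.01)
    layer.bias.fill_(0)

# torch.nn.init provides in-place initializers with the same effect.
nn.init.normal_(layer.weight, mean=0.0, std=0.01)
nn.init.zeros_(layer.bias)

bias_is_zero = bool(torch.all(layer.bias == 0))
print(layer.weight, bias_is_zero)
```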

## Defining the Loss Function

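The `MSELoss` class computes the mean squared error, that is, the squared $L_2$ norm of the prediction error divided by the number of elements.
By default it returns the average loss over examples; passing `reduction='sum'` returns the total instead.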

In [6]:
loss = nn.MSELoss()
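
To make the default averaging concrete, here is a small sketch on made-up predictions and targets (the tensors `y_hat` and `y` are illustrative values only). It compares the default reduction with `reduction='sum'` and with a manual computation.

```python
import torch
from torch import nn

y_hat = torch.tensor([[1.0], [2.0], [4.0]])  # made-up predictions
y = torch.tensor([[1.5], [2.0], [2.0]])      # made-up targets

mean_loss = nn.MSELoss()(y_hat, y)                # default reduction='mean'
sum_loss = nn.MSELoss(reduction='sum')(y_hat, y)  # total squared error
manual_mean = ((y_hat - y) ** 2).mean()           # same quantity by hand

print(mean_loss.item(), sum_loss.item(), manual_mean.item())
```

The squared errors are 0.25, 0, and 4, so the summed loss is 4.25 and the mean loss is one third of that.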

## Defining the Optimization Algorithm

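Minibatch stochastic gradient descent is a standard tool
for optimizing neural networks,
so PyTorch supports it, along with a number of
variations on this algorithm, in the `optim` module.
To instantiate an `SGD` object, we pass the parameters to optimize
(obtained from `net.parameters()`) and the learning rate `lr`.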

In [7]:
trainer = torch.optim.SGD(net.parameters(), lr=0.03)
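
As a rough illustration of what one optimizer step does, the sketch below compares `torch.optim.SGD` (with no momentum or weight decay, as in the cell above) against the hand-written update `w <- w - lr * grad`. The names `model`, `opt`, and the random minibatch are assumptions made for this example.

```python
import torch
from torch import nn

torch.manual_seed(0)
lr = 0.03
model = nn.Linear(2, 1)                           # illustrative model
opt = torch.optim.SGD(model.parameters(), lr=lr)

X, y = torch.randn(10, 2), torch.randn(10, 1)     # made-up minibatch
l = nn.MSELoss()(model(X), y)
opt.zero_grad()
l.backward()

# Plain SGD update computed by hand, before the optimizer runs.
expected_w = model.weight.detach() - lr * model.weight.grad
expected_b = model.bias.detach() - lr * model.bias.grad

opt.step()
step_matches = (torch.allclose(model.weight.detach(), expected_w)
                and torch.allclose(model.bias.detach(), expected_b))
print(step_matches)
```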

## Training

For each minibatch, we go through the following ritual:

* Generate predictions by calling `net(X)` and calculate the loss `l` (the forward propagation).
* Calculate gradients by running the backpropagation.
* Update the model parameters by invoking our optimizer.

In [8]:
num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

epoch 1, loss 0.000162
epoch 2, loss 0.000096
epoch 3, loss 0.000096


In [9]:
w = net[0].weight.data
print('error in estimating w:', true_w - w.reshape(true_w.shape))
b = net[0].bias.data
print('error in estimating b:', true_b - b)

error in estimating w: tensor([0.0009, 0.0010])
error in estimating b: tensor([-0.0002])


## Summary

* Using PyTorch's high-level APIs, we can implement models much more concisely.
* In PyTorch, the `data` module provides tools for data processing, and the `nn` module defines a large number of neural network layers and common loss functions.
* We can initialize the parameters by replacing their values with methods ending with `_`.

## Exercises

1. If we replace `nn.MSELoss()` with `nn.MSELoss(reduction='sum')`, how should we change the learning rate for the code to behave identically? Why?

In [10]:
loss = nn.MSELoss(reduction='sum')
trainer = torch.optim.SGD(net.parameters(), lr=0.03/batch_size)

num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    l = loss(net(features), labels) / len(features)
    print(f'epoch {epoch + 1}, loss {l:f}')

epoch 1, loss 0.000095
epoch 2, loss 0.000097
epoch 3, loss 0.000095
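
One way to see why dividing the learning rate by `batch_size` works: with one output per example, the summed loss is `batch_size` times the mean loss, so its gradient is larger by the same factor. The sketch below checks this on a made-up minibatch; the helper `grad_of` is a hypothetical name introduced only for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size = 10
X, y = torch.randn(batch_size, 2), torch.randn(batch_size, 1)  # made-up data

def grad_of(loss_fn):
    """Weight gradient of a freshly, identically initialized linear layer."""
    torch.manual_seed(1)  # same initial weights on every call
    model = nn.Linear(2, 1)
    loss_fn(model(X), y).backward()
    return model.weight.grad

g_mean = grad_of(nn.MSELoss())
g_sum = grad_of(nn.MSELoss(reduction='sum'))

# The summed-loss gradient is batch_size times the mean-loss gradient,
# so lr / batch_size yields the same parameter update.
ratio_ok = torch.allclose(g_sum, batch_size * g_mean, rtol=1e-4, atol=1e-6)
print(ratio_ok)
```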


2. Review the PyTorch documentation to see what loss functions and initialization methods are provided. Replace the loss by Huber's loss.

https://pytorch.org/docs/stable/nn.html#loss-functions

In [11]:
# With its default `beta=1.0`, `SmoothL1Loss` coincides with the Huber loss
loss = nn.SmoothL1Loss()
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')

epoch 1, loss 0.000048
epoch 2, loss 0.000048
epoch 3, loss 0.000048
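
For reference, the Huber loss is quadratic for small errors and linear for large ones. The sketch below writes it out by hand (the function `huber` and its `delta` argument are illustrative names) and compares it with `nn.SmoothL1Loss` at its default setting, which is the choice made in the cell above. Recent PyTorch releases also provide `nn.HuberLoss`, which exposes the threshold directly.

```python
import torch
from torch import nn

def huber(y_hat, y, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = (y_hat - y).abs()
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return torch.where(err <= delta, quadratic, linear).mean()

y_hat = torch.tensor([0.0, 0.5, 3.0])  # made-up predictions
y = torch.zeros(3)                     # made-up targets

manual = huber(y_hat, y)
builtin = nn.SmoothL1Loss()(y_hat, y)  # default beta=1.0
print(manual.item(), builtin.item())
```

The per-element losses are 0, 0.125, and 2.5, so both computations give a mean of 0.875.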


3. How do you access the gradient of `net[0].weight`?

In [12]:
net[0].weight.grad

tensor([[0.0052, 0.0032]])