# Concise Implementation of Linear Regression

In the last lecture, we showed you
how to build a linear regression model from scratch.

In practice, however, there are simpler ways: because
data iterators, loss functions, optimizers,
and neural network layers
are common to a wide variety of deep learning models,
modern frameworks provide ready-made implementations of them.

Hence, in this lecture, we will show you how to implement
the linear regression model concisely by using the high-level APIs of PyTorch.

In [None]:
from d2l import torch as d2l
import numpy as np
import torch
from torch.utils import data

## Generating the Dataset

To start, we will generate the same dataset as in the last lecture.

In [None]:
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = d2l.synthetic_data(true_w, true_b, 1000)
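For readers without the previous lecture at hand, here is a minimal sketch of what `d2l.synthetic_data` is assumed to do, mirroring the from-scratch helper: draw standard-normal features, apply $y = Xw + b$, and add Gaussian noise with standard deviation 0.01. The name `synthetic_data_sketch` is hypothetical, and the actual d2l source may differ in detail.

```python
import torch

def synthetic_data_sketch(w, b, num_examples):
    """Hypothetical re-creation of d2l.synthetic_data: y = Xw + b + noise."""
    # Features drawn from a standard normal distribution, one column per weight
    X = torch.normal(0, 1, (num_examples, len(w)))
    # Noise-free targets from the true linear model
    y = torch.matmul(X, w) + b
    # Small observation noise with standard deviation 0.01
    y += torch.normal(0, 0.01, y.shape)
    # Labels are returned as a column vector of shape (num_examples, 1)
    return X, y.reshape((-1, 1))

torch.manual_seed(0)
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data_sketch(true_w, true_b, 1000)
print(features.shape, labels.shape)  # torch.Size([1000, 2]) torch.Size([1000, 1])
```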

## Reading the Dataset

Rather than rolling our own iterator,
we can call upon the existing API in a framework to read data.
We pass in `features` and `labels` as arguments and specify `batch_size`
when instantiating a data iterator object.
In addition, the Boolean value `is_train`
indicates whether or not
we want the data iterator object to shuffle the data
on each epoch (one pass through the dataset).

In [None]:
def load_array(data_arrays, batch_size, is_train=True):  #@save
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

In [None]:
batch_size = 10
data_iter = load_array((features, labels), batch_size)
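To see what `is_train` controls, the sketch below runs `load_array` on a hypothetical ten-example toy dataset (`toy_X` and `toy_y` are illustrative names, not part of the lecture). It repeats the helper so it runs on its own. With shuffling off, minibatches follow the stored row order; with shuffling on, each epoch visits every example once but in a random order.

```python
import torch
from torch.utils import data

def load_array(data_arrays, batch_size, is_train=True):
    """Same helper as above: wrap tensors in a TensorDataset and a DataLoader."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

# Toy features are just the row indices 0..9, so the order is easy to read off
toy_X = torch.arange(10, dtype=torch.float32).reshape(-1, 1)
toy_y = toy_X * 2

# is_train=False: rows come out in their stored order (the last batch is smaller)
eval_iter = load_array((toy_X, toy_y), batch_size=4, is_train=False)
batches = [X.squeeze(1).tolist() for X, _ in eval_iter]
print(batches)  # [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0]]

# is_train=True: a fresh random permutation per epoch, still covering every row once
train_iter = load_array((toy_X, toy_y), batch_size=4, is_train=True)
epoch_order = [v for X, _ in train_iter for v in X.squeeze(1).tolist()]
print(sorted(epoch_order) == list(range(10)))  # True
```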

Now we can use `data_iter` in much the same way as we called
the `data_iter` function in :numref:`sec_linear_scratch`.
To verify that it is working, we can read and print
the first minibatch of examples.
Compared with :numref:`sec_linear_scratch`,
here we use `iter` to construct a Python iterator and `next` to obtain the first item from it.

In [None]:
next(iter(data_iter))
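Each item produced by the iterator is a pair: a feature tensor and a label tensor. The sketch below uses random stand-in tensors with the same shapes as `features` and `labels` (an assumption, so it runs without `d2l`) to show the minibatch shapes and the number of minibatches per epoch.

```python
import torch
from torch.utils import data

# Stand-in tensors with the same shapes as `features` and `labels`
features = torch.randn(1000, 2)
labels = torch.randn(1000, 1)
data_iter = data.DataLoader(data.TensorDataset(features, labels),
                            batch_size=10, shuffle=True)

X, y = next(iter(data_iter))  # first minibatch: features and labels
print(X.shape, y.shape)       # torch.Size([10, 2]) torch.Size([10, 1])
print(len(data_iter))         # 100 minibatches per epoch (1000 / 10)
```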

## Defining the Model

In PyTorch, a fully connected layer is defined by the `Linear` class. We wrap it in `nn.Sequential`, a container that chains layers together in order; here it holds a single layer.

Note that we pass two arguments into `nn.Linear`. The first specifies the input feature dimension, which is 2, and the second specifies the output feature dimension, which is 1 because the output is a single scalar.

`nn` is an abbreviation for neural networks.

In [None]:
from torch import nn
net = nn.Sequential(nn.Linear(2, 1))
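To make the layer's computation concrete, the sketch below checks that `nn.Linear(2, 1)` computes $XW^\top + b$. PyTorch stores the weight with shape `(out_features, in_features)`, which is why the transpose appears. The minibatch `X` is hypothetical random data.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 1))
layer = net[0]
# Weight has shape (out_features, in_features) = (1, 2); bias has shape (1,)
print(layer.weight.shape, layer.bias.shape)  # torch.Size([1, 2]) torch.Size([1])

X = torch.randn(5, 2)  # hypothetical minibatch of 5 examples with 2 features
# The layer computes X W^T + b
manual = X @ layer.weight.T + layer.bias
print(net(X).shape)                               # torch.Size([5, 1])
print(torch.allclose(net(X), manual, atol=1e-6))  # True
```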

## Initializing Model Parameters

Since we have specified the input and output dimensions when constructing `nn.Linear`, we can now access the parameters directly to specify their initial values. We first locate the layer by `net[0]`, which is the first layer in the network, and then use the `weight.data` and `bias.data` attributes to access the parameters. Next, we use the in-place methods `normal_` and `fill_` to overwrite the parameter values: each weight is sampled from a normal distribution with mean 0 and standard deviation 0.01, and the bias is set to 0.

In [None]:
net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
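The same initialization can also be written with the `torch.nn.init` helpers. This is an alternative, not the lecture's code: `nn.init.normal_` and `nn.init.zeros_` modify the parameter in place (the trailing underscore marks in-place operations) without recording the change for autograd, so the parameters remain trainable.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 1))

# Equivalent to net[0].weight.data.normal_(0, 0.01) and net[0].bias.data.fill_(0)
nn.init.normal_(net[0].weight, mean=0.0, std=0.01)
nn.init.zeros_(net[0].bias)

print(net[0].bias.data)             # tensor([0.])
print(net[0].weight.requires_grad)  # True: still a trainable parameter
```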

## Defining the Loss Function

The `MSELoss` class computes the mean squared error, that is, the squared $L_2$ norm of the residuals averaged over elements. By default, it returns the average loss over examples.

In [None]:
loss = nn.MSELoss()
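As a quick check on the default behavior, the sketch below compares `nn.MSELoss()` with a hand-written mean of squared errors on three hypothetical predictions, and shows that `reduction='sum'` returns the sum instead of the mean.

```python
import torch
from torch import nn

y_hat = torch.tensor([[1.0], [2.0], [4.0]])  # hypothetical predictions
y = torch.tensor([[1.5], [2.0], [3.0]])      # hypothetical targets

loss = nn.MSELoss()                 # reduction='mean' by default
manual = ((y_hat - y) ** 2).mean()  # average of squared errors
print(loss(y_hat, y), manual)       # both equal (0.25 + 0 + 1) / 3

sum_loss = nn.MSELoss(reduction='sum')
print(sum_loss(y_hat, y))           # tensor(1.2500)
```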

## Defining the Optimization Algorithm

Minibatch stochastic gradient descent is a standard tool for optimizing neural networks, and thus PyTorch supports it, alongside a number of variations on this algorithm, in the `optim` module. When we instantiate an `SGD` instance, we specify the parameters to optimize over (obtainable from our net via `net.parameters()`), along with the hyperparameters required by our optimization algorithm, passed as keyword arguments. Minibatch stochastic gradient descent just requires that we set the learning rate `lr`, which is set to 0.03 here.

In [None]:
trainer = torch.optim.SGD(net.parameters(), lr=0.03)
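With its default settings (no momentum, no weight decay), one call to `trainer.step()` performs the plain update $p \leftarrow p - \eta \nabla_p \ell$ on every parameter. The sketch below checks this on a hypothetical random minibatch by recording the expected values before stepping.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Linear(2, 1)
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

X, y = torch.randn(10, 2), torch.randn(10, 1)  # hypothetical minibatch
nn.MSELoss()(net(X), y).backward()             # fills p.grad for weight and bias

# Plain SGD should produce p - lr * grad for each parameter
expected = [(p - 0.03 * p.grad).detach().clone() for p in net.parameters()]
trainer.step()

for p, e in zip(net.parameters(), expected):
    print(torch.allclose(p, e, atol=1e-6))  # True for both weight and bias
```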

## Training

You might have noticed that expressing our model through the high-level APIs of a deep learning framework requires comparatively few lines of code. We did not have to individually allocate parameters, define our loss function, or implement minibatch stochastic gradient descent. Once we start working with much more complex models, the advantages of high-level APIs will grow considerably. However, once we have all the basic pieces in place, the training loop itself is strikingly similar to what we did when implementing everything from scratch.

To refresh your memory: for some number of epochs, we will make a complete pass over the dataset (`data_iter`), iteratively grabbing one minibatch of inputs and the corresponding ground-truth labels. For each minibatch, we go through the following ritual:

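- Generate predictions by calling `net(X)` and calculate the loss `l` (the forward propagation).

- Clear the gradients left over from the previous step with `trainer.zero_grad()`, then calculate new gradients by running backpropagation (`l.backward()`).

- Update the model parameters by invoking our optimizer (`trainer.step()`).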

For good measure, we compute the loss after each epoch and print it to monitor progress.

In [None]:
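num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)   # forward pass and minibatch loss
        trainer.zero_grad()   # clear gradients from the previous step
        l.backward()          # backpropagation
        trainer.step()        # update the parameters
    with torch.no_grad():     # evaluation only, no gradients needed
        l = loss(net(features), labels)
    print(f'epoch {epoch + 1}, loss {l:f}')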
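The `trainer.zero_grad()` call matters because PyTorch accumulates gradients: each `backward()` adds into `.grad` rather than overwriting it. The sketch below uses a hypothetical one-parameter optimizer `opt` to show the accumulation and how clearing restores a fresh gradient.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)  # hypothetical one-parameter optimizer

# Two backward passes without clearing: the gradients are summed into w.grad
(3 * w).sum().backward()
(3 * w).sum().backward()
accumulated = w.grad.clone()
print(accumulated)  # tensor([6.])

# Clear the stale gradient first, as the training loop does
opt.zero_grad()
(3 * w).sum().backward()
print(w.grad)  # tensor([3.])
```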

Below, we compare the model parameters learned by training on finite data with the actual parameters that generated our dataset.

To access parameters, we first access the layer that we need from net and then access that layer’s weights and bias. As in our from-scratch implementation, note that our estimated parameters are close to their ground-truth counterparts.

In [None]:
w = net[0].weight.data
print('error in estimating w:', true_w - w.reshape(true_w.shape))
b = net[0].bias.data
print('error in estimating b:', true_b - b)
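As an optional cross-check that goes beyond this lecture, linear regression also has a closed-form least-squares solution, which `torch.linalg.lstsq` can compute. The sketch below regenerates data under the assumption that `d2l.synthetic_data` draws standard-normal features and adds noise with standard deviation 0.01; the errors it prints should be of the same small order as those from SGD.

```python
import torch

torch.manual_seed(0)
true_w = torch.tensor([2, -3.4])
true_b = 4.2
# Regenerated data (assumption: same recipe as d2l.synthetic_data)
features = torch.normal(0, 1, (1000, 2))
labels = (features @ true_w + true_b
          + torch.normal(0, 0.01, (1000,))).reshape(-1, 1)

# Append a column of ones so the bias is estimated together with the weights
A = torch.cat([features, torch.ones(1000, 1)], dim=1)
solution = torch.linalg.lstsq(A, labels).solution  # shape (3, 1)
w_hat, b_hat = solution[:2, 0], solution[2, 0]
print('error in estimating w:', true_w - w_hat)
print('error in estimating b:', true_b - b_hat)
```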