## 3.1. Linear Regression

*Studying and coding along with the printed book __“Dive into Deep Learning”__ by Aston Zhang, Zachary C. Lipton, Mu Li & Alexander J. Smola. The accompanying website for the chapter Linear Neural Networks for Regression > Linear Regression can be found at [d2l.ai](https://d2l.ai/chapter_linear-regression/linear-regression.html).*

__Solving a regression problem means predicting a numerical value.__

Common examples:

- Predicting prices of homes or stocks
- Predicting the length of stays for patients in a hospital
- Forecasting demand for retail sales

Not every prediction problem is a regression problem: in *classification*, the goal is to predict membership among a set of categories.

__Let's use the following scenario as an example:__ we want to estimate the *prices of houses* (in US dollars) based on their *size/area* in square feet and their *age* in years.

Our goal is to __develop a model for predicting house prices__.

- The first step is to assemble a *dataset* with sales price, area and age for a number of homes.
- This dataset will be our *training dataset* or ***training set***.
- Each row of the dataset is called an *example* (or *data point*, *instance*, *sample*).
- An example contains the data corresponding to one sale.
- What we're trying to predict is the sales price, which in technical terminology is called a *label* (or ***target***).
- Variables (in our case age and area) upon which the predictions are based are called ***features*** (or *covariates*).
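
To make the terminology concrete, here is a minimal sketch of what such a training set could look like as tensors. All numbers are made up for illustration, and the names `features` and `labels` are my own choice, not taken from the book.

```python
import torch

# Hypothetical toy training set: each row is one example (one sale).
# Feature columns: area in square feet, age in years (made-up values).
features = torch.tensor([
    [1400.0, 10.0],
    [2100.0, 3.0],
    [900.0, 45.0],
])

# Labels (targets): sales price in US dollars for each example (made-up values).
labels = torch.tensor([250000.0, 410000.0, 130000.0])

n, d = features.shape  # n examples, d features per example
print(n, d)
print(features[0], labels[0])  # the first example and its label
```

Each row of `features` together with the matching entry of `labels` corresponds to one sale.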

In [1]:
%matplotlib inline
import math
import time
import numpy as np
import torch

### 3.1.1. Basics

Some basic assumptions and notation used in *linear regression*:

1. We assume that the relationship between features $\mathbf{x}$ and target $y$ is approximately linear:
   - The conditional mean $E[Y \mid X=\mathbf{x}]$ can be expressed as a weighted sum of the features $\mathbf{x}$.
   - The target value may still deviate from its expected value on account of observation noise.
   - We assume that any such noise is well behaved, following a Gaussian distribution.
2. $n$ denotes the number of examples in our dataset:
   - Superscripts enumerate samples and targets: $\mathbf{x}^{(i)}$ denotes the $i^{\textrm{th}}$ sample.
   - Subscripts index coordinates: $x_j^{(i)}$ denotes the $j^{\textrm{th}}$ coordinate of $\mathbf{x}^{(i)}$.
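
The assumptions and notation above can be sketched in a few lines of PyTorch. The weights `w` and the bias `b` follow the book's usual notation, $\hat{y} = \mathbf{w}^\top \mathbf{x} + b$, but have not been introduced in these notes yet. The parameter values, the data and the noise level are assumptions chosen only for illustration.

```python
import torch

torch.manual_seed(0)  # make the random noise reproducible

# Hypothetical parameters, chosen only for illustration.
w = torch.tensor([150.0, -500.0])  # one weight per feature: area (sq ft), age (years)
b = 50000.0                        # bias (intercept)

# X holds n = 3 examples (rows) with 2 features each (columns); values are made up.
X = torch.tensor([
    [1400.0, 10.0],
    [2100.0, 3.0],
    [900.0, 45.0],
])
n = X.shape[0]

# Conditional mean E[Y | X = x]: a weighted sum of the features plus a bias.
y_mean = X @ w + b

# Observed targets deviate from the mean by Gaussian noise (the std is an assumption).
noise = torch.normal(0.0, 1000.0, size=(n,))
y = y_mean + noise

# Notation: the superscript (i) picks an example, the subscript j picks a coordinate.
# Python is 0-indexed, so x^{(1)} is X[0] and x_2^{(1)} is X[0, 1].
x_1 = X[0]
x_1_2 = X[0, 1]
print(y_mean)
print(x_1, x_1_2)
```

Note the index shift: the mathematical notation counts from 1, while tensor indexing starts at 0.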

*(These notes are taken from the printed book and the GitHub repo [d2l-ai/d2l-en](https://github.com/d2l-ai/d2l-en); mathematical notation is copied from the GitHub repo as I still have to figure out how to write and read it correctly)*

### 3.1.2. Vectorization for Speed

When training our models, we typically want to process whole minibatches of examples simultaneously. 
Doing this efficiently requires that we vectorize the calculations and leverage fast linear algebra libraries rather than writing costly for-loops in Python.
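
As a small sketch of what "processing a whole minibatch at once" can mean for a linear model: the loop below computes one prediction per example, while the vectorized version uses a single matrix-vector product. The minibatch `X` and the parameters `w` and `b` are hypothetical values, not taken from the book.

```python
import torch

torch.manual_seed(0)

# Hypothetical minibatch: 4 examples with 2 features each, plus made-up parameters.
X = torch.randn(4, 2)
w = torch.tensor([2.0, -3.0])
b = 1.0

# Loop version: one prediction per example.
y_loop = torch.zeros(4)
for i in range(4):
    y_loop[i] = torch.dot(X[i], w) + b

# Vectorized version: a single matrix-vector product covers the whole minibatch.
y_vec = X @ w + b

# Both versions should agree up to floating-point tolerance.
print(torch.allclose(y_loop, y_vec))
```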

To see why this matters so much, let’s consider two methods for adding vectors. To start, we instantiate two 1,000-dimensional vectors containing all 1s (the book uses 10,000 dimensions; the code here uses `n = 1000`). In the first method, we loop over the vectors with a Python for-loop. In the second, we rely on a single call to `+`.

In [7]:
n = 1000
a = torch.ones(n)
b = torch.ones(n)
print(a.shape)
print(b.shape)

torch.Size([1000])
torch.Size([1000])


#### __Benchmarking the workloads__

In [10]:
# adding them using a for-loop, one coordinate at a time
c = torch.zeros(n)
t = time.time()
for i in range(n):
    c[i] = a[i] + b[i]
f'{time.time() - t:.5f} sec'

'0.12503 sec'

In [12]:
# vectorizing the code, pushing the mathematics to the library
# computing the elementwise sum with the overloaded + operator
t = time.time()
d = a + b
f'{time.time() - t:.5f} sec'

'0.00100 sec'
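
A single `time.time()` measurement can be noisy. As a possible refinement (my own addition, not from the book), the sketch below averages over several runs with the standard library's `timeit` module and also checks that both methods produce the same result. The function names and the number of runs are arbitrary choices.

```python
import timeit
import torch

n = 1000
a = torch.ones(n)
b = torch.ones(n)

def add_with_loop():
    # Adds the vectors one coordinate at a time.
    c = torch.zeros(n)
    for i in range(n):
        c[i] = a[i] + b[i]
    return c

def add_vectorized():
    # Lets the library compute the elementwise sum in one call.
    return a + b

# Both approaches must produce the same result.
assert torch.equal(add_with_loop(), add_vectorized())

# Average over several runs; absolute numbers depend on the machine.
runs = 20
loop_time = timeit.timeit(add_with_loop, number=runs) / runs
vec_time = timeit.timeit(add_vectorized, number=runs) / runs
print(f'loop: {loop_time:.5f} sec, vectorized: {vec_time:.5f} sec')
```

The absolute timings will differ from the ones recorded above, but the vectorized version should still come out far ahead.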