# Linear Regression Example

In this example, we create a training set and find the linear model that best fits it. First, we generate a training set of $N=1000$ points $\{(x_i,y_i)\}_{i=1}^N$, where the inputs $x_i\in\mathbb{R}^2$ are independent and uniformly distributed in $[0,1]^2$ and $y_i=2\|x_i\|^2+\epsilon_i$, with the $\epsilon_i$ independent and uniformly distributed in $[-1,1]$.

In [1]:
from matplotlib import pyplot as plt
import numpy as np

N = 1000

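# x_i: N points drawn uniformly from the unit square [0,1]^2
x_train = np.random.rand(N, 2)
# eps_i: noise drawn uniformly from [-1,1]
eps = 2*(np.random.rand(N) - .5)
# y_i = 2*||x_i||^2 + eps_i
y_train = 2*x_train[:,0]**2 + 2*x_train[:,1]**2 + eps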

Next, we use the `sklearn` (scikit-learn) package to find the best linear model for the training set. Because we pass `fit_intercept=False`, the model has no constant term: the fitted coefficient vector $\hat{\beta}$ defines the linear model $f(x)=\langle x,\hat{\beta}\rangle$.
By regenerating the training set and rerunning this code several times, we can observe how $\hat{\beta}$ changes with the training data.
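
Since the true regression function $2\|x\|^2$ is quadratic, no linear model fits it exactly, so $\hat{\beta}=\left(\sum_i x_ix_i^T\right)^{-1}\sum_i x_iy_i$ does not recover a "true" coefficient. Dividing both sums by $N$ and applying the law of large numbers shows that, as $N$ grows, $\hat{\beta}$ approaches the population least-squares coefficient $\beta^*=\mathbb{E}[xx^T]^{-1}\mathbb{E}[x\,y]$. A worked computation for this data distribution, writing a single input as $x=(x^{(1)},x^{(2)})$ with independent coordinates uniform on $[0,1]$:

$$
\mathbb{E}[xx^T]=\begin{pmatrix}1/3 & 1/4\\ 1/4 & 1/3\end{pmatrix},\qquad
\mathbb{E}[x\,y]=\begin{pmatrix}2\cdot\frac14+2\cdot\frac12\cdot\frac13\\[2pt] 2\cdot\frac14+2\cdot\frac12\cdot\frac13\end{pmatrix}=\frac56\begin{pmatrix}1\\1\end{pmatrix},
$$

using $\mathbb{E}[(x^{(1)})^2]=\tfrac13$, $\mathbb{E}[(x^{(1)})^3]=\tfrac14$, $\mathbb{E}[x^{(1)}x^{(2)}]=\tfrac14$ and $\mathbb{E}[\epsilon]=0$. Since $(1,1)^T$ is an eigenvector of $\mathbb{E}[xx^T]$ with eigenvalue $\tfrac13+\tfrac14=\tfrac7{12}$,

$$
\beta^*=\mathbb{E}[xx^T]^{-1}\,\mathbb{E}[x\,y]=\frac{5/6}{7/12}\begin{pmatrix}1\\1\end{pmatrix}=\frac{10}{7}\begin{pmatrix}1\\1\end{pmatrix}\approx\begin{pmatrix}1.43\\1.43\end{pmatrix},
$$

so for $N=1000$ each component of $\hat{\beta}$ should fluctuate around $10/7\approx 1.43$.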

In [2]:
from sklearn.linear_model import LinearRegression

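# fit_intercept=False: fit f(x) = <x, beta> with no constant term
model = LinearRegression(fit_intercept=False)
model.fit(x_train, y_train)
beta_hat = model.coef_   # fitted coefficient vector beta_hat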

print("beta_hat = {}".format(beta_hat))

beta_hat = [1.38519552 1.45869766]
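
Rather than rerunning the cells by hand to see how $\hat{\beta}$ depends on the training data, the experiment can be repeated in a loop. The sketch below is illustrative only: it uses NumPy's `np.random.default_rng` generator in place of the legacy `np.random.rand`, and the seed, the helper name `fit_beta` and the number of repetitions `R = 200` are arbitrary choices, not part of the original example. For this data distribution the per-component averages should settle near $10/7\approx 1.43$, the population least-squares coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_beta(rng, n=1000):
    """Hypothetical helper: draw one training set as in the example, return beta_hat."""
    x = rng.random((n, 2))                      # x_i uniform on [0,1]^2
    eps = 2 * (rng.random(n) - 0.5)             # eps_i uniform on [-1,1]
    y = 2 * x[:, 0]**2 + 2 * x[:, 1]**2 + eps   # y_i = 2*||x_i||^2 + eps_i
    model = LinearRegression(fit_intercept=False)
    model.fit(x, y)
    return model.coef_

rng = np.random.default_rng(0)   # fixed seed (an assumption) so the run is reproducible
R = 200                          # number of regenerated training sets (arbitrary)
betas = np.array([fit_beta(rng) for _ in range(R)])

print("mean of beta_hat over {} runs = {}".format(R, betas.mean(axis=0)))
print("std of beta_hat over {} runs = {}".format(R, betas.std(axis=0)))
```

The standard deviation gives a rough scale for how far a single run, such as the one printed above, can be expected to stray from the average.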


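We also have an explicit least-squares formula for $\hat{\beta}$: $\hat{\beta}=(D^TD)^{-1}D^Ty$, where $D$ is the $N\times 2$ design matrix whose $i$-th row is $x_i^T$ (here `x_train`) and $y=(y_1,\dots,y_N)^T$. Using this formula, we compute $\hat{\beta}$ and verify that it matches the output of the `sklearn` package.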

In [4]:
D = x_train  # design matrix: row i is x_i^T
# Least-squares formula: beta_hat = (D^T D)^{-1} D^T y
beta_hat_formula = np.linalg.inv(D.T @ D) @ D.T @ y_train

print("beta_hat computed using formula = {}".format(beta_hat_formula))

beta_hat computed using formula = [1.38519552 1.45869766]
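
Forming $(D^TD)^{-1}$ explicitly mirrors the formula, but numerically it is usually preferable to solve the normal equations $D^TD\,b=D^Ty$ with `np.linalg.solve`, or to hand $D$ directly to a least-squares solver such as `np.linalg.lstsq`, neither of which computes the inverse. A minimal sketch comparing the four routes on freshly generated data (the seed is an arbitrary assumption), with `np.allclose` replacing the by-eye comparison of printed digits:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)   # arbitrary seed for reproducibility
N = 1000
D = rng.random((N, 2))           # design matrix: row i is x_i^T, uniform on [0,1]^2
y = 2 * D[:, 0]**2 + 2 * D[:, 1]**2 + 2 * (rng.random(N) - 0.5)

beta_inv = np.linalg.inv(D.T @ D) @ D.T @ y           # explicit inverse, as in the formula
beta_solve = np.linalg.solve(D.T @ D, D.T @ y)        # solve D^T D b = D^T y without inverting
beta_lstsq, *_ = np.linalg.lstsq(D, y, rcond=None)    # least squares on D directly
beta_sk = LinearRegression(fit_intercept=False).fit(D, y).coef_

for name, b in [("inv", beta_inv), ("solve", beta_solve),
                ("lstsq", beta_lstsq), ("sklearn", beta_sk)]:
    print(name, b)

print("all agree:", all(np.allclose(beta_inv, b) for b in (beta_solve, beta_lstsq, beta_sk)))
```

For this small, well-conditioned $2\times 2$ system all four agree to within floating-point tolerance; the differences between the approaches matter mainly when the columns of $D$ are nearly collinear.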
