## Regression two ways

In this problem, you are going to solve a simple linear regression problem. In the cells below, I have some code to generate data from a random model:

$$ y = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3 + 0.5\,\eta $$

Here $\eta$ is standard normal noise. In this case, you know the ground-truth generating model, and the data are drawn from it. Your job is to start from the data and see if you can recover the generating coefficients $c_0, c_1, c_2, c_3$. You will do this two ways.

1) Use the Moore-Penrose inverse (the pseudoinverse). See the notes for exactly what this is.

2) Use the SVD approach.

Both should yield the same coefficients up to numerical error, and they should be quite close to the generating coefficients.
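
To see why the two routes should agree, here is a short derivation. It assumes that the design matrix $X$ (a column of ones followed by $x_1, x_2, x_3$) has full column rank, which holds with probability one for random data like this; the notes may present the pseudoinverse in a slightly different but equivalent form. Writing the thin SVD of $X$, the pseudoinverse and the least-squares coefficients are

$$ X = U \Sigma V^T, \qquad X^+ = V \Sigma^{-1} U^T, \qquad \hat{c} = X^+ y = V \Sigma^{-1} U^T y $$

Under the same full-rank assumption the pseudoinverse also equals $(X^T X)^{-1} X^T$, so both approaches compute the ordinary least-squares solution and can differ only by floating-point rounding.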




In [3]:
using Pkg
using LinearAlgebra
using Plots
using Statistics

In [9]:
x1, x2, x3 = rand(100), rand(100), rand(100)
x0 = ones(size(x1))
c0, c1, c2, c3 = 4.0, 1.5, -2.0, 5.1;
y = c0 .+ c1 .* x1 .+ c2 .* x2 .+ c3 .* x3 .+ 0.5 .* randn(100)  # Note the randn here: the noise should be normal!

100-element Vector{Float64}:
 9.316182602054843
 5.970141961203163
 8.675811421307934
 4.266017022771447
 4.984635561722619
 9.633830921165726
 3.0810933187788043
 7.070405181961871
 3.1740923381534913
 6.614419603262718
 ⋮
 4.0119533808432895
 2.7398273025172513
 6.636907337278942
 6.65362941956435
 8.162224115886197
 7.7857883406149195
 2.870148587810135
 8.178610832824315
 3.07086680725207

In [10]:
# Construct the design matrix X: a column of ones (intercept), then x1, x2, x3

X = hcat(x0, x1, x2, x3)


# Moore-Penrose inverse
X_plus = pinv(X)

# Coefficient vector
c_hat_mp = X_plus * y

println("Coefficients from Moore-Penrose Inverse:")
println(c_hat_mp)


Coefficients from Moore-Penrose Inverse:
[3.845779135870971, 1.6729158562565316, -2.049170884991899, 5.365147695468075]


In [11]:
# SVD solution: svd(X) returns the thin factorization X = U * Diagonal(S) * V'
U, S, V = svd(X)
c_hat_svd = V * (Diagonal(1 ./ S) * (U' * y))

println("Coefficients from SVD:")
println(c_hat_svd)


Coefficients from SVD:
[3.8457791358709716, 1.6729158562565312, -2.0491708849918995, 5.365147695468075]
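
The claim that both methods agree up to numerical error can also be checked in code rather than by eye. The cell below is a minimal, self-contained sketch, not part of the original notebook: it regenerates data from the same model with a fixed seed (the seed value, the names `n`, `c_true`, and `c_hat_ls`, and the tolerance `rtol = 1e-8` are all assumptions made for illustration), then compares the two estimates with each other, with Julia's backslash least-squares solve, and with the generating coefficients.

```julia
using LinearAlgebra
using Random

Random.seed!(1234)  # assumed seed, used only to make the sketch reproducible

# Same generating model as above: y = c0 + c1*x1 + c2*x2 + c3*x3 + 0.5*noise
n = 100
c_true = [4.0, 1.5, -2.0, 5.1]
X = hcat(ones(n), rand(n), rand(n), rand(n))
y = X * c_true .+ 0.5 .* randn(n)

# Method 1: Moore-Penrose inverse
c_hat_mp = pinv(X) * y

# Method 2: thin SVD, c = V * Σ⁻¹ * U' * y
U, S, V = svd(X)
c_hat_svd = V * ((U' * y) ./ S)

# Extra cross-check (not one of the two required methods): backslash least squares
c_hat_ls = X \ y

println("pinv ≈ svd:       ", isapprox(c_hat_mp, c_hat_svd; rtol = 1e-8))
println("pinv ≈ backslash: ", isapprox(c_hat_mp, c_hat_ls; rtol = 1e-8))
println("max abs error vs. generating coefficients: ",
        maximum(abs.(c_hat_mp .- c_true)))
```

The first two comparisons test numerical agreement between methods, which should be near machine precision. The last line measures statistical error from the noise, which is much larger and shrinks only as the number of samples grows.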
