# Problem 1 
In this problem, you are asked to implement the least-squares solution of $$\min_x \|Ax-y\|^2,$$ where $\textbf{A}$ is the matrix of input features and $y$ is the output vector. Note that the first column of $\textbf{A}$ is a vector of all ones, that is, $A=[1,B]$. Provided $A^TA$ is invertible, the least-squares solution is given by $x=(A^TA)^{-1}A^Ty$.

In [2]:
import numpy as np
import pandas as pd
import time

In [3]:
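def my_least_squares(B, y):
    """Solve min_x ||A x - y||^2 with A = [1, B] via the normal equations."""
    start_time = time.time()
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.insert(B, 0, 1, axis=1)
    # beta = (A^T A)^{-1} A^T y, obtained by solving (A^T A) beta = A^T y
    # instead of forming the inverse explicitly.
    beta = np.linalg.solve(A.T @ A, A.T @ y)
    end_time = time.time()
    print("Time usage in Linear reg: ", end_time-start_time)
    return beta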

In [4]:
B = np.random.rand(100, 20)
y = np.random.rand(100, 1)
print(my_least_squares(B, y))

Time usage in Linear reg:  0.011585712432861328
[[ 0.58660254]
 [ 0.07599449]
 [ 0.06412193]
 [-0.01852613]
 [ 0.10746721]
 [ 0.12987661]
 [-0.07483704]
 [-0.18723742]
 [ 0.14083481]
 [-0.05235579]
 [-0.16468733]
 [-0.00620788]
 [ 0.1373255 ]
 [-0.24591243]
 [-0.0554224 ]
 [ 0.07222086]
 [ 0.0581393 ]
 [-0.09483842]
 [-0.12082309]
 [-0.01935497]
 [ 0.06489711]]
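
One way to sanity-check a normal-equation solution is to compare it with NumPy's built-in least-squares solver and to verify that the residual is orthogonal to the columns of $A$. The snippet below is a minimal sketch under assumptions not in the notebook: the seed, the variable names `x_normal` and `x_lstsq`, and the use of `np.random.default_rng` are all illustrative choices.

```python
import numpy as np

# Illustrative data with the same shapes as the cell above (seed is an assumption).
rng = np.random.default_rng(0)
B = rng.random((100, 20))
y = rng.random((100, 1))

# Build A = [1, B] by prepending a column of ones.
A = np.column_stack([np.ones(len(B)), B])

# Normal equations: solve (A^T A) x = A^T y.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Reference solution from NumPy's least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# At the minimizer the gradient 2 A^T (A x - y) vanishes,
# so A^T times the residual should be close to zero.
max_grad_entry = np.abs(A.T @ (A @ x_normal - y)).max()

print("solutions agree:", np.allclose(x_normal, x_lstsq))
print("max |A^T (A x - y)|:", max_grad_entry)
```

For well-conditioned data like this the two approaches should agree closely. When $A^TA$ is close to singular, `np.linalg.lstsq` is generally the safer choice, because forming $A^TA$ squares the condition number of $A$.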


# Problem 2 
In this problem, you are asked to solve the least-squares problem by gradient descent: $$\min_x \|Ax-y\|^2,$$ where $\textbf{A}$ is the matrix of input features and $y$ is the output vector. First initialize (guess) $x_0$, then update iteratively: $x_{k+1}=x_{k}-\lambda\Delta$, where $\Delta$ is the gradient of the objective function with respect to $x$, evaluated at $x_k$. For this objective, $\Delta=2A^T(Ax_k-y)$. $\lambda$ is the learning rate and should be small enough that the iterates do not diverge.
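
How small the learning rate needs to be can be made concrete: for this quadratic objective, the iteration converges whenever $\lambda < 1/\lambda_{\max}(A^TA)$, where $\lambda_{\max}$ is the largest eigenvalue. The sketch below picks $\lambda = 1/(2\lambda_{\max})$ and stops once the gradient norm is small. The function name `gradient_descent_ls`, the tolerance, the zero starting point, and the synthetic data are assumptions made for illustration, not part of the assignment.

```python
import numpy as np

def gradient_descent_ls(A, y, tol=1e-8, max_iter=10_000):
    """Minimize ||A x - y||^2 by gradient descent with a step set from the spectrum of A^T A."""
    gram = A.T @ A
    Aty = A.T @ y
    # eigvalsh returns eigenvalues in ascending order; take the largest.
    lmax = np.linalg.eigvalsh(gram)[-1]
    lam = 1.0 / (2.0 * lmax)  # learning rate, safely below 1 / lmax
    x = np.zeros_like(Aty, dtype=float)  # starting guess x_0 = 0 (assumption)
    for k in range(max_iter):
        grad = 2.0 * (gram @ x - Aty)  # gradient of ||A x - y||^2 at x
        if np.linalg.norm(grad) < tol:
            break
        x = x - lam * grad
    return x, k

# Synthetic, well-conditioned data (hypothetical; not from the notebook).
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 3))
A = np.column_stack([np.ones(len(B)), B])
x_true = np.array([1.0, 2.0, -1.0, 0.5])
y = A @ x_true + 0.01 * rng.standard_normal(200)

x_gd, iters = gradient_descent_ls(A, y)
print("iterations used:", iters)
print("estimate:", x_gd)
```

A gradient-norm stopping rule avoids running a fixed, very large number of iterations when the problem is well conditioned. On badly conditioned data the same loop can still need many iterations, since the convergence rate depends on the ratio of the largest to the smallest eigenvalue of $A^TA$.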

In [5]:
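def my_gradient_descent(B, y):
    """Minimize ||A x - y||^2 with A = [1, B] by fixed-step gradient descent."""
    start_time = time.time()
    B = np.asarray(B, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.insert(B, 0, 1, axis=1)
    beta_size = A.shape[1]

    # Random initial guess x_0.
    beta = np.random.rand(beta_size, 1)
    alpha = 0.0001  # learning rate
    K = 100000      # number of iterations
    # A^T A and A^T y do not change between iterations, so compute them once.
    AtA = A.T @ A
    Aty = A.T @ y
    for i in range(K):
        # The gradient is 2 (A^T A beta - A^T y); the factor 2 is absorbed into alpha.
        beta = beta - alpha*(AtA @ beta - Aty)
    end_time = time.time()
    print("time usage in Gradient descent: ", end_time-start_time)
    return beta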

In [6]:
B = np.array([[1,2,3],[2,2,4],[1,1,1],[3,2,2],[2,1,2]])
y = np.array([[7],[8],[3],[9],[8]])

print(my_gradient_descent(B,y))

time usage in Gradient descent:  6.598049163818359
[[0.94239683]
 [2.00217651]
 [0.04458847]
 [0.99129965]]
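
With a fixed step size and a fixed iteration count, it is not obvious from the printed coefficients alone whether the iteration has converged. One simple diagnostic is to record the objective value as the loop runs. The sketch below does this on the same small dataset with the same step size; the zero starting point, the shorter run of 20,000 iterations, and the logging interval are assumptions chosen to keep the example deterministic and quick.

```python
import numpy as np

# Same small dataset as in the cell above.
B = np.array([[1, 2, 3], [2, 2, 4], [1, 1, 1], [3, 2, 2], [2, 1, 2]], dtype=float)
y = np.array([[7], [8], [3], [9], [8]], dtype=float)
A = np.column_stack([np.ones(len(B)), B])

alpha = 1e-4                    # same step size as my_gradient_descent
x = np.zeros((A.shape[1], 1))   # deterministic start (assumption; the notebook starts randomly)
losses = []
for k in range(20_000):
    residual = A @ x - y
    if k % 1_000 == 0:
        # Record ||A x - y||^2 every 1000 iterations.
        losses.append(float(np.sum(residual ** 2)))
    x = x - alpha * (A.T @ residual)

print("first recorded loss:", losses[0])
print("last recorded loss: ", losses[-1])
```

If the recorded losses are still dropping noticeably at the end of the run, more iterations or a larger (but still stable) step size may be needed before the coefficients can be trusted.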


In [7]:
B = np.random.rand(400, 100)
y = np.random.rand(400, 1)
print(my_gradient_descent(B, y))

time usage in Gradient descent:  21.87859058380127
[[ 7.18734956e-01]
 [ 4.79323082e-02]
 [-7.54545047e-02]
 [-7.66280344e-02]
 [ 4.77233496e-02]
 [-4.70830994e-02]
 [-4.63583483e-02]
 [ 6.83793129e-03]
 [ 4.09486224e-02]
 [-5.01704739e-02]
 [ 4.99746593e-02]
 [ 5.15817923e-04]
 [-1.20304992e-01]
 [ 4.93834640e-02]
 [ 1.27080141e-01]
 [ 7.79226896e-03]
 [-5.63025783e-02]
 [ 5.75939670e-02]
 [-3.72663318e-02]
 [ 5.84704336e-02]
 [-6.62853174e-02]
 [ 4.10304472e-03]
 [-1.90681642e-02]
 [-4.00108648e-02]
 [-3.13399354e-02]
 [-6.12773741e-02]
 [ 1.17200202e-02]
 [ 3.17234723e-03]
 [-2.98686637e-02]
 [ 9.36161801e-03]
 [ 9.23787883e-02]
 [ 4.25734806e-02]
 [ 1.31346169e-01]
 [-9.21327285e-02]
 [-5.60025889e-02]
 [-1.07514952e-01]
 [ 4.02331455e-02]
 [ 9.15738819e-02]
 [-3.03638477e-03]
 [ 1.06035185e-01]
 [-2.22075715e-02]
 [-1.88964406e-02]
 [ 6.51900076e-02]
 [-2.87324281e-02]
 [-1.02044495e-02]
 [ 6.33184328e-03]
 [-5.40633164e-02]
 [-5.19917801e-02]
 [-6.07154905e-02]
 [ 6.92667947e-03]

![Q3_question](q3_question.jpg)

![Q3](q3.jpeg)