# Solving Linear Regression 

- Only a training set is used; there are no test points.

In [1]:
import numpy as np
from datetime import datetime
import pickle

In [2]:
# Load the (X, y) pair saved in data.pickle
with open('data.pickle', 'rb') as f:
    data_load = pickle.load(f)

In [3]:
X, y = data_load

In [4]:
print(X.shape, y.shape)

(3360, 4) (3360,)


### Analytic Solution

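Let $X \in \mathbb{R}^{N \times d}$ be the ***design matrix*** of the data, that is, the $i$-th **row vector** of $X$ is $x_i = (1, \tilde{x}_i)$: the features $\tilde{x}_i$ of the $i$-th sample with a leading $1$ for the bias term.

Let $y \in \mathbb{R}^{1 \times N}$ be the **row vector** of labels, and let the weights $w \in \mathbb{R}^{1 \times d}$ also be a row vector.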

Then the loss function $L(w)$ can be written in vector notation as
$$L(w) = \frac{1}{2N}\sum_{i=1}^N (y_i- w x_i^\top)^2=\frac{1}{2N}(y - w X^\top) (y - wX^\top)^\top.$$
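As a quick sanity check of the vector form, the sketch below evaluates both sides of the identity on small random data (the sizes and values are arbitrary assumptions, not the notebook's dataset), keeping $w$ and $y$ as row vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 6, 3
X = rng.normal(size=(N, d))      # design matrix, one sample per row
y = rng.normal(size=(1, N))      # labels as a (1, N) row vector
w = rng.normal(size=(1, d))      # weights as a (1, d) row vector

# Summation form: (1/2N) * sum_i (y_i - w x_i^T)^2
loss_sum = sum((y[0, i] - (w @ X[i]).item()) ** 2 for i in range(N)) / (2 * N)

# Matrix form: (1/2N) * (y - w X^T)(y - w X^T)^T, a 1x1 matrix
r = y - w @ X.T
loss_mat = (r @ r.T).item() / (2 * N)

print(loss_sum, loss_mat)
```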

Since $L$ is a convex function of $w$, any point where its gradient vanishes is a global minimum. Differentiating with respect to $w$ gives

$$\nabla_{w} L(w) = \frac{1}{N} (-yX + wX^\top X)$$

Setting the gradient to zero, if $X^\top X$ is invertible, the unique optimal solution is
$$ \hat{w} = yX(X^\top X)^{-1}. $$
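Both the gradient formula and the closed form can be checked numerically. The sketch below uses synthetic data (an assumption; sizes chosen arbitrarily), compares $\nabla_w L$ with central finite differences, and confirms that the gradient vanishes at $\hat{w} = yX(X^\top X)^{-1}$. It calls `np.linalg.inv` only to mirror the formula literally.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 50, 4
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d - 1))])  # rows (1, features)
y = rng.normal(size=(1, N))                                     # row vector of labels

def loss(w):
    r = y - w @ X.T
    return (r @ r.T).item() / (2 * N)

def grad(w):
    # (1/N) * (-yX + w X^T X), a (1, d) row vector
    return (-y @ X + w @ X.T @ X) / N

# Central finite differences, one coordinate at a time
w0 = rng.normal(size=(1, d))
eps = 1e-6
fd = np.zeros((1, d))
for j in range(d):
    e = np.zeros((1, d))
    e[0, j] = eps
    fd[0, j] = (loss(w0 + e) - loss(w0 - e)) / (2 * eps)

w_hat = y @ X @ np.linalg.inv(X.T @ X)   # closed form yX (X^T X)^{-1}
print(np.max(np.abs(fd - grad(w0))), np.max(np.abs(grad(w_hat))))
```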

`np.linalg.solve(A, b)`

- It returns the solution $x$ of the linear system $Ax = b$, where $A$ must be square and nonsingular.
- It treats $x$ and $b$ as column vectors, so we transpose the optimality condition $w(X^\top X) = yX$:

$$\nabla_{w} L(w) = 0 \quad \Leftrightarrow \quad w(X^\top X)=yX \quad \Leftrightarrow \quad (X^\top X)w^\top = X^\top y^\top$$
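To see that solving the transposed system gives the same weights as the row-vector formula, the sketch below compares `np.linalg.solve`, the explicit inverse, and `np.linalg.lstsq` on synthetic data (an assumption; the notebook itself uses `data.pickle`). With a 1-D `y`, as in the notebook, `y.T` is a no-op and `solve` returns a 1-D array.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 100, 5
X = rng.normal(size=(N, d))
y = rng.normal(size=N)                            # 1-D labels, like the notebook's y

w_solve = np.linalg.solve(X.T @ X, X.T @ y)       # (X^T X) w^T = X^T y^T
w_inv = y @ X @ np.linalg.inv(X.T @ X)            # w = yX (X^T X)^{-1}
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes ||X w^T - y^T|| directly

print(w_solve.shape)  # (5,)
```

`solve` avoids forming the inverse explicitly, which is generally cheaper and numerically more stable; `lstsq` also handles the case where $X^\top X$ is singular.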

In [5]:
w = np.linalg.solve(X.T @ X, X.T @ y.T)

print(w)

[ 0.55513795 -4.01374667  3.66918128 -6.52875621]


### Wrong Answer! Why?

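- The derivation assumes that every row of the design matrix starts with $1$ (the bias term), but the loaded `X` holds only the 4 raw features, so the model above has no intercept and is forced through the origin. We fix this by prepending a column of ones to build `Z`.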
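The effect of a missing column of ones is easy to reproduce. In this sketch the synthetic data (an assumption, unrelated to `data.pickle`) has a true intercept of 10; a fit without the bias column cannot represent it, while prepending ones recovers the coefficients and a much smaller loss.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
X = rng.normal(loc=2.0, size=(N, 2))              # raw features, no column of ones
true_b, true_w = 10.0, np.array([3.0, -4.0])
y = true_b + X @ true_w + 0.1 * rng.normal(size=N)

def fit(A, y):
    # Normal equations, solved the same way as in the notebook
    return np.linalg.solve(A.T @ A, A.T @ y)

def half_mse(A, w, y):
    return np.sum((A @ w - y) ** 2) / (2 * len(y))

w_no_bias = fit(X, y)                              # forced through the origin
Z = np.hstack([np.ones((N, 1)), X])                # prepend the bias column
w_bias = fit(Z, y)

print(w_no_bias, half_mse(X, w_no_bias, y))
print(w_bias, half_mse(Z, w_bias, y))
```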

In [6]:
Z = np.zeros((X.shape[0], X.shape[1]+1))

In [7]:
for i in range(X.shape[0]):
    temp = list(X[i])
    temp.insert(0, 1)  # list.insert(index, value) inserts value at position index (here, a leading 1)
    Z[i] = np.array(temp)
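The row-by-row loop can be replaced by a single vectorized call. This sketch uses a random `X` of the same shape as the notebook's (an assumption for illustration) and checks that `np.hstack` and `np.insert` produce exactly the matrix the loop builds.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(3360, 4))

# Loop version, as in the notebook
Z_loop = np.zeros((X.shape[0], X.shape[1] + 1))
for i in range(X.shape[0]):
    temp = list(X[i])
    temp.insert(0, 1)
    Z_loop[i] = np.array(temp)

# Vectorized versions: prepend a column of ones
Z_hstack = np.hstack([np.ones((X.shape[0], 1)), X])
Z_insert = np.insert(X, 0, 1.0, axis=1)

print(Z_hstack.shape)  # (3360, 5)
```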

In [8]:
w = np.linalg.solve(Z.T@Z, Z.T@y.T)

print(w)

[10.04876171  3.00710683 -4.01374667  0.99925961 -6.99523963]


### Gradient Descent Method (Mini-batch)
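The training loop in this section applies the gradient from the analytic derivation to one mini-batch at a time. For a mini-batch $B$ with rows $X_B$, labels $y_B$, and learning rate $\eta$ (`LR` in the code), each step is

$$w \leftarrow w - \eta \cdot \frac{1}{|B|}\left(-y_B X_B + w X_B^\top X_B\right) = w - \frac{\eta}{|B|}\,(w X_B^\top - y_B)\,X_B,$$

where $w X_B^\top$ corresponds to `forward(batch_X, weights)` and the bracketed term to `compute_grads`.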

In [9]:
INPUT_DIM = 4
OUTPUT_DIM = 1

In [10]:
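def forward(X, weights):
    # Row-vector predictions: (1, d) @ (d, B) -> (1, B), i.e. w X^T
    pred = np.matmul(weights, X.T)
    return pred

def MSE(X, y, pred):
    # Half mean squared error, matching L(w) = 1/(2N) * sum_i (y_i - w x_i^T)^2
    N = X.shape[0]
    loss = np.sum((pred - y)**2) / (2*N)
    return loss

def compute_grads(X, y, pred):
    # Gradient (1/N) * (-yX + w X^T X); since pred = w X^T, the second term is pred @ X
    N = X.shape[0]
    grads = (1/N) * (-np.matmul(y, X) + np.matmul(pred, X))
    return grads

def update_weights(weights, grads, LR):
    # One gradient descent step; note that `-=` modifies `weights` in place
    weights -= LR*grads
    return weights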
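These helpers assume `weights` is `(1, d)`, a batch `X` is `(B, d)`, and `y` is a `(1, B)` row (the training loop reshapes it with `reshape(1, -1)`); a `(B, 1)` column would make `pred - y` broadcast to `(B, B)` without any error. The sketch below copies the four functions and runs a few full-batch steps on synthetic data (an assumption) to check that the loss decreases and to show the broadcasting pitfall.

```python
import numpy as np

def forward(X, weights):
    return np.matmul(weights, X.T)

def MSE(X, y, pred):
    N = X.shape[0]
    return np.sum((pred - y) ** 2) / (2 * N)

def compute_grads(X, y, pred):
    N = X.shape[0]
    return (1 / N) * (-np.matmul(y, X) + np.matmul(pred, X))

def update_weights(weights, grads, LR):
    weights -= LR * grads
    return weights

rng = np.random.default_rng(5)
B, d = 64, 5
X = np.hstack([np.ones((B, 1)), rng.normal(size=(B, d - 1))])
y = rng.normal(size=(1, B))                   # row vector, as in the training loop
weights = rng.normal(size=(1, d))

losses = []
for _ in range(20):
    pred = forward(X, weights)                # (1, B)
    losses.append(MSE(X, y, pred))
    weights = update_weights(weights, compute_grads(X, y, pred), 0.1)

# A (B, 1) column of labels would silently broadcast pred - y to (B, B)
bad_shape = (forward(X, weights) - y.T).shape

print(losses[0], losses[-1], bad_shape)
```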

In [11]:
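BATCH_SIZE = 30
EPOCHS = 100
LR = 0.001   # learning rate

# Random initial weights: a (1, INPUT_DIM + 1) row vector (bias + 4 features)
weights = np.random.randn(OUTPUT_DIM, INPUT_DIM + 1)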
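The loop below takes `Z.shape[0] // BATCH_SIZE` mini-batches per epoch, so any remainder samples are skipped in that epoch. With 3360 samples and `BATCH_SIZE = 30` there is no remainder, but a batch size of 50 (a hypothetical value, not used in the notebook) would leave 10 samples out of each epoch. The helper `batch_coverage` is hypothetical and only illustrates the slicing the loop uses.

```python
import numpy as np

def batch_coverage(N, batch_size, seed=0):
    # Same slicing as the training loop: [b*batch_size, (b+1)*batch_size)
    idx = np.random.default_rng(seed).permutation(N)
    n_batches = N // batch_size
    used = np.concatenate([idx[b * batch_size:(b + 1) * batch_size] for b in range(n_batches)])
    # Number of updates per epoch, number of samples skipped this epoch
    return n_batches, N - len(np.unique(used))

print(batch_coverage(3360, 30))  # (112, 0)
print(batch_coverage(3360, 50))  # (67, 10)
```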

In [12]:
start = datetime.now()

for epoch in range(EPOCHS):
    
    # Shuffle Data
    idx = np.random.permutation(Z.shape[0])
    x_temp = Z[idx]
    y_temp = y[idx]
    
    for batch in range(Z.shape[0]//BATCH_SIZE):
        batch_X = x_temp[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE]
        batch_y = y_temp[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE].reshape(1,-1)
        
        pred  = forward(batch_X, weights)
        loss  = MSE(batch_X, batch_y, pred)
        grads = compute_grads(batch_X, batch_y, pred)
        
        weights = update_weights(weights, grads, LR)
    
    # `loss` is the loss of the last mini-batch in this epoch, not of the whole training set
    print('EPOCH %d Completed, Loss: %.3f' % (epoch+1, loss))
    
end = datetime.now()
print('Total time:', end-start)

EPOCH 1 Completed, Loss: 61.203
EPOCH 2 Completed, Loss: 46.396
EPOCH 3 Completed, Loss: 48.768
EPOCH 4 Completed, Loss: 37.166
EPOCH 5 Completed, Loss: 22.763
EPOCH 6 Completed, Loss: 31.793
EPOCH 7 Completed, Loss: 19.022
EPOCH 8 Completed, Loss: 16.954
EPOCH 9 Completed, Loss: 26.537
EPOCH 10 Completed, Loss: 16.139
EPOCH 11 Completed, Loss: 8.386
EPOCH 12 Completed, Loss: 18.828
EPOCH 13 Completed, Loss: 10.373
EPOCH 14 Completed, Loss: 5.088
EPOCH 15 Completed, Loss: 11.323
EPOCH 16 Completed, Loss: 11.857
EPOCH 17 Completed, Loss: 9.606
EPOCH 18 Completed, Loss: 6.893
EPOCH 19 Completed, Loss: 4.001
EPOCH 20 Completed, Loss: 7.441
EPOCH 21 Completed, Loss: 6.911
EPOCH 22 Completed, Loss: 5.818
EPOCH 23 Completed, Loss: 5.314
EPOCH 24 Completed, Loss: 6.947
EPOCH 25 Completed, Loss: 3.857
EPOCH 26 Completed, Loss: 2.711
EPOCH 27 Completed, Loss: 4.845
EPOCH 28 Completed, Loss: 4.719
EPOCH 29 Completed, Loss: 3.115
EPOCH 30 Completed, Loss: 5.426
EPOCH 31 Completed, Loss: 4.690
...
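The printed loss jumps around (61.203, 46.396, 48.768 in the first three epochs) because each value is the loss of a single 30-sample mini-batch, the last one of the epoch. The sketch below is a self-contained variant of the same mini-batch update that also records the loss on the full training set after every epoch; the synthetic data, the coefficients, `lr=0.01`, and the function name `train_minibatch` are all assumptions for illustration, not the notebook's settings.

```python
import numpy as np

def train_minibatch(Z, y, batch_size=30, epochs=20, lr=0.001, seed=0):
    """Mini-batch gradient descent on L(w) = 1/(2N) ||y - w Z^T||^2.

    Returns the final (1, d) weights and the full-training-set loss after each epoch.
    """
    rng = np.random.default_rng(seed)
    N, d = Z.shape
    w = rng.normal(size=(1, d))
    history = []
    for _ in range(epochs):
        idx = rng.permutation(N)
        Zs, ys = Z[idx], y[idx]
        for b in range(N // batch_size):
            Xb = Zs[b * batch_size:(b + 1) * batch_size]
            yb = ys[b * batch_size:(b + 1) * batch_size].reshape(1, -1)
            pred = w @ Xb.T                           # (1, B)
            w -= lr * (pred - yb) @ Xb / len(Xb)      # same gradient as compute_grads
        # Loss on the whole training set, not just the last batch
        r = y.reshape(1, -1) - w @ Z.T
        history.append((r @ r.T).item() / (2 * N))
    return w, history

rng = np.random.default_rng(6)
N = 3360
Z = np.hstack([np.ones((N, 1)), rng.normal(size=(N, 4))])
true_w = np.array([1.0, 2.0, -1.0, 0.5, 3.0])        # arbitrary coefficients
y = Z @ true_w + 0.5 * rng.normal(size=N)

w, history = train_minibatch(Z, y, epochs=30, lr=0.01)
print(history[0], history[-1])
```

Tracking the full-set loss costs one extra pass over the data per epoch, which is cheap here and gives a much smoother progress signal than the last-batch value.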

In [13]:
weights

array([[ 9.92044363,  2.97188297, -4.01358441,  1.03781138, -6.98832044]])

In [14]:
w

array([10.04876171,  3.00710683, -4.01374667,  0.99925961, -6.99523963])
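To quantify how close mini-batch gradient descent came to the analytic solution, the two vectors can be compared entry by entry. The values below are copied from the outputs of `In [13]` and `In [14]`.

```python
import numpy as np

# Copied from the notebook outputs above
weights = np.array([[9.92044363, 2.97188297, -4.01358441, 1.03781138, -6.98832044]])
w = np.array([10.04876171, 3.00710683, -4.01374667, 0.99925961, -6.99523963])

abs_err = np.abs(weights.ravel() - w)                          # per-coefficient gap
rel_err = np.linalg.norm(weights.ravel() - w) / np.linalg.norm(w)

print(abs_err.round(4))
print(round(rel_err, 4))
```

The largest gap is in the bias term (about 0.13), and the overall relative error is roughly 1%.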