# Solving Linear Regression 

- Only a training set is used; there are no test points.

In [1]:
import numpy as np
from datetime import datetime
import pickle

In [2]:
# load
with open('data.pickle', 'rb') as f:
    data_load = pickle.load(f)

In [3]:
X, y = data_load

In [4]:
print(X.shape, y.shape)

(3360, 4) (3360,)


### Analytic Solution

Let $X \in \mathbb{R}^{N \times d}$ be the ***design matrix*** of the data,
that is, the $i$-th **row vector** of $X$ is $\hat{x}_i = (1, x_i)$.

Let $y \in \mathbb{R}^N$ be the **row vector** of labels, and let $w \in \mathbb{R}^d$ be the **row vector** of weights.

Then the loss function $L(w)$ can be written in vector notation as
$$L(w) = \frac{1}{2N}\sum_{i=1}^N (y_i- w \hat{x}_i^\top)^2=\frac{1}{2N}(y - w X^\top) (y - wX^\top)^\top.$$

Since the loss function is convex in $w$, its minimum is attained where the gradient with respect to $w$ vanishes.

$$\nabla_{w} L(w) = \frac{1}{N} (-yX + wX^\top X)$$
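
For reference, the gradient can be obtained by expanding the quadratic form. The sketch below uses only the symbols defined above, together with the facts that $yXw^\top$ is a scalar and that $X^\top X$ is symmetric:

$$
\begin{aligned}
L(w) &= \frac{1}{2N}\left(yy^\top - 2\,yXw^\top + wX^\top X w^\top\right),\\
\nabla_{w} L(w) &= \frac{1}{2N}\left(-2\,yX + 2\,wX^\top X\right) = \frac{1}{N}\left(-yX + wX^\top X\right).
\end{aligned}
$$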

Therefore, if $X^\top X$ is invertible, the analytic optimal solution is
$$ \hat{w} = yX(X^\top X)^{-1}. $$

```np.linalg.solve(A,b)```

- It finds the solution x of the linear system Ax = b.
- Here, x is treated as a column vector, so we transpose the equation above before passing it to the solver.

$$\nabla_{w} L(w) = 0 \quad \Leftrightarrow \quad w(X^\top X)=yX \quad \Leftrightarrow \quad (X^\top X)w^\top = X^\top y^\top$$
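
As a quick sanity check, the row-vector and column-vector forms can be compared on synthetic data. This is a minimal sketch, not the notebook's data: `X_demo`, `y_demo`, and `w_true` are hypothetical names, and the labels are noise-free so that the exact weights are known.

```python
import numpy as np

rng = np.random.default_rng(0)           # synthetic data, not data.pickle
X_demo = rng.normal(size=(50, 3))        # hypothetical design matrix (N=50, d=3)
w_true = np.array([2.0, -1.0, 0.5])      # hypothetical ground-truth weights
y_demo = X_demo @ w_true                 # noise-free labels

# Column form: (X^T X) w^T = X^T y^T, solved without forming the inverse
w_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Row form: w = y X (X^T X)^{-1}, written with an explicit inverse for comparison
w_inv = y_demo @ X_demo @ np.linalg.inv(X_demo.T @ X_demo)

print(np.allclose(w_solve, w_true), np.allclose(w_solve, w_inv))
```

Both forms give the same weights; `np.linalg.solve` is generally preferred because it avoids computing the inverse explicitly.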

In [5]:
w = np.linalg.solve(X.T @ X, X.T @ y.T)

print(w)

[ 0.55513795 -4.01374667  3.66918128 -6.52875621]


### Wrong Answer! Why?

- The derivation assumes each row vector has $1$ as its first entry, but the loaded `X` has only the 4 feature columns, so the fit above has no intercept term.

In [5]:
Z = np.zeros((X.shape[0], X.shape[1]+1))

In [6]:
for i in range(X.shape[0]):
    temp = list(X[i])
    temp.insert(0, 1)  # list.insert(index, value): insert value at position index
    Z[i] = np.array(temp)
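
The row-by-row loop can also be written as a single array operation. The sketch below uses a small hypothetical matrix `X_demo` in place of the notebook's `X` and checks that `np.hstack` with a column of ones produces the same result as a loop that prepends 1 to each row.

```python
import numpy as np

X_demo = np.arange(12, dtype=float).reshape(3, 4)   # hypothetical 3x4 feature matrix

# Loop version: prepend 1 to every row, one row at a time
Z_loop = np.zeros((X_demo.shape[0], X_demo.shape[1] + 1))
for i in range(X_demo.shape[0]):
    Z_loop[i] = np.insert(X_demo[i], 0, 1.0)

# Vectorized version: stack a column of ones to the left of the features
Z_vec = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])

print(np.array_equal(Z_loop, Z_vec))
```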

In [7]:
w = np.linalg.solve(Z.T@Z, Z.T@y.T)

print(w)

[ 9.91055003  3.00048444 -4.01521986  1.00538649 -6.98619243]
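
The effect of the missing ones column can be reproduced on synthetic data where the true intercept is known. This is an illustrative sketch under stated assumptions: `X_demo`, `y_demo`, the intercept 10, and the slopes 3 and -4 are made up for the example and are not taken from `data.pickle`.

```python
import numpy as np

rng = np.random.default_rng(1)
X_demo = rng.normal(loc=2.0, size=(200, 2))        # features with nonzero mean
y_demo = 10.0 + X_demo @ np.array([3.0, -4.0])     # intercept 10, slopes 3 and -4

# Without a ones column the model cannot represent the intercept,
# so the slopes are distorted to absorb it.
w_no_bias = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# With a ones column the intercept and slopes are recovered.
Z_demo = np.hstack([np.ones((X_demo.shape[0], 1)), X_demo])
w_bias = np.linalg.solve(Z_demo.T @ Z_demo, Z_demo.T @ y_demo)

print(w_no_bias)
print(w_bias)
```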


### Gradient Descent Method

In [8]:
INPUT_DIM=4
OUTPUT_DIM=1

In [9]:
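def forward(X, weights):
    # Row-vector convention: weights is (1, d), X is (N, d), so pred is (1, N).
    pred = np.matmul(weights, X.T)
    return pred

def MSE(X, y, pred):
    # Halved mean squared error, matching L(w) in the analytic section.
    N = X.shape[0]
    loss = np.sum((pred-y)**2) / (2*N)
    return loss

def compute_grads(X, y, pred):
    # Gradient (1/N)(-yX + wX^T X), using pred = w X^T.
    N     = X.shape[0]
    grads = (1/N)*(-np.matmul(y,X) + np.matmul(pred, X))
    return grads

def update_weights(weights, grads, LR):
    # One gradient-descent step; note that this updates weights in place.
    weights -= LR*grads
    return weights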
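
One way to gain confidence in `compute_grads` is a finite-difference check. The sketch below repeats the three functions so that it runs on its own, and compares the analytic gradient with central differences of `MSE` on small random inputs. The names `X_demo`, `y_demo`, `w_demo`, and the step size `eps` are assumptions made for this example.

```python
import numpy as np

def forward(X, weights):
    return np.matmul(weights, X.T)

def MSE(X, y, pred):
    return np.sum((pred - y) ** 2) / (2 * X.shape[0])

def compute_grads(X, y, pred):
    N = X.shape[0]
    return (1 / N) * (-np.matmul(y, X) + np.matmul(pred, X))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(8, 5))     # hypothetical batch: 8 rows, 5 columns
y_demo = rng.normal(size=(1, 8))     # labels as a row vector
w_demo = rng.normal(size=(1, 5))     # weights as a row vector

analytic = compute_grads(X_demo, y_demo, forward(X_demo, w_demo))

# Central differences: perturb one weight at a time by +/- eps
eps = 1e-6
numeric = np.zeros_like(w_demo)
for j in range(w_demo.shape[1]):
    step = np.zeros_like(w_demo)
    step[0, j] = eps
    loss_plus = MSE(X_demo, y_demo, forward(X_demo, w_demo + step))
    loss_minus = MSE(X_demo, y_demo, forward(X_demo, w_demo - step))
    numeric[0, j] = (loss_plus - loss_minus) / (2 * eps)

max_err = np.max(np.abs(analytic - numeric))
print(max_err)
```

A small `max_err` indicates that the gradient formula and the loss are consistent with each other.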

In [10]:
BATCH_SIZE= 30
EPOCHS=100
LR = 0.001

weights= np.random.randn(OUTPUT_DIM,INPUT_DIM+1)

In [11]:
start = datetime.now()

for epoch in range(EPOCHS):
    
    # Shuffle Data
    idx = np.random.permutation(Z.shape[0])
    x_temp = Z[idx]
    y_temp = y[idx]
    
    for batch in range(Z.shape[0]//BATCH_SIZE):
        batch_X = x_temp[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE]
        batch_y = y_temp[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE].reshape(1,-1)
        
        pred  = forward(batch_X, weights)
        loss  = MSE(batch_X, batch_y, pred)
        grads = compute_grads(batch_X, batch_y, pred)
        
        weights = update_weights(weights, grads, LR)
    
    # Note: loss here is that of the last mini-batch only, so it fluctuates between epochs.
    print('EPOCH %d Completed, Loss: %.3f' % (epoch+1, loss))
    
end = datetime.now()
print('Total time:', end-start)

EPOCH 1 Completed, Loss: 90.371
EPOCH 2 Completed, Loss: 46.376
EPOCH 3 Completed, Loss: 32.512
EPOCH 4 Completed, Loss: 28.221
EPOCH 5 Completed, Loss: 23.235
EPOCH 6 Completed, Loss: 27.845
EPOCH 7 Completed, Loss: 20.300
EPOCH 8 Completed, Loss: 16.903
EPOCH 9 Completed, Loss: 22.315
EPOCH 10 Completed, Loss: 12.602
EPOCH 11 Completed, Loss: 8.973
EPOCH 12 Completed, Loss: 11.571
EPOCH 13 Completed, Loss: 6.837
EPOCH 14 Completed, Loss: 6.188
EPOCH 15 Completed, Loss: 8.765
EPOCH 16 Completed, Loss: 8.456
EPOCH 17 Completed, Loss: 8.182
EPOCH 18 Completed, Loss: 10.059
EPOCH 19 Completed, Loss: 6.709
EPOCH 20 Completed, Loss: 12.118
EPOCH 21 Completed, Loss: 3.810
EPOCH 22 Completed, Loss: 7.308
EPOCH 23 Completed, Loss: 7.195
EPOCH 24 Completed, Loss: 6.539
EPOCH 25 Completed, Loss: 6.599
EPOCH 26 Completed, Loss: 8.472
EPOCH 27 Completed, Loss: 4.260
EPOCH 28 Completed, Loss: 4.488
EPOCH 29 Completed, Loss: 4.201
EPOCH 30 Completed, Loss: 4.346
EPOCH 31 Completed, Loss: 4.099
EPOC

In [12]:
weights

array([[ 9.77692512,  2.96445189, -4.01457452,  1.04649802, -6.98131819]])

In [13]:
w

array([ 9.91055003,  3.00048444, -4.01521986,  1.00538649, -6.98619243])
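
The two results can be compared entry by entry. The sketch below copies the values printed above into hypothetical arrays `w_gd` and `w_closed` rather than reusing the notebook's variables, so it runs on its own.

```python
import numpy as np

# Values copied from the outputs above
w_gd = np.array([[9.77692512, 2.96445189, -4.01457452, 1.04649802, -6.98131819]])
w_closed = np.array([9.91055003, 3.00048444, -4.01521986, 1.00538649, -6.98619243])

# Flatten the (1, 5) gradient-descent weights before subtracting
abs_diff = np.abs(w_gd.ravel() - w_closed)

print(abs_diff)
print(abs_diff.argmax(), abs_diff.max())
```

The largest gap, about 0.13, is in the first entry (the intercept); the remaining entries differ by less than 0.05.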