## Stochastic Gradient Descent with $L_2$ Regularization 

$ \theta = \begin{bmatrix} 
          w  \\
          b 
     \end{bmatrix} $

$X = \begin{bmatrix} 
          x_1 & 1\\ 
          x_2 & 1\\
          \vdots & \vdots \\
          x_n & 1 
     \end{bmatrix} $
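
A minimal sketch of how $X$ and $\theta$ fit together in NumPy: appending a column of ones to the features lets the bias $b$ ride along as the last entry of $\theta$. The feature values and parameter values below are made up for illustration and are not taken from the age-regression data.

```python
import numpy as np

# Hypothetical scalar features x_1..x_n (illustrative values only)
x = np.array([0.5, 1.5, 2.5])

# Design matrix: one row [x_i, 1] per example
X = np.column_stack([x, np.ones_like(x)])

# theta = [w, b]; these numbers are arbitrary
theta = np.array([2.0, -1.0])

# Matrix form of the prediction; equals w * x_i + b for every row
y_hat = X @ theta
```

Because the last column of `X` is all ones, `X @ theta` adds `b` to every prediction without a separate bias term in the code.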

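$ \hat{y} = X \theta $  

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} ( \hat{y}_i - y_i )^2 + \alpha w^T w$ 

Note that the regularization term penalizes only the weights $w$, not the bias $b$. 

Writing $X_i = \begin{bmatrix} x_i & 1 \end{bmatrix}$ for the $i$-th row of $X$, so that $\hat{y}_i = X_i \theta$:

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} ( X_i \theta - y_i )^2 + \alpha w^T w$

Differentiating with respect to $\theta$, where the zero entry reflects that $b$ is not penalized:

$\nabla_{\theta} J(\theta) = \frac{1}{n} X^T (X \theta - Y) + 2 \alpha \begin{bmatrix} w \\ 0 \end{bmatrix}$

Let $\eta$ denote the learning rate, kept distinct from the regularization strength $\alpha$. Each step moves against the gradient:

$\theta = \theta - \eta \nabla_{\theta} J(\theta)$

$\theta = \theta - \frac{\eta}{n} X^T (X \theta - Y) - 2 \eta \alpha \begin{bmatrix} w \\ 0 \end{bmatrix}$

In stochastic gradient descent, $X$ and $Y$ in this update are replaced by a randomly sampled mini-batch of rows, and $n$ by the mini-batch size.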
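
The update rule can be sketched as a short mini-batch SGD loop. This is an illustrative sketch under stated assumptions, not the notebook's own training code: the function name `sgd_l2`, the hyperparameter defaults, and the toy data are all hypothetical. The learning rate is named `lr` to keep it separate from the penalty strength `alpha`, and the bias (the last entry of `theta`) is excluded from the penalty.

```python
import numpy as np


def sgd_l2(X, y, alpha=0.1, lr=0.1, epochs=100, batch_size=16, seed=0):
    """Mini-batch SGD for linear regression with an L2 penalty on w only.

    X is the design matrix whose last column is all ones, so theta[-1]
    is the bias b and is left out of the penalty.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the examples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Data term: (1 / batch) * Xb^T (Xb theta - yb)
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            # Penalty term: gradient of alpha * w^T w is 2 * alpha * w
            penalty = 2.0 * alpha * theta
            penalty[-1] = 0.0  # do not penalize the bias
            theta -= lr * (grad + penalty)
    return theta


# Toy, noise-free data (assumed for illustration): y = 3x + 2
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([x, np.ones_like(x)])
y = 3.0 * x + 2.0

theta_plain = sgd_l2(X, y, alpha=0.0)  # no regularization
theta_ridge = sgd_l2(X, y, alpha=0.5)  # weight is shrunk toward zero
```

With `alpha=0.0` the loop should recover the generating parameters on this toy problem, while a positive `alpha` shrinks the weight and leaves the bias unpenalized.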

In [3]:
import numpy as np


def linear_regression(X_tr, y_tr):
    """Solve the normal equations (X^T X) w = X^T y for w."""
    A = X_tr.T @ X_tr                   # Gram matrix X^T X, shape (m, m)
    b = (X_tr.T @ y_tr).reshape(-1, 1)  # X^T y as a column vector
    # w is returned as a column vector of shape (m, 1)
    return np.linalg.solve(A, b)


def train_age_regressor():
    # Load the training data and flatten each example into a row vector
    X_tr = np.load("age_regression_Xtr.npy")
    n = X_tr.shape[0]
    X_tr = X_tr.reshape((n, -1))
    ytr = np.load("age_regression_ytr.npy")

    # Fit on the training set only. The fMSE cost on the training and
    # testing data is reported separately, in its own cell.
    w = linear_regression(X_tr, ytr)
    return w


In [2]:
w = train_age_regressor ()

In [3]:
w

array([[  6.98588537],
       [ -1.0178158 ],
       [-15.19684493],
       ...,
       [  4.52892776],
       [-11.65211279],
       [  7.92118149]])

In [7]:
import numpy as np

# Reload the data, flattening each example into a row vector
X_tr = np.load("age_regression_Xtr.npy")
n = X_tr.shape[0]
X_tr = X_tr.reshape((n, -1))
ytr = np.load("age_regression_ytr.npy")

X_te = np.load("age_regression_Xte.npy")
m = X_te.shape[0]
X_te = X_te.reshape((m, -1))
yte = np.load("age_regression_yte.npy")


def fmse(X, y, w):
    """Half mean squared error: (1 / (2n)) * sum((X w - y)^2)."""
    # ravel() aligns shapes so (n, 1) - (n,) cannot broadcast to (n, n)
    residual = (X @ w).ravel() - np.ravel(y)
    return np.mean(residual ** 2) / 2


print("Training Data fMSE = ", fmse(X_tr, ytr, w))
print("Testing Data fMSE = ", fmse(X_te, yte, w))

Training Data fMSE =  50.46549587657664
Testing Data fMSE =  269.19936735382396
