**Question 4** Thinking geometrically.

Below, a dataset and its linear regression solution are provided. Let $c_1, c_2, c_3, c_4,$ and $c_5$ be the first five columns of the feature matrix. Randomly sample five real values $u_1, u_2, u_3, u_4, u_5$ from the uniform distribution on $(0, 1)$ and use them to form a linear combination of those columns: $c = \sum_{i=1}^5 u_i c_i$. Calculate the inner product $c^\top (y-\hat{y})$, where $y$ is the training target vector and $\hat{y}$ is the predicted target vector. Why do you get this result?

Additionally, what is the value of $\hat{y}^\top (y-\hat{y})$? Explain why this value is what it is.

You may choose to verify your answer through coding. Note that values of very small magnitude (e.g., $|v| \leq 10^{-10}$) can be treated as zero.


In [None]:
import numpy as np
import math
pi=math.pi
# generate 100 numbers from -1 to 1 with equal stepsize
x=np.linspace(-1,1,100)

# generate training target (noise contaminated!)
y=np.sin(2*pi*.5*x)+0.4*np.random.randn(x.size)

M = 5
basis = np.arange(M+1)
X = x[:, np.newaxis]**basis[np.newaxis, :]

w = np.linalg.solve(X.T@X, X.T@y)
yhat = X.dot(w)
r = y-yhat
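
The quantities asked for above can be checked numerically. The following is a minimal sketch, not a reference solution: it rebuilds the same kind of dataset with a fixed seed (an assumption made here for reproducibility), takes the first five columns of `X`, and prints both inner products so that their magnitudes can be compared against the $10^{-10}$ threshold.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, not part of the original cell

# Same construction as the cell above: polynomial features up to degree M
x = np.linspace(-1, 1, 100)
y = np.sin(2 * np.pi * 0.5 * x) + 0.4 * rng.standard_normal(x.size)
M = 5
X = x[:, np.newaxis] ** np.arange(M + 1)[np.newaxis, :]

# Least-squares fit via the normal equations, then the residual
w = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ w
r = y - yhat

# Random combination of the first five columns with weights drawn from U(0, 1)
u = rng.uniform(0.0, 1.0, size=5)
c = X[:, :5] @ u

inner_c = c @ r        # c^T (y - yhat)
inner_yhat = yhat @ r  # yhat^T (y - yhat)
print(inner_c, inner_yhat)
```

Re-running with a different seed changes `u`, `y`, and `c`; comparing the printed magnitudes across runs is a useful starting point for the explanation the question asks for.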

**Question 5** Investigating Solution Sensitivity.

1. Determine the solution using the Singular Value Decomposition (SVD) of the data matrix and the training target $y$. You must **not** use any pre-existing solver API (e.g., `np.linalg.solve`) for this task; the decomposition itself is computed with `np.linalg.svd` in the cell below. Ensure that your computations avoid any matrix-matrix products. Your final solution should match the one obtained using `np.linalg.lstsq()`.

**Note:** The order of computations is crucial in matrix calculations as it can significantly affect computational costs.
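
To illustrate the note on ordering, here is a small generic sketch with random matrices (the names `A`, `B`, `v`, and the size `n` are illustrative and unrelated to the SVD factors). Both expressions compute the same vector, but the first forms an $n \times n$ matrix product, while the second uses only matrix-vector products.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300  # illustrative size
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
v = rng.standard_normal(n)

# Left to right: forms the n x n product A @ B first, roughly n**3 multiplications
slow = (A @ B) @ v

# Right to left: two matrix-vector products, roughly 2 * n**2 multiplications
fast = A @ (B @ v)

print(np.allclose(slow, fast))
```

Note that Python evaluates `A @ B @ v` left to right, so the parentheses in the second form are needed to get the cheaper order.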

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import math
pi=math.pi
# generate 50 numbers from -1 to 1 with equal stepsize
x=np.linspace(-1,1,50)
# generate training target (noise contaminated!)
y=np.sin(2*pi*.5*x)+0.4*np.random.randn(x.size)

M = 10
basis = np.arange(M+1)
X = x[:, np.newaxis]**basis[np.newaxis, :]

# https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
U, S, VT = np.linalg.svd(X)
print(U.shape, S.shape, VT.shape)

## TODO: Please fill the function to compute the linear regression solution
## on dataset (X, y) which is already provided
## the computation should NOT involve matrix-matrix product,
## matrix-vector product is fine
## and should NOT use any solution API like np.linalg.solve, etc.
def getthetabysvd():
  # TODO: compute theta from U, S, VT and y using matrix-vector products only
  return theta

theta = getthetabysvd()
thetabylstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print("if your computation is correct, this should print True :: ", np.allclose(theta, thetabylstsq))


2. There are various definitions of sensitivity. In this exercise, 'sensitivity' refers to the extent to which a model's output is perturbed when noise is added to the input. How does sensitivity change as regularization increases? Please visualize the change in sensitivity as regularization increases using a separate figure.

3. **Without using an additional regularization term**, can you identify another method to reduce sensitivity? Hint: Review the documentation for the `lstsq` function (doc: [numpy.linalg.lstsq](https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html#numpy.linalg.lstsq)). Please empirically test if this method works and explain why.

4. **Relationship Between Sensitivity and Overfitting**:
   - Do you think reducing sensitivity can always reduce overfitting?
   - Conversely, do you think reducing overfitting can always reduce sensitivity?
   
   Please share your thoughts and reasoning on these questions.
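
As a starting point for the empirical test in part 3, the sketch below varies the `rcond` argument of `np.linalg.lstsq`, which treats singular values smaller than `rcond` times the largest singular value as zero. The dataset, the seed, and the particular `rcond` values are assumptions chosen for illustration. It reports only the effective rank and the solution norm, so measuring sensitivity and explaining the outcome are left to the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility

# Small dataset with many polynomial features, so X is badly conditioned
x = np.linspace(-1, 1, 10)
y = np.sin(2 * np.pi * 0.5 * x) + 0.4 * rng.standard_normal(x.size)
M = 30
X = x[:, np.newaxis] ** np.arange(M + 1)[np.newaxis, :]

rconds = (None, 1e-6, 1e-3, 1e-1)  # illustrative cut-off ratios
results = {}
for rcond in rconds:
    # lstsq returns (solution, residuals, effective rank, singular values)
    theta, _, rank, sing = np.linalg.lstsq(X, y, rcond=rcond)
    results[rcond] = (rank, np.linalg.norm(theta))
    print(f"rcond={rcond}: effective rank={rank}, ||theta||={np.linalg.norm(theta):.3e}")
```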

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import math
pi=math.pi
# generate 10 numbers from -1 to 1 with equal stepsize
x=np.linspace(-1,1,10)

# generate training target (noise contaminated!)
y=np.sin(2*pi*.5*x)+0.4*np.random.randn(x.size)

# define a validation set
xv=np.linspace(-1,1,20)
yv=np.sin(2*pi*.5*xv)+0.4*np.random.randn(xv.size)

In [None]:
def get_sensi(theta, xtilde):
  yhat_orig = xtilde@theta
  yhat_p = (xtilde+np.random.normal(0., 0.2, xtilde.shape))@theta
  return np.linalg.norm(yhat_p-yhat_orig)/np.linalg.norm(yhat_orig)

np.random.seed(0)
error_val= []
error_train = []
sensi = []

# try increasing regularization strength
reglist = [0.0, 0.00001, 0.0001, 0.001, 0.01, 0.1]
for reg in reglist:
  plt.figure()

  # always plot the true function
  plt.plot(np.linspace(-1,1,50), np.sin(2*pi*.5*np.linspace(-1,1,50)), 'black', label='true')

  M = 30
  basis = np.arange(M+1)
  X = x[:, np.newaxis]**basis[np.newaxis, :]
  print(X.shape)

  theta = np.linalg.solve(X.T@X + reg*np.eye(X.shape[1]), X.T@y)
  yhat = X@theta
  # plot the fitted function
  plt.plot(x,yhat, label='fitted')

  # plot the training data
  plt.plot(x,y,'ro', label='train')

  # show labels
  plt.legend()

  # compute val error and train error
  Xv = xv[:, np.newaxis]**basis[np.newaxis, :]
  yhat_val = Xv@theta
  error = np.sum((yv - yhat_val)**2)
  error_t = np.sum((y - yhat)**2)

  error_val.append(error)
  error_train.append(error_t)


plt.figure()
# TODO: add a plot to visualize how sensitivity changes as regularization increases

plt.plot(error_val, label='val-err')
plt.plot(error_train, label='train-err')
plt.legend()
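
One possible way to collect sensitivity values across the regularization sweep is sketched below. It is a hedged variant of the cell above, not the intended solution: `get_sensi` here takes an extra `noise` argument (an assumption) so that the same perturbation is reused for every `reg`, and the sweep starts at a small positive value because `X.T @ X` is singular when `reg` is zero and there are more features than data points.

```python
import numpy as np

def get_sensi(theta, xtilde, noise):
    # Relative change in predictions when `noise` is added to the features
    yhat_orig = xtilde @ theta
    yhat_p = (xtilde + noise) @ theta
    return np.linalg.norm(yhat_p - yhat_orig) / np.linalg.norm(yhat_orig)

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility
x = np.linspace(-1, 1, 10)
y = np.sin(2 * np.pi * 0.5 * x) + 0.4 * rng.standard_normal(x.size)
M = 30
X = x[:, np.newaxis] ** np.arange(M + 1)[np.newaxis, :]

# One fixed perturbation, reused for every regularization value
noise = rng.normal(0.0, 0.2, X.shape)

reglist = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
sensi = []
for reg in reglist:
    # Ridge solution for this regularization strength
    theta = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ y)
    sensi.append(get_sensi(theta, X, noise))

for reg, s in zip(reglist, sensi):
    print(f"reg={reg:g}: sensitivity={s:.3f}")
```

The resulting `sensi` list can then be drawn in its own figure, for example with `plt.semilogx(reglist, sensi)`, to address part 2.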