# Libraries

In [None]:
import numpy             as np
import matplotlib.pyplot as plt

from sklearn.datasets     import make_regression
from sklearn.linear_model import LinearRegression

# Math - Algebra

(Based on https://online.stat.psu.edu/stat462/node/132/ and https://www.geeksforgeeks.org/ml-normal-equation-in-linear-regression)

Linear algebra is the branch of mathematics concerning linear equations,
$$
a_{1}x_{1}+\cdots +a_{n}x_{n}=b,
$$
linear maps,
$$
(x_{1},\ldots ,x_{n})\mapsto a_{1}x_{1}+\cdots +a_{n}x_{n},
$$
and their representations in vector spaces and through matrices. Linear algebra is a key foundation of machine learning: it supplies the notation used to describe the equations and the operation of algorithms, and it underlies their efficient implementation in code.
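
As a minimal sketch of the linear map above, the sum $a_{1}x_{1}+\cdots +a_{n}x_{n}$ can be evaluated as a dot product in NumPy. The names `a_coef`, `x_vec`, and `z_vec` and their values are made up for illustration.

```python
import numpy as np

# Hypothetical coefficients a_1..a_3 and two input vectors (illustrative values)
a_coef = np.array([2.0, -1.0, 0.5])
x_vec = np.array([1.0, 3.0, 4.0])
z_vec = np.array([0.0, 1.0, 2.0])

# The linear map (x_1, ..., x_n) -> a_1 x_1 + ... + a_n x_n as a dot product
value = a_coef @ x_vec
print(value)

# Linearity: mapping a combination equals combining the mapped values
lhs = a_coef @ (3.0 * x_vec + z_vec)
rhs = 3.0 * (a_coef @ x_vec) + a_coef @ z_vec
print(np.isclose(lhs, rhs))
```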

## 1. Motivational Example of Linear Regression

We first derive the linear regression model in matrix form. In linear regression, we fit a linear function to a dataset of $n$ data points $(x_i, y_i)$. The linear model is given by
$$
y(x) = \beta_0 + \beta_1 x.
$$

Linear regression describes the data by choosing the coefficients that minimize the sum of squared deviations between the data and the linear model, where each data point is written as:
$$
y_i = \beta_0 + \beta_1 x_i + \epsilon _i, \, \text{for }i = 1, \dots , n.
$$
Here, the $\epsilon_i$ describe the deviations between the model and the data and are assumed to be Gaussian distributed.

Writing out the set of equations for $i = 1, \dots, n$, we obtain $n$ equations:
$$
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_1 + \epsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_2 + \epsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_n + \epsilon_n
\end{aligned}
$$

We can formulate the above simple linear regression function in matrix notation:
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix}
   1 & x_1 \\
   1 & x_2 \\
   \vdots & \vdots \\
   1 & x_n
\end{bmatrix}
\begin{bmatrix}
   \beta_0 \\
   \beta_1
\end{bmatrix} +
\begin{bmatrix}
   \epsilon_1 \\
   \epsilon_2 \\
   \vdots \\
   \epsilon_n
\end{bmatrix}.
$$

We can write this matrix equation in a more compact form
$$
\mathbf{Y} = \mathbf{X}\, \mathbf{\beta} + \mathbf{\epsilon},
$$
where
- $\mathbf{X}$ is an $n \times 2$ matrix,
- $\mathbf{Y}$ is an $n \times 1$ column vector,
- $\mathbf{\beta}$ is a $2 \times 1$ column vector, and
- $\mathbf{\epsilon}$ is an $n \times 1$ column vector.

The matrix $\mathbf{X}$ and the vector $\mathbf{\beta}$ are combined by matrix multiplication, which yields the $n \times 1$ vector $\mathbf{X} \mathbf{\beta}$.
This vector is then added element by element to the vector $\mathbf{\epsilon}$ by matrix addition.
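
The multiplication and addition described above can be sketched in NumPy as follows. The toy values and the `_toy` names are assumptions chosen only to make the shapes visible.

```python
import numpy as np

# Hypothetical toy data: n = 4 observations of a single predictor
x_toy = np.array([0.0, 1.0, 2.0, 3.0])

# Design matrix X (n x 2): a column of ones next to the column of x-values
X_toy = np.column_stack((np.ones_like(x_toy), x_toy))

# Assumed coefficients beta (2 x 1) and deviations epsilon (n x 1)
beta_toy = np.array([[1.0], [2.0]])
eps_toy = np.array([[0.1], [-0.2], [0.0], [0.3]])

# Matrix multiplication (n x 2)(2 x 1) -> (n x 1), then element-wise addition
Y_toy = X_toy @ beta_toy + eps_toy

print(X_toy.shape, beta_toy.shape, Y_toy.shape)
print(Y_toy.ravel())
```

Each row of `Y_toy` equals $\beta_0 + \beta_1 x_i + \epsilon_i$ for the corresponding $x_i$.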

Let's quickly review matrix algebra, the part of mathematics that deals with operations on matrices, vectors, and tensors.

## 2. Least Squares Estimates of Linear Regression Coefficients

As we will discuss later, minimizing the mean squared error between the model predictions and the data leads to the following equation for the coefficient vector $\mathbf{\beta}$ of a model with $k$ predictors ($k = 1$ in the simple linear regression above):
$$
\mathbf{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix}
= ( \mathbf{X}^\mathrm{T} \mathbf{X} )^{-1}\, \mathbf{X}^\mathrm{T}\, \mathbf{Y},
$$
where
- $( \mathbf{X}^\mathrm{T} \mathbf{X} )^{-1}$ is the inverse of the $\mathbf{X}^\mathrm{T} \mathbf{X}$ matrix, and
- $\mathbf{X}^\mathrm{T}$ is the transpose of the $\mathbf{X}$ matrix.
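
A brief sketch of where this formula comes from, assuming that $\mathbf{X}^\mathrm{T} \mathbf{X}$ is invertible: minimizing the mean squared error is equivalent to minimizing the sum of squared deviations,
$$
S(\mathbf{\beta}) = (\mathbf{Y} - \mathbf{X} \mathbf{\beta})^\mathrm{T}\, (\mathbf{Y} - \mathbf{X} \mathbf{\beta}).
$$
Setting the gradient of $S$ with respect to $\mathbf{\beta}$ to zero gives
$$
-2\, \mathbf{X}^\mathrm{T} (\mathbf{Y} - \mathbf{X} \mathbf{\beta}) = \mathbf{0}
\quad \Longrightarrow \quad
\mathbf{X}^\mathrm{T} \mathbf{X}\, \mathbf{\beta} = \mathbf{X}^\mathrm{T}\, \mathbf{Y},
$$
and multiplying both sides from the left by $( \mathbf{X}^\mathrm{T} \mathbf{X} )^{-1}$ yields the expression for $\mathbf{\beta}$ given above.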

Let's remind ourselves of the transpose and inverse of a matrix.

## 3. Transpose of a Matrix

The transpose of a matrix $\mathbf{A}$, denoted as $\mathbf{A}^\mathrm{T}$ or $\mathbf{A}^{\prime}$, is a matrix whose rows are the columns of $\mathbf{A}$ and whose columns are the rows of $\mathbf{A}$, all in the same order.

For example, the transpose of the $3 \times 2$ matrix $\mathbf{A}$:
$$
\mathbf{A} = \begin{bmatrix} a_{0,0} & a_{0,1} \\ a_{1,0} & a_{1,1} \\ a_{2,0} & a_{2,1} \end{bmatrix}
$$
is the $2 \times 3$ matrix $\mathbf{A}^\mathrm{T}$:
$$
\mathbf{A}^\mathrm{T} = \begin{bmatrix} a_{0,0} & a_{1,0} & a_{2,0} \\ a_{0,1} & a_{1,1} & a_{2,1} \end{bmatrix}
$$
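
In NumPy, the transpose is available through the `.T` attribute. The following minimal sketch uses a made-up $3 \times 2$ matrix to show the swap of rows and columns.

```python
import numpy as np

# Hypothetical 3 x 2 matrix (illustrative values)
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# The transpose swaps rows and columns, giving a 2 x 3 matrix
A_T = A.T

print(A.shape, A_T.shape)
print(A_T)

# Element (i, j) of the transpose is element (j, i) of the original
print(A_T[0, 2] == A[2, 0])
```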

The $\mathbf{X}$ matrix in the simple linear regression setting is:
$$
\mathbf{X} = \begin{bmatrix}
   1 & x_1 \\
   1 & x_2 \\
   \vdots & \vdots \\
   1 & x_n
\end{bmatrix}.
$$

Hence, the $\mathbf{X}^\mathrm{T} \mathbf{X}$ matrix in the linear regression is:
$$
\mathbf{X}^\mathrm{T} \mathbf{X} = \begin{bmatrix}
   1 & 1 & \dots & 1 \\
   x_1 & x_2 & \dots & x_n
\end{bmatrix}
\begin{bmatrix}
   1 & x_1 \\
   1 & x_2 \\
   \vdots & \vdots \\
   1 & x_n
\end{bmatrix}
= \begin{bmatrix}
   n & \sum_{i=1}^n x_i \\
   \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2
\end{bmatrix}.
$$
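
The closed form of $\mathbf{X}^\mathrm{T} \mathbf{X}$ can be checked numerically. This is a small sketch with made-up $x$-values, and the `_chk` names are only for illustration.

```python
import numpy as np

# Hypothetical x-values (illustrative)
x_chk = np.array([1.0, 2.0, 4.0, 7.0])
n_chk = x_chk.size

# Design matrix with a column of ones and a column of x-values
X_chk = np.column_stack((np.ones(n_chk), x_chk))

# Left-hand side: the matrix product X^T X
XtX = X_chk.T @ X_chk

# Right-hand side: [[n, sum x_i], [sum x_i, sum x_i^2]]
expected = np.array([[n_chk, x_chk.sum()],
                     [x_chk.sum(), (x_chk ** 2).sum()]])

print(XtX)
print(np.allclose(XtX, expected))
```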

## 4. The Inverse of a Matrix

The inverse $\mathbf{A}^{-1}$ of a **square matrix** $\mathbf{A}$ is the unique matrix such that:
$$
\mathbf{A}^{-1} \mathbf{A} = \mathbf{I} = \mathbf{A} \mathbf{A}^{-1}.
$$

That is, the inverse of $\mathbf{A}$ is the matrix $\mathbf{A}^{-1}$ that you multiply $\mathbf{A}$ by to obtain the identity matrix $\mathbf{I}$. Note that only square matrices can have an inverse, and not every square matrix has one.

Finding inverses by hand, particularly for large matrices, is a tedious task, so we will use NumPy to calculate them.
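
A minimal sketch of computing an inverse with `np.linalg.inv` and checking the defining property is shown below. The matrices are made up for illustration, and the second one is deliberately singular (its second row is twice the first) to show that an inverse does not always exist.

```python
import numpy as np

# Hypothetical invertible 2 x 2 matrix (illustrative values)
A_sq = np.array([[4.0, 14.0],
                 [14.0, 70.0]])

# Compute the inverse
A_inv = np.linalg.inv(A_sq)

# Check A^-1 A = I = A A^-1 up to floating-point rounding
I2 = np.eye(2)
print(np.allclose(A_inv @ A_sq, I2), np.allclose(A_sq @ A_inv, I2))

# A singular matrix has no inverse; NumPy signals this with LinAlgError
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    print("Not invertible:", err)
```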

## 5. Solution for Linear Regression

We will generate a synthetic data set with `make_regression` from the Python library scikit-learn and fit a linear regression to it.

In [None]:
# Create data set
x, y = make_regression(n_samples=100, n_features=1, n_informative=1, noise=10, random_state=10)
 
# Plot the data set
plt.figure(figsize=(8, 6))

plt.rcParams['font.size'] = '16'

plt.scatter(x, y, s = 30, marker = 'o')

plt.xlabel('x')
plt.ylabel('y')

plt.title('Scatter Data', fontsize=20)

plt.show()

In [None]:
# Convert the vector of y variables into a column vector
Y = np.expand_dims(y, axis=-1) 

# Create matrix X: a column of ones (x0 = 1) next to the column of x values
X = np.stack( ( np.ones(x.size), np.ravel(x) ), axis=1 )

# Determining the coefficients of linear regression

# Calculate X^T X
XT_times_X = np.matmul(X.T, X)

# Calculate (X^T X)^-1
XT_times_X_inverse = np.linalg.inv(XT_times_X)

# Calculate (X^T Y)
XT_times_Y = np.matmul(X.T, Y)

# Calculate (X^T X)^-1 (X^T Y)
Beta = np.matmul(XT_times_X_inverse, XT_times_Y).reshape(2)

# Display best values obtained
print(f"Matrix X (rows 1 to 4) =\n"
      f"{X[1:5, :]}\n\n")

print(f"Matrix X'X =\n"
      f"{XT_times_X}\n\n")

print(f"Inverse of (X'X) =\n"
      f"{XT_times_X_inverse}\n\n")

print("Regression coefficients\n"
      f"β0 = {Beta[0]:6.4f}\n"
      f"β1 = {Beta[1]:6.2f}")
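
The cell above forms the inverse of $\mathbf{X}^\mathrm{T} \mathbf{X}$ explicitly. As a hedged alternative sketch, the same coefficients can be obtained by solving the linear system with `np.linalg.solve` or by calling `np.linalg.lstsq`, which avoid forming the inverse. The synthetic data and the `_alt` names below are assumptions for illustration and are not the notebook's data set.

```python
import numpy as np

# Hypothetical synthetic data: y = 3 + 2 x plus Gaussian noise
rng = np.random.default_rng(0)
x_alt = rng.normal(size=50)
y_alt = 3.0 + 2.0 * x_alt + rng.normal(scale=0.5, size=50)

# Design matrix with a column of ones and a column of x-values
X_alt = np.column_stack((np.ones_like(x_alt), x_alt))

# 1) Explicit inverse, as in the normal equation
beta_inv = np.linalg.inv(X_alt.T @ X_alt) @ X_alt.T @ y_alt

# 2) Solve the system (X^T X) beta = X^T y without forming the inverse
beta_solve = np.linalg.solve(X_alt.T @ X_alt, X_alt.T @ y_alt)

# 3) Least-squares solver applied directly to X and y
beta_lstsq, *_ = np.linalg.lstsq(X_alt, y_alt, rcond=None)

print(beta_inv, beta_solve, beta_lstsq)
print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))
```

For a small, well-conditioned problem like this one, all three approaches should agree to within rounding error.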

### 5.1 Predict values using the regression coefficients

In [None]:
# Predict the values at two sample x-values
x_sample     = np.array( [[-2.5],[3]] )

# Generate matrix X
X_sample     = np.stack( ( np.ones( x_sample.size), np.ravel(x_sample) ), axis=1 )

# Multiply matrix X by the regression coefficients
y_predicted  = np.matmul(X_sample, Beta)

# Plot the data set together with the fitted regression line
plt.figure(figsize=(8, 6))

plt.rcParams['font.size'] = '16'

plt.scatter(x, y, s = 30, marker = 'o')
plt.plot(x_sample, y_predicted, color='black', lw=2)

plt.xlabel('x')
plt.ylabel('y')

plt.title('Scatter Data', fontsize=20)

plt.show()

print(f"predicted values = {', '.join([f'[ {i[0]:.2f}, {i[1]:.2f} ]' for i in zip(np.ravel(x_sample), y_predicted)])}")

### 5.2 Now using scikit-learn

In [None]:
# Linear regression function from scikit-learn
linear_regression = LinearRegression()

# Fit the model
linear_regression.fit(x, y)
 
# Print the fitted intercept (β0) and slope (β1)
print(f"β0 = {linear_regression.intercept_:6.4f}\n"
      f"β1 = {linear_regression.coef_[0]:6.2f}")

> ## Assignment
>
> The projection matrix maps the observed values $y_i$ to the estimated values $\hat{y}_i$ obtained with the least squares method. The projection matrix, $\mathbf{H}$, is given by
> $$
> \mathbf{H} =  \mathbf{X}\, (\mathbf{X}^\mathrm{T} \mathbf{X})^{-1}\, \mathbf{X}^\mathrm{T}
> $$
>
> Calculate the projection matrix, $\mathbf{H}$, and show that you obtain the predicted $y$-values by creating a plot.

In [None]:
# Calculate the projection matrix



# Apply the projection matrix to the y-values to generate the y predictions



# Plot the predicted and original y values vs. the x values


> Knowing the projection matrix, $\mathbf{H}$, we can also express the $R^2$ value for the linear regression using a matrix equation:
> $$
> R^2 = 1 - \frac{\mathbf{y}^\mathrm{T}\, (\mathbf{I} - \mathbf{H})\, \mathbf{y}} {\mathbf{y}^\mathrm{T}\, (\mathbf{I} - \mathbf{M})\, \mathbf{y}}
> $$
> where $\mathbf{I}$ is the identity matrix,
> $$
> \mathbf{M} = \mathbf{1}\, (\mathbf{1}^\mathrm{T} \mathbf{1})^{-1}\, \mathbf{1}^\mathrm{T},
> $$
> and $\mathbf{1}$ is a column vector of ones.
> 
> Calculate the $R^2$ value using the above matrix form of the equations.

In [None]:
# Create a column vector of ones
One = np.expand_dims(np.ones(y.size), axis=-1)

# Calculate the matrix M



# Calculate R2

I = np.identity(H.shape[0])

