# Properties of Matrix Operations

## Operations

The operations are as follows:

- **Addition**: If $ A $ and $ B $ are matrices of the same size $ m \times n $, then $ A + B $, their sum, is a matrix of size $ m \times n $.
- **Multiplication by scalars**: If $ A $ is a matrix of size $ m \times n $ and $ c $ is a scalar, then $ cA $ is a matrix of size $ m \times n $.
- **Matrix multiplication**: If $ A $ is a matrix of size $ m \times n $ and $ B $ is a matrix of size $ n \times p $, then the product $ AB $ is a matrix of size $ m \times p $.
- **Vectors**: A vector of length $ n $ can be treated as a matrix of size $ n \times 1 $, and the operations of vector addition, multiplication by scalars, and multiplying a matrix by a vector agree with the corresponding matrix operations.
- **Transpose**: If $ A $ is a matrix of size $ m \times n $, then its transpose $ A^T $ is a matrix of size $ n \times m $.
- **Identity matrix**: $ I_n $ is the $ n \times n $ identity matrix; its diagonal elements are equal to 1 and its off-diagonal elements are equal to 0.
- **Zero matrix**: We denote by 0 the matrix of all zeroes (of relevant size).
- **Inverse**: If $ A $ is a square matrix, then its inverse $ A^{-1} $ is a matrix of the same size. Not every square matrix has an inverse! (The matrices that have inverses are called invertible.)

## Properties

The properties of these operations are (assuming that \( r, s \) are scalars and the sizes of the matrices \( A, B, C \) are chosen so that each operation is well defined):

$$
(A + B) + C = A + (B + C), \tag{1}
$$

$$
A + B = B + A, \tag{2}
$$

$$
A + 0 = A, \tag{3}
$$

$$
r(A + B) = rA + rB, \tag{4}
$$

$$
(r + s)A = rA + sA, \tag{5}
$$

$$
r(sA) = (rs)A, \tag{6}
$$

$$
A(BC) = (AB)C, \tag{7}
$$

$$
A(B + C) = AB + AC, \tag{8}
$$

$$
(B + C)A = BA + CA, \tag{9}
$$

## Additional Properties

$$
r(AB) = (rA)B = A(rB), \tag{10}
$$

$$
I_m A = A = A I_n, \tag{11}
$$

$$
(A^T)^T = A, \tag{12}
$$

$$
(A + B)^T = A^T + B^T, \tag{13}
$$

$$
(rA)^T = r A^T, \tag{14}
$$

$$
(AB)^T = B^T A^T, \tag{15}
$$

$$
(I_n)^T = I_n, \tag{16}
$$

$$
A A^{-1} = A^{-1} A = I_n, \tag{17}
$$

$$
(rA)^{-1} = r^{-1} A^{-1}, \quad r \neq 0, \tag{18}
$$

$$
(AB)^{-1} = B^{-1} A^{-1}, \tag{19}
$$

$$
(I_n)^{-1} = I_n, \tag{20}
$$

$$
(A^T)^{-1} = (A^{-1})^T, \tag{21}
$$

$$
(A^{-1})^{-1} = A, \tag{22}
$$
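
A few of these identities can be checked numerically. The sketch below is only a spot check on two arbitrarily chosen invertible $3 \times 3$ matrices (the values are illustrative assumptions, not part of the notes); it verifies (15), (19) and (21) up to floating-point tolerance and is no substitute for a proof.

```python
import numpy as np

# Two illustrative invertible matrices (det(A) = 5, det(B) = 4)
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])

inv = np.linalg.inv

# (15): the transpose of a product reverses the order of the factors
transpose_ok = np.allclose((A @ B).T, B.T @ A.T)

# (19): the inverse of a product reverses the order of the factors
inverse_ok = np.allclose(inv(A @ B), inv(B) @ inv(A))

# (21): transposition and inversion can be swapped
swap_ok = np.allclose(inv(A.T), inv(A).T)

print(transpose_ok, inverse_ok, swap_ok)
```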

## Differences from Number Operations

We see that in many cases, we can treat addition and multiplication of matrices as addition and multiplication of numbers. However, here are some differences between operations with matrices and operations with numbers:

- **Note the reverse order of multiplication** in (15) and (19).
- (19) can only be applied if we know that both \( A \) and \( B \) are invertible.
- In general, $ AB \neq BA $, even if $A$ and $B$ are both square. If $AB = BA$, then we say that $A$ and $B$ commute.
- For a general matrix $A$, we cannot conclude from $AB = AC$ that $B = C$. (However, if we know that $A$ is invertible, then we can multiply both sides of the equation $AB = AC$ on the left by $A^{-1}$ and get $B = C$.)
- The equation $AB = 0 $ does not necessarily yield $ A = 0 $ or $ B = 0 $. For example, take

$$
A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
$$
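
The same example can be run in NumPy. The matrices `A` and `B` are the ones above; `C` and `D` are an additional pair chosen here only to illustrate that two matrices need not commute.

```python
import numpy as np

A = np.array([[1, 0], [0, 0]])
B = np.array([[0, 0], [0, 1]])

# AB is the zero matrix although neither A nor B is zero.
# This also shows that cancellation fails: AB = A0, yet B is not 0.
print(A @ B)

# Non-commutativity: CD and DC differ
C = np.array([[0, 1], [0, 0]])
D = np.array([[0, 0], [1, 0]])
print(C @ D)   # [[1 0], [0 0]]
print(D @ C)   # [[0 0], [0 1]]
```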


# Review of the OLS estimator

**OLS (Ordinary Least Squares)**  
OLS is a technique for estimating linear equations of the form  
$y = X\beta + u$

**Indifference curve**

$U(X_1, X_2) = A X_1^{\alpha} X_2^{\beta}$ (Cobb-Douglas)

Taking logs gives an equation that is linear in the parameters:

$\ln U = \ln (A X_1^{\alpha} X_2^{\beta}) = \ln A + \alpha \ln X_1 + \beta \ln X_2$

### Where:
$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$  
- $n \times 1$ vector of dependent variable  
- explained variable  
- endogenous variable

$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ x_{31} & x_{32} & \cdots & x_{3k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$  
- $n \times k$ matrix of independent/explanatory/exogenous variables

$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{bmatrix}$  
- $k \times 1$ vector of parameters, which we wish to estimate

## Degrees of freedom
- Greene: $n - K - 1$
- Verbeek: $n - K$

# Degrees of Freedom

Degrees of freedom refer to the number of observations minus the number of parameters being estimated.

- Verbeek: \( n - k \)
  - \( k \): Number of parameters.
- Greene: \( n - k - 1 \)
  - \( k \): Number of independent variables.
  - To Greene, slope parameters are \( k \), but adding the intercept gives \( (k + 1) \) parameters.
  - \( n - (k + 1) = n - k - 1 \)



# Linear Regression: Matrix Form & OLS Assumptions

## Residuals
Let  
$$
u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}
$$

be the $(n \times 1)$ vector of error terms (disturbances). The residuals are their sample counterparts.  

Residual (estimated error):  
$\hat{u}_i = y_i - \hat{y}_i$

So:  
$$
y_i = \hat{y}_i + \hat{u}_i
$$

## Matrix Representation

$$
y = X \beta + u
$$

Where:

$$
y =
\begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{bmatrix}, \quad
X =
\begin{bmatrix}
1 & x_{12} & x_{13} & \cdots & x_{1k} \\
1 & x_{22} & x_{23} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n2} & x_{n3} & \cdots & x_{nk}
\end{bmatrix}, \quad
\beta =
\begin{bmatrix}
\beta_1 \\
\beta_2 \\
\vdots \\
\beta_k
\end{bmatrix}, \quad
u =
\begin{bmatrix}
u_1 \\
u_2 \\
\vdots \\
u_n
\end{bmatrix}
$$

---

### Individual Equation Form
$$
y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i
$$

---

**Note:**
- The first column of 1s in $X$ captures the intercept ($\beta_1$).  
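- **R** (`lm`) and **Stata** (`regress`) include this intercept automatically unless told otherwise. In Python's **`statsmodels`**, the formula interface does too, but the array interface (`sm.OLS`) does not: the column of 1s must be added explicitly, for example with `sm.add_constant`.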
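
When working with plain arrays, the column of 1s has to be built by hand. The sketch below assumes a small made-up dataset (`x2` and `x3` are hypothetical regressors with $n = 4$) and shows one way to assemble $X$ in NumPy.

```python
import numpy as np

# Hypothetical data: n = 4 observations on two regressors
x2 = np.array([2.0, 4.0, 6.0, 8.0])
x3 = np.array([1.0, 0.0, 1.0, 0.0])

# Prepend a column of ones so that beta_1 plays the role of the intercept
X = np.column_stack([np.ones_like(x2), x2, x3])

print(X.shape)   # (4, 3): n = 4 rows, k = 3 columns
print(X)
```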





## Worked Examples
### Case 1: \(n=2, k=2\)
$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} 1 & x_{12} \\ 1 & x_{22} \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}
$$


### Case 2: \(n=4, k=3\)
$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
=
\begin{bmatrix} 
1 & x_{12} & x_{13} \\
1 & x_{22} & x_{23} \\
1 & x_{32} & x_{33} \\
1 & x_{42} & x_{43}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
+
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
$$

---



## Classical Linear Regression Assumptions (CLRM)


1. **Zero mean error**
$$
E[u] = 0
$$

$$
E[u] = E\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} E[u_1] \\ E[u_2] \\ E[u_3] \\ \vdots \\ E[u_n] \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$




2. **Homoskedasticity & No autocorrelation**
$$
E[uu'] = \sigma^2 I_n
$$
where $I_n$ is the identity matrix of order $n$, and $\sigma^2$ is the variance of the distribution of each $u_i$:

$$
u_i \sim N(0, \sigma^2)
$$

***[each $u_i$ is normally distributed with zero mean and constant variance]***

The matrix condition combines two assumptions:

- **Homoskedasticity** (constant variance): $\text{Var}(u_i) = \sigma^2$ for all $i$
- **No autocorrelation** (zero covariance): $\text{Cov}(u_i, u_j) = 0 \quad \text{for } i \neq j$

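But what is $uu'$? It is the $n \times n$ outer product of $u$ with itself:

$$
uu' =
\begin{bmatrix}
u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}
\begin{bmatrix}
u_1 & u_2 & u_3 & \cdots & u_n \end{bmatrix}
$$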

$$
uu' =
\begin{bmatrix}
u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\
u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\
\vdots & \vdots & \ddots & \vdots \\
u_n u_1 & u_n u_2 & \cdots & u_n^2
\end{bmatrix}
$$


3. **Full rank (No perfect multicollinearity)**
$$
\text{rank}(X) = k
$$

4. **Exogeneity**
$$
E[u \mid X] = 0
$$

5. **Normality (for inference)**
$$
u \sim N(0, \sigma^2 I_n)
$$
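
Assumptions 1 and 2 can be illustrated by simulation. The sketch below draws many independent error vectors under assumption 5 (the values `n = 3`, `sigma = 2.0` and the number of draws are arbitrary choices made here) and compares the sample averages of $u$ and $uu'$ with $0$ and $\sigma^2 I_n$.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed, only for reproducibility
n, sigma, draws = 3, 2.0, 200_000

# Each row is one draw of the error vector u = (u_1, ..., u_n)'
U = rng.normal(loc=0.0, scale=sigma, size=(draws, n))

mean_u = U.mean(axis=0)           # sample analogue of E[u]
second_moment = U.T @ U / draws   # sample analogue of E[uu']

print(np.round(mean_u, 2))
print(np.round(second_moment, 2))
```

The off-diagonal entries of `second_moment` estimate the covariances $E[u_i u_j]$ and should be close to zero, while the diagonal entries estimate $\sigma^2$.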

---

## Variance-Covariance Matrix of \(u\)
$$
uu' =
\begin{bmatrix}
u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\
u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\
\vdots & \vdots & \ddots & \vdots \\
u_n u_1 & u_n u_2 & \cdots & u_n^2
\end{bmatrix}
$$
![image.png](attachment:image.png)
Taking expectations:

### Case: $ n=2 $
$$
E[uu'] =
\begin{bmatrix}
\sigma^2 & 0 \\
0 & \sigma^2
\end{bmatrix}
$$

### Case: $n=3$
$$
E[uu'] =
\begin{bmatrix}
\sigma^2 & 0 & 0 \\
0 & \sigma^2 & 0 \\
0 & 0 & \sigma^2
\end{bmatrix}
$$
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)
![image-4.png](attachment:image-4.png)
![image-5.png](attachment:image-5.png)
![image-6.png](attachment:image-6.png)
![image-7.png](attachment:image-7.png)

---

## Homoskedasticity vs Heteroskedasticity
- **Homoskedasticity:** residuals have constant spread around the regression line. 
![image.png](attachment:image.png)
![image-4.png](attachment:image-4.png)
- **Heteroskedasticity:** variance of residuals changes with \(x\).  
![image-5.png](attachment:image-5.png)
📈 *Visual tip:* Simulate and plot residuals in Python to illustrate constant vs. non-constant variance.
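
A minimal simulation along these lines is sketched below. It generates error terms directly rather than fitting a regression, and the functional form for the heteroskedastic case (standard deviation proportional to $x$) is an assumption chosen for illustration. Plotting is left out to keep the sketch dependency-free; a scatter plot of each series against `x` would show the same pattern.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
n = 2000
x = np.sort(rng.uniform(1.0, 10.0, size=n))

u_homo = rng.normal(scale=1.0, size=n)        # constant standard deviation
u_hetero = rng.normal(scale=0.3 * x, size=n)  # standard deviation grows with x

def spread_ratio(u):
    """Variance of u in the upper half of x relative to the lower half."""
    half = len(u) // 2
    return u[half:].var() / u[:half].var()

print(spread_ratio(u_homo))     # close to 1
print(spread_ratio(u_hetero))   # well above 1
```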

---

## Autocorrelation
- **No autocorrelation:** residuals across observations are independent.  
![image-2.png](attachment:image-2.png)
- **Autocorrelation:** residuals of one observation depend on another (common in time series).
![image-3.png](attachment:image-3.png)
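
The difference can also be seen in simulated data. The sketch below assumes a first-order autoregressive process, $u_t = \rho u_{t-1} + e_t$, as one simple example of autocorrelation; `rho = 0.8` is a hypothetical value.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
n, rho = 5000, 0.8

e = rng.normal(size=n)           # independent shocks: no autocorrelation
u_ar = np.zeros(n)
for t in range(1, n):
    u_ar[t] = rho * u_ar[t - 1] + e[t]   # u_t = rho * u_{t-1} + e_t

def lag1_corr(u):
    """Sample correlation between u_t and u_{t-1}."""
    return np.corrcoef(u[1:], u[:-1])[0, 1]

print(lag1_corr(e))      # near 0
print(lag1_corr(u_ar))   # near rho
```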

---

## Rank of Matrix \(X\)
- **Rank = number of linearly independent columns (or rows).**  
- If $\text{rank}(X) = k$, \(X\) has full column rank → OLS estimates are unique.

### Example of Rank Deficiency
Suppose:
$$
\text{male}_i = 
\begin{cases} 1 & \text{male} \\ 0 & \text{female} \end{cases}, \quad
\text{female}_i =
\begin{cases} 1 & \text{female} \\ 0 & \text{male} \end{cases}
$$

$$
X =
\begin{bmatrix}
1 & 0 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0
\end{bmatrix}
$$


Here:  
- The last two columns sum to the intercept column ($\text{male}_i + \text{female}_i = 1$ for every $i$), so the columns of $X$ are linearly dependent → **perfect multicollinearity**.  

**Econometric Implication:**  
- Cannot separately estimate coefficients of male and female dummies with an intercept (**dummy variable trap**).  
- **Fix:** Drop one dummy (e.g., keep only “male” dummy).
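
The rank can be checked directly with `np.linalg.matrix_rank`. The sketch below rebuilds the matrix $X$ shown above and then applies the fix of dropping the female dummy.

```python
import numpy as np

male = np.array([0, 1, 0, 1])
female = 1 - male
intercept = np.ones_like(male)

X_trap = np.column_stack([intercept, male, female])  # intercept plus both dummies
X_fixed = np.column_stack([intercept, male])         # female dummy dropped

print(np.linalg.matrix_rank(X_trap))    # 2, fewer than its 3 columns
print(np.linalg.matrix_rank(X_fixed))   # 2, equal to its 2 columns
```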

---

## Key Takeaways
- $E[u] = 0$ ensures **unbiasedness**.  
- $E[uu'] = \sigma^2 I_n$ ensures **efficiency** (BLUE property).  
- Full rank $X$ ensures **unique and stable estimates**.  
- Violations (heteroskedasticity, autocorrelation, multicollinearity) affect **inference**, not always unbiasedness.




![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

```python
import numpy as np

# Example dataset
male = np.array([1, 0, 1, 0])     # 1 = male, 0 = female
female = np.array([0, 1, 0, 1])   # 1 = female, 0 = male

# Stack into a 2 x n array (np.corrcoef treats each row as one variable)
X = np.vstack([male, female])

# Compute correlation matrix
corr_matrix = np.corrcoef(X)
print(corr_matrix)
```

```
[[ 1. -1.]
 [-1.  1.]]
```


![image.png](attachment:image.png)

![image.png](attachment:image.png)

```python
import numpy as np

# Example dummies
male = np.array([1, 0, 1, 0])     # 1 = male, 0 = female
female = np.array([0, 1, 0, 1])   # 1 = female, 0 = male

# Intercept column
intercept = np.ones_like(male)

# Stack into design matrix X (n x k)
X = np.column_stack([intercept, male, female])
print("X =\n", X)

# Compute X'X
XtX = X.T @ X
print("\nX'X =\n", XtX)
```

```
X =
 [[1 1 0]
 [1 0 1]
 [1 1 0]
 [1 0 1]]

X'X =
 [[4 2 2]
 [2 2 0]
 [2 0 2]]
```


![image.png](attachment:image.png)

```python
import numpy as np

male = np.array([1, 0, 1, 0])
female = np.array([0, 1, 0, 1])
intercept = np.ones_like(male)

X = np.column_stack([intercept, male, female])

# Compute X'X
XtX = X.T @ X

# Determinant of X'X (zero means X'X is singular and cannot be inverted)
det = np.linalg.det(XtX)

print("X'X =\n", XtX)
print("\nDeterminant =", det)
```

```
X'X =
 [[4 2 2]
 [2 2 0]
 [2 0 2]]

Determinant = 0.0
```


![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

# 1) Problem setup (what we’re solving)

We have $N$ observations and $K$ regressors (including intercept). For observation $i$:

* $x_i$ is a $K\times 1$ column vector $(x_{i1},\dots,x_{iK})'$. Usually $x_{i1}=1$ for intercept.
* $y_i$ is the scalar response.

Stack observations:

* $X$ is $N\times K$ with rows $x_i'$.
* $y$ is $N\times 1$ with elements $y_i$.
* We model $y = X\beta + u$, where $u$ is the error vector.

Ordinary least squares (OLS) chooses $\beta$ to minimize the sum of squared residuals:

$$
S(\beta) = \sum_{i=1}^N (y_i - x_i' \beta)^2 = (y - X\beta)'(y - X\beta).
$$

# 2) Derive the normal equations
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)
![image-4.png](attachment:image-4.png)
![image-5.png](attachment:image-5.png)
Expand $S(\beta)$:

$$
S(\beta) = y'y - 2\beta'X'y + \beta'(X'X)\beta.
$$

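(Use bilinearity, $(X\beta)' = \beta'X'$, and the fact that the scalar $y'X\beta$ equals its own transpose $\beta'X'y$, which is why the two cross terms combine into $-2\beta'X'y$.)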

Take gradient with respect to $\beta$. Standard matrix derivatives:

* $\frac{\partial}{\partial\beta}(-2\beta'X'y) = -2X'y$.
* $\frac{\partial}{\partial\beta}(\beta'X'X\beta) = 2X'X\beta$ (because $X'X$ is symmetric).

So

$$
\nabla_\beta S(\beta) = -2X'y + 2X'X\beta.
$$

Set gradient to zero (first-order condition for minimum):

$$
-2X'y + 2X'X\beta = 0 \quad\Longrightarrow\quad X'X\beta = X'y.
$$

These are the **normal equations**.

If $X'X$ is invertible (full column rank), solve:

$$
\boxed{\;\hat\beta = (X'X)^{-1} X' y\;}.
$$
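
The gradient formula can be sanity-checked against finite differences. The sketch below uses a small illustrative dataset (the values of `X`, `y` and the trial point `beta0` are assumptions made here) and confirms that the analytic gradient matches a central-difference approximation and vanishes at $\hat\beta$.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # illustrative design matrix
y = np.array([1.0, 2.0, 2.0])

def S(beta):
    """Sum of squared residuals (y - X beta)'(y - X beta)."""
    r = y - X @ beta
    return r @ r

def gradient(beta):
    """Analytic gradient -2X'y + 2X'X beta."""
    return -2 * X.T @ y + 2 * X.T @ X @ beta

def numeric_gradient(beta, h=1e-6):
    """Central-difference approximation to the gradient of S."""
    g = np.zeros_like(beta)
    for j in range(len(beta)):
        step = np.zeros_like(beta)
        step[j] = h
        g[j] = (S(beta + step) - S(beta - step)) / (2 * h)
    return g

beta0 = np.array([0.5, -1.0])   # arbitrary trial point
print(np.allclose(gradient(beta0), numeric_gradient(beta0), atol=1e-4))

# Solve the normal equations X'X beta = X'y and check the first-order condition
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(gradient(beta_hat), 0.0, atol=1e-8))
```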

# 3) Why the solution is unique (Hessian & positive definiteness)

Hessian $=\; 2X'X$. If $X$ has full column rank $K$, then $X'X$ is positive definite, so the stationary point is a unique global minimizer. If not full rank, $X'X$ is singular and there are infinitely many minimizers (perfect multicollinearity).
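
One way to see the difference numerically is to look at the eigenvalues of the Hessian. In the sketch below, `X_full` and `X_deficient` are made-up examples; the second has a column that is an exact multiple of the first.

```python
import numpy as np

X_full = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
X_deficient = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])  # second column = 2 * first

# eigvalsh is appropriate because the Hessian 2X'X is symmetric
eig_full = np.linalg.eigvalsh(2 * X_full.T @ X_full)
eig_deficient = np.linalg.eigvalsh(2 * X_deficient.T @ X_deficient)

print(eig_full)        # all strictly positive: positive definite
print(eig_deficient)   # smallest eigenvalue is zero up to rounding: singular
```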

# 4) Concrete numeric example — do the algebra

Let

$$
X=\begin{bmatrix}1&1\\1&2\\1&3\end{bmatrix},\qquad y=\begin{bmatrix}1\\2\\2\end{bmatrix}.
$$

Here $N=3$, $K=2$. Compute step-by-step:

1. $X'X = \begin{bmatrix}\sum 1 & \sum x_i\\ \sum x_i & \sum x_i^2\end{bmatrix} = \begin{bmatrix}3 & 6\\ 6 & 14\end{bmatrix}.$

2. $X'y = \begin{bmatrix}\sum y_i \\ \sum x_i y_i\end{bmatrix} = \begin{bmatrix}5 \\ 11\end{bmatrix}.$

3. $(X'X)^{-1}$. Determinant $=3\cdot14 - 6\cdot6 = 42 -36 = 6$.

$$
(X'X)^{-1} = \frac{1}{6}\begin{bmatrix}14 & -6\\ -6 & 3\end{bmatrix}
= \begin{bmatrix}7/3 & -1\\ -1 & 1/2\end{bmatrix}.
$$

4. Multiply:

$$
\hat\beta = (X'X)^{-1} X'y
= \begin{bmatrix}7/3 & -1\\ -1 & 1/2\end{bmatrix}\begin{bmatrix}5\\11\end{bmatrix}
= \begin{bmatrix}2/3 \\ 1/2\end{bmatrix}.
$$

So the estimated line is $\hat y = \tfrac{2}{3} + \tfrac{1}{2}x$.

Check orthogonality (normal equation holds):
Residuals $r = y - X\hat\beta = [-\tfrac{1}{6}, \tfrac{1}{3}, -\tfrac{1}{6}]'$.
Verify $X'r = 0$ (i.e., columns of $X$ are orthogonal to residuals) — you’ll find both components exactly zero. That’s the defining property of OLS.

Calculate goodness of fit:

* SST (total) = $\sum (y_i-\bar y)^2 = 2/3$.
* SSE (residual sum squares) = $\sum r_i^2 = 1/6$.
* SSR = SST − SSE = 1/2.
* $R^2 = SSR/SST = 0.75$.

Also $\hat\sigma^2 = \dfrac{\text{SSE}}{N-K} = \dfrac{1/6}{1} = 1/6$.

Variance of $\hat\beta$:

$$
\mathrm{Var}(\hat\beta) = \sigma^2 (X'X)^{-1} \quad\Rightarrow\quad
\widehat{\mathrm{Var}}(\hat\beta)=\hat\sigma^2 (X'X)^{-1}.
$$

Plugging numbers gives standard errors you’d use in t-tests.
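
The whole calculation can be reproduced in NumPy. The sketch below follows the steps above literally, including the explicit inverse, which is fine for a $2 \times 2$ example; the standard errors are the square roots of the diagonal of $\widehat{\mathrm{Var}}(\hat\beta)$, here $\sqrt{7/18}$ and $\sqrt{1/12}$.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])
N, K = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y          # (2/3, 1/2)
r = y - X @ beta_hat                  # residuals (-1/6, 1/3, -1/6)

sse = r @ r                           # 1/6
sst = ((y - y.mean()) ** 2).sum()     # 2/3
r_squared = 1 - sse / sst             # 0.75
sigma2_hat = sse / (N - K)            # 1/6

var_beta_hat = sigma2_hat * XtX_inv
std_errors = np.sqrt(np.diag(var_beta_hat))   # sqrt(7/18), sqrt(1/12)

print(beta_hat, r_squared, sigma2_hat, std_errors)
```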

# 5) Geometric interpretation — projection onto column space

* Columns of $X$ span a $K$-dimensional subspace of $\mathbb{R}^N$ (the column space $\mathcal{C}(X)$).
* OLS finds $\hat y = X\hat\beta$ which is the orthogonal projection of $y$ onto $\mathcal{C}(X)$.
* Residual $r = y - \hat y$ is orthogonal to every column of $X$: $X'(y-\hat y)=0$.
* The **hat matrix** (projection matrix) is

  $$
  H = X (X'X)^{-1} X'.
  $$

  Properties: $H = H'$ (symmetric), $H^2 = H$ (idempotent), $Hy = \hat y$. Trace$(H)=K$ (degrees of freedom explained).

This projection view is powerful: OLS = best linear predictor in $L^2$ sense.
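
The listed properties of $H$ are easy to confirm on a small example. The data below are the same illustrative $3 \times 2$ matrix and response used in the numeric example; forming $H$ explicitly is done here for exposition only.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (projection) matrix
y_hat = H @ y                          # fitted values

print(np.allclose(H, H.T))                     # symmetric
print(np.allclose(H @ H, H))                   # idempotent
print(np.isclose(np.trace(H), X.shape[1]))     # trace equals K
print(np.allclose(X.T @ (y - y_hat), 0.0))     # residual orthogonal to columns of X
```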

# 6) Statistical properties (Gauss–Markov theorem — exam essentials)

Under the standard assumptions:

1. **Linearity**: $y = X\beta + u$.
2. **Exogeneity**: $E[u|X]=0$.
3. **Homoskedasticity & no autocorrelation**: $\mathrm{Var}(u|X)=\sigma^2 I_N$.
4. **Full column rank**: $\mathrm{rank}(X)=K$.

Then:

* $\hat\beta$ is unbiased: $E[\hat\beta|X]=\beta$.
* $\mathrm{Var}(\hat\beta|X)=\sigma^2 (X'X)^{-1}$.
* OLS is the **BLUE** (best linear unbiased estimator) — smallest variance among linear unbiased estimators.

Estimator for $\sigma^2$:

$$
\hat\sigma^2 = \frac{\text{SSE}}{N-K} = \frac{(y-X\hat\beta)'(y-X\hat\beta)}{N-K},
$$

which is unbiased for $\sigma^2$ under the assumptions above.
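
These results can be illustrated with a small Monte Carlo experiment. The sketch below holds $X$ fixed, redraws normal errors many times, and averages the estimates; all numerical settings (`N`, `sigma`, `beta`, the number of replications) are hypothetical choices, and normality is used only as a convenient way to generate errors that satisfy the assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
N, sigma, reps = 30, 1.5, 5000
beta = np.array([1.0, 2.0])      # hypothetical true parameters

# Design matrix held fixed across replications
X = np.column_stack([np.ones(N), rng.uniform(0, 10, size=N)])
K = X.shape[1]

beta_hats = np.empty((reps, K))
sigma2_hats = np.empty(reps)
for s in range(reps):
    u = rng.normal(scale=sigma, size=N)       # homoskedastic, uncorrelated errors
    y = X @ beta + u
    b = np.linalg.solve(X.T @ X, X.T @ y)     # OLS via the normal equations
    resid = y - X @ b
    beta_hats[s] = b
    sigma2_hats[s] = resid @ resid / (N - K)

print(beta_hats.mean(axis=0))            # close to beta
print(sigma2_hats.mean())                # close to sigma**2
print(np.cov(beta_hats.T))               # close to sigma**2 * inv(X'X)
print(sigma**2 * np.linalg.inv(X.T @ X))
```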

# 7) What can go wrong (practical & exam-expected caveats)

* **Perfect multicollinearity**: some column of $X$ is a linear combination of the others → $X'X$ singular → no unique OLS solution. Remove or combine variables.
* **Near-collinearity**: $X'X$ poorly conditioned → $(X'X)^{-1}$ huge → large variances. Detect via condition number or VIFs (variance-inflation factors).

  * Remedies: drop variables, combine variables, use principal components, or regularize (ridge regression).
* **Endogeneity** (violates exogeneity): $E[u|X]\ne 0$ → bias. Remedies: instrumental variables.
* **Heteroskedasticity**: $\mathrm{Var}(u|X)\neq\sigma^2 I$ → OLS still unbiased but standard errors are wrong. Remedy: robust (White) standard errors.
* **Autocorrelation** (time series): use GLS or Newey–West SE.
* **Numerical instability**: computing $(X'X)^{-1}$ directly is numerically unstable when the condition number is high. Use QR decomposition or SVD for stable computation:

  * QR: $X=QR$, $Q'Q=I$, then $\hat\beta = R^{-1}Q'y$.
  * SVD gives a stable solution and is used when $X$ may be singular.
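
The three computational routes can be compared side by side. The sketch below uses the small illustrative dataset from the numeric example, where all methods agree; the differences only matter when $X$ is badly conditioned. `np.linalg.solve` is used for the triangular system for simplicity, although a dedicated triangular solver would exploit the structure of $R$.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

# Normal equations: fine here, but forming X'X squares the condition number of X
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# QR route: X = QR with Q'Q = I, so R beta = Q'y
Q, R = np.linalg.qr(X)                     # reduced QR by default
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD-based least squares; returns the minimum-norm solution if X is rank deficient
beta_svd, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_qr, beta_svd, rank)
```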

# 8) Useful related objects and properties (quick list)

* Hat matrix $H = X(X'X)^{-1}X'$. Residual-maker $M = I - H$.
* SSE = $y'(I-H)y$.
* Degrees of freedom: DF for SSE = $N-K$, for SSR = $K-1$ (if intercept included).
* Diagnostics: leverage = diagonal entries of $H$, Cook’s distance for influence.
* Hypothesis tests: $t$-tests for single coeff, $F$-tests for joint restrictions. Use $\widehat{\mathrm{Var}}(\hat\beta)$ and $\hat\sigma^2$.

# 9) Applications — where this is useful (brief)

* Economics: demand/supply estimation, returns to education.
* Finance: CAPM beta estimates (time-series regression of asset returns on market returns).
* Forecasting: OLS as baseline predictor.
* Portfolio analytics: regress fund returns on factors.
* Policy evaluation: difference-in-differences often implemented as OLS.

# 10) Exam checklist — what to write to get full credit

* Define model $y=X\beta+u$ and objective $S(\beta)$.
* Expand $S(\beta)$ and take derivative; show gradient = $-2X'y+2X'X\beta$.
* Set to zero → normal equations $X'X\hat\beta=X'y$.
* State rank/invertibility condition for uniqueness and give $\hat\beta=(X'X)^{-1}X'y$.
* Mention projection geometry: $X'(y-X\hat\beta)=0$.
* State Gauss–Markov assumptions and results ($\hat\beta$ is BLUE, variance form).
* Show estimator for $\sigma^2$ and degrees of freedom $N-K$.
* Discuss at least one practical issue (multicollinearity, heteroskedasticity) and a remedy.
* If asked, show numeric example or apply QR/SVD for numerical stability.

# 11) Short summary (TL;DR)

* Minimize $S(\beta)=(y-X\beta)'(y-X\beta)$ → first-order condition gives $X'X\beta = X'y$.
* If $X'X$ invertible, $\hat\beta = (X'X)^{-1}X'y$.
* OLS is a projection: residuals orthogonal to columns of $X$.
* Under standard assumptions, $\hat\beta$ is unbiased with $\mathrm{Var}(\hat\beta)=\sigma^2(X'X)^{-1}$.
* Watch out for multicollinearity and endogeneity; use QR/SVD for stable computation.


![image.png](attachment:image.png)

To estimate $\beta_1$ and $\beta_2$ in the data-generating process $y_i = \beta_1 + \beta_2 x_i + u_i$, we use the given data: $y' = [2\ 4\ 7]$ and $x' = [0\ 0\ 5]$. With three observations, we can set up the normal equations for ordinary least squares (OLS) regression.

The model is $y = X\beta + u$, where:
- $y = [y_1\ y_2\ y_3]' = [2\ 4\ 7]'$,
- $X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 5 \end{bmatrix}$,
- $\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}$,
- $u = [u_1\ u_2\ u_3]'$.

The OLS estimator is given by $\hat{\beta} = (X'X)^{-1}X'y$. First, compute $X'X$:
$$
X'X = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 5 \end{bmatrix} = \begin{bmatrix} 3 & 5 \\ 5 & 25 \end{bmatrix}.
$$

Next, compute $X'y$:
$$
X'y = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 + 4 + 7 \\ 0 \cdot 2 + 0 \cdot 4 + 5 \cdot 7 \end{bmatrix} = \begin{bmatrix} 13 \\ 35 \end{bmatrix}.
$$

Now, find the inverse of $X'X$. The determinant is $3 \cdot 25 - 5 \cdot 5 = 75 - 25 = 50$, and the inverse is:
$$
(X'X)^{-1} = \frac{1}{50} \begin{bmatrix} 25 & -5 \\ -5 & 3 \end{bmatrix}.
$$

Then, $\hat{\beta} = (X'X)^{-1}X'y$:
$$
\hat{\beta} = \frac{1}{50} \begin{bmatrix} 25 & -5 \\ -5 & 3 \end{bmatrix} \begin{bmatrix} 13 \\ 35 \end{bmatrix} = \frac{1}{50} \begin{bmatrix} 25 \cdot 13 - 5 \cdot 35 \\ -5 \cdot 13 + 3 \cdot 35 \end{bmatrix} = \frac{1}{50} \begin{bmatrix} 325 - 175 \\ -65 + 105 \end{bmatrix} = \frac{1}{50} \begin{bmatrix} 150 \\ 40 \end{bmatrix} = \begin{bmatrix} 3 \\ 0.8 \end{bmatrix}.
$$

Thus, the OLS estimates are $\hat\beta_1 = 3$ and $\hat\beta_2 = 0.8$.

#### IN PYTHON:

```python
import numpy as np

# Data
y = np.array([2, 4, 7])
X = np.array([[1, 0], [1, 0], [1, 5]])

# OLS estimation: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Extract beta_1 and beta_2
beta_1, beta_2 = beta_hat

print(f"beta_1 = {beta_1}")
print(f"beta_2 = {beta_2}")
```

```
beta_1 = 2.9999999999999982
beta_2 = 0.8000000000000003
```
