
# Review of the OLS estimator

**OLS - Ordinary Least Squares**  
OLS is a technique for estimating linear equations of the form  
$y = X\beta + u$

**Indifference curve**

$U(X_1, X_2) = A X_1^{\alpha} X_2^{\beta}$ (Cobb-Douglas)

Taking logs makes the function linear in the parameters, so it can be estimated by OLS:

$\ln U = \ln (A X_1^{\alpha} X_2^{\beta}) = \ln A + \alpha \ln X_1 + \beta \ln X_2$
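The log-linearization can be checked numerically. The sketch below uses hypothetical values for $A$, $\alpha$, $\beta$, $X_1$ and $X_2$ (none of them come from the notes) and confirms that the log of the Cobb-Douglas function equals the linear expression in logs.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
A, alpha, beta = 2.0, 0.3, 0.7
X1, X2 = 4.0, 9.0

# Cobb-Douglas utility in levels.
U = A * X1**alpha * X2**beta

# Log-linear form: ln U = ln A + alpha ln X1 + beta ln X2.
log_U_linear = math.log(A) + alpha * math.log(X1) + beta * math.log(X2)

# The two expressions agree up to floating-point error.
print(math.isclose(math.log(U), log_U_linear))
```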

### Where:
$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$  
- $n \times 1$ vector of the dependent variable
- also called the explained variable
- or the endogenous variable

$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ x_{31} & x_{32} & \cdots & x_{3k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$  
- $n \times k$ matrix of independent/explanatory/exogenous variables

$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{bmatrix}$  
- $k \times 1$ vector of parameters, which we wish to estimate

# Degrees of Freedom

Degrees of freedom refer to the number of observations minus the number of parameters being estimated.

- Verbeek: \( n - k \)
  - \( k \): number of parameters, including the intercept.
- Greene: \( n - k - 1 \)
  - \( k \): number of independent variables (slope parameters only).
  - Adding the intercept gives \( k + 1 \) parameters, so the degrees of freedom are \( n - (k + 1) = n - k - 1 \).

Both conventions give the same number once the intercept is counted.
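As a small arithmetic check, the sketch below computes the degrees of freedom under both conventions. The sample size and number of regressors are hypothetical, and the helper `degrees_of_freedom` is introduced here only for illustration.

```python
def degrees_of_freedom(n_obs: int, n_params: int) -> int:
    """Observations minus estimated parameters (intercept included in n_params)."""
    return n_obs - n_params

n = 50          # hypothetical sample size
slopes = 3      # hypothetical number of independent variables

# Convention that counts every parameter, intercept included: k = slopes + 1.
k_all = slopes + 1
df_count_all = degrees_of_freedom(n, k_all)        # n - k

# Convention that counts only slopes: k = slopes, then subtract 1 for the intercept.
df_count_slopes = n - slopes - 1                   # n - k - 1

print(df_count_all, df_count_slopes)  # 46 46
```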



# Linear Regression: Matrix Form & OLS Assumptions

## Residuals
Let  
$$
u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}
$$

be the $(n \times 1)$ vector of error terms (disturbances), which are unobserved.  

The residual is the estimated error:  
$\hat{u}_i = y_i - \hat{y}_i$

So:  
$$
y_i = \hat{y}_i + \hat{u}_i
$$
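The residual definition can be illustrated with a few lines of NumPy. This is a minimal sketch on made-up data; it uses `np.linalg.lstsq` to obtain least-squares coefficients, which is one convenient way to compute them and not necessarily how any particular package does it.

```python
import numpy as np

# Hypothetical data: 5 observations, an intercept column and one regressor.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares coefficients (np.linalg.lstsq minimizes the sum of squared residuals).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat      # fitted values
u_hat = y - y_hat         # residuals: u_hat_i = y_i - y_hat_i

# The decomposition y_i = y_hat_i + u_hat_i holds by construction.
print(np.allclose(y, y_hat + u_hat))
# With an intercept in X, the residuals sum to zero up to rounding error.
print(np.isclose(u_hat.sum(), 0.0, atol=1e-10))
```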

## Matrix Representation

$$
y = X \beta + u
$$

Where:

$$
y =
\begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{bmatrix}, \quad
X =
\begin{bmatrix}
1 & x_{12} & x_{13} & \cdots & x_{1k} \\
1 & x_{22} & x_{23} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n2} & x_{n3} & \cdots & x_{nk}
\end{bmatrix}, \quad
\beta =
\begin{bmatrix}
\beta_1 \\
\beta_2 \\
\vdots \\
\beta_k
\end{bmatrix}, \quad
u =
\begin{bmatrix}
u_1 \\
u_2 \\
\vdots \\
u_n
\end{bmatrix}
$$

---

### Individual Equation Form
$$
y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i
$$

---

**Note:**
- The first column of 1s in $X$ captures the intercept ($\beta_1$).  
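- Software such as **R** (`lm`) and **Stata** (`regress`) includes the intercept automatically unless told otherwise. In **Python (`statsmodels`)** the formula interface does the same, but the array interface `statsmodels.api.OLS(y, X)` does not: the column of 1s must be added explicitly, for example with `statsmodels.api.add_constant(X)`.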
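To make the role of the column of 1s concrete, the sketch below builds $X$ by hand with NumPy and checks that one row of the matrix form reproduces the individual equation. All numbers (regressors, $\beta$, $u$) are hypothetical.

```python
import numpy as np

# Hypothetical regressors: n = 4 observations on x_2 and x_3.
x2 = np.array([0.5, 1.5, 2.5, 3.5])
x3 = np.array([10.0, 8.0, 6.0, 4.0])

# Prepend the column of 1s so that beta_1 acts as the intercept.
X = np.column_stack([np.ones_like(x2), x2, x3])   # shape (n, k) = (4, 3)

# Hypothetical parameter vector and error vector, for illustration only.
beta = np.array([1.0, 2.0, -0.5])
u = np.array([0.1, -0.2, 0.0, 0.3])

# Matrix form: y = X beta + u.
y = X @ beta + u

# Individual equation form for observation i = 1 (index 0 in Python).
y_1 = beta[0] + beta[1] * x2[0] + beta[2] * x3[0] + u[0]

print(X.shape)                 # (4, 3)
print(np.isclose(y[0], y_1))   # True
```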





## Worked Examples
### Case 1: \(n=2, k=2\)
$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} 1 & x_{12} \\ 1 & x_{22} \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}
$$


### Case 2: \(n=4, k=3\)
$$
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
=
\begin{bmatrix} 
1 & x_{12} & x_{13} \\
1 & x_{22} & x_{23} \\
1 & x_{32} & x_{33} \\
1 & x_{42} & x_{43}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
+
\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}
$$
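Case 1 has as many observations as parameters, which is worth seeing with numbers. In the sketch below (hypothetical values for $X$ and $y$), $X$ is square and invertible, so the system can be solved exactly and the residuals are zero, leaving $n - k = 0$ degrees of freedom.

```python
import numpy as np

# Case 1 with hypothetical numbers: n = 2 observations, k = 2 parameters.
X = np.array([[1.0, 2.0],
              [1.0, 5.0]])
y = np.array([3.0, 9.0])

# X is square and invertible here, so X beta = y has an exact solution.
beta_hat = np.linalg.solve(X, y)
u_hat = y - X @ beta_hat

print(beta_hat)                   # intercept and slope
print(np.allclose(u_hat, 0.0))    # residuals are zero: n - k = 0 degrees of freedom
```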

---



## Classical Linear Regression Assumptions (CLRM)


1. **Zero mean error**
$$
E[u] = 0
$$

$$
E[u] = E\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} E[u_1] \\ E[u_2] \\ E[u_3] \\ \vdots \\ E[u_n] \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$




2. **Homoskedasticity & No autocorrelation**
$$
E[uu'] = \sigma^2 I_n
$$
where $I_n$ is the identity matrix of order $n$.

$\sigma^2$ is the variance of each error term $u_i$:

$$
u_i \sim N(0, \sigma^2)
$$

***[each $u_i$ is normally distributed with zero mean and constant variance]***

- Constant variance: $\text{Var}(u_i) = \sigma^2$ for all $i$ (**homoskedasticity**)

- Zero covariance: $\text{Cov}(u_i, u_j) = 0 \quad \text{for } i \neq j$ (**no autocorrelation**)

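But what is $uu'$? It is the outer product of the $(n \times 1)$ column vector $u$ with the $(1 \times n)$ row vector $u'$:

$$
uu' =
\begin{bmatrix}
u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}
\begin{bmatrix}
u_1 & u_2 & u_3 & \cdots & u_n \end{bmatrix}
$$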

$$
uu' =
\begin{bmatrix}
u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\
u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\
\vdots & \vdots & \ddots & \vdots \\
u_n u_1 & u_n u_2 & \cdots & u_n^2
\end{bmatrix}
$$


3. **Full rank (No perfect multicollinearity)**
$$
\text{rank}(X) = k
$$

4. **Exogeneity**
$$
E[u \mid X] = 0
$$

5. **Normality (for inference)**
$$
u \sim N(0, \sigma^2 I_n)
$$

---

## Variance-Covariance Matrix of \(u\)
$$
uu' =
\begin{bmatrix}
u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\
u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\
\vdots & \vdots & \ddots & \vdots \\
u_n u_1 & u_n u_2 & \cdots & u_n^2
\end{bmatrix}
$$

Taking expectations:

### Case: $n=2$
$$
E[uu'] =
\begin{bmatrix}
\sigma^2 & 0 \\
0 & \sigma^2
\end{bmatrix}
$$

### Case: $n=3$
$$
E[uu'] =
\begin{bmatrix}
\sigma^2 & 0 & 0 \\
0 & \sigma^2 & 0 \\
0 & 0 & \sigma^2
\end{bmatrix}
$$
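The step from $uu'$ to $E[uu'] = \sigma^2 I_n$ can be illustrated by simulation. The sketch below assumes independent normal errors with a hypothetical $\sigma = 2$ and $n = 3$; a single draw of $uu'$ is generally not diagonal, but the average over many draws is close to $\sigma^2 I_n$.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed so the sketch is reproducible
n, sigma, draws = 3, 2.0, 200_000

# Each row is one draw of the error vector u, with independent N(0, sigma^2) entries.
U = rng.normal(loc=0.0, scale=sigma, size=(draws, n))

# A single draw gives the outer product uu', which is not diagonal in general.
uu_single = np.outer(U[0], U[0])

# Averaging uu' over draws estimates E[uu']: U'U / draws is the mean of the outer products.
E_uu_hat = U.T @ U / draws

print(np.round(E_uu_hat, 2))
print(np.allclose(E_uu_hat, sigma**2 * np.eye(n), atol=0.1))
```

The diagonal entries estimate the common variance $\sigma^2$ and the off-diagonal entries estimate the covariances, which are zero under the assumption.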



---

## Homoskedasticity vs Heteroskedasticity
- **Homoskedasticity:** residuals have a constant spread around the regression line. 
![image.png](attachment:image.png)
- **Heteroskedasticity:** variance of residuals changes with \(x\).  

📈 *Visual tip:* Simulate and plot residuals in Python to illustrate constant vs. non-constant variance.
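A minimal simulation along these lines, without the plot, might look as follows. The heteroskedastic case assumes the error standard deviation is proportional to $x$, which is only one possible form, and `variance_ratio` is a helper made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.uniform(1.0, 10.0, size=n)

# Homoskedastic errors: the standard deviation is the same for every x.
u_homo = rng.normal(0.0, 2.0, size=n)
# Heteroskedastic errors: the standard deviation grows with x (an assumed form).
u_hetero = rng.normal(0.0, 0.5 * x, size=n)

def variance_ratio(x, u):
    """Error variance for above-median x divided by variance for below-median x."""
    high = x > np.median(x)
    return u[high].var() / u[~high].var()

ratio_homo = variance_ratio(x, u_homo)
ratio_hetero = variance_ratio(x, u_hetero)
print(round(ratio_homo, 2), round(ratio_hetero, 2))
```

A ratio near 1 is what constant variance looks like; a ratio well above 1 signals that the spread grows with $x$. A scatter plot of each error series against `x` would show the same pattern visually.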

---

## Autocorrelation
- **No autocorrelation:** errors of different observations are uncorrelated.  
- **Autocorrelation:** the error of one observation is correlated with that of another (common in time series).
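The contrast can be sketched by simulating independent errors and AR(1) errors and comparing their lag-1 sample correlations. The AR(1) form and the coefficient `rho = 0.8` are assumptions made for illustration, and `lag1_corr` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 5_000, 0.8      # rho is a hypothetical AR(1) coefficient

e = rng.normal(size=n)   # independent shocks

# AR(1) errors: u_t = rho * u_{t-1} + e_t, so neighbouring errors are correlated.
u_ar = np.empty(n)
u_ar[0] = e[0]
for t in range(1, n):
    u_ar[t] = rho * u_ar[t - 1] + e[t]

def lag1_corr(u):
    """Sample correlation between u_t and u_{t-1}."""
    return np.corrcoef(u[1:], u[:-1])[0, 1]

print(round(lag1_corr(e), 2))      # close to 0 for independent errors
print(round(lag1_corr(u_ar), 2))   # close to rho for AR(1) errors
```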

---

## Rank of Matrix \(X\)
- **Rank = number of linearly independent columns (or rows).**  
- If \(\text{rank}(X) = k\), \(X\) has full column rank → OLS estimates are unique.

### Example of Rank Deficiency
Suppose:
\[
\text{male}_i = 
\begin{cases} 1 & \text{male} \\ 0 & \text{female} \end{cases}, \quad
\text{female}_i =
\begin{cases} 1 & \text{female} \\ 0 & \text{male} \end{cases}
\]

\[
X =
\begin{bmatrix}
1 & 0 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0
\end{bmatrix}
\]

Here:  
- The last two columns sum to the intercept column (\(\text{male}_i + \text{female}_i = 1\)), so the three columns are linearly dependent and \(\text{rank}(X) = 2 < k = 3\) → **perfect multicollinearity**.  

**Econometric Implication:**  
- Cannot separately estimate coefficients of male and female dummies with an intercept (**dummy variable trap**).  
- **Fix:** Drop one dummy (e.g., keep only “male” dummy).
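The rank deficiency can be verified directly with `np.linalg.matrix_rank`. The sketch below uses the matrix from the example and assumes its columns are ordered intercept, male, female.

```python
import numpy as np

# Columns (assumed order): intercept, male dummy, female dummy.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

print(np.linalg.matrix_rank(X))          # 2, which is less than k = 3

# Dropping one dummy (here the last column) restores full column rank.
X_fixed = X[:, :2]
print(np.linalg.matrix_rank(X_fixed))    # 2, which equals k = 2
```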

---

## Key Takeaways
- \(E[u \mid X] = 0\) (which implies \(E[u] = 0\)) ensures **unbiasedness**.  
- \(E[uu'] = \sigma^2 I_n\) ensures **efficiency** (BLUE property).  
- Full rank \(X\) ensures **unique estimates**.  
- Heteroskedasticity and autocorrelation do not by themselves bias OLS, but they make it inefficient and the usual standard errors invalid, so they affect **inference**. Perfect multicollinearity makes the estimates non-unique.

