# Independence and homoskedasticity

In this section, we look at two assumptions together, independence and homoskedasticity, because they are closely tied.

In [section 3](3_Linear_regression_MLE.ipynb), we made an assumption on the error:

$$ \epsilon \sim \mathcal{N}(0, \sigma^2\mathit{I}) $$

This says that the errors are centered around zero and have a covariance matrix ($\Sigma$) of the form

$$
\Sigma =
    \begin{bmatrix}
    \sigma^2 & 0 & \cdots & 0 \\
    0 & \sigma^2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & \sigma^2 \\
    \end{bmatrix}
$$

This is what we call spherical errors: the errors of all data points have the same variance (homoskedasticity) and are uncorrelated with each other. These can be formally defined as

- Homoskedasticity: the errors have a constant variance $\sigma^2$

$$ \mathbb{V}[\epsilon_n | \text{X}] = \sigma^2, \quad n \in \{1,\dots,N\} $$

- Error terms are uncorrelated

$$ \mathbb{E}[\epsilon_n \epsilon_m | \text{X}] = 0, \quad n,m \in \{1,\dots,N\}, n \neq m $$

Taking these two sub-assumptions together, we can formalize spherical errors as

$$ \mathbb{V}[\mathbf{\epsilon} | \text{X}] = \sigma^2\mathit{I} $$
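
As a quick numerical illustration of spherical errors, the sketch below draws many error vectors from $\mathcal{N}(0, \sigma^2 I)$ and inspects their empirical covariance. The values of `N`, `sigma`, and the sample size are arbitrary choices for illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 4, 2.0               # illustrative values (assumptions)
Sigma = sigma**2 * np.eye(N)    # spherical covariance: sigma^2 * I

# Draw many independent error vectors, one per row.
eps = rng.multivariate_normal(mean=np.zeros(N), cov=Sigma, size=100_000)

# Empirical covariance: diagonal close to sigma^2, off-diagonal close to 0.
emp_cov = np.cov(eps, rowvar=False)
print(np.round(emp_cov, 2))
```

The diagonal entries should sit near $\sigma^2 = 4$ and the off-diagonal entries near zero, up to sampling noise.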

## Breaking the assumption

### Heteroscedasticity but with uncorrelated errors

In many practical scenarios, however, the assumption of homoscedasticity does not hold. So let us assume heteroscedasticity with uncorrelated errors:

$$
\Sigma =
    \begin{bmatrix}
    \sigma_1^2 & 0 & \cdots & 0 \\
    0 & \sigma_2^2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & \sigma_n^2 \\
    \end{bmatrix}
$$

Since the covariance terms are zero and the errors are jointly Gaussian, the data points are still independent. Therefore, we can write the likelihood of the data as a product:

$$ \mathcal{L}_x(\beta) = \prod^n_{i=1} \frac{1}{\sqrt{2\pi\sigma_i^2}}\exp{-\frac{(y_i - (\beta x_i))^2}{2\sigma_i^2}} $$

Converting this to the log-likelihood by taking the log of both sides:

\begin{align*}
\ell_x(\beta) &= \sum^n_{i=1} \log\Big(\frac{1}{\sqrt{2\pi\sigma_i^2}}\exp{-\frac{(y_i - (\beta x_i))^2}{2\sigma_i^2}}\Big) \\
              &= \sum^n_{i=1} \Big[\log\Big(\frac{1}{\sqrt{2\pi\sigma_i^2}}\Big) + \log\Big(\exp{-\frac{(y_i - (\beta x_i))^2}{2\sigma_i^2}}\Big)\Big] \\
              &= \sum^n_{i=1} \Big[\log{1} - \log{\sqrt{2\pi\sigma_i^2}} -\frac{(y_i - (\beta x_i))^2}{2\sigma_i^2}\Big] \\
              &= \sum^n_{i=1} \log{1} - \frac{1}{2}\sum^n_{i=1}\log{(2\pi\sigma_i^2)} - \sum^n_{i=1}\frac{(y_i - (\beta x_i))^2}{2\sigma_i^2}
\end{align*}

In the above, $\log 1 = 0$ is a constant, and even though the $\sigma_i^2$ differ across data points, the $\log(2\pi\sigma_i^2)$ terms do not depend on $\beta$ and are therefore constants as well. Since constants do not affect the optimization, the problem can be rewritten as:

$$
\underset{\beta}{\text{max}} \quad - \sum^n_{i=1}\frac{(y_i - (\beta x_i))^2}{2\sigma_i^2}
$$

or

$$
\underset{\beta}{\text{min}} \quad \sum^n_{i=1}\frac{(y_i - (\beta x_i))^2}{2\sigma_i^2}
$$

Let $s_i = \frac{1}{\sigma_i^2}$, which acts as a weight assigned to each data point in the least-squares objective. Hence, this is called **weighted least squares (WLS)**.

$$
\underset{\beta}{\text{min}} \quad \frac{1}{2}\sum^n_{i=1}s_i(y_i - (\beta x_i))^2
$$

or in matrix form, with $S = \mathrm{diag}(s_1, \dots, s_n)$ and the constant factor $\frac{1}{2}$ dropped,

$$
\underset{\beta}{\text{min}} \quad (y - \text{X}\beta)^{T}S(y - \text{X}\beta)
$$

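Since this objective is quadratic in $\beta$, its derivative is linear in $\beta$, so we can find an analytical estimate of $\beta$ by setting the derivative to zero. This gives the normal equations $\text{X}^{T}S\text{X}\beta = \text{X}^{T}Sy$, and hence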

$$
\hat\beta_{WLS} = (\text{X}^{T}S\text{X})^{-1}\text{X}^{T}Sy
$$
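
The closed form $(\text{X}^{T}S\text{X})^{-1}\text{X}^{T}Sy$ can be sketched directly in NumPy. This is a minimal illustration on synthetic data, assuming the per-point noise standard deviations `sigma` are known; the data-generating values (`beta_true`, the noise model) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])    # design matrix with an intercept column
beta_true = np.array([1.0, 2.0])        # illustrative ground truth (assumption)
sigma = 0.5 * x                         # noise std grows with x, assumed known
y = X @ beta_true + rng.normal(0.0, sigma)

s = 1.0 / sigma**2                      # weights s_i = 1 / sigma_i^2

# Solve (X^T S X) beta = X^T S y without forming the n x n matrix S.
XtS = X.T * s                           # same as X.T @ np.diag(s)
beta_wls = np.linalg.solve(XtS @ X, XtS @ y)

# Ordinary least squares for comparison (all weights equal).
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("WLS:", beta_wls)
print("OLS:", beta_ols)
```

Using `np.linalg.solve` on the normal equations avoids an explicit matrix inverse, and scaling the rows of `X.T` by `s` avoids materializing the diagonal matrix $S$.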

#### Scikit-learn Implementation

In scikit-learn, you can pass the weights $s_i$ through the `sample_weight` keyword of the `fit` method (for example, `LinearRegression.fit`).
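
A minimal sketch of this with `LinearRegression`, on synthetic data where the noise standard deviation `sigma` is assumed known (the data-generating values are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
sigma = 0.5 * x                              # noise std per point, assumed known
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)   # illustrative ground truth

# Weighted least squares: pass s_i = 1 / sigma_i^2 as sample_weight.
model = LinearRegression()
model.fit(x.reshape(-1, 1), y, sample_weight=1.0 / sigma**2)
print(model.intercept_, model.coef_)
```

With `fit_intercept=True` (the default), the fitted intercept and slope should match the closed-form WLS solution computed on a design matrix that includes a column of ones.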

### Heteroscedasticity but with correlated errors

This time let us assume heteroscedasticity with correlated errors:

$$ \epsilon \sim \mathcal{N}(0,\Sigma) $$

In this case, we cannot assume the data points to be independent, and the likelihood needs to use the joint probability density function of the multivariate normal distribution:

$$
f(y|X;\beta,\Sigma) = \frac{1}{(2\pi)^{n/2}{|\Sigma|}^{1/2}}\exp\Big(-\frac{1}{2}(y - \text{X}\beta)^{T}\Sigma^{-1}(y - \text{X}\beta)\Big)
$$

Converting to the log-likelihood by taking the log of both sides:

\begin{align*}
\ell(\beta) &= -\frac{n}{2}\log{(2\pi)} - \frac{1}{2}\log{|\Sigma|} + \log\Big(\exp\Big(-\frac{1}{2}(y - \text{X}\beta)^{T}\Sigma^{-1}(y - \text{X}\beta)\Big)\Big) \\
            &= -\frac{n}{2}\log{(2\pi)} - \frac{1}{2}\log{|\Sigma|} - \frac{1}{2}(y - \text{X}\beta)^{T}\Sigma^{-1}(y - \text{X}\beta)
\end{align*}

This needs to be maximized with respect to $\beta$, so the terms in $\log{(2\pi)}$ and $\log{|\Sigma|}$ are constants. Since constants and the positive factor $\frac{1}{2}$ do not affect the optimization, the problem can be rewritten as:

$$
\underset{\beta}{\text{max}} \quad -(y - \text{X}\beta)^{T}\Sigma^{-1}(y - \text{X}\beta)
$$

or

$$
\underset{\beta}{\text{min}} \quad (y - \text{X}\beta)^{T}\Sigma^{-1}(y - \text{X}\beta)
$$

Since this objective is quadratic in $\beta$, its derivative is linear in $\beta$, so we can find an analytical estimate of $\beta$ by setting the derivative to zero. Because this generalizes least squares to any positive-definite covariance matrix, it is called **generalized least squares (GLS)**:

$$
\hat\beta_{GLS} = (\text{X}^{T}\Sigma^{-1}\text{X})^{-1}\text{X}^{T}\Sigma^{-1}y
$$
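
The GLS estimator $(\text{X}^{T}\Sigma^{-1}\text{X})^{-1}\text{X}^{T}\Sigma^{-1}y$ can be sketched in NumPy as follows. The covariance used here, $\Sigma_{ij} = \rho^{|i-j|}$, is a hypothetical AR(1)-style structure chosen only for illustration and assumed known; the data are synthetic. The sketch also shows the equivalent "whitening" view: with $\Sigma = LL^{T}$, GLS is ordinary least squares on $L^{-1}\text{X}$ and $L^{-1}y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), np.linspace(0.0, 5.0, n)])
beta_true = np.array([1.0, 2.0])             # illustrative ground truth

# Hypothetical known covariance: Sigma_ij = rho^|i-j| (AR(1)-style correlation).
rho = 0.8
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

y = X @ beta_true + rng.multivariate_normal(np.zeros(n), Sigma)

# Direct form: solve linear systems with Sigma instead of inverting it.
Si_X = np.linalg.solve(Sigma, X)             # Sigma^{-1} X
Si_y = np.linalg.solve(Sigma, y)             # Sigma^{-1} y
beta_gls = np.linalg.solve(X.T @ Si_X, X.T @ Si_y)

# Whitening form: with Sigma = L L^T, run OLS on (L^{-1} X, L^{-1} y).
L = np.linalg.cholesky(Sigma)
Xw = np.linalg.solve(L, X)
yw = np.linalg.solve(L, y)
beta_whitened, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

print("GLS (direct):   ", beta_gls)
print("GLS (whitened): ", beta_whitened)
```

Both routes should agree up to floating-point error; the whitening form makes explicit that GLS reduces to OLS after transforming the errors to be spherical.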

Similar to OLS, it can be shown that **GLS** is also **BLUE** in the presence of heteroscedastic and correlated errors, provided $\Sigma$ is known. As **WLS** is a special case of **GLS** with uncorrelated errors where $s_i = \frac{1}{\sigma_i^2}$, **WLS** is also **BLUE** in the presence of heteroscedastic uncorrelated errors.

If the error covariance matrix is known, we can solve GLS directly and get the estimate of $\beta$. However, most of the time $\Sigma$ is unknown. In this scenario we can use **Feasible Generalized Least Squares (FGLS)**, an iterative method that estimates the covariance matrix from the residuals and then re-estimates $\beta$, alternating between the two.
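
One possible variant of FGLS is sketched below, under strong assumptions that are not part of the text above: $\Sigma$ is taken to be diagonal, and $\log\sigma_i^2$ is modeled as linear in the columns of $\text{X}$. The helper name `fgls_diagonal` and the variance model are hypothetical, and other choices of covariance structure lead to different estimators.

```python
import numpy as np

def fgls_diagonal(X, y, n_iter=5):
    """Hypothetical helper: iterated FGLS assuming Sigma is diagonal and
    log(sigma_i^2) is linear in the columns of X (an assumed variance model)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # start from OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        # Fit the variance model to the log squared residuals.
        gamma = np.linalg.lstsq(X, np.log(resid**2 + 1e-12), rcond=None)[0]
        s = 1.0 / np.exp(X @ gamma)                   # estimated weights 1 / sigma_i^2
        XtS = X.T * s
        beta = np.linalg.solve(XtS @ X, XtS @ y)      # WLS step with estimated weights
    return beta, s

# Synthetic data whose noise std grows exponentially with x (illustrative only).
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.3 * np.exp(0.2 * x))

beta_fgls, s_hat = fgls_diagonal(X, y)
print(beta_fgls)
```

Each pass alternates between estimating the variances from the current residuals and re-solving the weighted problem, which is the general pattern of FGLS; only the variance model is specific to this sketch.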

#### Scikit-learn Implementation

At the moment (June 2025), scikit-learn does not provide an implementation of GLS. You can use it via another popular statistics library, `statsmodels`, through `statsmodels.regression.linear_model.GLS`.
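
A minimal usage sketch with `statsmodels` follows, assuming the package is installed. The covariance matrix is a hypothetical, known AR(1)-style structure and the data are synthetic; in `GLS`, the `sigma` argument carries the error covariance.

```python
import numpy as np
from statsmodels.regression.linear_model import GLS

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), np.linspace(0.0, 5.0, n)])  # intercept + one feature

# Hypothetical known covariance: Sigma_ij = rho^|i-j|.
rho = 0.8
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), Sigma)

# endog is y, exog is X; sigma is the error covariance matrix.
results = GLS(y, X, sigma=Sigma).fit()
print(results.params)   # estimated [intercept, slope]
```

The design matrix needs an explicit column of ones, since `GLS` does not add an intercept on its own.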