# Weighted Least Squares (WLS) Regression

Two standard assumptions in the linear regression model $Y = \beta_0 + \beta_1 x + Z$ are $\mathbb{E}\left[Z \vert X\right] = 0$ and $\mathbb{V}\left[Z \vert X\right] = \sigma^2$. When the latter is violated, that is, when the variance is non-constant across $x^{\left(i\right)}$, the noise is heteroscedastic and OLS is no longer the best linear unbiased estimator.

In the final part of this project, we will study the impact of heteroscedastic errors and one way to correct for them using weighted least squares. For this exercise, we use the synthetic dataset in **wls_data.csv** in the data directory. The data file contains two observed variables: 'x' and 'y'.

We will still assume the $Z^{\left(i\right)}$ are independent, follow a Normal distribution, and satisfy $\mathbb{E}\left[Z \vert X\right] = 0$. However, we will not assume the errors are homoscedastic.
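To make these assumptions concrete, here is a minimal simulation sketch. The coefficients, the range of `x_sim`, and the noise pattern `sigma_sim` are made up for illustration and are not taken from wls_data.csv.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth, chosen only for illustration.
beta0, beta1 = 1.0, 2.0
n = 2000
x_sim = rng.uniform(0.0, 5.0, size=n)

# Assumed noise pattern: the standard deviation grows with x,
# so Var[Z | X] is not constant.
sigma_sim = 0.5 + 0.5 * x_sim
z_sim = rng.normal(0.0, sigma_sim)   # independent, mean-zero, Normal noise
y_sim = beta0 + beta1 * x_sim + z_sim

# Compare the noise spread in the lower and upper halves of the x range.
low = x_sim < 2.5
print("noise std for x <  2.5:", z_sim[low].std())
print("noise std for x >= 2.5:", z_sim[~low].std())
```

The noise is still centered at zero everywhere; only its spread changes with x.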

In [None]:
# Import the necessary libraries
%matplotlib inline

from matplotlib import pyplot as plt
import numpy as np
import statsmodels.api as sm
from scipy import stats
import pandas as pd

plt.rc('font', size = 14)

In [None]:
# read the data
filen = 'data/wls_data.csv'
data = pd.read_csv(filen)
x = data['x'].values
y = data['y'].values
N = len(x)

# 1. OLS Regression
We will first build an OLS regression model. Note that we expect the estimated coefficients, $\hat{\beta_0}, \hat{\beta_1}$, to be unbiased estimators of $\beta_0, \beta_1$, due to the assumed properties of the errors. However, note that we have <b>not</b> assumed homoscedasticity (constant variance) of the noise terms.
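The unbiasedness claim can be checked with a small Monte Carlo sketch. Everything here is an assumption made for illustration: the true coefficients `beta_true`, the noise pattern `sigma_mc`, and the use of `np.linalg.lstsq` in place of statsmodels.

```python
import numpy as np

rng = np.random.default_rng(1)

beta_true = np.array([1.0, 2.0])                     # hypothetical (beta_0, beta_1)
x_mc = np.linspace(0.0, 5.0, 200)
X_mc = np.column_stack([np.ones_like(x_mc), x_mc])   # intercept column plus x
sigma_mc = 0.5 + 0.5 * x_mc                          # assumed heteroscedastic noise level

n_rep = 2000
estimates = np.empty((n_rep, 2))
for r in range(n_rep):
    # Draw a fresh heteroscedastic sample and fit OLS by least squares.
    y_mc = X_mc @ beta_true + rng.normal(0.0, sigma_mc)
    coef, *_ = np.linalg.lstsq(X_mc, y_mc, rcond=None)
    estimates[r] = coef

print("average OLS estimate:", estimates.mean(axis=0))
print("true coefficients:   ", beta_true)
```

Averaged over many simulated datasets, the OLS estimates stay close to the true coefficients even though the noise variance is not constant. What heteroscedasticity affects is the efficiency of the estimator and the validity of the usual standard errors.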

**(1a).** [1 pt] Build an OLS regression using the <b>OLS</b> class from statsmodels (available as <b>sm.OLS</b> with the import above). You do not have to fit the model in this step.

'y' is the dependent variable, and 'x' is the independent variable. You should add a constant to 'x' using the <b>add_constant()</b> function in statsmodels. Save the model as <b>ols_model</b>

**(1b).** [1 pt] Fit the model using <b>ols_model.fit()</b> and save the result as <b>ols_results</b>.

**(1c).** [1 pt] Print the summary by calling <b>ols_results.summary()</b>.

**(1d).** [1 pt] Compute the predictions, and store them as <b>ols_y_hat</b>. Use the <b>predict()</b> function with the parameters from <b>ols_results</b>. <br>
Check: http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.predict.html

**(1e).** [1 pt] Compute the residuals. Store them as <b>ols_residuals</b>.

**(1f).** [2 pts] On the same plot, produce the following:
1. Scatter plot of the data (x, y)
2. Plot (or scatter plot) of (x, ols_y_hat)

Label your plot and axes.

**(1g).** [1 pt] Comment on the fit produced.

**A:** (Type your answer here.)

**(1h).** [1 pt] Produce a scatter plot of the residuals vs 'x'. Label your plot and axes.

**(1i).** [1 pt] What does this plot show? Is there evidence of constant variance (or otherwise)?

**A:** (Type your answer here.)

**(1j).** [1 pt] Now produce a QQ-plot of the residuals vs the Normal distribution.

**(1k).** [1 pt] Do the residuals appear Normally distributed?

**A:** (Type your answer here.)

# 2. Weighted Least Squares (WLS)

Given the apparent heteroscedasticity, we will use Weighted Least Squares to produce a better linear estimator. The weights are the reciprocals of the variances at each data point. To understand why, we can look at the log likelihood (up to an additive constant) from the inference perspective with Gaussian noise:

\begin{equation*}
    l\left(\beta_0, \beta_1, \sigma\right) = - \sum_{i=1}^n \log \sigma^{\left(i\right)} - \frac{1}{2} \sum_{i=1}^n \left(\frac{y^{\left(i\right)} - \left(\beta_0 + \beta_1 x^{\left(i\right)}\right)}{\sigma^{\left(i\right)}}\right)^2
\end{equation*}

We can compare this equation with the objective for WLS to see what the weights are for each sample:

\begin{equation*}
    \hat{\beta}_0, \hat{\beta}_1 = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^n w^{\left(i\right)} \left(y^{\left(i\right)} - \beta_0 - \beta_1 x^{\left(i\right)}\right)^2
\end{equation*}

We see the weights are $\frac{1}{\sigma^{\left(i\right)2}}$, up to a constant factor that does not affect the minimizer. Note that when the $\sigma^{\left(i\right)}$ are the same for all data points, as in OLS, the samples have uniform weight. A challenge when implementing WLS is that we are not given the variance of each data point. Therefore, we first have to estimate the variances, and from them the weights, in order to correct for heteroscedasticity.
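As a sketch of the comparison, treat the $\sigma^{\left(i\right)}$ as fixed. The first sum in the log likelihood then does not involve $\beta_0, \beta_1$, so maximizing $l$ over the coefficients amounts to minimizing the second sum:

\begin{equation*}
    \arg\max_{\beta_0, \beta_1} l\left(\beta_0, \beta_1, \sigma\right) = \arg\min_{\beta_0, \beta_1} \frac{1}{2} \sum_{i=1}^n \frac{1}{\sigma^{\left(i\right)2}} \left(y^{\left(i\right)} - \beta_0 - \beta_1 x^{\left(i\right)}\right)^2
\end{equation*}

Matching this term by term with the WLS objective gives $w^{\left(i\right)} = \frac{1}{\sigma^{\left(i\right)2}}$; the factor $\frac{1}{2}$ does not change the minimizer.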

**(2a).** [2 pts] Write a function <b>estW()</b> to estimate the variance at each data point. This function takes the following inputs:
1. An array of OLS residuals
2. An array of 'x' values (corresponding to each residual)

The squared residuals are too noisy to be used directly as variance estimates. Thus, we estimate the variance by predicting the squared residuals from $x^2$ with an intercept. Your function should return an array of estimates of the variance for each data point.
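The idea can be sketched on synthetic data as follows. This is only an illustration under stated assumptions: `fit_variance_model` is a hypothetical helper name, the variance pattern `true_var` is made up, the "residuals" are simulated noise rather than actual OLS residuals, and the regression uses `np.linalg.lstsq` rather than statsmodels.

```python
import numpy as np

def fit_variance_model(residuals, x):
    """Hypothetical helper: regress residual^2 on [1, x^2], return fitted variances."""
    design = np.column_stack([np.ones_like(x), x ** 2])
    coef, *_ = np.linalg.lstsq(design, residuals ** 2, rcond=None)
    return design @ coef

rng = np.random.default_rng(2)
x_demo = rng.uniform(0.0, 5.0, size=5000)
true_var = 1.0 + 0.5 * x_demo ** 2                 # assumed variance pattern
resid_demo = rng.normal(0.0, np.sqrt(true_var))    # stand-in for OLS residuals

var_hat = fit_variance_model(resid_demo, x_demo)
print("mean absolute error of the variance estimates:",
      np.mean(np.abs(var_hat - true_var)))
```

Because this is an ordinary linear fit, nothing forces the fitted variances to be positive; it is worth checking the output before taking reciprocals or square roots.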

In [None]:
def estW(residuals, 
         x): 
    '''
    Estimate the variance for each data point by regressing the OLS residual^2 against x^2 and a constant.
    Use the regression parameters to produce estimated variances for each data point.
    @param residuals: np array, OLS residuals
    @param x: np array, samples
    @return: np array, estimated variances for each data point
    '''
    pass

**(2b).** [1 pt] Use the <b>estW(ols_residuals, x)</b> function to produce the estimates of the variances. Save these estimates of $\hat{\sigma}^{\left(i\right)2}$ as <b>sig_est</b>

**(2c).** [2 pts] On the same plot, produce the following in different colors:
1. Scatter plot of the absolute values of ols_residuals against x.
2. Plot (or scatterplot) of the square root of the variance estimates, i.e. each $\hat{\sigma}^{\left(i\right)}$ against $x^{\left(i\right)}$.

Label the plots and axes clearly.

**(2d).** [1 pt] What does the plot above show? Do you think the estimated standard deviations make sense?

**A:** (Type your answer here.)

**(2e).** [1 pt] The solution to WLS in the multi-variate linear regression case is

\begin{equation*}
    \hat{\beta} = \left(X^T V X\right)^{-1} X^T V Y
\end{equation*}

where $V$ is a diagonal matrix with diagonal entries $w^{\left(i\right)}$. Let $W$ be the diagonal matrix with diagonal entries $\sqrt{w^{\left(i\right)}}$, so that $V = W^T W$. Substituting this into the solution above, observe that

\begin{equation*}
    \hat{\beta} = \left(\left(W X\right)^T \left(W X\right)\right)^{-1} \left(W X\right)^T \left(W Y\right)
\end{equation*}

This suggests we can find the WLS solution by solving OLS for $W X$ and $W Y$. We will first compute the diagonal entries of $W$ from the estimates of the variances. These entries are the <b>reciprocals of the square root of the variance estimates</b>. Store the diagonal entries of $W$ as <b>w</b>.
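The equivalence between the two expressions for $\hat{\beta}$ can be checked numerically. The sketch below uses made-up data and assumes the noise standard deviations `sigma_chk` are known, which is not the case in this exercise.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x_chk = rng.uniform(0.0, 5.0, size=n)
X_chk = np.column_stack([np.ones(n), x_chk])      # constant column plus x
sigma_chk = 0.5 + 0.5 * x_chk                     # assumed known noise std
y_chk = 1.0 + 2.0 * x_chk + rng.normal(0.0, sigma_chk)

V_chk = np.diag(1.0 / sigma_chk ** 2)             # weights w = 1 / sigma^2
W_chk = np.diag(1.0 / sigma_chk)                  # sqrt(w) = 1 / sigma

# Closed form: (X^T V X)^{-1} X^T V Y
beta_closed = np.linalg.solve(X_chk.T @ V_chk @ X_chk, X_chk.T @ V_chk @ y_chk)

# OLS on the transformed problem (W X, W Y)
beta_transformed, *_ = np.linalg.lstsq(W_chk @ X_chk, W_chk @ y_chk, rcond=None)

print("closed form:    ", beta_closed)
print("transformed OLS:", beta_transformed)
```

Note that the constant column is scaled along with x, so the transformed design matrix no longer contains a column of ones.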

**(2f).** [1 pt] Now produce the following variables to set up the OLS problem:

1. <b>X</b> = independent variable (x) and a constant (use add_constant(x) from the statsmodels library). <br>
2. <b>W</b> = a matrix with the <b>w</b> array on the diagonal. <br>
3. <b>X_w</b> = the product of W and X <br>
4. <b>y_w</b> = the product of W and Y <br>

**(2g).** [1 pt] Build an OLS regression using the <b>OLS</b> class from statsmodels (<b>sm.OLS</b>) with <b>y_w</b> and <b>X_w</b>. Save your model as <b>wls_model</b>. You do not need to fit the model in this step.

**(2h).** [1 pt] Similar to the OLS exercise above, fit the model. Then compute the predictions and residuals. Save them as <b>wls_results, wls_y_hat, wls_residuals</b>, respectively.

**(2i).** [1 pt] Print the summary of this regression.

**(2j).** [1 pt] Show the scatter plot of these <b>wls_residuals</b> vs x. Is there still evidence of heteroscedasticity?

**A:** (Type your answer here.)

**(2k).** [1 pt] Produce the QQ-plot of the <b>wls_residuals</b> vs the Normal distribution. Compared to the OLS residuals, is the Normal fit better? If so, how?

**A:** (Type your answer here.)

**(2l).** [1 pt] Since the WLS model fitted on the transformed data predicts $\widehat{W Y}$, transform the predictions back to $\hat{Y}$. Save the transformed predictions as <b>wls_y_hat_transformed</b>.

Hint: wls_y_hat_transformed = $W^{-1} \times$ wls_y_hat
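Because $W$ is diagonal, multiplying by $W^{-1}$ is the same as dividing elementwise by its diagonal entries. A tiny sketch with made-up numbers (all `_demo` values are hypothetical):

```python
import numpy as np

w_demo = np.array([2.0, 1.0, 0.5])                # hypothetical diagonal entries of W
X_demo = np.column_stack([np.ones(3), np.array([0.0, 1.0, 2.0])])
beta_demo = np.array([1.0, 3.0])                  # hypothetical fitted coefficients

W_demo = np.diag(w_demo)
y_hat_w = (W_demo @ X_demo) @ beta_demo           # predictions on the weighted scale
y_hat_back = y_hat_w / w_demo                     # elementwise division = W^{-1} @ y_hat_w

print("back-transformed:", y_hat_back)
print("X @ beta:        ", X_demo @ beta_demo)
```

The back-transformed predictions coincide with $X \hat{\beta}$ on the original scale.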

**(2m).** [2 pts] On the same plot, produce in different colors:
1. Scatter plot of the observed data (x, y)
2. Plot (or scatterplot) (x, wls_y_hat_transformed)

Label your plots and axes clearly.

**(2n).** [2 pts] Using the OLS and WLS regression summaries and the plots, contrast the regression fit for OLS and WLS:
1. Are the coefficient estimates much different? Is that what you had expected?
2. Are the standard errors different? Is that what you had expected?

**A**: (Type your answer here.)

<b>Note:</b> A more careful approach would do an 80/20 split on the data, where the 20% is used for inferring the pattern of the noise and the 80% is used to construct Weighted Least Squares. This would make these two inferences independent and would better resemble the case where the noise variances are deterministic known numbers (which are, of course, independent of the WLS regression procedure). You are welcome to try it out on your own and see what you get!
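One possible shape for such a split is sketched below on synthetic stand-in data. The data-generating process, the random split, the clipping floor, and the use of `np.linalg.lstsq` instead of statsmodels are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x_all = rng.uniform(0.0, 5.0, size=n)
noise_std = np.sqrt(1.0 + 0.5 * x_all ** 2)       # assumed noise pattern
y_all = 1.0 + 2.0 * x_all + rng.normal(0.0, noise_std)

# Random split: 20% for the noise model, 80% for the WLS fit.
perm = rng.permutation(n)
noise_idx, fit_idx = perm[: n // 5], perm[n // 5:]

def design(x):
    """Design matrix with a constant column and x."""
    return np.column_stack([np.ones_like(x), x])

# Step 1 (20%): OLS, then regress the squared residuals on [1, x^2].
x_n, y_n = x_all[noise_idx], y_all[noise_idx]
b_ols, *_ = np.linalg.lstsq(design(x_n), y_n, rcond=None)
res_sq = (y_n - design(x_n) @ b_ols) ** 2
gamma, *_ = np.linalg.lstsq(design(x_n ** 2), res_sq, rcond=None)

# Step 2 (80%): predict variances with the step-1 coefficients, then fit WLS
# as OLS on the rescaled data.
x_f, y_f = x_all[fit_idx], y_all[fit_idx]
var_fit = np.clip(gamma[0] + gamma[1] * x_f ** 2, 1e-6, None)  # added safeguard
w_fit = 1.0 / np.sqrt(var_fit)
b_wls, *_ = np.linalg.lstsq(design(x_f) * w_fit[:, None], y_f * w_fit, rcond=None)

print("WLS coefficients from the 80% split:", b_wls)
```

The variance model is fitted only on the 20% subset, so the weights used in the 80% fit do not depend on the responses being fitted there.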