## Notation

### $Y$: the response variable

### $y_i$: the observed value of the response variable for the $i$-th data point. In the context of regression, it's the actual measurement you have for a given predictor value $x_i$.

<br>

### $x_i$: the value of the predictor variable for the $i$-th data point. Predictors are the independent variables or features used to predict the response.

<br>

### $i$: an index that iterates through the individual data points or observations, typically from $1$ to $n$, where $n$ is the total number of observations.

<br>

### $f(x_i;\boldsymbol{\beta})$: model *function* that describes the expected value of the response variable at $x_i$, parameterized by the vector $\boldsymbol{\beta}$.

For a linear model, this would be $\beta_0 + \beta_1 x_i$. It represents the systematic part of the relationship between $x$ and $y$.

<br>

### $\boldsymbol{\beta}$: a *vector* of the model parameters that we want to estimate from the data.

For a simple linear regression, $\boldsymbol{\beta} = (\beta_0, \beta_1)^\top$, where $\beta_0$ is the intercept and $\beta_1$ is the slope.

### $\hat{\boldsymbol{\beta}}$: the *estimators* of $\boldsymbol{\beta}$

<br>

### $\mathcal{L}(\boldsymbol{\beta})$: likelihood function.

It quantifies how probable the observed data $(\mathbf{y})$ are, given a specific set of parameter values ($\boldsymbol{\beta}$) and the assumed probability distribution of the noise (in this case, Gaussian). Maximizing the likelihood function with respect to $\boldsymbol{\beta}$ gives the maximum likelihood estimates of the parameters.

<br>

### $\sigma_i$: standard deviation of the noise or error for the $i$-th data point.

* In the case of ordinary least squares (OLS), $\sigma_i$ is assumed to be constant for all $i$ ($\sigma_i = \sigma$), representing *homoscedasticity*.

* In weighted least squares (WLS), $\sigma_i$ can vary for each data point, representing *heteroscedasticity*, and is assumed to be known or estimated independently. The weight $w_i$ is defined as $1/\sigma_i^2$.

## The model (assumptions)

Consider a series of independent observations $(x_i, y_i)$ for $i=1,\dots,n$ for which the response variable is denoted by $Y$.
* Model (linear in parameters):
  
  $$
f(x_i;\boldsymbol{\beta}) \;=\; \sum_{j=0}^{p-1} \beta_j\,\phi_j(x_i),
$$
  
  where $\{\phi_j\}$ are chosen basis functions (e.g., $\phi_0(x)=1$, $\phi_1(x)=x$ for a line), and $\boldsymbol{\beta}=(\beta_0, \dots, \beta_{p-1})^\top$ is the parameter vector. For the straight-line case ($p=2$), the model is $Y = \beta_0 + \beta_1 x + \epsilon$, where $\epsilon$ is the random error, so an individual observation is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.
  
* Noise model (possibly heteroscedastic):
  
  $$
Y_i \mid x_i \;\sim\; \mathcal{N}\!\big(f(x_i;\boldsymbol{\beta}),\,\sigma_i^2\big).
$$
* Likelihood $\mathcal{L}$, log-likelihood $\mathcal{W}=\log \mathcal{L}$, least-squares sum $\mathcal{M}$.
* Estimators are indicated with a caret (e.g., $\hat{\boldsymbol{\beta}}$). Averages use bars (e.g., $\bar{x}$). The residual for each observation pair $(x_i, y_i)$ is $e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$.
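
As a minimal sketch of the linear-in-parameters model above, the basis functions can be stacked into a matrix whose columns are $\phi_j(x_i)$, so that $f(x_i;\boldsymbol{\beta})$ becomes a matrix-vector product. The helper name `design_matrix` and the numerical values are illustrative assumptions, not part of the notes.

```python
import numpy as np

def design_matrix(x, basis):
    """Stack phi_j(x_i) into an n-by-p matrix, one column per basis function."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([phi(x) for phi in basis])

# Straight-line basis: phi_0(x) = 1, phi_1(x) = x.
line_basis = [lambda x: np.ones_like(x), lambda x: x]

x = np.arange(1, 6)            # illustrative predictor values 1..5
beta = np.array([0.5, 2.0])    # illustrative (beta_0, beta_1)

Phi = design_matrix(x, line_basis)
f = Phi @ beta                 # f(x_i; beta) = beta_0 + beta_1 * x_i
print(f)
```

Swapping in a different list of basis functions (for example adding `lambda x: x**2`) changes the model without changing the rest of the code, which is the sense in which the model is linear in the parameters.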



## Weighted Least Squares derived from Gaussian likelihood

The likelihood function for independent normally distributed measurements is given by:

<br>

$$
\mathcal{L}(\boldsymbol{\beta}) \;=\; \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma_i}
\exp\!\left[-\frac{\big(y_i - f(x_i;\boldsymbol{\beta})\big)^2}{2\sigma_i^2}\right].
$$

<br>

Note that $\boldsymbol{\beta}$ is a vector with the parameters we want to estimate. The log likelihood is then:

<br>

$$
\mathcal{W}(\boldsymbol{\beta}) \;=\; \sum_{i=1}^n \left[-\tfrac{1}{2}\log(2\pi)-\log\sigma_i
-\frac{\big(y_i - f(x_i;\boldsymbol{\beta})\big)^2}{2\sigma_i^2}\right].
$$

<br>

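Only the last term inside the sum depends on $\boldsymbol{\beta}$. Dropping the constant terms and the overall factor $\tfrac{1}{2}$, maximizing $\mathcal{W}$ is equivalent to minimizing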

<br>

$$
\mathcal{M}(\boldsymbol{\beta})
\;=\; \sum_{i=1}^n w_i\,\big(y_i - f(x_i;\boldsymbol{\beta})\big)^2,
\qquad
w_i \;=\; \frac{1}{\sigma_i^{2}}.
$$

<br>

This is **weighted least squares (WLS)**. If all $\sigma_i=\sigma$, every weight equals $1/\sigma^2$; this common factor scales $\mathcal{M}$ without changing its minimizer, so we recover **ordinary least squares (OLS)**.
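
The equivalence between maximizing $\mathcal{W}$ and minimizing $\mathcal{M}$ can be checked numerically. The sketch below, with made-up data and hypothetical function names, evaluates both quantities for a straight-line model at two different parameter vectors and confirms that $\mathcal{W} + \tfrac{1}{2}\mathcal{M}$ does not depend on $\boldsymbol{\beta}$.

```python
import numpy as np

def log_likelihood(beta, x, y, sigma):
    """Gaussian log-likelihood W(beta) for a straight-line model."""
    resid = y - (beta[0] + beta[1] * x)
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                  - resid**2 / (2 * sigma**2))

def weighted_sse(beta, x, y, sigma):
    """Weighted least-squares sum M(beta) with w_i = 1 / sigma_i^2."""
    resid = y - (beta[0] + beta[1] * x)
    return np.sum(resid**2 / sigma**2)

# Made-up data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.3, 4.9, 6.2, 9.1])
sigma = np.array([0.5, 1.0, 1.5, 2.0])

beta_a = np.array([0.0, 2.0])
beta_b = np.array([1.0, 1.5])

# W(beta) + M(beta)/2 should be the same constant for every beta.
const_a = log_likelihood(beta_a, x, y, sigma) + 0.5 * weighted_sse(beta_a, x, y, sigma)
const_b = log_likelihood(beta_b, x, y, sigma) + 0.5 * weighted_sse(beta_b, x, y, sigma)
print(np.isclose(const_a, const_b))
```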



## Alternative notation to match textbook: Least Squares with $\bar y_i$ (pointwise mean) notation

### Setup and notation

* Observations: $(x_i, y_i)$ for $i=1,\dots,n$, independent. The response variable is denoted by $Y$.
* True/noiseless value at $x_i$:
$$
\bar y_i \;=\; f(x_i;\boldsymbol{\beta}) \;=\; \sum_{j=0}^{p-1} \beta_j\,\phi_j(x_i).
$$
* Noise model (possibly heteroscedastic):
$$
Y_i \mid x_i \;\sim\; \mathcal{N}\!\big(\bar y_i,\,\sigma_i^2\big).
$$
* Likelihood $\mathcal L$, log-likelihood $\mathcal W=\log\mathcal L$, least-squares sum $\mathcal M$.
* Estimators carry a caret (e.g., $\hat{\boldsymbol{\beta}}$). **Warning on bars:** $\bar y_i$ is the *mean at point $i$*; $\bar y$ is the *sample average* of the $y_i$’s. The residual for each observation pair $(x_i, y_i)$ is $e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$.

### From Gaussian likelihood to (weighted) least squares

Independence gives
$$
\mathcal L(\boldsymbol{\beta})
=\prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma_i}
\exp\!\left[-\frac{(y_i-\bar y_i)^2}{2\sigma_i^2}\right].
$$

Taking the logarithm:
$$
\mathcal W(\boldsymbol{\beta})
=\sum_{i=1}^n\left[-\tfrac{1}{2}\log(2\pi)-\log\sigma_i
-\frac{(y_i-\bar y_i)^2}{2\sigma_i^2}\right].
$$

Maximizing $\mathcal W$ is equivalent to minimizing
$$
\boxed{
\mathcal M(\boldsymbol{\beta})
=\sum_{i=1}^n w_i\,(y_i-\bar y_i)^2,
\qquad w_i=\sigma_i^{-2}
}.
$$

# The Assignment

## Assignment as written

Create an Excel (or Google) spreadsheet to fit a set of $n=8-12$ data points to a line. For $x_i$, use the index 1 through $n$. Generate your model $y_i$ data with a slope $\beta_1$ between 1 and 3 (you choose) and an intercept $\beta_0$ between -1 and 1 (you choose).

Create two worksheets. The first worksheet should estimate $\beta_0$ and $\beta_1$ assuming a constant variance of 1, i.e. ordinary least squares. The second worksheet should carry out a weighted least squares fit assuming that the variance is equal to $y_i+\epsilon$. Compare the values for the slope and intercept in each case, and comment on your observations.

## Gemini interpretation of the assignment

The goal of this assignment is to explore Ordinary Least Squares (OLS) and Weighted Least Squares (WLS) regression in the presence of heteroscedastic noise. We will generate synthetic data with a known underlying linear relationship and non-constant noise variance. Then, we will perform both OLS and WLS on this data, using the known variances for the weights in WLS. Finally, we will compare the results of the two methods, paying close attention to the estimated parameters and their variances.
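
Although the assignment asks for a spreadsheet, the data-generation step can be sketched in Python for cross-checking. This is one possible reading, not the assignment's definitive setup: it assumes the noise variance at each point equals the noiseless model value, and the seed, $n$, and true parameters are arbitrary choices within the stated ranges.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the sketch is reproducible

n = 10                               # the assignment allows 8 to 12 points
beta0_true, beta1_true = 0.5, 2.0    # illustrative choices within the stated ranges

x = np.arange(1, n + 1, dtype=float)         # x_i is the index 1..n
y_model = beta0_true + beta1_true * x        # noiseless line

# Assumption: the noise variance at point i equals the noiseless model value,
# which is positive for the parameters chosen here.
sigma = np.sqrt(y_model)
y = y_model + rng.normal(0.0, sigma)         # one noisy observation per x_i
w = 1.0 / sigma**2                           # WLS weights w_i = 1 / sigma_i^2
```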

# OLS for a straight line with constant variance

Take $\phi_0(x)=1$, $\phi_1(x)=x$; parameters $\beta_0$ (intercept), $\beta_1$ (slope). With constant $\sigma$, all weights $w_i = 1/\sigma^2$ are equal, so the common factor does not affect the minimizer and can be dropped. Define

$$
\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i,\qquad
\bar{y}=\frac{1}{n}\sum_{i=1}^n y_i,
$$

$$
S_{xx}=\sum_{i=1}^n (x_i-\bar{x})^2,\qquad
S_{xy}=\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y}).
$$

### OLS estimators

$$
\boxed{\,\hat{\beta}_1=\frac{S_{xy}}{S_{xx}},\qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\,\bar{x}\,}.
$$

### OLS Estimators with explicit sums:

$$
\boxed{\,\hat{\beta}_1=\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2},\qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\,\bar{x}\,}.
$$
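
A minimal sketch of the OLS estimators above, using made-up data and a hypothetical helper `ols_line`. The result is compared with `numpy.polyfit`, which returns the coefficients with the highest degree first (slope, then intercept).

```python
import numpy as np

def ols_line(x, y):
    """Return (beta0_hat, beta1_hat) from the centered-sum formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    sxy = np.sum((x - xbar) * (y - ybar))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data, for illustration only.
x = np.arange(1, 9)
y = np.array([2.1, 4.4, 6.2, 8.9, 10.3, 12.8, 14.1, 16.7])

b0, b1 = ols_line(x, y)
slope_ref, intercept_ref = np.polyfit(x, y, 1)   # independent cross-check
print(b0, b1)
```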

<br>

### Variance Estimates

- The fitted pointwise means are $\hat{y}_i=\bar y_i(\hat{\boldsymbol{\beta}})=\hat{\beta}_0+\hat{\beta}_1 x_i$.

- Residuals are $e_i=y_i-\hat{y}_i =y_i-(\hat{\beta}_0+\hat{\beta}_1 x_i)$.

- $\mathrm{SSE}=\sum e_i^2$.



$$
\widehat{\sigma}^2_{\mathrm{MLE}}=\frac{\mathrm{SSE}}{n},
\qquad
\widehat{\sigma}^2=\frac{\mathrm{SSE}}{n-2}\quad(\text{unbiased; here }p=2).
$$
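
The two variance estimates differ only in the divisor. A short sketch with made-up data, fitting the line with the closed-form OLS expressions and then computing the residuals, $\mathrm{SSE}$, and both estimates:

```python
import numpy as np

# Made-up data, for illustration only.
x = np.arange(1, 9, dtype=float)
y = np.array([2.1, 4.4, 6.2, 8.9, 10.3, 12.8, 14.1, 16.7])
n = len(x)

# Closed-form OLS fit for a straight line.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)            # e_i = y_i - y_hat_i
sse = np.sum(resid ** 2)

sigma2_mle = sse / n                 # maximum-likelihood estimate (biased)
sigma2_unbiased = sse / (n - 2)      # divides by n - p with p = 2
print(sigma2_mle, sigma2_unbiased)
```

With only 8 to 12 points the difference between dividing by $n$ and by $n-2$ is noticeable, so it is worth stating which estimate a spreadsheet reports.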



## OLS Estimator Expressions without the averages:

$$
\begin{aligned}
\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})
&= \sum_{i=1}^n (x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x}\bar{y}) \\
&= \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + \sum \bar{x}\bar{y} \\
&= \sum x_i y_i - \frac{\sum y_i}{n}\sum x_i - \frac{\sum x_i}{n}\sum y_i + n \frac{\sum x_i}{n}\frac{\sum y_i}{n} \\
&= \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} - \frac{(\sum x_i)(\sum y_i)}{n} + \frac{(\sum x_i)(\sum y_i)}{n} \\
&= \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}
\end{aligned}
$$

$$
\begin{aligned}
\sum_{i=1}^n (x_i-\bar{x})^2
&= \sum_{i=1}^n (x_i^2 - 2x_i\bar{x} + \bar{x}^2) \\
&= \sum x_i^2 - 2\bar{x}\sum x_i + \sum \bar{x}^2 \\
&= \sum x_i^2 - 2\frac{\sum x_i}{n}\sum x_i + n\left(\frac{\sum x_i}{n}\right)^2 \\
&= \sum x_i^2 - \frac{2(\sum x_i)^2}{n} + \frac{(\sum x_i)^2}{n} \\
&= \sum x_i^2 - \frac{(\sum x_i)^2}{n}
\end{aligned}
$$

So,

$$
\boxed{\,\hat{\beta}_1=\frac{\sum_{i=1}^n x_i y_i - \frac{1}{n}(\sum x_i)(\sum y_i)}{\sum_{i=1}^n x_i^2 - \frac{1}{n}(\sum x_i)^2},\qquad \hat{\beta}_0=\bar{y}-\hat{\beta}_1\,\bar{x}\,}.
$$
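
The raw-sum form maps directly onto spreadsheet column totals ($\sum x_i$, $\sum y_i$, $\sum x_i y_i$, $\sum x_i^2$). A sketch with made-up data and a hypothetical helper `ols_line_sums`, cross-checked against `numpy.polyfit`:

```python
import numpy as np

def ols_line_sums(x, y):
    """Return (beta0_hat, beta1_hat) using only raw column sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sum_x, sum_y = x.sum(), y.sum()
    sum_xy = np.sum(x * y)
    sum_xx = np.sum(x * x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n      # ybar - b1 * xbar
    return b0, b1

# Made-up data, for illustration only.
x = np.arange(1, 9)
y = np.array([2.1, 4.4, 6.2, 8.9, 10.3, 12.8, 14.1, 16.7])

b0_sums, b1_sums = ols_line_sums(x, y)
slope_ref, intercept_ref = np.polyfit(x, y, 1)   # independent cross-check
print(b0_sums, b1_sums)
```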


# Weighted straight line (known $\sigma_i$)

## Expressions with average values

With $w_i=1/\sigma_i^2$,

$$
\bar{x}_w=\frac{\sum_i w_i x_i}{\sum_i w_i},\qquad
\bar{y}_w=\frac{\sum_i w_i y_i}{\sum_i w_i},
$$

$$
S_{xx,w}=\sum_i w_i (x_i-\bar{x}_w)^2,\qquad
S_{xy,w}=\sum_i w_i (x_i-\bar{x}_w)(y_i-\bar{y}_w),
$$

$$
\boxed{\,\hat{\beta}_1=\frac{S_{xy,w}}{S_{xx,w}},\qquad \hat{\beta}_0=\bar{y}_w-\hat{\beta}_1\,\bar{x}_w\,}.
$$
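
A minimal sketch of the weighted estimators above, with made-up data, an illustrative choice of $\sigma_i$, and a hypothetical helper `wls_line`. For the cross-check, note that the `w` argument of `numpy.polyfit` multiplies the unsquared residuals, so it is passed $1/\sigma_i$ rather than $1/\sigma_i^2$.

```python
import numpy as np

def wls_line(x, y, sigma):
    """Return (beta0_hat, beta1_hat) from the weighted-mean formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    xbar_w = np.sum(w * x) / np.sum(w)
    ybar_w = np.sum(w * y) / np.sum(w)
    sxx_w = np.sum(w * (x - xbar_w) ** 2)
    sxy_w = np.sum(w * (x - xbar_w) * (y - ybar_w))
    b1 = sxy_w / sxx_w
    b0 = ybar_w - b1 * xbar_w
    return b0, b1

# Made-up data; sigma_i grows with x purely for illustration.
x = np.arange(1, 9, dtype=float)
y = np.array([2.1, 4.4, 6.2, 8.9, 10.3, 12.8, 14.1, 16.7])
sigma = np.sqrt(0.5 + 2.0 * x)

b0_w, b1_w = wls_line(x, y, sigma)
slope_ref, intercept_ref = np.polyfit(x, y, 1, w=1.0 / sigma)
print(b0_w, b1_w)
```

Passing a constant `sigma` reproduces the OLS fit, since equal weights cancel in every ratio.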

## Expressions without averages

$$
\begin{aligned}
S_{xy,w}
&= \sum_i w_i (x_i-\bar{x}_w)(y_i-\bar{y}_w) \\
&= \sum_i w_i (x_i y_i - x_i \bar{y}_w - \bar{x}_w y_i + \bar{x}_w \bar{y}_w) \\
&= \sum_i w_i x_i y_i - \bar{y}_w \sum_i w_i x_i - \bar{x}_w \sum_i w_i y_i + \bar{x}_w \bar{y}_w \sum_i w_i
\end{aligned}
$$
Substitute $\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$ and $\bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i}$:
$$
S_{xy,w} = \sum_i w_i x_i y_i - \left(\frac{\sum_i w_i y_i}{\sum_i w_i}\right) \sum_i w_i x_i - \left(\frac{\sum_i w_i x_i}{\sum_i w_i}\right) \sum_i w_i y_i + \left(\frac{\sum_i w_i x_i}{\sum_i w_i}\right) \left(\frac{\sum_i w_i y_i}{\sum_i w_i}\right) \sum_i w_i
$$
$$
\begin{aligned}
S_{xy,w}
&= \sum_i w_i x_i y_i - \frac{(\sum_i w_i y_i)(\sum_i w_i x_i)}{\sum_i w_i} - \frac{(\sum_i w_i x_i)(\sum_i w_i y_i)}{\sum_i w_i} + \frac{(\sum_i w_i x_i)(\sum_i w_i y_i)}{\sum_i w_i} \\
&= \sum_i w_i x_i y_i - \frac{(\sum_i w_i x_i)(\sum_i w_i y_i)}{\sum_i w_i}
\end{aligned}
$$

$$
\begin{aligned}
S_{xx,w}
&= \sum_i w_i (x_i-\bar{x}_w)^2 \\
&= \sum_i w_i (x_i^2 - 2x_i\bar{x}_w + \bar{x}_w^2) \\
&= \sum_i w_i x_i^2 - 2\bar{x}_w \sum_i w_i x_i + \bar{x}_w^2 \sum_i w_i
\end{aligned}
$$

Substitute $\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$:

$$
S_{xx,w} = \sum_i w_i x_i^2 - 2\left(\frac{\sum_i w_i x_i}{\sum_i w_i}\right) \sum_i w_i x_i + \left(\frac{\sum_i w_i x_i}{\sum_i w_i}\right)^2 \sum_i w_i
$$
$$
\begin{aligned}
S_{xx,w}
&= \sum_i w_i x_i^2 - \frac{2(\sum_i w_i x_i)^2}{\sum_i w_i} + \frac{(\sum_i w_i x_i)^2}{\sum_i w_i} \\
&= \sum_i w_i x_i^2 - \frac{(\sum_i w_i x_i)^2}{\sum_i w_i}
\end{aligned}
$$

So, the weighted least squares estimators are:

$$
\boxed{\,\hat{\beta}_1=\frac{\sum_i w_i x_i y_i - \frac{(\sum_i w_i x_i)(\sum_i w_i y_i)}{\sum_i w_i}}{\sum_i w_i x_i^2 - \frac{(\sum_i w_i x_i)^2}{\sum_i w_i}},\qquad \hat{\beta}_0=\bar{y}_w-\hat{\beta}_1\,\bar{x}_w\,}.
$$
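
As with OLS, the sum-only form is convenient for a spreadsheet: each of $\sum w_i$, $\sum w_i x_i$, $\sum w_i y_i$, $\sum w_i x_i y_i$, and $\sum w_i x_i^2$ is a single column total. The sketch below uses made-up data and weights and a hypothetical helper `wls_line_sums`; setting every weight to 1 recovers the OLS result.

```python
import numpy as np

def wls_line_sums(x, y, w):
    """Return (beta0_hat, beta1_hat) using only weighted column sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    s_w = w.sum()
    s_wx = np.sum(w * x)
    s_wy = np.sum(w * y)
    s_wxy = np.sum(w * x * y)
    s_wxx = np.sum(w * x * x)
    b1 = (s_wxy - s_wx * s_wy / s_w) / (s_wxx - s_wx ** 2 / s_w)
    b0 = s_wy / s_w - b1 * s_wx / s_w    # ybar_w - b1 * xbar_w
    return b0, b1

# Made-up data and weights, for illustration only.
x = np.arange(1, 9, dtype=float)
y = np.array([2.1, 4.4, 6.2, 8.9, 10.3, 12.8, 14.1, 16.7])
w = 1.0 / (0.5 + 2.0 * x)                # w_i = 1 / sigma_i^2

b0_wls, b1_wls = wls_line_sums(x, y, w)
b0_ols, b1_ols = wls_line_sums(x, y, np.ones_like(x))   # equal weights give OLS
print(b0_wls, b1_wls)
print(b0_ols, b1_ols)
```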