# Statistics/Econometrics

This notebook focuses on statistics and econometrics, following Ben Lambert's YouTube series:
 - [A full course in econometrics - undergraduate level - part 1](https://www.youtube.com/playlist?list=PLwJRxp3blEvZyQBTTOMFRP_TDaSdly3gU)
 - [A full course in econometrics - undergraduate level - part 2](https://www.youtube.com/playlist?list=PLwJRxp3blEvb7P-7po9AxuBwquPv75LjU)

## Super summary


**Estimators**
 - We estimate population parameters $\beta^p$ using sample estimates $\hat\beta$. We want these estimates to be
     - Unbiased $\mathbb{E}[\hat\beta] = \beta^p$
     - Consistent $\lim_{n \to \infty} \hat\beta_n = \beta^p$
     - Efficient (low variance, so estimates are close to the true value)
     - Ideally linear in parameters

**Least squares estimation**
 - For a population process connecting a dependent (output) variable to an independent variable, we can fit a linear model (line of best fit) $\hat{y}=\hat{\beta}X+\hat{\alpha}$.
 - In least squares we find the parameters to minimise $S = \sum_{i=1}^N(y_i - \hat{y}_i)^2$
 - Such that $\hat{\beta} = \frac{\sum_{i=1}^N(x_i-\bar{x})(y_i - \bar{y})}{\sum_{i=1}^N(x_i - \bar{x})^2} = \frac{Cov(x_i, y_i)}{Var(x_i)}$ and  $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
 
**Expectation, moments, variance, kurtosis, skewness**
 - We define the expectation of a continuous random variable as $\mathbb{E}[X] = \int_{-\infty}^{\infty}f_x(x)x\,dx$
 - We define the kth moment of X as $\mathbb{E}[X^k] = \int_{-\infty}^{\infty}f_x(x)x^k\,dx$
 - $\mathbb{E}[(X-\bar{X})^2]$ is known as the 2nd central moment or more commonly the variance.
 - The kurtosis is defined as the 4th standardised moment $Kurt[X] = \mathbb{E}[(\frac{X-\bar{X}}{\sigma})^4]$. Large Kurtosis indicates fat tails and vice versa.
 - Skewness is the standardised third central moment $\mathbb{E}[(\frac{X-\bar{X}}{\sigma})^3]$
 - $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$
 - $\text{Var}(aX + bY) = a^2\text{Var}(X) + b^2\text{Var}(Y) + 2ab\,\text{Cov}(X,Y)$

----

### Video 1 - Undergraduate econometrics syllabus

Summary of course

 - Estimating population parameters from sample (using estimators)
 - Cross sectional data (no time element)
 - OLS (ordinary least squares) is BLUE (best linear unbiased estimator) under the Gauss-Markov conditions
 - Consistency
 - Instrumental variables
 - GLS
 - Time series
 - Stationary time series
 - Autoregressive, AR(1)
 - Moving average, MA(1)
 - Panel Data
 - Data generating process
 - Between estimator
 - Within estimator
 - Fixed/Random effects

### Video 2 - What is econometrics

 - Econometrics helps to explain relationships (e.g. TV ad spend and sales)
 - Estimating Population parameters from samples (sampling error)

### Video 3 - Econometrics vs hard science

 - Can't always use a controlled test (AB test)
 - Issues with reverse causality (y actually causes x) and with confounding variables


### Video 4 - Natural experiments in econometrics

 - An experiment that arises as a by-product of another action, e.g. army conscription by birth date

### Video 5 - Populations and samples in Economics

 - Examples of a population - e.g. UK under 18s
 - We often want to understand relationships e.g. impact of years of education on wages $W=\alpha + \beta E$
 - We often only have data for a sample
 - A parameter estimated from a sample will generally differ from the population parameter; we call this difference sampling error, $\beta_s\neq\beta$

### Video 6 - Estimators - the basics

 - Effect of years of education on wages $W=\alpha + \beta E$ in UK under 18s.
 - Say we only have 1000 people in our sample. We want to build an estimator $\hat\beta$ of the population parameter $\beta^p$ from the sample data

### Video 7 - Estimator properties

 - What does it mean to be a good estimator?
 - Suppose we take many samples $S_1, S_2, S_3, ...$ from the population and calculate estimates $\hat\beta_1, \hat\beta_2, \hat\beta_3, ...$ from them using our estimator formula.
 - We want the expectation of the estimates to equal the population parameter, $\mathbb{E}[\hat\beta_i] = \beta^p$. This is known as an **unbiased estimator**
 - We also want our estimate to tend to the population parameter as the sample size n increases, $\lim_{n \to \infty} \hat\beta_n = \beta^p$. This is known as a **consistent estimator**

### Video 8 - Unbiasedness and consistency

 - It is possible to have a biased but consistent estimator

### Video 9 - Unbiasedness vs consistency of estimators - an example

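 - An example of a consistent but biased estimator is using $\hat{\mu}=\frac{1}{N-1}\sum_{i=1}^N X_i$ to estimate the population mean $\mu$ from a sample $X_1, X_2, ..., X_N$. Its expectation is $\frac{N}{N-1}\mu$, which differs from $\mu$ for any finite $N$ but tends to $\mu$ as $N \to \infty$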
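
A small simulation can make the bias and the consistency visible. This is only a sketch: the normal population with mean 10 and standard deviation 2, the sample sizes, and the function names are illustrative assumptions, not part of the video.

```python
import random

def biased_mean(sample):
    """The estimator from the notes: divide the sum by N - 1 instead of N."""
    return sum(sample) / (len(sample) - 1)

def average_estimate(n, trials, mu=10.0, sigma=2.0, seed=0):
    """Average the estimator over many simulated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += biased_mean(sample)
    return total / trials

# Theory says the expectation is mu * N / (N - 1):
# 12.5 for N = 5, and roughly 10.02 for N = 500.
small = average_estimate(n=5, trials=2000)
large = average_estimate(n=500, trials=200)
print(small, large)
```

The average for the small sample size should sit well above 10 (bias), while the average for the large sample size should be close to 10 (consistency).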

### Video 10 - Efficiency of estimators

 - The efficiency of an estimator $\tilde\beta$ relates to how close its estimates tend to be to the true value for a given sample size (a more efficient estimator has lower variance)
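
One way to see efficiency is to compare the spread of two estimators of the same quantity across many samples. The sketch below compares the sample mean and the sample median as estimators of the centre of a standard normal population; the choice of these two estimators, the sample size of 25 and the helper name are assumptions made for illustration.

```python
import random
import statistics

def estimator_variance(estimator, n=25, trials=2000, seed=1):
    """Variance of an estimator's values across many simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.pvariance(estimates)

# Same seed, so both estimators see exactly the same samples.
var_mean = estimator_variance(statistics.mean)
var_median = estimator_variance(statistics.median)
print(var_mean, var_median)
```

For normal data the mean should show the smaller variance, so it is the more efficient of the two in this setting.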

### Video 11 - Good estimator properties summary

 - Unbiased, consistent, efficient and linear in parameters

### Video 12 - Lines of best fit in econometrics

 - Looking at the example of the impact of education, X, on wages, Y: we can write the relationship for the population as $Y = \beta^PX + \alpha^P$. We can do the same for a sample, $Y = \beta^SX + \alpha^S$. We are using $\beta^S$ to estimate $\beta^P$.

### Video 13 - The mathematics behind a line of best fit

 - Again we are modeling the situation $Y = \hat{\beta}X + \hat{\alpha}$
 - We can minimise the sum of absolute errors $S = \sum_{i=1}^N|y_i - \hat{y}_i|$
 - We can also minimise the sum of the square of the errors $S = \sum_{i=1}^N(y_i - \hat{y}_i)^2$
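
The two criteria can be compared on a toy data set. This is a minimal sketch with made-up numbers and hypothetical function names; it evaluates both losses for one candidate line rather than searching for the best one.

```python
def predictions(xs, alpha, beta):
    """Fitted values y_hat = alpha + beta * x for each x."""
    return [alpha + beta * x for x in xs]

def sum_abs_errors(xs, ys, alpha, beta):
    fitted = predictions(xs, alpha, beta)
    return sum(abs(y - y_hat) for y, y_hat in zip(ys, fitted))

def sum_squared_errors(xs, ys, alpha, beta):
    fitted = predictions(xs, alpha, beta)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, fitted))

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 13]  # the last point lies 4 above the line y = 1 + 2x

print(sum_abs_errors(xs, ys, alpha=1, beta=2))      # 4
print(sum_squared_errors(xs, ys, alpha=1, beta=2))  # 16
```

Squaring penalises the single large error much more heavily than taking its absolute value does.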

### Video 14 - Least squares estimators as BLUE

 - We want an estimator to be unbiased, consistent and efficient
 - BLUE - Best Linear Unbiased Estimator
 - Best means there are no other linear unbiased estimators that are more efficient
 - Under the Gauss-Markov assumptions least squares estimators are BLUE

### Video 15 - Deriving Least Squares Estimators - part 1

 - Deriving $\hat{\alpha}$ and $\hat{\beta}$ where $\hat{y}=\hat{\beta}X+\hat{\alpha}$ to minimise $S = \sum_{i=1}^N(y_i - \hat{y}_i)^2$
 - To solve this we want $\frac{\partial S}{\partial \hat{\alpha}} = 0, \frac{\partial S}{\partial \hat{\beta}} = 0$
 - Since $\hat{y}_i = \hat{\beta}x_i+\hat{\alpha}$, we can write S as $S = \sum_{i=1}^N(y_i - \hat{\beta}x_i-\hat{\alpha})^2$ so then we have the following conditions
 - $\frac{\partial S}{\partial \hat{\alpha}} = -2\sum_{i=1}^N(y_i - \hat{\beta}x_i-\hat{\alpha}) = 0$
 - $\frac{\partial S}{\partial \hat{\beta}} = -2\sum_{i=1}^Nx_i(y_i - \hat{\beta}x_i-\hat{\alpha}) = 0$
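
The two partial derivatives can be sanity-checked numerically. The sketch below writes the residual as $y_i - \hat{\beta}x_i - \hat{\alpha}$ (which follows from $\hat{y}_i = \hat{\beta}x_i + \hat{\alpha}$) and compares the analytic gradient with a central finite difference. The data and function names are illustrative assumptions.

```python
def sse(xs, ys, alpha, beta):
    """Sum of squared residuals S for the line y_hat = beta * x + alpha."""
    return sum((y - beta * x - alpha) ** 2 for x, y in zip(xs, ys))

def sse_gradient(xs, ys, alpha, beta):
    """Analytic partial derivatives (dS/d_alpha, dS/d_beta)."""
    residuals = [y - beta * x - alpha for x, y in zip(xs, ys)]
    d_alpha = -2 * sum(residuals)
    d_beta = -2 * sum(x * r for x, r in zip(xs, residuals))
    return d_alpha, d_beta

def numeric_gradient(xs, ys, alpha, beta, h=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    d_alpha = (sse(xs, ys, alpha + h, beta) - sse(xs, ys, alpha - h, beta)) / (2 * h)
    d_beta = (sse(xs, ys, alpha, beta + h) - sse(xs, ys, alpha, beta - h)) / (2 * h)
    return d_alpha, d_beta

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.5, 5.5, 9.0]
analytic = sse_gradient(xs, ys, alpha=0.5, beta=1.5)
numeric = numeric_gradient(xs, ys, alpha=0.5, beta=1.5)
print(analytic, numeric)
```

The two gradients should agree up to rounding; at the least squares solution both components are zero.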

### Video 16 - Deriving Least Squares Estimators - part 2

 - Note based on the definition of the mean we can write: $\sum_{i=1}^Nx_i=N\bar{x}$ and  $\sum_{i=1}^Ny_i=N\bar{y}$
 - Also note $\sum_{i=1}^N(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^Ny_i(x_i - \bar{x}) = \sum_{i=1}^Nx_i(y_i - \bar{y})$

 Proof:
 $$\begin{aligned}
 \sum_{i=1}^N(x_i - \bar{x})(y_i - \bar{y})
  &= \sum_{i=1}^N(x_iy_i - x_i\bar{y} - \bar{x}y_i + \bar{x}\bar{y}) \\
  &= \sum_{i=1}^Nx_iy_i - \bar{y}\sum_{i=1}^Nx_i - \bar{x}\sum_{i=1}^Ny_i + N\bar{x}\bar{y} \\
  &= \sum_{i=1}^Nx_iy_i - \bar{y}N\bar{x} - \bar{x}N\bar{y} + N\bar{x}\bar{y} \\
  &= \sum_{i=1}^Nx_iy_i - N\bar{x}\bar{y} \\
  &= \sum_{i=1}^Nx_iy_i - \bar{x}\sum_{i=1}^Ny_i \\
  &= \sum_{i=1}^Ny_i(x_i - \bar{x})
 \end{aligned}$$

 Writing $N\bar{x}\bar{y}$ as $\bar{y}\sum_{i=1}^Nx_i$ instead in the second-to-last step gives $\sum_{i=1}^Nx_i(y_i - \bar{y})$.
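
The identity is easy to confirm on a few numbers. This is a quick check with made-up data, not a proof.

```python
def mean(values):
    return sum(values) / len(values)

xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 6.0, 10.0]
x_bar, y_bar = mean(xs), mean(ys)

# The three forms of the same sum.
both_centred = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
y_times_centred_x = sum(y * (x - x_bar) for x, y in zip(xs, ys))
x_times_centred_y = sum(x * (y - y_bar) for x, y in zip(xs, ys))

print(both_centred, y_times_centred_x, x_times_centred_y)  # 29.0 29.0 29.0
```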
 

### Video 17 - Deriving Least Squares Estimators - part 3

 - Two videos ago we differentiated S to get the following conditions:
 - $\frac{\partial S}{\partial \hat{\alpha}} = -2\sum_{i=1}^N(y_i - \hat{\beta}x_i-\hat{\alpha}) = 0$ (1)
 - $\frac{\partial S}{\partial \hat{\beta}} = -2\sum_{i=1}^Nx_i(y_i - \hat{\beta}x_i-\hat{\alpha}) = 0$ (2)
 - Using (1) $\sum_{i=1}^Ny_i = N\hat{\alpha} + \hat{\beta}\sum_{i=1}^Nx_i$
 - hence $N\bar{y} = \hat{\alpha}N + \hat{\beta}N\bar{x}$
 - hence $\bar{y} = \hat{\alpha} + \hat{\beta}\bar{x}$ (so the line goes through the means)

### Video 18 - Deriving Least Squares Estimators - part 4

 - From the previous video we have $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
 - Using the second equation we have $\sum_{i=1}^Nx_iy_i = \hat{\alpha}\sum_{i=1}^Nx_i + \hat{\beta} \sum_{i=1}^Nx_i^2$
 - hence $\sum_{i=1}^Nx_iy_i = \hat{\alpha}N\bar{x} + \hat{\beta} \sum_{i=1}^Nx_i^2$
 - subbing in $\hat{\alpha}$ we see  $\sum_{i=1}^Nx_iy_i =(\bar{y} - \hat{\beta}\bar{x})N\bar{x} + \hat{\beta} \sum_{i=1}^Nx_i^2$
 - hence $\sum_{i=1}^Nx_iy_i=\bar{y}N\bar{x} - \hat{\beta}\bar{x}N\bar{x} + \hat{\beta} \sum_{i=1}^Nx_i^2$

### Video 19 - Deriving Least Squares Estimators - part 5

 - From the last video $\sum_{i=1}^Nx_iy_i=\bar{y}N\bar{x} - \hat{\beta}\bar{x}N\bar{x} + \hat{\beta} \sum_{i=1}^Nx_i^2$
 - hence $\sum_{i=1}^Nx_iy_i - \bar{y}N\bar{x} = \hat{\beta}(\sum_{i=1}^Nx_i^2 - \bar{x}N\bar{x} )$
 - hence $\hat{\beta} = \frac{\sum_{i=1}^Nx_iy_i - \bar{y}N\bar{x}}{\sum_{i=1}^Nx_i^2 - \bar{x}N\bar{x}}$
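 - The identity from Video 16 gives $\sum_{i=1}^Nx_iy_i - N\bar{x}\bar{y} = \sum_{i=1}^N(x_i-\bar{x})(y_i - \bar{y})$ and, with $y$ replaced by $x$, $\sum_{i=1}^Nx_i^2 - N\bar{x}^2 = \sum_{i=1}^N(x_i - \bar{x})^2$
 - so $\hat{\beta} = \frac{\sum_{i=1}^N(x_i-\bar{x})(y_i - \bar{y})}{\sum_{i=1}^N(x_i - \bar{x})^2} = \frac{Cov(x_i, y_i)}{Var(x_i)}$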
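
The closed-form result translates directly into code. The sketch below is a plain-Python illustration with a hypothetical `fit_line` helper, checked on data that lie exactly on a known line.

```python
def fit_line(xs, ys):
    """Least squares estimates (alpha_hat, beta_hat) for y_hat = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar  # the line passes through the means
    return alpha_hat, beta_hat

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
print(fit_line(xs, ys))    # (1.0, 2.0)
```

The factor of $1/N$ (or $1/(N-1)$) in the covariance and the variance cancels in the ratio, so the code works with the raw sums.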
 

### Video 20 - Least squares estimators - in summary

 - Using sample to estimate relationships in the population.
 - In a wages vs education model, $\hat{\alpha}$ estimates the wage of someone with no education and $\hat{\beta}$ estimates the increase in wage for each extra year of education.

### Video 21 - Taking the expectation of a random variable

 - A discrete random variable X takes one of a finite number of values $v_1, v_2, v_3,...,v_k$ with probabilities $p_1, p_2, p_3, ..., p_k$
 - We define the expectation $\mathbb{E}[X] = \sum_xP(X=x)x = \sum_{i=1}^kp_iv_i$
 - A continuous random variable Y takes a continuous range of values over some interval and has a probability density function $f_y$ describing how probability is spread over those values.
 - We define the expectation $\mathbb{E}[Y] = \int_{-\infty}^{\infty}f_y(y)y\,dy$
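
For the discrete case the definition is a probability-weighted sum. A minimal sketch, using a fair six-sided die as an assumed example and exact fractions to avoid rounding:

```python
from fractions import Fraction

def expectation(values, probabilities):
    """E[X] = sum of P(X = v) * v over all possible values v."""
    return sum(p * v for v, p in zip(values, probabilities))

die_values = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6

print(expectation(die_values, die_probs))  # 7/2
```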

### Video 22 - Moments of a random variable

 - We define the expectation of a random variable $\mathbb{E}[X] = \int_{-\infty}^{\infty}f_x(x)x\,dx$
 - We define the expectation of $X^2$ as $\mathbb{E}[X^2] = \int_{-\infty}^{\infty}f_x(x)x^2\,dx$. This is known as the second moment.
 - We define the kth moment of X as $\mathbb{E}[X^k] = \int_{-\infty}^{\infty}f_x(x)x^k\,dx$
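
The moment integrals can be approximated numerically. The sketch below uses a simple midpoint rule on a uniform density on $[0, 1]$, for which the kth moment is known to be $\frac{1}{k+1}$. The uniform example, the integration limits (restricted to where the density is non-zero) and the function names are assumptions for illustration.

```python
def raw_moment(density, k, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of f(x) * x**k over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width
        total += density(x) * x ** k * width
    return total

def uniform_density(x):
    """Density of a uniform random variable on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

first = raw_moment(uniform_density, 1, 0.0, 1.0)   # close to 1/2
second = raw_moment(uniform_density, 2, 0.0, 1.0)  # close to 1/3
print(first, second)
```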

### Video 23 - Central moments of a random variable

 - $\mathbb{E}[(X-\bar{X})^2]$ is known as the 2nd central moment or more commonly the variance. The variance tells us about the shoulders of the distribution (the spread).
 - variance = 0 for a constant random variable.
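
Both points can be checked for a discrete random variable. This is a sketch with assumed examples (a fair die and a constant), using exact fractions.

```python
from fractions import Fraction

def expectation(values, probabilities):
    return sum(p * v for v, p in zip(values, probabilities))

def variance(values, probabilities):
    """Second central moment: E[(X - E[X])^2]."""
    mu = expectation(values, probabilities)
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probabilities))

die_values = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6

print(variance(die_values, die_probs))  # 35/12
print(variance([5], [Fraction(1)]))     # 0, a constant has no spread
```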

### Video 24 - Kurtosis

 - $\mathbb{E}[(X-\bar{X})^4]$ is known as the 4th central moment.
 - The kurtosis is defined as the 4th standardised moment $Kurt[X] = \mathbb{E}[(\frac{X-\bar{X}}{\sigma})^4]$
 - The excess kurtosis is defined as the kurtosis - 3, as the normal distribution has kurtosis = 3. If the excess kurtosis (sometimes just called kurtosis) is negative that indicates thin tails, and positive indicates fat tails.
 - The kurtosis helps to understand the tails of the distribution.
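
To see the sign of the excess kurtosis in practice, the sketch below computes it for two made-up discrete distributions: a symmetric two-point variable (no tails at all) and a variable that is usually 0 but occasionally takes a large value. The distributions and function names are illustrative assumptions.

```python
def standardised_moment(values, probabilities, k):
    """E[((X - mean) / sigma)^k] for a discrete random variable."""
    mu = sum(p * v for v, p in zip(values, probabilities))
    var = sum(p * (v - mu) ** 2 for v, p in zip(values, probabilities))
    sigma = var ** 0.5
    return sum(p * ((v - mu) / sigma) ** k for v, p in zip(values, probabilities))

def excess_kurtosis(values, probabilities):
    return standardised_moment(values, probabilities, 4) - 3

# X = -1 or +1 with equal probability: kurtosis 1, so excess kurtosis -2.
thin = excess_kurtosis([-1.0, 1.0], [0.5, 0.5])

# Mostly 0, occasionally -3 or +3: kurtosis 10, so excess kurtosis about 7.
fat = excess_kurtosis([-3.0, 0.0, 3.0], [0.05, 0.9, 0.05])

print(thin, fat)
```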

### Video 25 - Skewness

 - Skewness is the standardised third central moment $\mathbb{E}[(\frac{X-\bar{X}}{\sigma})^3]$. A random variable 

### Video 26 - Expectations and variance properties

 - $\mathbb{E}[aX] = \int_{-\infty}^{\infty}axf_x(x)\,dx = a \mathbb{E}[X]$
 - $Var(aX) = \mathbb{E}[(aX - a \bar{X})^2] = a^2 Var(X)$
 - $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$
 - Variance of a linear combination:
 $$\begin{aligned}
 Var(aX + bY) &= \mathbb{E}[(aX + bY - a \bar{X} - b\bar{Y})^2]\\
    &= \mathbb{E}[((aX - a\bar{X}) + (bY - b\bar{Y}))^2] \\
    &= \mathbb{E}[(aX - a\bar{X})^2] + \mathbb{E}[(bY - b\bar{Y})^2] + 2\mathbb{E}[(aX - a\bar{X})(bY - b\bar{Y})]\\
    &= a^2 Var(X) + b^2Var(Y) + 2abCov(X,Y)
 \end{aligned}$$
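
The variance rule also holds exactly for sample moments, provided the variance and covariance use the same divisor. A quick numerical check with made-up data (dividing by $N$ throughout, an assumption of this sketch):

```python
def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Covariance with divisor N."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    return cov(xs, xs)

a, b = 2.0, -3.0
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 6.0, 10.0]

combo = [a * x + b * y for x, y in zip(xs, ys)]
lhs = var(combo)
rhs = a ** 2 * var(xs) + b ** 2 * var(ys) + 2 * a * b * cov(xs, ys)
print(lhs, rhs)  # both 37.5, up to rounding
```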
