# HS 421 Econometric Data Science
## Review of Statistics and Probability Part 1
### Sunil Paul
(Reference: Stock and Watson)
***
## Some definitions

**Outcomes** are the mutually exclusive potential results of a random process.
  - Example 1: your grade on the exam.
  - Example 2: while you attend the online class, the wireless connection might never fail, it might fail once, it might fail twice, and so on.
  
A **random variable** is a numerical summary of a random outcome.
- The number of times your wireless connection might fail is random and takes on a numerical value (0,1,2,3 or 4).

There are two types of random variables:

- A **discrete random variable** takes on only a discrete set of values, like 0, 1, 2, ...
- A **continuous random variable** takes on a continuum of possible values
- Each outcome of a discrete random variable occurs with a certain probability

A **probability distribution** of a discrete random variable is the list of possible values of the variable and the probability that each value will occur.

A **cumulative probability distribution** is the probability that the random variable is less than or equal to a particular value. For example, the probability of at most one connection failure, $Pr(M \le 1)$, is 90%, which is the sum of the probabilities of no failures (80%) and of one failure (10%).
- A cumulative probability distribution is also referred to as a cumulative distribution function or a CDF.

<center><b>Probability of your wireless Network Connection failing M times</b></center>

|     |   |   | |||
|---|---|---|---|---|---|
| **Outcome (number of failures)**  |0|1|2|3|4|
|**Probability distribution**|0.80|0.10|0.06|0.03|0.01|
|**Cumulative probability distribution**|0.80|0.90|0.96|0.99|1.00|
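
The cumulative row of the table can be reproduced from the probability row. The following is a minimal R sketch; the variable names `m`, `p` and `cdf` are chosen here for illustration only.

```r
# Probability distribution of M, the number of connection failures (from the table)
m <- 0:4
p <- c(0.80, 0.10, 0.06, 0.03, 0.01)

# The CDF at each value is the running sum of the probabilities up to that value
cdf <- cumsum(p)
names(cdf) <- m
print(cdf)

# Pr(M <= 1): the CDF evaluated at 1, which is 0.80 + 0.10 = 0.90
cdf["1"]
```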

- Tomorrow’s temperature is an example of a continuous random variable
- The CDF is defined in the same way as for a discrete random variable.
- A probability distribution that lists all values and the probability of each value is not suitable for a continuous random variable.
- Instead the probability is summarized in a probability density function (PDF/ density)

## Measures of the shape of a probability distribution
The **expected value** or mean of a random variable is the average value over many repeated trials or occurrences.

Suppose a discrete random variable Y takes on k possible values
$$E(Y)=\sum_{i=1}^{k}y_i\cdot Pr(Y=y_i)=\mu_Y$$

Expected value of a continuous random variable
$$E(Y)=\int_{-\infty}^{\infty}yf(y) dy=\mu_Y$$
The **variance** of a random variable Y is the expected value of the square of
the deviation of Y from its mean.
- The variance is a measure of the spread of a probability distribution.
- Suppose a discrete random variable Y takes on k possible values
$$\sigma^2_Y=Var(Y)=E[(Y-\mu_Y)^2]=\sum_{i=1}^{k}(y_i-\mu_Y)^2\cdot Pr(Y=y_i)$$
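
As a worked illustration of the two sums, the sketch below applies them to the wireless-failure variable M from the earlier table. The names `mu_M`, `var_M` and `sd_M` are introduced here for illustration.

```r
# Values and probabilities of M (number of connection failures)
m <- 0:4
p <- c(0.80, 0.10, 0.06, 0.03, 0.01)

# Expected value: sum of value times probability
mu_M <- sum(m * p)               # 0.35

# Variance: probability-weighted squared deviations from the mean
var_M <- sum((m - mu_M)^2 * p)   # 0.6475

# Standard deviation, in the same units as M
sd_M <- sqrt(var_M)              # about 0.80

c(mean = mu_M, variance = var_M, sd = sd_M)
```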

**Skewness** is a measure of the lack of symmetry of a distribution

- For a symmetric distribution, positive values of $(Y - \mu_Y)^3$ are offset by negative values (equally likely) and the skewness is 0
- For a negatively (positively) skewed distribution, negative (positive) values of $(Y - \mu_Y)^3$ are more likely and the skewness is negative (positive).
- Skewness is unit free.

**Kurtosis** is a measure of how much mass is in the tails of a distribution

- If a random variable has extreme values “outliers” the kurtosis will be high
- The Kurtosis is unit free and cannot be negative

The mean, $E(Y)$, is also called the first moment of Y, and the expected value of the square of Y, $E(Y^2)$, is called the second moment of Y. In general, the expected value of $Y^r$ is called the $r^{th}$ moment of the random variable Y. That is, the $r^{th}$ moment of Y is $E(Y^r)$. The skewness is a function of the first, second, and third moments of Y, and the kurtosis is a function of the first through fourth moments of Y.
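
The notes do not write out formulas for skewness and kurtosis. The sketch below assumes the standard definitions, $E[(Y-\mu_Y)^3]/\sigma_Y^3$ and $E[(Y-\mu_Y)^4]/\sigma_Y^4$, and applies them to the wireless-failure variable M. It also checks that the third central moment can be rebuilt from the first three raw moments, which is the sense in which skewness is a function of those moments.

```r
# Values and probabilities of M (number of connection failures)
m <- 0:4
p <- c(0.80, 0.10, 0.06, 0.03, 0.01)

mu    <- sum(m * p)
sigma <- sqrt(sum((m - mu)^2 * p))

# Assumed standard definitions (not spelled out in the notes)
skew <- sum((m - mu)^3 * p) / sigma^3   # positive: long right tail
kurt <- sum((m - mu)^4 * p) / sigma^4   # a normal distribution has kurtosis 3

# Raw moments E(M^r) for r = 1, ..., 4
raw <- sapply(1:4, function(r) sum(m^r * p))

# Third central moment written in terms of the first three raw moments
third_central <- raw[3] - 3 * raw[1] * raw[2] + 2 * raw[1]^3

c(skewness = skew, kurtosis = kurt)
```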

## Two random variables and their joint distribution

- Most of the interesting questions in economics involve 2 or more random variables
- Answering these questions requires understanding the concepts of joint, marginal and conditional probability distribution.

The **joint probability distribution** of two random variables X and Y can be written as $Pr(X = x, Y = y)$

- Let Y equal 1 if it rains and 0 if it does not rain.
- Let X equal 1 if it is very cold and 0 if it is not very cold.

<p style="text-align: center;">Joint probability distribution of X and Y</p>
<table>
     
<tr>
    <th> </th>
    <th>Very cold (X = 1)</th>
    <th> Not very cold (X = 0) </th>
    <th>Total</th> 
</tr>

<tr>
    <td> rains (Y = 1) </td>
    <td>0.15 </td> 
    <td> 0.07</td> 
    <td> 0.22</td>
</tr>

<tr>
    <td>No rain (Y = 0) </td>
    <td>0.15 </td> 
    <td>0.63</td> 
    <td> 0.78</td>   
</tr>
<tr>
   <td> Total </td>
     <td>0.30 </td>
          <td>0.70 </td>
               <td>1.00</td>
    </tr>
</table>

The **marginal probability distribution** of a random variable is just another
name for its probability distribution
- The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes a specific value
  $$Pr(Y=y)=\sum_{i=1}^{k}Pr(X=x_i,Y=y)$$
    - The probability that it will rain
     $$Pr(Y=1)=Pr(X=1,Y=1)+Pr(X=0,Y=1)=0.22$$
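
The same marginal sums can be sketched in R by storing the joint distribution as a matrix. The layout (rows for Y, columns for X) and the names `joint`, `marg_Y` and `marg_X` are choices made here for illustration.

```r
# Joint distribution from the table: rows are Y (rain), columns are X (very cold)
joint <- matrix(c(0.15, 0.07,
                  0.15, 0.63),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y = c("1", "0"), X = c("1", "0")))

# Marginal distribution of Y: add across the values of X (row totals)
marg_Y <- rowSums(joint)

# Marginal distribution of X: add across the values of Y (column totals)
marg_X <- colSums(joint)

marg_Y["1"]   # Pr(Y = 1) = 0.15 + 0.07 = 0.22
marg_X["1"]   # Pr(X = 1) = 0.15 + 0.15 = 0.30
```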

<p style="text-align: center;">Joint probability distribution of X and Y</p>
<table>
     
<tr>
    <th> </th>
    <th>Very cold (X = 1)</th>
    <th> Not very cold (X = 0) </th>
    <th>Total</th> 
</tr>

<tr>
    <td> rains (Y = 1) </td>
    <td><p style="text-align: center;">0.15 </p></td> 
    <td> <p style="text-align: center;">0.07</p></td> 
    <td> <p style="text-align: center;color:red">  0.22</p></td>
</tr>

<tr>
    <td>No rain (Y = 0) </td>
    <td><p style="text-align: center;">0.15</p> </td> 
    <td><p style="text-align: center;">0.63</p> </td> 
    <td><p style="text-align: center;color:red">  0.78</p></td>
</tr>
<tr>
   <td> Total </td>
     <td><p style="text-align: center;color:red">  0.30</p></td>
     <td><p style="text-align: center;color:red">  0.70</p></td>
               <td>1.00</td>
    </tr>
</table>
  
  

  

The **conditional distribution** is the distribution of a random variable conditional on another random variable taking on a specific value.
- The conditional probability that it rains given that it is very cold
$$Pr(Y=1|X=1)=\frac{0.15}{0.3}=0.5$$
- In general the conditional distribution of Y given X is
$$Pr(Y=y|X=x)=\frac{Pr(Y=y,X=x)}{Pr(X=x)}$$
- The conditional expectation of Y given X is
$$E(Y|X=x)=\sum_{i=1}^{k}y_iPr(Y=y_i|X=x)$$
- The expected value of rain given that it is very cold equals
$$E(Y|X=1)=1\cdot Pr(Y=1|X=1)+0\cdot Pr(Y=0|X=1)=1(0.5)+0(0.5)=0.5$$
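
A short R sketch of the conditional distribution and conditional expectation for the rain example follows. The matrix layout and variable names are illustrative choices, not part of the original notes.

```r
# Joint distribution: rows are Y (rain), columns are X (very cold)
joint <- matrix(c(0.15, 0.07,
                  0.15, 0.63),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y = c("1", "0"), X = c("1", "0")))
y_vals <- c(1, 0)   # values of Y, in row order

# Conditional distribution of Y given X = 1: joint column divided by Pr(X = 1)
cond_Y_given_X1 <- joint[, "1"] / sum(joint[, "1"])   # 0.5 and 0.5

# Conditional expectations of Y given each value of X
E_Y_given_X1 <- sum(y_vals * cond_Y_given_X1)                     # 0.5
E_Y_given_X0 <- sum(y_vals * joint[, "0"] / sum(joint[, "0"]))    # 0.07 / 0.70 = 0.1

c(E_Y_given_X1, E_Y_given_X0)
```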

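The **law of iterated expectations** states that the mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X. If X takes on the values $x_1,\dots,x_l$:
$$E(Y)=\sum_{i=1}^{l}E(Y|X=x_i)\cdot Pr(X=x_i)=E[E(Y|X)]$$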
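
For the rain example the law can be checked numerically: averaging the conditional means of Y over the distribution of X should give back the unconditional mean of Y. This is a sketch with illustrative variable names.

```r
# Joint distribution: rows are Y (rain), columns are X (very cold)
joint <- matrix(c(0.15, 0.07,
                  0.15, 0.63),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y = c("1", "0"), X = c("1", "0")))
y_vals <- c(1, 0)
pr_X   <- colSums(joint)   # Pr(X = 1) = 0.30, Pr(X = 0) = 0.70

# E(Y | X = x) for each value of X
cond_mean <- sapply(colnames(joint),
                    function(x) sum(y_vals * joint[, x] / pr_X[x]))

# Weighted average of the conditional means: 0.5 * 0.30 + 0.1 * 0.70
E_Y_iterated <- sum(cond_mean * pr_X)

# Unconditional mean computed directly from the marginal distribution of Y
E_Y_direct <- sum(y_vals * rowSums(joint))

c(iterated = E_Y_iterated, direct = E_Y_direct)   # both 0.22
```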

**Independence**: Two random variables X and Y are independent if the conditional distribution of Y given X does not depend on X
$$Pr(Y=y|X=x)=Pr(Y=y)$$

- If X and Y are independent this also implies
\begin{align}
Pr(X=x,Y=y)&=Pr(X=x)\cdot Pr(Y=y|X=x)\\&=Pr(X=x)\cdot Pr(Y=y)\end{align}
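
One way to check independence in a finite table is to compare each joint probability with the product of the corresponding marginals. The sketch below does this for the rain example, where the two do not match, so rain and cold are not independent. Variable names are illustrative.

```r
# Joint distribution: rows are Y (rain), columns are X (very cold)
joint <- matrix(c(0.15, 0.07,
                  0.15, 0.63),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y = c("1", "0"), X = c("1", "0")))

# What the joint distribution would be under independence:
# Pr(Y = y) * Pr(X = x) for every cell
indep <- outer(rowSums(joint), colSums(joint))

# Largest gap between the actual and the "independent" joint probabilities
max_gap <- max(abs(joint - indep))

# TRUE only if every cell factorizes into the product of its marginals
is_independent <- isTRUE(all.equal(joint, indep, check.attributes = FALSE))

indep
is_independent   # FALSE: e.g. Pr(X=1, Y=1) = 0.15 but 0.22 * 0.30 = 0.066
```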

**Mean independence**: The conditional mean of Y given X equals the unconditional mean of Y
$$E(Y|X)=E(Y)$$
- For example, if the expected value of rain (Y) does not depend on whether it is very cold (X), then $E(Y|X = 1) = E(Y|X = 0) = E(Y)$

The **covariance** is a measure of the extent to which two random variables X and Y move together:
\begin{align}
Cov(X,Y)&=\sigma_{XY}\\&=E[(X-\mu_X)(Y-\mu_Y)]\\&=\sum_{i=1}^{k}\sum_{j=1}^{m}(x_j-\mu_X)(y_i-\mu_Y)\cdot Pr(X=x_j,Y=y_i)
\end{align}

<p style="text-align: center;">Joint probability distribution of X and Y</p>
<table>
     
<tr>
    <th> </th>
    <th>Very cold (X = 1)</th>
    <th> Not very cold (X = 0) </th>
    <th>Total</th> 
</tr>

<tr>
    <td> rains (Y = 1) </td>
    <td><p style="text-align: center;">0.15 </p></td> 
    <td> <p style="text-align: center;">0.07</p></td> 
    <td> <p style="text-align: center;color:red">  0.22</p></td>
</tr>

<tr>
    <td>No rain (Y = 0) </td>
    <td><p style="text-align: center;">0.15</p> </td> 
    <td><p style="text-align: center;">0.63</p> </td> 
    <td><p style="text-align: center;color:red">  0.78</p></td>
</tr>
<tr>
   <td> Total </td>
     <td><p style="text-align: center;color:red">  0.30</p></td>
     <td><p style="text-align: center;color:red">  0.70</p></td>
               <td>1.00</td>
    </tr>
</table>
  
*Example*: the covariance between rain (Y) and it being very cold (X):

\begin{align*}
Cov(X,Y) = & (1-0.3)(1-0.22)\cdot 0.15+\\&(1-0.3)(0-0.22)\cdot 0.15+\\&(0-0.3)(1-0.22)\cdot 0.07+\\&(0-0.3)(0-0.22)\cdot 0.63\\=&0.084
\end{align*}
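
The double sum in the covariance definition can be sketched in R with `outer()`, which builds the table of deviation products cell by cell. Variable names are illustrative; the shortcut $E(XY)-\mu_X\mu_Y$ is included only as a cross-check.

```r
# Joint distribution: rows are Y (rain), columns are X (very cold)
joint <- matrix(c(0.15, 0.07,
                  0.15, 0.63),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y = c("1", "0"), X = c("1", "0")))
x_vals <- c(1, 0)   # values of X, in column order
y_vals <- c(1, 0)   # values of Y, in row order

mu_X <- sum(x_vals * colSums(joint))   # 0.30
mu_Y <- sum(y_vals * rowSums(joint))   # 0.22

# (y_i - mu_Y) * (x_j - mu_X) for every cell, then weight by the joint probability
dev    <- outer(y_vals - mu_Y, x_vals - mu_X)
cov_XY <- sum(dev * joint)             # 0.084

# Cross-check: Cov(X, Y) = E(XY) - mu_X * mu_Y
cov_alt <- sum(outer(y_vals, x_vals) * joint) - mu_X * mu_Y

c(cov_XY, cov_alt)
```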
    
    


The units of the covariance of X and Y are the units of X multiplied by the units of Y
- This makes it hard to interpret the size of the covariance.
- The correlation between X and Y is unit free:

$$Corr (X, Y ) =\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}=\frac{\sigma_{XY}}{\sigma_X\sigma_Y} $$

- A correlation is always between -1 and 1 and X and Y are uncorrelated if $Corr (X, Y ) = 0$
- If the conditional mean of Y does not depend on X, then X and Y are uncorrelated: if $E(Y|X) = E(Y)$, then $Cov(X, Y) = 0$
- If X and Y are uncorrelated, this does not necessarily imply mean independence.
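
The sketch below first computes the correlation for the rain example, then gives a small counterexample for the last point. The counterexample (X uniform on -1, 0, 1 and Y = X squared) is an illustration added here, not taken from the notes: its covariance is zero, yet the conditional mean of Y clearly changes with X.

```r
# Part 1: correlation in the rain example
joint <- matrix(c(0.15, 0.07,
                  0.15, 0.63),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y = c("1", "0"), X = c("1", "0")))
x_vals <- c(1, 0); y_vals <- c(1, 0)
p_X <- colSums(joint); p_Y <- rowSums(joint)
mu_X <- sum(x_vals * p_X); mu_Y <- sum(y_vals * p_Y)

var_X   <- sum((x_vals - mu_X)^2 * p_X)
var_Y   <- sum((y_vals - mu_Y)^2 * p_Y)
cov_XY  <- sum(outer(y_vals - mu_Y, x_vals - mu_X) * joint)
corr_XY <- cov_XY / sqrt(var_X * var_Y)   # unit free, between -1 and 1

# Part 2 (hypothetical counterexample): uncorrelated but not mean independent
x <- c(-1, 0, 1)
p <- rep(1 / 3, 3)
y <- x^2                                   # Y is fully determined by X
cov_xy <- sum((x - sum(x * p)) * (y - sum(y * p)) * p)   # zero
cond_mean_y <- y                           # E(Y | X = x) = x^2: 1, 0, 1
E_y <- sum(y * p)                          # 2/3, differs from each conditional mean

c(corr_rain = corr_XY, cov_counterexample = cov_xy)
```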



## Examples of often used probability distributions in Econometrics
### Normal distribution
The most often encountered probability density function in econometrics is the Normal distribution:
$$f_Y(y)=\frac{1}{\sigma\sqrt{2\pi}}\exp \left[ -\frac{1}{2}\left( \frac{y-\mu}{\sigma}\right)^2\right]$$

A normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted as $N(\mu,\sigma^2)$

A standard normal distribution $N(0,1)$ has $\mu = 0$ and $\sigma = 1$

- A random variable with a $N(0,1)$ distribution is often denoted by Z, and its CDF is denoted by $\Phi(z) = Pr(Z \le z)$

We can use `pnorm()` to get probabilities from a normal distribution given its mean and standard deviation (run `?pnorm` to learn more about this function).

```r
# Pr(Y <= 0) for a normal Y with mean 5 and standard deviation 2
pnorm(0, mean = 5, sd = 2, lower.tail = TRUE)
```
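
A few further uses of `pnorm()` are sketched below for the same distribution (mean 5, standard deviation 2). They show that standardizing by hand gives the same answer, how to get an interval probability, and what `lower.tail = FALSE` returns. The variable names are illustrative.

```r
mu    <- 5
sigma <- 2

# Direct call and the equivalent standardized call: Pr(Z <= (0 - mu) / sigma)
p_direct   <- pnorm(0, mean = mu, sd = sigma)   # about 0.0062
p_standard <- pnorm((0 - mu) / sigma)

# Probability of landing within one standard deviation of the mean: Pr(3 <= Y <= 7)
p_interval <- pnorm(7, mu, sigma) - pnorm(3, mu, sigma)   # about 0.68

# Upper-tail probability Pr(Y > 0)
p_upper <- pnorm(0, mu, sigma, lower.tail = FALSE)

c(p_direct, p_standard, p_interval, p_upper)
```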

### The Chi-Squared distribution

The **chi-squared distribution** is the distribution of the sum of m squared independent standard normal random variables
- Let $Z_1, Z_2,...,Z_m$ be m independent standard normal random variables
- The sum of the squares of these random variables has a chi-squared distribution with m degrees of freedom:
$$\sum_{i=1}^{m}Z_i^2\sim\chi^2_m$$



- The chi-squared distribution is used when testing hypotheses in econometrics
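
The definition can be illustrated by simulation: draw m independent standard normals, square and add them, and compare the result with R's built-in chi-squared functions. This is a rough sketch; the seed, m = 3 and the number of draws are arbitrary choices.

```r
set.seed(421)        # arbitrary seed, for reproducibility only
m       <- 3         # degrees of freedom
n_draws <- 100000

# Each row holds m independent N(0, 1) draws
Z <- matrix(rnorm(n_draws * m), ncol = m)

# Sum of squares across each row: one simulated chi-squared(m) draw per row
W <- rowSums(Z^2)

# The mean of a chi-squared(m) variable is m, so this should be close to 3
mean(W)

# 95th percentile from the built-in quantile function, and the share of
# simulated draws below it (should be close to 0.95)
crit        <- qchisq(0.95, df = m)
share_below <- mean(W <= crit)
share_below
```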

### The Student t distribution
Let Z be a standard normal random variable and W a chi-squared distributed random variable with m degrees of freedom, with Z and W independent.

- The Student t-distribution with m degrees of freedom is defined as the distribution of
$$\frac{Z}{\sqrt{W/m}}$$


- The _t distribution_ has fatter tails than the standard normal distribution.
- When $m \ge 30$ it is well approximated by the standard normal distribution.
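
Both points can be seen by comparing critical values. The sketch below uses `qt()` and `qnorm()` to compare the 97.5th percentile of the t distribution with that of the standard normal for a few illustrative degrees of freedom.

```r
df <- c(2, 5, 10, 30, 100)

# 97.5th percentile of the t distribution for each degrees-of-freedom value
t_crit <- qt(0.975, df = df)

# Same percentile of the standard normal (about 1.96)
z_crit <- qnorm(0.975)

# Fatter tails mean larger critical values; the gap shrinks as df grows
data.frame(df = df, t_crit = t_crit, gap = t_crit - z_crit)
```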

### F distribution
Let W be a chi-squared random variable with m degrees of freedom and V a chi-squared random variable with n degrees of freedom, with W and V independent.

The F-distribution with m and n degrees of freedom $F_{m,n}$ is the distribution of the random variable $\frac{W/m}{V/n}$
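
As with the chi-squared case, the definition can be illustrated by simulation. The sketch below builds the ratio from two independent chi-squared draws and compares it with R's built-in `qf()`. The seed, degrees of freedom and number of draws are arbitrary choices.

```r
set.seed(421)        # arbitrary seed, for reproducibility only
m       <- 3         # numerator degrees of freedom
n       <- 20        # denominator degrees of freedom
n_draws <- 100000

# Independent chi-squared draws
W <- rchisq(n_draws, df = m)
V <- rchisq(n_draws, df = n)

# Ratio of the two, each divided by its degrees of freedom
F_draws <- (W / m) / (V / n)

# 95th percentile of F(m, n) and the share of simulated ratios below it
crit        <- qf(0.95, df1 = m, df2 = n)
share_below <- mean(F_draws <= crit)
share_below   # should be close to 0.95
```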

