# Week 7: Estimation

## Estimation

This week covers the principles of deriving estimators, the standard estimators of the mean and the standard deviation, and their basic properties.


### Reminder
- An estimator $\hat{\alpha}$ is a function of the sample data, so it is itself a random variable; the value it takes in a particular sample is the estimate.
    - e.g. the sample mean $\hat{\mu}$ in a binomial experiment



- We derive the estimator $\hat{\alpha}$ in one of two ways:
    - Method of moments
    - Least squares

- Measures of central tendency:
    - Mean: the average of all the values in a sample.

    - Mode: the most frequently occurring value in a distribution.

    - Median: the middle value of the sorted data.



- The relative positions of these measures for skewed data:

![central](../central.png)
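
As a small illustration of the three measures, the sketch below computes them for a made-up, right-skewed sample (the numbers are hypothetical and chosen only so the arithmetic is easy to follow). For data like this the typical ordering is mode, then median, then mean.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

mean = statistics.mean(data)      # sum of values divided by the count
median = statistics.median(data)  # average of the two middle sorted values
mode = statistics.mode(data)      # most frequent value

print(f"mode={mode}, median={median}, mean={mean}")
```

The long right tail pulls the mean above the median, while the mode stays at the most common value.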


### Definitions

Unbiasedness: 
- The expected value of the estimator is equal to the true value of the parameter:
    - for the mean: $\mathop{\mathbb{E}}(\hat{\mu})=\mu$,
    - for the variance: $\mathop{\mathbb{E}}({\hat{\sigma}^2})=\sigma^2$.
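
As a short worked check for the mean, take the sample mean $\hat{\mu} = \frac{1}{N}\sum_{i=1}^N Y_i$ and assume each observation has $\mathop{\mathbb{E}}(Y_i)=\mu$. Linearity of expectation then gives:

$$
    \mathop{\mathbb{E}}(\hat{\mu}) = \mathop{\mathbb{E}}\left(\frac{1}{N}\sum_{i=1}^N Y_i\right) = \frac{1}{N}\sum_{i=1}^N \mathop{\mathbb{E}}(Y_i) = \frac{1}{N}\, N\mu = \mu
$$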

Standard Deviation (SD):
- Measures the dispersion of the individual data values around the mean.

Standard Error (SE):
- Measures how far the sample mean is likely to be from the true population mean.




**Q16.**

**Suppose you have a sample of data, $Y_i, \quad i=1,2, \dots, N$, where $Y_i \sim \mathcal{IN}(\mu,\,\sigma^{2})$.**

`````{topic} (a) Explain what $Y_i \sim \mathcal{IN}(\mu,\,\sigma^{2})$ means.

Each $Y_i$ is an independent draw from a normal distribution with expected value $\mu$ and variance $\sigma^2$.

`````

`````{topic} (b)  How would you obtain unbiased estimates of $\mu$ and $\sigma^2$? Explain what unbiased means. 
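An estimator is unbiased if its expected value equals the true value of the parameter, so it is correct on average over repeated samples.

-   The unbiased estimator of $\mu$ is the sample mean: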

$$
    \hat{\mu} = \sum_{i=1}^N \frac{Y_i}{N},
$$

-   The unbiased estimator of $\sigma^2$ is the sample variance:

$$
   \hat{\sigma}^2 = \sum_{i=1}^N \frac{(Y_i - \bar{Y})^2}{N-1},
$$

where $\bar{Y} = \hat{\mu}$ is the sample mean.

`````
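
The role of the $N-1$ divisor can be seen in a quick simulation. The sketch below is illustrative only: the values of `mu`, `sigma`, `n` and `reps` are arbitrary assumptions, and it uses Python's standard library rather than the course notebook's pandas setup. It draws many small normal samples and averages two variance estimators, one dividing by $N-1$ and one dividing by $N$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed settings for the illustration (true variance is sigma**2 = 4)
mu, sigma, n, reps = 0.0, 2.0, 5, 20000

unbiased, biased = [], []
for _ in range(reps):
    y = [random.gauss(mu, sigma) for _ in range(n)]
    unbiased.append(statistics.variance(y))  # divides by N - 1
    biased.append(statistics.pvariance(y))   # divides by N

avg_unbiased = sum(unbiased) / reps
avg_biased = sum(biased) / reps
print(f"divide by N-1: {avg_unbiased:.3f}, divide by N: {avg_biased:.3f}")
```

Across repeated samples the $N-1$ version should average close to the true variance of 4, while the $N$ version should average close to $\frac{N-1}{N}\sigma^2 = 3.2$, which is the downward bias the correction removes.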

`````{topic} (c) How would you estimate the standard error of your estimate of $\mu$ ?
Recall from week 5 that we were asked to generate a $50 \times 15$ matrix of random uniforms.

-   The standard error of $\hat{\mu}$ can be estimated as:

$$
    SE(\hat{\mu}) = \sqrt{ \frac{s^2}{N}} = \frac{s}{\sqrt{N}},
$$

where $s^2 = \hat{\sigma}^2$ is the unbiased sample variance from part (b).

`````
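
The formula $s/\sqrt{N}$ can be applied directly to a single sample. The following is a minimal sketch on a hypothetical sample of eight values (the data are invented for illustration); `statistics.stdev` uses the $N-1$ divisor, matching the unbiased variance estimator.

```python
import math
import statistics

# Hypothetical sample, invented for illustration
sample = [52.0, 47.0, 55.0, 49.0, 61.0, 44.0, 50.0, 58.0]

n = len(sample)
mu_hat = statistics.mean(sample)  # estimate of mu
s = statistics.stdev(sample)      # sample SD (divides by N - 1)
se = s / math.sqrt(n)             # estimated standard error of mu_hat

print(f"mu_hat={mu_hat}, s={s:.4f}, SE={se:.4f}")
```

Note that `s` describes the spread of the individual observations, while `se` describes the uncertainty in the estimated mean and shrinks as `n` grows.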

`````{topic} (d) Suppose that the distribution of your sample was not normal but highly skewed. Explain what this means and discuss what other measures of central tendency that you might use. 
 A skewed distribution is not symmetrical: one tail is longer than the other. In that case the mean, median and mode will differ, and you may prefer the median or the mode, since the mean is sensitive to the outliers that often accompany very skewed distributions.

`````
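
To see how sensitive the mean is to an outlier, the sketch below adds one extreme value to a small hypothetical set of marks (both the marks and the outlier are made up) and compares how far the mean and the median move.

```python
import statistics

# Hypothetical marks, symmetric around 50
marks = [48, 49, 50, 51, 52]
with_outlier = marks + [500]  # one extreme value added

mean_before = statistics.mean(marks)
median_before = statistics.median(marks)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

A single outlier moves the mean from 50 to 125, while the median only moves from 50 to 50.5.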

**Q17.** 

`````{topic} Marks on an exam, $Y_i \sim \mathcal{IN}(50,10^{2})$, are independently normally distributed with expected value 50 and standard deviation 10. In a class of 16, what is the probability that the average mark is greater than 53?

-   The standard error of the average is $\frac{10}{\sqrt{16}} = 2.5$.

-   $Z = \frac{(53-50)}{2.5} = 1.2$.

-   From tables, $P(Z>1.2) = 0.1151$.

The probability of an average greater than 53 is 11.51%.

`````
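
The table lookup in Q17 can be checked numerically. This is a minimal sketch using `NormalDist` from Python's standard library (available from Python 3.8) in place of printed tables; the variable names are just for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 50, 10, 16

se = sigma / math.sqrt(n)    # standard error of the class average
z = (53 - mu) / se           # standardised distance of 53 from the mean
p = 1 - NormalDist().cdf(z)  # upper-tail probability P(Z > z)

print(f"SE={se}, Z={z}, P={p:.4f}")
```

The result agrees with the tabulated value of 0.1151 to four decimal places.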


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate a matrix of Uniform(0, 1) draws as a pandas DataFrame
# (15 rows by 50 columns as coded here)
rows = 15
columns = 50
random_matrix = pd.DataFrame(np.random.uniform(0, 1, (rows, columns)))
```


```python
# First row of the matrix: one draw from each of the 50 columns
random_matrix.iloc[0, :]
```

0     0.897443
1     0.136783
2     0.676054
3     0.442156
4     0.484211
5     0.835267
6     0.324832
7     0.583233
8     0.812069
9     0.541406
10    0.973640
11    0.562685
12    0.117128
13    0.845065
14    0.054975
15    0.397418
16    0.133289
17    0.426002
18    0.162413
19    0.560183
20    0.800143
21    0.956463
22    0.702957
23    0.081454
24    0.545835
25    0.306865
26    0.931041
27    0.116734
28    0.044032
29    0.031046
30    0.285822
31    0.368590
32    0.245715
33    0.851877
34    0.381274
35    0.220157
36    0.836283
37    0.684478
38    0.327825
39    0.727890
40    0.055627
41    0.894620
42    0.728910
43    0.921497
44    0.452578
45    0.568864
46    0.185779
47    0.971980
48    0.569079
49    0.084482
Name: 0, dtype: float64

```python
# axis=0 averages down the rows, so this is the mean of each of the
# 50 columns (15 draws each), despite the variable name
row_means = random_matrix.mean(axis=0)
row_means
```

0     0.551019
1     0.570050
2     0.527971
3     0.523724
4     0.610550
5     0.586588
6     0.489707
7     0.661555
8     0.476817
9     0.370671
10    0.485703
11    0.511375
12    0.513262
13    0.507818
14    0.284509
15    0.373475
16    0.537019
17    0.625274
18    0.493758
19    0.444939
20    0.498657
21    0.447068
22    0.558953
23    0.512573
24    0.467357
25    0.466224
26    0.401366
27    0.463509
28    0.446356
29    0.363717
30    0.515757
31    0.513608
32    0.457843
33    0.536012
34    0.474204
35    0.408904
36    0.534391
37    0.560532
38    0.438903
39    0.642480
40    0.471351
41    0.566263
42    0.567396
43    0.549365
44    0.406210
45    0.433338
46    0.552597
47    0.551763
48    0.490374
49    0.479130
dtype: float64

```python
# Sample variance of each column (pandas divides by N-1 by default)
variance = random_matrix.var(axis=0)
variance
```

0     0.102154
1     0.081637
2     0.082269
3     0.078033
4     0.038020
5     0.109770
6     0.061527
7     0.062156
8     0.028212
9     0.089125
10    0.111703
11    0.072542
12    0.098429
13    0.082870
14    0.053761
15    0.080756
16    0.096767
17    0.075762
18    0.086519
19    0.102341
20    0.097425
21    0.106944
22    0.065200
23    0.062813
24    0.087628
25    0.077733
26    0.105893
27    0.065116
28    0.068401
29    0.094905
30    0.052174
31    0.084267
32    0.032814
33    0.088372
34    0.053556
35    0.084009
36    0.116613
37    0.086017
38    0.061684
39    0.060121
40    0.137334
41    0.074953
42    0.076441
43    0.085291
44    0.107163
45    0.072596
46    0.068619
47    0.111103
48    0.080504
49    0.112671
dtype: float64

```python
# Standard deviation of the 50 column means
std_dev = row_means.std()
std_dev
```

0.07409081838473781
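
As a rough check on this value, assume the entries are Uniform(0, 1) draws as generated earlier, so that each of the 50 means averages $N = 15$ values. A Uniform(0, 1) variable $U$ has variance $1/12$, which gives a theoretical standard error of:

$$
    \operatorname{Var}(U) = \frac{(1-0)^2}{12} = \frac{1}{12}, \qquad SE(\bar{U}) = \sqrt{\frac{1/12}{15}} = \sqrt{\frac{1}{180}} \approx 0.0745
$$

The simulated spread of the means, about 0.0741, is close to this.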

```python
# Standard error of the average of the 50 column means
std_err = row_means.std() / np.sqrt(columns)
std_err
```

0.010478024020701806