# Central Limit Theorem

## Basic Idea
>  In probability theory, the central limit theorem (CLT) states that the distribution of a sample variable approximates a normal distribution (i.e., a “bell curve”) as the sample size becomes larger, assuming that all samples are identical in size, and regardless of the population's actual distribution shape.

#### Law of Large Numbers
$$\text{As } n \text{ grows large, } \bar{X} \rightarrow \mu$$
$$\text{As } n \text{ grows large, } \mu_{\bar{X}} \rightarrow \mu$$
$$n \geq 30 \implies \mu_{\bar{X}} \approx \mu$$
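The convergence of the sample mean can be illustrated with a small simulation. This is only a sketch: the exponential population, its mean `mu = 2.0`, and the helper name `sample_mean` are assumptions chosen for illustration.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the run is repeatable

mu = 2.0  # assumed population mean of an exponential population


def sample_mean(n):
    """Mean of n draws from an exponential population with mean mu."""
    return fmean(random.expovariate(1 / mu) for _ in range(n))


# The sample mean drifts toward mu as the sample size grows
for n in (5, 30, 1000, 100_000):
    print(n, round(sample_mean(n), 3))
```

For small `n` the printed means can be far from 2.0; for large `n` they settle close to it.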


## Standard Error of the Mean
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
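As a rough check of this formula, one can draw many samples, record each sample mean, and compare the spread of those means with $\sigma/\sqrt{n}$. The uniform population $U(0, 12)$, the sample size, and the number of trials below are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

a, b = 0.0, 12.0            # assumed uniform population U(a, b)
sigma = (b - a) / sqrt(12)  # population standard deviation of U(a, b)
n = 36                      # size of each sample
trials = 5_000              # number of samples drawn

# One sample mean per trial
means = [fmean(random.uniform(a, b) for _ in range(n)) for _ in range(trials)]

observed = pstdev(means)     # spread of the simulated sample means
predicted = sigma / sqrt(n)  # standard error from the formula
print(round(observed, 4), round(predicted, 4))
```

The two printed numbers should agree to roughly two decimal places; the match improves as `trials` grows.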


## Z-score
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$$
$$Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$$
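A minimal sketch of the second form of the formula. The function name `z_score` and the numbers in the usage line are hypothetical.

```python
from math import sqrt


def z_score(x_bar, mu, sigma, n):
    """Standardize a sample mean using the standard error sigma / sqrt(n)."""
    return (x_bar - mu) / (sigma / sqrt(n))


# Hypothetical numbers: mu = 100, sigma = 15, a sample of 36 with mean 104
z = z_score(104, 100, 15, 36)
print(round(z, 2))  # 1.6
```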


# Recall
## Uniform Distribution
Suppose $X \sim U(a,b)$. Then the density function is
$$f(x) = \frac{1}{b-a}, \quad a \leq x \leq b$$
Mean
$$\mu = \frac{a+b}{2}$$
Standard Deviation
$$\sigma = \frac{b-a}{\sqrt{12}}$$
##  Exponential Distribution
$$f(x) = \lambda e^{-\lambda x}$$
where,
$$\lambda = \frac{1}{\mu}$$
and
$$\mu = \sigma$$
### Cumulative Distribution Function
Area to the left of a value $x$
$$A_l = P(X < x) = 1 - e^{-\lambda x}$$
Area to the right of a value $x$
$$A_r = P(X > x) = e^{-\lambda x}$$
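The two areas can be computed directly from the mean. This is a small sketch; the function names and the 4-minute mean waiting time are assumptions for illustration.

```python
from math import exp


def expon_left(x, mu):
    """P(X < x) for an exponential distribution with mean mu."""
    lam = 1 / mu  # rate parameter, lambda = 1 / mu
    return 1 - exp(-lam * x)


def expon_right(x, mu):
    """P(X > x) for an exponential distribution with mean mu."""
    lam = 1 / mu
    return exp(-lam * x)


# Hypothetical: mean waiting time of 4 minutes, evaluated at x = 2
left = expon_left(2, 4)    # about 0.3935
right = expon_right(2, 4)  # about 0.6065
print(round(left, 4), round(right, 4))
```

The two areas always sum to 1, which is a quick sanity check on any hand calculation.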
## Normal Distribution vs Sampling Distribution
### Normal
$$X \sim N(\mu, \sigma)$$
Probability Area
$$P(X < a) = P\left(z < \frac{a - \mu}{\sigma}\right)$$
### Sampling
$$\bar{X} \sim N(\mu, \frac{\sigma}{\sqrt{n}})$$
Probability Area
$$P(\bar{X} < a) = P\left(z < \frac{a - \mu}{\frac{\sigma}{\sqrt{n}}}
\right)$$
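The only difference between the two cases is the denominator of the z-score. The sketch below uses `statistics.NormalDist` from the standard library; the helper `prob_below` and the population values ($\mu = 50$, $\sigma = 10$) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_below(a, mu, sigma, n=1):
    """P(X < a) when n == 1, or P(X-bar < a) for the mean of n observations."""
    z = (a - mu) / (sigma / sqrt(n))
    return NormalDist().cdf(z)  # standard normal area to the left of z


# Hypothetical population: mu = 50, sigma = 10
single = prob_below(55, 50, 10)        # one observation, z = 0.5
mean25 = prob_below(55, 50, 10, n=25)  # mean of 25 observations, z = 2.5
print(round(single, 4), round(mean25, 4))  # about 0.6915 and 0.9938
```

A sample mean is much less likely than a single observation to land far from $\mu$, because its standard deviation shrinks by a factor of $\sqrt{n}$.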

## Confidence Intervals
### Margin of Error
$$EBM = \frac{z\sigma}{\sqrt{n}}$$
The critical value $z$ is found from the left-tail area $A_l$:

$$A_l = \frac{CL + 1}{2}$$
where $z$ satisfies
$$P(Z < z) = A_l$$
### Confidence Interval
$$ \bar{x} - EBM \leq \mu \leq \bar{x} + EBM $$
> If you increase the confidence level, the margin of error also increases.
### Confidence Level and $\alpha$
$\alpha$ is the probability that the confidence interval does not contain the population mean
$$CL = 1 - \alpha$$
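The formulas in this section can be combined into a short sketch. The function names `z_critical` and `confidence_interval` and the sample values are hypothetical; `NormalDist().inv_cdf` is used in place of a printed z-table.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(cl):
    """Critical z for a two-sided interval at confidence level cl."""
    a_left = (cl + 1) / 2  # area to the left of z
    return NormalDist().inv_cdf(a_left)


def confidence_interval(x_bar, sigma, n, cl):
    """Return (lower, upper, EBM) for the population mean, sigma known."""
    ebm = z_critical(cl) * sigma / sqrt(n)
    return x_bar - ebm, x_bar + ebm, ebm


# Hypothetical sample: x_bar = 100, sigma = 15, n = 36
for cl in (0.90, 0.95, 0.99):
    lower, upper, ebm = confidence_interval(100, 15, 36, cl)
    print(cl, round(lower, 2), round(upper, 2), round(ebm, 2))
```

Running the loop shows the margin of error growing as the confidence level rises.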
### Example
1. The test scores in a physics class are normally distributed with a standard deviation of 5.4. A random sample of 50 scores has a mean of 79. (a) Find a 95% confidence interval for the population mean test score. (b) What is the value of the margin of error?

#### Answer:
$A_l = \frac{0.95 + 1}{2} = 0.975$

The corresponding critical value is $z = 1.96$ (see the table of common values that follows), so
$$EBM = 1.96\frac{5.4}{\sqrt{50}} = 1.497$$
$$77.503 \leq \mu \leq 80.497$$

        
        

In [1]:
import pandas as pd
import numpy as np

In [3]:
df = pd.DataFrame({
    'CL': ['90%', '95%', '98%', '99%'],
    'Z-score': [1.645, 1.96, 2.33, 2.576],  # built from a dict so the z-values stay numeric
})
df

| |CL|Z-score|
|--|--|--|
|0|90%|1.645|
|1|95%|1.96|
|2|98%|2.33|
|3|99%|2.576|
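Example 1 can be checked numerically with the 95% z-value from the table. This is a sketch using only the numbers stated in the question; the variable names are illustrative.

```python
from math import sqrt

x_bar = 79    # sample mean from the question
sigma = 5.4   # population standard deviation from the question
n = 50        # sample size from the question
z = 1.96      # 95% critical value from the table

ebm = z * sigma / sqrt(n)                 # margin of error
lower, upper = x_bar - ebm, x_bar + ebm   # confidence interval endpoints
print(round(ebm, 3), round(lower, 3), round(upper, 3))  # 1.497 77.503 80.497
```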


## Sample Size given proportion
$$n = \frac{z^2\hat{p}(1-\hat{p})}{E^2}$$
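where $E$ is the margin of error.
### Maximum Sample Size
The product $\hat{p}(1-\hat{p})$ is largest at $\hat{p} = 0.5$, so using $0.5$ gives the most conservative sample size when no estimate of the proportion is available:
$$y = \hat{p}(1-\hat{p})$$
$$\frac{dy}{d\hat{p}} = 1 - 2\hat{p} = 0$$
$$\hat{p} = 0.5$$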
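A small sketch of the sample-size formula, with a grid search confirming where $\hat{p}(1-\hat{p})$ peaks. The function name `sample_size` and the 3% margin in the usage line are assumptions; the result is rounded up because a sample size must be a whole number.

```python
from math import ceil


def sample_size(z, p_hat, e):
    """Smallest whole n that achieves margin of error e for proportion p_hat."""
    return ceil(z**2 * p_hat * (1 - p_hat) / e**2)


# Scan p_hat over 0.01 .. 0.99 to find where p_hat * (1 - p_hat) is largest
grid = [i / 100 for i in range(1, 100)]
worst = max(grid, key=lambda p: p * (1 - p))
print(worst)  # 0.5

# Most conservative sample size for a 3% margin at 95% confidence
print(sample_size(1.96, 0.5, 0.03))  # 1068
```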

## Student's T Distribution
### Criteria
Use the t distribution when the sample size is small and you know only the sample standard deviation $s$, not the population standard deviation $\sigma$.
### Formula
$$
\mu \rightarrow \bar{x} \pm t_{v, \frac{\alpha}{2}} \frac{s}{\sqrt{n}}
$$
#### The Degree of Freedom
$$v = n - 1$$
#### Recall
$$
    \alpha = 1 - CL
    \tag{1}
$$
$$
    \frac{\alpha}{2} = \frac{1-CL}{2}
    \tag{2}
$$

### Example
1. A chemistry class at a certain university has 5000 students. The scores of 10 students were selected at random and are shown in the table below. (a) Calculate the mean and standard deviation of the sample. (b) Calculate the margin of error (EBM). (c) Construct a 90% confidence interval for the mean score of all the students in the chemistry class.

|76|84|69|92|58|
|--|--|--|--|--|
|89|73|97|85|77|

### Answer
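Sample mean $\bar{x}$
$$
\bar{x} = \frac{800}{10} = 80
$$
Sample standard deviation $s$
$$
s = \sqrt{\frac{16+16+121+144+484+81+49+289+25+9}{10-1}}
$$
$$
s = 11.709
$$
EBM
$$
EBM = t_{(9, 0.05)}\frac{s}{\sqrt{n}} = 1.833\frac{11.709}{\sqrt{10}} = 6.787
$$
Confidence interval
$$
80 - 6.787 \leq \mu \leq 80 + 6.787 \implies 73.213 \leq \mu \leq 86.787
$$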
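The numbers in this answer can be reproduced with the standard library. In this sketch the critical value $t_{(9, 0.05)} = 1.833$ is hard-coded from the worked answer, since the standard library has no t-distribution quantile function; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

scores = [76, 84, 69, 92, 58, 89, 73, 97, 85, 77]  # the 10 sampled scores

n = len(scores)
x_bar = mean(scores)  # sample mean
s = stdev(scores)     # sample standard deviation (divides by n - 1)
t_crit = 1.833        # t value for v = 9, alpha/2 = 0.05, taken from the answer above

ebm = t_crit * s / sqrt(n)               # margin of error
lower, upper = x_bar - ebm, x_bar + ebm  # 90% confidence interval
print(round(s, 3), round(ebm, 3))        # 11.709 6.787
print(round(lower, 3), round(upper, 3))  # 73.213 86.787
```

`statistics.stdev` uses the $n - 1$ denominator, matching the degrees of freedom $v = n - 1$; `statistics.pstdev` would divide by $n$ and give a smaller value.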

## Approximate  Binomial Distribution using Normal Distribution
### Criteria
> If $np(1-p) \geq 10$, the binomial distribution can be approximated by a normal distribution.
### Question
1. 800 randomly selected college students at university XYZ were asked if they own a laptop. 584 responded yes to the survey. (a) Can we approximate the binomial distribution with a normal distribution? (b) Construct a 95% confidence interval to estimate the true population proportion of students at university XYZ who own laptops and determine the margin of error. (c) What sample size is needed to estimate the true proportion with a 2% margin of error at a 90% confidence level?
### Answer
a. Since $\hat{p} \approx p$, we can estimate $np(1-p) \approx n\hat{p}(1-\hat{p})$.
$$p \approx \hat{p} = \frac{584}{800} = 0.73$$
$$n\hat{p}(1-\hat{p}) = 800(0.73)(1-0.73) = 157.68 \geq 10$$
$\therefore$ We can approximate the binomial distribution with a normal distribution.
## Error bound of Proportion
$$EBP = z\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
b. Because CL = 95%, $z = 1.96$.
$$EBP = 1.96\sqrt{\frac{(0.73)(0.27)}{800}} = 0.0308$$
$$0.6992 \leq p \leq 0.7608$$
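c. We know that
$$
n = z^2\frac{\hat{p}(1-\hat{p})}{E^2}
$$

$$
n = (1.645)^2 \frac{(0.73)(0.27)}{(0.02)^2} = 1333.39 \approx 1334
$$
The result is rounded up because the sample size must be a whole number that meets the required margin of error.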
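All three parts of this question can be verified in a few lines. This is a sketch built from the numbers in the question and the z-values 1.96 and 1.645 used in the answer; the variable names are illustrative.

```python
from math import ceil, sqrt

n, yes = 800, 584
p_hat = yes / n  # sample proportion, 0.73

# (a) normal-approximation check: n * p_hat * (1 - p_hat) should be at least 10
check = n * p_hat * (1 - p_hat)

# (b) 95% confidence interval for the true proportion
z95 = 1.96
ebp = z95 * sqrt(p_hat * (1 - p_hat) / n)  # error bound for the proportion
lower, upper = p_hat - ebp, p_hat + ebp

# (c) sample size for a 2% margin of error at 90% confidence, rounded up
z90 = 1.645
needed = ceil(z90**2 * p_hat * (1 - p_hat) / 0.02**2)

print(round(check, 2), round(ebp, 4), round(lower, 4), round(upper, 4), needed)
```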
### Question
2. 500 residents in town ABC were asked if they own a car. Of the 500 surveyed, 405 responded yes. (a) Construct a 99% confidence interval to estimate the true population proportion of residents who own a car in town ABC. (b) What sample size is needed to estimate the true proportion with a 3% margin of error at a 98% confidence level?
### Answer
a. $\hat{p} = \frac{405}{500} = 0.81$\
$z = 2.575$\
$E = z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$\
$E = 2.575\sqrt{\frac{(0.81)(0.19)}{500}} = 0.0452$
$$
0.7648 \leq p \leq 0.8552
$$
b. For a 98% confidence level, $z = 2.33$.

$$
n = \frac{z^2\hat{p}(1-\hat{p})}{E^2} = \frac{(2.33)^2(0.81)(0.19)}{(0.03)^2} = 928.34 \approx 929
$$

### Question
3. A certain number of people in the state of Texas were asked if they own a house. The true proportion is in the range $0.62383 \leq p \leq 0.67217$ at a 95% confidence level. (a) What is the value of the sample proportion? (b) Calculate the margin of error (error bound for the proportion). (c) How many individuals were surveyed? (d) How many responded yes to the survey?
### Answer
a. $\hat{p} = \frac{0.62383+0.67217}{2} = 0.6480$\
b. $E = 0.67217 - 0.6480 = 0.02417$\
c. 
$$n = \frac{z^2\hat{p}\hat{q}}{E^2} = \frac{(1.96)^2(0.6480)(0.352)}{(0.02417)^2} = 1499.9491 \approx 1500$$
d.
$$
1500 \times 0.6480 = 972
$$
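Working backwards from a reported interval can be sketched as follows. The variable names are illustrative. Here `round` is used rather than rounding up, because the goal is to recover the survey's actual size, not to plan a minimum one.

```python
lower, upper = 0.62383, 0.67217  # reported 95% confidence interval
z = 1.96                         # 95% critical value

p_hat = (lower + upper) / 2  # (a) the interval is centred on the sample proportion
e = (upper - lower) / 2      # (b) the margin of error is half the interval width

# (c) solve E = z * sqrt(p_hat * (1 - p_hat) / n) for n
n = round(z**2 * p_hat * (1 - p_hat) / e**2)

# (d) number of "yes" responses
yes = round(n * p_hat)

print(round(p_hat, 4), round(e, 5), n, yes)  # 0.648 0.02417 1500 972
```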

# Chebyshev's Theorem
## Why do we use Chebyshev's Theorem?
> The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then is stated in terms of approximations. A result that applies to every data set is known as Chebyshev’s Theorem.
## Basic Idea
For any numerical data set,

* at least $3/4$ of the data lie within two standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm 2s$ for samples and with endpoints $\mu \pm 2\sigma$ for populations;
* at least $8/9$ of the data lie within three standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm 3s$ for samples and with endpoints $\mu \pm 3\sigma$ for populations;
* at least $1 - 1/k^2$ of the data lie within $k$ standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm ks$ for samples and with endpoints $\mu \pm k\sigma$ for populations, where $k$ is any positive whole number that is greater than $1$.
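The bounds listed above can be compared with what a deliberately non-bell-shaped data set actually does. This is a sketch: the simulated exponential data and the helper names `chebyshev_bound` and `fraction_within` are assumptions for illustration.

```python
import random
from statistics import fmean, pstdev


def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    return 1 - 1 / k**2


def fraction_within(data, k):
    """Observed fraction of data within k standard deviations of the mean."""
    m, s = fmean(data), pstdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


random.seed(2)  # fixed seed so the run is repeatable
data = [random.expovariate(1.0) for _ in range(10_000)]  # skewed, not bell-shaped

for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 4), round(fraction_within(data, k), 4))
```

The observed fraction is never below the bound, and for most data sets it is well above it; Chebyshev's Theorem is a guarantee, not an estimate.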