# Lesson 2: Statistical Techniques
In this lab, we will explore data and sampling distributions, statistics, Maximum Likelihood Estimation (MLE), Markov and Chebyshev inequalities, the Law of Large Numbers (LLN), and the Central Limit Theorem (CLT).


- Data and sampling distributions
- Statistic
- MLE
- Markov and Chebyshev inequalities
- LLN
- CLT



## **Data and Sampling Distributions**

---


- ### Data Distribution
The distribution of the original dataset.


- ### Sampling Distribution
The distribution of a statistic calculated on many samples drawn from the original dataset.

### Important Note:
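A sampling distribution is not the same as a sample distribution. The sample distribution describes the values within a single sample; the sampling distribution describes how a statistic varies across many samples.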

![Data Distribution](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*jC44QqrrZM2CiyAPmclwuQ.png)
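
The difference between the three distributions can be seen in a small simulation. This is a minimal sketch that uses only the standard library; the exponential population, the sample size of 50, and the 2,000 repetitions are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: 10,000 values from an exponential distribution with mean 10
population = [random.expovariate(1 / 10) for _ in range(10_000)]

# Sample distribution: the raw values inside ONE sample of size 50
one_sample = random.sample(population, 50)

# Sampling distribution: the value of a statistic (here the mean) over MANY samples
sample_means = [
    statistics.fmean(random.sample(population, 50)) for _ in range(2_000)
]

print("population std:     ", round(statistics.pstdev(population), 2))
print("one sample std:     ", round(statistics.stdev(one_sample), 2))
print("std of sample means:", round(statistics.stdev(sample_means), 2))
```

The spread of a single sample resembles the spread of the population, while the sample means cluster much more tightly around the population mean.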




## **Statistic**

---


A statistic is any function of the random variables constituting a random sample (Walpole R. E. et al. Probability and Statistics for Engineers and Scientists).

**Examples**: mean, median, standard deviation.

- ## Sampling Distribution of the Mean
If $X_1, \dots, X_n$ are independent and each $X_i$ has a normal distribution with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{X} = \frac{1}{n} (X_1 + \dots + X_n)$ has a normal distribution with mean $\mu_{\bar{X}} = \mu$ and variance $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$.
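
This result can be checked empirically. The sketch below draws many normal samples and compares the mean and standard deviation of the sample means with $\mu$ and $\sigma / \sqrt{n}$. The values of `mu`, `sigma`, `n`, and `num_samples` are illustrative assumptions, not taken from the exercises.

```python
import math
import random
import statistics

random.seed(1)

mu, sigma, n = 50.0, 10.0, 16  # illustrative population parameters and sample size
num_samples = 5_000            # number of repeated samples

# Compute the mean of each simulated sample
means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(means), 3), "expected:", mu)
print("std of sample means: ", round(statistics.stdev(means), 3),
      "expected:", sigma / math.sqrt(n))
```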






## **Maximum Likelihood Estimation (MLE)**

---


MLE estimates the parameters of an assumed distribution by choosing the parameter values that maximize the likelihood of the observed data.
### EXERCISE 1:

  1. Write the PDF;
  2. Write the likelihood function (the product of the PDF evaluated at each observed value);
  3. Write the log likelihood function;
  4. Calculate the derivative of the log likelihood function with respect to the parameters;
  5. Set the derivative equal to zero and solve for the parameters.
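
As a worked illustration of these five steps, here is a sketch for the exponential distribution with rate $\lambda$. The choice of distribution is an assumption made only for illustration, and the observations $x_1, \dots, x_n$ are assumed to be independent.

$$
\begin{aligned}
f(x; \lambda) &= \lambda e^{-\lambda x}, \quad x \geq 0 \\
L(\lambda) &= \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i} \\
\ln L(\lambda) &= n \ln \lambda - \lambda \sum_{i=1}^{n} x_i \\
\frac{d}{d\lambda} \ln L(\lambda) &= \frac{n}{\lambda} - \sum_{i=1}^{n} x_i \\
\hat{\lambda} &= \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}
\end{aligned}
$$

The second derivative, $-n / \lambda^2$, is negative, so the stationary point is a maximum.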

## **Markov and Chebyshev Inequalities**

---


### Markov Inequality
For a non-negative random variable $X$ and any $a > 0$:
$P(X \geq a) \leq \frac{E[X]}{a}$

### Chebyshev Inequality
For any $k > 0$:
$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}$
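
Both inequalities can be checked on simulated data. The sketch below compares each empirical probability with its bound; the exponential data and the values of `a` and `k` are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(2)

# Hypothetical non-negative data: 100,000 draws from an exponential distribution
xs = [random.expovariate(1.0) for _ in range(100_000)]
mean = statistics.fmean(xs)
std = statistics.pstdev(xs)

# Markov: P(X >= a) <= E[X] / a
a = 3.0
markov_lhs = sum(x >= a for x in xs) / len(xs)
markov_rhs = mean / a

# Chebyshev: P(mu - k*sigma < X < mu + k*sigma) >= 1 - 1/k^2
k = 2.0
cheb_lhs = sum(mean - k * std < x < mean + k * std for x in xs) / len(xs)
cheb_rhs = 1 - 1 / k**2

print(f"Markov:    {markov_lhs:.4f} <= {markov_rhs:.4f}")
print(f"Chebyshev: {cheb_lhs:.4f} >= {cheb_rhs:.4f}")
```

The bounds hold but are typically far from tight, because they use only the mean (Markov) or the mean and standard deviation (Chebyshev) and nothing else about the distribution.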



### EXERCISE 2:

1. What are $E[X]$, $\mu$, $\sigma$?
2. What do these formulas represent?






## **Law of Large Numbers (LLN)**

---


Given a collection of i.i.d. samples from a random variable with finite mean $\mu$, the sample mean $\bar{X}_n$ converges in probability to $\mu$. That is, for any $\epsilon > 0$:
$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1$$
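
A short simulation makes the statement concrete. This sketch rolls a fair six-sided die (an assumption chosen for illustration, with expected value 3.5) and prints the running mean at increasing sample sizes.

```python
import random

random.seed(3)

mu = 3.5  # expected value of a fair six-sided die
checkpoints = (10, 100, 1_000, 10_000, 100_000)

total = 0
for n in range(1, 100_001):
    total += random.randint(1, 6)
    if n in checkpoints:
        running_mean = total / n
        print(f"n = {n:>6}: mean = {running_mean:.4f}, "
              f"|mean - mu| = {abs(running_mean - mu):.4f}")
```

The gap between the running mean and 3.5 tends to shrink as $n$ grows, although it need not decrease at every step.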

## **Central Limit Theorem (CLT)**

---


If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
as $n \to \infty$, is the standard normal distribution $N(z; 0, 1)$.
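
The theorem can be illustrated with a population that is clearly not normal. The sketch below standardizes sample means from a Uniform(0, 1) population and checks that $Z$ behaves approximately like a standard normal variable. The population, `n = 30`, and `num_samples` are assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(4)

# Uniform(0, 1) population: mu = 1/2, sigma^2 = 1/12
mu = 0.5
sigma = math.sqrt(1 / 12)
n = 30
num_samples = 10_000

z_values = []
for _ in range(num_samples):
    x_bar = statistics.fmean([random.random() for _ in range(n)])
    z_values.append((x_bar - mu) / (sigma / math.sqrt(n)))

# For a standard normal variable, about 95% of values fall within +/- 1.96
share_within = sum(abs(z) < 1.96 for z in z_values) / num_samples

print("mean of Z:", round(statistics.fmean(z_values), 3))
print("std of Z: ", round(statistics.stdev(z_values), 3))
print("share with |Z| < 1.96:", round(share_within, 3))
```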

## Sampling Distribution: Tasks

### EXERCISE 3:
Complete the following code. If the standard deviation of the mean for the sampling distribution of random samples of size 36 from a large or infinite population is 2, how large must the sample size become if the standard deviation is to be reduced to 1.2?



In [None]:
import math

sigma_old = 2
n_old = 36
sigma_new = 1.2

n_new = ................................

# Round the new sample size to the nearest integer
n_new = .................

print(n_new)

In [None]:

assert int(n_new) == n_new, "The new sample size must be an integer."

### EXERCISE 4:
Complete the following code. The heights of 1000 students are approximately normally distributed with a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters. Suppose 200 random samples of size 25 are drawn from this population and the means recorded to the nearest tenth of a centimeter. Determine the mean and standard deviation of the sampling distribution.

In [None]:
mean_population = 174.5
std_population = 6.9
sample_size = 25

# your code here

mean_sampling_distribution, std_sampling_distribution

In [None]:
import math

assert mean_sampling_distribution == mean_population, "Mean of sampling distribution should equal population mean"

calculated_std_dev = std_population / math.sqrt(sample_size)
assert abs(std_sampling_distribution - calculated_std_dev) < 1e-10, "Standard deviation calculation is incorrect"

## Statistic:




### EXERCISE 5:

Complete the following code. Which statistic describes the data better?

The lengths of time, in minutes, that 10 patients waited in a doctor’s office before receiving treatment were recorded as follows: 5, 11, 9, 5, 10, 15, 6, 10, 5, and 10. Treating the data as a random sample, find:
1. The mean;
2. The median;
3. The mode.

In [None]:
import numpy as np
from scipy import stats

data = [5, 11, 9, 5, 10, 15, 6, 10, 5, 10]

mean = .............................
median = ....................
mode = ...................

# Extract the mode value correctly
mode_value = ...................

mean, median, mode_value



In [None]:
assert mean == 8.6, "The mean should be 8.6"
assert median > 8, "The median should be greater than 8"
assert mode_value in [5, 10], "The mode should be either 5 or 10"
assert mean < median, "The mean should be less than the median"

## MLE:
Estimate the parameter $\lambda$ of a Poisson distribution. The probability mass function of the Poisson distribution is:
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

The maximum likelihood estimate of $\lambda$ from observations $x_1, \dots, x_n$ is the sample mean:

$$\hat{\lambda} = \frac{1}{n} \sum_{j=1}^{n} x_j$$
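
The closed-form estimate can be checked numerically by maximizing the Poisson log-likelihood over a grid of candidate values. This is a minimal sketch: the `counts` data, the helper `poisson_log_likelihood`, and the grid resolution are all hypothetical choices made for illustration.

```python
import math

# Hypothetical observed counts; not taken from the lab
counts = [2, 4, 3, 5, 1, 3, 4, 2, 3, 3]

def poisson_log_likelihood(lam, data):
    """Sum of log P(X = x) over the observed data for a Poisson(lam) model."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

# Closed-form MLE: the sample mean
lambda_hat = sum(counts) / len(counts)

# Grid search over candidate values 0.01, 0.02, ..., 10.00
grid = [i / 100 for i in range(1, 1001)]
lambda_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print("sample mean:   ", lambda_hat)
print("grid maximizer:", lambda_grid)
```

The grid maximizer should agree with the sample mean up to the grid spacing of 0.01.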



## Markov and Chebyshev Inequalities:



### EXERCISE 6:

Complete the following code.

1. Suppose that the average grade on the upcoming math exam is 70%. Give an upper bound on the proportion of students who score at least 90%.

In [None]:
from fractions import Fraction

mean_grade = 70
threshold_grade = 90

upper_bound = # your code here
upper_bound

In [None]:
assert upper_bound == Fraction(7, 9), "The upper bound should be 7/9"

2. If the distribution of $Y$ is $b(n; 0.25)$, give a lower bound for $P\left(\left|\frac{Y}{n} - 0.25\right| < 0.05\right)$ when $n = 100$.


In [None]:
n =.............
p =.............
epsilon =..............

# Formula for calculating lower_bound
lower_bound = # your code here
lower_bound

In [None]:
assert lower_bound >= 0, "The lower bound should be a non-negative number"

3. Suppose a fair coin is flipped 100 times. Find a bound on the probability that the number of times the coin lands on heads is at least 60 or at most 40.

In [None]:
import math

n = 100
p = 0.5

# k is the distance from the mean to the threshold, in standard deviations;
# take the square root of the whole variance expression n * p * (1 - p).
k = .......................

# The formula for prob_bound
prob_bound = ............................
print(prob_bound)




In [None]:
assert prob_bound <= 1, "The probability bound should be less than or equal to 1"

## CLT


### EXERCISE 7:
Complete the following code. If a certain machine makes electrical resistors having a mean resistance of 40 ohms and a standard deviation of 2 ohms, what is the probability that a random sample of 36 of these resistors will have a combined resistance of more than 1458 ohms?

In [None]:
# Solution to CLT Task
mean_resistance = 40
std_resistance = 2
sample_size = 36
threshold_resistance = 1458

mean_sample_resistance = # your code here
std_sample_resistance = # your code here

z_score = # your code here

from scipy.stats import norm
probability = 1 - norm.cdf(z_score)
probability

In [None]:

assert 0 <= probability <= 1, "The probability should be between 0 and 1"

## **Coding part**

---


[Click here](https://colab.research.google.com/drive/1C8I2cPWWeGlZ2CRTAYEh9uhm5gP9gCMi)

## **Conclusion**




In this lab, we explored various statistical techniques, including data and sampling distributions, calculating statistics, Maximum Likelihood Estimation (MLE), and applying Markov and Chebyshev inequalities. We also reviewed important theorems such as the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT).

By working through these tasks and implementing them in Python, we reinforced our understanding of key statistical concepts. These skills are crucial for analyzing data and making inferences in both academic research and practical applications.


Thank you for your participation and hard work in this lab!