## **Maximum Likelihood Estimation (MLE)**

Maximum Likelihood Estimation (MLE) is a method used to estimate the parameters of a given distribution. In this notebook, we will demonstrate how to use MLE to estimate the mean $(\mu)$ and standard deviation $(\sigma)$ of a normal distribution.

## EXERCISE 1:
## Algorithm for MLE with Normal Distribution
1. **Write the PDF (Probability Density Function)**:
   $$f(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
2. **Write the likelihood function** (the product of the PDF for the observed values):
   $$L(\mu, \sigma) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) $$
3. **Write the log likelihood function**:
   $$ \ell(\mu, \sigma) = \ln L(\mu, \sigma) = \sum_{i=1}^n \left[ -\ln(\sqrt{2\pi\sigma^2}) - \frac{(x_i - \mu)^2}{2\sigma^2} \right] $$
4. **Calculate the derivative of the log likelihood function with respect to the parameters**:

   For $(\mu)$:
   $$ \frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) $$
   For $(\sigma)$:
   $$\frac{\partial \ell}{\partial \sigma} = - \frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (x_i - \mu)^2 $$
5. **Set the derivative equal to zero and solve for the parameters**:
   Solving for $(\mu)$:
   $$ \mu = \frac{1}{n} \sum_{i=1}^n x_i $$
   Solving for $(\sigma^2)$:
   $$ \sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2 $$
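
The closed-form estimates from step 5 can be checked numerically. The sketch below uses a small made-up sample (the array `x` is purely illustrative, not data from this notebook) and a helper `log_likelihood`, also introduced here, to confirm that the estimates do not score worse than a nearby parameter choice.

```python
import numpy as np

# Hypothetical sample, chosen only for illustration
x = np.array([4.0, 5.0, 7.0, 8.0, 6.0])
n = len(x)

# Closed-form MLEs from step 5
mu_hat = x.sum() / n
sigma2_hat = ((x - mu_hat) ** 2).sum() / n  # divides by n, not n - 1


def log_likelihood(mu, sigma2):
    """Log likelihood of the sample under Normal(mu, sigma2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2))


print(mu_hat, sigma2_hat)
print(log_likelihood(mu_hat, sigma2_hat) >= log_likelihood(mu_hat + 0.5, sigma2_hat))
```

Note that the MLE of the variance divides by $n$, which matches `np.var(x)` with its default `ddof=0`, not the unbiased sample variance.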

## **Markov and Chebyshev Inequalities**



## EXERCISE 2:

1. What are $E[X]$, $\mu$, $\sigma$?

- $E[X]$: The expected value (mean) of the random variable $X$.
- $\mu$: Another notation for the expected value (mean) of $X$.
- $\sigma$: The standard deviation of $X$, which measures the spread or dispersion of the random variable around its mean.


2. What do these formulas represent?

## Markov Inequality
Markov's Inequality provides an upper bound for the probability that a non-negative random variable is at least a given value. Formally, if $X$ is a non-negative random variable and $a > 0$, then
$$P(X \ge a) \le \frac{E[X]}{a}$$
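
As a small check of Markov's bound $E[X]/a$, the sketch below compares it with the exact tail probability for a fair six-sided die. The die and the threshold `a = 5` are illustrative choices, not part of the exercise.

```python
from fractions import Fraction

# Fair six-sided die: an illustrative non-negative random variable
outcomes = range(1, 7)
prob = Fraction(1, 6)

expected_value = sum(x * prob for x in outcomes)    # E[X] = 7/2
a = 5
exact_tail = sum(prob for x in outcomes if x >= a)  # P(X >= 5) = 1/3
markov_bound = expected_value / a                   # E[X] / a = 7/10

print(exact_tail, markov_bound)
```

The bound holds but is loose, which is typical: Markov's Inequality uses only the mean.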

## Chebyshev Inequality
Chebyshev's Inequality provides a bound on the probability that a random variable deviates from its mean by at least a given number of standard deviations. For any random variable $X$ with mean $\mu$ and finite standard deviation $\sigma$, and for any $k > 0$,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
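
Chebyshev's bound $1/k^2$ can be compared with an exact probability in the same way. This sketch again assumes a fair six-sided die, with `k = 1.4` picked only for illustration.

```python
import math
from fractions import Fraction

# Fair six-sided die (illustrative assumption)
outcomes = range(1, 7)
prob = Fraction(1, 6)

mu = sum(x * prob for x in outcomes)                    # 7/2
variance = sum((x - mu) ** 2 * prob for x in outcomes)  # 35/12
sigma = math.sqrt(variance)

k = 1.4
# Exact probability of landing at least k standard deviations from the mean
exact = sum(prob for x in outcomes if abs(x - mu) >= k * sigma)
chebyshev_bound = 1 / k**2

print(float(exact), chebyshev_bound)
```

Only the faces 1 and 6 are far enough from the mean here, so the exact probability is 1/3, below the bound of about 0.51.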

## Sampling Distribution:





### EXERCISE 3:
If the standard deviation of the mean for the sampling distribution of random samples of size 36 from a large or infinite population is 2, how large must the sample size become if the standard deviation is to be reduced to 1.2?

In [7]:
# Solution to exercise 3
import math

sigma_old = 2
n_old = 36
sigma_new = 1.2

n_new = (sigma_old / sigma_new)**2 * n_old

# Floating-point arithmetic can leave a tiny error in the result, so round to the nearest integer
n_new = round(n_new)

print(n_new)

100


In [8]:
# Assert that the new sample size is an integer
assert int(n_new) == n_new, "The new sample size must be an integer."
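
The solution above rests on the standard error formula $\sigma_{\bar{X}} = \sigma / \sqrt{n}$. Written out step by step under that assumption:

$$
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}
\quad\Rightarrow\quad
\sigma = 2\sqrt{36} = 12,
\qquad
n_{\text{new}} = \left(\frac{\sigma}{1.2}\right)^2 = \left(\frac{12}{1.2}\right)^2 = 100
$$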



### EXERCISE 4:
The heights of 1000 students are approximately normally distributed with a mean of 174.5 centimeters and a standard deviation of 6.9 centimeters. Suppose 200 random samples of size 25 are drawn from this population and the means recorded to the nearest tenth of a centimeter. Determine the mean and standard deviation of the sampling distribution.

In [10]:
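# Solution to Exercise 4
mean_population = 174.5
std_population = 6.9
sample_size = 25

# The sampling distribution of the mean is centered at the population mean,
# with standard deviation (standard error) sigma / sqrt(n)
mean_sampling_distribution = mean_population
std_sampling_distribution = std_population / math.sqrt(sample_size)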

mean_sampling_distribution, std_sampling_distribution

(174.5, 1.3800000000000001)

In [11]:
assert mean_sampling_distribution == mean_population, "Mean of sampling distribution should equal population mean"

calculated_std_dev = std_population / math.sqrt(sample_size)
assert abs(std_sampling_distribution - calculated_std_dev) < 1e-10, "Standard deviation calculation is incorrect"
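
A quick simulation can illustrate the same result. The sketch below assumes the heights are exactly normal and treats the population as infinite, which simplifies the exercise's setup of 1000 students. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# 200 samples of size 25 from Normal(174.5, 6.9), as in the exercise
samples = rng.normal(loc=174.5, scale=6.9, size=(200, 25))
sample_means = samples.mean(axis=1)

print(sample_means.mean())       # should be near 174.5
print(sample_means.std(ddof=1))  # should be near 6.9 / sqrt(25) = 1.38
```

With only 200 sample means, the simulated values will differ somewhat from the theoretical ones. Increasing the number of samples tightens the agreement.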

## Statistic:




### EXERCISE 5:
Which statistic describes the data better?

The lengths of time, in minutes, that 10 patients waited in a doctor’s office before receiving treatment were recorded as follows: 5, 11, 9, 5, 10, 15, 6, 10, 5, and 10. Treating the data as a random sample, find:
1. The mean;
2. The median;
3. The mode.

In [28]:
import numpy as np
from scipy import stats

data = [5, 11, 9, 5, 10, 15, 6, 10, 5, 10]

mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data)

# The data are bimodal (5 and 10 each occur three times);
# stats.mode reports only one of the tied values (here 5)
mode_value = mode.mode

mean, median, mode_value



(8.6, 9.5, 5)

In [30]:
assert mean == 8.6, "The mean should be 8.6"
assert median > 8, "The median should be greater than 8"
assert mode_value in [5, 10], "The mode should be either 5 or 10"
assert mean < median, "The mean should be less than the median"
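
Because two values tie for the highest count, a single reported mode hides part of the picture. As a sketch, the standard library's `statistics.multimode` lists every tied value:

```python
import statistics

data = [5, 11, 9, 5, 10, 15, 6, 10, 5, 10]

modes = statistics.multimode(data)  # every value tied for the highest count
print(modes)
print(statistics.mean(data), statistics.median(data))
```

This returns both 5 and 10, so the mode alone is ambiguous for this sample, while the mean (8.6) and median (9.5) are each a single value.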

## Markov and Chebyshev Inequalities:
### EXERCISE 6:
1. Suppose that the average grade on the upcoming math exam is 70%. Give an upper bound on the proportion of students who score at least 90%.


In [31]:
from fractions import Fraction

mean_grade = 70
threshold_grade = 90

upper_bound = Fraction(mean_grade, threshold_grade)
upper_bound

Fraction(7, 9)

In [32]:
assert upper_bound == Fraction(7, 9), "The upper bound should be 7/9"

2. If the distribution of $Y$ is $b(n; 0.25)$, give a lower bound for $P\left(\left|\frac{Y}{n} - 0.25\right| < 0.05\right)$ when $n = 100$.


In [41]:
# Solution to part 2
n = 100
p = 0.25
epsilon = 0.05

# Chebyshev's inequality with Var(Y/n) = p(1 - p)/n gives
# P(|Y/n - p| < epsilon) >= 1 - p(1 - p) / (n * epsilon**2)
lower_bound = 1 - (p * (1 - p)) / (n * epsilon**2)
assert 0 <= lower_bound <= 1, "The lower bound should be a probability"
print(round(lower_bound, 4))

0.25
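
For comparison, the exact probability can be computed from the binomial distribution. This sketch assumes SciPy is available and sets it beside the Chebyshev lower bound $1 - p(1-p)/(n\epsilon^2)$, which equals 0.25 for these values.

```python
from scipy.stats import binom

n, p, epsilon = 100, 0.25, 0.05

# |Y/n - 0.25| < 0.05 is the same event as 20 < Y < 30, i.e. 21 <= Y <= 29
exact = binom.cdf(29, n, p) - binom.cdf(20, n, p)

# Chebyshev lower bound using Var(Y/n) = p(1 - p)/n
chebyshev_lower = 1 - p * (1 - p) / (n * epsilon**2)

print(exact, chebyshev_lower)
```

The exact probability is well above the bound, as expected: Chebyshev's Inequality holds for any distribution with this mean and variance, so it is conservative for a specific one.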


3. Suppose a fair coin is flipped 100 times. Find a bound on the probability that the number of times the coin lands on heads is at least 60 or at most 40.

In [48]:
import math

n = 100
p = 0.5

# The number of heads has mean n*p = 50 and standard deviation sqrt(n*p*(1-p)) = 5,
# so "at least 60 or at most 40" is a deviation of k = 10/5 = 2 standard deviations.
sigma = math.sqrt(n * p * (1 - p))
k = 10 / sigma

# Chebyshev's inequality: P(|X - mean| >= k*sigma) <= 1/k**2
prob_bound = 1 / k**2
assert prob_bound <= 1, "The probability bound should be less than or equal to 1"
print(prob_bound)

0.25
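
The exact two-sided tail probability for the coin can also be computed directly. This sketch assumes SciPy and compares it with the Chebyshev bound $\operatorname{Var}(X)/10^2 = 25/100$.

```python
from scipy.stats import binom

n, p = 100, 0.5

# P(X <= 40) + P(X >= 60); sf(59) is P(X > 59) = P(X >= 60)
exact = binom.cdf(40, n, p) + binom.sf(59, n, p)

# Chebyshev: Var(X) = n*p*(1-p) = 25 and the deviation from the mean is 10
chebyshev_bound = n * p * (1 - p) / 10**2

print(exact, chebyshev_bound)
```

The exact probability is several times smaller than 0.25, which shows how loose a distribution-free bound can be.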


## CLT
### EXERCISE 7:
If a certain machine makes electrical resistors having a mean resistance of 40 ohms and a standard deviation of 2 ohms, what is the probability that a random sample of 36 of these resistors will have a combined resistance of more than 1458 ohms?

In [49]:
# Solution to Exercise 7
from scipy.stats import norm

mean_resistance = 40
std_resistance = 2
sample_size = 36
threshold_resistance = 1458

# By the CLT, the combined resistance is approximately normal with
# mean n*mu and standard deviation sigma*sqrt(n)
mean_sample_resistance = mean_resistance * sample_size
std_sample_resistance = std_resistance * math.sqrt(sample_size)

z_score = (threshold_resistance - mean_sample_resistance) / std_sample_resistance

# Upper-tail probability P(Z > z_score)
probability = 1 - norm.cdf(z_score)
probability

0.06680720126885809

In [50]:
# Adding an assertion to check if probability is a valid value
assert 0 <= probability <= 1, "The probability should be between 0 and 1"
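
The same probability can be reached through the sample mean instead of the sum, which gives a useful cross-check. The variable names below are introduced for this sketch only.

```python
import math
from scipy.stats import norm

mu, sigma, n = 40, 2, 36
total_threshold = 1458

# A combined resistance above 1458 means a sample mean above 1458/36 = 40.5
mean_threshold = total_threshold / n
standard_error = sigma / math.sqrt(n)  # 2/6

z = (mean_threshold - mu) / standard_error  # about 1.5
probability_via_mean = norm.sf(z)           # upper-tail probability P(Z > z)
print(probability_via_mean)
```

Both routes give $z \approx 1.5$ and a probability of about 0.0668, since dividing the sum and its standard deviation by $n$ leaves the z-score unchanged.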