# Estimation and Confidence Intervals

## Background
In quality control processes, especially for high-value items, destructive testing is often required to evaluate product durability. 
A manufacturer of computer print-heads wants to estimate the mean durability of print-heads in terms of the number of characters 
(in millions) printed before failure.

Due to the destructive nature of testing, only a small sample of print-heads can be tested.

## Scenario
A total of 15 print-heads were selected and tested until failure.  
Their durability values (in millions of characters printed) are:

1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48,  
1.20, 1.33, 1.18, 1.22, 1.29

## Assignment Tasks
1. Build a **99% Confidence Interval** for the mean using the **sample standard deviation (t-distribution)**.  
2. Build a **99% Confidence Interval** for the mean using a **known population standard deviation (z-distribution)**.

In [1]:
import numpy as np
from scipy.stats import t, norm

# Sample data
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48,
                 1.20, 1.33, 1.18, 1.22, 1.29])

n = len(data)
sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)

sample_mean, sample_std, n

(np.float64(1.2386666666666666), np.float64(0.19316412956959936), 15)

# Part A: 99% Confidence Interval (Sample Standard Deviation)

Since the population standard deviation is **not known**, we use the **t-distribution**.

Formula:
\[
CI = \bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}}
\]

Where:
- \(\bar{x}\) = sample mean
- \(s\) = sample standard deviation
- \(n\) = 15
- Degrees of freedom = n - 1 = 14
- Confidence level = 99%
- α = 0.01
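
As a worked sketch of the formula, the rounded sample statistics from the first cell (\(\bar{x} \approx 1.2387\), \(s \approx 0.1932\)) can be substituted together with the tabulated critical value \(t_{0.005,\,14} \approx 2.977\). The figures are rounded to four significant digits, so the last decimal may differ slightly from the computed result:

\[
CI \approx 1.2387 \pm 2.977 \times \frac{0.1932}{\sqrt{15}} \approx 1.2387 \pm 0.1485 \approx (1.090,\ 1.387)
\]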

In [2]:
alpha = 0.01
df = n - 1

# t critical value
t_crit = t.ppf(1 - alpha/2, df)

# Margin of error
ME_t = t_crit * (sample_std / np.sqrt(n))

# Confidence interval
CI_lower_t = sample_mean - ME_t
CI_upper_t = sample_mean + ME_t

(CI_lower_t, CI_upper_t)

(np.float64(1.0901973384384906), np.float64(1.3871359948948425))
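
As a cross-check on the manual calculation, the same interval can be obtained from SciPy's built-in helpers. This is a minimal sketch rather than part of the original solution; the names `mean`, `sem`, `ci_low`, and `ci_high` are illustrative. The confidence level is passed positionally because its keyword name differs between SciPy releases.

```python
import numpy as np
from scipy import stats

# Same durability data as above (millions of characters)
data = np.array([1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48,
                 1.20, 1.33, 1.18, 1.22, 1.29])

n = len(data)
mean = data.mean()
sem = stats.sem(data)  # standard error s / sqrt(n); uses ddof=1 by default

# 99% two-sided t interval with n - 1 degrees of freedom
ci_low, ci_high = stats.t.interval(0.99, n - 1, loc=mean, scale=sem)

print(ci_low, ci_high)
```

The two bounds should agree with `CI_lower_t` and `CI_upper_t` up to floating-point precision.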

# Interpretation (Sample SD)
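We are 99% confident that the **true mean durability** of all print-heads lies between about 1.090 and 1.387 million characters. More precisely, if the sampling and interval construction were repeated many times, about 99% of the resulting intervals would contain the true mean.

Because the population SD is unknown and the sample size is small (n = 15), the **t-distribution** with 14 degrees of freedom is appropriate, provided durability is approximately normally distributed.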
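
The "99% confident" statement describes how the procedure behaves over repeated sampling. The following simulation is a rough sketch of that idea. It assumes a hypothetical normal population (`true_mean = 1.24`, `true_sd = 0.2`, chosen only for illustration and not estimated from any real population) and counts how often a t-based interval from n = 15 observations covers the true mean.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)

# Hypothetical population parameters, for illustration only
true_mean, true_sd = 1.24, 0.2
n, alpha, n_trials = 15, 0.01, 10_000

# Draw n_trials independent samples of size n
samples = rng.normal(true_mean, true_sd, size=(n_trials, n))
means = samples.mean(axis=1)
sds = samples.std(axis=1, ddof=1)

# t-based margin of error for each simulated sample
margin = t.ppf(1 - alpha / 2, n - 1) * sds / np.sqrt(n)

# Fraction of intervals that contain the true mean
covered = (means - margin <= true_mean) & (true_mean <= means + margin)
coverage = covered.mean()

print(f"Empirical coverage: {coverage:.3f}")
```

Under these assumptions the empirical coverage should land close to 0.99, which is what the confidence level promises about the method rather than about any single interval.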

# Part B: 99% Confidence Interval (Known Population Standard Deviation)

If the population standard deviation is known to be **0.2 million characters**, we use the **z-distribution**.

Formula:
\[
CI = \bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}
\]

Where:
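- \(\bar{x}\) = sample mean
- σ = 0.2 (population SD, in millions of characters)
- \(n\) = 15
- \(z_{\alpha/2}\) = 2.576 (critical value for 99% confidence, α = 0.01)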
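
As a worked sketch of this formula, substituting the rounded sample mean from the first cell (\(\bar{x} \approx 1.2387\)) along with σ = 0.2 and the critical value 2.576 gives approximately:

\[
CI \approx 1.2387 \pm 2.576 \times \frac{0.2}{\sqrt{15}} \approx 1.2387 \pm 0.1330 \approx (1.106,\ 1.372)
\]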

In [3]:
pop_std = 0.2   # given population SD
z_crit = norm.ppf(1 - alpha/2)

# Margin of error
ME_z = z_crit * (pop_std / np.sqrt(n))

# Confidence interval
CI_lower_z = sample_mean - ME_z
CI_upper_z = sample_mean + ME_z

(CI_lower_z, CI_upper_z)

(np.float64(1.1056514133957607), np.float64(1.3716819199375725))

# Final Conclusion

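### 1. 99% CI using sample SD (t-distribution):
The interval is approximately (1.090, 1.387) million characters. It is the wider of the two because:
- Population SD is unknown, so it is estimated by the sample SD (s ≈ 0.193)
- With only n = 15 observations, the t critical value (≈ 2.977, 14 degrees of freedom) exceeds the z critical value (2.576)
- The heavier tails of the t-distribution account for the extra uncertainty in estimating σ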

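### 2. 99% CI using known population SD (z-distribution):
The interval is approximately (1.106, 1.372) million characters. It is narrower because:
- The population SD is treated as known (σ = 0.2), so no uncertainty from estimating it is included
- The z critical value (2.576) is smaller than the corresponding t critical value, even though σ = 0.2 is slightly larger than the sample SD (s ≈ 0.193)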

Both confidence intervals estimate the **true mean durability** of print-heads in millions of characters before failure.
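
To make the comparison concrete, the sketch below puts the two critical values and interval widths side by side. It reuses the sample SD printed by the first cell and the given σ = 0.2; the names `width_t` and `width_z` are illustrative.

```python
import numpy as np
from scipy.stats import t, norm

n, alpha = 15, 0.01
sample_std = 0.19316412956959936  # sample SD from the first cell's output
pop_std = 0.2                     # given population SD

# Critical values for a two-sided 99% interval
t_crit = t.ppf(1 - alpha / 2, n - 1)
z_crit = norm.ppf(1 - alpha / 2)

# Full interval widths (twice the margin of error)
width_t = 2 * t_crit * sample_std / np.sqrt(n)
width_z = 2 * z_crit * pop_std / np.sqrt(n)

print(f"t critical: {t_crit:.3f}, width: {width_t:.4f}")
print(f"z critical: {z_crit:.3f}, width: {width_z:.4f}")
```

In this data set the sample SD is slightly smaller than the assumed σ, so the extra width of the t interval comes entirely from the larger critical value.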