# Assignment 2: Confidence Intervals

## Background
A manufacturer of print-heads for personal computers wants to estimate the mean durability of their print-heads (number of characters printed before failure).

## Data
Fifteen print-heads were tested, and the durability of each was recorded (in millions of characters):

1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29

**Topics Covered:**
- Confidence Intervals
- t-distribution
- z-distribution
- Sample vs Population Standard Deviation

---
## Step 1: Import Libraries and Load Data

In [None]:
# Import required libraries
import numpy as np
from scipy import stats

# Sample data - durability in millions of characters
durability = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85, 1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

print("Print-head durability data (millions of characters):")
print(durability)
print("\nNumber of samples:", len(durability))

---
## Step 2: Calculate Basic Statistics

In [None]:
# Calculate sample statistics

# Sample size
n = len(durability)

# Sample mean
sample_mean = sum(durability) / n

# Sample standard deviation (using n-1 for sample)
# Step 1: Calculate sum of squared differences from mean
sum_squared_diff = 0
for value in durability:
    diff = value - sample_mean
    squared_diff = diff * diff
    sum_squared_diff = sum_squared_diff + squared_diff

# Step 2: Divide by (n-1) and take square root
sample_std = (sum_squared_diff / (n - 1)) ** 0.5

print("=== Sample Statistics ===")
print("Sample Size (n):", n)
print("Sample Mean:", round(sample_mean, 4))
print("Sample Standard Deviation:", round(sample_std, 4))
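
As a cross-check on the manual loop above, the same statistics can be computed with NumPy. This is a minimal sketch rather than part of the original assignment; the names `mean_np`, `std_sample` and `std_population` are introduced here for illustration. The key detail is `ddof=1`, which divides by n - 1 (the sample formula), whereas NumPy's default `ddof=0` divides by n.

```python
import numpy as np

durability = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
              1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

mean_np = np.mean(durability)
std_sample = np.std(durability, ddof=1)   # divides by n - 1 (sample formula)
std_population = np.std(durability)       # default ddof=0 divides by n

print("Mean:", round(mean_np, 4))
print("Std with ddof=1:", round(std_sample, 4))
print("Std with ddof=0:", round(std_population, 4))
```

The `ddof=1` result should match the value from the manual loop, and the `ddof=0` result is always slightly smaller for the same data.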

---
## Task A: 99% Confidence Interval Using Sample Standard Deviation

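**Why use t-distribution?**
- Population standard deviation is unknown, so we estimate it with the sample standard deviation
- Sample size is small (n = 15), so the extra uncertainty from that estimate is not negligible
- The t-interval assumes the durability values come from an approximately normal population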

**Formula:**
```
CI = sample_mean ± (t_critical * standard_error)
where:
  standard_error = sample_std / sqrt(n)
  degrees of freedom = n - 1
```

In [None]:
# Task A: Build 99% Confidence Interval using Sample Standard Deviation

# Confidence level
confidence_level = 0.99

# Degrees of freedom
degrees_of_freedom = n - 1
print("Degrees of Freedom:", degrees_of_freedom)

# Calculate alpha (significance level)
alpha = 1 - confidence_level
print("Alpha (significance level):", alpha)

# Get t-critical value for 99% confidence (two-tailed)
# We need t-value for alpha/2 in each tail
t_critical = stats.t.ppf(1 - alpha/2, degrees_of_freedom)
print("t-critical value:", round(t_critical, 4))

# Calculate standard error
standard_error = sample_std / (n ** 0.5)
print("Standard Error:", round(standard_error, 4))

# Calculate margin of error
margin_of_error = t_critical * standard_error
print("Margin of Error:", round(margin_of_error, 4))

# Calculate confidence interval
lower_bound = sample_mean - margin_of_error
upper_bound = sample_mean + margin_of_error

print("\n" + "=" * 50)
print("99% CONFIDENCE INTERVAL (using sample std):")
print("=" * 50)
print("Lower Bound:", round(lower_bound, 4), "million characters")
print("Upper Bound:", round(upper_bound, 4), "million characters")
print("\nInterval: (", round(lower_bound, 4), ",", round(upper_bound, 4), ")")
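
SciPy can produce the same interval in a single call, which is a convenient check on the step-by-step calculation. A hedged sketch: `stats.sem` computes the standard error with the n - 1 formula by default, and `stats.t.interval` takes the confidence level, the degrees of freedom, the centre (`loc`) and the standard error (`scale`). The names `ci_low` and `ci_high` are illustrative.

```python
import numpy as np
from scipy import stats

durability = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
              1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

n = len(durability)
mean = np.mean(durability)
se = stats.sem(durability)  # sample std (ddof=1) divided by sqrt(n)

# Confidence level is passed positionally so the call works across SciPy versions
ci_low, ci_high = stats.t.interval(0.99, n - 1, loc=mean, scale=se)

print("99% t-interval:", (round(ci_low, 4), round(ci_high, 4)))
```

If the two approaches disagree beyond rounding, the usual culprit is a standard deviation computed with n instead of n - 1.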

**Interpretation:**
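We are 99% confident that the true population mean durability of print-heads lies between the lower and upper bounds computed above, about 1.09 to 1.39 million characters. The 99% describes the method: intervals built this way capture the true mean in about 99% of repeated samples.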
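
The meaning of "99% confident" can be illustrated with a small simulation. The sketch below is built on assumptions that are not in the assignment: a normal population with a hypothetical true mean of 1.2 and standard deviation of 0.2, a fixed random seed, and 10,000 repeated samples of size 15. It counts how often the t-interval contains the true mean.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)       # fixed seed, for reproducibility only

# Hypothetical population parameters (assumed for illustration)
true_mean, true_std = 1.2, 0.2
n, trials = 15, 10_000

t_crit = stats.t.ppf(0.995, n - 1)   # two-tailed 99% critical value

# Each row is one simulated sample of n print-heads
samples = rng.normal(true_mean, true_std, size=(trials, n))
means = samples.mean(axis=1)
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

# An interval covers the true mean when the mean is within one margin of error
covered = np.abs(means - true_mean) <= t_crit * std_errors
coverage = covered.mean()

print("Fraction of intervals containing the true mean:", coverage)
```

The printed fraction should land close to 0.99, which is what the confidence level promises about the procedure rather than about any single interval.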

---
## Task B: 99% Confidence Interval Using Known Population Standard Deviation

**Given:** Population standard deviation = 0.2 million characters

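**Why use z-distribution?**
- Population standard deviation is known, so the standard error does not have to be estimated from the sample
- The standardized sample mean then follows the standard normal (z) distribution, assuming an approximately normal population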

**Formula:**
```
CI = sample_mean ± (z_critical * standard_error)
where:
  standard_error = population_std / sqrt(n)
```

In [None]:
# Task B: Build 99% Confidence Interval using Known Population Standard Deviation

# Given population standard deviation
population_std = 0.2
print("Population Standard Deviation:", population_std)

# Confidence level
confidence_level = 0.99
alpha = 1 - confidence_level
print("Alpha:", alpha)

# Get z-critical value for 99% confidence (two-tailed)
z_critical = stats.norm.ppf(1 - alpha/2)
print("z-critical value:", round(z_critical, 4))

# Calculate standard error using population std
standard_error_pop = population_std / (n ** 0.5)
print("Standard Error:", round(standard_error_pop, 4))

# Calculate margin of error
margin_of_error_pop = z_critical * standard_error_pop
print("Margin of Error:", round(margin_of_error_pop, 4))

# Calculate confidence interval
lower_bound_pop = sample_mean - margin_of_error_pop
upper_bound_pop = sample_mean + margin_of_error_pop

print("\n" + "=" * 50)
print("99% CONFIDENCE INTERVAL (using population std):")
print("=" * 50)
print("Lower Bound:", round(lower_bound_pop, 4), "million characters")
print("Upper Bound:", round(upper_bound_pop, 4), "million characters")
print("\nInterval: (", round(lower_bound_pop, 4), ",", round(upper_bound_pop, 4), ")")
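
The z-interval can also be checked without SciPy. The sketch below uses `statistics.NormalDist` from the Python standard library as an alternative to `stats.norm.ppf`; the names `z_crit`, `z_low` and `z_high` are illustrative, and the population standard deviation of 0.2 is the value given in the task.

```python
from statistics import NormalDist, mean

durability = [1.13, 1.55, 1.43, 0.92, 1.25, 1.36, 1.32, 0.85,
              1.07, 1.48, 1.20, 1.33, 1.18, 1.22, 1.29]

population_std = 0.2                  # given in the task
n = len(durability)
alpha = 0.01                          # 1 - confidence level

# inv_cdf is the inverse CDF (quantile function) of the standard normal
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_crit * population_std / n ** 0.5

center = mean(durability)
z_low, z_high = center - margin, center + margin

print("z-critical:", round(z_crit, 4))
print("99% z-interval:", (round(z_low, 4), round(z_high, 4)))
```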

---
## Comparison of Both Methods

In [None]:
# Compare both confidence intervals
print("=== COMPARISON ===")
print("\nMethod A (Sample Std, t-distribution):")
print("  Interval: (", round(lower_bound, 4), ",", round(upper_bound, 4), ")")
print("  Width:", round(upper_bound - lower_bound, 4))

print("\nMethod B (Population Std, z-distribution):")
print("  Interval: (", round(lower_bound_pop, 4), ",", round(upper_bound_pop, 4), ")")
print("  Width:", round(upper_bound_pop - lower_bound_pop, 4))

print("\n" + "=" * 50)
print("OBSERVATIONS:")
print("=" * 50)
print("1. Both intervals are centered around the sample mean:", round(sample_mean, 4))
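print("2. Method A (t-distribution) gives a wider interval because:")
print("   - t-critical (", round(t_critical, 4), ") is larger than z-critical (", round(z_critical, 4), ")")
print("     since the t-distribution has heavier tails than the z-distribution")
print("   - Note: sample std (", round(sample_std, 4), ") is slightly smaller than population std (", population_std, "),")
print("     so the extra width comes from the critical value, not from the std")
print("3. Method B gives a narrower interval because:")
print("   - Population std is known, so the smaller z-critical value applies")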
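
To see why the critical value drives the difference, it helps to compare t-critical values at several degrees of freedom with the z-critical value. This is a small illustrative sketch; the list of degrees of freedom is an arbitrary choice, with 14 included because it matches this data set.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.995)           # two-tailed 99% z-critical value

dfs = [4, 14, 29, 99, 999]               # arbitrary illustrative values
t_crits = [stats.t.ppf(0.995, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df = {df:4d}: t-critical = {t_crit:.4f}  (z-critical = {z_crit:.4f})")
```

The t-critical value is always above the z-critical value and shrinks toward it as the degrees of freedom grow, so the two methods give nearly identical intervals for large samples.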

---
## Summary

In this assignment, we learned:

1. **Confidence Interval** - A range of values, computed from a sample, that is likely to contain the true population parameter
2. **t-distribution** - Used when population std is unknown and estimated from the sample; the difference from z matters most when sample size is small
3. **z-distribution** - Used when population std is known
4. **Standard Error** - The standard deviation of the sample mean: std / sqrt(n)
5. **Margin of Error** - Half the width of the interval: critical value * standard error

**Key Takeaways:**
- 99% confidence means that if we repeated this sampling many times, about 99% of the resulting intervals (roughly 99 in every 100) would contain the true mean
- Larger sample size = narrower confidence interval
- Higher confidence level = wider interval
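
The last two takeaways can be checked numerically. The sketch below uses the z-based formula with the known standard deviation of 0.2 from Task B; the helper `z_interval_width` and the sample sizes and confidence levels tried are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def z_interval_width(confidence, sigma, n):
    """Full width of a z-based confidence interval for the mean (sigma known)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_crit * sigma / n ** 0.5


sigma = 0.2  # population std from Task B

print("Effect of sample size at 99% confidence:")
for n in (15, 60, 240):
    print(f"  n = {n:3d}: width = {z_interval_width(0.99, sigma, n):.4f}")

print("Effect of confidence level at n = 15:")
for confidence in (0.90, 0.95, 0.99):
    print(f"  {confidence:.0%}: width = {z_interval_width(confidence, sigma, 15):.4f}")
```

Because the width scales with 1 / sqrt(n), quadrupling the sample size halves the interval, while raising the confidence level widens it through a larger critical value.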