# Module 2: Hypothesis Testing
In this module, we'll explore hypothesis testing, a fundamental concept in inferential statistics. Hypothesis testing lets us make data-driven decisions by using sample data to compare groups or assess claims about populations.

## 🎯 Learning Objectives
- Understand null and alternative hypotheses
- Differentiate between Type I and Type II errors
- Perform and interpret t-tests and chi-squared tests
- Construct confidence intervals
- Use p-values to assess statistical significance

## 🧠 What is Hypothesis Testing?
Hypothesis testing uses sample data to evaluate a claim (hypothesis) about a population parameter, such as a mean or a proportion.

**Steps in Hypothesis Testing:**
1. State the null hypothesis ($H_0$) and alternative hypothesis ($H_1$)
2. Choose a significance level (commonly $\alpha = 0.05$)
3. Select and perform a test (t-test, chi-squared, etc.)
4. Calculate a test statistic and p-value
5. Compare the p-value to $\alpha$ and make a decision: reject $H_0$ if $p < \alpha$, otherwise fail to reject $H_0$
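
The five steps above can be sketched with a one-sample t-test. This is a minimal illustration, not part of the module's dataset: the simulated `sample`, the hypothesized mean `mu_0 = 50`, and the seed are all assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical data: 40 simulated scores (assumed for illustration only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=52, scale=10, size=40)

# Step 1: H0: population mean equals mu_0; H1: population mean differs from mu_0
mu_0 = 50

# Step 2: choose the significance level
alpha = 0.05

# Steps 3 and 4: run the test to get the test statistic and the p-value
t_stat, p_value = ttest_1samp(sample, popmean=mu_0)

# Step 5: compare the p-value to alpha and decide
reject_h0 = p_value < alpha
print(f'T-statistic: {t_stat:.2f}, P-value: {p_value:.4f}')
print('Reject H0' if reject_h0 else 'Fail to reject H0')
```

The decision rule in the last step is the same for every test in this module; only the function that produces the statistic and p-value changes.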

## 🔎 Type I and Type II Errors
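- **Type I Error**: Rejecting a true null hypothesis (false positive). Its probability is the significance level $\alpha$.
- **Type II Error**: Failing to reject a false null hypothesis (false negative). Its probability is denoted $\beta$, and $1 - \beta$ is the power of the test.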
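
One way to make these two error rates concrete is to simulate many experiments and count how often the test rejects. The sketch below is an illustration under assumed settings (30 observations per group, standard deviation 10, a true difference of 8 points in the second scenario); none of these numbers come from the module.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
alpha = 0.05
n_sims = 2000   # number of simulated experiments (assumed)
n = 30          # observations per group (assumed)

def rejection_rate(mean_a, mean_b):
    """Fraction of simulated experiments in which H0 (equal means) is rejected."""
    a = rng.normal(mean_a, 10, size=(n_sims, n))
    b = rng.normal(mean_b, 10, size=(n_sims, n))
    _, p = ttest_ind(a, b, axis=1)  # one t-test per simulated experiment (row)
    return np.mean(p < alpha)

# H0 is true (same mean): every rejection is a Type I error
type1_rate = rejection_rate(60, 60)

# H0 is false (means differ by 8): a rejection is the correct decision
power = rejection_rate(60, 68)
type2_rate = 1 - power

print(f'Estimated Type I error rate: {type1_rate:.3f}')
print(f'Estimated power: {power:.3f}, Type II error rate: {type2_rate:.3f}')
```

The estimated Type I error rate should land close to the chosen $\alpha$, while the Type II error rate depends on the sample size and on how large the true difference is.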

## 📦 Example 1: Two-Sample T-Test
Let's test if two independent groups have significantly different means.

In [None]:
import numpy as np
from scipy.stats import ttest_ind
import matplotlib.pyplot as plt

# Simulate two groups
np.random.seed(42)
group1 = np.random.normal(60, 10, 50)
group2 = np.random.normal(65, 12, 50)

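# Perform two-sample t-test (Student's version: equal_var=True by default,
# which assumes both groups share the same population variance)
stat, p = ttest_ind(group1, group2)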
print(f'T-statistic: {stat:.2f}, P-value: {p:.4f}')

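**Interpretation**: If the p-value is below the chosen significance level (here $\alpha = 0.05$), we reject the null hypothesis of equal means and conclude that the group means differ significantly. Otherwise, we fail to reject it, which is not the same as showing that the means are equal.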
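
The two groups above are simulated with different standard deviations (10 and 12), while `ttest_ind` assumes equal variances by default. A hedged alternative is Welch's t-test, which `ttest_ind` runs when `equal_var=False`. The sketch below regenerates the same groups so it can run on its own.

```python
import numpy as np
from scipy.stats import ttest_ind

# Recreate the two simulated groups from Example 1
np.random.seed(42)
group1 = np.random.normal(60, 10, 50)
group2 = np.random.normal(65, 12, 50)

# Student's t-test: assumes equal population variances
t_student, p_student = ttest_ind(group1, group2)

# Welch's t-test: does not assume equal population variances
t_welch, p_welch = ttest_ind(group1, group2, equal_var=False)

print(f"Student: t = {t_student:.2f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.2f}, p = {p_welch:.4f}")
```

Because both groups have the same size here, the two t-statistics coincide and only the degrees of freedom (and so the p-value) differ slightly. With unequal group sizes and unequal variances, the two versions can disagree more noticeably.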

## 📊 Visualizing the Groups

In [None]:
plt.hist(group1, alpha=0.7, label='Group 1')
plt.hist(group2, alpha=0.7, label='Group 2')
plt.legend()
plt.title('Score Distributions')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

## 🧮 Example 2: Chi-Squared Test for Independence
This test determines whether two categorical variables are associated.

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency

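# Example raw observations: one row per respondent
data = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'Preference': ['A', 'B', 'B', 'A', 'A', 'B']
})
# Cross-tabulate the raw observations into a contingency table of counts
contingency = pd.crosstab(data['Gender'], data['Preference'])
# Returns the statistic, p-value, degrees of freedom, and expected counts under independence
chi2, p, dof, expected = chi2_contingency(contingency)
print(f'Chi-squared: {chi2:.2f}, P-value: {p:.4f}')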

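**Interpretation**: A p-value below $\alpha$ suggests an association between the two categorical variables. Note that this toy table has only six observations. The chi-squared approximation is usually considered reliable only when the expected counts are at least about 5, so treat this result as a demonstration of the mechanics rather than a meaningful conclusion.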
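
As a sanity check on the example above, the sketch below inspects the expected counts and, as one possible alternative for small 2x2 tables, runs Fisher's exact test with `scipy.stats.fisher_exact`. The threshold of 5 is a common rule of thumb, not a hard rule, and the variable `small_counts` is a name introduced here for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency, fisher_exact

# Recreate the toy data from Example 2
data = pd.DataFrame({
    'Gender': ['M', 'F', 'M', 'F', 'M', 'F'],
    'Preference': ['A', 'B', 'B', 'A', 'A', 'B']
})
contingency = pd.crosstab(data['Gender'], data['Preference'])

# Expected counts under independence, as computed by the chi-squared test
chi2, p_chi2, dof, expected = chi2_contingency(contingency)
small_counts = bool((expected < 5).any())  # rule-of-thumb check
print('Expected counts:')
print(expected)
print(f'Any expected count below 5: {small_counts}')

# Fisher's exact test does not rely on the large-sample approximation (2x2 tables)
odds_ratio, p_fisher = fisher_exact(contingency.to_numpy())
print(f"Fisher's exact test p-value: {p_fisher:.4f}")
```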

## 📏 Confidence Intervals (CI)
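A confidence interval gives a range of plausible values for the true population parameter. A 95% CI comes from a procedure that, over repeated sampling, captures the true parameter about 95% of the time.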

We'll compute a 95% CI for the mean of one of our groups.

In [None]:
import scipy.stats as stats

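mean = np.mean(group1)
sem = stats.sem(group1)  # standard error of the mean (uses ddof=1)
# t-based interval with n - 1 degrees of freedom, centered on the sample mean
ci = stats.t.interval(0.95, len(group1)-1, loc=mean, scale=sem)
print(f'95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})')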
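
To see what `stats.t.interval` is doing, the same interval can be built by hand as $\bar{x} \pm t^{*} \cdot s / \sqrt{n}$, where $t^{*}$ is the 97.5th percentile of the t-distribution with $n - 1$ degrees of freedom. The sketch below regenerates `group1` with the seed used earlier so it can run on its own; the names `t_crit`, `lower`, and `upper` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Recreate group1 from Example 1
np.random.seed(42)
group1 = np.random.normal(60, 10, 50)

n = len(group1)
mean = np.mean(group1)
sem = np.std(group1, ddof=1) / np.sqrt(n)   # sample std dev divided by sqrt(n)

# Critical value: 2.5% in each tail for a 95% two-sided interval
t_crit = stats.t.ppf(0.975, df=n - 1)
lower, upper = mean - t_crit * sem, mean + t_crit * sem

# Compare with SciPy's built-in interval
ci_scipy = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)
print(f'Manual 95% CI: ({lower:.2f}, {upper:.2f})')
print(f'SciPy 95% CI:  ({ci_scipy[0]:.2f}, {ci_scipy[1]:.2f})')
```

For a 99% interval, the only change is the percentile passed to `stats.t.ppf` (0.995 instead of 0.975), which widens the interval.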

## ✅ Practice Exercises
1. Simulate two groups with overlapping distributions and test for a significant difference.
2. Create a contingency table with survey data and run a chi-squared test.
3. Construct a 99% confidence interval for a sample mean.
4. Reflect: What does a p-value of 0.001 mean in context?
5. Describe when you'd use a one-sample vs two-sample t-test.