## Daniel Barella
## 9/16/25

# 📓 Day 17 — Hypothesis Testing

## 1. Introduction
Hypothesis testing is a method for making inferences about a population using sample data.

- **Null Hypothesis (H₀)**: No effect, no difference, or status quo.  
- **Alternative Hypothesis (H₁)**: The claim we are looking for evidence of, such as an effect or a difference.  
- **Type I Error**: Rejecting H₀ when it’s true (false positive).  
- **Type II Error**: Failing to reject H₀ when it’s false (false negative).  
- **Significance Level (α)**: The Type I error rate we are willing to accept; results with a p-value below α are called “statistically significant” (commonly α = 0.05).  
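
To make the link between α and Type I errors concrete, here is a small simulation sketch. The setup is an assumption for illustration (standard normal data, 20 observations per sample, 2000 repetitions, a fixed seed). Because H₀ is true in every simulated sample, each rejection is a false positive, and the rejection rate should land near α.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_trials = 2000

# Every sample comes from a population whose true mean is 0, so H0 (mean == 0) is true.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, 20))

# One t-test per row: axis=1 tests each simulated sample separately.
_, p_vals = stats.ttest_1samp(samples, popmean=0.0, axis=1)

# Any rejection here is a Type I error, because H0 is true by construction.
type1_rate = np.mean(p_vals < alpha)
print("Observed Type I error rate:", type1_rate)
```

The observed rate varies from seed to seed but should stay close to 0.05.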

---

## 2. Steps in Hypothesis Testing
1. Define hypotheses (H₀, H₁)  
2. Choose significance level (α)  
3. Select appropriate test (t-test, chi-square, etc.)  
4. Compute test statistic and p-value  
5. Compare p-value with α  
6. Make a decision: reject H₀ if p < α; otherwise fail to reject H₀  
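
Steps 5 and 6 reduce to a single comparison. The helper below is a hypothetical sketch (the name `decide` and its return strings are not from any library) that makes the rule explicit, including the wording "fail to reject" rather than "accept".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        return "reject H0"
    # A large p-value is not evidence that H0 is true, only a lack of evidence against it.
    return "fail to reject H0"


print(decide(0.70))  # p-value well above alpha
print(decide(0.01))  # p-value below alpha
```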


## 3. Example: One-Sample T-Test

```python
import numpy as np
from scipy import stats

# Sample data: exam scores
scores = np.array([88, 92, 85, 91, 87, 95, 89, 90, 86, 93])

# Claim: average exam score is 90
mu = 90

# Perform one-sample t-test
t_stat, p_val = stats.ttest_1samp(scores, mu)

print("T-statistic:", t_stat)
print("P-value:", p_val)

if p_val < 0.05:
    print("Reject the null hypothesis (mean is not 90).")
else:
    print("Fail to reject the null hypothesis (mean could be 90).")
```

Output:

```
T-statistic: -0.394771016975867
P-value: 0.7022041268374959
Fail to reject the null hypothesis (mean could be 90).
```
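
To see where the t-statistic comes from, the sketch below recomputes it by hand as t = (x̄ − μ) / (s / √n) and takes the two-sided p-value from the t distribution with n − 1 degrees of freedom. The variable names are illustrative; the data and hypothesized mean are the ones from the example above.

```python
import numpy as np
from scipy import stats

scores = np.array([88, 92, 85, 91, 87, 95, 89, 90, 86, 93])
mu = 90

n = scores.size
sample_mean = scores.mean()
sample_sd = scores.std(ddof=1)      # ddof=1 gives the sample standard deviation
std_error = sample_sd / np.sqrt(n)  # standard error of the mean

t_manual = (sample_mean - mu) / std_error

# Two-sided p-value: probability of a |t| at least this large under H0
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print("Manual t:", t_manual)
print("Manual p:", p_manual)
```

Both values should agree with the `stats.ttest_1samp` result up to floating-point rounding.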


## 4. Example: Two-Sample T-Test


```python
# Two groups: Control vs Treatment
control = np.array([22, 21, 23, 20, 19, 24, 22, 23])
treatment = np.array([25, 27, 26, 30, 29, 28, 27, 26])

# Independent two-sample t-test (assumes equal variances by default)
t_stat, p_val = stats.ttest_ind(control, treatment)

print("T-statistic:", t_stat)
print("P-value:", p_val)

if p_val < 0.05:
    print("Reject null hypothesis (means differ).")
else:
    print("Fail to reject null hypothesis (no evidence that the means differ).")
```

Output:

```
T-statistic: -6.590591584668059
P-value: 1.207850807546064e-05
Reject null hypothesis (means differ).
```
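
By default `stats.ttest_ind` runs Student's t-test, which assumes both groups share the same population variance. When that assumption is doubtful, passing `equal_var=False` switches to Welch's t-test. The sketch below runs both on the same control and treatment data. In this particular dataset the two groups have the same size and the same sample variance, so the two variants give the same statistic.

```python
import numpy as np
from scipy import stats

control = np.array([22, 21, 23, 20, 19, 24, 22, 23])
treatment = np.array([25, 27, 26, 30, 29, 28, 27, 26])

# Student's t-test: pools the two sample variances (equal_var=True is the default)
t_pooled, p_pooled = stats.ttest_ind(control, treatment)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(control, treatment, equal_var=False)

print("Sample variances:", control.var(ddof=1), treatment.var(ddof=1))
print("Student t:", t_pooled, "p:", p_pooled)
print("Welch t:  ", t_welch, "p:", p_welch)
```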


## 5. Example: Chi-Square Test for Independence

```python
import pandas as pd

# Raw observations: Gender and Preference for 10 respondents
data = pd.DataFrame({
    "Gender": ["M","M","F","F","M","F","M","F","M","F"],
    "Preference": ["A","B","A","A","B","B","A","B","B","A"]
})

# Contingency table: Gender vs Preference
contingency = pd.crosstab(data["Gender"], data["Preference"])
print("Contingency Table:\n", contingency)

# For 2x2 tables, chi2_contingency applies Yates' continuity correction by default
chi2, p_val, dof, expected = stats.chi2_contingency(contingency)

print("Chi-square:", chi2)
print("P-value:", p_val)

if p_val < 0.05:
    print("Reject null hypothesis (variables are not independent).")
else:
    print("Fail to reject null hypothesis (variables may be independent).")
```

Output:

```
Contingency Table:
 Preference  A  B
Gender          
F           3  2
M           2  3
Chi-square: 0.0
P-value: 1.0
Fail to reject null hypothesis (variables may be independent).
```
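
A chi-square statistic of exactly 0.0 looks odd for a table whose counts are not perfectly balanced. It comes from Yates' continuity correction, which `chi2_contingency` applies to 2×2 tables by default: every observed count is 0.5 away from its expected count of 2.5, and the correction shrinks that gap to zero. The sketch below turns the correction off and also runs Fisher's exact test, a common alternative when expected counts are below about 5. Choosing Fisher's test here is a suggestion, not part of the original example.

```python
import numpy as np
from scipy import stats

# Same counts as the contingency table above (rows: F, M; columns: A, B)
table = np.array([[3, 2],
                  [2, 3]])

# correction=False disables Yates' continuity correction
chi2_raw, p_raw, dof, expected = stats.chi2_contingency(table, correction=False)
print("Expected counts:\n", expected)
print("Chi-square without correction:", chi2_raw)
print("P-value without correction:", p_raw)

# Fisher's exact test does not rely on the large-sample chi-square approximation
odds_ratio, p_fisher = stats.fisher_exact(table)
print("Fisher's exact p-value:", p_fisher)
```

Neither variant comes close to rejecting H₀, so the conclusion is unchanged. With only 10 observations, the test has very little power to detect a real association.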


## 6. Wrap-Up

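- Use t-tests to compare means (one sample against a reference value, or two independent groups).

- Use the chi-square test to check whether two categorical variables are independent.

- Always check assumptions (normality, equal variances, independent observations).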

- This sets us up for Day 18: ANOVA & Regression Intro.
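
As a sketch of the assumption checks mentioned in the wrap-up, the code below applies the Shapiro-Wilk test (normality) and Levene's test (equal variances) to the control and treatment groups from the two-sample example. Picking these two tests is an assumption for illustration, since other diagnostics such as Q-Q plots are equally common. Independence of observations depends on how the data were collected and cannot be checked this way.

```python
import numpy as np
from scipy import stats

control = np.array([22, 21, 23, 20, 19, 24, 22, 23])
treatment = np.array([25, 27, 26, 30, 29, 28, 27, 26])

# Shapiro-Wilk: H0 is that the sample comes from a normal distribution
for name, group in [("control", control), ("treatment", treatment)]:
    w_stat, p_norm = stats.shapiro(group)
    print(f"Shapiro-Wilk ({name}): W={w_stat:.3f}, p={p_norm:.3f}")

# Levene: H0 is that the groups have equal variances
lev_stat, p_levene = stats.levene(control, treatment)
print(f"Levene: W={lev_stat:.3f}, p={p_levene:.3f}")
```

With samples this small, these tests have little power, so a large p-value should be read as "no evidence of a violation" rather than proof that the assumption holds.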