```{contents}

```

## Chi-Square Test (Goodness of Fit)

### 1. What is it?

* A **non-parametric test** used for **categorical data** (nominal/ordinal).
* Tests whether **observed data** fits a **theoretical (expected) distribution**.
* Often used to test **population proportions**.

---

### 2. Key Concepts

* **Observed categorical distribution** → Data collected from the sample.
* **Theoretical categorical distribution (Expected values)** → Proportion based on a hypothesis/theory.
* The test checks whether observed proportions are consistent with expected proportions.

---

### 3. Example 1 – Bike Colors

* Theory: ⅓ like yellow, ⅓ like red, ⅓ like orange.
* Sample: 22 like yellow, 17 like red, 59 like orange.
* Question: Does the sample fit the theoretical distribution?
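
As a minimal sketch, the question can be worked through in plain Python. The variable names are illustrative, and the expected counts assume the theoretical ⅓ share applies to all 98 people in the sample.

```python
# Observed counts from the sample (yellow, red, orange).
observed = {"yellow": 22, "red": 17, "orange": 59}

# Under the theory, each color gets an equal 1/3 share of the sample.
n = sum(observed.values())  # 98 respondents in total
expected = {color: n / 3 for color in observed}

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)

print(f"n = {n}, expected per color = {n / 3:.2f}")
print(f"chi-square = {chi_sq:.2f}")
```

With three categories there are 2 degrees of freedom, and the table value at α = 0.05 is 5.991. If the test is run at that level (an assumption, since the example does not state one), a statistic this large would indicate the sample does not fit equal preferences.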

---

### 4. Example 2 – Left vs. Right Handed Students

* Total students = 75
* **Observed**: 11 left-handed, 64 right-handed
* **Theory**: 12% of people are left-handed

  * Expected left-handed = $0.12 \times 75 = 9$
  * Expected right-handed = $75 - 9 = 66$
* Comparison:

  * Observed → (11, 64)
  * Expected → (9, 66)
* Next step: Apply **Chi-Square formula** to test if observed data fits theory.
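
Carrying the comparison one step further, the statistic for this example follows from summing $(O - E)^2 / E$ over the two categories. The significance level α = 0.05 used below is an assumption, since the example does not state one.

$$
\chi^2 = \frac{(11 - 9)^2}{9} + \frac{(64 - 66)^2}{66} = \frac{4}{9} + \frac{4}{66} \approx 0.51
$$

With two categories, df = 2 − 1 = 1, and the table value at α = 0.05 is 3.841. Since 0.51 < 3.841, the sample would be consistent with the 12% theory.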

---

### 5. Formula 

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$

Where:

* $O$ = Observed frequency
* $E$ = Expected frequency
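
The formula translates almost directly into code. The sketch below uses a hypothetical helper name, `chi_square_statistic`, and assumes the two lists hold counts for the same categories in the same order.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired category counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Handedness example: observed (11, 64) against expected (9, 66).
print(round(chi_square_statistic([11, 64], [9, 66]), 3))  # 0.505
```

The positivity check is there because a category with an expected count of zero would cause a division by zero.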

---

### 6. Takeaway

* **Chi-Square Goodness of Fit Test** checks if **sample data aligns with theoretical distribution**.
* Used only for **categorical data**.
* Requires **Observed vs. Expected** distributions.
* In practice → performed with **hand calculations** or **Python/R**.





### 🎯 Problem Statement

* **2010 Census (Population Data / Expected Distribution):**

  * Less than 50kg → 20%
  * 50–75kg → 30%
  * Greater than 75kg → 50%

* **2020 Sample (Observed Data, n = 500):**

  * Less than 50kg → 140
  * 50–75kg → 160
  * Greater than 75kg → 200

* Question: At **α = 0.05**, do weights in 2020 differ significantly from the 2010 distribution?

---

### 🔹 Step 1: Define Hypotheses

* **H₀ (Null Hypothesis):** 2020 weight distribution fits the 2010 population proportions.
* **H₁ (Alternative Hypothesis):** 2020 weight distribution does not fit the 2010 proportions.

---

### 🔹 Step 2: Expected Frequencies (based on 2010 proportions & n=500)

* Less than 50kg → 0.2 × 500 = 100
* 50–75kg → 0.3 × 500 = 150
* Greater than 75kg → 0.5 × 500 = 250
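
The same arithmetic can be sketched in Python: each expected count is the hypothesized proportion times the sample size. The dictionary keys are illustrative labels, not part of the original data.

```python
# Hypothesized 2010 proportions for each weight class.
proportions = {"<50kg": 0.20, "50-75kg": 0.30, ">75kg": 0.50}
n = 500  # size of the 2020 sample

# The proportions must cover the whole population, so they sum to 1.
assert abs(sum(proportions.values()) - 1.0) < 1e-9

# Expected count = hypothesized proportion x sample size.
expected = {label: p * n for label, p in proportions.items()}

for label, count in expected.items():
    print(f"{label}: {count:.0f}")
```

The expected counts always add up to the sample size, which is a quick sanity check before moving on.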

---

### 🔹 Step 3: Degrees of Freedom

* df = (number of categories – 1) = 3 – 1 = **2**

---

### 🔹 Step 4: Critical Value

* From Chi-Square table (α = 0.05, df = 2) → **5.991**
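
For df = 2 specifically, the chi-square upper-tail probability has the closed form $e^{-x/2}$, so the table value can be reproduced with the standard library alone. This is a sketch that relies on that special case; other degrees of freedom need a table or a statistics library.

```python
import math

alpha = 0.05

# For df = 2, P(X > x) = exp(-x / 2). Solving exp(-x / 2) = alpha for x
# gives the critical value x = -2 * ln(alpha).
critical_value = -2 * math.log(alpha)

print(f"{critical_value:.3f}")  # 5.991
```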

---

### 🔹 Step 5: Test Statistic (χ² Formula)

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$

* For <50kg → (140 − 100)² / 100 = 1600/100 = 16
* For 50–75kg → (160 − 150)² / 150 = 100/150 ≈ 0.67
* For >75kg → (200 − 250)² / 250 = 2500/250 = 10

**χ² = 16 + 0.67 + 10 ≈ 26.67**
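
The per-category terms can be checked with a short script. This is a sketch with illustrative variable names; it prints each category's contribution so it is easy to see which class drives the result.

```python
observed = [140, 160, 200]
expected = [100, 150, 250]
labels = ["<50kg", "50-75kg", ">75kg"]

# Each category's contribution to the statistic: (O - E)^2 / E.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

for label, part in zip(labels, contributions):
    print(f"{label:>8}: {part:.2f}")
print(f"   total: {chi_sq:.2f}")
```

Summing the unrounded terms gives 26.67 to two decimals, with the under-50kg class contributing the most.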

---

### 🔹 Step 6: Decision Rule

* If χ² > 5.991 → Reject H₀
* 26.67 > 5.991 → **Reject H₀**
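
The same decision can be reached through a p-value instead of a critical value. The sketch below assumes df = 2, where the upper-tail probability of the chi-square distribution is exactly $e^{-x/2}$; for other degrees of freedom a statistics library such as SciPy would be the usual route.

```python
import math

chi_sq = 16 + 100 / 150 + 10  # test statistic from Step 5
critical_value = 5.991        # table value for alpha = 0.05, df = 2
alpha = 0.05

# For df = 2 the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_sq / 2)

# Both rules lead to the same decision.
reject_by_critical_value = chi_sq > critical_value
reject_by_p_value = p_value < alpha

print(f"p-value = {p_value:.2e}")
print("Reject H0" if reject_by_p_value else "Fail to reject H0")
```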

---

### ✅ Conclusion

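At α = 0.05, the **2020 weight distribution is significantly different** from the 2010 distribution.
The sample suggests that the distribution of weights has **shifted over the 10 years**, with more people under 50kg and fewer over 75kg than the 2010 proportions predict.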

