---

## 📌 **Topic: Chi-Square Test – Goodness of Fit**

### ✅ **What is the Chi-Square Test?**

* A **non-parametric test** used to determine whether observed categorical data matches an expected (theoretical) distribution.
* The variant covered here is the **Chi-Square Goodness of Fit Test**, which compares one categorical variable against a hypothesized distribution.
* Used for **categorical data**:

  * **Nominal** (e.g., colors, gender)
  * **Ordinal** (e.g., ranks), although the test treats the categories as unordered

---

### 🎯 What is **Goodness of Fit**?

The **goodness of fit test** is a **statistical test** to see **how well your observed data fit a particular expected distribution**.

It tells you:

> 📊 **Does my data follow the theoretical distribution I think it does?**

### ✅ **Purpose:**

To **compare**:

* **Observed categorical distribution** (from sample)
* **Expected categorical distribution** (based on theory or known proportion)

---

### ✅ **Example 1: Bike Color Preferences**

* Theory:
  1/3 like **Yellow**, 1/3 like **Red**, 1/3 like **Orange**
* Observed (from sample):

  * Yellow: 22
  * Red: 17
  * Orange: 59
* **Goal**: Check if the sample supports the theory using the chi-square test.
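A minimal sketch of this check with SciPy's `scipy.stats.chisquare`, assuming the theory means equal expected counts for the three colours. When `f_exp` is omitted, `chisquare` treats every category as equally likely, which matches the 1/3, 1/3, 1/3 theory here.

```python
from scipy.stats import chisquare

# Observed bike-colour preferences from the sample (Yellow, Red, Orange)
observed = [22, 17, 59]

# With f_exp omitted, chisquare assumes equal expected counts in every
# category, i.e. total / 3 = 98 / 3 ≈ 32.67 people per colour here
result = chisquare(f_obs=observed)

# Three categories, so df = 3 - 1 = 2
print("Chi-Square Statistic:", round(float(result.statistic), 4))
print(f"p-value: {result.pvalue:.2e}")
```

Because the expected count is the same (98 / 3) for every colour, omitting `f_exp` is equivalent to passing `[98/3, 98/3, 98/3]` explicitly.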

---

### ✅ **Key Terms:**

* **Theoretical (Expected) Distribution**: Based on assumption/theory
* **Observed Distribution**: Actual data collected from a sample

---

### ✅ **Example 2: Left-handed Students**

* **Total students** = 75
* **Observed**:

  * Left-handed = 11
  * Right-handed = 64
* **Theory**: 12% of people are left-handed
* **Expected**:

  * Left-handed = 12% of 75 = **9**
  * Right-handed = 75 - 9 = **66**

🔸 The chi-square test is then applied to compare these observed and expected counts and determine whether the sample **fits the theory**.
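As a minimal sketch, the same SciPy function can run this comparison using only the counts listed above. With two categories (left- and right-handed), the degrees of freedom are 2 − 1 = 1.

```python
from scipy.stats import chisquare

# Observed counts: left-handed, right-handed (75 students in total)
observed = [11, 64]

# Expected counts under the 12% left-handed theory: 0.12 * 75 = 9, 75 - 9 = 66
expected = [9, 66]

# Two categories, so chisquare uses df = 2 - 1 = 1
result = chisquare(f_obs=observed, f_exp=expected)

# By hand: (11 - 9)^2 / 9 + (64 - 66)^2 / 66 = 4/9 + 4/66 ≈ 0.505
print("Chi-Square Statistic:", round(float(result.statistic), 4))
print("p-value:", round(float(result.pvalue), 4))
```

For df = 1 at α = 0.05 the usual critical value is 3.841, so a statistic of about 0.505 would not lead to rejecting the 12% theory for this sample.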

---

### 💡 **Conclusion:**

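> The Chi-Square Test checks whether **sample proportions** match **expected proportions** for a **categorical variable**. It tests whether a sample provides evidence against a **theory**: a significant result means the data do not fit the theory, while a non-significant result only means the data are consistent with it.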



---

## 📌 **Topic: Chi-Square Test for Goodness of Fit – Solved Example**

### 🎯 **Objective:**

To determine whether the **distribution of population weights in 2020** is **significantly different** from the **2010 distribution**, using the **Chi-Square Goodness of Fit Test** at **α = 0.05**.

---

### 🧮 **Step-by-Step Summary**

#### ✅ **Given Data:**

* **2010 Expected Proportions**:

  * < 50 kg → 20%
  * 50–75 kg → 30%
  * > 75 kg → 50%

* **2020 Observed Data** (from sample of 500):

  * < 50 kg → 140
  * 50–75 kg → 160
  * > 75 kg → 200

---

#### 🧾 **Step 1: Hypotheses**

* **Null Hypothesis (H₀)**: The 2020 weight distribution matches the 2010 distribution.
* **Alternative Hypothesis (H₁)**: The 2020 distribution does **not** match the 2010 distribution.

---

#### 🧾 **Step 2: Expected Counts (from 2010 proportions):**

$$
\text{Expected} = \text{Proportion} \times \text{Total sample size (500)}
$$

* < 50 kg → 0.2 × 500 = **100**
* 50–75 kg → 0.3 × 500 = **150**
* > 75 kg → 0.5 × 500 = **250**

---

#### 🧾 **Step 3: Test Statistic Calculation**

Using the formula:

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$

Where O = Observed, E = Expected:

$$
\chi^2 = \frac{(140-100)^2}{100} + \frac{(160-150)^2}{150} + \frac{(200-250)^2}{250}
= \frac{1600}{100} + \frac{100}{150} + \frac{2500}{250}
= 16 + \frac{2}{3} + 10 \approx \mathbf{26.67}
$$
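To check the arithmetic without rounding, here is a small from-scratch version of the formula. The helper name `chi_square_statistic` is illustrative, not from any library; Python's `fractions.Fraction` keeps every term exact, so the middle term stays 2/3 instead of being truncated to 0.66.

```python
from fractions import Fraction


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts from the worked example, as exact fractions so nothing is rounded
observed = [Fraction(140), Fraction(160), Fraction(200)]
expected = [Fraction(100), Fraction(150), Fraction(250)]

stat = chi_square_statistic(observed, expected)
print(stat)         # 80/3 (exact)
print(float(stat))  # decimal form, about 26.67
```

The exact value is 16 + 2/3 + 10 = 80/3, which rounds to 26.67.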

---

#### 🧾 **Step 4: Decision Rule**

* **Degrees of Freedom (df)** = categories − 1 = 3 − 1 = **2**
* **Critical value from Chi-Square table** at α = 0.05 and df = 2:

  $$
  \chi^2_{\text{critical}} = \mathbf{5.991}
  $$
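Instead of reading the table, the critical value can be computed from SciPy's `scipy.stats.chi2` distribution. A short sketch: `chi2.ppf` is the inverse of the cumulative distribution function, so evaluating it at 1 − α gives the point with upper-tail area α.

```python
from scipy.stats import chi2

alpha = 0.05
df = 3 - 1  # number of categories minus one

# Critical value: the point with upper-tail area alpha under chi-square(df)
critical_value = chi2.ppf(1 - alpha, df)
print(round(float(critical_value), 3))  # 5.991
```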

---

#### ✅ **Step 5: Conclusion**

Since

$$
\chi^2_{\text{calculated}} \approx 26.67 > 5.991
$$

we **reject the null hypothesis**.

---

### 🟢 **Final Interpretation:**

> There is **significant evidence** to conclude that the **weight distribution in 2020** is **different** from the **2010 distribution**. The population has changed in terms of weight.

---


```python
from scipy.stats import chisquare

# Observed counts from the 2020 sample
observed = [140, 160, 200]

# Expected proportions from the 2010 census
expected_proportions = [0.2, 0.3, 0.5]

# Total sample size in 2020
sample_size = 500

# Expected frequencies based on the 2010 proportions
expected = [p * sample_size for p in expected_proportions]

# Perform the Chi-Square Goodness of Fit Test (df = 3 - 1 = 2 by default)
chi_statistic, p_value = chisquare(f_obs=observed, f_exp=expected)

# Print the results; the p-value is shown in scientific notation because
# rounding it to 4 decimals would display a misleading 0.0
print("Chi-Square Statistic:", round(chi_statistic, 4))
print(f"p-value: {p_value:.2e}")

# Decision at significance level alpha
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The 2020 distribution is different from 2010.")
else:
    print("Fail to reject the null hypothesis: No significant change in distribution.")
```

Output:

```text
Chi-Square Statistic: 26.6667
p-value: 1.62e-06
Reject the null hypothesis: The 2020 distribution is different from 2010.
```
