# Chi-Square Test

The Chi-Square test is used to assess:
1. **Independence of two categorical variables**: whether the two variables are associated.  
   *Example*: Is a person's gender associated with the type of chocolate they prefer?
   
2. **Goodness of Fit**: whether the observed frequencies match an expected distribution.

The test statistic used is the **Chi-Square Statistic**.
The formula is:

$$
X^2 = \sum \frac{(O_i - E_i)^2}{E_i}
$$

Where:
- **\( O_i \)** and **\( E_i \)** are the observed and expected frequencies for the \( i \)-th category (or cell).
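As a minimal sketch of the formula, the statistic can be computed directly with NumPy. The function name `chi_square_statistic` and the counts below are assumptions made for illustration only; they are not real data.

```python
import numpy as np

def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))

# Illustrative counts (hypothetical): three categories, 60 observations in total
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
print("Chi-Square Statistic:", stat)
```

Each category contributes its squared deviation scaled by the expected count, so categories with small expected counts weigh more heavily per unit of deviation.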

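**Null Hypothesis (H₀)**: For the independence test, the categorical variables are independent (no association). For the goodness-of-fit test, the observed frequencies follow the expected distribution.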

**Decision Rule**:
- If **p-value < significance level** → Reject \( H_0 \)
- or if **\( X^2 \) value > critical value** → Reject \( H_0 \)

Otherwise, we fail to reject \( H_0 \).
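The two forms of the decision rule describe the same rejection region. The sketch below applies both using `scipy.stats.chi2`; the helper name `chi_square_decision` and the input values are assumptions chosen for illustration.

```python
from scipy.stats import chi2

def chi_square_decision(stat, dof, alpha):
    """Apply both decision rules; return (p_value, critical_value, reject_by_p, reject_by_cv)."""
    p_value = chi2.sf(stat, dof)                # upper-tail probability of the statistic
    critical_value = chi2.ppf(1 - alpha, dof)   # value cutting off the upper alpha tail
    reject_by_p = bool(p_value < alpha)
    reject_by_cv = bool(stat > critical_value)
    return p_value, critical_value, reject_by_p, reject_by_cv

# Illustrative inputs (hypothetical, not taken from a real data set)
p_value, critical_value, reject_by_p, reject_by_cv = chi_square_decision(
    stat=7.5, dof=3, alpha=0.05
)
print(f"p-value: {p_value:.4f}")
print(f"critical value: {critical_value:.4f}")
print("Reject H0 (p-value rule):", reject_by_p)
print("Reject H0 (critical-value rule):", reject_by_cv)
```

Because the p-value is the upper-tail area beyond the statistic, it falls below the significance level exactly when the statistic exceeds the critical value, so the two rules agree.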


In [1]:
# Import necessary libraries
import numpy as np
from scipy.stats import chisquare, chi2


In [2]:

# Set significance level
alpha = 0.10

# A die is thrown 600 times with the following results.
observed = np.array([115, 97, 91, 101, 110, 86])

# Expected frequency for each face (if die is fair)
total_rolls = 600
num_faces = 6
expected = np.full(num_faces, total_rolls / num_faces)

# Degrees of freedom (k-1)
dof = num_faces - 1

print("Observed Frequencies:", observed)
print("Expected Frequencies (Fair Die):", expected)

# Calculate the critical value
critical_value = chi2.ppf(1 - alpha, dof)


Observed Frequencies: [115  97  91 101 110  86]
Expected Frequencies (Fair Die): [100. 100. 100. 100. 100. 100.]


In [3]:
# Perform Chi-Square Goodness of Fit Test
chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)

print("\nChi-Square Statistic:", chi2_stat)
print(f"Critical Value at {alpha*100}% significance level with {dof} dof:", critical_value)




Chi-Square Statistic: 6.12
Critical Value at 10.0% significance level with 5 dof: 9.236356899781123


In [4]:
# Decision based on critical value comparison
if chi2_stat < critical_value:
    print(f"\nSince Chi-Square Statistic ({chi2_stat}) < Critical Value ({critical_value}), we fail to reject the null hypothesis.")
    print("Conclusion: The die appears to be fair (unbiased) at the 10% significance level.")
else:
    print(f"\nSince Chi-Square Statistic ({chi2_stat}) > Critical Value ({critical_value}), we reject the null hypothesis.")
    print("Conclusion: The die may be biased.")



Since Chi-Square Statistic (6.12) < Critical Value (9.236356899781123), we fail to reject the null hypothesis.
Conclusion: The die appears to be fair (unbiased) at the 10% significance level.


Since \( X^2 \) (6.12) < critical value (9.24), we fail to reject the null hypothesis: at the 10% significance level, there is no evidence that the die is biased.

### Comparing with the p-value

In [5]:
chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected)
print("p-value:",p_val)

p-value: 0.29471693654506914


Since the p-value (0.29) > significance level (0.10), we fail to reject the null hypothesis, so the same conclusion follows.

# The Chi-Square Test for Homogeneity

The Chi-Square Test for Homogeneity is used to determine whether different populations have the same distribution of a categorical variable. In this example, the null hypothesis is that the proportions of pins categorized as "too thin," "OK," or "too thick" are the same across multiple machines.

## Steps to Perform the Chi-Square Test for Homogeneity

### 1. State the Hypotheses

- **Null Hypothesis (H₀)**: The proportions of pins that are too thin, OK, or too thick are the same for all machines.
- **Alternative Hypothesis (H₁)**: The proportions of pins that are too thin, OK, or too thick are not the same for all machines.

### 2. Data Collection

Collect the counts of pins from each machine for each category. For illustration, let’s assume the following data:

| Machine   | Too Thin | OK | Too Thick |
|-----------|----------|----|-----------|
| Machine 1 | 30       | 50 | 20        |
| Machine 2 | 40       | 60 | 10        |
| Machine 3 | 20       | 40 | 30        |

### 3. Create the Contingency Table

The data are arranged in a contingency table, with one row per machine and one column per category, to perform the Chi-Square test.
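Under the null hypothesis, the expected count in each cell is (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1). As a rough sketch, these quantities can be computed by hand from the table above; the variable names are chosen for illustration.

```python
import numpy as np

# Counts from the table above: rows are machines,
# columns are Too Thin, OK, Too Thick
observed = np.array([[30, 50, 20],
                     [40, 60, 10],
                     [20, 40, 30]])

row_totals = observed.sum(axis=1)   # pins per machine
col_totals = observed.sum(axis=0)   # pins per category
grand_total = observed.sum()

# Expected counts if every machine shared the same category proportions
expected = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Chi-Square statistic summed over all cells
chi2_stat = ((observed - expected) ** 2 / expected).sum()

print("Expected Frequencies:\n", expected)
print("Degrees of Freedom:", dof)
print("Chi-Square Statistic:", chi2_stat)
```

This is the same calculation a library routine would perform for a table of this shape; doing it manually makes it easy to see which cells deviate most from their expected counts.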

### 4. Perform the Chi-Square Test

Using Python, the Chi-Square test can be performed using the `scipy.stats.chi2_contingency` function.



In [6]:

import numpy as np
from scipy import stats

# Sample data: counts of pins for each machine and category
data = np.array([[30, 50, 20],   # Machine 1
                 [40, 60, 10],   # Machine 2
                 [20, 40, 30]])  # Machine 3

# Perform Chi-Square Test
chi2_stat, p_value, dof, expected = stats.chi2_contingency(data)

# Output the results
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-Value: {p_value}")
print(f"Degrees of Freedom: {dof}")
print(f"Expected Frequencies:\n{expected}")

# Determine the conclusion
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: Proportions are not the same across machines.")
else:
    print("Fail to reject the null hypothesis: No evidence that proportions differ across machines.")


Chi-Square Statistic: 18.855218855218855
P-Value: 0.0008391237323066545
Degrees of Freedom: 4
Expected Frequencies:
[[30. 50. 20.]
 [33. 55. 22.]
 [27. 45. 18.]]
Reject the null hypothesis: Proportions are not the same across machines.
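The same conclusion can be checked with the critical-value rule. The sketch below copies the statistic and degrees of freedom from the output above and assumes the same 5% significance level.

```python
from scipy.stats import chi2

# Values copied from the output above
chi2_stat = 18.855218855218855
dof = 4
alpha = 0.05

# Critical value cutting off the upper 5% tail of the chi-square distribution
critical_value = chi2.ppf(1 - alpha, dof)
reject = bool(chi2_stat > critical_value)

print(f"Critical Value at {alpha:.0%} significance level with {dof} dof: {critical_value:.4f}")
print("Reject the null hypothesis:", reject)
```

The statistic lies well above the critical value, which agrees with the p-value comparison.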
