# **Problem Statement**  
## **9. Perform chi-squared test manually and validate against scipy result.**

### Problem Statement

Perform a Chi-Squared (χ²) Test manually to test whether two categorical variables are independent, and validate the result against SciPy’s implementation.

### Constraints & Example Inputs/Outputs

### Constraints
- Data must be categorical
- Observed frequencies must be non-negative
- Expected frequencies should generally be ≥ 5 (rule of thumb)
- Significance level (α) is typically set to 0.05
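
The constraints above can be checked before running the test. The following is a minimal sketch, not part of the original solution; the helper name `check_expected_counts` and its `min_expected` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def check_expected_counts(observed, min_expected=5):
    """Hypothetical helper: validate the inputs and apply the E >= 5 rule of thumb."""
    observed = np.asarray(observed, dtype=float)
    if (observed < 0).any():
        raise ValueError("Observed frequencies must be non-negative")
    # Expected counts under independence: row total * column total / grand total
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # True only if every expected count meets the rule-of-thumb threshold
    return bool((expected >= min_expected).all()), expected

ok, expected = check_expected_counts([[20, 30], [10, 40]])
print("Rule of thumb satisfied:", ok)
```

If the check fails for a 2×2 table, Fisher's exact test is a common fallback.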

### Example Input:
Observed contingency table:

|    | Category B1 | Category B2 |
| -- | ----------- | ----------- |
| A1 | 20          | 30          |
| A2 | 10          | 40          |


### Expected Output:
- Chi-squared statistic (χ²)
- Degrees of freedom
- p-value
- Same (or nearly same) results from:
    - Manual computation
    - scipy.stats.chi2_contingency

### Solution Approach

**Step 1: Understand the Chi-Squared Test**
The chi-squared test checks whether two categorical variables are independent.

-> Null Hypothesis (H₀):
The variables are independent.

-> Alternative Hypothesis (H₁):
The variables are dependent.

**Step 2: Formula**
$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Where:
- $O_{ij}$ = Observed frequency in row i, column j
- $E_{ij}$ = Expected frequency in row i, column j

**Step 3: Compute Expected Frequencies**
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$$

**Step 4: Degrees of Freedom**
$$df = (r - 1)(c - 1)$$

Here r is the number of rows and c is the number of columns in the contingency table.
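
As a worked illustration of Steps 2 to 4 on the example table (row totals 50 and 50, column totals 30 and 70, grand total 100), with no continuity correction applied:

$$
\begin{aligned}
E_{11} &= \frac{50 \times 30}{100} = 15, \quad
E_{12} = \frac{50 \times 70}{100} = 35, \quad
E_{21} = 15, \quad
E_{22} = 35 \\
\chi^2 &= \frac{(20-15)^2}{15} + \frac{(30-35)^2}{35} + \frac{(10-15)^2}{15} + \frac{(40-35)^2}{35}
       = \frac{100}{21} \approx 4.762 \\
df &= (2-1)(2-1) = 1
\end{aligned}
$$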

**Step 5: Decision Rule**
- If p-value < α, reject H₀
- Else, fail to reject H₀
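
The decision rule needs a p-value, which the manual function in this notebook does not return. One way to obtain it is the survival function of the chi-squared distribution, `scipy.stats.chi2.sf`. The sketch below assumes the uncorrected statistic 4.7619 from the example table; the helper name `chi2_decision` is hypothetical.

```python
from scipy.stats import chi2

def chi2_decision(statistic, df, alpha=0.05):
    """Hypothetical helper: p-value and reject/fail-to-reject decision."""
    # Survival function = 1 - CDF, i.e. P(X >= statistic) under H0
    p_value = float(chi2.sf(statistic, df))
    return p_value, p_value < alpha

p_value, reject = chi2_decision(4.761904761904762, df=1)
print(f"p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```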

### Solution Code

In [2]:
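# Approach 1: Brute Force (Manual Calculation)
# Manual Chi-Squared Function

import numpy as np

def chi_squared_manual(observed):
    """Return (chi_square, df, expected) for Pearson's test, with no continuity correction."""
    observed = np.asarray(observed, dtype=float)

    # Marginal totals of the contingency table
    row_totals = observed.sum(axis=1)
    col_totals = observed.sum(axis=0)
    grand_total = observed.sum()

    # Expected counts under independence: E_ij = row_i total * col_j total / grand total
    expected = np.outer(row_totals, col_totals) / grand_total

    # Pearson statistic: sum of (O - E)^2 / E over all cells
    chi_square = ((observed - expected) ** 2 / expected).sum()

    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

    return chi_square, df, expected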


### Alternative Solution

In [3]:
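# Approach 2: Optimized (SciPy Validation)
# SciPy Chi-Squared Test

from scipy.stats import chi2_contingency

def chi_squared_scipy(observed, correction=True):
    # For 2x2 tables (df = 1), SciPy applies Yates' continuity correction by
    # default. Pass correction=False to get the uncorrected Pearson statistic
    # that chi_squared_manual computes.
    chi2, p_value, df, expected = chi2_contingency(observed, correction=correction)
    return chi2, p_value, df, expected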


### Alternative Approaches

| Method                | Description                     |
| --------------------- | ------------------------------- |
| Manual χ²             | Shows statistical understanding |
| SciPy                 | Production-ready, reliable      |
| Fisher’s Exact Test   | For small samples               |
| Likelihood Ratio Test | Alternative to χ²               |
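
The last two rows of the table can be tried on the same example data. This is a minimal sketch: it uses `scipy.stats.fisher_exact` (2×2 tables only) and the `lambda_="log-likelihood"` option of `chi2_contingency`, which yields the likelihood ratio (G) statistic. The variable names are illustrative.

```python
from scipy.stats import chi2_contingency, fisher_exact

observed = [[20, 30], [10, 40]]

# Fisher's exact test: returns the sample odds ratio and an exact p-value
odds_ratio, p_fisher = fisher_exact(observed)

# Likelihood ratio (G) test, with the continuity correction switched off
g_stat, p_g, df_g, _ = chi2_contingency(
    observed, correction=False, lambda_="log-likelihood"
)

print("Fisher odds ratio:", odds_ratio, "p-value:", p_fisher)
print("G statistic:", g_stat, "p-value:", p_g, "df:", df_g)
```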

### Test Case

In [4]:
# Test Case 1: Given Example Table
observed = [
    [20, 30],
    [10, 40]
]

chi2_manual, df_manual, expected_manual = chi_squared_manual(observed)

print("Manual Chi-Squared:", chi2_manual)
print("Degrees of Freedom:", df_manual)
print("Expected Frequencies:\n", expected_manual)


Manual Chi-Squared: 4.761904761904762
Degrees of Freedom: 1
Expected Frequencies:
 [[15. 35.]
 [15. 35.]]


In [5]:
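# Test Case 2: Validate Against SciPy
# (SciPy's default applies Yates' continuity correction to 2x2 tables,
# so the statistic below is smaller than the manual one)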
chi2_scipy, p_value, df_scipy, expected_scipy = chi_squared_scipy(observed)

print("SciPy Chi-Squared:", chi2_scipy)
print("p-value:", p_value)
print("Degrees of Freedom:", df_scipy)


SciPy Chi-Squared: 3.8571428571428577
p-value: 0.04953461343562649
Degrees of Freedom: 1
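
The SciPy statistic (3.857) differs from the manual one (4.762) because `chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom. A minimal sketch of both ways to reconcile the two results, with illustrative variable names: disable the correction in SciPy, or apply the correction manually.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[20, 30], [10, 40]])

# Manual Pearson statistic (no continuity correction)
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# SciPy with Yates' correction disabled
chi2_uncorrected, p_uncorrected, df, _ = chi2_contingency(observed, correction=False)

# Manual Yates-corrected statistic: shrink each |O - E| by 0.5 before squaring
# (valid here because every |O - E| is at least 0.5)
chi2_yates_manual = ((np.abs(observed - expected) - 0.5) ** 2 / expected).sum()
chi2_default, _, _, _ = chi2_contingency(observed)  # correction=True by default

print("Uncorrected match:", np.isclose(chi2_manual, chi2_uncorrected))
print("Yates-corrected match:", np.isclose(chi2_yates_manual, chi2_default))
```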


In [6]:
# Test Case 3: Compare Expected Frequencies
print("Difference in expected matrices:")
print(expected_manual - expected_scipy)


Difference in expected matrices:
[[0. 0.]
 [0. 0.]]


In [7]:
# Test Case 4: Independent Variable Case
observed_independent = [
    [25, 25],
    [25, 25]
]

chi2, df, _ = chi_squared_manual(observed_independent)
print("Chi-Squared:", chi2)


Chi-Squared: 0.0


### Expected Outputs
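- Manual χ² (4.762) matches SciPy when Yates' continuity correction is disabled (`correction=False`); with SciPy's default correction for 2×2 tables, the statistic is 3.857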
- p-value correctly determines hypothesis outcome
- Expected frequencies computed correctly
- Code works for different table sizes
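
The test cases above use only 2×2 tables. As a sketch of the "different table sizes" claim, the example below runs the same manual computation on a 3×3 table and compares it with SciPy. The table values are made up for illustration. For tables with more than one degree of freedom, SciPy applies no continuity correction, so the two results should agree directly.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3x3 table (every row and column total is 60)
observed = np.array([
    [10, 20, 30],
    [20, 20, 20],
    [30, 20, 10],
])

# Manual computation, same steps as in the solution approach
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
chi2_manual = ((observed - expected) ** 2 / expected).sum()
df_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

chi2_scipy, p_value, df_scipy, expected_scipy = chi2_contingency(observed)

print("Manual:", chi2_manual, "df:", df_manual)
print("SciPy: ", chi2_scipy, "df:", df_scipy)
```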

## Complexity Analysis

### Time Complexity
O(r × c)

### Space Complexity
O(r × c)

-> Where r = number of rows, c = number of columns.
