<h3><b>Objective:</b> Referring to Table 4, does gender have a significant impact on education level? Test this
at the 95% confidence level using the Chi-square test.</h3>

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

<h4>Percentage Distribution of Persons by General Education Levels (Table 4)</h4>
<p>2009-10 Data Only</p>
<p>Categories: Not Literate, Primary, Middle, Secondary+ </p>

In [6]:
# Rural 2009-10
rural_male_2009 = [29.4, 35.7, 16.0, 18.8]
rural_female_2009 = [46.7, 31.8, 11.1, 10.3]

# Urban 2009-10
urban_male_2009 = [16.4, 27.4, 13.6, 42.6]
urban_female_2009 = [26.4, 27.0, 13.6, 32.8]

# Combined 2009-10: element-wise sum of the rural and urban percentages (each row totals about 200, not 100)
combined_male_2009 = [rural_male_2009[i] + urban_male_2009[i] for i in range(4)]
combined_female_2009 = [rural_female_2009[i] + urban_female_2009[i] for i in range(4)]


<h4>Chi-Square Test Function </h4>

In [1]:
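def chi_square_test(data, label):
    """Run a chi-square test of independence on a 2 x 4 (gender x education) table.

    Note: chi2_contingency expects observed frequencies. The values passed
    here are percentages, so each percentage point is effectively treated
    as one observation.
    """
    chi2, p, dof, expected = chi2_contingency(data)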
    print(f"\n--- {label} ---")
    contingency_table = pd.DataFrame(data, 
                                   index=['Male', 'Female'], 
                                   columns=['Not Literate', 'Primary', 'Middle', 'Secondary+'])
    print(contingency_table)
    print(f"Chi-square Statistic: {round(chi2, 4)}")
    print(f"Degrees of Freedom: {dof}")
    print(f"p-value: {round(p, 6)}")
    print("Significance Level: 95% (alpha = 0.05)")
    
    if p < 0.05:
        print("RESULT: REJECT H0 -> Gender has SIGNIFICANT impact on education level")
    else:
        print("RESULT: FAIL TO REJECT H0 -> No significant impact of gender on education level")
    
    return chi2, p, dof

<p>CHI-SQUARE TEST FOR GENDER IMPACT ON EDUCATION LEVEL</p>
<span style="font-size:14px">H0: Gender and education level are independent (no association)</span>
<br>
<span style="font-size:14px">H1: Gender and education level are associated</span>
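<p>As a minimal sketch of what the test computes, the statistic for the rural table can be reproduced by hand: the expected value of each cell under H0 is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The variable names below are illustrative and not part of the original notebook. Both printed values should agree with the rural statistic reported in the output (7.527).</p>

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rural 2009-10 percentages from Table 4 (rows: Male, Female)
observed = np.array([
    [29.4, 35.7, 16.0, 18.8],
    [46.7, 31.8, 11.1, 10.3],
])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 4)
grand_total = observed.sum()

# Expected cell values under H0 (independence of gender and education level)
expected = row_totals @ col_totals / grand_total

# Pearson chi-square statistic and degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Cross-check against SciPy (no continuity correction is applied when dof > 1)
chi2_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)

print(f"Manual chi-square: {chi2_manual:.4f} (dof = {dof_manual})")
print(f"SciPy chi-square:  {chi2_scipy:.4f} (dof = {dof_scipy})")
```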

In [12]:
# Test for Rural 2009-10
chi_square_test(np.array([rural_male_2009, rural_female_2009]), "Rural Population (2009-10)")

# Test for Urban 2009-10  
chi_square_test(np.array([urban_male_2009, urban_female_2009]), "Urban Population (2009-10)")

# Test for Combined 2009-10
chi_square_test(np.array([combined_male_2009, combined_female_2009]), "Combined Population (2009-10)")



--- Rural Population (2009-10) ---
        Not Literate  Primary  Middle  Secondary+
Male            29.4     35.7    16.0        18.8
Female          46.7     31.8    11.1        10.3
Chi-square Statistic: 7.527
Degrees of Freedom: 3
p-value: 0.056869
Significance Level: 95% (alpha = 0.05)
RESULT: FAIL TO REJECT H0 -> No significant impact of gender on education level

--- Urban Population (2009-10) ---
        Not Literate  Primary  Middle  Secondary+
Male            16.4     27.4    13.6        42.6
Female          26.4     27.0    13.6        32.8
Chi-square Statistic: 3.6129
Degrees of Freedom: 3
p-value: 0.306408
Significance Level: 95% (alpha = 0.05)
RESULT: FAIL TO REJECT H0 -> No significant impact of gender on education level

--- Combined Population (2009-10) ---
        Not Literate  Primary  Middle  Secondary+
Male            45.8     63.1    29.6        61.4
Female          73.1     58.8    24.7        43.1
Chi-square Statistic: 10.0667
Degrees of Freedom: 3
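p-value: 0.018008
Significance Level: 95% (alpha = 0.05)
RESULT: REJECT H0 -> Gender has SIGNIFICANT impact on education level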

(np.float64(10.06665481411283), np.float64(0.018007960170838687), 3)
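<p>The same decisions can be read off by comparing each statistic with the chi-square critical value for 3 degrees of freedom at alpha = 0.05, which is equivalent to comparing the p-value with alpha. The sketch below uses the statistics printed above; the dictionary and variable names are illustrative only.</p>

```python
from scipy.stats import chi2

alpha = 0.05
dof = 3  # (2 rows - 1) * (4 columns - 1)

# Critical value: reject H0 when the statistic exceeds this threshold
critical_value = chi2.ppf(1 - alpha, dof)

# Chi-square statistics as printed in the output above
statistics = {
    "Rural": 7.527,
    "Urban": 3.6129,
    "Combined": 10.0667,
}

decisions = {}
for label, stat in statistics.items():
    p_value = chi2.sf(stat, dof)  # upper-tail probability of the statistic
    decisions[label] = stat > critical_value
    verdict = "reject H0" if decisions[label] else "fail to reject H0"
    print(f"{label}: chi2 = {stat}, p = {p_value:.4f}, "
          f"critical = {critical_value:.4f} -> {verdict}")
```

<p>The rural statistic falls just short of the threshold, which is why its p-value sits slightly above 0.05.</p>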

<h3>OVERALL CONCLUSION:</h3>
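<p>Based on the Chi-square tests at the 95% confidence level (alpha = 0.05):</p>
<ul>
    <li>Gender shows a significant association with education level only in the combined (rural + urban) data (chi-square = 10.0667, p = 0.018). Tested separately, neither the rural (p = 0.0569) nor the urban (p = 0.3064) data reach significance.</li>
    <li>Females show higher "Not Literate" shares in both areas: 46.7% vs. 29.4% (rural) and 26.4% vs. 16.4% (urban).</li>
    <li>Males have higher representation in Secondary+ education: 18.8% vs. 10.3% (rural) and 42.6% vs. 32.8% (urban).</li>
</ul>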
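<p>The descriptive points about literacy and Secondary+ education can be checked directly from the Table 4 percentages. The sketch below computes the female-minus-male gap in percentage points for each category; the <code>rural</code>, <code>urban</code>, and <code>gap</code> names are illustrative and not part of the original notebook.</p>

```python
import pandas as pd

categories = ["Not Literate", "Primary", "Middle", "Secondary+"]

# Percentages from Table 4 (2009-10)
rural = pd.DataFrame(
    {"Male": [29.4, 35.7, 16.0, 18.8], "Female": [46.7, 31.8, 11.1, 10.3]},
    index=categories,
)
urban = pd.DataFrame(
    {"Male": [16.4, 27.4, 13.6, 42.6], "Female": [26.4, 27.0, 13.6, 32.8]},
    index=categories,
)

# Female minus male, in percentage points, for each area
gap = pd.DataFrame({
    "Rural": rural["Female"] - rural["Male"],
    "Urban": urban["Female"] - urban["Male"],
}).round(1)

print(gap)
```

<p>A positive gap means the female share is higher. The "Not Literate" row is positive and the "Secondary+" row is negative in both areas.</p>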