# Chi-Square Test for Independence
Association between Device Type and Customer Satisfaction


## Problem Setup
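Mizzare Corporation wants to determine whether there is a statistically significant association between device type (Smart Thermostat vs. Smart Light) and customer satisfaction level. The analysis applies a chi-square test of independence to the 600 observations summarized below.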


## Contingency Table


In [1]:
import numpy as np
from scipy import stats

observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50]
])

row_labels = ['Very Satisfied', 'Satisfied', 'Neutral', 'Unsatisfied', 'Very Unsatisfied']
col_labels = ['Smart Thermostat', 'Smart Light']

print("Observed Frequencies:")
print(f"{'Satisfaction':<20} {'Smart Thermostat':<20} {'Smart Light':<20} {'Total':<10}")
print("-" * 70)
for i, label in enumerate(row_labels):
    row_total = observed[i].sum()
    print(f"{label:<20} {observed[i][0]:<20} {observed[i][1]:<20} {row_total:<10}")
col_totals = observed.sum(axis=0)
grand_total = observed.sum()
print("-" * 70)
print(f"{'Total':<20} {col_totals[0]:<20} {col_totals[1]:<20} {grand_total:<10}")


Observed Frequencies:
Satisfaction         Smart Thermostat     Smart Light          Total     
----------------------------------------------------------------------
Very Satisfied       50                   70                   120       
Satisfied            80                   100                  180       
Neutral              60                   90                   150       
Unsatisfied          30                   50                   80        
Very Unsatisfied     20                   50                   70        
----------------------------------------------------------------------
Total                240                  360                  600       


## Step 1: State the Hypotheses


In [2]:
print("Null Hypothesis (H0): There is no association between device type and customer satisfaction level.")
print("Alternative Hypothesis (H1): There is an association between device type and customer satisfaction level.")
print("Significance level: α = 0.05")


Null Hypothesis (H0): There is no association between device type and customer satisfaction level.
Alternative Hypothesis (H1): There is an association between device type and customer satisfaction level.
Significance level: α = 0.05
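The same hypotheses can be stated in terms of cell probabilities. The notation here is introduced for illustration and does not appear elsewhere in the notebook: $p_{ij}$ is the probability that a customer falls in satisfaction level $i$ and device type $j$, $p_{i\cdot}$ and $p_{\cdot j}$ are the marginal probabilities, and $R_i$, $C_j$, $n$ are the row totals, column totals and grand total from the contingency table. Substituting the sample margins into $H_0$ gives the expected-count formula used in Step 2.

```latex
\begin{aligned}
H_0 &: p_{ij} = p_{i\cdot}\,p_{\cdot j} \quad \text{for every } i = 1,\dots,5 \text{ and } j = 1,2,\\
H_1 &: p_{ij} \neq p_{i\cdot}\,p_{\cdot j} \quad \text{for at least one cell } (i,j),\\[4pt]
E_{ij} &= n\,\hat p_{i\cdot}\,\hat p_{\cdot j} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i\,C_j}{n}, \qquad n = 600.
\end{aligned}
```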


## Step 2: Compute the Chi-Square Statistic


In [3]:
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

expected = np.zeros_like(observed, dtype=float)
for i in range(len(row_totals)):
    for j in range(len(col_totals)):
        expected[i, j] = (row_totals[i] * col_totals[j]) / grand_total

print("Expected Frequencies:")
print(f"{'Satisfaction':<20} {'Smart Thermostat':<20} {'Smart Light':<20}")
print("-" * 60)
for i, label in enumerate(row_labels):
    print(f"{label:<20} {expected[i][0]:<20.2f} {expected[i][1]:<20.2f}")


Expected Frequencies:
Satisfaction         Smart Thermostat     Smart Light         
------------------------------------------------------------
Very Satisfied       48.00                72.00               
Satisfied            72.00                108.00              
Neutral              60.00                90.00               
Unsatisfied          32.00                48.00               
Very Unsatisfied     28.00                42.00               
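The nested loops above can also be written as a single outer product, which may be easier to read for larger tables. The sketch below also checks the common rule of thumb that every expected count should be at least 5 for the chi-square approximation to be reasonable. The variable names `min_expected` and `assumption_ok` are illustrative additions, not part of the original notebook.

```python
import numpy as np

# Observed counts copied from the contingency table above
# (rows: Very Satisfied ... Very Unsatisfied; columns: Smart Thermostat, Smart Light).
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50],
])

# E_ij = (row total_i * column total_j) / grand total, for every cell at once.
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
expected = np.outer(row_totals, col_totals) / observed.sum()

# Rule of thumb for the chi-square approximation: every expected count >= 5.
min_expected = expected.min()
assumption_ok = bool((expected >= 5).all())

print(expected)
print(f"Smallest expected count: {min_expected:.2f}; all >= 5: {assumption_ok}")
```

Here the smallest expected count is 28, so the rule of thumb is comfortably satisfied. Note also that the expected table keeps the observed row and column totals, which is a quick sanity check on any hand calculation.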


In [4]:
chi_square_stat = np.sum((observed - expected) ** 2 / expected)

print("\nChi-Square Statistic Calculation:")
print("χ² = Σ((Observed - Expected)² / Expected)")
print(f"χ² = {chi_square_stat:.4f}")

print(f"\nContribution of each cell:")
print(f"{'Satisfaction':<20} {'Smart Thermostat':<20} {'Smart Light':<20}")
print("-" * 60)
for i, label in enumerate(row_labels):
    contrib1 = (observed[i, 0] - expected[i, 0]) ** 2 / expected[i, 0]
    contrib2 = (observed[i, 1] - expected[i, 1]) ** 2 / expected[i, 1]
    print(f"{label:<20} {contrib1:<20.4f} {contrib2:<20.4f}")



Chi-Square Statistic Calculation:
χ² = Σ((Observed - Expected)² / Expected)
χ² = 5.6382

Contribution of each cell:
Satisfaction         Smart Thermostat     Smart Light         
------------------------------------------------------------
Very Satisfied       0.0833               0.0556              
Satisfied            0.8889               0.5926              
Neutral              0.0000               0.0000              
Unsatisfied          0.1250               0.0833              
Very Unsatisfied     2.2857               1.5238              
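Writing the sum out cell by cell (row by row, Smart Thermostat then Smart Light) reproduces the statistic and shows why the Very Unsatisfied row dominates. Every nonzero deviation is either ±2 or ±8, and the ±8 deviations in that row are divided by the smallest expected counts (28 and 42).

```latex
\begin{aligned}
\chi^2 &= \sum_{i=1}^{5}\sum_{j=1}^{2}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}\\
&= \frac{2^2}{48}+\frac{2^2}{72}+\frac{8^2}{72}+\frac{8^2}{108}+\frac{0^2}{60}+\frac{0^2}{90}+\frac{2^2}{32}+\frac{2^2}{48}+\frac{8^2}{28}+\frac{8^2}{42}\\
&\approx 0.0833+0.0556+0.8889+0.5926+0+0+0.1250+0.0833+2.2857+1.5238\\
&\approx 5.6382
\end{aligned}
```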


## Step 3: Determine the Critical Value


In [5]:
alpha = 0.05
rows = observed.shape[0]
cols = observed.shape[1]
degrees_of_freedom = (rows - 1) * (cols - 1)

chi_square_critical = stats.chi2.ppf(1 - alpha, degrees_of_freedom)

print(f"Significance level (α): {alpha}")
print(f"Degrees of freedom: df = (rows - 1) × (cols - 1) = ({rows} - 1) × ({cols} - 1) = {degrees_of_freedom}")
print(f"Critical value (χ²_critical): {chi_square_critical:.4f}")
print(f"Rejection region: χ² > {chi_square_critical:.4f}")


Significance level (α): 0.05
Degrees of freedom: df = (rows - 1) × (cols - 1) = (5 - 1) × (2 - 1) = 4
Critical value (χ²_critical): 9.4877
Rejection region: χ² > 9.4877
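As an independent check on `stats.chi2.ppf`, the critical value can be recovered without SciPy for this specific case. With 4 degrees of freedom the chi-square distribution is a gamma (Erlang) distribution whose upper tail has the closed form $P(X > x) = e^{-x/2}(1 + x/2)$, so the critical value is the root of $e^{-x/2}(1 + x/2) = \alpha$. The sketch below finds that root by bisection. The helper names `chi2_sf_df4` and `critical_value_df4` are hypothetical, and the closed form is valid only for df = 4.

```python
import math


def chi2_sf_df4(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df = 4.

    For df = 4 the chi-square is a gamma distribution with shape 2 and scale 2,
    whose tail has the closed form exp(-x/2) * (1 + x/2).
    """
    half = x / 2.0
    return math.exp(-half) * (1.0 + half)


def critical_value_df4(alpha: float, lo: float = 0.0, hi: float = 100.0,
                       tol: float = 1e-10) -> float:
    """Solve P(X > x) = alpha by bisection; the tail probability decreases in x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_df4(mid) > alpha:
            lo = mid  # tail probability still above alpha: root lies to the right
        else:
            hi = mid  # tail probability at or below alpha: root lies to the left
    return (lo + hi) / 2.0


crit = critical_value_df4(0.05)
print(f"Critical value for alpha = 0.05, df = 4: {crit:.4f}")
```

The same closed form also gives the p-value reported in the Conclusion: $e^{-5.6382/2}(1 + 5.6382/2) \approx 0.2278$.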


## Step 4: Make a Decision


In [6]:
if chi_square_stat > chi_square_critical:
    decision = "Reject H0"
    conclusion_text = "There is a significant association between device type and customer satisfaction level."
else:
    decision = "Fail to reject H0"
    conclusion_text = "There is no significant association between device type and customer satisfaction level."

print(f"Chi-Square statistic: {chi_square_stat:.4f}")
print(f"Critical value: {chi_square_critical:.4f}")
print(f"Decision: {decision}")
# Lowercase only the first letter so that "H0" keeps its capitalization.
print(f"\nReason: Since χ² = {chi_square_stat:.4f} is {'greater' if chi_square_stat > chi_square_critical else 'not greater'} than χ²_critical = {chi_square_critical:.4f}, we {decision[0].lower() + decision[1:]}.")


Chi-Square statistic: 5.6382
Critical value: 9.4877
Decision: Fail to reject H0

Reason: Since χ² = 5.6382 is not greater than χ²_critical = 9.4877, we fail to reject H0.
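The critical-value rule used above is equivalent to the p-value rule: because the chi-square upper-tail probability strictly decreases in $x$, the statistic exceeds the critical value exactly when its p-value falls below $\alpha$. The sketch below, which hard-codes the statistic from Step 2 rounded to four decimals, applies both rules side by side. It uses `stats.chi2.isf`, the inverse survival function, which gives the same critical value as `stats.chi2.ppf(1 - alpha, df)`.

```python
from scipy import stats

alpha = 0.05
df = 4
chi_square_stat = 5.6382  # statistic from Step 2, rounded to four decimals

# Critical-value rule: reject H0 when the statistic exceeds the (1 - alpha) quantile.
critical = stats.chi2.isf(alpha, df)
reject_by_critical = chi_square_stat > critical

# p-value rule: reject H0 when P(X > observed statistic) is below alpha.
p_value = stats.chi2.sf(chi_square_stat, df)
reject_by_p_value = p_value < alpha

print(f"critical = {critical:.4f}, p-value = {p_value:.4f}")
print(f"reject by critical value: {reject_by_critical}, reject by p-value: {reject_by_p_value}")
```

Both rules lead to the same decision here: fail to reject H0.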


## Conclusion


In [7]:
# Survival function sf(x) = 1 - cdf(x), computed directly for better numerical accuracy
p_value = stats.chi2.sf(chi_square_stat, degrees_of_freedom)
print(f"P-value: {p_value:.6f}")
print(f"\nConclusion: {conclusion_text}")
print(f"At α = {alpha}, the chi-square statistic of {chi_square_stat:.4f} {'falls in' if chi_square_stat > chi_square_critical else 'does not fall in'} the rejection region.")


P-value: 0.227844

Conclusion: There is no significant association between device type and customer satisfaction level.
At α = 0.05, the chi-square statistic of 5.6382 does not fall in the rejection region.
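SciPy also bundles the whole procedure in `scipy.stats.chi2_contingency`, which makes a convenient cross-check of the manual steps above. It returns the statistic, p-value, degrees of freedom and expected frequencies. Its default Yates continuity correction is applied only when the degrees of freedom equal 1, so it does not affect this 5×2 table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same observed counts as the contingency table at the top of the notebook.
observed = np.array([
    [50, 70],
    [80, 100],
    [60, 90],
    [30, 50],
    [20, 50],
])

# Unpack statistic, p-value, degrees of freedom and the expected-frequency table.
chi2, p, dof, expected = chi2_contingency(observed)

print(f"chi2 = {chi2:.4f}, p = {p:.6f}, dof = {dof}")
print(expected)
```

If the manual calculation is correct, this should agree with the values obtained step by step: χ² ≈ 5.6382, df = 4, and p ≈ 0.2278.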
