In [None]:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

Assume that defects occur as an i.i.d. sequence of Bernoulli trials, each widget being defective with probability $0.03$, so that the number of defective widgets $X$ in a sample of $n$ widgets satisfies $X \sim \mathrm{Bin}(n,\ p = 0.03)$. Additionally, assume that the sampling rate is $200$ widgets per hour and that the production rate is $1000$ widgets per hour.

(*Note: Here, I changed the sampling rate from the original $50$ widgets per hour to $200$ widgets per hour to avoid fractional defect counts; at $n = 50$, for example, a defect rate of $3.5\%$ would correspond to $1.75$ defects.*)
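Under this model, the defect count in one hourly sample of $n = 200$ widgets has mean $np = 6$ and variance $np(1-p) = 5.82$. The sketch below illustrates the stated assumptions with `scipy.stats.binom`, computing these moments and the tail probabilities for counts as extreme as $3$ and $7$ (the counts that the question's $1.5\%$ and $3.5\%$ rates correspond to at $n = 200$). It is an illustration of the model only, not part of the original analysis.

```python
from scipy.stats import binom

n, p = 200, 0.03      # widgets sampled per hour and assumed defect probability
model = binom(n, p)   # frozen Binomial(n, p) distribution

print(f"Mean defects per sample: {model.mean():.2f}")  # n * p
print(f"Variance:                {model.var():.2f}")   # n * p * (1 - p)
print(f"Std. deviation:          {model.std():.4f}")

# Tail probabilities for counts as extreme as 3 (low) and 7 (high)
print(f"P(X <= 3) = {model.cdf(3):.4f}")
print(f"P(X >= 7) = {model.sf(6):.4f}")  # sf(k) = P(X > k), so sf(6) = P(X >= 7)
```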

# Question
Over three consecutive hours, the defect rates observed are 2%, 3.5%, and 1.5%. Analyze the trend and propose potential reasons for variations.

# Solution

For hour $i \in \{1, 2, 3\}$, let $n_i$ be the number of widgets sampled in hour $i$, $x_i$ the number of defective widgets among them, and $\hat{p}_{i} = x_i / n_i$ the sample proportion of defects. We observe $\hat{p}_{1} = 0.02$, $\hat{p}_{2} = 0.035$, and $\hat{p}_{3} = 0.015$ with $n_1 = n_2 = n_3 = 200$, so $x_i = n_i \hat{p}_i$ gives $x_{1} = 4$, $x_{2} = 7$, and $x_{3} = 3$ defects, and the numbers of non-defective widgets are $n_{1} - x_{1} = 196$, $n_{2} - x_{2} = 193$, and $n_{3} - x_{3} = 197$.
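The arithmetic behind $x_i = n_i \hat{p}_i$, and the reason for the change of sampling rate in the note above, can be checked directly: at $n = 50$ the observed rates imply $1$, $1.75$, and $0.75$ defects, while at $n = 200$ they imply the whole numbers $4$, $7$, and $3$. The helper names `implied_counts` and `all_whole` are illustrative, and a small tolerance absorbs floating-point error (for example, `0.035 * 200` need not be exactly `7.0`).

```python
RATES = [0.02, 0.035, 0.015]  # observed hourly defect rates

def implied_counts(n, rates=RATES):
    """Defect counts x_i = n * p_hat_i implied by the observed rates."""
    return [n * r for r in rates]

def all_whole(counts, tol=1e-9):
    """True if every count is a whole number, up to floating-point error."""
    return all(abs(c - round(c)) < tol for c in counts)

for n in (50, 200):
    counts = implied_counts(n)
    label = "whole" if all_whole(counts) else "fractional"
    print(f"n = {n:3d}: {[round(c, 2) for c in counts]} -> {label} counts")
```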



In [None]:
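# Observed counts for each hour: [defective, non-defective] out of 200 sampled widgets
data = {
    'Hr 1': [4, 196],
    'Hr 2': [7, 193],
    'Hr 3': [3, 197],
}

# Rows are outcomes, columns are hours
df = pd.DataFrame(data, index=['Defective', 'Non-Defective'])
print(df)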

               Hr 1  Hr 2  Hr 3
Defective         4     7     3
Non-Defective   196   193   197
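As a sanity check, dividing the defective row of the table by the column totals recovers the stated hourly rates, and pooling all three hours gives $\hat{p} = 14/600 \approx 0.0233$, the common proportion from which the homogeneity test's expected counts are built ($200 \times 14/600 \approx 4.67$ defects per hour). This sketch rebuilds `df` so that it runs on its own.

```python
import pandas as pd

# Same contingency table as above: rows are outcomes, columns are hours
df = pd.DataFrame(
    {'Hr 1': [4, 196], 'Hr 2': [7, 193], 'Hr 3': [3, 197]},
    index=['Defective', 'Non-Defective'],
)

n_per_hour = df.sum()                      # column totals: widgets sampled per hour
p_hat = df.loc['Defective'] / n_per_hour   # per-hour sample proportions x_i / n_i
print(p_hat)

# Pooled proportion: total defects over total widgets sampled
p_pooled = df.loc['Defective'].sum() / n_per_hour.sum()
print(f"Pooled defect proportion: {p_pooled:.4f}")

# Expected defects per hour if all hours share the pooled proportion
print((n_per_hour * p_pooled).round(4))
```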


## $\chi^{2}$ Test for Homogeneity

Let $p_i$ denote the true defect probability in hour $i$. Using the $\chi^{2}$ test for homogeneity, we test the null hypothesis

$$ H_{0} : p_{1} = p_{2} = p_{3} $$

against the alternative hypothesis

$$ H_{A} : p_{i} \neq p_{j} \text{ for at least one pair } i \neq j $$

at the $\alpha = 0.05$ level of significance.
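The statistic that `chi2_contingency` reports can be reproduced by hand, which makes the mechanics of the test explicit: for outcome $j$ (defective or not) and hour $i$, the expected count is $E_{ji} = R_j C_i / N$, where $R_j$ is the row total, $C_i$ the column total, and $N$ the grand total; the statistic is $\chi^2 = \sum_{j,i} (O_{ji} - E_{ji})^2 / E_{ji}$, compared with a $\chi^2$ distribution on $(2-1)(3-1) = 2$ degrees of freedom. The sketch below is a plain NumPy version cross-checked against SciPy; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts: rows = (defective, non-defective), columns = hours 1-3
observed = np.array([[4, 7, 3],
                     [196, 193, 197]])

row_totals = observed.sum(axis=1)   # total defective / non-defective
col_totals = observed.sum(axis=0)   # widgets sampled per hour
grand_total = observed.sum()

# Expected count under H0: E_ji = R_j * C_i / N
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson statistic, degrees of freedom (r - 1)(c - 1), and upper-tail p-value
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(stat, dof)

print(f"statistic = {stat:.4f}, dof = {dof}, p-value = {p_value:.4f}")

# Cross-check against SciPy (Yates' correction is applied only when dof == 1)
ref_stat, ref_p, ref_dof, ref_expected = stats.chi2_contingency(observed)
assert np.isclose(stat, ref_stat) and np.isclose(p_value, ref_p) and dof == ref_dof
```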

In [None]:
# Pearson chi-squared test on the 2 x 3 table (SciPy applies Yates' correction only when dof == 1)
chi2, p_val, dof, expected = chi2_contingency(df)

print(f"\nChi-squared statistic: {chi2:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p_val:.4f}")

expected_df = pd.DataFrame(expected, index=df.index, columns=df.columns)
print("\nExpected Frequencies:")
print(expected_df)


Chi-squared statistic: 1.9015
Degrees of freedom: 2
P-value: 0.3864

Expected Frequencies:
                     Hr 1        Hr 2        Hr 3
Defective        4.666667    4.666667    4.666667
Non-Defective  195.333333  195.333333  195.333333
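All three expected defective counts are about $4.67$. A common rule of thumb (Cochran's) asks that no more than $20\%$ of expected counts fall below $5$, and here three of the six cells do, so the asymptotic $\chi^2$ p-value may be only a rough approximation. One way to check it, a different technique from the test above, is a Monte Carlo conditional test: holding the margins fixed ($14$ defects among $600$ widgets, $200$ per hour), $H_0$ implies that the per-hour defect counts follow a multivariate hypergeometric distribution. The sketch below uses NumPy's `Generator.multivariate_hypergeometric`; the number of simulations and the seed are arbitrary choices.

```python
import numpy as np

# Observed table: rows = (defective, non-defective), columns = hours 1-3
observed = np.array([[4, 7, 3],
                     [196, 193, 197]])
col_totals = observed.sum(axis=0)       # widgets sampled per hour
row_totals = observed.sum(axis=1)       # total defective / non-defective
grand_total = observed.sum()
n_defects = int(row_totals[0])          # total number of defects

# Expected counts depend only on the margins, so every simulated table that
# shares these margins has the same expected counts.
expected = np.outer(row_totals, col_totals) / grand_total

def pearson_stats(defects):
    """Pearson chi-squared statistic for each row of per-hour defect counts."""
    defects = np.atleast_2d(defects)
    non_defects = col_totals - defects
    return (((defects - expected[0]) ** 2 / expected[0])
            + ((non_defects - expected[1]) ** 2 / expected[1])).sum(axis=1)

observed_stat = pearson_stats(observed[0])[0]

# Under H0 the defects are spread over the 3 x 200 sampled widgets at random,
# so the per-hour defect counts are multivariate hypergeometric.
rng = np.random.default_rng(seed=0)
n_sims = 20_000
sim_defects = rng.multivariate_hypergeometric(col_totals, n_defects, size=n_sims)
sim_stats = pearson_stats(sim_defects)

# A small tolerance keeps ties with the observed table (e.g. permutations of
# its counts) from being lost to floating-point rounding.
mc_p_value = np.mean(sim_stats >= observed_stat - 1e-9)

print(f"Observed statistic: {observed_stat:.4f}")
print(f"Monte Carlo p-value ({n_sims} simulations): {mc_p_value:.4f}")
```

Conditioning on the margins mirrors how the expected counts in the $\chi^2$ test are built, so both procedures use the same statistic and differ only in how its null distribution is obtained.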
