This code illustrates how to calculate the confidence interval for the difference in means when the sample sizes are small.

The assumptions are that the data are drawn from two independent normal distributions that have the same variance.
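
Under these assumptions, the interval is usually written in the pooled-variance form sketched below. The symbols $\bar{x}_i$, $s_i^2$, $n_i$, and $s_p$ are notation introduced here to mirror the variables in the code (`mean1`, `var1`, `n1`, `sp`); they do not come from the problem statement.

$$
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\; n_1 + n_2 - 2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
$$

Here $t_{\alpha/2,\; n_1 + n_2 - 2}$ is the upper $\alpha/2$ critical value of the t distribution with $n_1 + n_2 - 2$ degrees of freedom, which the code obtains with `stats.t.ppf(1 - alpha/2, df)`.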

The problem is as follows:

To reach maximum efficiency in performing an assembly operation in a manufacturing plant, new employees require approximately a 1-month training period. A new method of training was suggested, and a test was conducted to compare the new method with the standard procedure. Two groups of nine new employees each were trained for a period of 3 weeks, one group using the new method and the other following the standard training procedure. The length of time (in minutes) required for each employee to assemble the device was recorded at the end of the 3-week period.

Estimate the true mean difference ($\mu_1 - \mu_2$) with confidence coefficient $0.95$. Assume that the assembly times are approximately normally distributed, that the variances of the assembly times are approximately equal for the two methods, and that the samples are independent.

In [None]:
import numpy as np
import scipy.stats as stats

# Example data for 9 employees in each group
# Group 1 (new method)
group1 = np.array([20.2, 22.1, 18.6, 21.3, 19.8, 23.0, 21.7, 17.9, 20.4])

# Group 2 (standard method)
group2 = np.array([25.4, 24.2, 28.1, 27.6, 29.0, 25.3, 23.9, 24.1, 26.5])

# Sample sizes
n1 = len(group1)
n2 = len(group2)

# Sample means
mean1 = np.mean(group1)
mean2 = np.mean(group2)

# Sample variances (unbiased, so use ddof=1)
var1 = np.var(group1, ddof=1)
var2 = np.var(group2, ddof=1)

# Pooled variance, assuming equal variances
sp_squared = ((n1 - 1)*var1 + (n2 - 1)*var2) / (n1 + n2 - 2)
sp = np.sqrt(sp_squared)

# Mean difference
mean_diff = mean1 - mean2

# Standard error for the difference in means under equal variances
se_diff = sp * np.sqrt((1/n1) + (1/n2))

# Degrees of freedom
df = n1 + n2 - 2

# Confidence level
alpha = 0.05  # for a 95% confidence interval
t_crit = stats.t.ppf(1 - alpha/2, df)

# Confidence interval
ci_lower = mean_diff - t_crit * se_diff
ci_upper = mean_diff + t_crit * se_diff

print(f"Group 1 mean (new method): {mean1:.3f} minutes")
print(f"Group 2 mean (standard):   {mean2:.3f} minutes")
print(f"Mean difference (new - standard): {mean_diff:.3f} minutes")
print("95% Confidence Interval for the difference (μ1 - μ2):")
print(f"[{ci_lower:.3f}, {ci_upper:.3f}]")


Group 1 mean (new method): 20.556 minutes
Group 2 mean (standard):   26.011 minutes
Mean difference (new - standard): -5.456 minutes
95% Confidence Interval for the difference (μ1 - μ2):
[-7.225, -3.686]
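
As a sanity check, the manual calculation can be wrapped in a small function and compared against `scipy.stats.ttest_ind` with `equal_var=True`, whose t statistic is the mean difference divided by the same pooled standard error. This is a minimal sketch: the helper name `pooled_t_ci` is hypothetical and introduced only for illustration, and the data are the example values used above.

```python
import numpy as np
from scipy import stats


def pooled_t_ci(x, y, confidence=0.95):
    """Pooled-variance t interval for mean(x) - mean(y) (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance from the two unbiased sample variances
    sp_squared = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / df
    se = np.sqrt(sp_squared * (1 / n1 + 1 / n2))
    diff = x.mean() - y.mean()
    # Two-sided critical value of the t distribution
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    return diff - t_crit * se, diff + t_crit * se


# Same example data as in the cell above
group1 = np.array([20.2, 22.1, 18.6, 21.3, 19.8, 23.0, 21.7, 17.9, 20.4])
group2 = np.array([25.4, 24.2, 28.1, 27.6, 29.0, 25.3, 23.9, 24.1, 26.5])

lower, upper = pooled_t_ci(group1, group2)

# Cross-check: recover the pooled standard error from the equal-variance t test,
# using t = diff / se, then rebuild the interval from it.
result = stats.ttest_ind(group1, group2, equal_var=True)
diff = group1.mean() - group2.mean()
se_from_test = diff / result.statistic
t_crit = stats.t.ppf(0.975, len(group1) + len(group2) - 2)
check_lower = diff - t_crit * se_from_test
check_upper = diff + t_crit * se_from_test

print(f"Helper:      [{lower:.3f}, {upper:.3f}]")
print(f"Cross-check: [{check_lower:.3f}, {check_upper:.3f}]")
```

Both lines should agree with the interval printed by the manual calculation above.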
