In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import plotly.express as px
%matplotlib inline

### Two independent sample z-test for equality of means

**Assumptions:**
* Normally distributed populations, or sample sizes greater than 30. With samples this large, the Central Limit Theorem implies that the sample means are approximately normally distributed.
* Independent populations
* Known population standard deviations $\sigma_1$ and $\sigma_2$
* Random sampling from the population

**Example 1:** A school wants to know if there is a significant difference in exam scores between students taught using Method A and those taught using Method B.
* Sample Size for Method A $(n_1=40)$
* Mean Score for Method A $(\bar{x_1}=78)$
* Standard Deviation for Method A $(\sigma_1=10)$
* Sample Size for Method B $(n_2=50)$
* Mean Score for Method B $(\bar{x_2}=82)$
* Standard Deviation for Method B $(\sigma_2=9)$


Testing the null hypothesis 

>$H_0:\mu_1=\mu_2$

against the alternate hypothesis

>$H_1:\mu_1\neq\mu_2$
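
For reference, the statistic evaluated in the code cells is the standardized difference of the two sample means. This restates the formula used in the code; $\Phi$ (the standard normal CDF) is notation introduced here for convenience.

>$z=\dfrac{\bar{x}_1-\bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}$

With the Example 1 values this works out to

>$z=\dfrac{78-82}{\sqrt{\dfrac{10^2}{40}+\dfrac{9^2}{50}}}=\dfrac{-4}{\sqrt{2.5+1.62}}=\dfrac{-4}{\sqrt{4.12}}\approx-1.97$

and the two-sided p-value is $2\,(1-\Phi(|z|))$.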

In [2]:
#as it satisfies all the assumptions
#checking the hypothesis using p value approach
#Data for Method A and Method B
x1, sigma1, n1 = 78, 10, 40
x2, sigma2, n2 = 82, 9, 50
#z-test
z_stat=(x1-x2)/np.sqrt((sigma1**2/n1)+(sigma2**2/n2))
p_value=2*(1-stats.norm.cdf(abs(z_stat)))
p_value

0.048762943958365224

since the p value (about 0.0488) is less than alpha = 0.05 we reject the null hypothesis

In [3]:
#checking the hypothesis using critical value approach
alpha=0.05
critical=stats.norm.ppf(1-alpha/2)
critical

1.959963984540054

In [4]:
z_stat

-1.9706585563285863

since abs(z stat) is greater than the critical value we reject the null hypothesis
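
The same steps repeat in every example, so they can be wrapped in a small helper. The sketch below is one possible way to do that; the function name `two_sample_z_test` and its `alternative` argument are hypothetical names chosen for illustration, not part of SciPy. It uses `stats.norm.sf`, the survival function, which equals `1 - stats.norm.cdf` for the upper tail.

```python
import numpy as np
from scipy import stats


def two_sample_z_test(x1, sigma1, n1, x2, sigma2, n2, alternative="two-sided"):
    """Hypothetical helper: z-test for two independent means with known sigmas.

    alternative is "two-sided" (mu1 != mu2), "greater" (mu1 > mu2)
    or "less" (mu1 < mu2). Returns (z statistic, p-value).
    """
    # Standard error of the difference between the two sample means
    se = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (x1 - x2) / se
    if alternative == "two-sided":
        p = 2 * stats.norm.sf(abs(z))   # both tails
    elif alternative == "greater":
        p = stats.norm.sf(z)            # upper tail, P(Z >= z)
    elif alternative == "less":
        p = stats.norm.cdf(z)           # lower tail, P(Z <= z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p


# Example 1: Method A vs Method B
z_a, p_a = two_sample_z_test(78, 10, 40, 82, 9, 50)
print(round(z_a, 4), round(p_a, 4))
```

Calling it with the Example 1 values should reproduce the z statistic and p value obtained in the cells above.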

**Example 2:** A study compares the average daily steps taken by users of Fitness App X and Fitness App Y.

* Sample Size for App X $(n_1)$=100
* Mean Daily Steps for App X $(\bar{x_1})$=7500
* Standard deviation for App X $(\sigma_1)$=1200
* Sample Size for App Y $(n_2)$=120
* Mean Daily Steps for App Y $(\bar{x_2})$=7200
* Standard deviation for App Y $(\sigma_2)$=1150



Testing the null hypothesis

>$H_0:\mu_1=\mu_2$

against the alternate hypothesis

>$H_1:\mu_1\neq\mu_2$

In [5]:
#It satisfies all the assumptions
#Data for App X and App Y
x1, sigma1, n1 = 7500, 1200, 100
x2, sigma2, n2 = 7200, 1150, 120
#z test
z_stat=(x1-x2)/np.sqrt((sigma1**2/n1)+(sigma2**2/n2))
#p_value
p_value=2*(1-stats.norm.cdf(abs(z_stat)))
p_value

0.059890900387221224

since p value is greater than alpha we fail to reject the null hypothesis

In [6]:
#checking the hypothesis using critical value
critical=stats.norm.ppf(1-alpha/2)
critical

1.959963984540054

In [7]:
z_stat

1.8815959217079794

since abs(z_stat) is less than the critical value we fail to reject the null hypothesis

**Example 3:** A bank compares the average wait times at Branch A and Branch B.
* Sample Size for Branch A $(n_1)$ = 80
* Mean wait time for Branch A $(\bar{x_1})$ =15 mins
* Standard deviation for Branch A $(\sigma_1)$ = 3 mins
* Sample Size for Branch B $(n_2)$ = 75
* Mean wait time for Branch B $(\bar{x_2})$ = 14 mins
* Standard deviation for Branch B $(\sigma_2)$ = 2.5 mins


Testing the null hypothesis

>$H_0:\mu_1=\mu_2$

against the alternate hypothesis

>$H_1:\mu_1\neq\mu_2$

In [8]:
#data
x1, sigma1, n1 = 15, 3, 80
x2, sigma2, n2 = 14, 2.5, 75
# Z-test calculation
z_stat = (x1 - x2) / np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))
#p_value
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
p_value

0.023837967768800317

since the p value is less than alpha we reject the null hypothesis

In [9]:
#checking the hypothesis using critical value
alpha=0.05
critical=stats.norm.ppf(1-alpha/2)
critical

1.959963984540054

In [10]:
z_stat

2.259730731464128

since z stat is greater than the critical value we reject the null hypothesis

**Example 4:** A factory wants to compare the average weight of goods produced on Machine 1 and Machine 2 to ensure consistency.
* Sample Size for Machine 1 $(n_1)$ = 50
* Mean weight for Machine 1 $(\bar{x_1})$ =200g
* Standard deviation for Machine 1 $(\sigma_1)$ = 5g
* Sample Size for Machine 2 $(n_2)$ = 60
* Mean weight for Machine 2 $(\bar{x_2})$ = 198g
* Standard deviation for Machine 2 $(\sigma_2)$ = 4.5g

Testing the null hypothesis 

>$H_0:\mu_1=\mu_2$

against the alternate hypothesis

>$H_1:\mu_1\neq\mu_2$

In [11]:
#Data
x1, sigma1, n1 = 200, 5, 50
x2, sigma2, n2 = 198, 4.5, 60
#z_test
z_stat=(x1-x2)/np.sqrt((sigma1**2/n1)+(sigma2**2/n2))
#p_value
p_value=2*(1-stats.norm.cdf(abs(z_stat)))
p_value

0.028857079863545243

since the p value is less than alpha we reject the null hypothesis

In [12]:
#checking the hypothesis using critical value
alpha=0.05
critical=stats.norm.ppf(1-alpha/2)
critical

1.959963984540054

In [13]:
z_stat

2.185433458832612

since the z stat is greater than the critical value we reject the null hypothesis
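
For a two-sided test the p-value rule (`p < alpha`) and the critical-value rule (`|z| > critical`) are two views of the same comparison, so they should always give the same decision. The sketch below checks this on the data of Examples 1 to 4; the variable names (`examples`, `decisions`, `crit_two_sided`) are illustrative choices, not taken from the cells above.

```python
import numpy as np
from scipy import stats

alpha = 0.05
crit_two_sided = stats.norm.ppf(1 - alpha / 2)  # about 1.96

# (mean1, sigma1, n1, mean2, sigma2, n2) copied from Examples 1 to 4
examples = {
    "Example 1": (78, 10, 40, 82, 9, 50),
    "Example 2": (7500, 1200, 100, 7200, 1150, 120),
    "Example 3": (15, 3, 80, 14, 2.5, 75),
    "Example 4": (200, 5, 50, 198, 4.5, 60),
}

decisions = {}
for name, (m1, s1, k1, m2, s2, k2) in examples.items():
    z = (m1 - m2) / np.sqrt(s1**2 / k1 + s2**2 / k2)
    p = 2 * stats.norm.sf(abs(z))                 # two-sided p-value
    reject_by_p = bool(p < alpha)
    reject_by_critical = bool(abs(z) > crit_two_sided)
    assert reject_by_p == reject_by_critical      # the two rules agree
    decisions[name] = reject_by_p
    print(f"{name}: z = {z:.4f}, p = {p:.4f}, reject H0 = {reject_by_p}")
```

Only Example 2 fails to reject the null hypothesis, matching the conclusions reached cell by cell.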

**Example 5:**  A pharmaceutical company is interested in determining whether a new drug affects reaction times differently compared to an old drug. Researchers collected data on the reaction times (in seconds) from two independent groups of participants, one receiving the new drug and the other receiving the old drug.

* Sample Size for New drug $(n_1)$ = 35
* Mean Reaction time for New drug $(\bar{x_1})$ = 2.5 sec
* Standard deviation for New drug $(\sigma_1)$ = 0.6 sec
* Sample Size for Old drug $(n_2)$ = 30
* Mean Reaction time for Old drug $(\bar{x_2})$ = 3.0 secs
* Standard deviation for Old drug $(\sigma_2)$ = 0.5 sec

Testing the null hypothesis

>$H_0:\mu_1=\mu_2$

against the alternate hypothesis

>$H_1:\mu_1\neq\mu_2$

In [14]:
#Data
x1, sigma1, n1 = 2.5, 0.6, 35  
x2, sigma2, n2 = 3.0, 0.5, 30 
#z_stat
z_stat=(x1-x2)/np.sqrt((sigma1**2/n1)+(sigma2**2/n2))
#p_value (rounded for display)
p_value=2*(1-stats.norm.cdf(abs(z_stat)))
round(p_value, 5)

0.00025

since the p value is less than alpha we reject the null hypothesis

In [15]:
#checking the hypothesis using critical value
alpha=0.05
critical=stats.norm.ppf(1-alpha/2)
critical

1.959963984540054

In [16]:
round(z_stat, 4)

-3.6643

since abs(z_stat) is greater than the critical value we reject the null hypothesis

**Example 6:**  A school district wants to evaluate whether students who participated in a special tutoring program scored higher on a standardized test than students who did not participate.

* Sample Size for Tutoring $(n_1)$ = 40
* Mean score for Tutoring $(\bar{x_1})$ = 78
* Standard deviation for Tutoring $(\sigma_1)$ = 10
* Sample Size for No tutoring $(n_2)$ = 50
* Mean score for No tutoring $(\bar{x_2})$ = 74
* Standard deviation for No tutoring $(\sigma_2)$ = 12

Testing the null hypothesis

>$H_0:\mu_1≤\mu_2$

against the alternate hypothesis

>$H_1:\mu_1>\mu_2$

In [17]:
#Data
x1, sigma1, n1 = 78, 10, 40  
x2, sigma2, n2 = 74, 12, 50
# Z-test calculation
z_stat = (x1 - x2) / np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))
# P-value 
p_value = 1 - stats.norm.cdf(z_stat)
p_value

0.04230678447587721

since the p value is less than alpha we reject the null hypothesis

In [18]:
#checking the hypothesis using critical value
alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha)
critical_value

1.6448536269514722

In [19]:
z_stat

1.7245224542369075

since z stat is greater than the critical value we reject the null hypothesis
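
The choice between a one-tailed and a two-tailed alternative matters here. As a rough illustration, the sketch below reruns Example 6 both ways: the right-tailed test puts all of alpha in the upper tail (critical value `ppf(1 - alpha)`), while a two-sided test splits it (critical value `ppf(1 - alpha/2)`). The variable names are illustrative, and `stats.norm.sf(z)` is used as an equivalent of `1 - stats.norm.cdf(z)`.

```python
import numpy as np
from scipy import stats

alpha = 0.05
# Example 6: tutoring vs no tutoring
m1, s1, k1 = 78, 10, 40
m2, s2, k2 = 74, 12, 50
z = (m1 - m2) / np.sqrt(s1**2 / k1 + s2**2 / k2)

# Right-tailed test, H1: mu1 > mu2
p_right = stats.norm.sf(z)
reject_right = bool(z > stats.norm.ppf(1 - alpha))

# Two-sided test on the same data, H1: mu1 != mu2
p_two = 2 * stats.norm.sf(abs(z))
reject_two = bool(abs(z) > stats.norm.ppf(1 - alpha / 2))

print(f"z = {z:.4f}")
print(f"right-tailed: p = {p_right:.4f}, reject H0 = {reject_right}")
print(f"two-sided:    p = {p_two:.4f}, reject H0 = {reject_two}")
```

With these numbers the right-tailed test rejects the null hypothesis at the 5% level, whereas a two-sided test on the same data would not, which is why the direction of the alternative should be fixed before looking at the data.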

**Example 7:** A company wants to assess whether a new marketing campaign has resulted in higher sales (in dollars) compared to previous sales data.

* Sample Size for campaign $(n_1)$ = 30
* Mean sales for campaign $(\bar{x_1})$ = 120,000
* Standard deviation for campaign $(\sigma_1)$ = 15,000
* Sample Size for Previous sales $(n_2)$ = 25
* Mean sales for Previous $(\bar{x_2})$ = 110,000
* Standard deviation for Previous sales $(\sigma_2)$ = 10,000

Testing the null hypothesis 

>$H_0:\mu_1≤\mu_2$

against the alternate hypothesis

>$H_1:\mu_1>\mu_2$

In [20]:
#Data
x1, sigma1, n1 = 120000, 15000, 30  
x2, sigma2, n2 = 110000, 10000, 25
# Z-test calculation
z_stat = (x1 - x2) / np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))
#P-value method 
p_value = 1 - stats.norm.cdf(z_stat)
p_value

0.0015948498531084265

since the p value is less than alpha we reject the null hypothesis

In [21]:
#critical value method
alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha)
critical_value

1.6448536269514722

In [22]:
z_stat

2.948839123097943

since the z stat is greater than the critical value we reject the null hypothesis

**Example 8:** A company evaluates whether a new employee training program has resulted in higher productivity levels than the previous training program.

* Sample Size for new training $(n_1)$ = 50
* Mean productivity for new training $(\bar{x_1})$ = 85 tasks
* Standard deviation for new training $(\sigma_1)$ = 10
* Sample Size for old training $(n_2)$ = 45
* Mean productivity for old training $(\bar{x_2})$ = 78 tasks
* Standard deviation for old training $(\sigma_2)$ = 12

Testing the null hypothesis 

>$H_0:\mu_1≤\mu_2$

against the alternate hypothesis

>$H_1:\mu_1>\mu_2$

In [23]:
#Data for New Training and Old Training
x1, sigma1, n1 = 85, 10, 50  
x2, sigma2, n2 = 78, 12, 45
# Z-test calculation
z_stat = (x1 - x2) / np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))
#P-value method 
p_value = 1 - stats.norm.cdf(z_stat)
p_value

0.0010713584256426545

since the p value is less than alpha we reject the null hypothesis

In [24]:
#critical value approach
alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha)
critical_value

1.6448536269514722

In [25]:
z_stat

3.069703067574602

since the z stat is greater than critical value we reject the null hypothesis

**Example 9:** An automotive company wants to know if a new car model is more fuel-efficient than the previous model, measured in miles per gallon (MPG).

* Sample Size for new model $(n_1)$ = 60
* Mean MPG for new model $(\bar{x_1})$ = 35 
* Standard deviation for new model $(\sigma_1)$ = 5
* Sample Size for old model $(n_2)$ = 55
* Mean MPG for old model $(\bar{x_2})$ = 32
* Standard deviation for old model $(\sigma_2)$ = 6

Testing the null hypothesis

>$H_0:\mu_1≤\mu_2$

against the alternate hypothesis

>$H_1:\mu_1>\mu_2$


In [26]:
# Data for New Model and Old Model
x1, sigma1, n1 = 35, 5, 60  
x2, sigma2, n2 = 32, 6, 55 
# Z-test calculation
z_stat = (x1 - x2) / np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))
# P-value method 
p_value = 1 - stats.norm.cdf(z_stat)
p_value

0.0018743541589492096

since the p value is less than alpha we reject the null hypothesis

In [27]:
# Critical value approach
alpha = 0.05
critical_value = stats.norm.ppf(1 - alpha)
critical_value

1.6448536269514722

In [28]:
z_stat

2.898568148687969

since the z stat is greater than the critical value we reject the null hypothesis

**Example 10:** A fitness study investigates whether participants who engage in a new exercise regimen recover their heart rate faster than those who follow a traditional regimen.

* Sample Size for new exercise regimen $(n_1)$ = 25
* Mean Recovery time for new exercise regimen $(\bar{x_1})$ = 5.0 mins
* Standard deviation for new exercise regimen $(\sigma_1)$ = 1.0 mins
* Sample Size for traditional regimen $(n_2)$ = 30
* Mean Recovery time for traditional regimen $(\bar{x_2})$ = 6.0 mins
* Standard deviation for traditional regimen $(\sigma_2)$ = 1.2 mins

Testing the null hypothesis 

>$H_0:\mu_1≥\mu_2$

against the alternate hypothesis

>$H_1:\mu_1<\mu_2$

In [29]:
# Data for New Exercise and Traditional Regimen
x1, sigma1, n1 = 5.0, 1.0, 25  
x2, sigma2, n2 = 6.0, 1.2, 30  
# Z-test calculation
z_stat = (x1 - x2) / np.sqrt((sigma1**2 / n1) + (sigma2**2 / n2))

# P-value method (left-tailed test: P(Z <= z_stat), rounded for display)
p_value = stats.norm.cdf(z_stat)
round(p_value, 6)

0.000374

since the p value is less than alpha we reject the null hypothesis

In [30]:
#critical value approach
alpha = 0.05
critical_value = stats.norm.ppf(alpha)
critical_value

-1.6448536269514729

In [31]:
z_stat

-3.3709993123162105

since z stat is less than the (negative) critical value we reject the null hypothesis
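
A left-tailed test can be cross-checked by symmetry: testing $\mu_1<\mu_2$ with `z` in the lower tail is the same as testing $\mu_2>\mu_1$ with `-z` in the upper tail. The sketch below does this for the Example 10 data; the variable names are illustrative and not taken from the cells above.

```python
import numpy as np
from scipy import stats

alpha = 0.05
# Example 10: new regimen vs traditional regimen (recovery time in minutes)
m1, s1, k1 = 5.0, 1.0, 25
m2, s2, k2 = 6.0, 1.2, 30
z = (m1 - m2) / np.sqrt(s1**2 / k1 + s2**2 / k2)

# Left-tailed p-value, H1: mu1 < mu2
p_left = stats.norm.cdf(z)
# Same test with the groups swapped: upper tail of -z
p_swapped = stats.norm.sf(-z)

# Critical-value rule for a left-tailed test: reject when z falls below ppf(alpha)
reject_left = bool(z < stats.norm.ppf(alpha))

print(f"z = {z:.4f}, p = {p_left:.6f}, reject H0 = {reject_left}")
```

Both routes give the same small p value, and the critical-value rule agrees with it.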