In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import scipy.stats as stats
%matplotlib inline

### Two proportion z-test

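**Assumptions of the two-proportion z-test:**

1. The samples are randomly selected from each population.
2. The two samples are independent of each other.
3. Each observation has a binary outcome (success or failure), so the number of successes in each sample is binomially distributed.
4. The sample sizes are large enough for the normal approximation to be valid. Specifically, with $p_i$ denoting the proportion of successes in sample $i$:
* $n_1p_1≥5$
* $n_1(1-p_1)≥5$
* $n_2p_2≥5$
* $n_2(1-p_2)≥5$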
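The sample-size condition can be checked mechanically. The sketch below is a minimal helper, not part of any library: the name `normal_approx_ok` and the threshold argument `min_count` are illustrative choices. It evaluates the condition with each sample's observed proportion, in which case $n_ip_i$ is simply the number of successes and $n_i(1-p_i)$ the number of failures.

```python
def normal_approx_ok(count, nobs, min_count=5):
    """Return True if every sample has at least `min_count` successes and failures.

    Hypothetical helper for illustration: with the observed proportion
    p_i = count_i / nobs_i, n_i * p_i equals the number of successes and
    n_i * (1 - p_i) equals the number of failures.
    """
    for successes, n in zip(count, nobs):
        failures = n - successes
        if successes < min_count or failures < min_count:
            return False
    return True


# Illustrative counts: 250 successes out of 400 and 220 out of 350
print(normal_approx_ok([250, 220], [400, 350]))
# A small sample with only 2 successes fails the check
print(normal_approx_ok([2, 220], [40, 350]))
```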





**Example 1:** A company wants to compare the customer satisfaction rate between two branches. At Branch A, 250 out of 400 customers reported being satisfied. At Branch B, 220 out of 350 customers reported satisfaction. Is there a significant difference in satisfaction rates between these two branches?

Testing the null hypothesis

>$H_0:p_1=p_2$

against the alternate hypothesis

>$H_1:p_1\neq p_2$

**Assumptions:**

* $n_1p_1=400\times(250/400)=250$
* $n_1(1-p_1)=400\times(150/400)=150$
* $n_2p_2=350\times(220/350)=220$
* $n_2(1-p_2)=350\times(130/350)=130$

Since all four values are at least 5, the normal approximation condition is satisfied.

**Calculate sample proportions:** 

* $p_1=250/400=0.625$
* $p_2=220/350\approx 0.629$

**Calculate the pooled proportion:**
* $p=(250+220)/(400+350)=470/750\approx 0.627$
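The pooled proportion feeds directly into the test statistic, $z = (p_1 - p_2)/\sqrt{p(1-p)(1/n_1 + 1/n_2)}$. As a cross-check on a library call, the following sketch computes $z$ and the two-sided p-value by hand using only the Python standard library; the function name `pooled_z` is an illustrative choice, not an existing API.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z(count, nobs):
    """Two-proportion z statistic using the pooled proportion (illustrative helper)."""
    x1, x2 = count
    n1, n2 = nobs
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


z = pooled_z([250, 220], [400, 350])
# Two-sided p-value: probability of a value at least as extreme as |z| in either tail
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 4), round(p_two_sided, 4))
```

The two numbers should agree with the `proportions_ztest` result for the same counts, up to floating-point rounding.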

In [2]:
# Perform the hypothesis test using the p-value of the two-proportion z-test

from statsmodels.stats.proportion import proportions_ztest

# Inputs
count = [250, 220]
nobs = [400, 350]

# Perform the test
stat, p_value = proportions_ztest(count, nobs, alternative='two-sided')
p_value

0.9196450559662122

Since the p-value (≈ 0.92) is greater than the significance level $\alpha = 0.05$, we fail to reject the null hypothesis.

In [3]:
# Perform the hypothesis test using the critical value
alpha=0.05
critical_value= stats.norm.ppf(1-alpha/2)
critical_value

1.959963984540054

In [4]:
stat

-0.1008808362260768

Since the absolute value of the z statistic (≈ 0.10) is less than the critical value (≈ 1.96), we fail to reject the null hypothesis.

**Example 2:** A business compares conversion rates on two different website layouts. Layout 1 converted 40 out of 150 visitors, while Layout 2 converted 50 out of 180 visitors. Is there a significant difference in conversion rates?

Testing the null hypothesis

>$H_0:p_1=p_2$

against the alternate hypothesis

>$H_1:p_1\neq p_2$

In [5]:
#testing the hypothesis using p value
# Data
count = [40, 50]  # successes
nobs = [150, 180]  # total samples

# Perform two-proportion Z-test
stat, p_value = proportions_ztest(count, nobs, alternative='two-sided')
p_value

0.8214598556989368

Since the p-value is greater than alpha, we fail to reject the null hypothesis.

In [6]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(1-alpha/2)
critical_value

1.959963984540054

In [7]:
stat

-0.22566773346211036

Since the absolute value of the z statistic is less than the critical value, we fail to reject the null hypothesis.

**Example 3:** A manufacturer compares defect rates of two production lines. Line A had 15 defective items out of 300, while Line B had 20 defective items out of 400. Is there a difference in defect rates?

Testing the null hypothesis 

>$H_0:p_1=p_2$

against the alternate hypothesis

>$H_1:p_1\neq p_2$

In [8]:
#testing the hypothesis using p value
# Data
count = [15, 20]  # successes
nobs = [300, 400]  # total samples

stat, p_value = proportions_ztest(count, nobs, alternative='two-sided')
p_value

1.0

since p value is greater than alpha we fail to reject the null hypothesis

In [9]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(1-alpha/2)
critical_value

1.959963984540054

In [10]:
stat

0.0

Since the absolute value of the z statistic is less than the critical value, we fail to reject the null hypothesis.

**Example 4:** A business compares conversion rates on two different website layouts. Layout 1 converted 40 out of 150 visitors, while Layout 2 converted 50 out of 180 visitors. Is there a significant difference in conversion rates?


Testing the null hypothesis

>$H_0:p_1=p_2$

against the alternate hypothesis

>$H_1:p_1\neq p_2$

In [11]:
#testing the hypothesis using p value
# Data
count = [40, 50]  # successes
nobs = [150, 180]  # total samples

# Perform two-proportion Z-test
stat, p_value = proportions_ztest(count, nobs, alternative='two-sided')
p_value

0.8214598556989368

Since the p-value is greater than alpha, we fail to reject the null hypothesis.

In [12]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(1-alpha/2)
critical_value

1.959963984540054

In [13]:
stat

-0.22566773346211036

Since the absolute value of the z statistic is less than the critical value, we fail to reject the null hypothesis.

**Example 5:** A company tests two training methods. Method 1 resulted in 30 out of 80 employees passing, while Method 2 had 45 out of 100 employees pass. Is one method more effective?

Testing the null hypothesis

>$H_0:p_1=p_2$

against the alternate hypothesis

>$H_1:p_1\neq p_2$

In [14]:
#testing the hypothesis using p value
# Data
count = [30, 45]  # successes
nobs = [80, 100]  # total samples

# Perform two-proportion Z-test
stat, p_value = proportions_ztest(count, nobs, alternative='two-sided')
p_value

0.31049443431723467

since p value is greater than alpha we fail to reject the null hypothesis

In [15]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(1-alpha/2)
critical_value

1.959963984540054

In [16]:
stat

-1.0141851056742202

Since the absolute value of the z statistic is less than the critical value, we fail to reject the null hypothesis.

**Example 6:** Two social media posts are tested for engagement. Post A received 500 likes out of 2000 views, while Post B received 400 likes out of 1500 views. Is there a significant difference in engagement?

Testing the null hypothesis

>$H_0:p_1=p_2$

against the alternate hypothesis

>$H_1:p_1\neq p_2$

In [17]:
#testing the hypothesis using p value
#Data
count=[500,400] #no of successes
nobs=[2000,1500] #no of samples
#perform z-test
stat,p_value=proportions_ztest(count,nobs,alternative='two-sided')
p_value

0.2642337490158899

since p value is greater than alpha we fail to reject the null hypothesis

In [18]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(1-alpha/2)
critical_value


1.959963984540054

In [19]:
stat

-1.116440211761806

Since the absolute value of the z statistic is less than the critical value, we fail to reject the null hypothesis.

**Example 7:** In an election, candidate A was supported by 900 voters out of 2000 surveyed, and candidate B was supported by 800 voters out of 1800 surveyed. Is there a significant difference in the support rates?

Testing the null hypothesis

>$H_0:p_1=p_2$

against the alternate hypothesis

>$H_1:p_1\neq p_2$

In [20]:
#testing the hypothesis using p value
#data
count=[900,800] #no of successes
nobs=[2000,1800] # no of samples
#performing z test
stat,p_value=proportions_ztest(count,nobs,alternative="two-sided")
p_value

0.7309189550540995

since p value is greater than alpha we fail to reject the null hypothesis

In [21]:
#performing hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(1-alpha/2)
critical_value

1.959963984540054

In [22]:
stat

0.3439033731068037

Since the absolute value of the z statistic is less than the critical value, we fail to reject the null hypothesis.

**Example 8:** A retail company wants to investigate whether customer satisfaction has dropped after a change in their return policy. Last year, 70% of customers were satisfied, and this year, 65% of customers are satisfied. Is the satisfaction rate significantly lower this year than last year?

Let $p_1$ be this year's satisfaction rate and $p_2$ last year's. Testing the null hypothesis

>$H_0:p_1≥p_2$

against the alternate hypothesis

>$H_1:p_1<p_2$

In [23]:
#testing the hypothesis using p value
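#data
# The problem gives only percentages; a sample of 1000 customers per year is assumed here
count = [650, 700]  # satisfied customers: this year (65%), last year (70%)
nobs = [1000, 1000]  # assumed sample sizes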
stat, p_value = proportions_ztest(count, nobs, alternative='smaller')
p_value

0.00849210029106536

since p value is less than alpha we reject the null hypothesis

In [24]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(alpha)
critical_value

-1.6448536269514729

In [25]:
stat

-2.38704958013144

Since the z statistic (≈ -2.39) is less than the critical value (≈ -1.645), it falls in the left-tail rejection region, so we reject the null hypothesis.
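The direction of the comparison depends on the alternative hypothesis, and it is easy to get backwards for a left-tailed test. The sketch below spells out the rejection rule for each value of the `alternative` argument accepted by `proportions_ztest`. The helper name `reject_null` is hypothetical, and it uses the standard library's `NormalDist` in place of `stats.norm`.

```python
from statistics import NormalDist


def reject_null(z, alpha=0.05, alternative="two-sided"):
    """Critical-value decision rule for a z statistic (illustrative helper)."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        # Reject when |z| exceeds the upper alpha/2 quantile
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "smaller":
        # Left-tailed: reject when z falls below the (negative) alpha quantile
        return z < std_normal.inv_cdf(alpha)
    if alternative == "larger":
        # Right-tailed: reject when z exceeds the 1 - alpha quantile
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


print(reject_null(-2.387, alternative="smaller"))    # z from the left-tailed test above
print(reject_null(-0.101, alternative="two-sided"))  # a z statistic close to zero
```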

**Example 9:** A factory produces light bulbs, and historically, 2% of the bulbs produced were defective. After implementing a new quality control process, the factory wants to check whether the percentage of defective bulbs has decreased. A sample of 500 bulbs shows 8 defective bulbs. Has the defect rate decreased?

Let $p_1$ be the defect rate under the new process and $p_2$ the historical defect rate. Testing the null hypothesis

>$H_0:p_1≥p_2$

against the alternate hypothesis

>$H_1:p_1<p_2$

In [26]:
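#testing the hypothesis using p value
#Data
# The historical 2% defect rate is represented here as 10 defects in a hypothetical sample of 500
count = [8, 10]  # defective bulbs: new process, historical
nobs = [500, 500]  # sample sizes (the second one is assumed)
#z_stat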
stat, p_value = proportions_ztest(count, nobs, alternative='smaller')
p_value

0.3171421687031628

since the p value is greater than alpha we fail to reject the null hypothesis

In [27]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(alpha)
critical_value

-1.6448536269514729

In [28]:
stat

-0.475705310016425

Since the z statistic (≈ -0.48) is greater than the critical value (≈ -1.645), it does not fall in the left-tail rejection region, so we fail to reject the null hypothesis.
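Because the problem statement gives a historical rate rather than a second sample, the question can also be framed as a one-sample z-test of $H_0: p \geq 0.02$ against $H_1: p < 0.02$. The following is a minimal sketch of that alternative framing, not the approach used in the cells above. The helper name `one_sample_z` is hypothetical, and the standard error is computed under the null value $p_0 = 0.02$.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(successes, n, p0):
    """One-sample z statistic for a proportion, with the standard error under H0."""
    p_hat = successes / n
    std_err = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / std_err


z = one_sample_z(8, 500, 0.02)
# Left-tailed p-value: probability of a z statistic this small or smaller under H0
p_left = NormalDist().cdf(z)
print(z, p_left)
```

With 8 defects in 500 bulbs the statistic is negative but stays above the left-tail critical value at $\alpha = 0.05$, so this framing leads to the same conclusion.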

**Example 10:** A school wants to know if the percentage of students passing their final exams has increased compared to last year. Last year, 45% of students passed, and this year, 50% of students passed out of 400 students. Does the new pass rate indicate a significant increase?

Let $p_1$ be this year's pass rate and $p_2$ last year's. Testing the null hypothesis

>$H_0:p_1≤p_2$

against the alternate hypothesis

>$H_1:p_1>p_2$

In [29]:
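#testing the hypothesis using p value
#Data
# Last year's sample size is not given; 400 students is assumed for both years
count = [200, 180]  # students passing: this year (50%), last year (45%)
nobs = [400, 400]  # total samples
#z-stat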
stat, p_value = proportions_ztest(count, nobs, alternative='larger')
p_value

0.07838999925606568

since the p value is greater than alpha we fail to reject the null hypothesis

In [30]:
#testing the hypothesis using critical value
alpha=0.05
critical_value=stats.norm.ppf(1-alpha)
critical_value

1.6448536269514722

In [31]:
stat

1.415984650809577

Since the z statistic is less than the critical value, we fail to reject the null hypothesis.

**Example 11:** In a company with branches in two regions, Region A retained 500 out of 1000 customers, while Region B retained 450 out of 1000 customers. Is Region A’s retention rate significantly greater than Region B’s?

Testing the null hypothesis

>$H_0:p_1≤p_2$

against the alternate hypothesis

>$H_1:p_1>p_2$

In [32]:
#testing the hypothesis using p value
#Data
count = [500, 450]  # successes
nobs = [1000, 1000]  # total samples
#z-stat
stat, p_value = proportions_ztest(count, nobs, alternative='larger')
p_value

0.012582242850159278

Since the p-value is less than alpha, we reject the null hypothesis.

In [33]:
#testing the hypothesis using critical value
alpha=0.05
critical_value = stats.norm.ppf(1 - alpha)
critical_value

1.6448536269514722

In [34]:
stat

2.2388683141982244

since z stat is greater than the critical value we reject the null hypothesis