# Lab: Hypothesis Testing (Z-Test) with Proportions

In [1]:
import math
from scipy import stats

### Question 1: One-Proportion z-test (Right-tailed Test)

**Scenario**: A university claims that at least 65% of its graduates secure a job
within six months of graduation. You survey 200 graduates and find that 120 of
them are employed within six months. At a 5% significance level, test if the
university's claim holds.

Null Hypothesis (H₀): p ≤ 0.65 (The proportion of graduates who secure a job within six months of graduation is less than or equal to 65%.)

Alternative Hypothesis (H₁): p > 0.65 (The proportion of graduates who secure a job within six months of graduation is greater than 65%.)
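
The z-test relies on a normal approximation to the binomial distribution, so it can help to check that the sample is large enough before computing anything. The sketch below applies a common rule of thumb: at least 10 expected successes and 10 expected failures under H₀. The threshold of 10 is an assumption (some texts use 5), and the variable names are illustrative.

```python
# Rule-of-thumb check for the normal approximation (assumed threshold: 10)
n = 200     # sample size from the scenario
p0 = 0.65   # hypothesized proportion under H0

expected_successes = n * p0
expected_failures = n * (1 - p0)

# Both expected counts should clear the threshold
approx_ok = expected_successes >= 10 and expected_failures >= 10
print(f"n*p0 = {expected_successes:.1f}, n*(1-p0) = {expected_failures:.1f}, OK: {approx_ok}")
# n*p0 = 130.0, n*(1-p0) = 70.0, OK: True
```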

In [2]:
# Parameters
n = 200 # sample size
x = 120 # number of successes
p0 = 0.65 # hypothesized proportion

In [3]:
# Sample proportion
p_hat = x / n

In [4]:
# Standard error
se = math.sqrt((p0 * (1 - p0)) / n)

In [5]:
# z-test statistic
z = (p_hat - p0) / se

In [6]:
# p-value (Right-tailed test)
p_value = 1 - stats.norm.cdf(z)

In [7]:
# print values
print(f"z-value: {z}")
print(f"p-value: {p_value}")

z-value: -1.4824986333222037
p-value: 0.9308961665129872


In [8]:
# print conclusion
print(f"Since the p-value ({p_value}) is greater than 0.05, we fail to reject the null hypothesis. There is not sufficient evidence that more than 65% of graduates secure a job within six months of graduation, so the sample does not support the university's claim.")

Since the p-value (0.9308961665129872) is greater than 0.05, we fail to reject the null hypothesis. There is not sufficient evidence that more than 65% of graduates secure a job within six months of graduation, so the sample does not support the university's claim.
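
The steps above (sample proportion, standard error under H₀, z statistic, tail area) can be bundled into one reusable function. This is a minimal sketch, not part of the lab: the function name `one_proportion_ztest` and its `alternative` labels are hypothetical, and it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` so that it runs without extra dependencies.

```python
import math
from statistics import NormalDist


def one_proportion_ztest(x, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a one-proportion z-test.

    alternative: "larger" (H1: p > p0), "smaller" (H1: p < p0),
    or "two-sided" (H1: p != p0).
    """
    p_hat = x / n                           # sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)       # standard error under H0
    z = (p_hat - p0) / se
    cdf = NormalDist().cdf                  # standard normal CDF
    if alternative == "larger":
        p_value = 1 - cdf(z)                # area to the right of z
    elif alternative == "smaller":
        p_value = cdf(z)                    # area to the left of z
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))     # both tails
    else:
        raise ValueError("alternative must be 'larger', 'smaller', or 'two-sided'")
    return z, p_value


# Graduate employment data: 120 of 200 employed, H0 proportion 0.65
z, p_value = one_proportion_ztest(120, 200, 0.65, alternative="larger")
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
# z = -1.4825, p-value = 0.9309
```

Note that `abs(z)` appears only in the two-sided branch; a one-tailed p-value must keep the sign of z.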


### Question 2: Two-Proportion z-test (Two-tailed Test)

**Scenario**: A sports team wants to compare the proportion of people attending
its home games with the proportion attending its away games. In a sample of 300
people for home games, 180 attended; in a sample of 250 people for away games,
140 attended. At a 5% significance level, is there a significant difference
between the proportions of attendees at home and away games?

Null Hypothesis (H₀): p₁ = p₂ (The proportion of attendees at home games is equal to the proportion of attendees at away games.)

Alternative Hypothesis (H₁): p₁ ≠ p₂ (The proportion of attendees at home games is not equal to the proportion of attendees at away games.)

In [9]:
# Parameters
n1 = 300 # sample size group 1
x1 = 180 # successes group 1
n2 = 250 # sample size group 2
x2 = 140 # successes group 2

In [10]:
# Sample proportions
p1 = x1 / n1
p2 = x2 / n2

In [11]:
# Pooled proportion
p_pooled = (x1 + x2) / (n1 + n2)

In [12]:
# Standard error
se = math.sqrt(p_pooled * (1 - p_pooled) * ((1 / n1) + (1 / n2)))

In [13]:
# z-test statistic
z = (p1 - p2) / se

In [14]:
# p-value (two-tailed test)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

In [15]:
# print values
print(f"z-value: {z}")
print(f"p-value: {p_value}")

z-value: 0.9469631093314982
p-value: 0.3436575774939177


In [16]:
# print conclusion
print(f"Since the p-value ({p_value}) is greater than 0.05, we fail to reject the null hypothesis. There is no significant difference between the proportions of attendees at home and away games.")

Since the p-value (0.3436575774939177) is greater than 0.05, we fail to reject the null hypothesis. There is no significant difference between the proportions of attendees at home and away games.
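
The pooled two-proportion calculation can also be wrapped in a function. This is a sketch under assumptions: the name `two_proportion_ztest` and the `alternative` labels are hypothetical, and `statistics.NormalDist` stands in for `scipy.stats.norm`.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, p_value) for a pooled two-proportion z-test of p1 vs p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)        # common proportion under H0
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf
    if alternative == "larger":             # H1: p1 > p2
        p_value = 1 - cdf(z)
    elif alternative == "smaller":          # H1: p1 < p2
        p_value = cdf(z)
    elif alternative == "two-sided":        # H1: p1 != p2
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'larger', 'smaller', or 'two-sided'")
    return z, p_value


# Home vs. away attendance: 180 of 300 and 140 of 250
z, p_value = two_proportion_ztest(180, 300, 140, 250)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
# z = 0.9470, p-value = 0.3437
```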


### Question 3: One-Proportion z-test (Left-tailed Test)

**Scenario**: A school claims that at least 75% of its students pass a standardized
exam. You survey 150 students and find that 100 of them passed. Is the school's
claim valid at the 5% significance level?

Null Hypothesis (H₀): p ≥ 0.75 (The proportion of students passing the standardized exam is greater than or equal to 75%.)

Alternative Hypothesis (H₁): p < 0.75 (The proportion of students passing the standardized exam is less than 75%.)

In [17]:
# Parameters
n = 150 # sample size
x = 100 # number of successes
p0 = 0.75 # hypothesized proportion

In [18]:
# Sample proportion
p_hat = x / n

In [19]:
# Standard error
se = math.sqrt((p0 * (1 - p0)) / n)

In [20]:
# z-test statistic
z = (p_hat - p0) / se

In [21]:
# p-value (Left-tailed test)
p_value = stats.norm.cdf(z)

In [22]:
# print values
print(f"z-value: {z}")
print(f"p-value: {p_value:.4f}")

z-value: -2.3570226039551594
p-value: 0.0092


In [23]:
# print conclusion
print(f"Since the p-value ({p_value:.4f}) is less than 0.05, we reject the null hypothesis (p ≥ 0.75). There is sufficient evidence that fewer than 75% of students pass the standardized exam, so the data contradict the school's claim.")

Since the p-value (0.0092) is less than 0.05, we reject the null hypothesis (p ≥ 0.75). There is sufficient evidence that fewer than 75% of students pass the standardized exam, so the data contradict the school's claim.
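
The choice of tail changes the p-value substantially for the same z statistic. The sketch below, which is an illustration rather than part of the lab, computes all three tail areas for the exam data using `statistics.NormalDist`. For a left-tailed test the p-value is the area to the left of z, so the sign of z must be kept; taking `abs(z)` first would return the opposite tail.

```python
from statistics import NormalDist

z = -2.3570226039551594   # z statistic for the exam data (100 of 150, p0 = 0.75)
cdf = NormalDist().cdf    # standard normal CDF

left_tail = cdf(z)                    # P(Z <= z): p-value for H1: p < p0
right_tail = 1 - cdf(z)               # P(Z >= z): p-value for H1: p > p0
two_tailed = 2 * (1 - cdf(abs(z)))    # p-value for H1: p != p0

print(f"left: {left_tail:.4f}, right: {right_tail:.4f}, two-tailed: {two_tailed:.4f}")
# left: 0.0092, right: 0.9908, two-tailed: 0.0184
```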


### Question 4: Two-Proportion z-test (Right-tailed Test)

**Scenario**: A company is comparing the promotion rates between male and female
employees. The company claims that males are promoted at a higher rate than
females. Out of 80 male employees, 45 have been promoted, and out of 70 female
employees, 35 have been promoted. Test if males are promoted at a higher rate
than females at the 5% significance level.

Null Hypothesis (H₀): p₁ ≤ p₂ (The promotion rate of male employees is less than or equal to that of female employees.)

Alternative Hypothesis (H₁): p₁ > p₂ (The promotion rate of male employees is greater than that of female employees.)

In [24]:
# Parameters
n1 = 80 # sample size group 1
x1 = 45 # successes group 1
n2 = 70 # sample size group 2
x2 = 35 # successes group 2

In [25]:
# Sample proportions
p1 = x1 / n1
p2 = x2 / n2

In [26]:
# Pooled proportion
p_pooled = (x1 + x2) / (n1 + n2)

In [27]:
# Standard error
se = math.sqrt(p_pooled * (1 - p_pooled) * ((1 / n1) + (1 / n2)))

In [28]:
# z-test statistic
z = (p1 - p2) / se

In [29]:
# p-value (Right-tailed test)
p_value = 1 - stats.norm.cdf(z)

In [30]:
# print values
print(f"z-value: {z}")
print(f"p-value: {p_value}")

z-value: 0.7654655446197431
p-value: 0.2219971880115048


In [31]:
# print conclusion
print(f"Since the p-value ({p_value}) is greater than 0.05, we fail to reject the null hypothesis. There is not sufficient evidence to support the company's claim that males are promoted at a higher rate than females.")

Since the p-value (0.2219971880115048) is greater than 0.05, we fail to reject the null hypothesis. There is not sufficient evidence to support the company's claim that males are promoted at a higher rate than females.
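
An equivalent way to reach the same decision is the critical-value approach: compare z with the cutoff that leaves an area of α in the right tail. The sketch below is illustrative, not part of the lab; it uses `statistics.NormalDist().inv_cdf` for the normal quantile (`scipy.stats.norm.ppf` plays the same role in SciPy).

```python
from statistics import NormalDist

alpha = 0.05
z = 0.7654655446197431    # z statistic for the promotion data

# Right-tailed test: reject H0 only if z exceeds the (1 - alpha) quantile
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > z_critical

print(f"critical value: {z_critical:.3f}, reject H0: {reject_h0}")
# critical value: 1.645, reject H0: False
```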


### Question 5: One-Proportion z-test (Two-tailed Test)

**Scenario**: A car dealership claims that 40% of its sales come from repeat
customers. You sample 100 sales records and find that 30 of them are from repeat
customers. Test whether the dealership's claim is accurate at a 5% significance
level.

Null Hypothesis (H₀): p = 0.40 (The proportion of sales from repeat customers is equal to 40%.)

Alternative Hypothesis (H₁): p ≠ 0.40 (The proportion of sales from repeat customers is not equal to 40%.)

In [32]:
# Parameters
n = 100 # sample size
x = 30 # number of successes
p0 = 0.40 # hypothesized proportion

In [33]:
# Sample proportion
p_hat = x / n

In [34]:
# Standard error
se = math.sqrt((p0 * (1 - p0)) / n)

In [35]:
# z-test statistic
z = (p_hat - p0) / se

In [36]:
# p-value (two-tailed test)
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

In [37]:
# print values
print(f"z-value: {z}")
print(f"p-value: {p_value}")

z-value: -2.041241452319316
p-value: 0.04122683333716348


In [39]:
# print conclusion
print(f"Since the p-value ({p_value}) is less than 0.05, we reject the null hypothesis. There is sufficient evidence to reject the car dealership's claim that 40% of their sales come from repeat customers.")

Since the p-value (0.04122683333716348) is less than 0.05, we reject the null hypothesis. There is sufficient evidence to reject the car dealership's claim that 40% of their sales come from repeat customers.
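
A 95% confidence interval gives a complementary view of the two-tailed result. The sketch below computes a Wald interval for the repeat-customer proportion. It is an illustration under assumptions: the Wald interval is one of several possible interval methods, and it uses the sample proportion in the standard error, whereas the test uses the hypothesized 0.40, so the two approaches can disagree in borderline cases.

```python
import math
from statistics import NormalDist

n, x = 100, 30
p_hat = x / n

z_star = NormalDist().inv_cdf(0.975)            # about 1.96 for 95% confidence
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error based on p_hat

lower = p_hat - z_star * se_hat
upper = p_hat + z_star * se_hat
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
# 95% CI: (0.2102, 0.3898)
```

The interval excludes 0.40, which agrees with rejecting the null hypothesis at the 5% level.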
