# One-Sample _z_ Test for a Proportion (Example)

## Question

Some boxes of a certain brand of breakfast cereal include a voucher for a free video rental inside the box. The company that makes the cereal claims that a voucher can be found in 20% of the boxes. However, based on their experiences eating this cereal at home, a group of students believes that the true proportion of boxes with vouchers is less than 20%. This group of students purchased 65 boxes of the cereal to investigate the company's claim. The students found a total of 11 vouchers for free video rentals in the 65 purchased boxes.

Suppose it is reasonable to assume that the 65 boxes purchased by the students are a random sample of all the boxes of this cereal. Based on this sample, is there support for the students' belief that the proportion of boxes with vouchers is truly less than 20%? Provide statistical evidence to support your answer, using a significance level of 5% in your significance test.

## Answer

We want to perform a significance test, at the $\alpha = 0.05$ significance level, of:

$$ H_0: p = 0.2 $$
$$ H_a: p < 0.2 $$

_where_ $p$ _represents the true proportion of all boxes of this cereal that contain a voucher for a free video rental._

Before proceeding with the test, we will evaluate the conditions for inference on proportions:

1. **Random**: As stated in the problem, we are assuming that the 65 cereal boxes bought by the students are a random sample of all boxes.
2. **Normal**: The sampling distribution of the sample proportion $\hat{p}$ can be treated as approximately normal because, assuming the null hypothesis is true, the expected numbers of successes and failures in the sample are each at least 10.

$$ np_0 = 65(0.2) = 13 \geq 10 $$

$$ n(1 - p_0) = 65(1 - 0.2) = 52 \geq 10 $$

3. **Independence**: Because the students are sampling cereal boxes without replacement, the sample size should be no more than **10%** of the population. We will assume this condition holds, since the company has very likely produced at least 650 boxes of this cereal.

$$ n \leq 0.1N $$

$$ N \geq 650 \Longrightarrow 65 \leq 0.1N $$
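The condition checks above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original solution: the helper name `check_conditions` and the population size of 100,000 boxes are assumptions chosen only for the example, since the problem does not state how many boxes exist.

```python
def check_conditions(n, p0, population_size=None, threshold=10):
    """Evaluate the large-counts and 10% conditions for a one-sample z test."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    large_counts_ok = expected_successes >= threshold and expected_failures >= threshold
    # The 10% condition can only be checked when a population size is supplied.
    ten_percent_ok = None if population_size is None else n <= 0.1 * population_size
    return {
        "expected_successes": expected_successes,
        "expected_failures": expected_failures,
        "large_counts_ok": large_counts_ok,
        "ten_percent_ok": ten_percent_ok,
    }


# population_size=100_000 is a hypothetical figure, not given in the problem.
result = check_conditions(n=65, p0=0.2, population_size=100_000)
print(result)
```

The random condition cannot be verified in code; it rests on the assumption stated in the problem.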

Since all the conditions for inference have been met, we will proceed to conduct a one-sample _z_ test. We will begin by extracting and noting down all the data from the question and calculating the sample statistics.

Raw Data:

$$ n = 65 $$

$$ \text{successes} = 11 $$

$$ \text{failures} = 65 - 11 = 54 $$

Proportions:

$$ p_0 = 0.2 $$

$$ \hat{p} = \frac{11}{65} \approx 0.169 $$

Standardized Test Statistic $z$:

$$ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} $$

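$$ = \frac{0.169 - 0.2}{\sqrt{\frac{0.2(1 - 0.2)}{65}}} $$

$$ \approx -0.625 $$

P-value:

$$ \text{P-value} = P(Z \leq -0.625 \, | \, H_0 \text{ is true}) \approx 0.266 $$

These figures use the rounded value $\hat{p} \approx 0.169$. Carrying the unrounded $\hat{p} = 11/65$ through the calculation gives $z \approx -0.620$ and a P-value of about $0.268$, which is what software reports; the conclusion is the same either way.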
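As a hedged cross-check of the arithmetic above, the sketch below recomputes $z$ and the left-tail P-value using only Python's standard library, once with the rounded $\hat{p} = 0.169$ and once with the unrounded $11/65$. The helper name `one_sample_z` is illustrative and does not come from the original solution.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(p_hat, p0, n):
    """Left-tailed one-sample z test for a proportion; returns (z, p_value)."""
    # Standard error computed under the null hypothesis (uses p0, not p_hat).
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Left-tail area P(Z <= z) under the standard normal distribution.
    p_value = NormalDist().cdf(z)
    return z, p_value


z_rounded, p_rounded = one_sample_z(0.169, 0.2, 65)
z_exact, p_exact = one_sample_z(11 / 65, 0.2, 65)

print(f"rounded p-hat: z = {z_rounded:.3f}, P-value = {p_rounded:.3f}")
print(f"exact p-hat:   z = {z_exact:.3f}, P-value = {p_exact:.3f}")
```

The two results differ only in the third decimal place, so rounding $\hat{p}$ early does not affect the decision here, although it could in a borderline case.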

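The same test can be run with `statsmodels`. Because it uses the unrounded $\hat{p}$, its results differ slightly from the hand calculation.

```python
from statsmodels.stats.proportion import proportions_ztest

# Number of observations
n = 65

# Number of successes
count = 11

# Proportion under the null hypothesis (p_0)
p = 0.2

# Sample proportion (\hat{p})
phat = count / n

# Perform a left-tailed one-sample z test for a proportion.
# prop_var=p makes the standard error use the null proportion p_0, as in the
# formula above; prop_var=False would use the sample proportion instead.
z_stat, p_value = proportions_ztest(count, n, value=p, alternative='smaller', prop_var=p)

print(f'Sample proportion = {phat:.1%}; z-statistic = {z_stat:.3f}; P-value = {p_value:.3f}')
```

```
Sample proportion = 16.9%; z-statistic = -0.620; P-value = 0.268
```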


Since the P-value is greater than the significance level of this test $(0.266 > 0.05)$, we fail to reject the null hypothesis: there is not statistically significant evidence that the true proportion of cereal boxes containing a voucher for a free video rental is less than 20%.

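The calculation can also be carried out step by step with NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm
```

```python
# Significance level for the test
alpha = 0.05

# Known data
n = 65
successes = 11
failures = n - successes

# Null-hypothesis proportion (p_0) and sample proportion (\hat{p})
p_parameter = 0.2
p_statistic = successes / n

# Standard deviation of the sampling distribution of \hat{p} under H_0
std_dev = np.sqrt((p_parameter * (1 - p_parameter)) / n)
```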

Now we can calculate the standardized test statistic $z$:

```python
# Standardized test statistic (z statistic)
z = (p_statistic - p_parameter) / std_dev
```

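Once we know the z-statistic, we can compute the P-value. Because the alternative hypothesis is $p < 0.2$, the P-value is the area in the left tail:

```python
# Calculate the P-value (left-tailed test): P(Z <= z)
p_value = norm.cdf(z)
```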

Finally, we can compare the P-value with our significance level and conclude the significance test:

```python
# Function to conclude the one-sample z test
def conclude_ztest1(p_val, significance_level):
    """
    Print whether the P-value gives significant evidence against the null hypothesis.

    Args:
        p_val (float): The P-value calculated from the significance test.
        significance_level (float): The significance level of the same test.

    Returns:
        None. The conclusion is printed rather than returned.
    """
    claim = "the true proportion of cereal boxes that contain the voucher is less than 20%."
    if p_val < significance_level:
        print("Since the P-value is smaller than the significance level "
              f"({p_val:.3f} < {significance_level}), "
              "we reject the null hypothesis because there is significant evidence to suggest that "
              f"{claim}")
    else:
        print("Since the P-value is not smaller than the significance level "
              f"({p_val:.3f} >= {significance_level}), "
              "we fail to reject the null hypothesis because there is no significant evidence to suggest that "
              f"{claim}")

# Conclusion
conclude_ztest1(p_value, alpha)
```

```
Since the P-value is not smaller than the significance level (0.268 >= 0.05), we fail to reject the null hypothesis because there is no significant evidence to suggest that the true proportion of cereal boxes that contain the voucher is less than 20%.
```


### Conclusion

Since the P-value is greater than the significance level of $\alpha = 0.05$, we fail to reject the null hypothesis. The sample does not provide statistically significant evidence that the true proportion of cereal boxes containing a voucher is less than 20%, so the students' belief is not supported by this data.
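The same decision can be reached by comparing $z$ with a critical value instead of comparing the P-value with $\alpha$. The sketch below is an equivalent restatement of the test, not part of the original solution; it uses only the standard library, and the variable names `z_critical` and `reject_null` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
n, successes, p0 = 65, 11, 0.2

# Test statistic, with the standard error computed under the null hypothesis.
z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)

# Left-tail critical value: the z-score with area alpha to its left.
z_critical = NormalDist().inv_cdf(alpha)

# For a left-tailed test, reject H_0 only if z falls below the critical value.
reject_null = z < z_critical

print(f"z = {z:.3f}, critical value = {z_critical:.3f}, reject H_0: {reject_null}")
```

Because $z$ is well above the critical value of roughly $-1.645$, it does not fall in the rejection region, which agrees with the P-value approach.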

---