# Conducting a Two-Sample _z_ Test for Proportions

## Question

Researchers suspect that _myopia_, or nearsightedness, is becoming more common over time. A study from the year 2000 showed 132 cases of this condition in 400 randomly selected people. A separate study from 2015 showed 228 cases in 600 randomly selected people. Based on these samples, is there support for the researchers' belief that _myopia_ is becoming more common amongst people? Provide statistical evidence to support your answer, using a significance level of 5% in your significance test.

## Answer

We want to perform a significance test, at the $α = 0.05$ significance level, of:

$$ H_0: p_{2015} - p_{2000} = 0 $$
$$ H_a: p_{2015} - p_{2000} > 0 $$

$p_{2000}:$ _true proportion of people with myopia in 2000_

$p_{2015}:$ _true proportion of people with myopia in 2015_

#### Conditions for inference on proportions:

1. **Random**: As stated in the problem, the samples from 2000 and 2015 were randomly selected.
2. **Normal**: The numbers of successes and failures in each sample are all at least 10, so the sampling distribution of $\hat{p}_{2015} - \hat{p}_{2000}$ is approximately Normal. Here sample 1 is the 2000 study and sample 2 is the 2015 study.

$$
\begin{aligned}
n_1 \hat{p}_1 &= 132 \geq 10 & \quad n_1 (1 - \hat{p}_1) &= 268 \geq 10 \\
n_2 \hat{p}_2 &= 228 \geq 10 & \quad n_2 (1 - \hat{p}_2) &= 372 \geq 10
\end{aligned}
$$

3. **Independence**: The two samples come from separate studies, so they are independent of each other. Within each study, people were sampled without replacement, so each sample size should be no more than **10%** of its population. We can safely assume this holds, since there were clearly more than $10(400) = 4{,}000$ people in the population in 2000 and more than $10(600) = 6{,}000$ people in 2015.

$$ n_1 ≤ 0.1(N_1) $$
$$ n_2 ≤ 0.1(N_2) $$
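The Normal and 10% conditions above are simple arithmetic, so they can also be checked in code. The sketch below is a minimal illustration using only the counts from the question; the helper names `success_failure_ok` and `min_population` are hypothetical and not part of the original notebook. Because the actual population sizes are not given, it reports the smallest population each sample would need rather than checking against a real $N$.

```python
# Minimal sketch: check the Normal (large counts) and 10% conditions
# using only the data given in the question. Helper names are hypothetical.

def success_failure_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True if both the successes and the failures are at least `threshold`."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


def min_population(n: int) -> int:
    """Smallest population size N that satisfies n <= 0.1 * N, i.e. N >= 10 * n."""
    return 10 * n


# (successes, sample size) for each study
samples = {"2000": (132, 400), "2015": (228, 600)}

for year, (successes, n) in samples.items():
    print(f"{year}: successes={successes}, failures={n - successes}, "
          f"Normal condition met: {success_failure_ok(successes, n)}, "
          f"population must be at least {min_population(n)}")
```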

Since all the conditions for inference have been met, we will proceed to conduct a two-sample _z_ test for the difference between two proportions. We begin by recording the data given in the question and calculating the sample statistics.

```python
import numpy as np
from scipy.stats import norm
```

```python
# Significance level for the test
alpha = 0.05

# Known data
n_2000 = 400
successes_2000 = 132
failures_2000 = n_2000 - successes_2000

n_2015 = 600
successes_2015 = 228
failures_2015 = n_2015 - successes_2015

# Sample proportions and the pooled (combined) proportion
p_hat_2000 = successes_2000 / n_2000
p_hat_2015 = successes_2015 / n_2015
p_combined = (successes_2000 + successes_2015) / (n_2000 + n_2015)

# Hypothesized difference under H0, observed difference, and its standard error
parameter = 0
statistic = p_hat_2015 - p_hat_2000
std_dev = np.sqrt(p_combined * (1 - p_combined) * ((1 / n_2015) + (1 / n_2000)))
```
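For reference, the values computed in the cell above can also be worked out by hand. Here $\hat{p}_C$ denotes the pooled proportion (`p_combined` in the code); it is used in the standard error because, under $H_0$, both years share the same true proportion.

$$ \hat{p}_{2000} = \frac{132}{400} = 0.33 \qquad \hat{p}_{2015} = \frac{228}{600} = 0.38 \qquad \hat{p}_C = \frac{132 + 228}{400 + 600} = \frac{360}{1000} = 0.36 $$

$$ SE = \sqrt{\hat{p}_C (1 - \hat{p}_C) \left( \frac{1}{n_{2000}} + \frac{1}{n_{2015}} \right)} = \sqrt{0.36 (0.64) \left( \frac{1}{400} + \frac{1}{600} \right)} = \sqrt{0.00096} \approx 0.0310 $$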

Now, we can calculate the standardized test statistic 'z':

```python
# Standardized test statistic (z statistic)
z = (statistic - parameter) / std_dev
```
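Written out with the numbers from this problem (sample proportions 0.38 and 0.33, pooled proportion $\hat{p}_C = 0.36$, and a standard error of about 0.0310 as computed by `std_dev`), the test statistic is:

$$ z = \frac{(\hat{p}_{2015} - \hat{p}_{2000}) - 0}{\sqrt{\hat{p}_C (1 - \hat{p}_C) \left( \frac{1}{n_{2000}} + \frac{1}{n_{2015}} \right)}} = \frac{0.38 - 0.33}{0.0310} \approx 1.61 $$

In words, the observed difference is about 1.61 standard errors above the difference of 0 claimed by $H_0$.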

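Once we know the _z_ statistic, we can compute the P-value. Because the alternative hypothesis is one-sided ($H_a: p_{2015} - p_{2000} > 0$), the P-value is the area under the standard Normal curve to the right of _z_: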

```python
# P-value (one-tailed): upper-tail area, matching Ha: p_2015 - p_2000 > 0
p_value = norm.sf(z)
```
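As a dependency-free cross-check, the whole test can be reproduced with Python's standard library, where `statistics.NormalDist().cdf` gives the standard Normal CDF, so the one-sided P-value is `1 - NormalDist().cdf(z)`. This is a sketch, not the notebook's own method, and the function name `two_prop_ztest_upper` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ztest_upper(x1: int, n1: int, x2: int, n2: int) -> tuple:
    """Pooled two-proportion z test of H0: p1 - p2 = 0 vs Ha: p1 - p2 > 0.

    Hypothetical helper; returns (z, one-sided upper-tail P-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper tail only, because Ha is one-sided (">"); no abs() here.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Sample 1 = 2015 study, sample 2 = 2000 study, so Ha reads p_2015 - p_2000 > 0
z_check, p_check = two_prop_ztest_upper(228, 600, 132, 400)
print(round(z_check, 3), round(p_check, 4))
```

Using only the upper tail keeps the P-value on the side named by $H_a$: if the 2015 sample proportion had been lower than the 2000 one, _z_ would be negative and the P-value would correctly be large.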

Finally, we can compare the 'P-value' with our significance level and conclude the significance test:

```python
# Function to conclude the two-sample z test
def conclude_ztest2(p_val, significance_level):
    """
    Prints the conclusion of the significance test by comparing the P-value
    with the given significance level.

    Args:
        p_val (float): The P-value calculated from the significance test.
        significance_level (float): The significance level for the same significance test.

    Returns:
        None. Prints a message stating whether there is statistical evidence
        to reject the test's null hypothesis.
    """
    if p_val < significance_level:
        print("".join(["Since the P-value is smaller than the significance level (",
                       str(p_val), " < ", str(significance_level), "), ",
                       "we reject the null hypothesis because there is significant evidence to suggest that ",
                       "myopia is becoming more common amongst people."]))
    else:
        print("".join(["Since the P-value is not smaller than the significance level (",
                       str(p_val), " >= ", str(significance_level), "), ",
                       "we fail to reject the null hypothesis because there is no significant evidence to suggest that ",
                       "myopia is becoming more common amongst people."]))

# Conclusion
conclude_ztest2(p_value, alpha)
```

```text
Since the P-value is not smaller than the significance level (0.05329158478744383 >= 0.05), we fail to reject the null hypothesis because there is no significant evidence to suggest that myopia is becoming more common amongst people.
```
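The same decision can be reached with the equivalent critical-value approach: for a one-sided test at $α = 0.05$, we reject $H_0$ only if _z_ exceeds the upper 5% point of the standard Normal distribution, about 1.645. The sketch below uses only the standard library and recomputes _z_ from the question's counts so that it runs on its own; `z_star` is an illustrative name, not a variable from the notebook.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05

# Recompute the test statistic from the question's counts.
p_hat_2000, p_hat_2015 = 132 / 400, 228 / 600
p_pooled = (132 + 228) / (400 + 600)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / 400 + 1 / 600))
z = (p_hat_2015 - p_hat_2000) / se

# Upper-tail critical value z* such that P(Z > z*) = alpha.
z_star = NormalDist().inv_cdf(1 - alpha)

# Reject H0 only if the observed z lies beyond the critical value.
reject = z > z_star
print(f"z = {z:.3f}, z* = {z_star:.3f}, reject H0: {reject}")
```

Because _z_ ≈ 1.61 falls just short of _z_* ≈ 1.645, this agrees with the P-value comparison above.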


### Conclusion

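Since the P-value (about 0.053) is greater than the significance level of $α = 0.05$, we fail to reject $H_0$. We do not have convincing evidence that the true proportion of people with myopia was higher in 2015 than in 2000. This does not show that the proportion stayed the same; the observed difference of $0.38 - 0.33 = 0.05$ was simply not large enough to rule out chance variation at the 5% level.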

---