# Sample Size Calculation to Reliably Measure Sampled Omission for Remote Sensing

## Introduction

In the past, we have struggled to interpret why results swing between good and bad when we measure model performance on different datasets. My understanding is that the main cause is an insufficient validation sample size, which matters more now that our model has reached higher accuracy and lower omission. For this reason, I believe we must use appropriate sample statistics to determine the minimum number of samples needed to reach reliable conclusions about our model performance.

In this Jupyter Notebook, we aim to evaluate the performance of a classification model by simulating omission rates and calculating the required sample size to achieve a specified confidence level and margin of error. We will also assess the Type I and Type II error rates associated with our hypothesis testing.

## Key Components

### Sample Size Calculation

The function `sample_size_for_omission` calculates the required sample size to estimate the omission rate with a given confidence level and margin of error. The sample size is derived using the Z-score corresponding to the confidence level and the desired margin of error. The function supports both one-sided and two-sided hypothesis tests.
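Written out, the calculation in `sample_size_for_omission` (see the code further down) is the usual sample size formula for a proportion. The symbols $n$, $p$, $E$, $c$ and $z$ are notation introduced here for illustration: $n$ is the sample size, $p$ the target omission rate, $E$ the margin of error, $c$ the confidence level, and $\Phi^{-1}$ the inverse of the standard normal CDF.

$$
n = \left\lceil \frac{z^2 \, p \, (1 - p)}{E^2} \right\rceil,
\qquad
z =
\begin{cases}
\Phi^{-1}\!\left(\dfrac{1 + c}{2}\right) & \text{two-sided} \\
\Phi^{-1}(c) & \text{one-sided}
\end{cases}
$$

Because $E$ enters squared, halving the margin of error roughly quadruples the required sample size.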

### Simulation of Omission Rates

The function `simulate_omission` simulates the omission rates for a given true omission rate, target omission rate, and sample size over a specified number of simulations. It performs a hypothesis test for each simulation to determine if the observed omission rate is significantly different from the target omission rate.
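To make the per-simulation test concrete, here is a minimal standard-library sketch of a one-sided z-test for a proportion. The function name `one_sided_proportion_ztest` is hypothetical, and the choice to compute the standard error from the sample proportion is an assumption meant to mirror the default behaviour of `proportions_ztest`; it is not a replacement for the statsmodels call used in the notebook.

```python
import math
from statistics import NormalDist


def one_sided_proportion_ztest(count, nobs, value):
    """Return (z, p_value) for H0: p <= value against H1: p > value."""
    p_hat = count / nobs
    # Standard error from the *sample* proportion (assumed to mirror the
    # default behaviour of statsmodels' proportions_ztest).
    std_err = math.sqrt(p_hat * (1 - p_hat) / nobs)
    if std_err == 0:
        # Degenerate sample (all 0s or all 1s): the statistic is undefined.
        return float("nan"), float("nan")
    z = (p_hat - value) / std_err
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value


# 300 omissions out of 5141 samples, tested against a target of 0.05.
z, p_value = one_sided_proportion_ztest(count=300, nobs=5141, value=0.05)
reject_null = p_value < 0.05
print(f"z={z:.3f}, p_value={p_value:.4f}, reject H0: {reject_null}")
```

A measured omission well above the target gives a large positive `z` and a small p-value, so H0 is rejected; a measured omission at or below the target gives a p-value of 0.5 or more.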

### Calculation of Type I and Type II Errors

The function `calc_errors` calculates the Type I and Type II error rates based on the simulation results. A Type I error occurs when the null hypothesis is rejected although it is true, while a Type II error occurs when the null hypothesis is not rejected although it is false.

## Concrete Example for Type I and Type II Errors

In the context of this notebook, we are evaluating the omission rates of a classification model. The null hypothesis (H0) is that the true omission rate is less than or equal to the target omission rate. Let's define the variables and the errors more concretely:

- **True Omission Rate (`true_omission`)**: The actual omission rate of the classification model.
- **Target Omission Rate (`target_omission`)**: The hypothesized or desired omission rate that we are testing against.

### Null Hypothesis (H0)

The null hypothesis (H0) is:

$H_0: \text{true\_omission} \leq \text{target\_omission}$

### Type I Error

A Type I error occurs when we reject the null hypothesis when it is actually true. In other words, we conclude that the true omission rate is greater than the target omission rate when it is not.

- **Condition**: Reject H0
- **Reality**: true_omission ≤ target_omission
- **Interpretation**: We incorrectly conclude that the model's omission rate is worse than the target.

### Type II Error

A Type II error occurs when we fail to reject the null hypothesis when it is actually false. In other words, we conclude that the true omission rate is less than or equal to the target omission rate when it is actually greater.

- **Condition**: Do not reject H0
- **Reality**: true_omission > target_omission
- **Interpretation**: We incorrectly conclude that the model's omission rate meets the target when it actually does not.

### Example Scenario

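Let's consider an example with a fixed target and two possible realities for the true omission rate:

- **Target Omission Rate (`target_omission`)**: 0.05
- **True Omission Rate (`true_omission`)**: 0.04 in the Type I error example, 0.06 in the Type II error example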

#### Type I Error Example

- **Reality**: The true omission rate is 0.04 (which is less than the target omission rate of 0.05).
- **Decision**: We perform the hypothesis test and reject H0.
- **Error**: This is a Type I error because we rejected H0 when it was actually true.

#### Type II Error Example

- **Reality**: The true omission rate is 0.06 (which is greater than the target omission rate of 0.05).
- **Decision**: We perform the hypothesis test and do not reject H0.
- **Error**: This is a Type II error because we failed to reject H0 when it was actually false.

### Summary

- **Type I Error**: Incorrectly concluding that the model's omission rate is worse than the target when it is not.
  - **Condition**: Reject H0
  - **Reality**: true_omission ≤ target_omission

- **Type II Error**: Incorrectly concluding that the model's omission rate meets the target when it does not.
  - **Condition**: Do not reject H0
  - **Reality**: true_omission > target_omission
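
The decision table summarized above can be sketched as a small helper. The function name `classify_outcome` and its return strings are hypothetical, introduced only for illustration.

```python
def classify_outcome(true_omission, target_omission, reject_null):
    """Label one test decision given the (normally unknown) true omission."""
    h0_is_true = true_omission <= target_omission
    if reject_null and h0_is_true:
        return "type I error"
    if not reject_null and not h0_is_true:
        return "type II error"
    return "correct decision"


print(classify_outcome(0.04, 0.05, reject_null=True))   # type I error
print(classify_outcome(0.06, 0.05, reject_null=False))  # type II error
print(classify_outcome(0.06, 0.05, reject_null=True))   # correct decision
```

For a single true omission rate only one of the two error types is possible, which is why each row of a simulation at a fixed `true_omission` can show a non-zero rate in at most one error column.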

## Simulation

Finally, we run a simulation of the expected errors in the measured omission, varying the true (simulated) omission around the target set for our model performance.

In [None]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import norm

def sample_size_for_omission(target_omission=0.05, confidence_level=0.95, margin_of_error=0.01, alternative='two-sided'):
    # Z-score for the given confidence level
    if alternative == 'two-sided':
        z_score = norm.ppf((1 + confidence_level) / 2)
    elif alternative == 'one-sided':
        z_score = norm.ppf(confidence_level)
    else:
        raise ValueError("alternative must be 'two-sided' or 'one-sided'")
    
    # Calculate the required sample size
    p = target_omission
    q = 1 - p
    sample_size = (z_score**2 * p * q) / (margin_of_error**2)
    return int(np.ceil(sample_size))

def simulate_omission(true_omission, target_omission, sample_size, num_simulations=4000):
    results = []
    for _ in range(num_simulations):
        # Simulate the classification results
        simulated_results = np.random.binomial(1, true_omission, sample_size)
        # Calculate the measured omission
        measured_omission = np.mean(simulated_results)
        # Perform hypothesis test
        count = np.sum(simulated_results)
        z, p_value = proportions_ztest(count, sample_size, value=target_omission, alternative='larger')
        # Check if we reject the null hypothesis
        # H0: the true omission is less than or equal to target_omission
        reject_null = p_value < 0.05
        # Rejecting H0 means there is statistically significant evidence (at the 5% level)
        # that the true omission is larger than target_omission
        results.append((z, measured_omission, reject_null))
    return results

def calc_errors(true_omission, target_omission, sample_size):
    simulation_results = simulate_omission(true_omission, target_omission, sample_size)

    # Calculate Type I and Type II error rates
    # A Type I error happens when we reject H0 (true_omission <= target_omission) although it is true
    type_i_errors = sum(1 for _, _, reject_null in simulation_results if reject_null and true_omission <= target_omission) / len(simulation_results)
    # A Type II error happens when we fail to reject H0 although it is false (true_omission > target_omission)
    type_ii_errors = sum(1 for _, _, reject_null in simulation_results if not reject_null and true_omission > target_omission) / len(simulation_results)
    return (true_omission, type_i_errors, type_ii_errors)


## Example Usage

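The sample size is chosen so that errors become rare once the true omission is far enough from the target omission. Because the test is one-sided, Type II errors are frequent for a true omission just above the target and shrink as it approaches 0.06 (the target plus twice the margin of error). Beyond that point, we should expect a Type II error rate of at most about 5%.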

In [72]:

target_omission = 0.05
margin_of_error = 0.005
sample_size = sample_size_for_omission(target_omission, margin_of_error=margin_of_error, alternative='one-sided')

print(sample_size)

5141
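
As a cross-check of the value above, here is a minimal standard-library sketch of the same one-sided calculation, evaluated for three margins of error to show how quickly the requirement grows. The function name `one_sided_sample_size` is hypothetical, and `statistics.NormalDist` is used here in place of `scipy.stats.norm` only to keep the sketch dependency-free.

```python
import math
from statistics import NormalDist


def one_sided_sample_size(p, margin_of_error, confidence_level=0.95):
    """Sample size for estimating proportion p within margin_of_error (one-sided)."""
    z = NormalDist().inv_cdf(confidence_level)
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)


# Same target as the notebook, with the margin halved at each step.
sizes = {m: one_sided_sample_size(0.05, m) for m in (0.01, 0.005, 0.0025)}
for margin, n in sizes.items():
    print(f"margin={margin}: n={n}")
```

The entry for a margin of 0.005 reproduces the 5141 printed above, and each halving of the margin multiplies the sample size by roughly four.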


### Simulation Description

In this section, we will simulate the omission rates for a range of true omission values around the target omission rate. We will calculate the Type I and Type II error rates for each true omission value to understand how the error rates vary with different true omission rates.

1. **True Omission Rates**:
   We generate a range of true omission rates (`true_omissions`) around the target omission rate. Specifically, we vary the true omission rate from one margin below the target omission rate to three margins above it. This range is divided into 20 equally spaced values.

2. **Error Calculation**:
   For each true omission rate in the range, we calculate the Type I and Type II error rates using the `calc_errors` function. This function simulates the omission rates and performs hypothesis testing to determine the error rates.

3. **Data Storage**:
   The results of the error calculations are stored in a Pandas DataFrame (`errors_df`). The DataFrame contains columns for the true omission rate, Type I error rate, and Type II error rate.

4. **Analysis**:
   By examining the DataFrame, we can analyze how the Type I and Type II error rates change as the true omission rate varies around the target omission rate. This helps us understand the robustness of our sample size calculation and the reliability of our hypothesis testing.
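
The four steps above can be sketched end to end with the standard library alone. This is an illustrative re-implementation under stated assumptions, not the notebook's code: the names `rejection_rate` and `error_rates` are hypothetical, the z-test uses the sample proportion for its standard error, and the sample size and simulation count are deliberately small so the pure-Python loop runs quickly.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_omission, target_omission, sample_size,
                   num_simulations=200, alpha=0.05, seed=0):
    """Fraction of simulated samples in which H0 (p <= target) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(num_simulations):
        # Number of omitted samples in one simulated validation set.
        count = sum(rng.random() < true_omission for _ in range(sample_size))
        p_hat = count / sample_size
        std_err = math.sqrt(p_hat * (1 - p_hat) / sample_size)
        # z > z_crit is equivalent to a one-sided p-value below alpha.
        if std_err > 0 and (p_hat - target_omission) / std_err > z_crit:
            rejections += 1
    return rejections / num_simulations


def error_rates(true_omission, target_omission, sample_size, **kwargs):
    """Return (type_i_rate, type_ii_rate) for one true omission rate."""
    rate = rejection_rate(true_omission, target_omission, sample_size, **kwargs)
    if true_omission <= target_omission:
        return rate, 0.0       # H0 is true: every rejection is a Type I error
    return 0.0, 1 - rate       # H0 is false: every non-rejection is a Type II error


rows = [(p, *error_rates(p, 0.05, sample_size=500)) for p in (0.03, 0.05, 0.10, 0.15)]
for true_omission, type_i, type_ii in rows:
    print(f"{true_omission:.2f}  type I={type_i:.3f}  type II={type_ii:.3f}")
```

The notebook's own run uses a sample size of 5141 and 4000 simulations per true omission rate, so its error rates are much less noisy than this small sketch.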

Here is the code to perform the simulation and store the results:



In [73]:
import pandas as pd

# Generate a range of true omission rates around the target omission rate
true_omissions = np.linspace(target_omission - 1 * margin_of_error, target_omission + 3 * margin_of_error, 20)

# Calculate Type I and Type II errors for each true omission rate
errors_df = pd.DataFrame([calc_errors(to, target_omission, sample_size) for to in true_omissions], columns=['True Omission', 'Type I Error', 'Type II Error'])

# Display the DataFrame
errors_df

| | True Omission | Type I Error | Type II Error |
|---|---|---|---|
| 0 | 0.045 | 0.00025 | 0.0 |
| 1 | 0.046053 | 0.00125 | 0.0 |
| 2 | 0.047105 | 0.00425 | 0.0 |
| 3 | 0.048158 | 0.013 | 0.0 |
| 4 | 0.049211 | 0.027 | 0.0 |
| 5 | 0.050263 | 0.0 | 0.9445 |
| 6 | 0.051316 | 0.0 | 0.88425 |
| 7 | 0.052368 | 0.0 | 0.8075 |
| 8 | 0.053421 | 0.0 | 0.7105 |
| 9 | 0.054474 | 0.0 | 0.56925 |


### DataFrame Output Analysis

The DataFrame output shows the calculated Type I and Type II error rates for a range of true omission rates around the target omission rate. Let's analyze the results to see if they are consistent with the input values and expected behavior.

#### True Omission Rates
The true omission rates range from 0.045 (one margin below the target omission rate of 0.05) to 0.065 (three margins above the target omission rate), divided into 20 equally spaced values. The output shown above lists only the first 10 of these values, up to 0.054474.
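
The grid can be reproduced with plain Python, which also makes one detail visible: the target itself is not one of the 20 points. This is a sketch equivalent in spirit to the `np.linspace` call in the notebook, written without NumPy for illustration.

```python
target_omission = 0.05
margin_of_error = 0.005
num_points = 20

start = target_omission - 1 * margin_of_error
stop = target_omission + 3 * margin_of_error
step = (stop - start) / (num_points - 1)
true_omissions = [start + i * step for i in range(num_points)]

# The target (0.05) falls between grid indices 4 and 5.
below, above = true_omissions[4], true_omissions[5]
print(round(below, 6), round(above, 6))  # 0.049211 0.050263
```

Since no grid point equals 0.05 exactly, the simulation never evaluates the boundary case of H0, where the Type I error rate would be expected to sit closest to the 5% significance level.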

#### Type I Error Rates
- **Expected Behavior**: Type I errors occur when we incorrectly reject the null hypothesis (H0: true_omission ≤ target_omission) when it is actually true.
- **Observed Behavior**: The Type I error rate is non-zero only when the true omission rate is below the target omission rate, and it grows toward the 5% significance level as the true omission rate approaches the target (0.00025 at 0.045, 0.027 at 0.049211). Above the target it is zero by construction, because `calc_errors` only counts a rejection as a Type I error when H0 is true.

#### Type II Error Rates
- **Expected Behavior**: Type II errors occur when we fail to reject the null hypothesis when it is actually false (true_omission > target_omission).
- **Observed Behavior**: The Type II error rate is very high when the true omission rate is slightly above the target omission rate (0.9445 at 0.050263). As the true omission rate increases further, the Type II error rate gradually decreases (0.56925 at 0.054474), which is what we expect from a test that gains power as the true rate moves away from the target.
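
As a rough cross-check on the simulated Type II error rates, the same quantity can be approximated in closed form with a normal approximation. This is a sketch under assumptions, not the notebook's method: the function name `approx_type_ii_error` is hypothetical, and the rejection threshold is computed from the variance under H0, whereas the simulated test is assumed to use the sample variance, so the two should agree only approximately.

```python
import math
from statistics import NormalDist


def approx_type_ii_error(true_omission, target_omission, sample_size, alpha=0.05):
    """Normal-approximation probability of failing to reject H0: p <= target."""
    norm = NormalDist()
    # Approximate rejection threshold for the measured omission rate under H0.
    threshold = target_omission + norm.inv_cdf(1 - alpha) * math.sqrt(
        target_omission * (1 - target_omission) / sample_size
    )
    # Spread of the measured rate when the true rate is true_omission.
    std_err = math.sqrt(true_omission * (1 - true_omission) / sample_size)
    # Probability that the measured rate stays below the threshold.
    return norm.cdf((threshold - true_omission) / std_err)


for p in (0.050263, 0.054474, 0.06, 0.065):
    print(f"{p:.6f}  {approx_type_ii_error(p, 0.05, 5141):.3f}")
```

With a sample size of 5141, the threshold lands close to 0.055 (one margin above the target), so a true omission rate near 0.055 is missed about half the time, and the approximation falls toward zero as the true rate approaches 0.065.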

### Consistency with Input Values

- **True Omission ≤ Target Omission**: For true omission rates less than or equal to the target omission rate (0.05), the Type I error rate is non-zero, and the Type II error rate is zero. This is expected: H0 is true in this range, so any rejection is a Type I error and a Type II error is impossible by definition.
- **True Omission > Target Omission**: For true omission rates greater than the target omission rate, the Type I error rate is zero (H0 is false, so a rejection is the correct decision), and the Type II error rate is initially high but decreases as the true omission rate increases. This is expected because the test often fails to reject H0 when the true omission rate is close to the target, and does so less often as the true omission rate moves further above it.

### Additional Consistency with Sample Size and Margin of Error

- **Chosen Sample Size**: The sample size (5141) was calculated so that errors become rare once the true omission rate is far enough from the target omission rate, with "far enough" set by the margin of error.
- **Margin of Error**: The margin of error was set to 0.005 for a one-sided test. The test therefore starts rejecting H0 once the measured omission rate is roughly one margin above the target (about 0.055), and the true omission rate has to sit roughly one further margin above that (about 0.06, a total distance of 0.01 from the target) before nearly all samples lead to a rejection.
- **Type II Error Rate**: The Type II error rate drops below 5% after the true omission rate exceeds the target omission rate by more than 2 times the margin of error (i.e., 0.05 + 2 * 0.005 = 0.06). This is consistent with the total distance of 0.01 that the sample size was designed for. Beyond this point, the Type II error rate is expected to be low, as observed in the results.

### Conclusion

The results in the DataFrame are consistent with the input values and the expected behavior of Type I and Type II errors. The Type I error rate is non-zero when the true omission rate is less than or very close to the target omission rate, and the Type II error rate is high when the true omission rate is slightly above the target omission rate but decreases as the true omission rate increases further. This analysis confirms that the sample size calculation and error rate simulations are functioning as intended.