# Comparing Confidence Interval Methods in ExactCIs

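This notebook compares the confidence interval methods for the odds ratio of a 2×2 table provided by ExactCIs, both with one another and with a reference Wald-Haldane calculation built on SciPy. The same approach can be extended to other implementations such as R's exact2x2 package.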

## Setup

First, let's import the necessary packages:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from exactcis import compute_all_cis
from exactcis.methods import (
    exact_ci_conditional,
    exact_ci_midp,
    exact_ci_blaker,
    exact_ci_unconditional,
    exact_ci_wald_haldane
)

# Set style for plots
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("colorblind")

## Define Test Cases

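We'll create a variety of test cases to compare the methods. Each case is a 2×2 table with cells `a` and `b` in the first row and `c` and `d` in the second, so the sample odds ratio is `(a * d) / (b * c)`: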

In [None]:
# Define test cases with different characteristics
test_cases = [
    {"name": "Standard case", "a": 12, "b": 5, "c": 8, "d": 10},
    {"name": "Small counts", "a": 3, "b": 1, "c": 2, "d": 2},
    {"name": "Large sample", "a": 50, "b": 20, "c": 30, "d": 40},
    {"name": "Zero in one cell", "a": 0, "b": 5, "c": 8, "d": 10},
    {"name": "Minimal counts", "a": 1, "b": 1, "c": 1, "d": 1},
    {"name": "Large imbalance", "a": 50, "b": 5, "c": 2, "d": 20},
    {"name": "COVID-19 example", "a": 8, "b": 9992, "c": 86, "d": 9914}
]

## ExactCIs Implementation

Calculate confidence intervals using all ExactCIs methods:

In [None]:
# Function to safely calculate odds ratio
def calculate_odds_ratio(a, b, c, d):
    if b * c == 0:  # Avoid division by zero
        return np.nan
    return (a * d) / (b * c)

# Calculate confidence intervals for all test cases
results = []

for case in test_cases:
    a, b, c, d = case["a"], case["b"], case["c"], case["d"]
    odds_ratio = calculate_odds_ratio(a, b, c, d)
    
    # Calculate CIs using ExactCIs
    try:
        exactcis_results = compute_all_cis(a, b, c, d)
    except Exception as e:
        print(f"Error for case {case['name']}: {e}")
        exactcis_results = {}
    
    # Store results
    for method, ci in exactcis_results.items():
        # Methods that return no interval are recorded with NaN bounds
        lower, upper = ci if ci is not None else (np.nan, np.nan)
        results.append({
            "Case": case["name"],
            "a": a, "b": b, "c": c, "d": d,
            "Odds Ratio": odds_ratio,
            "Method": method,
            "Implementation": "ExactCIs",
            "Lower": lower,
            "Upper": upper,
            "Width": upper - lower  # NaN when either bound is NaN
        })
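
The helper `calculate_odds_ratio` returns NaN whenever `b * c` is zero and returns 0 whenever `a * d` is zero (as in the "Zero in one cell" case). The following is a minimal sketch of how these edge cases could be told apart, next to the Haldane-corrected estimate that stays finite and positive. It uses only the standard library; the names `sample_odds_ratio` and `haldane_odds_ratio` are hypothetical and not part of ExactCIs.

```python
import math

def sample_odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) with explicit handling of zero cells."""
    num, den = a * d, b * c
    if num == 0 and den == 0:
        return math.nan   # 0/0 is undefined
    if den == 0:
        return math.inf   # positive numerator, zero denominator
    return num / den      # exactly 0.0 when a*d == 0

def haldane_odds_ratio(a, b, c, d):
    """Odds ratio after adding 0.5 to every cell (Haldane correction)."""
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# "Zero in one cell" test case from this notebook
print(sample_odds_ratio(0, 5, 8, 10))
print(haldane_odds_ratio(0, 5, 8, 10))
```

A sample odds ratio of exactly 0 cannot be drawn on a logarithmic axis, which matters for the plots later in this notebook.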

## SciPy Implementation

Now let's compute a reference Wald-Haldane interval by hand, using SciPy for the normal quantile:

In [None]:
# Reference implementation of the Wald-Haldane method (normal quantile from SciPy)
def scipy_wald_haldane(a, b, c, d, alpha=0.05):
    # Add 0.5 to each cell (Haldane adjustment)
    a_adj, b_adj, c_adj, d_adj = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    
    # Calculate log odds ratio
    log_or = np.log((a_adj * d_adj) / (b_adj * c_adj))
    
    # Calculate standard error
    se = np.sqrt(1/a_adj + 1/b_adj + 1/c_adj + 1/d_adj)
    
    # Calculate confidence intervals
    z = stats.norm.ppf(1 - alpha/2)
    lower = np.exp(log_or - z * se)
    upper = np.exp(log_or + z * se)
    
    return lower, upper

# Calculate SciPy confidence intervals
for case in test_cases:
    a, b, c, d = case["a"], case["b"], case["c"], case["d"]
    odds_ratio = calculate_odds_ratio(a, b, c, d)
    
    # SciPy Wald-Haldane method
    try:
        lower, upper = scipy_wald_haldane(a, b, c, d)
        results.append({
            "Case": case["name"],
            "a": a, "b": b, "c": c, "d": d,
            "Odds Ratio": odds_ratio,
            "Method": "Wald-Haldane",
            "Implementation": "SciPy",
            "Lower": lower,
            "Upper": upper,
            "Width": upper - lower
        })
    except Exception as e:
        print(f"SciPy Wald-Haldane error for case {case['name']}: {e}")
        
    # Placeholder row for Fisher's exact test: scipy.stats.fisher_exact returns
    # only an odds ratio and a p-value, so the CI columns are left as NaN
    results.append({
        "Case": case["name"],
        "a": a, "b": b, "c": c, "d": d,
        "Odds Ratio": odds_ratio,
        "Method": "Fisher",
        "Implementation": "SciPy",
        "Lower": np.nan,
        "Upper": np.nan,
        "Width": np.nan
    })
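
Recent SciPy releases (1.10 and later) provide an odds-ratio confidence interval through `scipy.stats.contingency.odds_ratio`. The following is a minimal sketch for the "Standard case" table, assuming such a SciPy version is installed. With `kind="conditional"` the point estimate is the conditional maximum likelihood estimate, which generally differs from the sample odds ratio `(a * d) / (b * c)`.

```python
from scipy.stats.contingency import odds_ratio

# "Standard case" from this notebook, laid out as rows (a, b) and (c, d)
table = [[12, 5], [8, 10]]

res = odds_ratio(table, kind="conditional")
ci = res.confidence_interval(confidence_level=0.95)

print(f"conditional MLE of the odds ratio: {res.statistic:.3f}")
print(f"95% CI: ({ci.low:.3f}, {ci.high:.3f})")
```

Rows produced this way could replace the NaN placeholders for the "Fisher" method if a SciPy-based exact interval is wanted in the comparison.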

## Create DataFrame with Results

Now let's organize the results into a DataFrame for easier analysis:

In [None]:
df_results = pd.DataFrame(results)
df_results.head()

## Visualize the Results

Let's create some visualizations to compare the methods:

In [None]:
# For a subset of the test cases, plot the confidence intervals by method
test_cases_subset = ["Standard case", "Small counts", "Zero in one cell", "Large imbalance"]

for case_name in test_cases_subset:
    case_data = df_results[df_results["Case"] == case_name].copy()
    
    # Get odds ratio for this case
    odds_ratio = case_data["Odds Ratio"].iloc[0]
    a, b, c, d = case_data["a"].iloc[0], case_data["b"].iloc[0], case_data["c"].iloc[0], case_data["d"].iloc[0]
    
    # Remove rows with NaN values
    case_data = case_data.dropna(subset=["Lower", "Upper"])
    
    # Create a column for method and implementation combined
    case_data["Method_Impl"] = case_data["Method"] + " (" + case_data["Implementation"] + ")"
    
    # Plot
    plt.figure(figsize=(12, 6))
    
    # Sort by CI width
    case_data = case_data.sort_values(by="Width")
    
    # Plot CIs
    y_pos = np.arange(len(case_data))
    
    for i, (_, row) in enumerate(case_data.iterrows()):
        # Methods are identified by the y-axis tick labels, so no legend entry is needed
        plt.plot([row["Lower"], row["Upper"]], [i, i],
                 marker="o", markersize=8, linewidth=2)
    
    # Add odds ratio line; a log axis cannot show NaN, zero, or infinite values
    show_odds_ratio = np.isfinite(odds_ratio) and odds_ratio > 0
    if show_odds_ratio:
        plt.axvline(x=odds_ratio, color="red", linestyle="--", label="Odds Ratio")
    
    plt.yticks(y_pos, case_data["Method_Impl"])
    plt.xscale("log")
    plt.xlabel("Odds Ratio (log scale)")
    plt.title(f"Confidence Intervals for {case_name} (a={a}, b={b}, c={c}, d={d})")
    plt.grid(axis="x", alpha=0.3)
    
    # Add a legend for the odds ratio line
    if show_odds_ratio:
        plt.legend()
        
    plt.tight_layout()
    plt.show()

## Compare CI Widths Across Methods

Now let's compare the widths of the confidence intervals across all methods:

In [None]:
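# Keep only finite, positive widths (NaN, infinite, or zero widths cannot be drawn on a log axis)
width_data = df_results[np.isfinite(df_results["Width"]) & (df_results["Width"] > 0)]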

# Create a box plot of widths by method
plt.figure(figsize=(12, 6))
sns.boxplot(x="Method", y="Width", hue="Implementation", data=width_data)
plt.title("Confidence Interval Widths by Method and Implementation")
plt.yscale("log")
plt.ylabel("CI Width (log scale)")
plt.xticks(rotation=45)
plt.grid(axis="y", alpha=0.3)
plt.tight_layout()
plt.show()

## Compare Methods Across Different Odds Ratios

Let's see how the interval bounds vary with the sample odds ratio of each test case:

In [None]:
# Plot Lower Bounds vs. Odds Ratio
plt.figure(figsize=(12, 6))
for method in df_results["Method"].unique():
    for impl in df_results["Implementation"].unique():
        subset = df_results[(df_results["Method"] == method) & 
                           (df_results["Implementation"] == impl)].dropna(subset=["Lower"])
        if len(subset) > 0:
            plt.scatter(subset["Odds Ratio"], subset["Lower"], 
                       label=f"{method} ({impl})", alpha=0.7, s=80)

plt.xscale("log")
plt.yscale("log")
plt.xlabel("Sample Odds Ratio (log scale)")
plt.ylabel("Lower Bound (log scale)")
plt.title("Lower Bounds vs. Sample Odds Ratio by Method")
plt.grid(True, alpha=0.3)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Plot Upper Bounds vs. Odds Ratio
plt.figure(figsize=(12, 6))
for method in df_results["Method"].unique():
    for impl in df_results["Implementation"].unique():
        subset = df_results[(df_results["Method"] == method) & 
                           (df_results["Implementation"] == impl)].dropna(subset=["Upper"])
        if len(subset) > 0:
            plt.scatter(subset["Odds Ratio"], subset["Upper"], 
                       label=f"{method} ({impl})", alpha=0.7, s=80)

plt.xscale("log")
plt.yscale("log")
plt.xlabel("Sample Odds Ratio (log scale)")
plt.ylabel("Upper Bound (log scale)")
plt.title("Upper Bounds vs. Sample Odds Ratio by Method")
plt.grid(True, alpha=0.3)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

## Differences from Wald-Haldane Method

Let's examine how each method differs from the Wald-Haldane method (which serves as a normal approximation baseline):

In [None]:
# Calculate percentage differences from Wald-Haldane
diff_data = []

for case_name in df_results["Case"].unique():
    case_data = df_results[df_results["Case"] == case_name]
    
    # Get the Wald-Haldane results for this case
    wald_exactcis = case_data[(case_data["Method"] == "wald_haldane") & 
                              (case_data["Implementation"] == "ExactCIs")]
    
    if len(wald_exactcis) == 0 or pd.isna(wald_exactcis["Lower"].iloc[0]) or pd.isna(wald_exactcis["Upper"].iloc[0]):
        continue
        
    wald_lower = wald_exactcis["Lower"].iloc[0]
    wald_upper = wald_exactcis["Upper"].iloc[0]
    
    for _, row in case_data.iterrows():
        if row["Method"] != "wald_haldane" and not pd.isna(row["Lower"]) and not pd.isna(row["Upper"]):
            # Calculate percentage differences
            pct_diff_lower = ((row["Lower"] - wald_lower) / wald_lower) * 100
            pct_diff_upper = ((row["Upper"] - wald_upper) / wald_upper) * 100
            
            diff_data.append({
                "Case": case_name,
                "Odds Ratio": row["Odds Ratio"],
                "Method": row["Method"],
                "Implementation": row["Implementation"],
                "Lower Diff %": pct_diff_lower,
                "Upper Diff %": pct_diff_upper
            })

df_diff = pd.DataFrame(diff_data)

# Plot differences
plt.figure(figsize=(12, 6))
for method in df_diff["Method"].unique():
    method_data = df_diff[df_diff["Method"] == method]
    plt.scatter(method_data["Odds Ratio"], method_data["Lower Diff %"], 
               label=f"{method} Lower", alpha=0.7, marker="o")
    plt.scatter(method_data["Odds Ratio"], method_data["Upper Diff %"], 
               label=f"{method} Upper", alpha=0.7, marker="s")

plt.axhline(y=0, color="black", linestyle="--")
plt.xscale("log")
plt.xlabel("Sample Odds Ratio (log scale)")
plt.ylabel("% Difference from Wald-Haldane")
plt.title("Percentage Difference from Wald-Haldane Method")
plt.grid(True, alpha=0.3)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
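
Percentage differences treat the bounds additively, while odds-ratio bounds live on a multiplicative scale: a bound half as large as the reference gives -50%, whereas one twice as large gives +100%. The following is a minimal sketch of the percentage formula used above next to a log-ratio alternative that treats both directions symmetrically. The helper names `pct_diff` and `log_ratio` are hypothetical, and the input values are made up purely for illustration.

```python
import math

def pct_diff(value, reference):
    """Percentage difference, as used for the plot above."""
    return (value - reference) / reference * 100.0

def log_ratio(value, reference):
    """Log of the ratio; halving and doubling give equal magnitudes."""
    return math.log(value / reference)

reference = 2.0  # illustrative reference bound, not a computed result
for value in (1.0, 4.0):
    print(f"value={value}: pct_diff={pct_diff(value, reference):+.1f}%, "
          f"log_ratio={log_ratio(value, reference):+.4f}")
```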

## Correlation Between CI Width and Sample Size

Let's examine how CI width varies with total sample size:

In [None]:
# Add total sample size to the results
df_results["Sample Size"] = df_results["a"] + df_results["b"] + df_results["c"] + df_results["d"]

# Plot CI width vs. sample size for each method
plt.figure(figsize=(12, 6))

for method in df_results["Method"].unique():
    method_data = df_results[df_results["Method"] == method].dropna(subset=["Width"])
    if len(method_data) > 0:
        plt.scatter(method_data["Sample Size"], method_data["Width"], 
                   label=method, alpha=0.7, s=80)

plt.xscale("log")
plt.yscale("log")
plt.xlabel("Total Sample Size (log scale)")
plt.ylabel("CI Width (log scale)")
plt.title("CI Width vs. Sample Size by Method")
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()
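
Because the test cases differ in both sample size and odds ratio, the scatter plot above mixes the two effects. The following is a minimal sketch that isolates the sample-size effect by multiplying every cell of the "Standard case" table by a factor `k` and recomputing the Wald-Haldane interval. It uses only the standard library; `wald_haldane_ci` is a hypothetical re-implementation of the formula shown earlier, not the ExactCIs function.

```python
import math
from statistics import NormalDist

def wald_haldane_ci(a, b, c, d, alpha=0.05):
    """Wald interval for the odds ratio after adding 0.5 to every cell."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

base = (12, 5, 8, 10)  # "Standard case" from this notebook
widths = []
for k in (1, 2, 4, 8):
    lower, upper = wald_haldane_ci(*(k * x for x in base))
    widths.append(upper - lower)
    print(f"k={k}: N={k * sum(base)}, CI=({lower:.3f}, {upper:.3f}), "
          f"width={upper - lower:.3f}")
```

With the cell proportions held fixed, the standard error of the log odds ratio shrinks roughly like one over the square root of `k`, so the interval narrows as the table grows.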

## Summary and Conclusion

Based on the analysis above, here are some observations about the different confidence interval methods:

1. **Method Differences**:
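   - The Conditional (Fisher's exact) method is conservative: it guarantees at least nominal coverage and therefore tends to produce wider intervals than the other methods
   - MidP adjusted intervals are generally narrower than Conditional, at the cost of no longer guaranteeing nominal coverage
   - Blaker's method keeps the exact coverage guarantee while typically producing intervals no wider than the Conditional method
   - Unconditional method performs well for moderate and large sample sizes
   - Wald-Haldane relies on a normal approximation to the log odds ratio, so it is most reliable when all cell counts are large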

2. **Implementation Differences**:
   - ExactCIs and the SciPy-based reference implementation of Wald-Haldane show good agreement
   - Implementation differences are more pronounced for edge cases (zeros, small counts)

3. **Edge Cases**:
   - Tables with zeros pose challenges for several methods
   - Small counts lead to wider confidence intervals across all methods
   - Large imbalances between cells may lead to computational issues with some methods

4. **Sample Size Effects**:
   - CI width tends to decrease with increasing sample size for all methods, although it also depends on the odds ratio itself
   - Differences between methods become less pronounced with larger sample sizes
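
The relationship between the Conditional and MidP intervals noted in item 1 can be illustrated with a short sketch. It inverts the tail probabilities of the conditional (noncentral hypergeometric) distribution of cell `a` by bisection, using only the standard library. This is a textbook construction under stated assumptions, not the ExactCIs implementation; all function names here are hypothetical.

```python
import math

def _cond_probs(a, b, c, d, log_psi):
    """Conditional pmf of cell a given the margins, at log odds ratio log_psi."""
    r1, r2, c1 = a + b, c + d, a + c
    xs = range(max(0, c1 - r2), min(r1, c1) + 1)
    logw = [math.log(math.comb(r1, x)) + math.log(math.comb(r2, c1 - x))
            + x * log_psi for x in xs]
    m = max(logw)  # subtract the maximum to avoid overflow
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return {x: wi / total for x, wi in zip(xs, w)}

def _tail(a, b, c, d, log_psi, upper, mid_weight):
    """P(X > a) or P(X < a), plus mid_weight * P(X = a)."""
    probs = _cond_probs(a, b, c, d, log_psi)
    if upper:
        tail = sum(p for x, p in probs.items() if x > a)
    else:
        tail = sum(p for x, p in probs.items() if x < a)
    return tail + mid_weight * probs[a]

def _solve(f, target, increasing, lo=-30.0, hi=30.0, iters=200):
    """Bisection for f(log_psi) = target, with f monotone on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def conditional_ci(a, b, c, d, alpha=0.05, mid_p=False):
    """Conditional exact CI; mid_p=True counts only half of P(X = a)."""
    w = 0.5 if mid_p else 1.0
    r1, r2, c1 = a + b, c + d, a + c
    x_min, x_max = max(0, c1 - r2), min(r1, c1)
    lower = 0.0 if a == x_min else math.exp(_solve(
        lambda t: _tail(a, b, c, d, t, True, w), alpha / 2, increasing=True))
    upper = math.inf if a == x_max else math.exp(_solve(
        lambda t: _tail(a, b, c, d, t, False, w), alpha / 2, increasing=False))
    return lower, upper

# "Standard case" from this notebook
print("conditional:", conditional_ci(12, 5, 8, 10))
print("mid-p:      ", conditional_ci(12, 5, 8, 10, mid_p=True))
```

Because the mid-p variant counts only half of the probability of the observed table, each of its bounds lies inside the corresponding conditional bound, which is why the resulting interval is narrower.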

These observations can guide users in selecting the most appropriate method for their specific data characteristics and requirements.