# Handling Edge Cases with ExactCIs

This notebook demonstrates how to handle edge cases that arise when calculating confidence intervals for the odds ratio of a 2×2 contingency table:
1. Tables with zero cells
2. Sparse tables (small counts)
3. Tables with extreme imbalance
4. Very large tables

In [1]:
import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import ExactCIs - adjust the path if needed
from exactcis import compute_all_cis
from exactcis.methods import (
    exact_ci_conditional,
    exact_ci_midp,
    exact_ci_blaker,
    exact_ci_unconditional,
    ci_wald_haldane
)

## 1. Tables with Zero Cells

A zero cell makes the sample odds ratio (a·d)/(b·c) zero, infinite, or undefined (0/0), so methods that depend on it may fail or return an unbounded limit. There are several ways to handle this.
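
To make the problem concrete, the following minimal sketch computes the sample odds ratio directly from the four cell counts. The helper name `sample_odds_ratio` is hypothetical and is not part of ExactCIs; it only illustrates which zero patterns give an infinite or undefined estimate.

```python
import math

def sample_odds_ratio(a, b, c, d):
    """Return the sample odds ratio (a*d)/(b*c) for a 2x2 table.

    Returns math.inf when only the denominator is zero and math.nan
    when numerator and denominator are both zero (0/0).
    """
    numerator = a * d
    denominator = b * c
    if denominator == 0:
        return math.nan if numerator == 0 else math.inf
    return numerator / denominator

print(sample_odds_ratio(10, 0, 5, 10))  # zero in cell b: infinite
print(sample_odds_ratio(0, 0, 5, 10))   # empty exposed row: undefined
print(sample_odds_ratio(2, 1, 1, 3))    # no zero cells: 6.0
```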

In [2]:
# Example table with a zero cell
# 2×2 table:   Cases   Controls
#   Exposed      10        0
#   Unexposed     5       10

a, b, c, d = 10, 0, 5, 10

print("Table with one zero cell:\n")
print(f"     | Cases | Controls")
print(f"-----|-------|----------")
print(f"Exp. |  {a:3d}  |    {b:3d}")
print(f"Unex.|  {c:3d}  |    {d:3d}\n")

In [3]:
# Attempt to use methods that can handle zero cells
methods_results = {}

# Safe methods to try
try:
    methods_results["conditional"] = exact_ci_conditional(a, b, c, d)
except Exception as e:
    methods_results["conditional"] = (f"Error: {str(e)}", None)
    
try:
    methods_results["midp"] = exact_ci_midp(a, b, c, d)
except Exception as e:
    methods_results["midp"] = (f"Error: {str(e)}", None)
    
try:
    methods_results["blaker"] = exact_ci_blaker(a, b, c, d)
except Exception as e:
    methods_results["blaker"] = (f"Error: {str(e)}", None)
    
# Wald-Haldane adds 0.5 to each cell, so it should work
try:
    methods_results["wald_haldane"] = ci_wald_haldane(a, b, c, d)
except Exception as e:
    methods_results["wald_haldane"] = (f"Error: {str(e)}", None)
    
# Display results in a table
print("Results for table with one zero cell:\n")
for method, result in methods_results.items():
    if isinstance(result[0], str):
        print(f"{method:12s}: {result[0]}")
    else:
        lower, upper = result
        print(f"{method:12s}: ({lower:.3f}, {upper:.3f})")

### Solution 1: Use Haldane's Correction

The Haldane correction adds 0.5 to each cell, which allows calculation of odds ratios and confidence intervals even with zero cells.
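
As a worked illustration, the sketch below applies the textbook Haldane-corrected Wald interval on the log odds ratio scale. It assumes the standard formula (add 0.5 to every cell, then use the usual standard error of the log odds ratio); whether `ci_wald_haldane` follows exactly this recipe is an assumption, and the function name `haldane_wald_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def haldane_wald_ci(a, b, c, d, alpha=0.05):
    """Wald interval for the odds ratio after adding 0.5 to every cell."""
    # Haldane correction: shift every cell by 0.5 so no cell is zero.
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided normal quantile, about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

lower, upper = haldane_wald_ci(10, 0, 5, 10)
print(f"Haldane-corrected Wald interval: ({lower:.3f}, {upper:.3f})")
```

Because the interval is symmetric on the log scale, the upper limit stays finite even though the uncorrected odds ratio is infinite; the price is that the result depends on the somewhat arbitrary choice of 0.5.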

In [4]:
# Apply Haldane's correction manually (add 0.5 to each cell)
a_h, b_h, c_h, d_h = a + 0.5, b + 0.5, c + 0.5, d + 0.5

print(f"Original table: {a}, {b}, {c}, {d}")
print(f"With Haldane correction: {a_h}, {b_h}, {c_h}, {d_h}\n")

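# Try the exact methods with the corrected counts. They are defined for
# integer counts, so they may reject non-integer input; any error is caught
# and reported below.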
haldane_results = {}

try:
    haldane_results["conditional"] = exact_ci_conditional(a_h, b_h, c_h, d_h)
except Exception as e:
    haldane_results["conditional"] = (f"Error: {str(e)}", None)
    
try:
    haldane_results["midp"] = exact_ci_midp(a_h, b_h, c_h, d_h)
except Exception as e:
    haldane_results["midp"] = (f"Error: {str(e)}", None)
    
# Display results with Haldane correction
print("Results with Haldane correction applied:\n")
for method, result in haldane_results.items():
    if isinstance(result[0], str):
        print(f"{method:12s}: {result[0]}")
    else:
        lower, upper = result
        print(f"{method:12s}: ({lower:.3f}, {upper:.3f})")

### Solution 2: Use Methods That Handle Zeros Naturally

Some methods handle zero cells without any change to the data, either by applying a correction internally or by returning a limiting bound (0 or infinity).

In [5]:
# The ci_wald_haldane method automatically applies the correction
wald_result = ci_wald_haldane(a, b, c, d)
print(f"Wald-Haldane method: ({wald_result[0]:.3f}, {wald_result[1]:.3f})")

# Unconditional method with profile likelihood can handle zeros
try:
    uncond_result = exact_ci_unconditional(a, b, c, d, use_profile=True)
    print(f"Unconditional with profile: ({uncond_result[0]:.3f}, {uncond_result[1]:.3f})")
except Exception as e:
    print(f"Unconditional error: {str(e)}")

## 2. Sparse Tables (Small Counts)

Sparse tables, where some or all cells hold small counts, carry little information about the odds ratio, so confidence intervals tend to be wide.
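
One reason exact intervals are wide for sparse tables is discreteness: once the margins are fixed, the top-left cell can take only a handful of values, so tail probabilities move in large jumps. The sketch below lists that conditional distribution under an odds ratio of 1 for the sparse table used in this section. The function name `conditional_null_distribution` is hypothetical and not part of ExactCIs.

```python
from fractions import Fraction
from math import comb

def conditional_null_distribution(a, b, c, d):
    """Hypergeometric distribution of the top-left cell given fixed margins
    (odds ratio equal to 1)."""
    n1, n2, m1 = a + b, c + d, a + c
    lo, hi = max(0, m1 - n2), min(n1, m1)
    total = comb(n1 + n2, m1)
    return {x: Fraction(comb(n1, x) * comb(n2, m1 - x), total)
            for x in range(lo, hi + 1)}

dist = conditional_null_distribution(2, 1, 1, 3)
for x, p in dist.items():
    print(f"a = {x}: probability {p} ({float(p):.3f})")
```

With only four possible outcomes, no tail probability can land exactly on 0.025, which is why conditional exact intervals for such tables are conservative.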

In [6]:
# Example sparse table
# 2×2 table:   Cases   Controls
#   Exposed       2        1
#   Unexposed     1        3

a, b, c, d = 2, 1, 1, 3

print("Sparse table with small counts:\n")
print(f"     | Cases | Controls")
print(f"-----|-------|----------")
print(f"Exp. |  {a:3d}  |    {b:3d}")
print(f"Unex.|  {c:3d}  |    {d:3d}\n")

# Compare all methods for the sparse table
try:
    results = compute_all_cis(a, b, c, d)
    
    # Display results in a formatted table
    print("Method        Lower   Upper   Width")
    print("-" * 40)
    for method, (lower, upper) in results.items():
        width = upper - lower
        print(f"{method:12s} {lower:.3f}   {upper:.3f}   {width:.3f}")
except Exception as e:
    print(f"Error: {str(e)}")

## 3. Tables with Extreme Imbalance

Tables with extreme imbalance, where nearly all events fall in one group and the off-diagonal cells are very small, produce very large (or very small) odds ratio estimates and can challenge some methods.

In [7]:
# Example imbalanced table
# 2×2 table:   Cases   Controls
#   Exposed     100        2
#   Unexposed     3       80

a, b, c, d = 100, 2, 3, 80

print("Imbalanced table:\n")
print(f"     | Cases | Controls")
print(f"-----|-------|----------")
print(f"Exp. |  {a:3d}  |    {b:3d}")
print(f"Unex.|  {c:3d}  |    {d:3d}\n")

# Try all methods except unconditional (which might be slow for this table)
methods_to_try = [
    ("conditional", exact_ci_conditional),
    ("midp", exact_ci_midp),
    ("blaker", exact_ci_blaker),
    ("wald_haldane", ci_wald_haldane),
]

imbalanced_results = {}
for name, method in methods_to_try:
    try:
        # perf_counter is a monotonic, high-resolution clock suited to timing
        start_time = time.perf_counter()
        result = method(a, b, c, d)
        elapsed = time.perf_counter() - start_time
        imbalanced_results[name] = (result, elapsed)
    except Exception as e:
        imbalanced_results[name] = ((f"Error: {str(e)}", None), 0)

# Display results with timing information
print("Method        Lower      Upper      Time (s)")
print("-" * 50)
for method, (result, elapsed) in imbalanced_results.items():
    if isinstance(result[0], str):
        print(f"{method:12s} {result[0]}")
    else:
        lower, upper = result
        print(f"{method:12s} {lower:10.3f} {upper:10.3f} {elapsed:10.6f}")

In [8]:
# Try unconditional method with timeout

try:
    print("\nTrying unconditional method with timeout...")
    start_time = time.perf_counter()
    result = exact_ci_unconditional(a, b, c, d, timeout=5)  # 5-second timeout
    elapsed = time.perf_counter() - start_time
    lower, upper = result
    print(f"unconditional: ({lower:.3f}, {upper:.3f}) - completed in {elapsed:.2f}s")
except Exception as e:
    print(f"Error: {str(e)}")

## 4. Very Large Tables

For tables with large counts, some methods might become slow, while others remain efficient.
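
When comparing speed, a single wall-clock measurement can be noisy, especially for methods that finish in microseconds. The helper below is a minimal sketch of a more stable measurement: it uses `time.perf_counter` and keeps the best of several runs. The names `time_call` and `stand_in_method` are hypothetical; the stand-in only plays the role of an interval method so the block runs without ExactCIs installed.

```python
import time

def time_call(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns (result of the last call, best elapsed time in seconds).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best

def stand_in_method(a, b, c, d):
    """Hypothetical placeholder for an interval method; returns the cell total."""
    return a + b + c + d

result, seconds = time_call(stand_in_method, 500, 400, 300, 600)
print(f"result={result}, best of 3 runs: {seconds:.6f}s")
```

In the notebook, any of the imported methods could be passed in place of `stand_in_method`, for example `time_call(exact_ci_midp, a, b, c, d)`.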

In [9]:
# Example large table
# 2×2 table:   Cases    Controls
#   Exposed     500        400
#   Unexposed   300        600

a, b, c, d = 500, 400, 300, 600

print("Large table:\n")
print(f"     | Cases | Controls")
print(f"-----|-------|----------")
print(f"Exp. |  {a:3d}  |    {b:3d}")
print(f"Unex.|  {c:3d}  |    {d:3d}\n")

# Try fast methods suitable for large tables
fast_methods = [
    ("conditional", exact_ci_conditional),
    ("midp", exact_ci_midp),
    ("wald_haldane", ci_wald_haldane),
]

large_results = {}
for name, method in fast_methods:
    try:
        # perf_counter is a monotonic, high-resolution clock suited to timing
        start_time = time.perf_counter()
        result = method(a, b, c, d)
        elapsed = time.perf_counter() - start_time
        large_results[name] = (result, elapsed)
    except Exception as e:
        large_results[name] = ((f"Error: {str(e)}", None), 0)

# Display results with timing information
print("Method        Lower      Upper      Time (s)")
print("-" * 50)
for method, (result, elapsed) in large_results.items():
    if isinstance(result[0], str):
        print(f"{method:12s} {result[0]}")
    else:
        lower, upper = result
        print(f"{method:12s} {lower:10.3f} {upper:10.3f} {elapsed:10.6f}")

## Summary: Recommendations for Edge Cases

Based on the examples above, here are recommendations for handling various edge cases:

1. **Tables with Zero Cells**:
   - Use `ci_wald_haldane` for a quick solution (automatically applies Haldane correction)
   - For exact methods, consider using the unconditional method with `use_profile=True`
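   - Alternatively, manually add 0.5 to each cell before calculation, keeping in mind that exact methods are defined for integer counts and may reject the corrected values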

2. **Sparse Tables**:
   - The unconditional method often gives shorter intervals than the conditional method for small sample sizes
   - The mid-P method is a good compromise between coverage and interval width
   - The conditional method guarantees at least the nominal coverage but is very conservative with small counts

3. **Tables with Extreme Imbalance**:
   - The mid-P and conditional methods handle these well
   - The unconditional method might be slow and should be used with a timeout
   - The Wald-Haldane method is very fast but less accurate, because it relies on a normal approximation

4. **Very Large Tables**:
   - All methods work well, but with varying computational costs
   - For large tables, the differences between methods diminish
   - The Wald-Haldane method offers excellent performance with minimal accuracy trade-offs
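
The recommendations above can be applied more systematically by first labelling which edge cases a table falls into. The sketch below is one hypothetical way to do that. The function `classify_table` and all of its thresholds (`sparse_below`, `large_total`, `imbalance_ratio`) are illustrative assumptions, not values taken from ExactCIs, and should be tuned to the application.

```python
def classify_table(a, b, c, d, sparse_below=5, large_total=1000,
                   imbalance_ratio=20):
    """Return a list of edge-case labels for a 2x2 table.

    All thresholds are illustrative defaults chosen for this sketch.
    """
    cells = (a, b, c, d)
    if any(x < 0 for x in cells):
        raise ValueError("cell counts must be non-negative")
    labels = []
    if 0 in cells:
        labels.append("zero cell")
    if min(cells) < sparse_below:
        labels.append("sparse")
    # Compare the largest cell with the smallest, treating 0 as 1
    # to avoid dividing by zero.
    if max(cells) >= imbalance_ratio * max(min(cells), 1):
        labels.append("imbalanced")
    if sum(cells) >= large_total:
        labels.append("large")
    return labels or ["regular"]

# The four example tables used in this notebook.
for table in [(10, 0, 5, 10), (2, 1, 1, 3), (100, 2, 3, 80), (500, 400, 300, 600)]:
    print(table, classify_table(*table))
```

A table can carry several labels at once; the zero-cell example is also sparse, so the recommendations for both cases apply to it.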