# DOOR Analysis Tutorial
## Desirability of Outcome Ranking for Benefit-Risk Assessment

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nexvigilant/nv-BR-toolkit/blob/main/notebooks/DOOR_Analysis_Tutorial.ipynb)

---

### ⚠️ EDUCATIONAL USE ONLY

**This notebook is provided strictly for educational and instructional purposes.**
- Do NOT use for regulatory decision-making
- Do NOT use as a substitute for internal SOPs
- All data in this notebook is **simulated/hypothetical**

---

### Learning Objectives

After completing this notebook, you will be able to:

1. **Explain** the DOOR methodology and when to use it
2. **Construct** an outcome hierarchy for composite endpoints
3. **Calculate** win ratios and net benefit metrics
4. **Interpret** DOOR analysis results in a benefit-risk context
5. **Visualize** outcome distributions across treatment arms

---

### Background: What is DOOR?

**DOOR (Desirability of Outcome Ranking)** is a method for analyzing composite endpoints that:

- Respects the clinical hierarchy of outcomes (for example, death is always ranked worse than hospitalization)
- Avoids arbitrary numerical weighting of outcomes
- Uses pairwise comparison of patients between treatment groups
- Produces intuitive metrics like "win ratio"

**Reference:** CIOMS Working Group XII Report on Benefit-Risk Assessment

## Setup

First, let's install and import the required packages.

In [None]:
# Install dependencies (uncomment if running in Colab)
# !pip install pandas numpy scipy matplotlib

In [None]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Set display options
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-whitegrid')

print("✅ Setup complete!")

## Step 1: Define the Outcome Hierarchy

The **most critical step** in DOOR analysis is defining the outcome hierarchy.

### Key Principle
> A patient in a "better" category is ALWAYS preferred to a patient in a "worse" category, regardless of any other factors.

### Example: Cardiovascular Trial

For an anticoagulant trial, we might define outcomes from most to least desirable:

In [None]:
# Define outcome hierarchy (MOST desirable first, LEAST desirable last)
outcome_hierarchy = [
    "Alive, no CV event, no bleed",           # Rank 1 - Best outcome
    "Alive, no CV event, minor bleed",        # Rank 2
    "Alive, minor CV event, no bleed",        # Rank 3
    "Alive, major CV event recovered",        # Rank 4
    "Alive, no CV event, major bleed",        # Rank 5
    "Alive, major CV event + major bleed",    # Rank 6
    "CV death",                               # Rank 7
    "Non-CV death"                            # Rank 8 - Worst outcome
]

# Display the hierarchy
print("OUTCOME HIERARCHY")
print("=" * 50)
for i, outcome in enumerate(outcome_hierarchy, 1):
    emoji = "🟢" if i <= 2 else "🟡" if i <= 4 else "🟠" if i <= 6 else "🔴"
    print(f"{emoji} Rank {i}: {outcome}")

### 💡 Practice Exercise

**Question:** Why is "CV death" (Rank 7) placed in a different category from "Non-CV death" (Rank 8), and why is it ranked as the more desirable of the two?

<details>
<summary>Click for answer</summary>

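There is no universally accepted answer. In many cardiovascular trials, CV death is considered directly related to the disease being treated, while non-CV death may be due to pre-existing conditions or unrelated causes, so the two are recorded as separate categories. Which of the two sits lower in the hierarchy, or whether they are merged into a single "death" category, depends on the specific trial's objectives and can be debated.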

This illustrates why **clinical input is essential** when defining hierarchies.
</details>

## Step 2: Generate Simulated Trial Data

For educational purposes, we'll create hypothetical trial data.

**Note:** This data is entirely simulated and should NOT be interpreted as real clinical evidence.

In [None]:
def create_simulated_trial_data(n_treatment=500, n_control=500, seed=42):
    """
    Create simulated trial data with different outcome distributions.
    
    The treatment arm is designed to show better outcomes (fewer severe events).
    """
    np.random.seed(seed)
    
    # Treatment arm probabilities (better outcomes more likely)
    treatment_probs = [0.45, 0.15, 0.12, 0.10, 0.08, 0.05, 0.03, 0.02]
    
    # Control arm probabilities (worse outcomes more likely)
    control_probs = [0.35, 0.12, 0.10, 0.12, 0.10, 0.08, 0.08, 0.05]
    
    # Generate outcomes
    treatment_outcomes = np.random.choice(
        outcome_hierarchy, size=n_treatment, p=treatment_probs
    )
    control_outcomes = np.random.choice(
        outcome_hierarchy, size=n_control, p=control_probs
    )
    
    # Create DataFrames
    treatment_df = pd.DataFrame({
        'patient_id': [f'T{i:04d}' for i in range(n_treatment)],
        'treatment': 'Drug A',
        'outcome': treatment_outcomes
    })
    
    control_df = pd.DataFrame({
        'patient_id': [f'C{i:04d}' for i in range(n_control)],
        'treatment': 'Placebo',
        'outcome': control_outcomes
    })
    
    return pd.concat([treatment_df, control_df], ignore_index=True)

# Generate the data
trial_data = create_simulated_trial_data()

print(f"Total patients: {len(trial_data)}")
print(f"Treatment arm: {len(trial_data[trial_data['treatment'] == 'Drug A'])}")
print(f"Control arm: {len(trial_data[trial_data['treatment'] == 'Placebo'])}")
print("\nSample data:")
trial_data.sample(5)
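
Because the simulation draws each arm from a known categorical distribution, the win, loss, and tie probabilities it targets can be worked out directly from the two probability vectors. The sketch below does this in plain Python. The function name `expected_pairwise_probabilities` is hypothetical and introduced only for illustration. Estimates from a finite simulated trial should land near these values, not exactly on them.

```python
def expected_pairwise_probabilities(treatment_probs, control_probs):
    """Return (P(treatment wins), P(control wins), P(tie)) for one random pair.

    Both inputs list category probabilities from most to least desirable,
    so a smaller index means a better outcome.
    """
    p_win = p_loss = p_tie = 0.0
    for i, p_t in enumerate(treatment_probs):
        for j, p_c in enumerate(control_probs):
            if i < j:        # treatment patient is in the better category
                p_win += p_t * p_c
            elif i > j:      # control patient is in the better category
                p_loss += p_t * p_c
            else:            # same category
                p_tie += p_t * p_c
    return p_win, p_loss, p_tie

# Same probability vectors as in create_simulated_trial_data above
treatment_probs = [0.45, 0.15, 0.12, 0.10, 0.08, 0.05, 0.03, 0.02]
control_probs = [0.35, 0.12, 0.10, 0.12, 0.10, 0.08, 0.08, 0.05]

p_win, p_loss, p_tie = expected_pairwise_probabilities(treatment_probs, control_probs)
print(f"P(treatment wins)    = {p_win:.4f}")
print(f"P(control wins)      = {p_loss:.4f}")
print(f"P(tie)               = {p_tie:.4f}")
print(f"Expected win ratio   = {p_win / p_loss:.3f}")
print(f"Expected net benefit = {p_win - p_loss:.4f}")
```

Comparing these population values with the sample estimates is a quick sanity check that the simulated data behave as designed.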

## Step 3: Assign DOOR Ranks

Each patient is assigned a rank based on their outcome category.

In [None]:
def assign_door_ranks(data, hierarchy, outcome_col='outcome'):
    """
    Assign DOOR ranks to each patient based on the outcome hierarchy.
    
    Rank 1 = Best outcome, higher ranks = worse outcomes
    """
    # Create rank mapping
    rank_map = {outcome: rank + 1 for rank, outcome in enumerate(hierarchy)}
    
    # Assign ranks
    data = data.copy()
    data['door_rank'] = data[outcome_col].map(rank_map)
    
    return data

# Apply ranking
trial_data = assign_door_ranks(trial_data, outcome_hierarchy)

# Verify the ranking
print("Rank distribution by treatment:")
print(trial_data.groupby(['treatment', 'door_rank']).size().unstack(fill_value=0))
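
One thing to watch for: `Series.map` with a dictionary returns `NaN` for any value that is not a key, so an outcome label that is misspelled or missing from the hierarchy ends up with no rank. A minimal sketch of a pre-check, assuming a hypothetical helper `find_unranked_outcomes` and a toy three-level hierarchy that are not part of the original notebook:

```python
import pandas as pd

def find_unranked_outcomes(data, hierarchy, outcome_col="outcome"):
    """Return the outcome labels present in `data` but absent from `hierarchy`."""
    observed = set(data[outcome_col].dropna().unique())
    return sorted(observed - set(hierarchy))

# Hypothetical toy data: "death" (lowercase) does not match "Death"
toy_hierarchy = ["Alive, no event", "Alive, event", "Death"]
toy_data = pd.DataFrame({"outcome": ["Alive, no event", "death", "Death"]})

unranked = find_unranked_outcomes(toy_data, toy_hierarchy)
print(unranked)  # ['death']
```

Running a check like this before ranking makes labelling problems visible instead of letting them surface later as missing ranks.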

## Step 4: Perform Pairwise Comparisons

The core of DOOR analysis is comparing **every** patient in the treatment arm against **every** patient in the control arm.

For each pair:
- If treatment patient has **lower** (better) rank → Treatment wins
- If control patient has **lower** (better) rank → Control wins
- If ranks are **equal** → Tie
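
To make the three rules concrete, here is a toy example small enough to check by hand: three treatment patients against three control patients, giving nine pairs. The ranks are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical DOOR ranks (1 = best); not real or simulated trial data
treatment_ranks = [1, 2, 4]
control_ranks = [2, 3, 3]

wins = losses = ties = 0
for t, c in product(treatment_ranks, control_ranks):
    if t < c:      # treatment patient has the better (lower) rank
        wins += 1
    elif t > c:    # control patient has the better (lower) rank
        losses += 1
    else:          # same rank
        ties += 1

print(f"Treatment wins: {wins}, control wins: {losses}, ties: {ties}")
# Treatment wins: 5, control wins: 3, ties: 1
```

The treatment patient with rank 1 beats all three controls, the patient with rank 2 ties once and wins twice, and the patient with rank 4 loses all three comparisons.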

In [None]:
def door_pairwise_comparison(data, treatment_col='treatment', 
                              treatment_arm='Drug A', control_arm='Placebo'):
    """
    Perform DOOR pairwise comparison between treatment and control arms.
    """
    # Get ranks for each arm
    trt_ranks = data[data[treatment_col] == treatment_arm]['door_rank'].values
    ctrl_ranks = data[data[treatment_col] == control_arm]['door_rank'].values
    
    # Count outcomes
    n_trt, n_ctrl = len(trt_ranks), len(ctrl_ranks)
    n_pairs = n_trt * n_ctrl
    
    trt_wins = 0
    ctrl_wins = 0
    ties = 0
    
    # Pairwise comparison: loop over treatment patients and compare each one
    # against all control ranks at once (only the inner step is vectorized)
    for t in trt_ranks:
        trt_wins += np.sum(t < ctrl_ranks)  # Treatment has better rank
        ctrl_wins += np.sum(t > ctrl_ranks)  # Control has better rank
        ties += np.sum(t == ctrl_ranks)      # Same rank
    
    # Calculate metrics
    results = {
        'n_treatment': n_trt,
        'n_control': n_ctrl,
        'n_pairs': n_pairs,
        'treatment_wins': trt_wins,
        'control_wins': ctrl_wins,
        'ties': ties,
        'p_treatment_better': trt_wins / n_pairs,
        'p_control_better': ctrl_wins / n_pairs,
        'p_tie': ties / n_pairs,
        'win_ratio': trt_wins / ctrl_wins if ctrl_wins > 0 else np.inf,
        'net_benefit': (trt_wins - ctrl_wins) / n_pairs
    }
    
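    # Add a one-sided Mann-Whitney U test: alternative='less' asks whether
    # treatment ranks tend to be lower (better) than control ranks
    u_stat, p_value = stats.mannwhitneyu(trt_ranks, ctrl_ranks, alternative='less')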
    results['mann_whitney_u'] = u_stat
    results['p_value'] = p_value
    
    return results

# Run the analysis
results = door_pairwise_comparison(trial_data)

print("DOOR ANALYSIS RESULTS")
print("=" * 50)
for key, value in results.items():
    if isinstance(value, float):
        print(f"{key}: {value:.4f}")
    else:
        print(f"{key}: {value:,}")
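
The loop above makes `n_trt * n_ctrl` comparisons in total. Because DOOR ranks take only a handful of distinct values, the same win, loss, and tie counts can also be obtained from per-category tallies. The sketch below shows this in plain Python. The helper name `door_counts_from_categories` and the example ranks are hypothetical.

```python
from collections import Counter

def door_counts_from_categories(trt_ranks, ctrl_ranks):
    """Count wins, losses, and ties from per-rank tallies (rank 1 = best)."""
    trt_tally = Counter(trt_ranks)
    ctrl_tally = Counter(ctrl_ranks)
    wins = losses = ties = 0
    for t_rank, t_n in trt_tally.items():
        for c_rank, c_n in ctrl_tally.items():
            n_pairs = t_n * c_n  # every patient pair formed by these two categories
            if t_rank < c_rank:
                wins += n_pairs
            elif t_rank > c_rank:
                losses += n_pairs
            else:
                ties += n_pairs
    return wins, losses, ties

# Hypothetical ranks for illustration only
trt_example = [1, 1, 2, 3, 3, 5]
ctrl_example = [1, 2, 2, 4, 5, 5]
counts = door_counts_from_categories(trt_example, ctrl_example)
print(counts)
```

With K outcome categories the work after tallying is proportional to K squared rather than to the number of patient pairs, which matters mostly for very large trials.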

## Step 5: Interpret the Results

### Key Metrics Explained

| Metric | Interpretation |
|--------|---------------|
| **Win Ratio** | For every pair where control wins, how many pairs does treatment win? >1 favors treatment |
| **Net Benefit** | P(treatment better) - P(control better). Range: -1 to +1 |
| **p-value** | One-sided Mann-Whitney U test of whether treatment ranks tend to be lower (better) than control ranks |
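
The two effect measures treat ties differently: the win ratio ignores them, while the net benefit keeps them in the denominator. A short arithmetic sketch with hypothetical counts (chosen only to keep the numbers round) shows the consequence.

```python
# Hypothetical pairwise counts, for illustration only
wins, losses, ties = 120, 80, 50
n_pairs = wins + losses + ties

win_ratio = wins / losses                  # ties are ignored
net_benefit = (wins - losses) / n_pairs    # ties stay in the denominator

print(f"Win ratio:   {win_ratio:.2f}")     # 1.50
print(f"Net benefit: {net_benefit:.2f}")   # 0.16

# Doubling the ties leaves the win ratio unchanged but shrinks the net benefit
n_pairs_more_ties = wins + losses + 2 * ties
net_benefit_more_ties = (wins - losses) / n_pairs_more_ties
print(f"Net benefit with twice as many ties: {net_benefit_more_ties:.3f}")  # 0.133
```

For this reason it helps to report the proportion of ties alongside the win ratio, since a large win ratio can coexist with a small absolute difference when most pairs are tied.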

In [None]:
def interpret_door_results(results):
    """
    Generate interpretation of DOOR analysis results.
    """
    print("\n" + "=" * 60)
    print("INTERPRETATION")
    print("=" * 60)
    
    wr = results['win_ratio']
    nb = results['net_benefit']
    p = results['p_value']
    
    # Win ratio interpretation
    # NOTE: the 1.5 cut-off is an illustrative choice for this tutorial,
    # not an established clinical or regulatory threshold
    if wr > 1.5:
        print("\n✅ STRONG TREATMENT EFFECT")
        print(f"   Win ratio of {wr:.2f} indicates treatment patients are")
        print("   substantially more likely to have better outcomes.")
    elif wr > 1.0:
        print("\n🟡 MODERATE TREATMENT EFFECT")
        print(f"   Win ratio of {wr:.2f} suggests a treatment advantage.")
    elif wr == 1.0:
        print("\n⚪ NO DIFFERENCE")
        print("   Win ratio of 1.0 indicates equivalent outcomes.")
    else:
        print("\n🔴 CONTROL FAVORED")
        print(f"   Win ratio of {wr:.2f} suggests control arm is better.")
    
    # Statistical significance
    print("\n📊 STATISTICAL SIGNIFICANCE")
    if p < 0.001:
        print(f"   p = {p:.4f} - Highly significant (p < 0.001)")
    elif p < 0.01:
        print(f"   p = {p:.4f} - Very significant (p < 0.01)")
    elif p < 0.05:
        print(f"   p = {p:.4f} - Significant (p < 0.05)")
    else:
        print(f"   p = {p:.4f} - NOT statistically significant")
    
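    # Net benefit (difference in win proportions, as a share of all pairs)
    print(f"\n📈 NET BENEFIT: {nb:.1%}")
    if nb > 0:
        print(f"   Treatment won {abs(nb)*100:.1f} percentage points more of all")
        print("   pairwise comparisons than control.")
    elif nb < 0:
        print(f"   Control won {abs(nb)*100:.1f} percentage points more of all")
        print("   pairwise comparisons than treatment.")
    else:
        print("   Treatment and control won equally many pairwise comparisons.")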

interpret_door_results(results)
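
A p-value alone does not show how precisely the win ratio is estimated. One simple option is a percentile bootstrap that resamples patients within each arm. This is an assumption of the sketch below, not a method prescribed by the sources cited in this notebook. The function names `win_ratio` and `bootstrap_win_ratio_ci` and the toy ranks are hypothetical.

```python
import random

def win_ratio(trt_ranks, ctrl_ranks):
    """Treatment wins divided by control wins (rank 1 = best); ties are ignored."""
    wins = sum(t < c for t in trt_ranks for c in ctrl_ranks)
    losses = sum(t > c for t in trt_ranks for c in ctrl_ranks)
    return wins / losses if losses else float("inf")

def bootstrap_win_ratio_ci(trt_ranks, ctrl_ranks, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling patients within each arm."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        trt_sample = rng.choices(trt_ranks, k=len(trt_ranks))
        ctrl_sample = rng.choices(ctrl_ranks, k=len(ctrl_ranks))
        estimates.append(win_ratio(trt_sample, ctrl_sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy ranks with three categories, 40 patients per arm
toy_trt = [1] * 18 + [2] * 12 + [3] * 10
toy_ctrl = [1] * 12 + [2] * 12 + [3] * 16

point = win_ratio(toy_trt, toy_ctrl)
ci_low, ci_high = bootstrap_win_ratio_ci(toy_trt, toy_ctrl)
print(f"Win ratio: {point:.2f} (95% bootstrap interval: {ci_low:.2f} to {ci_high:.2f})")
```

An interval that comfortably excludes 1 supports the same conclusion as a small p-value, while a wide interval is a reminder that the point estimate is uncertain. For real analyses, the interval method should follow the statistical analysis plan.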

## Step 6: Visualize the Results

Visualization helps communicate DOOR results to stakeholders.

In [None]:
def plot_door_distribution(data, hierarchy, treatment_col='treatment'):
    """
    Create stacked bar chart of outcome distributions by treatment.
    """
    # Calculate percentages
    dist = data.groupby([treatment_col, 'outcome']).size().unstack(fill_value=0)
    dist = dist[[c for c in hierarchy if c in dist.columns]]
    pct = dist.div(dist.sum(axis=1), axis=0) * 100
    
    # Create plot
    fig, ax = plt.subplots(figsize=(12, 6))
    
    # Color gradient: green (best) to red (worst)
    n_cats = len(pct.columns)
    colors = plt.cm.RdYlGn_r(np.linspace(0.15, 0.85, n_cats))
    
    # Plot stacked horizontal bars
    bottom = np.zeros(len(pct))
    for col, color in zip(pct.columns, colors):
        ax.barh(pct.index, pct[col], left=bottom, label=col, color=color, 
                edgecolor='white', linewidth=0.5)
        bottom += pct[col].values
    
    ax.set_xlabel('Percentage of Patients (%)', fontsize=12)
    ax.set_title('DOOR Outcome Distribution by Treatment Arm', fontsize=14, fontweight='bold')
    ax.set_xlim(0, 100)
    ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=9)
    
    plt.tight_layout()
    return fig

# Create visualization
fig = plot_door_distribution(trial_data, outcome_hierarchy)
plt.show()

In [None]:
def plot_win_ratio_summary(results):
    """
    Create a pie chart of pairwise comparison results and a bar showing the win ratio.
    """
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Pie chart of wins/losses/ties
    sizes = [results['treatment_wins'], results['control_wins'], results['ties']]
    labels = ['Treatment Wins', 'Control Wins', 'Ties']
    colors = ['#2ecc71', '#e74c3c', '#95a5a6']
    explode = (0.05, 0, 0)
    
    axes[0].pie(sizes, explode=explode, labels=labels, colors=colors,
                autopct='%1.1f%%', startangle=90, shadow=True)
    axes[0].set_title('Pairwise Comparison Results', fontsize=12, fontweight='bold')
    
    # Win ratio visualization
    wr = results['win_ratio']
    axes[1].barh(['Win Ratio'], [wr], color='#3498db', height=0.5)
    axes[1].axvline(x=1.0, color='red', linestyle='--', label='No difference (WR=1)')
    axes[1].set_xlim(0, max(2, wr * 1.2))
    axes[1].set_title(f'Win Ratio: {wr:.2f}', fontsize=12, fontweight='bold')
    axes[1].legend()
    
    plt.tight_layout()
    return fig

fig = plot_win_ratio_summary(results)
plt.show()

## Summary

### What We Learned

1. **DOOR ranks outcomes hierarchically** without assuming numerical equivalence between categories
2. **Pairwise comparison** evaluates every treatment vs control patient pair
3. **Win ratio** provides an intuitive effect measure
4. **Visualization** helps communicate results to diverse stakeholders

### When to Use DOOR

✅ Composite endpoints with clinically meaningful hierarchy  
✅ When you don't want to assume numerical weights  
✅ When stakeholder interpretation is important  

### Limitations

⚠️ Requires consensus on outcome hierarchy  
⚠️ Naive pairwise comparison needs n_treatment × n_control comparisons, which is computationally intensive for very large trials  
⚠️ Doesn't capture within-category differences  

---

### Further Reading

- CIOMS Working Group XII Report
- Evans SR, et al. "DOOR/RADAR approach to composite endpoints" Clinical Trials 2016
- NexVigilant Benefit-Risk Intelligence Toolkit (companion materials)

---

**NexVigilant** | *Empowerment Through Vigilance*

This notebook is part of the [Benefit-Risk Intelligence Toolkit](https://github.com/nexvigilant/nv-BR-toolkit).