<a href="https://colab.research.google.com/github/gcosma/personalised_mltc/blob/main/InputOutputPred.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📊 PREDICTIVE OUTCOME ANALYSIS TOOL

## What This Tool Does
Calculates co-occurrence probabilities between medical conditions in patients with intellectual disability and multiple long-term conditions (MLTCs).

**Clinical Question:** "If my patient has Condition A, what's the probability they also have Condition B?"

---

## How It Works: Simple Example

**Patient has Diabetes**

From your uploaded dataset (e.g., Males aged <45 with intellectual disability):
- 280 have Diabetes
- 81 of these 280 also have Hypertension
- **Calculation:** 81 ÷ 280 × 100 = **28.9%**

**Answer:** 28.9% of Diabetes patients also have Hypertension

**Context:** Across all patients in this age/gender group, only 11.2% have Hypertension. So patients with Diabetes are **2.6× as likely** to have Hypertension as the group average (28.9 ÷ 11.2 ≈ 2.6).

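**Risk Level:** Odds Ratio = 8.57 → **HIGH RISK** (strong association, prioritize screening). The Odds Ratio is read from the `OddsRatio` column of your uploaded file; the tool does not recompute it.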
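
The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch using the example counts above; the variable names are illustrative and are not part of the tool.

```python
# Counts from the worked example (typed in by hand, not read from a file)
diabetes_count = 280        # patients with Diabetes
both_count = 81             # patients with Diabetes and Hypertension
group_average_pct = 11.2    # % of the whole age/gender group with Hypertension

# P(Hypertension | Diabetes), expressed as a percentage
probability_pct = both_count / diabetes_count * 100

# How many times the group average that probability is
ratio_to_group_average = probability_pct / group_average_pct

print(f"Probability: {probability_pct:.1f}%")
print(f"Ratio to group average: {ratio_to_group_average:.1f}x")
```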

---

## Understanding Your Results

| What You See | What It Means |
|--------------|---------------|
| **Probability (28.9%)** | % of patients with input condition who also have output condition |
| **Group Average (11.2%)** | % of all patients in this age/gender group with output condition |
| **Relative Risk (2.6×)** | Probability ÷ group average (28.9 ÷ 11.2 ≈ 2.6): how many times the group average |
| **HIGH/MODERATE/LOW/VERY LOW** | Risk level based on Odds Ratio: HIGH ≥5, MODERATE ≥3, LOW ≥2, VERY LOW <2 |
| **⭐⭐⭐ (Strong)** | Evidence quality, by number of patients with both conditions: ⭐⭐⭐ ≥100, ⭐⭐ 50-99, ⭐ 20-49, no stars below 20 |
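
The risk and evidence thresholds can be written as two small functions. This is a sketch that mirrors the cut-offs used in the tool's code further down; the function names `risk_level` and `evidence_stars` are illustrative and do not appear in the tool.

```python
def risk_level(odds_ratio):
    """Map an odds ratio to the risk label shown in the results table."""
    if odds_ratio >= 5:
        return "HIGH"
    if odds_ratio >= 3:
        return "MODERATE"
    if odds_ratio >= 2:
        return "LOW"
    return "VERY LOW"  # the tool's code uses this label for odds ratios below 2


def evidence_stars(pair_count):
    """Map the number of patients with both conditions to a star rating."""
    if pair_count >= 100:
        return "⭐⭐⭐"
    if pair_count >= 50:
        return "⭐⭐"
    if pair_count >= 20:
        return "⭐"
    return "-"  # fewer than 20 patients: the tool shows a dash instead of stars


print(risk_level(8.57), evidence_stars(81))
```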

---

## Important Notes

- ✅ Shows which conditions commonly occur together in your specific population group (age/gender)
- ❌ Does NOT predict individual outcomes with certainty
- ❌ Does NOT show causation or which came first

**Your dataset:** Results are specific to the age and gender group in your uploaded file (e.g., Males <45, Females 45-64, etc.). Probabilities may differ across different age/gender groups.

**Direction matters:** 28.9% of Diabetes patients have Hypertension, BUT only 21.4% of Hypertension patients have Diabetes (same 81 patients, different starting populations: 280 vs 378).
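
The asymmetry comes purely from the denominator. A minimal sketch with the counts quoted above (variable names are illustrative):

```python
both = 81            # patients with both Diabetes and Hypertension
diabetes = 280       # patients with Diabetes
hypertension = 378   # patients with Hypertension

# Same numerator, different denominators
p_hypertension_given_diabetes = both / diabetes * 100
p_diabetes_given_hypertension = both / hypertension * 100

print(f"P(Hypertension | Diabetes) = {p_hypertension_given_diabetes:.1f}%")
print(f"P(Diabetes | Hypertension) = {p_diabetes_given_hypertension:.1f}%")
```

Swapping the INPUT and OUTPUT selections in the tool therefore answers a different clinical question, even though the same pair of conditions is involved.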

---

## How to Use

1. Upload your preprocessed CSV file (specific to one age/gender group)
2. Select INPUT conditions (what the patient has)
3. Select OUTPUT conditions (what to predict)
4. Click "Run Analysis"

**Use results for:** Screening decisions, care planning, patient education **within the same age/gender group as your data**
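
Before uploading, it may help to confirm that the file has the columns the code below reads. The column names in this sketch are taken from that code; the helper `missing_columns` and the example header are hypothetical and not part of the tool.

```python
# Column names the analysis code reads from the uploaded CSV
REQUIRED_COLUMNS = [
    "ConditionA", "ConditionB",
    "ConditionA_Count", "ConditionB_Count",
    "PairFrequency", "TotalPatientsInGroup",
    "OddsRatio", "OddsRatio_CI_Low", "OddsRatio_CI_High",
    "MedianDurationYearsWithIQR",
]


def missing_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical header with two columns left out
example_header = [c for c in REQUIRED_COLUMNS if c not in ("OddsRatio", "PairFrequency")]
print(missing_columns(example_header))
```

With pandas, the same check could be run as `missing_columns(data.columns)` after loading the file.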

---

**Ready to begin? Upload your data file below.**

In [17]:
# @title Predictive Outcome Analysis Tool
# PREDICTIVE OUTCOME ANALYSIS
# Upload your preprocessed CSV file and analyse condition outcomes

import pandas as pd
import numpy as np
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
from ipywidgets import interact_manual  # only interact_manual is used below

# ============================================================================
# STEP 1: Upload and Load Data
# ============================================================================

print("📊 PREDICTIVE OUTCOME ANALYSIS TOOL")
print("=" * 60)
print("Upload your preprocessed CSV file (e.g., SAIL_Males_below45_preprocessed.csv)")
print()

from google.colab import files
uploaded = files.upload()

# Get the filename
filename = list(uploaded.keys())[0]
print(f"\n✅ Loaded: {filename}")

# Load the data
data = pd.read_csv(filename)

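# Also load individual condition prevalence if available
individual_conditions_file = filename.replace('_preprocessed.csv', '_preprocessed_individual_conditions.csv')
has_individual_data = False
# If the name is unchanged (e.g. Colab renamed the upload to "... (12).csv"),
# there is no separate file to look for; do not re-read the pairwise file.
if individual_conditions_file != filename:
    try:
        individual_data = pd.read_csv(individual_conditions_file)
        has_individual_data = True
        print(f"✅ Loaded individual conditions: {individual_conditions_file}")
    except (FileNotFoundError, pd.errors.EmptyDataError, pd.errors.ParserError):
        pass
if not has_individual_data:
    print("ℹ️  No individual conditions file found - will use pairwise data only")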

print(f"\n📈 Dataset Info:")
print(f"   • Total relationships: {len(data)}")
print(f"   • Total patients: {data['TotalPatientsInGroup'].iloc[0]:,}")
print(f"   • Unique conditions: {len(set(data['ConditionA']) | set(data['ConditionB']))}")

# ============================================================================
# STEP 2: Helper Functions
# ============================================================================

def get_all_conditions(data):
    """Get sorted list of all unique conditions"""
    conditions = set(data['ConditionA']) | set(data['ConditionB'])
    return sorted(list(conditions))

def get_condition_count(condition, data):
    """Get total number of patients with a condition"""
    # Try to find from ConditionA_Count or ConditionB_Count
    rows_a = data[data['ConditionA'] == condition]
    rows_b = data[data['ConditionB'] == condition]

    if not rows_a.empty:
        return rows_a['ConditionA_Count'].iloc[0]
    elif not rows_b.empty:
        return rows_b['ConditionB_Count'].iloc[0]
    else:
        return None

def get_baseline_prevalence(condition, data):
    """Calculate baseline prevalence of a condition"""
    total_patients = data['TotalPatientsInGroup'].iloc[0]
    condition_count = get_condition_count(condition, data)

    if condition_count:
        return (condition_count / total_patients) * 100
    return None

def calculate_conditional_probability(input_cond, output_cond, data):
    """Calculate P(output | input) using direct counts - treating relationships as bidirectional"""
    # Find the relationship
    relationship = data[
        ((data['ConditionA'] == input_cond) & (data['ConditionB'] == output_cond)) |
        ((data['ConditionA'] == output_cond) & (data['ConditionB'] == input_cond))
    ]

    if relationship.empty:
        return None, None, None, None, None

    row = relationship.iloc[0]
    pair_freq = row['PairFrequency']
    odds_ratio = row['OddsRatio']
    ci_low = row['OddsRatio_CI_Low']
    ci_high = row['OddsRatio_CI_High']

    # Get the count for the input condition (regardless of which column it's in)
    if row['ConditionA'] == input_cond:
        input_count = row['ConditionA_Count']
    else:
        input_count = row['ConditionB_Count']

    # Calculate conditional probability
    if input_count and input_count > 0:
        probability = (pair_freq / input_count) * 100
    else:
        probability = None

    # Get time to onset
    time_str = row['MedianDurationYearsWithIQR']

    return probability, odds_ratio, (ci_low, ci_high), pair_freq, time_str

def evidence_strength(sample_size):
    """Determine evidence strength based on sample size"""
    if sample_size is None or pd.isna(sample_size):
        return "No data", 0
    if sample_size >= 100:
        return "Strong", 3
    elif sample_size >= 50:
        return "Moderate", 2
    elif sample_size >= 20:
        return "Limited", 1
    else:
        return "Weak", 0

# ============================================================================
# STEP 3: Create Interactive Analysis Tool
# ============================================================================

def analyze_outcomes(input1, input2, input3, input4, input5,
                     output1, output2, output3, output4, output5):
    """Main analysis function"""

    # Clear previous output to prevent repetition
    clear_output(wait=True)

    # Collect inputs (filter out None/empty)
    input_conditions = [i for i in [input1, input2, input3, input4, input5]
                       if i and i != 'None' and i != '']
    output_conditions = [o for o in [output1, output2, output3, output4, output5]
                        if o and o != 'None' and o != '']

    if not input_conditions:
        print("⚠️  Please select at least one input condition")
        return

    if not output_conditions:
        print("⚠️  Please select at least one output condition")
        return

    # Display header with styling
    header_html = f"""
    <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 30px; border-radius: 10px; margin-bottom: 20px; color: white;">
        <h1 style="margin: 0; font-size: 2em;">📊 Predictive Outcome Analysis Results</h1>
        <p style="margin: 10px 0 0 0; opacity: 0.9;">Based on condition relationships in your dataset</p>
    </div>
    """
    display(HTML(header_html))

    # Input conditions summary
    input_html = f"""
    <div style="background-color: #f0f9ff; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 4px solid #3b82f6;">
        <h3 style="color: #1e40af; margin-top: 0;">📥 Input Conditions (Patient has):</h3>
        <ul style="list-style-type: none; padding-left: 0;">
    """
    for i, cond in enumerate(input_conditions, 1):
        input_html += f'<li style="padding: 5px 0;">✓ <strong>{cond}</strong></li>'
    input_html += "</ul></div>"
    display(HTML(input_html))

    # Output conditions summary
    output_html = f"""
    <div style="background-color: #fef3c7; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 4px solid #f59e0b;">
        <h3 style="color: #92400e; margin-top: 0;">📤 Output Conditions (Predicting):</h3>
        <ul style="list-style-type: none; padding-left: 0;">
    """
    for i, cond in enumerate(output_conditions, 1):
        output_html += f'<li style="padding: 5px 0;">→ <strong>{cond}</strong></li>'
    output_html += "</ul></div>"
    display(HTML(output_html))

    # Analyse each output condition
    for output_cond in output_conditions:
        # Outcome header
        outcome_html = f"""
        <div style="background-color: #f8f9fa; padding: 20px; border-radius: 8px; margin: 25px 0; border: 2px solid #dee2e6;">
            <h3 style="color: #495057; margin-top: 0;">🎯 Outcome: <span style="color: #0066cc;">{output_cond}</span></h3>
        """

        baseline = get_baseline_prevalence(output_cond, data)
        if baseline:
            outcome_html += f'<p style="margin: 5px 0; color: #6c757d;">Group average prevalence: <strong>{baseline:.1f}%</strong></p>'

        outcome_html += "</div>"
        display(HTML(outcome_html))

        # Create results table
        results = []

        for input_cond in input_conditions:
            prob, or_val, ci, sample, time_str = calculate_conditional_probability(
                input_cond, output_cond, data
            )

            if prob is not None:
                strength, stars = evidence_strength(sample)
                rel_risk = (prob / baseline) if baseline else None

                results.append({
                    'Input': input_cond,
                    'Probability': prob,
                    'Relative_Risk': rel_risk,
                    'OR': or_val,
                    'CI': ci,
                    'Sample': sample,
                    'Time': time_str,
                    'Stars': stars
                })

        if not results:
            no_data_html = """
            <div style="background-color: #fff3cd; padding: 15px; border-radius: 8px; border-left: 4px solid #ffc107; margin: 15px 0;">
                <p style="margin: 0; color: #856404;">ℹ️ No data available for any input conditions</p>
            </div>
            """
            display(HTML(no_data_html))
            continue

        # Sort by probability (highest first)
        results_sorted = sorted(results, key=lambda x: x['Probability'], reverse=True)

        # Create HTML table with styling similar to Streamlit app
        html_table = f"""
        <div style="background-color: #ffffff; padding: 15px; border-radius: 8px; margin: 15px 0; border: 1px solid #e2e8f0;">
            <h4 style="color: #2c3e50; margin-bottom: 15px;">Association Paths for {output_cond}</h4>
            <table style="width: 100%; border-collapse: collapse; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;">
                <thead>
                    <tr style="background-color: #f8f9fa; border-bottom: 2px solid #dee2e6;">
                        <th style="padding: 12px; text-align: left; border: 1px solid #ddd;">Input Condition</th>
                        <th style="padding: 12px; text-align: center; border: 1px solid #ddd;">Probability</th>
                        <th style="padding: 12px; text-align: center; border: 1px solid #ddd;">Risk Level</th>
                        <th style="padding: 12px; text-align: center; border: 1px solid #ddd;">Odds Ratio</th>
                        <th style="padding: 12px; text-align: center; border: 1px solid #ddd;">Typical Time</th>
                        <th style="padding: 12px; text-align: center; border: 1px solid #ddd;">Evidence</th>
                    </tr>
                </thead>
                <tbody>
        """

        for i, r in enumerate(results_sorted):
            # Determine risk colour based on Odds Ratio (not Relative Risk)
            or_val = r['OR']
            if or_val >= 5:
                risk_colour = '#dc3545'  # High risk - red
                risk_label = 'HIGH'
            elif or_val >= 3:
                risk_colour = '#ffc107'  # Moderate - amber
                risk_label = 'MODERATE'
            elif or_val >= 2:
                risk_colour = '#28a745'  # Low - green
                risk_label = 'LOW'
            else:
                risk_colour = '#6c757d'  # Very low - grey
                risk_label = 'VERY LOW'

            star_display = '⭐' * r['Stars'] if r['Stars'] > 0 else '—'
            marker = '👉 ' if i == 0 else ''

            ci_str = f"[{r['CI'][0]:.1f}–{r['CI'][1]:.1f}]"
            rel_str = f"{r['Relative_Risk']:.1f}×" if r['Relative_Risk'] else "—"

            row_style = "background-color: #f8f9fa;" if i == 0 else ""

            html_table += f"""
                    <tr style="{row_style}">
                        <td style="padding: 10px; border: 1px solid #ddd;">{marker}<strong>{r['Input']}</strong></td>
                        <td style="padding: 10px; border: 1px solid #ddd; text-align: center;">
                            <span style="font-size: 1.2em; font-weight: bold; color: #2c3e50;">{r['Probability']:.1f}%</span>
                        </td>
                        <td style="padding: 10px; border: 1px solid #ddd; text-align: center;">
                            <span style="padding: 4px 8px; border-radius: 4px; background-color: {risk_colour}; color: white; font-weight: bold; font-size: 0.85em;">
                                {risk_label}
                            </span>
                            <br><small style="color: #6c757d;">{rel_str} group average</small>
                        </td>
                        <td style="padding: 10px; border: 1px solid #ddd; text-align: center;">
                            <strong>{r['OR']:.1f}</strong><br>
                            <small style="color: #6c757d;">{ci_str}</small>
                        </td>
                        <td style="padding: 10px; border: 1px solid #ddd; text-align: center; font-style: italic; color: #666;">
                            {r['Time']}<br>
                            <small style="color: #6c757d;">median ± IQR</small>
                        </td>
                        <td style="padding: 10px; border: 1px solid #ddd; text-align: center;">
                            {star_display}<br>
                            <small style="color: #6c757d;">n={r['Sample']}</small>
                        </td>
                    </tr>
            """

        html_table += """
                </tbody>
            </table>
        </div>
        """

        display(HTML(html_table))

        # Summary box
        if results_sorted:
            strongest = results_sorted[0]
            # Relative risk is None when the group average is unavailable; format it safely
            strongest_rel = f"{strongest['Relative_Risk']:.1f}×" if strongest['Relative_Risk'] else "—"
            # Use the same four risk bands as the results table
            strongest_or = strongest['OR']
            strongest_risk = ("HIGH" if strongest_or >= 5 else "MODERATE" if strongest_or >= 3
                              else "LOW" if strongest_or >= 2 else "VERY LOW")
            summary_html = f"""
            <div style="background-color: #e8f4f8; padding: 15px; border-radius: 8px; border-left: 4px solid #0066cc; margin: 15px 0;">
                <h4 style="color: #0066cc; margin-top: 0;">💡 Key Finding</h4>
                <p style="margin: 5px 0; font-size: 1.05em;">
                    <strong>Strongest Association:</strong> {strongest['Input']} ↔ {output_cond}
                </p>
                <ul style="margin: 10px 0; padding-left: 20px;">
                    <li><strong>{strongest['Probability']:.1f}%</strong> of patients with {strongest['Input']} also have {output_cond}</li>
                    <li>This represents <strong>{strongest_rel} the group average</strong></li>
                    <li>Odds Ratio: <strong>{strongest['OR']:.1f}</strong> (Risk Level: <strong>{strongest_risk}</strong>)</li>
                    <li>Typical time between conditions: <strong>{strongest['Time']}</strong></li>
                    <li>Based on <strong>{strongest['Sample']} patients</strong></li>
                </ul>
                <p style="margin: 10px 0 5px 0; padding: 10px; background-color: #fff; border-radius: 4px; font-size: 0.9em; color: #666;">
                    ℹ️ <strong>Note:</strong> Risk levels based on Odds Ratio (High: OR≥5, Moderate: OR≥3, Low: OR≥2). These associations show co-occurrence within this age/gender group.
                </p>
            </div>
            """
            display(HTML(summary_html))

    completion_html = """
    <div style="background: linear-gradient(135deg, #11998e 0%, #38ef7d 100%); padding: 20px; border-radius: 10px; margin: 25px 0; color: white; text-align: center;">
        <h3 style="margin: 0;">✅ Analysis Complete</h3>
        <p style="margin: 10px 0 0 0; opacity: 0.9;">Review the results above for co-occurrence predictions</p>
    </div>
    """
    display(HTML(completion_html))

# ============================================================================
# STEP 4: Create Interactive UI
# ============================================================================

all_conditions = get_all_conditions(data)
condition_options = ['None'] + all_conditions

print("\n\n" + "="*80)
print("🔬 INTERACTIVE ANALYSIS TOOL")
print("="*80)
print("Select input and output conditions below, then click 'Run Analysis'")
print()

# Create dropdowns
dropdown_style = {'description_width': '150px'}

def make_dropdown(description):
    """Build one condition dropdown; all ten share the same options and layout."""
    return widgets.Dropdown(
        options=condition_options,
        value='None',
        description=description,
        style=dropdown_style,
        layout=widgets.Layout(width='500px')
    )

# Five input and five output dropdowns, named as the UI code below expects
input1_widget, input2_widget, input3_widget, input4_widget, input5_widget = [
    make_dropdown(f'Input Condition {n}:') for n in range(1, 6)
]
output1_widget, output2_widget, output3_widget, output4_widget, output5_widget = [
    make_dropdown(f'Output Condition {n}:') for n in range(1, 6)
]

# Display UI
display(HTML("<h3>📥 INPUT CONDITIONS (Patient Currently Has):</h3>"))
display(input1_widget)
display(input2_widget)
display(input3_widget)
display(input4_widget)
display(input5_widget)

display(HTML("<h3 style='margin-top: 30px;'>📤 OUTPUT CONDITIONS (To Predict):</h3>"))
display(output1_widget)
display(output2_widget)
display(output3_widget)
display(output4_widget)
display(output5_widget)

display(HTML("<h3 style='margin-top: 30px;'>▶️ Run Analysis:</h3>"))

# Create interactive analysis
interact_manual(
    analyze_outcomes,
    input1=input1_widget,
    input2=input2_widget,
    input3=input3_widget,
    input4=input4_widget,
    input5=input5_widget,
    output1=output1_widget,
    output2=output2_widget,
    output3=output3_widget,
    output4=output4_widget,
    output5=output5_widget
)

print("\n✅ Setup complete! Use the dropdowns above to select conditions.")

📊 PREDICTIVE OUTCOME ANALYSIS TOOL
Upload your preprocessed CSV file (e.g., SAIL_Males_below45_preprocessed.csv)



Saving CPRD_Females_45to64_preprocessed.csv to CPRD_Females_45to64_preprocessed (12).csv

✅ Loaded: CPRD_Females_45to64_preprocessed (12).csv
✅ Loaded individual conditions: CPRD_Females_45to64_preprocessed (12).csv

📈 Dataset Info:
   • Total relationships: 97
   • Total patients: 3,494
   • Unique conditions: 24


🔬 INTERACTIVE ANALYSIS TOOL
Select input and output conditions below, then click 'Run Analysis'





✅ Setup complete! Use the dropdowns above to select conditions.
