# Survival Analysis: METABRIC Breast Cancer Dataset
## Comparing Drug Treatment Outcomes Over Time

**Objective**: Compare survival outcomes between different drug treatments in breast cancer patients.

**Dataset**: METABRIC (Molecular Taxonomy of Breast Cancer International Consortium)
- Available on cBioPortal: https://www.cbioportal.org/study/summary?id=brca_metabric
- Also available through R package or Kaggle

**Survival Analysis Setup**:
- **Time variable**: Overall survival time (months)
- **Event**: Death (1 = died, 0 = censored/alive)
- **Primary comparisons**: Chemotherapy vs no chemotherapy, hormone therapy vs no hormone therapy, and the most common treatment combinations
- **Covariates**: Age, tumor characteristics, molecular subtypes, stage

In [None]:
# Install required packages
%pip install lifelines pandas numpy matplotlib seaborn scikit-learn

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from lifelines import KaplanMeierFitter, CoxPHFitter, WeibullAFTFitter
from lifelines.statistics import logrank_test, multivariate_logrank_test, pairwise_logrank_test
from lifelines.utils import median_survival_times
from lifelines.plotting import plot_lifetimes
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
%matplotlib inline

## 1. Data Loading and Exploration

In [None]:
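# Load the METABRIC clinical data (tab-separated)
# Download from cBioPortal
# File: brca_metabric_clinical_data.tsv
# Note: the cells below expect upper-case column names such as OS_MONTHS and
# OS_STATUS; if your export uses different headers, rename them after loading.
df = pd.read_csv('brca_metabric_clinical_data.tsv', sep='\t')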

print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()[:20]}...")  # Show first 20 columns
print(f"\nTotal patients: {len(df)}")

In [None]:
# Key columns for survival analysis
key_cols = ['PATIENT_ID', 'AGE_AT_DIAGNOSIS', 'OS_STATUS', 'OS_MONTHS',
           'CANCER_TYPE', 'CANCER_TYPE_DETAILED', 'TUMOR_SIZE', 'TUMOR_STAGE',
           'LYMPH_NODES_EXAMINED_POSITIVE', 'NPI', 'CELLULARITY',
           'CHEMOTHERAPY', 'HORMONE_THERAPY', 'RADIO_THERAPY',
           'ER_STATUS', 'HER2_STATUS', 'PR_STATUS', 'THREEGENE',
           'CLAUDIN_SUBTYPE', 'INTCLUST']

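# Filter to available columns, reporting any that are absent from this export
available_cols = [col for col in key_cols if col in df.columns]
missing_cols = [col for col in key_cols if col not in df.columns]
df_analysis = df[available_cols].copy()

print(f"Selected columns: {len(available_cols)}")
print(f"Available columns: {available_cols}")
if missing_cols:
    print(f"Columns not found (skipped): {missing_cols}")
print(f"\nMissing values:")
missing_counts = df_analysis.isnull().sum()
print(missing_counts[missing_counts > 0])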

In [None]:
# Explore treatment distributions
print("Treatment distributions:")
print(f"\nChemotherapy:")
print(df_analysis['CHEMOTHERAPY'].value_counts())
print(f"\nHormone Therapy:")
print(df_analysis['HORMONE_THERAPY'].value_counts())
print(f"\nRadio Therapy:")
print(df_analysis['RADIO_THERAPY'].value_counts())

## 2. Data Preprocessing for Survival Analysis

In [None]:
# Create survival dataset
survival_df = df_analysis.copy()

# Handle survival time and status
# OS_MONTHS: overall survival time in months
# OS_STATUS: '1:DECEASED' (event) or '0:LIVING' (censored)
survival_df['duration'] = pd.to_numeric(survival_df['OS_MONTHS'], errors='coerce')
survival_df['event'] = survival_df['OS_STATUS'].map({'1:DECEASED': 1, '0:LIVING': 0})

# Drop rows with a missing or unrecognized time/status, then store the event as an integer
survival_df = survival_df.dropna(subset=['duration', 'event'])
survival_df['event'] = survival_df['event'].astype(int)

# Handle zero or negative duration
survival_df = survival_df[survival_df['duration'] > 0]

print(f"Survival dataset shape: {survival_df.shape}")
print(f"\nSurvival time statistics (months):")
print(survival_df['duration'].describe())
print(f"\nEvents (deaths): {survival_df['event'].sum()} ({survival_df['event'].mean():.1%})")
print(f"Censored (alive): {(1 - survival_df['event']).sum()} ({(1 - survival_df['event']).mean():.1%})")

In [None]:
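# Clean treatment variables
# Treatment flags may be coded as 'YES'/'NO' strings or as 1/0 depending on the
# data source, so map both encodings; any other value becomes NaN
treatment_map = {1: 'Yes', 0: 'No', 'YES': 'Yes', 'NO': 'No', 'Yes': 'Yes', 'No': 'No'}
survival_df['chemo'] = survival_df['CHEMOTHERAPY'].map(treatment_map)
survival_df['hormone'] = survival_df['HORMONE_THERAPY'].map(treatment_map)
survival_df['radio'] = survival_df['RADIO_THERAPY'].map(treatment_map)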

# Create treatment combination variable
def get_treatment_combo(row):
    treatments = []
    if row['chemo'] == 'Yes':
        treatments.append('Chemo')
    if row['hormone'] == 'Yes':
        treatments.append('Hormone')
    if row['radio'] == 'Yes':
        treatments.append('Radio')
    
    if not treatments:
        return 'None'
    return '+'.join(treatments)

survival_df['treatment_combo'] = survival_df.apply(get_treatment_combo, axis=1)

print("\nTreatment combinations:")
print(survival_df['treatment_combo'].value_counts())

In [None]:
# Visualize survival time distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Overall survival time
axes[0].hist(survival_df['duration'], bins=50, edgecolor='black', alpha=0.7)
axes[0].set_xlabel('Survival Time (months)', fontsize=12)
axes[0].set_ylabel('Frequency', fontsize=12)
axes[0].set_title('Distribution of Survival Time', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Survival time by event status
deceased = survival_df[survival_df['event'] == 1]['duration']
alive = survival_df[survival_df['event'] == 0]['duration']

axes[1].hist([deceased, alive], bins=50, label=['Deceased', 'Alive'], 
            edgecolor='black', alpha=0.7)
axes[1].set_xlabel('Survival Time (months)', fontsize=12)
axes[1].set_ylabel('Frequency', fontsize=12)
axes[1].set_title('Survival Time by Status', fontsize=14, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3. Kaplan-Meier Analysis: Overall Survival

In [None]:
# Overall survival curve
kmf = KaplanMeierFitter()
kmf.fit(survival_df['duration'], survival_df['event'], label='All Patients')

fig, ax = plt.subplots(figsize=(12, 6))
kmf.plot_survival_function(ax=ax, ci_show=True)
plt.title('Kaplan-Meier Survival Curve: METABRIC Breast Cancer Patients', 
         fontsize=14, fontweight='bold')
plt.xlabel('Time (months)', fontsize=12)
plt.ylabel('Survival Probability', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Median survival time: {kmf.median_survival_time_:.1f} months")
print(f"\nSurvival probabilities:")
for years in [1, 3, 5, 10]:
    months = years * 12
    prob = kmf.predict(months)
    print(f"  {years}-year survival: {prob:.1%}")
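
For intuition about what `KaplanMeierFitter` estimates, the product-limit formula can be written out by hand. The sketch below is a minimal illustration on made-up toy data (the names `kaplan_meier`, `toy_durations`, and `toy_events` are hypothetical and not part of the notebook or of lifelines): at each distinct event time, the running survival probability is multiplied by the fraction of at-risk patients who survive that time. Censored patients count as at risk until their censoring time but never trigger a drop.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Patients still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Observed deaths exactly at time t (censored rows have event == 0)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Toy data (assumed for illustration): months of follow-up and event flags
toy_durations = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]

km_curve = kaplan_meier(toy_durations, toy_events)
for t, s in km_curve:
    print(f"t={t:>2} months: S(t)={s:.2f}")
```

The curve only steps down at months 5, 8, and 12; the censored observations at months 8 and 15 shrink the at-risk set without changing the estimate at that moment.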

## 4. Treatment Comparison: Chemotherapy vs No Chemotherapy

In [None]:
# Compare chemotherapy vs no chemotherapy
fig, ax = plt.subplots(figsize=(12, 6))

chemo_groups = survival_df['chemo'].dropna().unique()
for group in chemo_groups:
    mask = survival_df['chemo'] == group
    kmf_chemo = KaplanMeierFitter()
    kmf_chemo.fit(survival_df[mask]['duration'], 
                  survival_df[mask]['event'], 
                  label=f'Chemotherapy: {group}')
    kmf_chemo.plot_survival_function(ax=ax, ci_show=True)

plt.title('Survival Curves: Chemotherapy vs No Chemotherapy', fontsize=14, fontweight='bold')
plt.xlabel('Time (months)', fontsize=12)
plt.ylabel('Survival Probability', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Log-rank test
chemo_yes = survival_df[survival_df['chemo'] == 'Yes']
chemo_no = survival_df[survival_df['chemo'] == 'No']

result = logrank_test(
    chemo_yes['duration'], chemo_no['duration'],
    chemo_yes['event'], chemo_no['event']
)

print(f"\nLog-rank test (Chemotherapy):")
print(f"Test statistic: {result.test_statistic:.4f}")
print(f"p-value: {result.p_value:.4f}")
print(f"Significant difference: {'Yes' if result.p_value < 0.05 else 'No'}")
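
To show what the log-rank statistic measures, here is a minimal from-scratch sketch for two groups. It is an illustration under stated assumptions, not lifelines' implementation: the function name `logrank_two_groups` and the toy data are hypothetical. At each event time, the observed deaths in the first group are compared with the number expected if both groups shared one hazard; the squared total difference, divided by its hypergeometric variance, is referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def logrank_two_groups(durations_a, events_a, durations_b, events_b):
    """Return (chi-square statistic, p-value) for a two-group log-rank test."""
    pooled = (
        [(d, e, 0) for d, e in zip(durations_a, events_a)]
        + [(d, e, 1) for d, e in zip(durations_b, events_b)]
    )
    event_times = sorted({d for d, e, _ in pooled if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d, _, g in pooled if d >= t and g == 0)  # at risk, group A
        n_b = sum(1 for d, _, g in pooled if d >= t and g == 1)  # at risk, group B
        d_a = sum(1 for d, e, g in pooled if d == t and e == 1 and g == 0)
        d_b = sum(1 for d, e, g in pooled if d == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # the variance term is undefined when one patient remains
            variance += n_a * n_b * d * (n - d) / (n ** 2 * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    statistic = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Toy data (assumed for illustration): months and event flags per group
treated_t, treated_e = [6, 13, 21, 30, 37], [1, 1, 0, 1, 0]
control_t, control_e = [3, 5, 9, 12, 18], [1, 1, 1, 0, 1]

stat, p = logrank_two_groups(treated_t, treated_e, control_t, control_e)
print(f"Test statistic: {stat:.4f}, p-value: {p:.4f}")
```

A small p-value says only that the two curves differ; in an observational cohort such as this one it does not by itself show that the treatment caused the difference.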

## 5. Treatment Comparison: Hormone Therapy vs No Hormone Therapy

In [None]:
# Compare hormone therapy vs no hormone therapy
fig, ax = plt.subplots(figsize=(12, 6))

hormone_groups = survival_df['hormone'].dropna().unique()
for group in hormone_groups:
    mask = survival_df['hormone'] == group
    kmf_hormone = KaplanMeierFitter()
    kmf_hormone.fit(survival_df[mask]['duration'], 
                    survival_df[mask]['event'], 
                    label=f'Hormone Therapy: {group}')
    kmf_hormone.plot_survival_function(ax=ax, ci_show=True)

plt.title('Survival Curves: Hormone Therapy vs No Hormone Therapy', fontsize=14, fontweight='bold')
plt.xlabel('Time (months)', fontsize=12)
plt.ylabel('Survival Probability', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Log-rank test
hormone_yes = survival_df[survival_df['hormone'] == 'Yes']
hormone_no = survival_df[survival_df['hormone'] == 'No']

result_hormone = logrank_test(
    hormone_yes['duration'], hormone_no['duration'],
    hormone_yes['event'], hormone_no['event']
)

print(f"\nLog-rank test (Hormone Therapy):")
print(f"Test statistic: {result_hormone.test_statistic:.4f}")
print(f"p-value: {result_hormone.p_value:.4f}")
print(f"Significant difference: {'Yes' if result_hormone.p_value < 0.05 else 'No'}")

## 6. Treatment Combination Analysis

In [None]:
# Compare major treatment combinations
fig, ax = plt.subplots(figsize=(14, 7))

# Select most common treatment combinations
top_combos = survival_df['treatment_combo'].value_counts().head(6).index

for combo in top_combos:
    mask = survival_df['treatment_combo'] == combo
    if mask.sum() > 10:  # Only plot if sufficient sample size
        kmf_combo = KaplanMeierFitter()
        kmf_combo.fit(survival_df[mask]['duration'], 
                      survival_df[mask]['event'], 
                      label=f'{combo} (n={mask.sum()})')
        kmf_combo.plot_survival_function(ax=ax, ci_show=False)

plt.title('Survival Curves by Treatment Combination', fontsize=14, fontweight='bold')
plt.xlabel('Time (months)', fontsize=12)
plt.ylabel('Survival Probability', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Pairwise log-rank tests for top combinations
print("\nPairwise Log-Rank Tests for Treatment Combinations:")
combo_data = survival_df[survival_df['treatment_combo'].isin(top_combos)]
result_combos = pairwise_logrank_test(
    combo_data['duration'],
    combo_data['treatment_combo'],
    combo_data['event']
)
print(result_combos.summary)
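
With up to six treatment combinations there are up to 15 pairwise comparisons, so some unadjusted p-values below 0.05 are expected by chance alone. One common remedy is the Holm step-down adjustment, sketched below in plain Python. The function name `holm_adjust` and the p-values are hypothetical placeholders, not results from this dataset.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)
for raw, adj in zip(raw_p, adjusted_p):
    print(f"raw p={raw:.3f} -> Holm-adjusted p={adj:.3f}")
```

In this notebook the raw values would presumably come from the `p` column of `result_combos.summary`; that usage is an assumption about how the sketch would be wired in, not something the cell above does.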

## 7. Stratified Analysis: By ER Status

In [None]:
# Analyze treatment effects stratified by ER status
if 'ER_STATUS' in survival_df.columns:
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # ER Positive patients
    er_pos = survival_df[survival_df['ER_STATUS'] == 'Positive']
    for group in ['Yes', 'No']:
        mask = er_pos['hormone'] == group
        if mask.sum() > 10:
            kmf_er = KaplanMeierFitter()
            kmf_er.fit(er_pos[mask]['duration'], 
                      er_pos[mask]['event'], 
                      label=f'Hormone: {group}')
            kmf_er.plot_survival_function(ax=axes[0], ci_show=False)
    
    axes[0].set_title('ER Positive: Hormone Therapy Effect', fontsize=14, fontweight='bold')
    axes[0].set_xlabel('Time (months)', fontsize=12)
    axes[0].set_ylabel('Survival Probability', fontsize=12)
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # ER Negative patients
    er_neg = survival_df[survival_df['ER_STATUS'] == 'Negative']
    for group in ['Yes', 'No']:
        mask = er_neg['chemo'] == group
        if mask.sum() > 10:
            kmf_er = KaplanMeierFitter()
            kmf_er.fit(er_neg[mask]['duration'], 
                      er_neg[mask]['event'], 
                      label=f'Chemo: {group}')
            kmf_er.plot_survival_function(ax=axes[1], ci_show=False)
    
    axes[1].set_title('ER Negative: Chemotherapy Effect', fontsize=14, fontweight='bold')
    axes[1].set_xlabel('Time (months)', fontsize=12)
    axes[1].set_ylabel('Survival Probability', fontsize=12)
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## 8. Cox Proportional Hazards Model

In [None]:
# Prepare features for Cox model
numeric_features = ['AGE_AT_DIAGNOSIS', 'TUMOR_SIZE', 'LYMPH_NODES_EXAMINED_POSITIVE', 'NPI']
categorical_features = ['TUMOR_STAGE', 'ER_STATUS', 'HER2_STATUS', 'PR_STATUS', 
                       'chemo', 'hormone', 'radio']

# Select available features
available_numeric = [f for f in numeric_features if f in survival_df.columns]
available_categorical = [f for f in categorical_features if f in survival_df.columns]

cox_features = available_numeric + available_categorical

# Create Cox dataset
cox_df = survival_df[cox_features + ['duration', 'event']].copy()

# Handle missing values for numeric features
for col in available_numeric:
    cox_df[col] = pd.to_numeric(cox_df[col], errors='coerce')
    cox_df[col] = cox_df[col].fillna(cox_df[col].median())

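# Drop rows with missing categorical values first: get_dummies would otherwise
# encode NaN as all zeros, silently assigning those patients to the reference level
cox_df = cox_df.dropna(subset=available_categorical)

# Encode categorical variables as 0/1 floats (recent pandas versions default to bool)
cox_df_encoded = pd.get_dummies(cox_df, columns=available_categorical,
                                drop_first=True, dtype=float)

# Remove remaining missing values
cox_df_encoded = cox_df_encoded.dropna()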

print(f"Cox dataset shape: {cox_df_encoded.shape}")
print(f"Features: {cox_df_encoded.shape[1] - 2}")

In [None]:
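# Fit Cox model
# penalizer=0.1 adds a ridge (L2) penalty: it stabilizes the fit when covariates
# are correlated, at the cost of shrinking coefficients toward zero
cph = CoxPHFitter(penalizer=0.1)
cph.fit(cox_df_encoded, duration_col='duration', event_col='event')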

print("Cox Proportional Hazards Model Summary:")
print(f"Concordance Index: {cph.concordance_index_:.4f}")
print(f"Log-likelihood: {cph.log_likelihood_:.4f}")
print(f"AIC: {cph.AIC_:.4f}")
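
The concordance index reported above summarizes how well the model ranks patients: among pairs where it is known who died first, it is the fraction in which the patient who died earlier was assigned the higher risk score. A minimal sketch of Harrell's C on made-up data follows. The function name `concordance_index_sketch` and the toy values are hypothetical, and pairs with tied times are simply skipped here, which may differ from how lifelines treats ties.

```python
def concordance_index_sketch(durations, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed death
            # strictly before patient j's last follow-up time
            if events[i] == 1 and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("No comparable pairs; the index is undefined.")
    return concordant / comparable


# Toy data (assumed for illustration)
toy_durations = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect_risk = [0.9, 0.7, 0.5, 0.1]   # higher risk for earlier deaths
reversed_risk = [0.1, 0.5, 0.7, 0.9]  # ranking exactly backwards

c_perfect = concordance_index_sketch(toy_durations, toy_events, perfect_risk)
c_reversed = concordance_index_sketch(toy_durations, toy_events, reversed_risk)
print(f"Perfect ranking: C={c_perfect:.2f}, reversed ranking: C={c_reversed:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Because the index here is computed on the same patients used to fit the model, it tends to be optimistic.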

In [None]:
# Display model results
summary = cph.summary
summary['hazard_ratio'] = np.exp(summary['coef'])
summary_sorted = summary.sort_values('p', ascending=True)

print("\nSignificant Factors (p < 0.05):")
significant = summary_sorted[summary_sorted['p'] < 0.05]
print(significant[['coef', 'hazard_ratio', 'p']].to_string())

In [None]:
# Visualize treatment effects from Cox model
treatment_effects = summary[summary.index.str.contains('chemo|hormone|radio')]

if len(treatment_effects) > 0:
    fig, ax = plt.subplots(figsize=(10, 6))
    
    y_pos = np.arange(len(treatment_effects))
    hazard_ratios = treatment_effects['hazard_ratio'].values
    labels = treatment_effects.index
    
    colors = ['red' if hr > 1 else 'green' for hr in hazard_ratios]
    ax.barh(y_pos, hazard_ratios - 1, color=colors, alpha=0.6, edgecolor='black')
    ax.axvline(0, color='black', linestyle='--', linewidth=2)
    ax.set_yticks(y_pos)
    ax.set_yticklabels(labels)
    ax.set_xlabel('Hazard Ratio - 1', fontsize=12)
    ax.set_title('Treatment Effects on Mortality Risk\n(Red: Increases risk, Green: Decreases risk)', 
                fontsize=14, fontweight='bold')
    plt.grid(True, alpha=0.3, axis='x')
    plt.tight_layout()
    plt.show()

## 9. Visualize All Prognostic Factors

In [None]:
# Forest plot of hazard ratios
fig, ax = plt.subplots(figsize=(10, 12))

y_pos = np.arange(len(significant))
hazard_ratios = significant['hazard_ratio'].values
ci_lower = np.exp(significant['coef'] - 1.96 * significant['se(coef)'])
ci_upper = np.exp(significant['coef'] + 1.96 * significant['se(coef)'])
labels = [label[:50] for label in significant.index]

# Plot points and error bars
colors = ['red' if hr > 1 else 'green' for hr in hazard_ratios]
ax.scatter(hazard_ratios, y_pos, c=colors, s=100, alpha=0.6, edgecolors='black', zorder=3)

for i, (hr, lower, upper) in enumerate(zip(hazard_ratios, ci_lower, ci_upper)):
    ax.plot([lower, upper], [i, i], 'k-', linewidth=1.5, zorder=2)

ax.axvline(1, color='black', linestyle='--', linewidth=2, zorder=1)
ax.set_yticks(y_pos)
ax.set_yticklabels(labels, fontsize=9)
ax.set_xlabel('Hazard Ratio (95% CI)', fontsize=12)
ax.set_title('Forest Plot: Prognostic Factors for Breast Cancer Survival', 
            fontsize=14, fontweight='bold')
ax.set_xscale('log')
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
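
The interval drawn for each factor in the forest plot is a Wald interval: it is computed on the log-hazard scale as coef ± 1.96 × se(coef) and then exponentiated, which is why the bars look asymmetric around the hazard ratio on a linear axis. The arithmetic for a single covariate is sketched below. The helper name `hazard_ratio_ci` and the coefficient and standard error are made-up values for illustration.

```python
import math


def hazard_ratio_ci(coef, se, z=1.96):
    """Return (hazard ratio, lower bound, upper bound) for a Wald interval."""
    return (
        math.exp(coef),
        math.exp(coef - z * se),
        math.exp(coef + z * se),
    )


# Hypothetical Cox coefficient and standard error for one covariate
hr, lower, upper = hazard_ratio_ci(coef=-0.30, se=0.10)
excludes_one = upper < 1 or lower > 1

print(f"HR={hr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Interval excludes 1: {excludes_one}")
```

An interval that excludes 1 corresponds to p < 0.05 for that coefficient under the same normal approximation.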

## 10. Risk Stratification by Treatment

In [None]:
# Calculate risk scores
risk_scores = cph.predict_partial_hazard(cox_df_encoded)

# Create risk groups
risk_terciles = np.percentile(risk_scores, [33, 67])
risk_groups = pd.cut(risk_scores, 
                     bins=[0, risk_terciles[0], risk_terciles[1], np.inf],
                     labels=['Low Risk', 'Medium Risk', 'High Risk'])

cox_df_encoded['risk_group'] = risk_groups

# Plot survival by risk group
fig, ax = plt.subplots(figsize=(12, 6))

for group in ['Low Risk', 'Medium Risk', 'High Risk']:
    mask = cox_df_encoded['risk_group'] == group
    kmf_risk = KaplanMeierFitter()
    kmf_risk.fit(cox_df_encoded[mask]['duration'], 
                 cox_df_encoded[mask]['event'], 
                 label=group)
    kmf_risk.plot_survival_function(ax=ax, ci_show=False)

plt.title('Survival Curves by Risk Stratification', fontsize=14, fontweight='bold')
plt.xlabel('Time (months)', fontsize=12)
plt.ylabel('Survival Probability', fontsize=12)
plt.legend(title='Risk Group')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nRisk Group Distribution:")
print(cox_df_encoded['risk_group'].value_counts())

print("\nMortality rates by risk group:")
mortality_by_risk = cox_df_encoded.groupby('risk_group')['event'].agg(['mean', 'count'])
print(mortality_by_risk)

## 11. Treatment Effect by Risk Group

In [None]:
# Analyze treatment effect in different risk groups
# Link back to the original treatment labels via the shared row index
merged_data = cox_df_encoded.join(survival_df[['chemo', 'hormone']], how='left')

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# High-risk patients: Chemotherapy effect
high_risk = merged_data[merged_data['risk_group'] == 'High Risk']
for chemo_status in ['Yes', 'No']:
    mask = high_risk['chemo'] == chemo_status
    if mask.sum() > 10:
        kmf_hr = KaplanMeierFitter()
        kmf_hr.fit(high_risk[mask]['duration'], 
                  high_risk[mask]['event'], 
                  label=f'Chemo: {chemo_status}')
        kmf_hr.plot_survival_function(ax=axes[0], ci_show=False)

axes[0].set_title('High-Risk Patients: Chemotherapy Effect', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Time (months)', fontsize=12)
axes[0].set_ylabel('Survival Probability', fontsize=12)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Medium/High-risk patients: Hormone therapy effect
med_high = merged_data[merged_data['risk_group'].isin(['Medium Risk', 'High Risk'])]
for hormone_status in ['Yes', 'No']:
    mask = med_high['hormone'] == hormone_status
    if mask.sum() > 10:
        kmf_mh = KaplanMeierFitter()
        kmf_mh.fit(med_high[mask]['duration'], 
                   med_high[mask]['event'], 
                   label=f'Hormone: {hormone_status}')
        kmf_mh.plot_survival_function(ax=axes[1], ci_show=False)

axes[1].set_title('Medium/High-Risk Patients: Hormone Therapy Effect', 
                 fontsize=14, fontweight='bold')
axes[1].set_xlabel('Time (months)', fontsize=12)
axes[1].set_ylabel('Survival Probability', fontsize=12)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 12. Key Findings and Clinical Recommendations

In [None]:
print("=" * 80)
print("KEY FINDINGS: METABRIC BREAST CANCER TREATMENT ANALYSIS")
print("=" * 80)

print(f"\n1. Overall Survival Statistics:")
print(f"   - Total patients: {len(survival_df)}")
print(f"   - Median survival: {kmf.median_survival_time_:.1f} months")
print(f"   - 5-year survival: {kmf.predict(60):.1%}")
print(f"   - 10-year survival: {kmf.predict(120):.1%}")

print(f"\n2. Treatment Effects (Adjusted for Measured Covariates):")
if len(treatment_effects) > 0:
    for idx, (name, row) in enumerate(treatment_effects.iterrows(), 1):
        hr = row['hazard_ratio']
        p = row['p']
        effect = "protective" if hr < 1 else "increases risk"
        sig = "***" if p < 0.001 else "**" if p < 0.01 else "*" if p < 0.05 else "n.s."
        print(f"   {idx}. {name}: HR={hr:.3f} ({effect}) {sig}")

print(f"\n3. Log-Rank Test Results:")
print(f"   - Chemotherapy: p={result.p_value:.4f} ({'Significant' if result.p_value < 0.05 else 'Not significant'})")
print(f"   - Hormone Therapy: p={result_hormone.p_value:.4f} ({'Significant' if result_hormone.p_value < 0.05 else 'Not significant'})")

print(f"\n4. Top Prognostic Factors:")
for idx, (factor, row) in enumerate(significant.head(5).iterrows(), 1):
    print(f"   {idx}. {factor}: HR={row['hazard_ratio']:.3f} (p={row['p']:.4f})")

print(f"\n5. Risk Stratification:")
for group in ['Low Risk', 'Medium Risk', 'High Risk']:
    mask = cox_df_encoded['risk_group'] == group
    mortality = cox_df_encoded[mask]['event'].mean()
    print(f"   {group}: {mortality:.1%} mortality rate")

print("\n" + "=" * 80)
print("CLINICAL RECOMMENDATIONS (general guidance, not derived from the fitted models)")
print("=" * 80)

print("\n1. TREATMENT SELECTION:")
print("   - Hormone therapy shows benefit in ER+ patients (standard of care)")
print("   - Chemotherapy benefit varies by risk stratification")
print("   - Combination therapy may be optimal for high-risk patients")
print("   - Consider molecular subtypes when making treatment decisions")

print("\n2. RISK-ADAPTED STRATEGIES:")
print("   - Low-risk: Consider hormone therapy alone if ER+")
print("   - Medium-risk: Evaluate chemotherapy benefit using genomic tests")
print("   - High-risk: Aggressive multi-modal therapy recommended")
print("   - Use Cox model scores to refine risk assessment")

print("\n3. MONITORING & FOLLOW-UP:")
print("   - Critical period: First 60 months (5 years)")
print("   - High-risk patients: More frequent surveillance")
print("   - Monitor for late recurrence even after 5 years")
print("   - Tailor follow-up intensity to individual risk")

print("\n4. PERSONALIZED MEDICINE:")
print("   - Integrate tumor characteristics (ER, PR, HER2 status)")
print("   - Consider age, tumor stage, and nodal status")
print("   - Use survival models for shared decision-making")
print("   - Balance treatment efficacy with quality of life")

print("\n5. RESEARCH PRIORITIES:")
print("   - Identify patients who can safely avoid chemotherapy")
print("   - Develop predictive biomarkers for treatment response")
print("   - Study long-term effects beyond 10 years")
print("   - Investigate optimal duration of hormone therapy")
print("=" * 80)

## Next Steps

1. **Genomic integration**: Incorporate gene expression data and PAM50 subtypes
2. **Treatment interactions**: Test statistical interactions between treatments and biomarkers
3. **Time-varying effects**: Check if treatment effects change over time
4. **Competing risks**: Model cancer-specific vs other-cause mortality
5. **Machine learning**: Random Survival Forests for non-linear relationships
6. **External validation**: Test model on independent datasets (TCGA, SEER)
7. **Causal inference**: Use propensity scores or instrumental variables
8. **Quality of life**: Incorporate toxicity and QoL outcomes