# Clinical Correlations in Antimicrobial Resistance

**Research Question 4:** How do patient demographics and clinical factors correlate with antimicrobial resistance patterns? What are the age-specific and gender-specific treatment considerations?

This analysis examines the relationship between patient characteristics (age, gender, sample type) and resistance patterns to inform targeted antimicrobial stewardship and optimize empiric therapy selection for specific patient populations.

## Clinical Context

Understanding resistance patterns across different patient demographics is essential for:
- Age-appropriate empiric therapy selection
- Risk stratification for MDR organisms
- Targeted antimicrobial stewardship interventions
- Optimization of empiric protocols for specific clinical scenarios

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import chi2_contingency, fisher_exact, mannwhitneyu, kruskal
from statsmodels.stats.proportion import proportion_confint
from statsmodels.stats.multitest import multipletests
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.precision', 2)
%matplotlib inline
sns.set_style('whitegrid')
sns.set_palette('Set2')

## 1. Data Loading and Preparation

In [None]:
# Load cleaned data
df = pd.read_csv('../data/processed/amr_data_2025_cleaned.csv')

# Identify antibiotic columns
antibiotic_cols = [col for col in df.columns if ' - ' in col or col.startswith('NET_') or col.startswith('MET_')]

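# Categorize antibiotic results
def categorize_result(result):
    """Map a raw susceptibility value to Resistant / Sensitive / Intermediate."""
    if pd.isna(result):
        return np.nan
    # Match on the leading letter rather than a substring: a test such as
    # 'R' in value would label 'INTERMEDIATE' as Resistant.
    # Assumes values are coded R/S/I or spelled out in full.
    result_str = str(result).strip().upper()
    if result_str.startswith('R'):
        return 'Resistant'
    elif result_str.startswith('S'):
        return 'Sensitive'
    elif result_str.startswith('I'):
        return 'Intermediate'
    return np.nan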

for col in antibiotic_cols:
    df[col + '_Cat'] = df[col].apply(categorize_result)

# Clean age data - remove invalid ages
df_clean = df[df['Age (years)'] >= 0].copy()

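# Create clinically relevant age groups.
# right=False makes each bin left-closed, so age 0 is retained and age 18
# falls in 'Young Adult (18-35)', matching the labels.
df_clean['Age_Group'] = pd.cut(df_clean['Age (years)'],
                                bins=[0, 18, 36, 51, 66, np.inf],
                                right=False,
                                labels=['Pediatric (<18)', 'Young Adult (18-35)',
                                       'Middle Age (36-50)', 'Older Adult (51-65)',
                                       'Elderly (>65)'])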

print(f"Dataset loaded: {len(df_clean)} isolates with valid age data")
print(f"Antibiotics analyzed: {len(antibiotic_cols)}")
print(f"\nAge group distribution:")
for group, count in df_clean['Age_Group'].value_counts().sort_index().items():
    pct = count/len(df_clean)*100
    print(f"  {group}: {count} ({pct:.1f}%)")

print(f"\nGender distribution:")
for gender, count in df_clean['Gender'].value_counts().items():
    pct = count/len(df_clean)*100
    print(f"  {gender}: {count} ({pct:.1f}%)")

print(f"\nSample type distribution:")
for sample, count in df_clean['Sample Type'].value_counts().items():
    pct = count/len(df_clean)*100
    print(f"  {sample}: {count} ({pct:.1f}%)")

## 2. Overall Resistance by Age Group

### Clinical Hypothesis
Age-related differences in resistance may result from:
- Prior antibiotic exposure patterns
- Healthcare contact frequency
- Comorbidity burden
- Immune system differences
- Institutional exposure (nursing homes, hospitals)
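
The age-group rates computed in the next cell are reported with Wilson score intervals via `proportion_confint(..., method='wilson')`. As a rough, dependency-free sketch of what that interval computes, the hypothetical helper `wilson_interval` below applies the standard Wilson formula with `z` fixed at 1.96 (an assumption corresponding to a 95% interval). The counts in the usage line are illustrative, not results from this dataset.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    # The interval is centred on a value shrunk toward 0.5, not on p_hat itself.
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Illustrative counts only: 30 resistant results out of 100 tests.
low, high = wilson_interval(30, 100)
print(f"30/100 resistant: {low * 100:.1f}% to {high * 100:.1f}%")
```

Unlike the simple normal approximation, the Wilson interval stays within 0-100% and behaves sensibly for small groups, which matters for the less populated age bands.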

In [None]:
# Calculate overall resistance rate by age group
age_resistance = []

for age_group in df_clean['Age_Group'].cat.categories:
    df_age = df_clean[df_clean['Age_Group'] == age_group]
    
    # Count all resistance results for this age group
    total_tests = 0
    resistant_tests = 0
    
    for col in antibiotic_cols:
        cat_col = col + '_Cat'
        total_tests += df_age[cat_col].notna().sum()
        resistant_tests += (df_age[cat_col] == 'Resistant').sum()
    
    if total_tests > 0:
        resistance_rate = (resistant_tests / total_tests) * 100
        ci_low, ci_high = proportion_confint(resistant_tests, total_tests, alpha=0.05, method='wilson')
        
        age_resistance.append({
            'Age_Group': age_group,
            'N_Patients': len(df_age),
            'Total_Tests': total_tests,
            'Resistant_Tests': resistant_tests,
            'Resistance_Rate': resistance_rate,
            'CI_Low': ci_low * 100,
            'CI_High': ci_high * 100
        })

age_res_df = pd.DataFrame(age_resistance)

print("Overall Resistance Rate by Age Group:")
print("=" * 90)
print(age_res_df.to_string(index=False))
print("\n95% CI = 95% Confidence Interval (Wilson method)")

In [None]:
# Visualize resistance by age group with confidence intervals
fig, ax = plt.subplots(figsize=(12, 7))

x_pos = np.arange(len(age_res_df))
bars = ax.bar(x_pos, age_res_df['Resistance_Rate'], alpha=0.7, color='steelblue')

# Add error bars (95% CI)
yerr_lower = age_res_df['Resistance_Rate'] - age_res_df['CI_Low']
yerr_upper = age_res_df['CI_High'] - age_res_df['Resistance_Rate']
ax.errorbar(x_pos, age_res_df['Resistance_Rate'], 
            yerr=[yerr_lower, yerr_upper], 
            fmt='none', ecolor='black', capsize=5, capthick=2)

ax.set_xlabel('Age Group', fontsize=12, fontweight='bold')
ax.set_ylabel('Overall Resistance Rate (%)', fontsize=12, fontweight='bold')
ax.set_title('Antimicrobial Resistance Rates Across Age Groups\n(95% Confidence Intervals)', 
             fontsize=14, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(age_res_df['Age_Group'], rotation=45, ha='right')

# Add value labels
for i, (rate, n) in enumerate(zip(age_res_df['Resistance_Rate'], age_res_df['N_Patients'])):
    ax.text(i, rate + 2, f'{rate:.1f}%\n(n={n})', ha='center', fontsize=10, fontweight='bold')

# Add reference line for overall mean
overall_mean = (age_res_df['Resistant_Tests'].sum() / age_res_df['Total_Tests'].sum()) * 100
ax.axhline(overall_mean, color='red', linestyle='--', alpha=0.7, label=f'Overall Mean: {overall_mean:.1f}%')
ax.legend()

plt.tight_layout()
plt.savefig('../reports/figures/resistance_by_age_group.png', dpi=300, bbox_inches='tight')
plt.show()

print(f"\nOverall mean resistance rate: {overall_mean:.1f}%")

### Statistical Test: Age Group Comparison

**Null Hypothesis:** Resistance rates are equal across all age groups

**Test:** Chi-square test for independence
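
As a minimal sketch of what the test computes, the hypothetical helper below derives the Pearson chi-square statistic, its degrees of freedom, and Cramer's V from a table of counts in plain Python. It applies no continuity correction, and the example table is made up for illustration. It does not compute a p-value; the notebook relies on `scipy.stats.chi2_contingency` for that.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and Cramer's V
    for an r x c table of counts (hypothetical helper, no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    smaller_dim = min(len(table), len(table[0]))
    cramers_v = math.sqrt(chi2 / (n * (smaller_dim - 1)))
    return chi2, dof, cramers_v

# Illustrative table: rows are three groups, columns are [resistant, non-resistant].
example_table = [[40, 60], [55, 45], [70, 30]]
chi2_stat, dof, v = chi_square_independence(example_table)
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, Cramer's V = {v:.3f}")
```

With many thousands of pooled test results, even small differences produce tiny p-values, so the effect size is usually the more informative number for deciding whether age stratification is clinically worthwhile.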

In [None]:
# Create contingency table for chi-square test
contingency_age = pd.DataFrame({
    'Age_Group': age_res_df['Age_Group'],
    'Resistant': age_res_df['Resistant_Tests'],
    'Non_Resistant': age_res_df['Total_Tests'] - age_res_df['Resistant_Tests']
})

# Prepare contingency table
contingency_matrix = contingency_age[['Resistant', 'Non_Resistant']].values

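# Perform chi-square test
# Caveat: results are pooled across antibiotics, so tests from the same isolate
# are not independent and the p-value is likely optimistic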
chi2, p_value, dof, expected = chi2_contingency(contingency_matrix)

print("Chi-Square Test for Independence: Resistance by Age Group")
print("=" * 70)
print(f"Chi-square statistic: {chi2:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p_value:.4f}")
print(f"\nInterpretation (α = 0.05):")
if p_value < 0.05:
    print(f"  SIGNIFICANT - Reject null hypothesis")
    print(f"  There ARE statistically significant differences in resistance rates across age groups")
    print(f"  Clinical implication: Age-stratified empiric therapy protocols are justified")
else:
    print(f"  NOT SIGNIFICANT - Fail to reject null hypothesis")
    print(f"  No statistically significant differences in resistance rates across age groups")
    print(f"  Clinical implication: Uniform empiric protocols may be appropriate")

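# Effect size (Cramer's V). For a k x 2 table, Cohen's conventional
# benchmarks are 0.1 (small), 0.3 (medium), and 0.5 (large).
n = contingency_matrix.sum()
cramers_v = np.sqrt(chi2 / (n * (min(contingency_matrix.shape) - 1)))
print(f"\nCramer's V (effect size): {cramers_v:.4f}")
if cramers_v < 0.1:
    print(f"  Effect size: NEGLIGIBLE")
elif cramers_v < 0.3:
    print(f"  Effect size: SMALL")
elif cramers_v < 0.5:
    print(f"  Effect size: MEDIUM")
else:
    print(f"  Effect size: LARGE")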

## 3. Key Antibiotics by Age Group

Analysis of first-line and commonly used antibiotics, stratified by age group.
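
Comparing each key antibiotic across the same age groups amounts to running several comparisons at once, so any per-antibiotic p-values would need a multiple-testing adjustment. The notebook imports `multipletests` from statsmodels, which can apply such corrections. The sketch below shows the Benjamini-Hochberg step-up adjustment in plain Python; the helper name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order
    (hypothetical helper)."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Illustrative p-values, one per antibiotic.
raw_p = [0.001, 0.02, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
for raw, adj in zip(raw_p, adj_p):
    print(f"raw p = {raw:.3f} -> adjusted p = {adj:.3f}")
```

The adjustment controls the expected share of false discoveries among the antibiotics flagged as age-dependent, which is less conservative than a Bonferroni correction.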

In [None]:
# Select key antibiotics for detailed analysis
key_antibiotics = [
    'CIP - Ciprofloxacin',
    'LEV - Levofloxacin',
    'NIT - Nitrofurantoin',
    'SXT - Trimethoprim/Sulfamethoxazole',
    'AMP - Ampicillin',
    'CTX - Cefotaxime',
    'CAZ - Ceftazidime',
    'GEN - Gentamicin',
    'AMK - Amikacin'
]

# Filter to antibiotics that exist in dataset
key_abs_available = [ab for ab in key_antibiotics if ab in antibiotic_cols]

print(f"Analyzing {len(key_abs_available)} key antibiotics across age groups")
print(f"\nAntibiotics included: {', '.join([ab.split(' - ')[1] for ab in key_abs_available])}")

# Calculate resistance by age group for each antibiotic
age_ab_data = []

for ab in key_abs_available:
    cat_col = ab + '_Cat'
    ab_name = ab.split(' - ')[1] if ' - ' in ab else ab
    
    for age_group in df_clean['Age_Group'].cat.categories:
        df_age = df_clean[df_clean['Age_Group'] == age_group]
        
        total = df_age[cat_col].notna().sum()
        if total < 5:  # Minimum sample size
            continue
        
        resistant = (df_age[cat_col] == 'Resistant').sum()
        resistance_rate = (resistant / total) * 100
        
        age_ab_data.append({
            'Antibiotic': ab_name,
            'Age_Group': age_group,
            'N': total,
            'Resistant': resistant,
            'Resistance_Rate': resistance_rate
        })

age_ab_df = pd.DataFrame(age_ab_data)

print(f"\nGenerated {len(age_ab_df)} age-antibiotic combinations with ≥5 tests")

In [None]:
# Create heatmap of resistance rates by age and antibiotic
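pivot_data = age_ab_df.pivot(index='Antibiotic', columns='Age_Group', values='Resistance_Rate')
# Age_Group holds plain strings here, so pivot sorts columns alphabetically;
# restore chronological order for the heatmap
age_order = [g for g in df_clean['Age_Group'].cat.categories if g in pivot_data.columns]
pivot_data = pivot_data[age_order]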

fig, ax = plt.subplots(figsize=(14, 8))
sns.heatmap(pivot_data, annot=True, fmt='.1f', cmap='RdYlGn_r', 
            cbar_kws={'label': 'Resistance Rate (%)'}, 
            vmin=0, vmax=100, ax=ax, linewidths=0.5)
ax.set_title('Antimicrobial Resistance Rates by Age Group and Antibiotic', 
             fontsize=14, fontweight='bold', pad=20)
ax.set_xlabel('Age Group', fontsize=12, fontweight='bold')
ax.set_ylabel('Antibiotic', fontsize=12, fontweight='bold')
plt.tight_layout()
plt.savefig('../reports/figures/resistance_heatmap_age_antibiotic.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nClinical Insights from Heatmap:")
print("  - Darker red: High resistance (avoid empirically)")
print("  - Yellow/Orange: Moderate resistance (use with caution)")
print("  - Green: Low resistance (preferred empiric options)")

In [None]:
# Identify antibiotics with significant age-related differences
print("\nAntibiotics with Notable Age-Related Resistance Patterns:")
print("=" * 90)

for ab_name in age_ab_df['Antibiotic'].unique():
    ab_data = age_ab_df[age_ab_df['Antibiotic'] == ab_name].sort_values('Age_Group')
    
    if len(ab_data) >= 3:  # Need at least 3 age groups for meaningful comparison
        min_res = ab_data['Resistance_Rate'].min()
        max_res = ab_data['Resistance_Rate'].max()
        difference = max_res - min_res
        
        if difference >= 20:  # ≥20 percentage points: descriptive cutoff, not a statistical test
            min_group = ab_data[ab_data['Resistance_Rate'] == min_res]['Age_Group'].values[0]
            max_group = ab_data[ab_data['Resistance_Rate'] == max_res]['Age_Group'].values[0]
            
            print(f"\n{ab_name}:")
            print(f"  Lowest resistance: {min_res:.1f}% in {min_group}")
            print(f"  Highest resistance: {max_res:.1f}% in {max_group}")
            print(f"  Difference: {difference:.1f} percentage points")
            print(f"  Clinical relevance: Age-specific prescribing may be warranted")

## 4. Gender-Based Resistance Patterns

### Clinical Rationale
Gender differences in resistance may reflect:
- Anatomical factors (e.g., UTI incidence)
- Healthcare-seeking behavior
- Prior antibiotic exposure patterns
- Hormonal influences on microbiome

In [None]:
# Calculate overall resistance by gender
gender_resistance = []

for gender in df_clean['Gender'].unique():
    df_gender = df_clean[df_clean['Gender'] == gender]
    
    total_tests = 0
    resistant_tests = 0
    
    for col in antibiotic_cols:
        cat_col = col + '_Cat'
        total_tests += df_gender[cat_col].notna().sum()
        resistant_tests += (df_gender[cat_col] == 'Resistant').sum()
    
    if total_tests > 0:
        resistance_rate = (resistant_tests / total_tests) * 100
        ci_low, ci_high = proportion_confint(resistant_tests, total_tests, alpha=0.05, method='wilson')
        
        gender_resistance.append({
            'Gender': gender,
            'N_Patients': len(df_gender),
            'Total_Tests': total_tests,
            'Resistant_Tests': resistant_tests,
            'Resistance_Rate': resistance_rate,
            'CI_Low': ci_low * 100,
            'CI_High': ci_high * 100
        })

gender_res_df = pd.DataFrame(gender_resistance)

print("Overall Resistance Rate by Gender:")
print("=" * 90)
print(gender_res_df.to_string(index=False))

In [None]:
# Visualize gender differences
fig, ax = plt.subplots(figsize=(10, 7))

x_pos = np.arange(len(gender_res_df))
colors = ['#ff9999' if g == 'Female' else '#9999ff' for g in gender_res_df['Gender']]
bars = ax.bar(x_pos, gender_res_df['Resistance_Rate'], alpha=0.7, color=colors)

# Add error bars
yerr_lower = gender_res_df['Resistance_Rate'] - gender_res_df['CI_Low']
yerr_upper = gender_res_df['CI_High'] - gender_res_df['Resistance_Rate']
ax.errorbar(x_pos, gender_res_df['Resistance_Rate'], 
            yerr=[yerr_lower, yerr_upper], 
            fmt='none', ecolor='black', capsize=10, capthick=2)

ax.set_xlabel('Gender', fontsize=12, fontweight='bold')
ax.set_ylabel('Overall Resistance Rate (%)', fontsize=12, fontweight='bold')
ax.set_title('Antimicrobial Resistance Rates by Gender\n(95% Confidence Intervals)', 
             fontsize=14, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(gender_res_df['Gender'])

# Add value labels
for i, (rate, n, gender) in enumerate(zip(gender_res_df['Resistance_Rate'], 
                                            gender_res_df['N_Patients'],
                                            gender_res_df['Gender'])):
    ax.text(i, rate + 2, f'{rate:.1f}%\n(n={n})', ha='center', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.savefig('../reports/figures/resistance_by_gender.png', dpi=300, bbox_inches='tight')
plt.show()

### Statistical Test: Gender Comparison
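
Alongside the chi-square test, the ratio of female to male resistance rates is easier to interpret with an interval around it. The hypothetical helper below sketches a risk ratio with a log-scale (Katz) 95% confidence interval in plain Python. The counts are illustrative, and because the notebook pools test results across antibiotics, an interval computed this way on the real data would likely be too narrow.

```python
import math

def risk_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a log-scale (Katz)
    confidence interval (hypothetical helper)."""
    if min(events_a, events_b) <= 0:
        raise ValueError("both groups need at least one event")
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high

# Illustrative counts: 120/300 resistant in group A, 90/300 in group B.
rr, low, high = risk_ratio_ci(120, 300, 90, 300)
print(f"RR = {rr:.2f} (95% CI {low:.2f} to {high:.2f})")
```

An interval that excludes 1 points the same way as a significant chi-square result, while also showing how large the difference plausibly is.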

In [None]:
# Chi-square test for gender differences
if len(gender_res_df) == 2:
    contingency_gender = gender_res_df[['Resistant_Tests']].values.flatten()
    total_gender = gender_res_df[['Total_Tests']].values.flatten()
    non_resistant = total_gender - contingency_gender
    
    contingency_matrix_gender = np.array([contingency_gender, non_resistant]).T
    
    chi2_gender, p_value_gender, dof_gender, expected_gender = chi2_contingency(contingency_matrix_gender)
    
    print("Chi-Square Test: Resistance by Gender")
    print("=" * 70)
    print(f"Chi-square statistic: {chi2_gender:.4f}")
    print(f"P-value: {p_value_gender:.4f}")
    print(f"\nInterpretation (α = 0.05):")
    if p_value_gender < 0.05:
        print(f"  SIGNIFICANT - Gender-based differences in resistance detected")
        print(f"  Clinical implication: Consider gender when selecting empiric therapy")
    else:
        print(f"  NOT SIGNIFICANT - No statistically significant gender differences")
        print(f"  Clinical implication: Gender-neutral empiric protocols are appropriate")
    
    # Calculate relative risk
    female_data = gender_res_df[gender_res_df['Gender'] == 'Female']
    male_data = gender_res_df[gender_res_df['Gender'] == 'Male']
    
    if len(female_data) > 0 and len(male_data) > 0:
        female_rate = female_data['Resistance_Rate'].values[0]
        male_rate = male_data['Resistance_Rate'].values[0]
        
        if male_rate > 0 and female_rate > 0:
            # Crude ratio of pooled resistance rates; not adjusted for organism or sample type
            rr = female_rate / male_rate
            print(f"\nRelative Risk (Female vs Male): {rr:.2f}")
            if rr >= 1:
                print(f"  Females have {rr:.2f}x the resistance rate of males")
            else:
                print(f"  Males have {1/rr:.2f}x the resistance rate of females")

## 5. Sample Type Correlations

Different sample types represent distinct clinical scenarios with varying resistance patterns.

In [None]:
# Calculate resistance by sample type
sample_resistance = []

for sample_type in df_clean['Sample Type'].unique():
    df_sample = df_clean[df_clean['Sample Type'] == sample_type]
    
    # Minimum sample size for analysis
    if len(df_sample) < 10:
        continue
    
    total_tests = 0
    resistant_tests = 0
    
    for col in antibiotic_cols:
        cat_col = col + '_Cat'
        total_tests += df_sample[cat_col].notna().sum()
        resistant_tests += (df_sample[cat_col] == 'Resistant').sum()
    
    if total_tests > 0:
        resistance_rate = (resistant_tests / total_tests) * 100
        ci_low, ci_high = proportion_confint(resistant_tests, total_tests, alpha=0.05, method='wilson')
        
        sample_resistance.append({
            'Sample_Type': sample_type,
            'N_Samples': len(df_sample),
            'Total_Tests': total_tests,
            'Resistant_Tests': resistant_tests,
            'Resistance_Rate': resistance_rate,
            'CI_Low': ci_low * 100,
            'CI_High': ci_high * 100
        })

sample_res_df = pd.DataFrame(sample_resistance).sort_values('Resistance_Rate', ascending=False)

print("Resistance Rate by Sample Type:")
print("=" * 90)
print(sample_res_df.to_string(index=False))
print("\nClinical Note: Sample types with n<10 excluded from analysis")

In [None]:
# Visualize resistance by sample type
fig, ax = plt.subplots(figsize=(12, 7))

x_pos = np.arange(len(sample_res_df))
colors_sample = ['#e74c3c' if r >= 50 else '#f39c12' if r >= 30 else '#2ecc71' 
                 for r in sample_res_df['Resistance_Rate']]
bars = ax.bar(x_pos, sample_res_df['Resistance_Rate'], alpha=0.7, color=colors_sample)

# Add error bars
yerr_lower = sample_res_df['Resistance_Rate'] - sample_res_df['CI_Low']
yerr_upper = sample_res_df['CI_High'] - sample_res_df['Resistance_Rate']
ax.errorbar(x_pos, sample_res_df['Resistance_Rate'], 
            yerr=[yerr_lower, yerr_upper], 
            fmt='none', ecolor='black', capsize=8, capthick=2)

ax.set_xlabel('Sample Type', fontsize=12, fontweight='bold')
ax.set_ylabel('Resistance Rate (%)', fontsize=12, fontweight='bold')
ax.set_title('Antimicrobial Resistance by Sample Type\n(95% Confidence Intervals)', 
             fontsize=14, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(sample_res_df['Sample_Type'], rotation=45, ha='right')

# Add value labels
for i, (rate, n) in enumerate(zip(sample_res_df['Resistance_Rate'], sample_res_df['N_Samples'])):
    ax.text(i, rate + 2, f'{rate:.1f}%\n(n={n})', ha='center', fontsize=10, fontweight='bold')

# Add threshold lines
ax.axhline(30, color='orange', linestyle='--', alpha=0.5, label='30% threshold')
ax.axhline(50, color='red', linestyle='--', alpha=0.5, label='50% threshold')
ax.legend()

plt.tight_layout()
plt.savefig('../reports/figures/resistance_by_sample_type.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nClinical Interpretation:")
print("  Red bars (≥50%): High resistance - avoid empiric use of typical agents")
print("  Orange bars (30-49%): Moderate resistance - use with caution")
print("  Green bars (<30%): Low resistance - appropriate for empiric therapy")

### Clinical Context by Sample Type

Understanding sample-specific resistance patterns is crucial for:
- **Urine**: UTI empiric therapy (outpatient vs inpatient)
- **Swab**: Wound/skin infections, post-surgical infections
- **Sputum**: Respiratory infections, hospital-acquired pneumonia

In [None]:
# Key antibiotics by sample type for common clinical scenarios
print("\nKey Antibiotic Resistance by Sample Type (Clinical Decision Support)")
print("=" * 90)

for sample_type in sample_res_df['Sample_Type'].unique():
    print(f"\n{sample_type.upper()}:")
    print("-" * 90)
    
    df_sample = df_clean[df_clean['Sample Type'] == sample_type]
    
    sample_ab_data = []
    for ab in key_abs_available:
        cat_col = ab + '_Cat'
        ab_name = ab.split(' - ')[1] if ' - ' in ab else ab
        
        total = df_sample[cat_col].notna().sum()
        if total < 5:
            continue
        
        resistant = (df_sample[cat_col] == 'Resistant').sum()
        sensitive = (df_sample[cat_col] == 'Sensitive').sum()
        resistance_rate = (resistant / total) * 100
        sensitivity_rate = (sensitive / total) * 100
        
        sample_ab_data.append({
            'Antibiotic': ab_name,
            'N': total,
            'Sensitivity_Rate': sensitivity_rate,
            'Resistance_Rate': resistance_rate
        })
    
    if sample_ab_data:
        sample_ab_temp = pd.DataFrame(sample_ab_data).sort_values('Sensitivity_Rate', ascending=False)
        print(sample_ab_temp.head(5).to_string(index=False))
        
        # Recommend best options
        best_options = sample_ab_temp[sample_ab_temp['Sensitivity_Rate'] >= 70]['Antibiotic'].tolist()
        if best_options:
            print(f"\nRecommended empiric options (≥70% sensitive): {', '.join(best_options[:3])}")
        else:
            print(f"\nWARNING: No antibiotics with ≥70% sensitivity - culture-directed therapy essential")

## 6. Age-Specific Treatment Recommendations

Evidence-based recommendations for empiric therapy by age group
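
The tiering logic used in the next cell can be sketched as a small function. The thresholds (≥80% sensitive for first-line, 60-79% for second-line, ≥50% resistant to avoid) come from this notebook; the function name, the "unclassified" label, and the example rates are hypothetical.

```python
def classify_tier(sensitivity_pct, resistance_pct):
    """Assign an empiric-therapy tier from sensitivity and resistance
    percentages (hypothetical helper using the notebook's thresholds)."""
    if resistance_pct >= 50:
        return "avoid"
    if sensitivity_pct >= 80:
        return "first-line"
    if sensitivity_pct >= 60:
        return "second-line"
    # Below 60% sensitive but under 50% resistant, e.g. many intermediate results.
    return "unclassified"

# Illustrative (sensitivity %, resistance %) pairs, not results from the dataset.
example_rates = {
    "Drug A": (85.0, 10.0),
    "Drug B": (65.0, 30.0),
    "Drug C": (40.0, 55.0),
    "Drug D": (55.0, 35.0),
}
for drug, (sens, res) in example_rates.items():
    print(f"{drug}: {classify_tier(sens, res)}")
```

Writing the rule out this way exposes a gap: an antibiotic with under 60% sensitivity and under 50% resistance falls into none of the three printed lists, so it would be silently omitted from the recommendations.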

In [None]:
print("="*90)
print("AGE-SPECIFIC ANTIMICROBIAL TREATMENT RECOMMENDATIONS")
print("Based on Local Resistance Patterns - 2025 Data")
print("="*90)

for age_group in df_clean['Age_Group'].cat.categories:
    print(f"\n{'='*90}")
    print(f"{age_group}")
    print(f"{'='*90}")
    
    df_age = df_clean[df_clean['Age_Group'] == age_group]
    n_patients = len(df_age)
    
    print(f"\nPatient Population: n = {n_patients}")
    
    # Most common organisms in this age group
    top_organisms = df_age['Organism Identified'].value_counts().head(3)
    print(f"\nMost Common Organisms:")
    for org, count in top_organisms.items():
        pct = count/len(df_age)*100
        print(f"  - {org}: {count} ({pct:.1f}%)")
    
    # Calculate antibiotic sensitivity for this age group
    age_recommendations = []
    
    for ab in key_abs_available:
        cat_col = ab + '_Cat'
        ab_name = ab.split(' - ')[1] if ' - ' in ab else ab
        
        total = df_age[cat_col].notna().sum()
        if total < 5:
            continue
        
        sensitive = (df_age[cat_col] == 'Sensitive').sum()
        resistant = (df_age[cat_col] == 'Resistant').sum()
        sensitivity_rate = (sensitive / total) * 100
        resistance_rate = (resistant / total) * 100
        
        age_recommendations.append({
            'Antibiotic': ab_name,
            'Tests': total,
            'Sensitivity': sensitivity_rate,
            'Resistance': resistance_rate
        })
    
    if age_recommendations:
        age_rec_df = pd.DataFrame(age_recommendations).sort_values('Sensitivity', ascending=False)
        
        # First-line options (≥80% sensitive)
        first_line = age_rec_df[age_rec_df['Sensitivity'] >= 80]
        print(f"\nFIRST-LINE EMPIRIC OPTIONS (≥80% Sensitive):")
        if len(first_line) > 0:
            for _, row in first_line.iterrows():
                print(f"  ✓ {row['Antibiotic']}: {row['Sensitivity']:.1f}% sensitive (n={row['Tests']})")
        else:
            print("  None available - consider second-line agents")
        
        # Second-line options (60-79% sensitive)
        second_line = age_rec_df[(age_rec_df['Sensitivity'] >= 60) & (age_rec_df['Sensitivity'] < 80)]
        print(f"\nSECOND-LINE OPTIONS (60-79% Sensitive):")
        if len(second_line) > 0:
            for _, row in second_line.head(3).iterrows():
                print(f"  • {row['Antibiotic']}: {row['Sensitivity']:.1f}% sensitive (n={row['Tests']})")
        else:
            print("  Limited options available")
        
        # Antibiotics to avoid (≥50% resistant)
        avoid = age_rec_df[age_rec_df['Resistance'] >= 50]
        print(f"\nAVOID EMPIRICALLY (≥50% Resistant):")
        if len(avoid) > 0:
            for _, row in avoid.iterrows():
                print(f"  ✗ {row['Antibiotic']}: {row['Resistance']:.1f}% resistant (n={row['Tests']})")
        else:
            print("  None identified")
        
        # Clinical considerations specific to age group
        print(f"\nCLINICAL CONSIDERATIONS:")
        if 'Pediatric' in age_group:
            print("  - Weight-based dosing required")
            print("  - Avoid fluoroquinolones (risk of cartilage damage)")
            print("  - Consider liquid formulations for compliance")
            print("  - Parental education essential")
        elif 'Elderly' in age_group:
            print("  - Renal dose adjustment often required (check CrCl)")
            print("  - Higher risk of adverse drug reactions")
            print("  - Polypharmacy considerations (drug interactions)")
            print("  - Higher risk of C. difficile infection")
            print("  - Consider de-escalation after culture results")
        elif 'Young Adult' in age_group:
            print("  - Generally tolerate standard dosing")
            print("  - Consider compliance factors (once-daily preferred)")
            print("  - Reproductive age: pregnancy/lactation considerations")
        else:
            print("  - Standard adult dosing usually appropriate")
            print("  - Screen for comorbidities (diabetes, CKD, liver disease)")
            print("  - Obtain culture before antibiotics when possible")

print(f"\n\n{'='*90}")
print("GENERAL STEWARDSHIP PRINCIPLES")
print(f"{'='*90}")
print("""
1. Always obtain cultures before initiating antibiotics when clinically feasible
2. Narrow spectrum based on culture and sensitivity results (48-72 hours)
3. Use shortest effective duration (avoid prolonged courses)
4. Document indication, duration, and rationale in medical record
5. Consider local antibiograms and resistance trends
6. Involve infectious disease consultation for:
   - Multi-drug resistant organisms
   - Treatment failures
   - Immunocompromised hosts
   - Complicated infections
7. Re-evaluate need for antibiotics daily (antimicrobial time-out)
8. Patient education on compliance and completion of therapy
""")

## 7. Multi-Variable Analysis: Age, Gender, and Sample Type

In [None]:
# Combine age and gender for detailed analysis
print("Resistance Rates by Age Group and Gender")
print("="*90)

age_gender_data = []

for age_group in df_clean['Age_Group'].cat.categories:
    for gender in df_clean['Gender'].unique():
        df_subset = df_clean[(df_clean['Age_Group'] == age_group) & (df_clean['Gender'] == gender)]
        
        if len(df_subset) < 5:  # Minimum sample size
            continue
        
        total_tests = 0
        resistant_tests = 0
        
        for col in antibiotic_cols:
            cat_col = col + '_Cat'
            total_tests += df_subset[cat_col].notna().sum()
            resistant_tests += (df_subset[cat_col] == 'Resistant').sum()
        
        if total_tests > 0:
            resistance_rate = (resistant_tests / total_tests) * 100
            
            age_gender_data.append({
                'Age_Group': age_group,
                'Gender': gender,
                'N_Patients': len(df_subset),
                'Total_Tests': total_tests,
                'Resistance_Rate': resistance_rate
            })

age_gender_df = pd.DataFrame(age_gender_data)
print(age_gender_df.to_string(index=False))

In [None]:
# Visualize age-gender interaction
if len(age_gender_df) > 0:
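    pivot_age_gender = age_gender_df.pivot(index='Age_Group', columns='Gender', values='Resistance_Rate')
    # pivot sorts the string age labels alphabetically; restore chronological order
    pivot_age_gender = pivot_age_gender.reindex(
        [g for g in df_clean['Age_Group'].cat.categories if g in pivot_age_gender.index])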
    
    fig, ax = plt.subplots(figsize=(12, 7))
    
    x = np.arange(len(pivot_age_gender.index))
    width = 0.35
    
    if 'Female' in pivot_age_gender.columns and 'Male' in pivot_age_gender.columns:
        bars1 = ax.bar(x - width/2, pivot_age_gender['Female'], width, 
                       label='Female', alpha=0.8, color='#ff9999')
        bars2 = ax.bar(x + width/2, pivot_age_gender['Male'], width, 
                       label='Male', alpha=0.8, color='#9999ff')
        
        ax.set_xlabel('Age Group', fontsize=12, fontweight='bold')
        ax.set_ylabel('Resistance Rate (%)', fontsize=12, fontweight='bold')
        ax.set_title('Antimicrobial Resistance: Age and Gender Interaction', 
                     fontsize=14, fontweight='bold')
        ax.set_xticks(x)
        ax.set_xticklabels(pivot_age_gender.index, rotation=45, ha='right')
        ax.legend()
        
        # Add value labels
        for bars in [bars1, bars2]:
            for bar in bars:
                height = bar.get_height()
                if not np.isnan(height):
                    ax.text(bar.get_x() + bar.get_width()/2., height,
                           f'{height:.1f}%', ha='center', va='bottom', fontsize=9)
        
        plt.tight_layout()
        plt.savefig('../reports/figures/resistance_age_gender_interaction.png', dpi=300, bbox_inches='tight')
        plt.show()
    else:
        print("Insufficient data for age-gender interaction plot")

## 8. Risk Stratification for High Resistance

Identify patient populations at highest risk for resistant infections
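
The per-isolate score used below is the share of tested antibiotics reported as resistant, with isolates tested against fewer than five antibiotics excluded and a 50% cutoff marking high risk. A minimal sketch of that rule for a single isolate follows; the helper name and the example result lists are hypothetical.

```python
def resistance_burden(results, min_tested=5, high_risk_cutoff=0.5):
    """Return (resistance proportion, high-risk flag) for one isolate, or None
    when too few antibiotics were tested (hypothetical helper)."""
    # None stands in for an antibiotic that was not tested.
    tested = [r for r in results if r is not None]
    if len(tested) < min_tested:
        return None
    proportion = sum(r == "Resistant" for r in tested) / len(tested)
    return proportion, proportion >= high_risk_cutoff

# Illustrative isolates.
isolate_a = ["Resistant", "Resistant", "Resistant", "Sensitive", "Sensitive", "Sensitive"]
isolate_b = ["Resistant", "Sensitive", None, None, "Sensitive", "Intermediate"]

print(resistance_burden(isolate_a))
print(resistance_burden(isolate_b))
```

Note that the proportion depends on which antibiotics the laboratory chose to test, so isolates with different panels are not strictly comparable.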

In [None]:
# Calculate individual patient resistance burden
patient_resistance_scores = []

for idx, row in df_clean.iterrows():
    total_abs_tested = 0
    resistant_count = 0
    
    for col in antibiotic_cols:
        cat_col = col + '_Cat'
        if pd.notna(row[cat_col]):
            total_abs_tested += 1
            if row[cat_col] == 'Resistant':
                resistant_count += 1
    
    if total_abs_tested >= 5:  # Minimum antibiotics tested
        resistance_proportion = resistant_count / total_abs_tested
        
        patient_resistance_scores.append({
            'Age': row['Age (years)'],
            'Age_Group': row['Age_Group'],
            'Gender': row['Gender'],
            'Sample_Type': row['Sample Type'],
            'Organism': row['Organism Identified'],
            'Antibiotics_Tested': total_abs_tested,
            'Resistant_Count': resistant_count,
            'Resistance_Proportion': resistance_proportion,
            'High_Risk': resistance_proportion >= 0.5  # ≥50% resistant
        })

patient_risk_df = pd.DataFrame(patient_resistance_scores)

print("Patient-Level Risk Stratification")
print("="*90)
print(f"Total patients with ≥5 antibiotics tested: {len(patient_risk_df)}")

high_risk_count = patient_risk_df['High_Risk'].sum()
high_risk_pct = (high_risk_count / len(patient_risk_df)) * 100

print(f"\nHigh-risk patients (≥50% resistant): {high_risk_count} ({high_risk_pct:.1f}%)")
print(f"Low-risk patients (<50% resistant): {len(patient_risk_df) - high_risk_count} ({100-high_risk_pct:.1f}%)")

In [None]:
# Analyze risk factors for high resistance
print("\nRisk Factors Associated with High Resistance:")
print("="*90)

# Age group analysis
print("\nBy Age Group:")
age_risk = patient_risk_df.groupby('Age_Group')['High_Risk'].agg(['sum', 'count', 'mean'])
age_risk.columns = ['High_Risk_Count', 'Total', 'High_Risk_Proportion']
age_risk['High_Risk_Percentage'] = age_risk['High_Risk_Proportion'] * 100
age_risk = age_risk.sort_values('High_Risk_Percentage', ascending=False)
print(age_risk[['High_Risk_Count', 'Total', 'High_Risk_Percentage']].to_string())

# Gender analysis
print("\nBy Gender:")
gender_risk = patient_risk_df.groupby('Gender')['High_Risk'].agg(['sum', 'count', 'mean'])
gender_risk.columns = ['High_Risk_Count', 'Total', 'High_Risk_Proportion']
gender_risk['High_Risk_Percentage'] = gender_risk['High_Risk_Proportion'] * 100
print(gender_risk[['High_Risk_Count', 'Total', 'High_Risk_Percentage']].to_string())

# Sample type analysis
print("\nBy Sample Type:")
sample_risk = patient_risk_df.groupby('Sample_Type')['High_Risk'].agg(['sum', 'count', 'mean'])
sample_risk.columns = ['High_Risk_Count', 'Total', 'High_Risk_Proportion']
sample_risk['High_Risk_Percentage'] = sample_risk['High_Risk_Proportion'] * 100
sample_risk = sample_risk.sort_values('High_Risk_Percentage', ascending=False)
print(sample_risk[['High_Risk_Count', 'Total', 'High_Risk_Percentage']].to_string())

In [None]:
# Visualize resistance distribution across patients
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Distribution of resistance proportions
axes[0, 0].hist(patient_risk_df['Resistance_Proportion']*100, bins=20, 
                edgecolor='black', alpha=0.7, color='steelblue')
axes[0, 0].axvline(50, color='red', linestyle='--', linewidth=2, label='High-risk threshold')
axes[0, 0].set_xlabel('Resistance Proportion (%)', fontweight='bold')
axes[0, 0].set_ylabel('Number of Patients', fontweight='bold')
axes[0, 0].set_title('Distribution of Patient Resistance Burden', fontweight='bold')
axes[0, 0].legend()

# 2. High-risk by age group
age_risk_plot = age_risk.reset_index()
axes[0, 1].bar(range(len(age_risk_plot)), age_risk_plot['High_Risk_Percentage'], 
               alpha=0.7, color='coral')
axes[0, 1].set_xticks(range(len(age_risk_plot)))
axes[0, 1].set_xticklabels(age_risk_plot['Age_Group'], rotation=45, ha='right')
axes[0, 1].set_ylabel('High-Risk Patients (%)', fontweight='bold')
axes[0, 1].set_title('High-Risk Patients by Age Group', fontweight='bold')
for i, v in enumerate(age_risk_plot['High_Risk_Percentage']):
    axes[0, 1].text(i, v + 1, f'{v:.1f}%', ha='center', fontweight='bold')

# 3. Resistance by organism
organism_resistance = patient_risk_df.groupby('Organism')['Resistance_Proportion'].mean().sort_values(ascending=False).head(8) * 100
axes[1, 0].barh(range(len(organism_resistance)), organism_resistance.values, alpha=0.7, color='darkorange')
axes[1, 0].set_yticks(range(len(organism_resistance)))
axes[1, 0].set_yticklabels(organism_resistance.index)
axes[1, 0].set_xlabel('Mean Resistance Proportion (%)', fontweight='bold')
axes[1, 0].set_title('Average Resistance by Organism (Top 8)', fontweight='bold')
axes[1, 0].invert_yaxis()
for i, v in enumerate(organism_resistance.values):
    axes[1, 0].text(v + 1, i, f'{v:.1f}%', va='center', fontweight='bold')

# 4. High-risk by sample type
sample_risk_plot = sample_risk.reset_index()
axes[1, 1].bar(range(len(sample_risk_plot)), sample_risk_plot['High_Risk_Percentage'], 
               alpha=0.7, color='mediumpurple')
axes[1, 1].set_xticks(range(len(sample_risk_plot)))
axes[1, 1].set_xticklabels(sample_risk_plot['Sample_Type'], rotation=45, ha='right')
axes[1, 1].set_ylabel('High-Risk Patients (%)', fontweight='bold')
axes[1, 1].set_title('High-Risk Patients by Sample Type', fontweight='bold')
for i, v in enumerate(sample_risk_plot['High_Risk_Percentage']):
    axes[1, 1].text(i, v + 1, f'{v:.1f}%', ha='center', fontweight='bold')

plt.tight_layout()
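# Create the figures directory if it is missing so savefig does not fail
import os
os.makedirs('../reports/figures', exist_ok=True)
plt.savefig('../reports/figures/patient_risk_stratification.png', dpi=300, bbox_inches='tight')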
plt.show()

## 9. Clinical Summary and Actionable Recommendations

In [None]:
print("\n" + "="*90)
print("CLINICAL CORRELATIONS SUMMARY: KEY FINDINGS AND RECOMMENDATIONS")
print("="*90)

print("\n1. AGE-RELATED PATTERNS")
print("-"*90)
highest_age_group = age_res_df.loc[age_res_df['Resistance_Rate'].idxmax()]
lowest_age_group = age_res_df.loc[age_res_df['Resistance_Rate'].idxmin()]
print(f"   Highest resistance: {highest_age_group['Age_Group']} ({highest_age_group['Resistance_Rate']:.1f}%)")
print(f"   Lowest resistance: {lowest_age_group['Age_Group']} ({lowest_age_group['Resistance_Rate']:.1f}%)")
print(f"   Difference: {highest_age_group['Resistance_Rate'] - lowest_age_group['Resistance_Rate']:.1f} percentage points")
print("\n   Recommendation:")
print("   - Use age-stratified antibiograms for empiric therapy selection")
print("   - Enhanced stewardship targeting high-risk age groups")
print(f"   - Consider broader spectrum empirically in {highest_age_group['Age_Group']} patients")

print("\n2. GENDER DIFFERENCES")
print("-"*90)
if len(gender_res_df) >= 2:
    female_data = gender_res_df[gender_res_df['Gender'] == 'Female']
    male_data = gender_res_df[gender_res_df['Gender'] == 'Male']
    
    if len(female_data) > 0 and len(male_data) > 0:
        female_rate = female_data['Resistance_Rate'].values[0]
        male_rate = male_data['Resistance_Rate'].values[0]
        diff = abs(female_rate - male_rate)
        
        print(f"   Female resistance rate: {female_rate:.1f}%")
        print(f"   Male resistance rate: {male_rate:.1f}%")
        print(f"   Difference: {diff:.1f} percentage points")
        
        # 5 percentage points is a descriptive cut-off chosen for this report, not a statistical test
        if diff > 5:
            higher_gender = 'Female' if female_rate > male_rate else 'Male'
            print("\n   Recommendation:")
            print(f"   - {higher_gender} patients show higher resistance rates")
            print("   - Consider gender-specific empiric protocols")
        else:
            print("\n   Recommendation:")
            print("   - Gender difference is within 5 percentage points")
            print("   - Gender-neutral empiric protocols appropriate")

print("\n3. SAMPLE TYPE CONSIDERATIONS")
print("-"*90)
if len(sample_res_df) > 0:
    # Select by value rather than row position so the result does not depend on sort order
    highest_sample = sample_res_df.loc[sample_res_df['Resistance_Rate'].idxmax()]
    lowest_sample = sample_res_df.loc[sample_res_df['Resistance_Rate'].idxmin()]
    print(f"   Highest resistance: {highest_sample['Sample_Type']} ({highest_sample['Resistance_Rate']:.1f}%)")
    print(f"   Lowest resistance: {lowest_sample['Sample_Type']} ({lowest_sample['Resistance_Rate']:.1f}%)")
    print("\n   Recommendation:")
    print("   - Sample-specific empiric protocols essential")
    print(f"   - {highest_sample['Sample_Type']} samples require broader spectrum coverage")
    print(f"   - Early culture-directed therapy for {highest_sample['Sample_Type']} infections")

print("\n4. HIGH-RISK PATIENT POPULATIONS")
print("-"*90)
if len(patient_risk_df) > 0:
    print(f"   Patients with high resistance burden: {high_risk_pct:.1f}%")
    
    # Identify highest risk combinations
    if len(age_risk_plot) > 0:
        highest_risk_age = age_risk_plot.loc[age_risk_plot['High_Risk_Percentage'].idxmax()]
        print(f"   Highest risk age group: {highest_risk_age['Age_Group']} ({highest_risk_age['High_Risk_Percentage']:.1f}% high-risk)")
    
    print("\n   Recommendation:")
    print("   - Implement risk stratification tool at point of care")
    print("   - High-risk patients require:")
    print("     * Early infectious disease consultation")
    print("     * Broader spectrum empiric coverage")
    print("     * Aggressive culture-directed de-escalation")
    print("     * Enhanced monitoring for treatment failure")

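# Pathway tiers below are a template: the age cut-offs and exposure criteria are not
# computed from this dataset (prior antibiotic exposure and healthcare contact were not recorded)
print("\n5. ACTIONABLE CLINICAL PATHWAYS")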
print("-"*90)
print("""
   A. For Low-Risk Patients (Age <50, no prior resistance):
      - Use narrow-spectrum empiric agents
      - Standard duration therapy
      - Outpatient management when appropriate
   
   B. For Moderate-Risk Patients (Age 50-65, community-acquired):
      - Consider intermediate spectrum agents
      - Obtain cultures before antibiotics
      - Plan for 48-72 hour reassessment
   
   C. For High-Risk Patients (Age >65, healthcare exposure, prior MDR):
      - Broad-spectrum empiric coverage
      - Infectious disease consultation
      - Mandatory culture before antibiotics
      - Daily antimicrobial stewardship review
      - Aggressive de-escalation when possible
""")

print("\n6. QUALITY IMPROVEMENT INITIATIVES")
print("-"*90)
print("""
   Recommended institutional actions:
   
   1. Develop age-stratified local antibiograms
   2. Create electronic order sets with age-specific defaults
   3. Implement mandatory stewardship review for high-risk groups
   4. Provider education on age-specific resistance patterns
   5. Track outcomes by demographic groups
   6. Quarterly resistance surveillance by age/gender/sample type
   7. Feedback to providers on prescribing patterns
""")

print("="*90)
print("Report generated:", pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
print("="*90)

## 10. Export Results for Clinical Use

In [None]:
# Export comprehensive clinical correlations data
age_res_df.to_csv('../data/processed/resistance_by_age_group.csv', index=False)
gender_res_df.to_csv('../data/processed/resistance_by_gender.csv', index=False)
sample_res_df.to_csv('../data/processed/resistance_by_sample_type.csv', index=False)
age_ab_df.to_csv('../data/processed/resistance_age_antibiotic_matrix.csv', index=False)
patient_risk_df.to_csv('../data/processed/patient_risk_stratification.csv', index=False)

print("Clinical correlation data exported to data/processed/")
print("\nFiles created:")
print("  - resistance_by_age_group.csv")
print("  - resistance_by_gender.csv")
print("  - resistance_by_sample_type.csv")
print("  - resistance_age_antibiotic_matrix.csv")
print("  - patient_risk_stratification.csv")
print("\nThese files can be used for:")
print("  - Clinical decision support tools")
print("  - Electronic health record integration")
print("  - Provider education materials")
print("  - Quality improvement dashboards")

---

## Conclusions

This analysis identifies associations between patient demographics and antimicrobial resistance patterns in this dataset. Key clinical implications include:

1. **Age matters**: Resistance rates differ across age groups, which supports age-stratified empiric protocols

2. **Sample-specific patterns**: Different infection sources require tailored antibiotic selection

3. **Risk stratification is essential**: Identifying high-risk patients enables more targeted therapy and stewardship

4. **Local data drives better care**: Institution-specific resistance patterns should guide empiric therapy, not national guidelines alone

5. **Continuous surveillance required**: Regular updating of demographic-stratified antibiograms ensures optimal care
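
Whether a difference between two groups (for example, the female and male resistance rates compared in Section 9) exceeds sampling noise can be checked with a two-proportion z-test. The sketch below is a minimal standard-library version that relies on the large-sample normal approximation. The function name `two_proportion_ztest` and the example counts are hypothetical and are not taken from this dataset.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two independent proportions.

    x1, x2 are resistant counts and n1, n2 are totals tested (hypothetical inputs).
    Returns (difference in proportions, z statistic, two-sided p-value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("Both groups need at least one observation")
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis of equal rates
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # All results identical in both groups: no evidence of a difference
        return p1 - p2, 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value


# Hypothetical counts for illustration only
diff, z, p = two_proportion_ztest(60, 100, 40, 100)
print(f"Difference: {diff * 100:.1f} points, z = {z:.2f}, p = {p:.4f}")
```

For subgroups with small counts the normal approximation is unreliable, and an exact test such as `fisher_exact` (imported at the top of this notebook) is usually the safer choice.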

### Limitations

- Single-center data may limit generalizability
- Sample size limitations in some subgroups
- Lack of clinical outcomes data (cure rates, mortality)
- Missing data on prior antibiotic exposure and healthcare contacts
- Cross-sectional design limits causal inference
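
The subgroup sample-size limitation can be made visible by reporting a confidence interval next to each proportion. The sketch below computes a Wilson score interval with the standard library only. The function name `wilson_interval` and the example counts are hypothetical. In the notebook itself, `proportion_confint(count, nobs, method='wilson')` from statsmodels (already imported) should give the same interval.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    # Critical value of the standard normal for the requested two-sided level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical subgroups with the same observed rate but different sizes
for label, k, n in [("small subgroup", 3, 8), ("large subgroup", 30, 80)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Both hypothetical subgroups have the same point estimate, but the interval for the smaller one is much wider, which is the practical meaning of the limitation above.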

### Future Directions

- Prospective validation of risk stratification tools
- Integration into clinical decision support systems
- Outcomes research on demographic-tailored protocols
- Machine learning models for personalized empiric therapy prediction
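
For prospective validation or decision-support integration, the rule used in the risk stratification section (at least 5 antibiotics tested, high risk at ≥50% resistant) could be packaged as a standalone function. The sketch below is one possible form. It assumes that "tested" means a non-missing categorized result and that results use the `'Resistant'`, `'Sensitive'` and `'Intermediate'` labels; the function and constant names are hypothetical.

```python
MIN_ANTIBIOTICS_TESTED = 5   # minimum panel size used in this notebook
HIGH_RISK_THRESHOLD = 0.5    # at least 50% resistant counts as high risk


def score_resistance_burden(results):
    """Score one isolate from its categorized susceptibility results.

    `results` is an iterable of category labels; missing values (None or NaN)
    are ignored (assumption: only string labels count as tested).
    Returns None when fewer than MIN_ANTIBIOTICS_TESTED results are available.
    """
    tested = [r for r in results if isinstance(r, str)]
    if len(tested) < MIN_ANTIBIOTICS_TESTED:
        return None
    resistant_count = sum(r == 'Resistant' for r in tested)
    proportion = resistant_count / len(tested)
    return {
        'Antibiotics_Tested': len(tested),
        'Resistant_Count': resistant_count,
        'Resistance_Proportion': proportion,
        'High_Risk': proportion >= HIGH_RISK_THRESHOLD,
    }


# Hypothetical panel: 3 resistant out of 6 tested, 1 missing result
panel = ['Resistant', 'Resistant', 'Resistant', 'Sensitive', 'Sensitive', 'Intermediate', None]
print(score_resistance_burden(panel))
```

Keeping the thresholds as named constants makes it straightforward to test alternative cut-offs during validation without touching the scoring logic.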

---

**Analysis by:** Clinical Microbiology & Infectious Diseases Team  
**Date:** 2025  
**Data Source:** Local antimicrobial susceptibility testing database  
**Statistical Software:** Python 3.x (pandas, NumPy, SciPy, statsmodels, matplotlib, seaborn)  

---