# Organism-Specific Antimicrobial Resistance Patterns

**Research Question 2:** What are the organism-specific resistance patterns? How do resistance profiles differ between Gram-positive and Gram-negative bacteria? Which organism-antibiotic combinations pose the greatest therapeutic challenges?

This analysis examines pathogen-specific susceptibility patterns to guide targeted empiric therapy selection and identify high-risk resistance phenotypes requiring infection control measures.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import chi2_contingency, fisher_exact
from statsmodels.stats.proportion import proportion_confint
import warnings
warnings.filterwarnings('ignore')

# Configure display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.precision', 2)

%matplotlib inline
sns.set_style('whitegrid')
sns.set_context('notebook', font_scale=1.1)

# Custom color palettes for clinical interpretation
resistance_colors = {
    'Resistant': '#d62728',
    'Sensitive': '#2ca02c',
    'Intermediate': '#ff7f0e'
}

print("Analysis environment initialized")

## 1. Data Loading and Preparation

In [None]:
# Load cleaned dataset
df = pd.read_csv('../data/processed/amr_data_2025_cleaned.csv')

# Identify antibiotic columns
antibiotic_cols = [col for col in df.columns if ' - ' in col or col.startswith('NET_') or col.startswith('MET_')]

# Categorization function
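def categorize_result(result):
    if pd.isna(result):
        return np.nan
    result_str = str(result).strip().upper()
    # Match on the leading letter rather than any substring: 'INTERMEDIATE'
    # contains an 'R', and 'RESISTANT' contains both an 'S' and an 'I'.
    # Assumes each raw value starts with its category (e.g. 'R' or 'Resistant').
    if result_str.startswith('R'):
        return 'Resistant'
    elif result_str.startswith('S'):
        return 'Sensitive'
    elif result_str.startswith('I'):
        return 'Intermediate'
    return np.nan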

# Apply categorization
for col in antibiotic_cols:
    df[col + '_Cat'] = df[col].apply(categorize_result)

# Gram classification
gram_negative = ['E. coli', 'Klebsiella', 'Proteus', 'Pseudomonas']
gram_positive = ['S. aureus', 'Staphylococcus', 'Streptococcus']

df['Gram_Type'] = df['Organism Identified'].apply(
    lambda x: 'Gram-negative' if x in gram_negative else 
             ('Gram-positive' if x in gram_positive else 'Other/Unknown')
)

print(f"Dataset loaded: {len(df)} isolates")
print(f"Antibiotics analyzed: {len(antibiotic_cols)}")
print(f"\nOrganism distribution:")
print(df['Organism Identified'].value_counts())
print(f"\nGram classification:")
print(df['Gram_Type'].value_counts())
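
Free-text susceptibility values are easy to misclassify with substring tests: 'INTERMEDIATE' contains an 'R', and 'RESISTANT' contains both an 'S' and an 'I'. The sketch below shows one stricter alternative, an exact lookup after normalization. The spellings accepted by `RESULT_MAP` and the helper name `categorize_exact` are assumptions for illustration; the raw values actually present in the dataset should be inspected before choosing a mapping.

```python
# Hypothetical lookup table: the accepted spellings are assumptions,
# not values confirmed from the dataset.
RESULT_MAP = {
    'R': 'Resistant', 'RESISTANT': 'Resistant',
    'S': 'Sensitive', 'SENSITIVE': 'Sensitive', 'SUSCEPTIBLE': 'Sensitive',
    'I': 'Intermediate', 'INTERMEDIATE': 'Intermediate',
}

def categorize_exact(result):
    """Map a raw result to a category, or None if missing or unrecognized."""
    if result is None:
        return None
    key = str(result).strip().upper()
    return RESULT_MAP.get(key)

raw_values = ['R', ' s ', 'Intermediate', 'resistant', 'ND', None]
print([categorize_exact(v) for v in raw_values])
```

Unrecognized values come back as `None` instead of being guessed, so they can be counted and reviewed before any rates are computed.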

## 2. Resistance Rates by Organism

We will calculate resistance rates for each major pathogen to identify organism-specific vulnerabilities and resistance patterns.

In [None]:
# Calculate organism-specific resistance rates
organism_resistance = {}

# Filter for major organisms (n >= 10)
major_organisms = df['Organism Identified'].value_counts()
major_organisms = major_organisms[major_organisms >= 10].index.tolist()

print(f"Major organisms for analysis (n >= 10): {len(major_organisms)}")
print(major_organisms)

for organism in major_organisms:
    org_df = df[df['Organism Identified'] == organism]
    org_resistance = []
    
    for col in antibiotic_cols:
        cat_col = col + '_Cat'
        total = org_df[cat_col].notna().sum()
        
        if total < 5:  # Minimum sample size per organism-antibiotic combination
            continue
        
        resistant = (org_df[cat_col] == 'Resistant').sum()
        sensitive = (org_df[cat_col] == 'Sensitive').sum()
        resistance_rate = (resistant / total) * 100
        
        ab_name = col.split(' - ')[1] if ' - ' in col else col.replace('_', ' ')
        
        org_resistance.append({
            'Antibiotic': ab_name,
            'Total_Tests': total,
            'Resistant': resistant,
            'Sensitive': sensitive,
            'Resistance_Rate': resistance_rate
        })
    
    organism_resistance[organism] = pd.DataFrame(org_resistance)

print(f"\nResistance profiles calculated for {len(organism_resistance)} organisms")
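
A resistance rate based on 5 tests is far less precise than one based on 100, so it can help to report an interval alongside each point estimate. The following is a minimal sketch of the Wilson score interval written out by hand; the function name `wilson_interval` is hypothetical. The `proportion_confint` function imported from statsmodels at the top of the notebook offers the same interval via `method='wilson'`, returned as proportions instead of percentages.

```python
import math

def wilson_interval(resistant, total, z=1.959963984540054):
    """Return the (lower, upper) Wilson score interval as percentages.

    z defaults to the normal quantile for a 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = resistant / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (center - half_width) * 100, (center + half_width) * 100

# Same observed rate (50%), very different precision
for resistant, total in [(5, 10), (50, 100)]:
    low, high = wilson_interval(resistant, total)
    print(f"{resistant}/{total}: {low:.1f}% to {high:.1f}%")
```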

In [None]:
# Display resistance profiles for each major organism
for organism, res_df in organism_resistance.items():
    if len(res_df) > 0:
        print(f"\n{'='*80}")
        print(f"{organism} - Resistance Profile")
        print(f"{'='*80}")
        
        res_sorted = res_df.sort_values('Resistance_Rate', ascending=False)
        print(f"\nMost Resistant Antibiotics (Top 10):")
        print(res_sorted[['Antibiotic', 'Total_Tests', 'Resistance_Rate']].head(10).to_string(index=False))
        
        print(f"\nMost Sensitive Antibiotics (Top 10):")
        sensitive_sorted = res_df.sort_values('Resistance_Rate')
        print(sensitive_sorted[['Antibiotic', 'Total_Tests', 'Resistance_Rate']].head(10).to_string(index=False))
        
        # Calculate mean resistance rate for organism
        mean_res = res_df['Resistance_Rate'].mean()
        print(f"\nMean resistance rate: {mean_res:.1f}%")

## 3. Organism-Specific Resistance Visualization

Visual comparison of resistance patterns across major pathogens.

In [None]:
# Create comprehensive bar charts for each organism
for organism, res_df in organism_resistance.items():
    if len(res_df) == 0:
        continue
    
    # Select top 15 tested antibiotics
    res_sorted = res_df.sort_values('Total_Tests', ascending=False).head(15)
    res_sorted = res_sorted.sort_values('Resistance_Rate', ascending=True)
    
    fig, ax = plt.subplots(figsize=(12, 8))
    
    # Color code by resistance level
    colors = ['#2ca02c' if x < 20 else '#ff7f0e' if x < 50 else '#d62728' 
              for x in res_sorted['Resistance_Rate']]
    
    bars = ax.barh(range(len(res_sorted)), res_sorted['Resistance_Rate'], 
                   color=colors, alpha=0.7, edgecolor='black', linewidth=0.5)
    
    ax.set_yticks(range(len(res_sorted)))
    ax.set_yticklabels(res_sorted['Antibiotic'])
    ax.set_xlabel('Resistance Rate (%)', fontsize=12, fontweight='bold')
    ax.set_title(f'{organism} - Antibiotic Resistance Profile (Top 15 Tested Antibiotics)', 
                 fontsize=14, fontweight='bold', pad=20)
    
    # Add value labels
    for i, (rate, n) in enumerate(zip(res_sorted['Resistance_Rate'], res_sorted['Total_Tests'])):
        ax.text(rate + 2, i, f'{rate:.1f}% (n={n})', va='center', fontsize=9)
    
    # Add reference lines
    ax.axvline(20, color='orange', linestyle='--', alpha=0.3, linewidth=1)
    ax.axvline(50, color='red', linestyle='--', alpha=0.3, linewidth=1)
    
    # Add legend
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor='#2ca02c', alpha=0.7, label='Low resistance (<20%)'),
        Patch(facecolor='#ff7f0e', alpha=0.7, label='Moderate resistance (20-50%)'),
        Patch(facecolor='#d62728', alpha=0.7, label='High resistance (≥50%)')
    ]
    ax.legend(handles=legend_elements, loc='lower right')
    
    ax.set_xlim(0, 100)
    ax.grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    organism_clean = organism.replace(' ', '_').replace('.', '')
    plt.savefig(f'../reports/figures/resistance_profile_{organism_clean}.png', 
                dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"Chart saved for {organism}")

## 4. Organism-Antibiotic Resistance Heatmap

Comprehensive heatmap showing resistance rates across organism-antibiotic combinations.

In [None]:
# Create resistance matrix for heatmap
# Select antibiotics tested across multiple organisms
antibiotic_test_counts = {}
for col in antibiotic_cols:
    ab_name = col.split(' - ')[1] if ' - ' in col else col.replace('_', ' ')
    organisms_tested = 0
    for org in major_organisms:
        if org in organism_resistance and len(organism_resistance[org]) > 0:
            if ab_name in organism_resistance[org]['Antibiotic'].values:
                organisms_tested += 1
    antibiotic_test_counts[ab_name] = organisms_tested

# Select antibiotics tested in at least 3 organisms
common_antibiotics = [ab for ab, count in antibiotic_test_counts.items() if count >= 3]
print(f"Antibiotics tested across ≥3 organisms: {len(common_antibiotics)}")

# Build resistance matrix
resistance_matrix = pd.DataFrame(index=major_organisms, columns=common_antibiotics)

for organism in major_organisms:
    if organism in organism_resistance:
        org_res = organism_resistance[organism]
        for ab in common_antibiotics:
            ab_data = org_res[org_res['Antibiotic'] == ab]
            if len(ab_data) > 0:
                resistance_matrix.loc[organism, ab] = ab_data.iloc[0]['Resistance_Rate']

# Convert to numeric
resistance_matrix = resistance_matrix.astype(float)

print(f"\nResistance matrix shape: {resistance_matrix.shape}")
print(f"Coverage: {resistance_matrix.notna().sum().sum()} / {resistance_matrix.size} cells ({resistance_matrix.notna().sum().sum()/resistance_matrix.size*100:.1f}%)")
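
The same organism-by-antibiotic matrix can also be built from a long-format table with a single pivot. This is a sketch on made-up numbers, not the notebook's data; the column names mirror those used above, and the threshold of 2 organisms is chosen only to suit the toy table (the notebook uses 3).

```python
import pandas as pd

# Toy long-format table with made-up rates, for illustration only
long_df = pd.DataFrame({
    'Organism':   ['E. coli', 'E. coli', 'Klebsiella', 'Klebsiella',
                   'S. aureus', 'S. aureus'],
    'Antibiotic': ['Ciprofloxacin', 'Gentamicin', 'Ciprofloxacin', 'Gentamicin',
                   'Gentamicin', 'Vancomycin'],
    'Resistance_Rate': [40.0, 10.0, 55.0, 20.0, 5.0, 0.0],
})

# One row per organism, one column per antibiotic; untested pairs become NaN
matrix = long_df.pivot(index='Organism', columns='Antibiotic',
                       values='Resistance_Rate')

# Keep antibiotics that have a rate for at least `min_organisms` organisms
min_organisms = 2
matrix = matrix.loc[:, matrix.notna().sum(axis=0) >= min_organisms]
print(matrix)
```

`pivot` raises an error if an organism-antibiotic pair appears twice, which doubles as a check that each combination was aggregated exactly once.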

In [None]:
# Create comprehensive heatmap
fig, ax = plt.subplots(figsize=(16, 10))

# Create custom colormap: green (low resistance) -> yellow -> red (high resistance)
from matplotlib.colors import LinearSegmentedColormap
colors_gradient = ['#2ca02c', '#90ee90', '#ffff99', '#ff7f0e', '#d62728']
n_bins = 100
cmap = LinearSegmentedColormap.from_list('resistance', colors_gradient, N=n_bins)

# Plot heatmap
sns.heatmap(resistance_matrix, 
            annot=True, 
            fmt='.0f', 
            cmap=cmap,
            vmin=0, 
            vmax=100,
            cbar_kws={'label': 'Resistance Rate (%)'},
            linewidths=0.5,
            linecolor='gray',
            square=False,
            ax=ax)

ax.set_title('Organism-Antibiotic Resistance Heatmap', 
             fontsize=16, fontweight='bold', pad=20)
ax.set_xlabel('Antibiotic', fontsize=12, fontweight='bold')
ax.set_ylabel('Organism', fontsize=12, fontweight='bold')

# Rotate labels
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)

plt.tight_layout()
plt.savefig('../reports/figures/organism_antibiotic_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nClinical Interpretation:")
print("- Dark red: High resistance (>60%) - Avoid empiric use")
print("- Orange/Yellow: Moderate resistance (20-60%) - Use with caution")
print("- Light green/Green: Low resistance (<20%) - Preferred empiric options")
print("- White: Insufficient data (n<5 tests)")

## 5. Gram-Positive vs Gram-Negative Comparison

Statistical comparison of resistance patterns between major bacterial groups.

In [None]:
# Calculate resistance rates by Gram type
gram_resistance = {}

for gram_type in ['Gram-negative', 'Gram-positive']:
    gram_df = df[df['Gram_Type'] == gram_type]
    gram_res = []
    
    for col in antibiotic_cols:
        cat_col = col + '_Cat'
        total = gram_df[cat_col].notna().sum()
        
        if total < 10:  # Minimum sample size
            continue
        
        resistant = (gram_df[cat_col] == 'Resistant').sum()
        sensitive = (gram_df[cat_col] == 'Sensitive').sum()
        resistance_rate = (resistant / total) * 100
        
        ab_name = col.split(' - ')[1] if ' - ' in col else col.replace('_', ' ')
        
        gram_res.append({
            'Antibiotic': ab_name,
            'Total_Tests': total,
            'Resistant': resistant,
            'Sensitive': sensitive,
            'Resistance_Rate': resistance_rate
        })
    
    gram_resistance[gram_type] = pd.DataFrame(gram_res)

print("Gram-specific resistance profiles calculated")
print(f"\nGram-negative: {len(gram_resistance['Gram-negative'])} antibiotics")
print(f"Gram-positive: {len(gram_resistance['Gram-positive'])} antibiotics")

In [None]:
# Compare mean resistance rates
print("\n" + "="*80)
print("GRAM-POSITIVE vs GRAM-NEGATIVE RESISTANCE COMPARISON")
print("="*80)

for gram_type, res_df in gram_resistance.items():
    mean_res = res_df['Resistance_Rate'].mean()
    median_res = res_df['Resistance_Rate'].median()
    
    print(f"\n{gram_type}:")
    print(f"  Mean resistance rate: {mean_res:.1f}%")
    print(f"  Median resistance rate: {median_res:.1f}%")
    print(f"  Range: {res_df['Resistance_Rate'].min():.1f}% - {res_df['Resistance_Rate'].max():.1f}%")
    
    # Antibiotics with high sensitivity
    low_res = res_df[res_df['Resistance_Rate'] < 20].sort_values('Resistance_Rate')
    print(f"  Antibiotics with <20% resistance: {len(low_res)}")
    if len(low_res) > 0:
        print(f"    Best options: {', '.join(low_res['Antibiotic'].head(5).tolist())}")
    
    # Antibiotics with high resistance
    high_res = res_df[res_df['Resistance_Rate'] >= 50].sort_values('Resistance_Rate', ascending=False)
    print(f"  Antibiotics with ≥50% resistance: {len(high_res)}")
    if len(high_res) > 0:
        print(f"    Avoid: {', '.join(high_res['Antibiotic'].head(5).tolist())}")

In [None]:
# Statistical comparison using chi-square or Fisher's exact tests
# Find antibiotics tested in both groups
gram_neg_abs = set(gram_resistance['Gram-negative']['Antibiotic'].values)
gram_pos_abs = set(gram_resistance['Gram-positive']['Antibiotic'].values)
common_abs = gram_neg_abs.intersection(gram_pos_abs)

print(f"\nAntibiotics tested in both Gram groups: {len(common_abs)}")

# Compare resistance rates for common antibiotics
comparison_results = []

for ab in common_abs:
    gram_neg_rate = gram_resistance['Gram-negative'][gram_resistance['Gram-negative']['Antibiotic'] == ab]['Resistance_Rate'].values[0]
    gram_pos_rate = gram_resistance['Gram-positive'][gram_resistance['Gram-positive']['Antibiotic'] == ab]['Resistance_Rate'].values[0]
    
    difference = gram_neg_rate - gram_pos_rate
    
    # Get raw counts for statistical test
    gram_neg_data = gram_resistance['Gram-negative'][gram_resistance['Gram-negative']['Antibiotic'] == ab].iloc[0]
    gram_pos_data = gram_resistance['Gram-positive'][gram_resistance['Gram-positive']['Antibiotic'] == ab].iloc[0]
    
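    # 2x2 table of Resistant vs Sensitive counts (Intermediate results excluded)
    contingency_table = np.array([
        [gram_neg_data['Resistant'], gram_neg_data['Sensitive']],
        [gram_pos_data['Resistant'], gram_pos_data['Sensitive']]
    ])
    
    # Use Fisher's exact test if any expected cell count is < 5
    row_totals = contingency_table.sum(axis=1)
    col_totals = contingency_table.sum(axis=0)
    expected = np.outer(row_totals, col_totals) / max(contingency_table.sum(), 1)
    if expected.min() < 5: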
        _, p_value = fisher_exact(contingency_table)
        test_used = 'Fisher'
    else:
        chi2, p_value, _, _ = chi2_contingency(contingency_table)
        test_used = 'Chi-square'
    
    comparison_results.append({
        'Antibiotic': ab,
        'Gram_Negative_Rate': gram_neg_rate,
        'Gram_Positive_Rate': gram_pos_rate,
        'Difference': difference,
        'P_Value': p_value,
        'Test': test_used,
        'Significant': 'Yes' if p_value < 0.05 else 'No'
    })

comparison_df = pd.DataFrame(comparison_results).sort_values('Difference', key=abs, ascending=False)

print("\nTop 15 Antibiotics with Largest Differences Between Gram Groups:")
print(comparison_df[['Antibiotic', 'Gram_Negative_Rate', 'Gram_Positive_Rate', 'Difference', 'P_Value', 'Significant']].head(15).to_string(index=False))

# Count statistically significant differences
sig_differences = comparison_df[comparison_df['Significant'] == 'Yes']
print(f"\nStatistically significant differences (p<0.05): {len(sig_differences)} / {len(comparison_df)} antibiotics")
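
Because one test is run per antibiotic, some p-values below 0.05 are expected by chance alone. A common remedy is to control the false discovery rate. The sketch below implements the Benjamini-Hochberg adjustment by hand on made-up p-values; the function name is hypothetical and this step is not part of the analysis above. statsmodels provides the same adjustment through `multipletests` with `method='fdr_bh'`.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p-values for illustration
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
print([round(p, 3) for p in adjusted])
```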

In [None]:
# Visualize Gram comparison
# Select the 20 antibiotics with the largest absolute difference between groups
top_abs = comparison_df.head(20)['Antibiotic'].tolist()

gram_neg_rates = []
gram_pos_rates = []

for ab in top_abs:
    gram_neg_rates.append(comparison_df[comparison_df['Antibiotic'] == ab]['Gram_Negative_Rate'].values[0])
    gram_pos_rates.append(comparison_df[comparison_df['Antibiotic'] == ab]['Gram_Positive_Rate'].values[0])

# Create grouped bar chart
fig, ax = plt.subplots(figsize=(14, 10))

x = np.arange(len(top_abs))
width = 0.35

bars1 = ax.barh(x - width/2, gram_neg_rates, width, label='Gram-negative', 
                color='#e74c3c', alpha=0.8)
bars2 = ax.barh(x + width/2, gram_pos_rates, width, label='Gram-positive', 
                color='#3498db', alpha=0.8)

ax.set_yticks(x)
ax.set_yticklabels(top_abs)
ax.set_xlabel('Resistance Rate (%)', fontsize=12, fontweight='bold')
ax.set_title('Gram-Positive vs Gram-Negative Resistance Rates\n(Top 20 Antibiotics by Difference)', 
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='lower right', fontsize=11)
ax.invert_yaxis()
ax.grid(axis='x', alpha=0.3)

# Add significance markers
for i, ab in enumerate(top_abs):
    sig_status = comparison_df[comparison_df['Antibiotic'] == ab]['Significant'].values[0]
    if sig_status == 'Yes':
        max_val = max(gram_neg_rates[i], gram_pos_rates[i])
        ax.text(max_val + 2, i, '*', fontsize=16, fontweight='bold', color='red')

# Add legend for significance
ax.text(0.98, 0.02, '* p < 0.05', transform=ax.transAxes, 
        fontsize=10, verticalalignment='bottom', horizontalalignment='right',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.savefig('../reports/figures/gram_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("Comparison chart generated")

## 6. Clinical Phenotypes: ESBL, MRSA, and Carbapenem Resistance

Identification of clinically significant resistance phenotypes.

In [None]:
# Identify ESBL producers (Extended-Spectrum Beta-Lactamase)
# ESBL phenotype: Gram-negative bacteria resistant to 3rd gen cephalosporins
# but typically sensitive to carbapenems

esbl_indicators = ['Ceftriaxone', 'Ceftazidime', 'Cefotaxime']
carbapenem_drugs = ['Imipenem', 'Meropenem']

# Find matching columns
esbl_cols = []
for indicator in esbl_indicators:
    matching = [col for col in antibiotic_cols if indicator in col]
    esbl_cols.extend(matching)

carbapenem_cols = []
for carb in carbapenem_drugs:
    matching = [col for col in antibiotic_cols if carb in col]
    carbapenem_cols.extend(matching)

print(f"ESBL indicator antibiotics found: {len(esbl_cols)}")
print(esbl_cols)
print(f"\nCarbapenem antibiotics found: {len(carbapenem_cols)}")
print(carbapenem_cols)

# Identify potential ESBL producers
gram_neg_df = df[df['Gram_Type'] == 'Gram-negative'].copy()

if len(esbl_cols) > 0 and len(gram_neg_df) > 0:
    # Count resistance to ESBL indicators
    gram_neg_df['ESBL_Resistant_Count'] = 0
    for col in esbl_cols:
        gram_neg_df['ESBL_Resistant_Count'] += (gram_neg_df[col + '_Cat'] == 'Resistant').astype(int)
    
    # ESBL suspected if resistant to ≥1 3rd gen cephalosporin
    gram_neg_df['Suspected_ESBL'] = gram_neg_df['ESBL_Resistant_Count'] >= 1
    
    esbl_count = gram_neg_df['Suspected_ESBL'].sum()
    esbl_prevalence = (esbl_count / len(gram_neg_df)) * 100
    
    print(f"\nSuspected ESBL producers: {esbl_count} / {len(gram_neg_df)} ({esbl_prevalence:.1f}%)")
    
    # ESBL by organism
    esbl_by_org = gram_neg_df.groupby('Organism Identified')['Suspected_ESBL'].agg(['sum', 'count'])
    esbl_by_org['Percentage'] = (esbl_by_org['sum'] / esbl_by_org['count'] * 100)
    esbl_by_org = esbl_by_org.sort_values('Percentage', ascending=False)
    
    print("\nESBL Prevalence by Organism:")
    print(esbl_by_org[esbl_by_org['count'] >= 5])  # Min 5 isolates
else:
    print("\nInsufficient data for ESBL analysis")
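
The choice of denominator affects a phenotype prevalence: isolates never tested against any indicator drug cannot be flagged, so including them pulls the estimate down. The sketch below, on made-up isolates, separates "not suspected" from "not tested" and compares the two denominators. The helper `suspected_esbl` and the dictionary layout are hypothetical, and resistance to a third-generation cephalosporin remains a screening criterion, not laboratory confirmation of ESBL production.

```python
def suspected_esbl(isolate, indicators=('Ceftriaxone', 'Ceftazidime', 'Cefotaxime')):
    """Return True/False for tested isolates, None when no indicator was tested."""
    results = [isolate[ab] for ab in indicators if isolate.get(ab) is not None]
    if not results:
        return None
    return any(r == 'Resistant' for r in results)

# Made-up isolates for illustration
isolates = [
    {'Ceftriaxone': 'Resistant', 'Ceftazidime': 'Sensitive'},
    {'Ceftriaxone': 'Sensitive'},
    {},  # no third-generation cephalosporin tested
    {'Cefotaxime': 'Resistant'},
]

flags = [suspected_esbl(iso) for iso in isolates]
tested = [f for f in flags if f is not None]

prevalence_all = sum(f is True for f in flags) / len(flags) * 100
prevalence_tested = sum(tested) / len(tested) * 100
print(f"All isolates: {prevalence_all:.1f}%  Tested only: {prevalence_tested:.1f}%")
```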

In [None]:
# Identify MRSA (Methicillin-Resistant Staphylococcus aureus)
# MRSA identified by resistance to oxacillin/cefoxitin in S. aureus

mrsa_indicators = ['Cefoxitin', 'Oxacillin', 'Methicillin']
mrsa_cols = []
for indicator in mrsa_indicators:
    matching = [col for col in antibiotic_cols if indicator in col]
    mrsa_cols.extend(matching)

print(f"MRSA indicator antibiotics found: {len(mrsa_cols)}")
print(mrsa_cols)

s_aureus_df = df[df['Organism Identified'] == 'S. aureus'].copy()

if len(s_aureus_df) > 0 and len(mrsa_cols) > 0:
    # Check resistance to MRSA indicators
    s_aureus_df['MRSA_Resistant'] = False
    for col in mrsa_cols:
        s_aureus_df['MRSA_Resistant'] |= (s_aureus_df[col + '_Cat'] == 'Resistant')
    
    mrsa_count = s_aureus_df['MRSA_Resistant'].sum()
    mrsa_prevalence = (mrsa_count / len(s_aureus_df)) * 100
    
    print(f"\nMRSA prevalence: {mrsa_count} / {len(s_aureus_df)} S. aureus isolates ({mrsa_prevalence:.1f}%)")
    
    # Clinical significance
    if mrsa_prevalence > 50:
        print("INTERPRETATION: HIGH MRSA prevalence - Consider vancomycin/linezolid for empiric S. aureus coverage")
    elif mrsa_prevalence > 20:
        print("INTERPRETATION: MODERATE MRSA prevalence - Risk-stratify patients for empiric MRSA coverage")
    else:
        print("INTERPRETATION: LOW MRSA prevalence - Beta-lactams may be appropriate for empiric S. aureus coverage")
else:
    print("\nInsufficient data for MRSA analysis")

In [None]:
# Carbapenem resistance analysis
if len(carbapenem_cols) > 0:
    print("\n" + "="*80)
    print("CARBAPENEM RESISTANCE ANALYSIS")
    print("="*80)
    
    for col in carbapenem_cols:
        ab_name = col.split(' - ')[1] if ' - ' in col else col.replace('_', ' ')
        
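        total = df[col + '_Cat'].notna().sum()
        if total == 0:
            # Avoid reporting "no resistance detected" when nothing was tested
            print(f"\n{ab_name}: no test results available")
            continue
        resistant = (df[col + '_Cat'] == 'Resistant').sum()
        resistance_rate = resistant / total * 100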
        
        print(f"\n{ab_name}:")
        print(f"  Total tests: {total}")
        print(f"  Resistant isolates: {resistant}")
        print(f"  Resistance rate: {resistance_rate:.1f}%")
        
        # By organism
        print(f"  Resistance by organism:")
        for organism in major_organisms:
            org_total = df[(df['Organism Identified'] == organism) & (df[col + '_Cat'].notna())].shape[0]
            org_resistant = df[(df['Organism Identified'] == organism) & (df[col + '_Cat'] == 'Resistant')].shape[0]
            if org_total >= 3:
                org_rate = (org_resistant / org_total * 100)
                print(f"    {organism}: {org_resistant}/{org_total} ({org_rate:.1f}%)")
        
        if resistance_rate > 10:
            print(f"  ⚠ WARNING: Elevated carbapenem resistance detected - Consider carbapenemase-producing organisms")
        elif resistance_rate > 0:
            print(f"  ℹ Note: Some carbapenem resistance present - Monitor closely")
        else:
            print(f"  ✓ No carbapenem resistance detected")
else:
    print("\nNo carbapenem testing data available")

## 7. Organism-Specific Treatment Recommendations

Evidence-based empiric therapy recommendations based on local resistance patterns.

In [None]:
# Generate organism-specific treatment recommendations
def generate_treatment_recommendations(organism, resistance_df, sensitivity_threshold=80):
    """
    Generate evidence-based treatment recommendations based on local resistance data.
    
    Parameters:
    - organism: organism name
    - resistance_df: DataFrame with resistance rates
    - sensitivity_threshold: minimum sensitivity rate for first-line recommendation (default 80%)
    """
    print(f"\n{'='*80}")
    print(f"EMPIRIC THERAPY RECOMMENDATIONS: {organism}")
    print(f"{'='*80}")
    
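    # Calculate sensitivity rates from the Sensitive counts, so that
    # Intermediate results are not counted as sensitive
    resistance_df['Sensitivity_Rate'] = resistance_df['Sensitive'] / resistance_df['Total_Tests'] * 100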
    
    # First-line agents (≥80% sensitivity)
    first_line = resistance_df[resistance_df['Sensitivity_Rate'] >= sensitivity_threshold].sort_values(
        'Sensitivity_Rate', ascending=False)
    
    # Second-line agents (60-79% sensitivity)
    second_line = resistance_df[
        (resistance_df['Sensitivity_Rate'] >= 60) & 
        (resistance_df['Sensitivity_Rate'] < sensitivity_threshold)
    ].sort_values('Sensitivity_Rate', ascending=False)
    
    # Avoid (< 60% sensitivity)
    avoid = resistance_df[resistance_df['Sensitivity_Rate'] < 60].sort_values(
        'Sensitivity_Rate', ascending=True)
    
    print(f"\n1. FIRST-LINE EMPIRIC OPTIONS (≥{sensitivity_threshold}% sensitive):")
    if len(first_line) > 0:
        for _, row in first_line.head(5).iterrows():
            print(f"   • {row['Antibiotic']}: {row['Sensitivity_Rate']:.1f}% sensitive (n={row['Total_Tests']})")
    else:
        print(f"   ⚠ No antibiotics meet first-line criteria")
    
    print(f"\n2. SECOND-LINE OPTIONS (60-{sensitivity_threshold-1}% sensitive):")
    if len(second_line) > 0:
        for _, row in second_line.head(5).iterrows():
            print(f"   • {row['Antibiotic']}: {row['Sensitivity_Rate']:.1f}% sensitive (n={row['Total_Tests']})")
    else:
        print(f"   None identified")
    
    print(f"\n3. AVOID EMPIRICALLY (<60% sensitive):")
    if len(avoid) > 0:
        for _, row in avoid.head(5).iterrows():
            print(f"   ✗ {row['Antibiotic']}: only {row['Sensitivity_Rate']:.1f}% sensitive (n={row['Total_Tests']})")
    else:
        print(f"   None identified")
    
    print(f"\n4. CLINICAL RECOMMENDATIONS:")
    
    # Organism-specific guidance
    if organism == 'E. coli':
        print("   • E. coli is the most common uropathogen")
        print("   • For uncomplicated UTI, prioritize oral outpatient options")
        print("   • For complicated/severe infections, consider IV therapy based on sensitivity")
    elif organism == 'Klebsiella':
        print("   • Klebsiella often associated with healthcare settings")
        print("   • High risk for ESBL production - check 3rd gen cephalosporin resistance")
        print("   • If ESBL suspected, carbapenems are typically preferred")
    elif organism == 'S. aureus':
        print("   • Consider MRSA risk factors (prior hospitalization, IV drug use, etc.)")
        print("   • If MRSA suspected: vancomycin, daptomycin, or linezolid")
        print("   • If MSSA (methicillin-sensitive): nafcillin, oxacillin, or cefazolin")
    elif organism == 'Pseudomonas':
        print("   • Intrinsically resistant to many antibiotics")
        print("   • Requires anti-pseudomonal agents")
        print("   • Common options: piperacillin/tazobactam, cefepime, meropenem, ciprofloxacin")
    
    print("\n   ⚠ IMPORTANT: These are empiric recommendations based on local resistance patterns.")
    print("     Always obtain cultures and adjust therapy based on susceptibility results.")
    print("     Consider patient-specific factors: allergies, renal function, prior antibiotics.")

# Generate recommendations for each major organism
for organism, res_df in organism_resistance.items():
    if len(res_df) >= 5:  # Minimum antibiotics for meaningful recommendations
        generate_treatment_recommendations(organism, res_df.copy())
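
The tiering rule can be isolated from the printing logic, which makes the boundaries easy to check. The sketch below uses the same 80% and 60% cut-offs; the function name `therapy_tier` and the counts are made up for illustration. It also shows that when intermediate results are present, the proportion sensitive and "100 minus the resistance rate" can land in different tiers.

```python
def therapy_tier(sensitivity_rate, first_line=80, second_line=60):
    """Classify a sensitivity rate (in percent) into an empiric therapy tier."""
    if not 0 <= sensitivity_rate <= 100:
        raise ValueError("sensitivity_rate must be between 0 and 100")
    if sensitivity_rate >= first_line:
        return 'first-line'
    if sensitivity_rate >= second_line:
        return 'second-line'
    return 'avoid'

# Made-up counts: 15 sensitive, 2 intermediate, 3 resistant
sensitive, intermediate, resistant = 15, 2, 3
total = sensitive + intermediate + resistant

proportion_sensitive = sensitive / total * 100      # 75.0
complement_of_resistance = 100 - resistant / total * 100  # 85.0

print(therapy_tier(proportion_sensitive))
print(therapy_tier(complement_of_resistance))
```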

## 8. Summary Statistics and Key Findings

In [None]:
# Comprehensive summary report
print("\n" + "="*80)
print("ORGANISM-SPECIFIC RESISTANCE PATTERNS: EXECUTIVE SUMMARY")
print("="*80)

print("\n1. ORGANISM DISTRIBUTION")
print("-" * 80)
org_counts = df['Organism Identified'].value_counts()
for org, count in org_counts.items():
    pct = count / len(df) * 100
    print(f"   {org}: {count} isolates ({pct:.1f}%)")

print("\n2. GRAM CLASSIFICATION")
print("-" * 80)
gram_counts = df['Gram_Type'].value_counts()
for gram_type, count in gram_counts.items():
    pct = count / len(df) * 100
    print(f"   {gram_type}: {count} isolates ({pct:.1f}%)")

print("\n3. MEAN RESISTANCE RATES BY ORGANISM")
print("-" * 80)
for organism, res_df in organism_resistance.items():
    if len(res_df) > 0:
        mean_res = res_df['Resistance_Rate'].mean()
        print(f"   {organism}: {mean_res:.1f}% average resistance across {len(res_df)} antibiotics")

print("\n4. GRAM-TYPE COMPARISON")
print("-" * 80)
for gram_type, res_df in gram_resistance.items():
    mean_res = res_df['Resistance_Rate'].mean()
    print(f"   {gram_type}: {mean_res:.1f}% average resistance")

print("\n5. CLINICALLY SIGNIFICANT RESISTANCE PHENOTYPES")
print("-" * 80)
if 'esbl_prevalence' in locals():
    print(f"   Suspected ESBL: {esbl_prevalence:.1f}% of Gram-negative isolates")
if 'mrsa_prevalence' in locals():
    print(f"   MRSA: {mrsa_prevalence:.1f}% of S. aureus isolates")

print("\n6. ORGANISMS OF GREATEST CONCERN")
print("-" * 80)
# Identify organisms with highest mean resistance
org_mean_resistance = []
for organism, res_df in organism_resistance.items():
    if len(res_df) >= 5:  # Minimum for reliable estimate
        mean_res = res_df['Resistance_Rate'].mean()
        org_mean_resistance.append((organism, mean_res))

org_mean_resistance.sort(key=lambda x: x[1], reverse=True)
for i, (org, mean_res) in enumerate(org_mean_resistance[:3], 1):
    print(f"   {i}. {org}: {mean_res:.1f}% average resistance")

print("\n7. KEY CLINICAL IMPLICATIONS")
print("-" * 80)
print("   • Organism identification is critical for appropriate empiric therapy selection")
print("   • Resistance patterns vary significantly between organisms")
print("   • Gram-negative organisms show different resistance profiles than Gram-positive")
print("   • Local antibiogram data should guide institution-specific treatment algorithms")
print("   • Empiric broad-spectrum therapy may be necessary for critically ill patients")
print("   • De-escalation based on culture results is essential for antimicrobial stewardship")

print("\n" + "="*80)
print("Analysis completed: " + pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
print("="*80)

## 9. Export Results

In [None]:
# Export organism-specific resistance data
output_data = []

for organism, res_df in organism_resistance.items():
    for _, row in res_df.iterrows():
        output_data.append({
            'Organism': organism,
            'Antibiotic': row['Antibiotic'],
            'Total_Tests': row['Total_Tests'],
            'Resistant': row['Resistant'],
            'Sensitive': row['Sensitive'],
            'Resistance_Rate': row['Resistance_Rate']
        })

organism_resistance_export = pd.DataFrame(output_data)
organism_resistance_export.to_csv('../data/processed/organism_resistance_profiles.csv', index=False)
print("Organism-specific resistance profiles saved to: data/processed/organism_resistance_profiles.csv")

# Export Gram comparison
if len(comparison_df) > 0:
    comparison_df.to_csv('../data/processed/gram_comparison.csv', index=False)
    print("Gram-type comparison saved to: data/processed/gram_comparison.csv")

# Export resistance matrix
resistance_matrix.to_csv('../data/processed/organism_antibiotic_matrix.csv')
print("Resistance matrix saved to: data/processed/organism_antibiotic_matrix.csv")

print("\nAll results exported successfully.")
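
Both `to_csv` and `savefig` fail if the target directory does not exist, which is common on a fresh checkout. A small sketch of a guard follows; the helper name `ensure_parent_dir` is hypothetical, and the example runs in a temporary directory so that nothing is written to the project.

```python
from pathlib import Path
import tempfile

def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed and return it as a Path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

# Demonstrated in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(Path(tmp) / 'reports' / 'figures' / 'example.png')
    print(target.parent.is_dir())
```

In the notebook, the returned path could be passed straight to the export call, for example `df.to_csv(ensure_parent_dir('../data/processed/out.csv'))`, where the file name is only a placeholder.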

---

## Conclusions

### Summary of Organism-Specific Resistance Patterns

This analysis has revealed important organism-specific resistance patterns that should inform empiric antimicrobial therapy selection:

#### Key Findings:

1. **Significant Inter-Organism Variability**
   - Resistance rates vary substantially between different bacterial species
   - Generic "broad-spectrum" approaches may not be optimal for all pathogens
   - Organism identification (even presumptive) is critical for empiric therapy selection

2. **Gram-Positive vs Gram-Negative Differences**
   - Distinct resistance profiles between Gram-positive and Gram-negative organisms
   - Different antibiotic classes appropriate for each group
   - Some antibiotics show statistically significant differences in resistance rates between the two groups

3. **Emergence of Resistance Phenotypes**
   - ESBL-producing organisms present therapeutic challenges
   - MRSA prevalence impacts empiric S. aureus coverage decisions
   - Carbapenem resistance represents a critical threat wherever it is detected

4. **Clinical Implications**
   - Local resistance data should guide empiric therapy protocols
   - Targeted therapy based on organism identification is preferable to broad empiric coverage
   - Culture-directed de-escalation is essential for antimicrobial stewardship
   - Some organism-antibiotic combinations show unacceptably high resistance rates

### Recommendations for Clinical Practice:

1. **Develop organism-specific treatment algorithms** based on local resistance patterns
2. **Obtain cultures before initiating therapy** when clinically feasible
3. **Use Gram stain results** to guide initial antibiotic selection
4. **Monitor for emergence of resistance phenotypes** (ESBL, MRSA, carbapenemase producers)
5. **Implement antimicrobial stewardship protocols** to preserve antibiotic effectiveness
6. **Regularly update antibiograms** to reflect changing resistance patterns

### Study Limitations:

- Sample sizes vary by organism-antibiotic combination
- Single-center data may not generalize to other institutions
- Observational design cannot establish causality
- Missing data for some organism-antibiotic pairs limits comprehensive comparisons

### Future Directions:

- Longitudinal tracking of resistance trends over time
- Correlation with antibiotic usage data to identify stewardship opportunities
- Molecular characterization of resistance mechanisms
- Outcome analysis linking empiric therapy appropriateness to clinical outcomes

---

*Analysis completed using Python scientific computing libraries (pandas, numpy, scipy, statsmodels, matplotlib, seaborn)*

*For questions or clarifications, consult with Infectious Diseases or Antimicrobial Stewardship teams*