# CDMSCI-199: Add Missing Exchanges and Re-run FBA

## Objective

Systematically add the 33 missing exchange reactions identified in Notebook 03 and re-run FBA simulations to quantify improvement.

## Background

From Notebooks 02 and 03, we identified:
- 571 False Negatives (all with biomass_flux = 0)
- 33 missing exchange reactions across 28 organisms for:
  - Fe2+ (cpd10515): 14 organisms missing
  - Ni2+ (cpd00244): 14 organisms missing
  - Molybdate (cpd11574): 5 organisms missing

Missing exchanges correlate with higher FN rates:
- Fe2+: 12.5% higher FN rate
- Molybdate: 19.2% higher FN rate
- Ni2+: 5.0% higher FN rate

## Approach

1. Load missing exchanges list
2. For each affected organism:
   - Load gap-filled model
   - Add missing exchange reaction(s)
   - Save corrected model
3. Re-run FBA simulations for ALL organism-carbon combinations (to avoid selective bias)
4. Compare before/after performance

## Expected Outcomes

- Reduce FN count from 571 to ~230-285
- Improve recall from 43% to ~65-75%
- Improve accuracy from 63% to ~78-82%
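
As a rough consistency check on these targets, recall can be projected from the FN count alone. The sketch below assumes that the number of experimentally positive cases (TP + FN) stays fixed, so every removed false negative becomes a true positive. The helper name `projected_recall` is illustrative and not part of the notebook.

```python
def projected_recall(fn_before, recall_before, fn_after):
    """Estimate recall after the false-negative count changes.

    Assumption: the number of experimentally positive cases (TP + FN)
    is unchanged, so each removed FN turns into a TP.
    """
    if not 0 <= recall_before < 1:
        raise ValueError("recall_before must be in [0, 1)")
    positives = fn_before / (1.0 - recall_before)  # TP + FN
    return 1.0 - fn_after / positives


# Baseline figures quoted above: 571 FN at roughly 43% recall
for fn_target in (285, 230):
    estimate = projected_recall(571, 0.43, fn_target)
    print(f"FN = {fn_target}: projected recall = {estimate:.1%}")
```

The estimate is only as good as the rounded inputs, and it says nothing about changes in false positives.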

## Setup and Imports

In [10]:
import pandas as pd
import numpy as np
import cobra
from pathlib import Path
import json
import warnings
warnings.filterwarnings('ignore')

print(f"COBRApy version: {cobra.__version__}")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

COBRApy version: 0.29.1
Pandas version: 2.2.2
NumPy version: 1.26.4


## Load Missing Exchanges Data

In [11]:
# Load missing exchanges details
missing_exchanges = pd.read_csv('results/missing_exchanges_details.csv')

print(f"Missing exchanges: {len(missing_exchanges)}")
print(f"\nBreakdown by compound:")
print(missing_exchanges.groupby('compound_name')['orgId'].count())
print(f"\nUnique organisms affected: {missing_exchanges['orgId'].nunique()}")
print(f"\nFirst 10 missing exchanges:")
print(missing_exchanges.head(10))

Missing exchanges: 33

Breakdown by compound:
compound_name
Fe2+         14
Molybdate     5
Ni2+         14
Name: orgId, dtype: int64

Unique organisms affected: 28

First 10 missing exchanges:
                 orgId                                organism compound_id  \
0                 ANA3                    Shewanella sp. ANA-3    cpd00244   
1                BFirm          Burkholderia phytofirmans PsJN    cpd10515   
2               Bifido           Bifidobacterium breve UCC2003    cpd10515   
3               Bifido           Bifidobacterium breve UCC2003    cpd11574   
4                Brev2    Brevundimonas sp. GW460-12-10-14-LB2    cpd00244   
5                Brev2    Brevundimonas sp. GW460-12-10-14-LB2    cpd11574   
6               Btheta   Bacteroides thetaiotaomicron VPI-5482    cpd00244   
7               Btheta   Bacteroides thetaiotaomicron VPI-5482    cpd11574   
8              Burk376  Paraburkholderia bryophila 376MFSha3.1    cpd10515   
9  Burkholderia_OAS925    

## Define Exchange Reaction IDs

ModelSEED exchange reactions follow the pattern: `EX_{compound_id}_e0`

In [12]:
# Define exchange reaction IDs for each compound
exchange_reaction_ids = {
    'cpd10515': 'EX_cpd10515_e0',  # Fe2+
    'cpd00244': 'EX_cpd00244_e0',  # Ni2+
    'cpd11574': 'EX_cpd11574_e0',  # Molybdate
}

# Map compound IDs to names for reporting
compound_names = {
    'cpd10515': 'Fe2+',
    'cpd00244': 'Ni2+',
    'cpd11574': 'Molybdate',
}

print("Exchange reactions to add:")
for cpd_id, rxn_id in exchange_reaction_ids.items():
    name = compound_names[cpd_id]
    print(f"  {name} ({cpd_id}): {rxn_id}")

Exchange reactions to add:
  Fe2+ (cpd10515): EX_cpd10515_e0
  Ni2+ (cpd00244): EX_cpd00244_e0
  Molybdate (cpd11574): EX_cpd11574_e0
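
The same mapping could also be derived from the naming pattern instead of being typed out by hand. This is a minimal sketch: the helper `exchange_id_for` is hypothetical, and the check that compound IDs look like `cpd` followed by five digits is an assumption based on the three IDs used here.

```python
import re


def exchange_id_for(compound_id, compartment="e0"):
    """Build an exchange reaction ID following the EX_{compound_id}_e0 pattern."""
    # Assumption: compound IDs have the form 'cpd' + five digits
    if not re.fullmatch(r"cpd\d{5}", compound_id):
        raise ValueError(f"Unexpected compound ID: {compound_id!r}")
    return f"EX_{compound_id}_{compartment}"


compound_names = {
    "cpd10515": "Fe2+",
    "cpd00244": "Ni2+",
    "cpd11574": "Molybdate",
}

# Derive the exchange IDs from the compound IDs
exchange_reaction_ids = {cpd: exchange_id_for(cpd) for cpd in compound_names}
print(exchange_reaction_ids)
```

Deriving the IDs keeps the two dictionaries in sync if more compounds are added later.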


## Load Organism Metadata

In [13]:
# Load organism metadata
organism_metadata = pd.read_csv('results/organism_metadata.csv')

print(f"Total organisms: {len(organism_metadata)}")
print(f"\nColumns: {list(organism_metadata.columns)}")
print(f"\nFirst 5 organisms:")
print(organism_metadata[['orgId', 'organism', 'genome_id']].head())

Total organisms: 44

Columns: ['orgId', 'organism', 'genome_id']

First 5 organisms:
             orgId                               organism        genome_id
0  acidovorax_3H11              Acidovorax sp. GW101-3H11  acidovorax_3H11
1           azobra          Azospirillum brasilense Sp245           azobra
2           Btheta  Bacteroides thetaiotaomicron VPI-5482           Btheta
3           Bifido          Bifidobacterium breve UCC2003           Bifido
4            Brev2   Brevundimonas sp. GW460-12-10-14-LB2            Brev2


## Create Output Directory for Corrected Models

In [14]:
# Create directory for models with missing exchanges added
models_with_exchanges_dir = Path('models_missing_exchanges')
models_with_exchanges_dir.mkdir(exist_ok=True)

print(f"Models with missing exchanges added will be saved to: {models_with_exchanges_dir}")

Models with missing exchanges added will be saved to: models_missing_exchanges


## Function to Add Exchange Reaction

Add a new exchange reaction to a model if it doesn't already exist.

In [11]:
def add_exchange_reaction(model, compound_id, exchange_rxn_id, compound_name):
    """
    Add an exchange reaction to a COBRA model.
    
    Parameters:
    -----------
    model : cobra.Model
        The model to modify
    compound_id : str
        ModelSEED compound ID (e.g., 'cpd10515')
    exchange_rxn_id : str
        Exchange reaction ID (e.g., 'EX_cpd10515_e0')
    compound_name : str
        Human-readable name (e.g., 'Fe2+')
    
    Returns:
    --------
    bool : True if added, False if already exists
    """
    # Check if exchange already exists
    if exchange_rxn_id in model.reactions:
        return False
    
    # Find the metabolite (usually in extracellular compartment _e0)
    metabolite_id = f"{compound_id}_e0"
    
    if metabolite_id not in model.metabolites:
        # Metabolite doesn't exist, create it
        met = cobra.Metabolite(
            id=metabolite_id,
            name=compound_name,
            compartment='e0'
        )
        # Note: COBRA will add the metabolite when we add the reaction
    else:
        met = model.metabolites.get_by_id(metabolite_id)
    
    # Create exchange reaction
    # Exchange reactions allow uptake and secretion: metabolite_e0 <=>
    exchange_rxn = cobra.Reaction(
        id=exchange_rxn_id,
        name=f"{compound_name} exchange",
        lower_bound=-100.0,  # Allow uptake up to 100 mmol/gDW/h
        upper_bound=100.0     # Allow secretion up to 100 mmol/gDW/h
    )
    
    # Add metabolite to reaction (exchange reactions have single metabolite)
    exchange_rxn.add_metabolites({met: -1.0})  # Coefficient -1: negative flux = uptake, positive flux = secretion
    
    # Add reaction to model
    model.add_reactions([exchange_rxn])
    
    return True

print("Function defined: add_exchange_reaction()")

Function defined: add_exchange_reaction()
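
The sign convention used in `add_exchange_reaction()` can be checked without COBRApy. The sketch below is plain Python: `net_production` is a hypothetical helper that multiplies each stoichiometric coefficient by the reaction flux, which is how a single reaction contributes to a metabolite's mass balance.

```python
def net_production(stoichiometry, flux):
    """Net production rate of each metabolite for one reaction.

    stoichiometry maps metabolite ID -> coefficient; flux is the reaction flux.
    """
    return {met: coeff * flux for met, coeff in stoichiometry.items()}


# Same stoichiometry as the exchange reaction built above: metabolite_e0 <=>
exchange = {"cpd10515_e0": -1.0}

# Negative flux: the extracellular pool gains metabolite (uptake into the system)
print(net_production(exchange, flux=-5.0))

# Positive flux: the extracellular pool loses metabolite (secretion out of the system)
print(net_production(exchange, flux=5.0))
```

With a coefficient of -1, the lower bound of -100 therefore caps uptake and the upper bound of 100 caps secretion.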


## Add Missing Exchanges to Models

Process each organism and add missing exchange reactions.

In [None]:
# Track corrections
correction_log = []

# Get unique organisms that need corrections
organisms_to_correct = missing_exchanges['orgId'].unique()

print(f"Processing {len(organisms_to_correct)} organisms...")
print("="*80)

for org_id in organisms_to_correct:
    # Get organism info
    org_info = organism_metadata[organism_metadata['orgId'] == org_id]
    if len(org_info) == 0:
        print(f"WARNING: Organism {org_id} not found in metadata")
        continue
    
    organism_name = org_info.iloc[0]['organism']
    genome_id = org_info.iloc[0]['genome_id']
    
    # Load gap-filled model
    model_path = Path(f"../CDMSCI-198-build-models/models/{genome_id}_gapfilled.json")
    
    if not model_path.exists():
        print(f"WARNING: Model not found: {model_path}")
        continue
    
    print(f"\n{organism_name} ({org_id}):")
    
    # Load model
    model = cobra.io.load_json_model(str(model_path))
    original_num_reactions = len(model.reactions)
    
    # Get missing compounds for this organism
    missing_for_org = missing_exchanges[missing_exchanges['orgId'] == org_id]
    
    exchanges_added = []
    
    # Add each missing exchange
    for idx, row in missing_for_org.iterrows():
        cpd_id = row['compound_id']
        cpd_name = row['compound_name']
        exchange_rxn_id = exchange_reaction_ids[cpd_id]
        
        added = add_exchange_reaction(model, cpd_id, exchange_rxn_id, cpd_name)
        
        if added:
            exchanges_added.append(cpd_name)
            print(f"  + Added: {exchange_rxn_id} ({cpd_name})")
        else:
            print(f"  - Already exists: {exchange_rxn_id} ({cpd_name})")
    
    # Save corrected model
    corrected_model_path = models_with_exchanges_dir / f"{genome_id}_gapfilled_corrected.json"
    cobra.io.save_json_model(model, str(corrected_model_path))
    
    new_num_reactions = len(model.reactions)
    
    print(f"  Reactions: {original_num_reactions} → {new_num_reactions} (+{new_num_reactions - original_num_reactions})")
    print(f"  Saved to: {corrected_model_path.name}")
    
    # Log correction
    correction_log.append({
        'orgId': org_id,
        'organism': organism_name,
        'genome_id': genome_id,
        'exchanges_added': ', '.join(exchanges_added),
        'num_exchanges_added': len(exchanges_added),
        'original_reactions': original_num_reactions,
        'corrected_reactions': new_num_reactions,
    })

print("\n" + "="*80)
print(f"\nCompleted! Corrected {len(correction_log)} models.")

Processing 28 organisms...

Shewanella sp. ANA-3 (ANA3):
  + Added: EX_cpd00244_e0 (Ni2+)
  Reactions: 1348 → 1349 (+1)
  Saved to: ANA3_gapfilled_corrected.json

Burkholderia phytofirmans PsJN (BFirm):
  + Added: EX_cpd10515_e0 (Fe2+)
  Reactions: 1568 → 1569 (+1)
  Saved to: BFirm_gapfilled_corrected.json

Bifidobacterium breve UCC2003 (Bifido):


## Save Correction Log

In [None]:
# Save correction log
correction_df = pd.DataFrame(correction_log)
correction_df.to_csv('results/model_corrections_log.csv', index=False)

print("Model correction log:")
print(correction_df.to_string(index=False))
print(f"\nSaved to: results/model_corrections_log.csv")

## Re-run FBA Simulations

Run FBA for all organism-carbon combinations using models with missing exchanges added.

**Important**: We re-run ALL simulations (not just affected ones) to avoid selective bias in the comparison.
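
One way to cover every organism while still picking up corrected models is to fall back to the original file whenever no corrected file exists. The sketch below uses the file-name patterns from this notebook; the helper name `select_model_path` and the genome ID `example_genome` are illustrative.

```python
from pathlib import Path


def select_model_path(genome_id, corrected_dir, original_dir):
    """Return (path, model_type), preferring the corrected model if it exists."""
    corrected = Path(corrected_dir) / f"{genome_id}_gapfilled_corrected.json"
    if corrected.exists():
        return corrected, "corrected"
    return Path(original_dir) / f"{genome_id}_gapfilled.json", "original"


# 'example_genome' is a placeholder, not an organism from the dataset
path, model_type = select_model_path(
    "example_genome",
    "models_missing_exchanges",
    "../CDMSCI-198-build-models/models",
)
print(model_type, path)
```

Recording `model_type` alongside each result makes it possible to confirm later that unmodified models give the same predictions as before.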

In [8]:
# Load simulatable sources (already filtered in Notebook 01)
simulatable = pd.read_csv('results/simulatable_carbon_sources.csv')

print(f"Carbon sources to simulate: {len(simulatable)}")
print(f"Organisms to simulate: {len(organism_metadata)}")
print(f"Total simulations: {len(simulatable) * len(organism_metadata):,}")

Carbon sources to simulate: 121
Organisms to simulate: 44
Total simulations: 5,324


## Run Simulations with Corrected Models

In [15]:
import time
from cobra.exceptions import OptimizationError

# Storage for results
results_corrected = []

# Media directory
media_dir = Path('../CDMSCI-197-media-formulations/media')

# Track progress
total_sims = len(organism_metadata) * len(simulatable)
start_time = time.time()

print(f"Starting FBA simulations with CORRECTED models...")
print(f"Total simulations: {total_sims:,}")
print("="*80)

sim_count = 0

for org_idx, org_row in organism_metadata.iterrows():
    org_id = org_row['orgId']
    organism = org_row['organism']
    genome_id = org_row['genome_id']
    
    # Load corrected model if it exists, otherwise use original
    corrected_model_path = models_with_exchanges_dir / f"{genome_id}_gapfilled_corrected.json"
    original_model_path = Path(f"../CDMSCI-198-build-models/models/{genome_id}_gapfilled.json")
    
    if corrected_model_path.exists():
        model_path = corrected_model_path
        model_type = 'corrected'
    else:
        model_path = original_model_path
        model_type = 'original'
    
    # Load model
    model = cobra.io.load_json_model(str(model_path))
    
    # Run FBA for each carbon source
    for cs_idx, cs_row in simulatable.iterrows():
        carbon_source = cs_row['carbon_source']
        media_filename = cs_row['media_file']
        media_path = media_dir / media_filename
        
        # Load media
        with open(media_path, 'r') as f:
            media = json.load(f)
        
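        # Apply media and run FBA inside a context manager so that bound
        # changes are reverted before the next carbon source is simulated
        with model:
            for rxn_id, bounds in media.items():
                if rxn_id in model.reactions:
                    rxn = model.reactions.get_by_id(rxn_id)
                    # Set both bounds at once to avoid a transient lower > upper error
                    rxn.bounds = (bounds[0], bounds[1])

            # Run FBA
            try:
                solution = model.optimize()
                status = solution.status
                # A non-optimal solve (e.g. infeasible) has no meaningful objective value
                biomass_flux = solution.objective_value if status == 'optimal' else 0.0
            except OptimizationError:
                biomass_flux = 0.0
                status = 'failed'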
        
        # Prediction (threshold 0.001)
        prediction = 1 if biomass_flux > 0.001 else 0
        
        # Store result
        results_corrected.append({
            'organism': organism,
            'orgId': org_id,
            'carbon_source': carbon_source,
            'media_filename': media_filename,
            'biomass_flux': biomass_flux,
            'status': status,
            'prediction': prediction,
            'model_type': model_type,
        })
        
        sim_count += 1
    
    # Progress update every 5 organisms
    if (org_idx + 1) % 5 == 0:
        elapsed = time.time() - start_time
        rate = sim_count / elapsed
        remaining = (total_sims - sim_count) / rate
        print(f"Progress: {sim_count:,}/{total_sims:,} ({100*sim_count/total_sims:.1f}%) - "
              f"Elapsed: {elapsed/60:.1f}min, Remaining: {remaining/60:.1f}min")

elapsed_time = time.time() - start_time

print("\n" + "="*80)
print(f"Simulations complete!")
print(f"Total time: {elapsed_time/60:.1f} minutes ({elapsed_time/3600:.2f} hours)")
print(f"Simulations: {len(results_corrected):,}")

Starting FBA simulations with CORRECTED models...
Total simulations: 5,324


KeyError: 'carbon_source'
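
The `KeyError` means that `results/simulatable_carbon_sources.csv` has no column named `carbon_source`. A quick way to diagnose this kind of failure is to compare the CSV header with the columns the loop expects before starting the simulations. The sketch below uses an in-memory stand-in for the file; its header is invented for illustration, since the real column names are not shown in this notebook, and `missing_columns` is a hypothetical helper.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from a CSV header."""
    header = next(csv.reader(csv_file), [])
    return [name for name in required if name not in header]


# Hypothetical stand-in for results/simulatable_carbon_sources.csv
sample = io.StringIO("Carbon_Source,media_file\nD-Glucose,D-Glucose.json\n")

# Columns accessed inside the simulation loop
print(missing_columns(sample, ["carbon_source", "media_file"]))
```

Running an equivalent check against the real file (or printing `simulatable.columns`) shows which name the loop should use.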

## Save Corrected Results

In [None]:
# Convert to DataFrame and save
results_corrected_df = pd.DataFrame(results_corrected)
results_corrected_df.to_csv('results/fba_simulation_results_corrected.csv', index=False)

print(f"Saved corrected results: results/fba_simulation_results_corrected.csv")
print(f"\nResults shape: {results_corrected_df.shape}")
print(f"\nFirst 5 rows:")
print(results_corrected_df.head())

## Corrected Model Performance: Overall Metrics

Calculate confusion matrix and metrics for models with missing exchanges added.

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

# Load experimental data
experimental_data = pd.read_csv('../CDMSCI-196-carbon-sources/results/combined_growth_matrix_filtered.csv', index_col=0)

# Convert to binary
experimental_binary = experimental_data.replace({'Growth': 1, 'No Growth': 0})

print(f"Experimental data shape: {experimental_binary.shape}")
print(f"Valid experimental data points: {experimental_binary.notna().sum().sum():,}")

In [None]:
# Create corrected prediction matrix
fba_corrected_matrix = results_corrected_df.pivot(
    index='carbon_source', 
    columns='organism', 
    values='prediction'
)

print(f"Corrected prediction matrix shape: {fba_corrected_matrix.shape}")

In [None]:
# Align matrices
common_sources = experimental_binary.index.intersection(fba_corrected_matrix.index)
common_organisms = experimental_binary.columns.intersection(fba_corrected_matrix.columns)

exp_aligned = experimental_binary.loc[common_sources, common_organisms]
fba_corrected_aligned = fba_corrected_matrix.loc[common_sources, common_organisms]

print(f"Aligned shape: {exp_aligned.shape}")
print(f"Valid comparisons: {exp_aligned.notna().sum().sum():,}")

In [None]:
# Extract valid comparison data for models with missing exchanges added
exp_flat = exp_aligned.values.flatten()
fba_corrected_flat = fba_corrected_aligned.values.flatten()

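# Keep only positions where both the experimental value and the prediction are present
valid_mask = ~(pd.isna(exp_flat) | pd.isna(fba_corrected_flat))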

y_true_corrected = exp_flat[valid_mask].astype(int)
y_pred_corrected = fba_corrected_flat[valid_mask].astype(int)

print(f"Valid comparisons: {len(y_true_corrected):,}")

In [None]:
# Calculate corrected metrics
# labels=[0, 1] guarantees a 2x2 matrix even if one class is absent
cm_corrected = confusion_matrix(y_true_corrected, y_pred_corrected, labels=[0, 1])
tn_c, fp_c, fn_c, tp_c = cm_corrected.ravel()

accuracy_c = accuracy_score(y_true_corrected, y_pred_corrected)
precision_c = precision_score(y_true_corrected, y_pred_corrected, zero_division=0)
recall_c = recall_score(y_true_corrected, y_pred_corrected, zero_division=0)
f1_c = f1_score(y_true_corrected, y_pred_corrected, zero_division=0)
specificity_c = tn_c / (tn_c + fp_c) if (tn_c + fp_c) > 0 else 0

print("="*80)
print("CORRECTED MODEL PERFORMANCE")
print("="*80)
print()
print(f"                      Predicted")
print(f"                 No Growth    Growth")
print(f"Experimental")
print(f"  No Growth      {tn_c:6d}      {fp_c:6d}")
print(f"  Growth         {fn_c:6d}      {tp_c:6d}")
print()
print(f"True Negatives (TN):  {tn_c:,}")
print(f"False Positives (FP): {fp_c:,}")
print(f"False Negatives (FN): {fn_c:,}")
print(f"True Positives (TP):  {tp_c:,}")
print()
print(f"Accuracy:    {accuracy_c:.4f} ({accuracy_c*100:.2f}%)")
print(f"Precision:   {precision_c:.4f} ({precision_c*100:.2f}%)")
print(f"Recall:      {recall_c:.4f} ({recall_c*100:.2f}%)")
print(f"F1-Score:    {f1_c:.4f}")
print(f"Specificity: {specificity_c:.4f} ({specificity_c*100:.2f}%)")
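
For reference, the metrics above reduce to simple ratios of the four confusion-matrix counts. The following is a dependency-free sketch on toy labels (not project data); the function names are illustrative and the zero-denominator handling mirrors the `zero_division=0` choice made above.

```python
def binary_confusion(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels, 1 = growth and 0 = no growth."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    tn, fp, fn, tp = binary_confusion(y_true, y_pred)
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1_score": safe_ratio(2 * precision * recall, precision + recall),
        "specificity": safe_ratio(tn, tn + fp),
    }


# Toy labels for illustration only
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(binary_confusion(y_true, y_pred))
print(classification_metrics(y_true, y_pred))
```

Recall is the metric most sensitive to false negatives, which is why it is the main target of the exchange corrections.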

## Compare Before vs After

Load original results and compare side-by-side.

In [None]:
# Load original metrics
with open('results/classification_metrics.json', 'r') as f:
    original_metrics = json.load(f)

# Create comparison
comparison = pd.DataFrame({
    'Metric': ['True Positives', 'True Negatives', 'False Positives', 'False Negatives', 
               'Accuracy', 'Precision', 'Recall', 'F1-Score', 'Specificity'],
    'Original': [
        original_metrics['true_positives'],
        original_metrics['true_negatives'],
        original_metrics['false_positives'],
        original_metrics['false_negatives'],
        f"{original_metrics['accuracy']:.4f}",
        f"{original_metrics['precision']:.4f}",
        f"{original_metrics['recall']:.4f}",
        f"{original_metrics['f1_score']:.4f}",
        f"{original_metrics['specificity']:.4f}",
    ],
    'Corrected': [
        tp_c,
        tn_c,
        fp_c,
        fn_c,
        f"{accuracy_c:.4f}",
        f"{precision_c:.4f}",
        f"{recall_c:.4f}",
        f"{f1_c:.4f}",
        f"{specificity_c:.4f}",
    ],
})

# Calculate changes
comparison['Change'] = [
    f"{tp_c - original_metrics['true_positives']:+d}",
    f"{tn_c - original_metrics['true_negatives']:+d}",
    f"{fp_c - original_metrics['false_positives']:+d}",
    f"{fn_c - original_metrics['false_negatives']:+d}",
    f"{accuracy_c - original_metrics['accuracy']:+.4f}",
    f"{precision_c - original_metrics['precision']:+.4f}",
    f"{recall_c - original_metrics['recall']:+.4f}",
    f"{f1_c - original_metrics['f1_score']:+.4f}",
    f"{specificity_c - original_metrics['specificity']:+.4f}",
]

print("="*80)
print("BEFORE vs AFTER COMPARISON")
print("="*80)
print()
print(comparison.to_string(index=False))
print()

# Calculate improvement percentages
fn_reduction = original_metrics['false_negatives'] - fn_c
fn_reduction_pct = 100 * fn_reduction / original_metrics['false_negatives']
recall_improvement = recall_c - original_metrics['recall']
recall_improvement_pct = 100 * recall_improvement / original_metrics['recall']
accuracy_improvement = accuracy_c - original_metrics['accuracy']
accuracy_improvement_pct = 100 * accuracy_improvement / original_metrics['accuracy']

print("KEY IMPROVEMENTS:")
# Negate the reduction so the sign matches the direction of change (after - before)
print(f"  False Negatives: {original_metrics['false_negatives']} → {fn_c} ({-fn_reduction:+d}, {-fn_reduction_pct:+.1f}%)")
print(f"  Recall: {original_metrics['recall']:.4f} → {recall_c:.4f} ({recall_improvement:+.4f}, {recall_improvement_pct:+.1f}%)")
print(f"  Accuracy: {original_metrics['accuracy']:.4f} → {accuracy_c:.4f} ({accuracy_improvement:+.4f}, {accuracy_improvement_pct:+.1f}%)")
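
The report mixes two kinds of change: an absolute difference (for rates, this is in percentage points once multiplied by 100) and a relative change with respect to the original value. A small sketch with toy numbers shows the distinction; `describe_change` is an illustrative helper, not part of the notebook.

```python
def describe_change(before, after):
    """Return (absolute change, relative change in percent of the original value)."""
    absolute = after - before
    relative_pct = 100.0 * absolute / before if before else float("nan")
    return absolute, relative_pct


# Toy values: a rate that rises from 0.50 to 0.75
absolute, relative_pct = describe_change(0.50, 0.75)
print(f"Absolute: {absolute:+.4f} ({100 * absolute:+.1f} percentage points)")
print(f"Relative: {relative_pct:+.1f}% of the original value")
```

In this toy case a gain of 25 percentage points is a 50% relative improvement, so it helps to state which one is meant when quoting results.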

## Save Corrected Metrics

In [None]:
# Save corrected metrics
corrected_metrics = {
    'total_comparisons': int(len(y_true_corrected)),
    'true_positives': int(tp_c),
    'true_negatives': int(tn_c),
    'false_positives': int(fp_c),
    'false_negatives': int(fn_c),
    'accuracy': float(accuracy_c),
    'precision': float(precision_c),
    'recall': float(recall_c),
    'f1_score': float(f1_c),
    'specificity': float(specificity_c)
}

with open('results/classification_metrics_corrected.json', 'w') as f:
    json.dump(corrected_metrics, f, indent=2)

print(f"Saved corrected metrics to: results/classification_metrics_corrected.json")

# Save comparison
comparison.to_csv('results/before_after_comparison.csv', index=False)
print(f"Saved comparison to: results/before_after_comparison.csv")

## Per-Organism Improvement Analysis

Analyze which organisms improved the most.

In [None]:
# Calculate per-organism metrics for models with missing exchanges added
organism_metrics_corrected = []

for organism in common_organisms:
    exp_org = exp_aligned[organism]
    fba_org = fba_corrected_aligned[organism]
    
    valid_mask = exp_org.notna()
    y_true_org = exp_org[valid_mask].astype(int)
    y_pred_org = fba_org[valid_mask].astype(int)
    
    if len(y_true_org) == 0:
        continue
    
    cm_org = confusion_matrix(y_true_org, y_pred_org, labels=[0, 1])
    tn_o, fp_o, fn_o, tp_o = cm_org.ravel()
    
    accuracy_org = accuracy_score(y_true_org, y_pred_org)
    recall_org = recall_score(y_true_org, y_pred_org, zero_division=0)
    
    organism_metrics_corrected.append({
        'organism': organism,
        'n_comparisons': len(y_true_org),
        'TP': int(tp_o),
        'TN': int(tn_o),
        'FP': int(fp_o),
        'FN': int(fn_o),
        'accuracy': accuracy_org,
        'recall': recall_org,
    })

organism_corrected_df = pd.DataFrame(organism_metrics_corrected)

# Load original organism metrics
organism_original_df = pd.read_csv('results/per_organism_accuracy.csv')

# Merge and compare
organism_comparison = organism_original_df[['organism', 'accuracy', 'recall', 'FN']].merge(
    organism_corrected_df[['organism', 'accuracy', 'recall', 'FN']],
    on='organism',
    suffixes=('_original', '_corrected')
)

# Calculate improvements
organism_comparison['accuracy_change'] = organism_comparison['accuracy_corrected'] - organism_comparison['accuracy_original']
organism_comparison['recall_change'] = organism_comparison['recall_corrected'] - organism_comparison['recall_original']
organism_comparison['FN_change'] = organism_comparison['FN_corrected'] - organism_comparison['FN_original']

# Sort by accuracy improvement
organism_comparison = organism_comparison.sort_values('accuracy_change', ascending=False)

print("TOP 10 MOST IMPROVED ORGANISMS:")
print(organism_comparison.head(10)[['organism', 'accuracy_original', 'accuracy_corrected', 'accuracy_change', 'FN_original', 'FN_corrected', 'FN_change']].to_string(index=False))

# Save
organism_comparison.to_csv('results/per_organism_improvement.csv', index=False)
print(f"\nSaved to: results/per_organism_improvement.csv")

## Summary Report

In [None]:
print("="*80)
print("CDMSCI-199 NOTEBOOK 04: MODEL CORRECTION SUMMARY")
print("="*80)
print()
print("MODELS CORRECTED:")
print(f"  Organisms: {len(correction_df)}")
print(f"  Total exchanges added: {correction_df['num_exchanges_added'].sum()}")
print()
print("OVERALL IMPROVEMENT:")
print(f"  Accuracy: {original_metrics['accuracy']:.4f} → {accuracy_c:.4f} ({accuracy_improvement:+.4f}, {accuracy_improvement_pct:+.1f}%)")
print(f"  Recall: {original_metrics['recall']:.4f} → {recall_c:.4f} ({recall_improvement:+.4f}, {recall_improvement_pct:+.1f}%)")
print(f"  False Negatives: {original_metrics['false_negatives']} → {fn_c} ({-fn_reduction:+d}, {-fn_reduction_pct:+.1f}%)")
print()
print("MOST IMPROVED ORGANISMS:")
for idx, row in organism_comparison.head(5).iterrows():
    print(f"  {row['organism']}:")
    print(f"    Accuracy: {row['accuracy_original']:.3f} → {row['accuracy_corrected']:.3f} ({row['accuracy_change']:+.3f})")
    print(f"    FN: {row['FN_original']} → {row['FN_corrected']} ({row['FN_change']:+d})")
print()
print("FILES GENERATED:")
print(f"  1. models_missing_exchanges/ - Directory with {len(correction_df)} models with missing exchanges added")
print("  2. results/model_corrections_log.csv - Log of corrections made")
print("  3. results/fba_simulation_results_corrected.csv - New simulation results")
print("  4. results/classification_metrics_corrected.json - New metrics")
print("  5. results/before_after_comparison.csv - Side-by-side comparison")
print("  6. results/per_organism_improvement.csv - Per-organism changes")
print()
print("Analysis complete! Ready for Notebook 05 (comprehensive final analysis).")