# CDMSCI-199: Run FBA Simulations

## Objective

Run Flux Balance Analysis (FBA) simulations to predict growth of 44 organisms on 121 carbon sources.

**Scope**: 121 carbon sources × 44 organisms = 5,324 FBA simulations

## Workflow

1. Load 44 gap-filled organism models from CDMSCI-198
2. Load 121 media formulations from CDMSCI-197
3. For each organism × carbon source combination:
   - Apply media to model
   - Run FBA (optimize biomass)
   - Record biomass flux and prediction
4. Save results as CSV and binary matrix
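
For reference, step 3 solves the standard FBA linear program for each organism and medium. The formulation below is a sketch in conventional notation; the symbols $S$, $v$, $c$, and the bound vectors are textbook names and do not appear in this notebook's code.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{s.t.}\quad & S\,v = 0, \\
& v^{\mathrm{lb}} \le v \le v^{\mathrm{ub}}
\end{aligned}
```

Here $S$ is the stoichiometric matrix, $v$ is the flux vector, and $c$ selects the biomass reaction. Applying a medium changes only the bounds on the exchange fluxes; the optimal objective value is the biomass flux recorded for each simulation.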

## Outputs

- `results/fba_simulation_results.csv` - All 5,324 simulation results
- `results/insilico_growth_predictions.csv` - Binary matrix (121 × 44)
- `results/biomass_flux_matrix.csv` - Continuous flux values (121 × 44)

## Setup and Imports

In [1]:
import cobra
import pandas as pd
import numpy as np
import json
from pathlib import Path
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

print(f"COBRApy version: {cobra.__version__}")
print(f"Pandas version: {pd.__version__}")

COBRApy version: 0.29.1
Pandas version: 2.2.2


## Load Simulatable Carbon Sources

Load the 121 carbon sources that have both experimental data and media files.

In [2]:
# Load simulatable sources from overlap analysis
simulatable_df = pd.read_csv('results/simulatable_carbon_sources.csv')

print(f"Simulatable carbon sources: {len(simulatable_df)}")
print(f"\nColumns: {list(simulatable_df.columns)}")
print(f"\nFirst 5 sources:")
print(simulatable_df.head())

Simulatable carbon sources: 121

Columns: ['experimental_name', 'media_name', 'media_filename', 'standardized']

First 5 sources:
    experimental_name          media_name           media_filename  \
0     1,2-Propanediol      12-Propanediol      12-Propanediol.json   
1       1,3-Butandiol        13-Butandiol        13-Butandiol.json   
2          1-Pentanol          1-Pentanol          1-Pentanol.json   
3    2-Deoxy-D-Ribose    2-Deoxy-D-Ribose    2-Deoxy-D-Ribose.json   
4  2-methyl-1-butanol  2-methyl-1-butanol  2-methyl-1-butanol.json   

      standardized  
0    12propanediol  
1      13butandiol  
2        1pentanol  
3    2deoxydribose  
4  2methyl1butanol  


## Load Organism List

Get the list of 44 organisms from the experimental data.

In [3]:
# Load experimental data to get organism names
experimental_data = pd.read_csv('../CDMSCI-196-carbon-sources/results/combined_growth_matrix_filtered.csv', index_col=0)

organisms = list(experimental_data.columns)
print(f"Total organisms: {len(organisms)}")
print(f"\nFirst 5 organisms:")
for i, org in enumerate(organisms[:5], 1):
    print(f"  {i}. {org}")

Total organisms: 44

First 5 organisms:
  1. Shewanella sp. ANA-3
  2. Burkholderia phytofirmans PsJN
  3. Bifidobacterium breve UCC2003
  4. Brevundimonas sp. GW460-12-10-14-LB2
  5. Bacteroides thetaiotaomicron VPI-5482


## Create Organism Name to Model ID Mapping

Model filenames use short `orgId` codes (e.g., `ANA3`, `BFirm`) but experimental data uses full species names.
Load the organism metadata to create the mapping.

In [4]:
# Load organism metadata from CDMSCI-196
organism_metadata = pd.read_csv('../CDMSCI-196-carbon-sources/results/organism_metadata.csv')

print(f"Organism metadata loaded: {len(organism_metadata)} organisms")
print(f"\nColumns: {list(organism_metadata.columns)}")
print(f"\nFirst 5 rows:")
print(organism_metadata[['orgId', 'Species_Name']].head())
print()

# Create mapping: Species_Name -> orgId
name_to_orgid = dict(zip(organism_metadata['Species_Name'], organism_metadata['orgId']))

# Map organism names to orgIds
organism_ids = [name_to_orgid.get(org) for org in organisms]

# Check for unmapped organisms
unmapped = [org for org, org_id in zip(organisms, organism_ids) if org_id is None]
if unmapped:
    print(f"WARNING: {len(unmapped)} organisms not found in metadata:")
    for org in unmapped:
        print(f"  - {org}")
else:
    print(f"All {len(organisms)} organisms successfully mapped to orgIds!")
    print(f"\nExample mappings:")
    for i in range(min(5, len(organisms))):
        print(f"  {organisms[i]} → {organism_ids[i]}")

Organism metadata loaded: 57 organisms

Columns: ['orgId', 'Full_Species_Name', 'NCBI_TaxID', 'division', 'genus', 'species', 'strain', 'Species_Name']

First 5 rows:
    orgId                           Species_Name
0    ANA3                   Shewanella sp. ANA-3
1   BFirm         Burkholderia phytofirmans PsJN
2  Bifido          Bifidobacterium breve UCC2003
3   Brev2   Brevundimonas sp. GW460-12-10-14-LB2
4  Btheta  Bacteroides thetaiotaomicron VPI-5482

All 44 organisms successfully mapped to orgIds!

Example mappings:
  Shewanella sp. ANA-3 → ANA3
  Burkholderia phytofirmans PsJN → BFirm
  Bifidobacterium breve UCC2003 → Bifido
  Brevundimonas sp. GW460-12-10-14-LB2 → Brev2
  Bacteroides thetaiotaomicron VPI-5482 → Btheta
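
One thing the mapping above does not check is whether `Species_Name` is unique: `dict(zip(...))` silently keeps the last `orgId` for a repeated name. The sketch below shows one possible guard on toy rows. The helper names and the `BFirm2` entry are hypothetical and exist only to trigger the duplicate case.

```python
from collections import Counter

# Hypothetical stand-in rows for organism_metadata: (orgId, Species_Name).
rows = [
    ("ANA3", "Shewanella sp. ANA-3"),
    ("BFirm", "Burkholderia phytofirmans PsJN"),
    ("BFirm2", "Burkholderia phytofirmans PsJN"),  # made-up duplicate name
]

def find_duplicate_names(rows):
    """Return species names that appear more than once, sorted."""
    counts = Counter(name for _, name in rows)
    return sorted(name for name, n in counts.items() if n > 1)

def build_name_to_orgid(rows):
    """Build the name -> orgId mapping, refusing ambiguous names."""
    duplicates = find_duplicate_names(rows)
    if duplicates:
        raise ValueError(f"Ambiguous species names: {duplicates}")
    return {name: org_id for org_id, name in rows}

print(find_duplicate_names(rows))
print(build_name_to_orgid(rows[:2]))
```

With the real metadata, the same check could be run on the `Species_Name` column before building `name_to_orgid`.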


## Verify Model Files Exist

Check that all 44 organism models are available from CDMSCI-198.

In [5]:
models_dir = Path('../CDMSCI-198-build-models/models')

# Check for model files using orgIds
missing_models = []
available_models = []
organism_name_map = {}  # Maps orgId -> Species_Name

for org_name, org_id in zip(organisms, organism_ids):
    if org_id is None:
        missing_models.append(org_name)
        continue
    
    model_path = models_dir / f"{org_id}_gapfilled.json"
    if model_path.exists():
        available_models.append(org_id)
        organism_name_map[org_id] = org_name
    else:
        missing_models.append(org_name)

print(f"Available models: {len(available_models)} / {len(organisms)}")

if missing_models:
    print(f"\nWARNING: Missing {len(missing_models)} models:")
    for org in missing_models:
        print(f"  - {org}")
else:
    print(f"\nAll {len(organisms)} organism models are available.")
    print(f"\nExample model files:")
    for i, org_id in enumerate(available_models[:5]):
        print(f"  {org_id}_gapfilled.json → {organism_name_map[org_id]}")

Available models: 44 / 44

All 44 organism models are available.

Example model files:
  ANA3_gapfilled.json → Shewanella sp. ANA-3
  BFirm_gapfilled.json → Burkholderia phytofirmans PsJN
  Bifido_gapfilled.json → Bifidobacterium breve UCC2003
  Brev2_gapfilled.json → Brevundimonas sp. GW460-12-10-14-LB2
  Btheta_gapfilled.json → Bacteroides thetaiotaomicron VPI-5482


## Verify Media Files Exist

Check that all 121 media formulations are available from CDMSCI-197.

In [6]:
media_dir = Path('../CDMSCI-197-media-formulations/media')

# Check for media files
missing_media = []
available_media = []

for idx, row in simulatable_df.iterrows():
    media_file = row['media_filename']
    media_path = media_dir / media_file
    if media_path.exists():
        available_media.append(media_file)
    else:
        missing_media.append(media_file)

print(f"Available media files: {len(available_media)} / {len(simulatable_df)}")

if missing_media:
    print(f"\nWARNING: Missing {len(missing_media)} media files:")
    for media in missing_media:
        print(f"  - {media}")
else:
    print(f"\nAll {len(simulatable_df)} media formulations are available.")

Available media files: 121 / 121

All 121 media formulations are available.


## Define Growth Threshold

The growth threshold matches the gap-filling threshold from CDMSCI-198: a biomass flux strictly above 0.001 h^-1 counts as growth.

In [7]:
GROWTH_THRESHOLD = 0.001  # h^-1 (biomass flux threshold)

print(f"Growth threshold: {GROWTH_THRESHOLD} h^-1")
print(f"\nClassification:")
print(f"  - Biomass flux > {GROWTH_THRESHOLD} → Growth (1)")
print(f"  - Biomass flux ≤ {GROWTH_THRESHOLD} → No Growth (0)")

Growth threshold: 0.001 h^-1

Classification:
  - Biomass flux > 0.001 → Growth (1)
  - Biomass flux ≤ 0.001 → No Growth (0)
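
The classification rule can be written as a small function. This is a minimal sketch: the name `classify_growth`, the `status` argument, and the `None` return for non-optimal or NaN results are assumptions added here, not behaviour of the notebook's simulation loops.

```python
import math

GROWTH_THRESHOLD = 0.001  # h^-1, same value as in the cell above

def classify_growth(biomass_flux, status="optimal", threshold=GROWTH_THRESHOLD):
    """Return 1 (growth), 0 (no growth), or None when there is no valid optimum."""
    if status != "optimal" or biomass_flux is None or math.isnan(biomass_flux):
        return None
    # Strict inequality: a flux exactly at the threshold counts as no growth.
    return 1 if biomass_flux > threshold else 0

print(classify_growth(0.844537))       # well above the threshold
print(classify_growth(0.001))          # exactly at the threshold
print(classify_growth(-2.510842e-14))  # numerical noise around zero
print(classify_growth(float("nan"), status="infeasible"))
```

Returning `None` rather than 0 keeps "the solver found no optimum" distinguishable from "the model cannot grow", which may matter when predictions are compared against experiments.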


## Helper Function: Convert Media Format

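Media files are keyed by ModelSEED compound ID (e.g., `cpd00007`), but models use exchange reaction IDs (e.g., `EX_cpd00007_e0`).
This function converts a media dictionary from the first form to the second and reports any compounds for which the model has no matching exchange reaction.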

In [8]:
def convert_media_to_model_format(media_dict, model):
    """
    Convert media dictionary from ModelSEED format to model exchange reaction format.
    
    Media files have format: {'cpd00007': [-10, 100], 'cpd00001': [-10, 100], ...}
    Models need format: {'EX_cpd00007_e0': 10, 'EX_cpd00001_e0': 10, ...}
    
    Args:
        media_dict: Dictionary with ModelSEED compound IDs as keys
        model: COBRApy model object
    
    Returns:
        tuple: (model_media dict, list of missing compound IDs)
    """
    model_media = {}
    missing_exchanges = []
    
    for cpd_id, bounds in media_dict.items():
        # Extract uptake rate (first value in bounds, make positive)
        uptake_rate = abs(bounds[0]) if isinstance(bounds, list) else abs(bounds)
        
        # Try to find the exchange reaction for this compound
        # Common formats: EX_cpd_e0, EX_cpd_e, etc.
        possible_ids = [
            f"EX_{cpd_id}_e0",
            f"EX_{cpd_id}_e",
            f"EX_{cpd_id}(e)",
            f"{cpd_id}_e0",
        ]
        
        found = False
        for ex_id in possible_ids:
            if ex_id in model.reactions:
                model_media[ex_id] = uptake_rate
                found = True
                break
        
        if not found:
            missing_exchanges.append(cpd_id)
    
    # Note: some media compounds (e.g., trace metals) may have no exchange
    # reaction in a given model; they are returned in missing_exchanges.
    
    return model_media, missing_exchanges

print("Media converter function defined.")
print("This will convert ModelSEED compound IDs to model exchange reaction IDs.")
print("Returns: (converted_media_dict, list_of_missing_compound_ids)")

Media converter function defined.
This will convert ModelSEED compound IDs to model exchange reaction IDs.
Returns: (converted_media_dict, list_of_missing_compound_ids)
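
To make the conversion concrete, here is a simplified, self-contained stand-in applied to toy inputs. It assumes a plain set of reaction IDs in place of `model.reactions` and tries only the `EX_<cpd>_e0` pattern; `to_exchange_medium` and the compound `cpd99999` are hypothetical names used for illustration.

```python
def to_exchange_medium(media_dict, reaction_ids):
    """Simplified stand-in for convert_media_to_model_format.

    reaction_ids is a plain set of reaction ID strings (an assumption of this
    sketch; the notebook's function checks membership in model.reactions).
    """
    medium, missing = {}, []
    for cpd_id, bounds in media_dict.items():
        # The first bound is the (negative) uptake limit; media use positive rates.
        uptake = abs(bounds[0]) if isinstance(bounds, list) else abs(bounds)
        ex_id = f"EX_{cpd_id}_e0"
        if ex_id in reaction_ids:
            medium[ex_id] = uptake
        else:
            missing.append(cpd_id)
    return medium, missing

# Toy inputs: two compounds with exchanges, one made-up compound without.
toy_media = {"cpd00007": [-10, 100], "cpd00001": [-10, 100], "cpd99999": [-5, 100]}
toy_reactions = {"EX_cpd00007_e0", "EX_cpd00001_e0"}

medium, missing = to_exchange_medium(toy_media, toy_reactions)
print(medium)
print(missing)
```

A compound that ends up in `missing` is simply unavailable to the model, so a missing carbon source would by itself lead to a no-growth prediction.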


## Test Single Simulation

Test FBA on one organism × carbon source to verify workflow.

In [9]:
# Test with first organism and first carbon source
test_org_id = available_models[0]
test_org_name = organism_name_map[test_org_id]
test_source = simulatable_df.iloc[0]

print(f"Test organism: {test_org_name}")
print(f"Test organism ID: {test_org_id}")
print(f"Test carbon source: {test_source['experimental_name']}")
print(f"Test media file: {test_source['media_filename']}")
print()

# Load model
model_path = models_dir / f"{test_org_id}_gapfilled.json"
print(f"Loading model: {model_path}")
model = cobra.io.load_json_model(str(model_path))
print(f"Model loaded: {model.id}")
print(f"  Reactions: {len(model.reactions)}")
print(f"  Metabolites: {len(model.metabolites)}")
print(f"  Genes: {len(model.genes)}")
print()

# Load media
media_path = media_dir / test_source['media_filename']
print(f"Loading media: {media_path}")
with open(media_path) as f:
    media_dict = json.load(f)
print(f"Media compounds: {len(media_dict)}")
print()

# Convert media format
print("Converting media format...")
model_media, missing = convert_media_to_model_format(media_dict, model)
print(f"Converted media: {len(model_media)} exchange reactions")
if missing:
    print(f"  Missing {len(missing)} compounds (model lacks transport reactions)")
print()

# Apply media and run FBA
print("Applying media to model...")
model.medium = model_media
print("Running FBA...")
solution = model.optimize()

print(f"\nResults:")
print(f"  Status: {solution.status}")
print(f"  Biomass flux: {solution.objective_value:.6f} h^-1")
print(f"  Prediction: {'Growth (1)' if solution.objective_value > GROWTH_THRESHOLD else 'No Growth (0)'}")
print()
print("Test successful!")

Test organism: Shewanella sp. ANA-3
Test organism ID: ANA3
Test carbon source: 1,2-Propanediol
Test media file: 12-Propanediol.json

Loading model: ../CDMSCI-198-build-models/models/ANA3_gapfilled.json
Model loaded: ANA3
  Reactions: 1348
  Metabolites: 1194
  Genes: 946

Loading media: ../CDMSCI-197-media-formulations/media/12-Propanediol.json
Media compounds: 20

Converting media format...
Converted media: 18 exchange reactions
  Missing 2 compounds (model lacks transport reactions)

Applying media to model...
Running FBA...

Results:
  Status: optimal
  Biomass flux: 0.000000 h^-1
  Prediction: No Growth (0)

Test successful!


## Run 5% Test Simulations

Run a random 5% subset (266 of the 5,324 simulations) to test the workflow and inspect patterns before running the full set.

Estimated runtime: ~5-10 minutes (the run logged below took about 2 minutes)

In [10]:
import random

# Set random seed for reproducibility
random.seed(42)

# Create list of all organism-carbon source combinations
all_combinations = []
for org_id in available_models:
    for idx, row in simulatable_df.iterrows():
        all_combinations.append((org_id, idx))

# Sample 5%
test_size = int(len(all_combinations) * 0.05)
test_combinations = random.sample(all_combinations, test_size)

print(f"Total possible simulations: {len(all_combinations):,}")
print(f"5% test set: {len(test_combinations):,} simulations")
print(f"Random seed: 42 (for reproducibility)")
print()

# Initialize results
results = []
failed_simulations = 0
successful_simulations = 0

# Run test simulations
print("Running 5% test simulations...")
for org_id, row_idx in tqdm(test_combinations, desc="5% Test"):
    org_name = organism_name_map[org_id]
    row = simulatable_df.loc[row_idx]  # row_idx is an index label from iterrows(), so use .loc
    carbon_source = row['experimental_name']
    media_file = row['media_filename']
    
    # Load model
    model_path = models_dir / f"{org_id}_gapfilled.json"
    try:
        model = cobra.io.load_json_model(str(model_path))
    except Exception as e:
        results.append({
            'organism': org_name,
            'orgId': org_id,
            'carbon_source': carbon_source,
            'media_filename': media_file,
            'biomass_flux': np.nan,
            'status': 'model_load_failed',
            'prediction': np.nan,
            'missing_compounds': np.nan,
            'num_missing': np.nan
        })
        failed_simulations += 1
        continue
    
    # Load media
    media_path = media_dir / media_file
    try:
        with open(media_path) as f:
            media_dict = json.load(f)
        
        # Convert media format
        model_media, missing = convert_media_to_model_format(media_dict, model)
    except Exception as e:
        results.append({
            'organism': org_name,
            'orgId': org_id,
            'carbon_source': carbon_source,
            'media_filename': media_file,
            'biomass_flux': np.nan,
            'status': 'media_load_failed',
            'prediction': np.nan,
            'missing_compounds': np.nan,
            'num_missing': np.nan
        })
        failed_simulations += 1
        continue
    
    # Run FBA
    try:
        model.medium = model_media
        solution = model.optimize()
        
        # Record result
        biomass_flux = solution.objective_value
        prediction = 1 if biomass_flux > GROWTH_THRESHOLD else 0
        
        results.append({
            'organism': org_name,
            'orgId': org_id,
            'carbon_source': carbon_source,
            'media_filename': media_file,
            'biomass_flux': biomass_flux,
            'status': solution.status,
            'prediction': prediction,
            'missing_compounds': ','.join(missing) if missing else '',
            'num_missing': len(missing)
        })
        successful_simulations += 1
        
    except Exception as e:
        results.append({
            'organism': org_name,
            'orgId': org_id,
            'carbon_source': carbon_source,
            'media_filename': media_file,
            'biomass_flux': np.nan,
            'status': 'fba_failed',
            'prediction': np.nan,
            'missing_compounds': np.nan,
            'num_missing': np.nan
        })
        failed_simulations += 1

print()
print(f"5% Test complete!")
print(f"  Successful: {successful_simulations:,}")
print(f"  Failed: {failed_simulations:,}")
print(f"  Total: {len(results):,}")

# Convert to DataFrame
results_df = pd.DataFrame(results)

# Save test results
results_df.to_csv('results/fba_simulation_results_5pct_test.csv', index=False)
print(f"\nSaved to: results/fba_simulation_results_5pct_test.csv")

Total possible simulations: 5,324
5% test set: 266 simulations
Random seed: 42 (for reproducibility)

Running 5% test simulations...


5% Test: 100%|██████████| 266/266 [01:53<00:00,  2.35it/s]


5% Test complete!
  Successful: 266
  Failed: 0
  Total: 266

Saved to: results/fba_simulation_results_5pct_test.csv
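
The sampling step can be isolated as follows. This sketch uses integer indices in place of organism IDs and a local `random.Random` instance instead of the global seed, so it is not guaranteed to draw the same pairs as the cell above; `sample_pairs` is a hypothetical helper name.

```python
import itertools
import random

n_organisms, n_sources = 44, 121

# Every (organism index, carbon source index) pair.
all_pairs = list(itertools.product(range(n_organisms), range(n_sources)))

def sample_pairs(pairs, fraction, seed):
    """Draw a reproducible sample without replacement."""
    rng = random.Random(seed)  # local generator; leaves the global RNG untouched
    return rng.sample(pairs, int(len(pairs) * fraction))

subset = sample_pairs(all_pairs, 0.05, seed=42)
print(len(all_pairs), len(subset))
```

Because `int()` truncates, 5% of 5,324 (266.2) becomes 266 simulations, matching the count reported above.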





## Analyze 5% Test Results

Quick analysis of test results before running full simulation.

In [11]:
print("=" * 80)
print("5% TEST RESULTS ANALYSIS")
print("=" * 80)
print()

# Status distribution
print("Simulation Status:")
print(results_df['status'].value_counts())
print()

# Successful simulations
successful = results_df[results_df['status'] == 'optimal']
print(f"Successful simulations: {len(successful):,} / {len(results_df):,} ({100*len(successful)/len(results_df):.1f}%)")
print()

# Biomass flux distribution
print("Biomass flux statistics (h^-1):")
print(successful['biomass_flux'].describe())
print()

# Growth predictions
print("Growth predictions:")
prediction_counts = successful['prediction'].value_counts().sort_index()
for pred, count in prediction_counts.items():
    label = 'Growth' if pred == 1 else 'No Growth'
    print(f"  {label} ({int(pred)}): {count:,} ({100*count/len(successful):.1f}%)")
print()

# Missing compounds analysis
print("Missing compounds analysis:")
print(f"  Average missing per simulation: {successful['num_missing'].mean():.1f}")
print(f"  Max missing: {successful['num_missing'].max():.0f}")
print(f"  Simulations with 0 missing: {(successful['num_missing'] == 0).sum()} ({100*(successful['num_missing'] == 0).sum()/len(successful):.1f}%)")
print()

# Most common missing compounds
all_missing = []
for missing_str in successful['missing_compounds']:
    if pd.notna(missing_str) and missing_str:
        all_missing.extend(missing_str.split(','))

if all_missing:
    from collections import Counter
    missing_counts = Counter(all_missing)
    print("Top 10 most common missing compounds:")
    for cpd_id, count in missing_counts.most_common(10):
        print(f"  {cpd_id}: {count} times ({100*count/len(successful):.1f}% of simulations)")
else:
    print("No missing compounds in any simulation!")
print()

print("5% test looks good! Ready to run full simulation.")

5% TEST RESULTS ANALYSIS

Simulation Status:
status
optimal    266
Name: count, dtype: int64

Successful simulations: 266 / 266 (100.0%)

Biomass flux statistics (h^-1):
count    266.000000
mean       0.108144
std        0.219005
min        0.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        0.844537
Name: biomass_flux, dtype: float64

Growth predictions:
  No Growth (0): 207 (77.8%)
  Growth (1): 59 (22.2%)

Missing compounds analysis:
  Average missing per simulation: 1.5
  Max missing: 3
  Simulations with 0 missing: 39 (14.7%)

Top 10 most common missing compounds:
  cpd00244: 91 times (34.2% of simulations)
  cpd10515: 85 times (32.0% of simulations)
  cpd11574: 35 times (13.2% of simulations)
  cpd16873: 5 times (1.9% of simulations)
  cpd00217: 5 times (1.9% of simulations)
  cpd00508: 5 times (1.9% of simulations)
  cpd00453: 5 times (1.9% of simulations)
  cpd03198: 4 times (1.5% of simulations)
  cpd16586: 4 times (1.5% of simulations)
  cpd01113: 

## Run Full FBA Simulations

Run all 5,324 FBA simulations. This will skip any simulations already completed in the 5% test.

Estimated runtime: 1-2 hours (approximately 1 second per simulation)

In [12]:
# Load existing results if they exist
existing_results_path = Path('results/fba_simulation_results_5pct_test.csv')
if existing_results_path.exists():
    print("Loading existing 5% test results...")
    existing_df = pd.read_csv(existing_results_path)
    results = existing_df.to_dict('records')
    
    # Create set of completed simulations
    completed = set()
    for result in results:
        key = (result['orgId'], result['carbon_source'])
        completed.add(key)
    
    print(f"Loaded {len(results):,} existing results")
    print(f"Will skip {len(completed):,} already-completed simulations")
    print()
else:
    print("No existing results found, starting from scratch.")
    results = []
    completed = set()

# Calculate remaining simulations
total_simulations = len(available_models) * len(simulatable_df)
remaining_simulations = total_simulations - len(completed)

print(f"Total simulations: {total_simulations:,}")
print(f"Already completed: {len(completed):,}")
print(f"Remaining: {remaining_simulations:,}")
print()

if remaining_simulations == 0:
    print("All simulations already completed!")
else:
    # Progress tracking
    failed_simulations = 0
    successful_simulations = 0
    skipped_simulations = 0
    
    # Run remaining simulations
    print(f"Running {remaining_simulations:,} remaining simulations...")
    print()
    
    for org_id in tqdm(available_models, desc="Organisms"):
        org_name = organism_name_map[org_id]
        
        # Load model once per organism
        model_path = models_dir / f"{org_id}_gapfilled.json"
        try:
            model = cobra.io.load_json_model(str(model_path))
        except Exception as e:
            # Record failures for all carbon sources
            for idx, row in simulatable_df.iterrows():
                carbon_source = row['experimental_name']
                key = (org_id, carbon_source)
                if key in completed:
                    skipped_simulations += 1
                    continue
                
                results.append({
                    'organism': org_name,
                    'orgId': org_id,
                    'carbon_source': carbon_source,
                    'media_filename': row['media_filename'],
                    'biomass_flux': np.nan,
                    'status': 'model_load_failed',
                    'prediction': np.nan,
                    'missing_compounds': np.nan,
                    'num_missing': np.nan
                })
                failed_simulations += 1
            continue
        
        # Iterate through carbon sources
        for idx, row in simulatable_df.iterrows():
            carbon_source = row['experimental_name']
            media_file = row['media_filename']
            
            # Skip if already completed
            key = (org_id, carbon_source)
            if key in completed:
                skipped_simulations += 1
                continue
            
            media_path = media_dir / media_file
            
            # Load media
            try:
                with open(media_path) as f:
                    media_dict = json.load(f)
                
                # Convert media format
                model_media, missing = convert_media_to_model_format(media_dict, model)
            except Exception as e:
                results.append({
                    'organism': org_name,
                    'orgId': org_id,
                    'carbon_source': carbon_source,
                    'media_filename': media_file,
                    'biomass_flux': np.nan,
                    'status': 'media_load_failed',
                    'prediction': np.nan,
                    'missing_compounds': np.nan,
                    'num_missing': np.nan
                })
                failed_simulations += 1
                continue
            
            # Run FBA
            try:
                model.medium = model_media
                solution = model.optimize()
                
                # Record result
                biomass_flux = solution.objective_value
                prediction = 1 if biomass_flux > GROWTH_THRESHOLD else 0
                
                results.append({
                    'organism': org_name,
                    'orgId': org_id,
                    'carbon_source': carbon_source,
                    'media_filename': media_file,
                    'biomass_flux': biomass_flux,
                    'status': solution.status,
                    'prediction': prediction,
                    'missing_compounds': ','.join(missing) if missing else '',
                    'num_missing': len(missing)
                })
                successful_simulations += 1
                
            except Exception as e:
                results.append({
                    'organism': org_name,
                    'orgId': org_id,
                    'carbon_source': carbon_source,
                    'media_filename': media_file,
                    'biomass_flux': np.nan,
                    'status': 'fba_failed',
                    'prediction': np.nan,
                    'missing_compounds': np.nan,
                    'num_missing': np.nan
                })
                failed_simulations += 1
    
    print()
    print(f"Full simulation complete!")
    print(f"  Successful: {successful_simulations:,}")
    print(f"  Failed: {failed_simulations:,}")
    print(f"  Skipped (already done): {skipped_simulations:,}")
    print(f"  Total results: {len(results):,}")

# Convert to DataFrame
results_df = pd.DataFrame(results)

# Save full results
results_df.to_csv('results/fba_simulation_results.csv', index=False)
print(f"\nSaved to: results/fba_simulation_results.csv")

Loading existing 5% test results...
Loaded 266 existing results
Will skip 266 already-completed simulations

Total simulations: 5,324
Already completed: 266
Remaining: 5,058

Running 5,058 remaining simulations...



Organisms: 100%|██████████| 44/44 [01:19<00:00,  1.81s/it]


Full simulation complete!
  Successful: 5,058
  Failed: 0
  Skipped (already done): 266
  Total results: 5,324

Saved to: results/fba_simulation_results.csv
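
The skip-if-completed logic reduces to a set of `(orgId, carbon_source)` keys. The sketch below shows that pattern on toy data; `run_remaining` and `fake_fba` are hypothetical, and `fake_fba` only stands in for the real model loading and `model.optimize()` call.

```python
def run_remaining(pairs, existing_results, simulate):
    """Run simulate(org_id, source) for every pair not already in existing_results."""
    completed = {(r["orgId"], r["carbon_source"]) for r in existing_results}
    results = list(existing_results)
    skipped = 0
    for org_id, source in pairs:
        if (org_id, source) in completed:
            skipped += 1
            continue
        results.append({"orgId": org_id, "carbon_source": source,
                        "biomass_flux": simulate(org_id, source)})
        completed.add((org_id, source))
    return results, skipped

# Toy data: one of three pairs has already been simulated.
pairs = [("ANA3", "1-Pentanol"), ("ANA3", "1,2-Propanediol"), ("BFirm", "1-Pentanol")]
existing = [{"orgId": "ANA3", "carbon_source": "1-Pentanol", "biomass_flux": 0.0}]

def fake_fba(org_id, source):
    """Placeholder for the real FBA call; always reports zero flux."""
    return 0.0

results, skipped = run_remaining(pairs, existing, fake_fba)
print(len(results), skipped)
```

Running it a second time with `results` as the existing list skips every pair, which is the property that makes the full run safe to restart.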





## Save Detailed Results

Inspect the combined results and save them to CSV. This rewrites the same file that was saved at the end of the previous cell.

In [13]:
# Convert to DataFrame
results_df = pd.DataFrame(results)

print(f"Results DataFrame shape: {results_df.shape}")
print(f"\nColumns: {list(results_df.columns)}")
print(f"\nFirst 10 results:")
print(results_df.head(10))

# Save to CSV
output_path = 'results/fba_simulation_results.csv'
results_df.to_csv(output_path, index=False)
print(f"\nSaved to: {output_path}")

Results DataFrame shape: (5324, 9)

Columns: ['organism', 'orgId', 'carbon_source', 'media_filename', 'biomass_flux', 'status', 'prediction', 'missing_compounds', 'num_missing']

First 10 results:
                                  organism                 orgId  \
0  Rhodanobacter denitrificans FW104-10B01   rhodanobacter_10B01   
1          Phocaeicola vulgatus CL09T03C04  Bvulgatus_CL09T03C04   
2           Burkholderia phytofirmans PsJN                 BFirm   
3                 Escherichia coli BW25113                  Keio   
4            Dyella japonica UNC79MFTsu3.2              Dyella79   
5            Dinoroseobacter shibae DFL-12                  Dino   
6            Caulobacter crescentus NA1000                 Caulo   
7         Paraburkholderia graminis OAS925   Burkholderia_OAS925   
8            Azospirillum brasilense Sp245                azobra   
9   Paraburkholderia bryophila 376MFSha3.1               Burk376   

                          carbon_source  \
0          

## Analyze Simulation Success Rate

In [14]:
# Check status distribution
print("Simulation Status Distribution:")
print(results_df['status'].value_counts())
print()

# Check for failed simulations (any status other than 'optimal', e.g. load errors or infeasible solves)
failed = results_df[results_df['status'] != 'optimal']
if len(failed) > 0:
    print(f"WARNING: {len(failed)} simulations failed")
    print(f"\nFailed simulations by status:")
    print(failed['status'].value_counts())
else:
    print("All simulations completed successfully!")

Simulation Status Distribution:
status
optimal    5324
Name: count, dtype: int64

All simulations completed successfully!


## Analyze Biomass Flux Distribution

In [15]:
# Filter to successful simulations
successful = results_df[results_df['status'] == 'optimal']

print(f"Successful simulations: {len(successful):,}")
print()

# Biomass flux statistics
print("Biomass flux statistics (h^-1):")
print(successful['biomass_flux'].describe())
print()

# Prediction distribution
print("Growth predictions:")
prediction_counts = successful['prediction'].value_counts().sort_index()
for pred, count in prediction_counts.items():
    label = 'Growth' if pred == 1 else 'No Growth'
    print(f"  {label} ({int(pred)}): {count:,} ({100*count/len(successful):.1f}%)")

Successful simulations: 5,324

Biomass flux statistics (h^-1):
count    5.324000e+03
mean     1.172703e-01
std      2.496681e-01
min     -2.510842e-14
25%      0.000000e+00
50%      0.000000e+00
75%      0.000000e+00
max      1.902357e+00
Name: biomass_flux, dtype: float64

Growth predictions:
  No Growth (0): 4,185 (78.6%)
  Growth (1): 1,139 (21.4%)
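
The minimum of -2.510842e-14 in the statistics above is most likely floating-point noise from the solver rather than a meaningful negative growth rate. It does not change any binary call, since it is far below the threshold. If exact zeros are preferred in the continuous flux values, a sketch like the following could be applied; the function name and the `1e-9` tolerance are assumptions, not values used elsewhere in this notebook.

```python
def clip_solver_noise(flux, tolerance=1e-9):
    """Map fluxes within +/- tolerance of zero to exactly 0.0; leave others unchanged."""
    return 0.0 if abs(flux) < tolerance else flux

# The first and last two values are taken from the statistics printed in this notebook.
values = [-2.510842e-14, 0.0, 0.844537, 1.902357]
cleaned = [clip_solver_noise(v) for v in values]
print(cleaned)
```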


## Create Binary Prediction Matrix

Pivot results into carbon sources (rows) × organisms (columns) matrix with binary predictions.

In [16]:
# Create pivot table with predictions
prediction_matrix = results_df.pivot(index='carbon_source', columns='organism', values='prediction')

print(f"Prediction matrix shape: {prediction_matrix.shape}")
print(f"  Carbon sources (rows): {prediction_matrix.shape[0]}")
print(f"  Organisms (columns): {prediction_matrix.shape[1]}")
print()

# Check for missing values
missing = prediction_matrix.isna().sum().sum()
print(f"Missing predictions: {missing} / {prediction_matrix.shape[0] * prediction_matrix.shape[1]}")
print()

# Save to CSV
output_path = 'results/insilico_growth_predictions.csv'
prediction_matrix.to_csv(output_path)
print(f"Saved to: {output_path}")

Prediction matrix shape: (121, 44)
  Carbon sources (rows): 121
  Organisms (columns): 44

Missing predictions: 0 / 5324

Saved to: results/insilico_growth_predictions.csv
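
`DataFrame.pivot` raises a `ValueError` when the same (index, column) pair appears more than once, which could happen if a simulation were recorded in both the 5% test and the full run. A pre-check along these lines can locate such pairs first. This is a stdlib-only sketch on toy records: `duplicate_pairs` is a hypothetical helper and the prediction values are made up.

```python
from collections import Counter

def duplicate_pairs(records, row_key="carbon_source", col_key="organism"):
    """Return (row, column) pairs that occur more than once in long-format records."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return sorted(pair for pair, n in counts.items() if n > 1)

# Toy long-format records; the last one repeats the first pair on purpose.
records = [
    {"carbon_source": "1-Pentanol", "organism": "Shewanella sp. ANA-3", "prediction": 0},
    {"carbon_source": "1-Pentanol", "organism": "Escherichia coli BW25113", "prediction": 1},
    {"carbon_source": "1-Pentanol", "organism": "Shewanella sp. ANA-3", "prediction": 0},
]

print(duplicate_pairs(records))
```

On the real data, an empty result together with the 121 × 44 shape and zero missing predictions reported above indicates that each pair was simulated exactly once.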


## Create Biomass Flux Matrix

Pivot results into carbon sources (rows) × organisms (columns) matrix with continuous biomass flux values.

In [17]:
# Create pivot table with biomass flux
flux_matrix = results_df.pivot(index='carbon_source', columns='organism', values='biomass_flux')

print(f"Biomass flux matrix shape: {flux_matrix.shape}")
print(f"  Carbon sources (rows): {flux_matrix.shape[0]}")
print(f"  Organisms (columns): {flux_matrix.shape[1]}")
print()

# Save to CSV
output_path = 'results/biomass_flux_matrix.csv'
flux_matrix.to_csv(output_path)
print(f"Saved to: {output_path}")

Biomass flux matrix shape: (121, 44)
  Carbon sources (rows): 121
  Organisms (columns): 44

Saved to: results/biomass_flux_matrix.csv


## Summary

Summary of the predictions that Notebook 2 will compare against the experimental data.

In [18]:
print("=" * 80)
print("FBA SIMULATION SUMMARY")
print("=" * 80)
print()
print(f"Total simulations run: {len(results_df):,}")
print(f"Successful (optimal): {len(successful):,}")
print(f"Failed: {len(results_df) - len(successful):,}")
print()
print(f"Growth predictions:")
for pred, count in prediction_counts.items():
    label = 'Growth' if pred == 1 else 'No Growth'
    print(f"  {label}: {count:,} ({100*count/len(successful):.1f}%)")
print()
print(f"Outputs generated:")
print(f"  1. results/fba_simulation_results.csv - All {len(results_df):,} results")
print(f"  2. results/insilico_growth_predictions.csv - Binary matrix ({prediction_matrix.shape[0]} × {prediction_matrix.shape[1]})")
print(f"  3. results/biomass_flux_matrix.csv - Continuous flux matrix ({flux_matrix.shape[0]} × {flux_matrix.shape[1]})")
print()
print("Ready for Notebook 2: Compare predictions to experimental data!")

FBA SIMULATION SUMMARY

Total simulations run: 5,324
Successful (optimal): 5,324
Failed: 0

Growth predictions:
  No Growth: 4,185 (78.6%)
  Growth: 1,139 (21.4%)

Outputs generated:
  1. results/fba_simulation_results.csv - All 5,324 results
  2. results/insilico_growth_predictions.csv - Binary matrix (121 × 44)
  3. results/biomass_flux_matrix.csv - Continuous flux matrix (121 × 44)

Ready for Notebook 2: Compare predictions to experimental data!
