# Virtual Lab: AI-Guided Metabolic Engineering with Pathway Design

This notebook demonstrates how LLM agents collaborate to design **heterologous metabolic pathways** and engineer strains for target compound production.

## Overview

This workflow includes:
1. **Project Planning** - Define target compound and goals
2. **Model Selection** - Choose appropriate organism/model
3. **Pathway Design** ⭐ - Design pathway if target not in model
4. **Model Modification** - Add reactions to model
5. **Gene Knockout Analysis** - Optimize production
6. **Strain Design** - Combine pathway + knockouts
7. **Validation Planning** - Experimental protocols

---

In [None]:
import sys
sys.path.append('../src')
sys.path.append('.')
sys.path.append('scripts')

from virtual_lab import run_meeting
from metabolic_constants import (
    METABOLIC_TEAM,
    MODEL_SETUP_TEAM,
    PRODUCTION_TEAM,
    PI,
    PATHWAY_DESIGNER,
    METABOLIC_ENGINEER,
    COMPUTATIONAL_BIOLOGIST,
    SYSTEMS_BIOLOGIST,
    MODEL_CRITIC,
    FEASIBILITY_CRITIC
)

import pandas as pd
from pathlib import Path

# Import pathway design tools
import cobra
from pathway_designer_tools import PathwayDesigner, EXAMPLE_PATHWAYS

# Create discussions directory
discussions_dir = Path('discussions_pathway')
discussions_dir.mkdir(exist_ok=True)

print("Virtual Lab Metabolic Engineering with Pathway Design initialized!")
print(f"Available example pathways: {list(EXAMPLE_PATHWAYS.keys())}")

## Step 1: Project Planning - Target Compound Selection

The team decides what to produce and whether pathway engineering is needed.

In [None]:
planning_agenda = """
# Metabolic Engineering Project: Target Compound Production

## Project Goal
Engineer E. coli to produce a target compound: **1,3-propanediol (1,3-PDO)**
(Can be changed to: ethanol, succinate, or any other compound)

## Background
1,3-PDO is a valuable chemical used in:
- Polytrimethylene terephthalate (PTT) polymer production
- Cosmetics and personal care products
- Industrial solvents

## Key Questions
1. Does E. coli naturally produce 1,3-PDO?
2. If not, what heterologous pathway is needed?
3. What are the precursors (substrates)?
4. What enzymes are required?
5. Which organism has the pathway naturally?

## Discussion Points
1. **Native vs. Heterologous**: Is a new pathway needed?
2. **Substrate Selection**: What to use as starting material (glucose, glycerol, etc.)?
3. **Pathway Route**: What reactions are needed?
4. **Cofactor Requirements**: NADH, NADPH, ATP balance?
5. **Model Selection**: Which E. coli model to use?

## Search PubMed
Please search for:
- "1,3-propanediol production E. coli"
- "glycerol dehydratase 1,3-propanediol"
- "Klebsiella pneumoniae dha operon"
- "metabolic engineering 1,3-PDO"

## Deliverables
- Clear understanding of pathway requirements
- Identified source organism for genes
- List of reactions to add
- Selected metabolic model
"""

planning_meeting = run_meeting(
    agent_team=[PI, PATHWAY_DESIGNER, METABOLIC_ENGINEER, SYSTEMS_BIOLOGIST],
    agenda=planning_agenda,
    model="gpt-4o-2024-08-06",
    temperature=0.7,
    num_rounds=3
)

(discussions_dir / 'project_planning').mkdir(exist_ok=True)
planning_meeting.save(discussions_dir / 'project_planning')

## Step 2: Pathway Design

The Pathway Designer agent designs the heterologous pathway.

In [None]:
pathway_design_task = """
# Heterologous Pathway Design for 1,3-Propanediol

## Task
Design a complete heterologous pathway for 1,3-PDO production in E. coli.

## Based on Planning Discussion
The team has decided on:
- Target: 1,3-propanediol
- Substrate: Glycerol (fed externally, or derived from the glycolytic intermediate DHAP)
- Source organism: Klebsiella pneumoniae (dha genes)

## Your Task: Design the Pathway

### 1. Identify Required Reactions
List each reaction needed:
- Reaction 1: Substrate → Intermediate
- Reaction 2: Intermediate → Product
- etc.

### 2. Enzyme Selection
For each reaction, specify:
- Enzyme name
- EC number
- Source organism
- Gene ID

### 3. Stoichiometry and Cofactors
For each reaction:
- Write balanced equation
- Include cofactors (NAD+, NADH, ATP, etc.)
- Consider H+, H2O balance

### 4. Search PubMed
For key enzymes:
- Find characterization studies
- Check expression in E. coli
- Identify any known issues

## Example Pathway (for reference)
Glycerol → 3-Hydroxypropionaldehyde → 1,3-Propanediol

Reaction 1: Glycerol dehydratase (dhaB1, dhaB2, dhaB3)
  Glycerol → 3-HPA + H2O

Reaction 2: 1,3-propanediol oxidoreductase (dhaT)
  3-HPA + NADH + H+ → 1,3-PDO + NAD+

## Deliverable
Complete pathway specification ready to add to metabolic model:
- List of reactions with IDs
- Stoichiometry for each
- Gene-protein-reaction associations
- Cofactor requirements
"""

pathway_design_meeting = run_meeting(
    agent=PATHWAY_DESIGNER,
    task=pathway_design_task,
    critic=MODEL_CRITIC,
    model="gpt-4o-2024-08-06",
    temperature=0.5,
    num_rounds=2
)

(discussions_dir / 'pathway_design').mkdir(exist_ok=True)
pathway_design_meeting.save(discussions_dir / 'pathway_design')
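
Once the meeting converges, the pathway specification named in the deliverable can be captured as plain data before it touches the model. The sketch below is one possible representation and is not produced by `run_meeting`: `ReactionSpec` and `net_stoichiometry` are hypothetical names, the metabolite IDs mirror the BiGG-style IDs used in Step 3, and the EC numbers are the standard ones for these two enzymes. Summing the two steps gives the overall conversion and its cofactor demand.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReactionSpec:
    """One pathway step; negative coefficients are consumed, positive produced."""
    reaction_id: str
    enzyme: str
    ec_number: str
    gene_rule: str
    stoichiometry: dict


# Example pathway from the task text above (K. pneumoniae dha genes).
PDO_PATHWAY = [
    ReactionSpec(
        reaction_id="PDO_DhaB",
        enzyme="Glycerol dehydratase",
        ec_number="4.2.1.30",
        gene_rule="dhaB1 and dhaB2 and dhaB3",
        stoichiometry={"glyc_c": -1, "3hpald_c": 1, "h2o_c": 1},
    ),
    ReactionSpec(
        reaction_id="PDO_DhaT",
        enzyme="1,3-propanediol oxidoreductase",
        ec_number="1.1.1.202",
        gene_rule="dhaT",
        stoichiometry={"3hpald_c": -1, "nadh_c": -1, "h_c": -1,
                       "13ppd_c": 1, "nad_c": 1},
    ),
]


def net_stoichiometry(reactions):
    """Sum coefficients over all steps and drop metabolites that cancel out."""
    total = Counter()
    for rxn in reactions:
        for met, coeff in rxn.stoichiometry.items():
            total[met] += coeff
    return {met: coeff for met, coeff in total.items() if coeff != 0}


net = net_stoichiometry(PDO_PATHWAY)
print(net)
```

The intermediate 3-HPA cancels in the sum, leaving one NADH consumed per 1,3-PDO formed. That redox demand is what the knockout discussion in Step 5 has to account for.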

## Step 3: Add Pathway to Model

The Computational Biologist implements the pathway in the model.

In [None]:
# Load model
print("Loading E. coli model...")
model = cobra.io.load_model("textbook")  # or "iML1515"
print(f"Model: {model.id}")
print(f"Reactions: {len(model.reactions)}")
print(f"Genes: {len(model.genes)}")

# Initialize pathway designer
designer = PathwayDesigner(model)

# Add 1,3-PDO pathway (example)
print("\n" + "="*80)
print("ADDING 1,3-PROPANEDIOL PATHWAY")
print("="*80)

# Step 1: Add metabolites if needed
print("\nStep 1: Adding metabolites...")

# Check if glycerol exists
if not designer.check_metabolite_exists("glyc_c"):
    glyc = designer.add_metabolite("glyc_c", "Glycerol", "C3H8O3", "c")
else:
    glyc = model.metabolites.get_by_id("glyc_c")
    print(f"  Glycerol already exists: {glyc.name}")

# Add 3-HPA (intermediate)
hpa = designer.add_metabolite(
    "3hpald_c",
    "3-Hydroxypropionaldehyde",
    "C3H6O2",
    "c"
)

# Add 1,3-PDO (product)
pdo = designer.add_metabolite(
    "13ppd_c",
    "1,3-Propanediol",
    "C3H8O2",
    "c"
)

# Step 2: Add reactions
print("\nStep 2: Adding pathway reactions...")

# Reaction 1: Glycerol → 3-HPA + H2O
rxn1 = designer.add_reaction(
    reaction_id="PDO_DhaB",
    name="Glycerol dehydratase",
    metabolites={
        glyc: -1,
        hpa: 1,
        model.metabolites.get_by_id("h2o_c"): 1
    },
    lower_bound=0,
    upper_bound=1000,
    gene_reaction_rule="dhaB1 and dhaB2 and dhaB3"  # K. pneumoniae genes
)

# Reaction 2: 3-HPA + NADH + H+ → 1,3-PDO + NAD+
rxn2 = designer.add_reaction(
    reaction_id="PDO_DhaT",
    name="1,3-propanediol oxidoreductase",
    metabolites={
        hpa: -1,
        model.metabolites.get_by_id("nadh_c"): -1,
        model.metabolites.get_by_id("h_c"): -1,
        pdo: 1,
        model.metabolites.get_by_id("nad_c"): 1
    },
    lower_bound=0,
    upper_bound=1000,
    gene_reaction_rule="dhaT"  # K. pneumoniae gene
)

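# Step 3: Add exchange reaction for product
# Note: pdo is the cytosolic metabolite (13ppd_c), so this drain assumes
# secretion needs no separate transport step.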
print("\nStep 3: Adding exchange reaction...")
ex_pdo = designer.add_exchange_reaction(
    pdo,
    lower_bound=0,
    upper_bound=1000
)

# Step 4: Add glycerol uptake if needed
if not designer.check_reaction_exists("EX_glyc_e"):
    print("\nStep 4: Adding glycerol uptake...")
    # Add extracellular glycerol
    glyc_e = designer.add_metabolite("glyc_e", "Glycerol", "C3H8O3", "e")
    # Transport
    designer.add_transport_reaction(glyc_e, glyc)
    # Exchange
    designer.add_exchange_reaction(glyc_e, lower_bound=-10, upper_bound=0)

# Summary
print("\n" + "="*80)
print("PATHWAY ADDITION COMPLETE")
print("="*80)
print(f"\nAdded {len(designer.added_reactions)} reactions")
print("\nSummary:")
print(designer.get_summary().to_string(index=False))
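
Before testing flux, it can help to confirm that the two added reactions are balanced for elements and charge. The following is a minimal standalone sketch and not part of `PathwayDesigner`: `parse_formula` and `imbalance` are hypothetical helpers, the three pathway formulas are the ones passed to `add_metabolite` above, and the cofactor formulas and all charges are assumptions taken from BiGG conventions.

```python
import re
from collections import Counter

# (formula, charge) per metabolite. Pathway formulas come from the cell above;
# cofactor formulas and every charge are assumed from BiGG conventions.
FORMULAS = {
    "glyc_c": ("C3H8O3", 0),
    "3hpald_c": ("C3H6O2", 0),
    "13ppd_c": ("C3H8O2", 0),
    "h2o_c": ("H2O", 0),
    "h_c": ("H", 1),
    "nadh_c": ("C21H27N7O14P2", -2),
    "nad_c": ("C21H26N7O14P2", -1),
}


def parse_formula(formula):
    """Count atoms per element in a simple formula such as 'C3H8O3'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(stoichiometry):
    """Return the net element and charge surplus; an empty dict means balanced."""
    net = Counter()
    for met, coeff in stoichiometry.items():
        formula, charge = FORMULAS[met]
        for element, n in parse_formula(formula).items():
            net[element] += coeff * n
        net["charge"] += coeff * charge
    return {key: value for key, value in net.items() if value != 0}


dhab = {"glyc_c": -1, "3hpald_c": 1, "h2o_c": 1}
dhat = {"3hpald_c": -1, "nadh_c": -1, "h_c": -1, "13ppd_c": 1, "nad_c": 1}
print("PDO_DhaB imbalance:", imbalance(dhab))
print("PDO_DhaT imbalance:", imbalance(dhat))
```

Dropping the water from the dehydratase step makes `imbalance` report the missing hydrogen and oxygen atoms. For reactions already in the model, cobrapy's `Reaction.check_mass_balance()` does a comparable check, provided the metabolites carry formulas and charges.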

## Step 4: Test Pathway Feasibility

Check if the pathway can produce the target.

In [None]:
# Test production
feasibility = designer.test_pathway_feasibility("13ppd_c")

if feasibility["feasible"]:
    print("\n✓ SUCCESS: Pathway can produce 1,3-propanediol!")
    print(f"  Production rate: {feasibility['production_flux']:.4f} mmol/gDW/h")
    print(f"  Growth rate: {feasibility['growth_rate']:.4f} 1/h")
else:
    print("\n✗ ISSUE: Pathway cannot produce target")
    print("  Need to troubleshoot...")
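
The reported flux is easier to compare across conditions once it is converted into a yield on substrate. The sketch below shows that arithmetic under stated assumptions: `molar_mass` and `product_yields` are hypothetical helpers, atomic masses are rounded standard values, and the substrate uptake rate has to be supplied by the caller.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Elemental compositions matching the formulas used in Step 3.
GLYCEROL = {"C": 3, "H": 8, "O": 3}   # C3H8O3
PDO = {"C": 3, "H": 8, "O": 2}        # C3H8O2


def molar_mass(composition):
    """Molar mass in g/mol from an element -> atom count mapping."""
    return sum(ATOMIC_MASS[element] * n for element, n in composition.items())


def product_yields(production_flux, substrate_uptake):
    """Convert fluxes (mmol/gDW/h) into (mol/mol, g/g) yields on glycerol."""
    if substrate_uptake <= 0:
        raise ValueError("substrate_uptake must be a positive rate")
    molar_yield = production_flux / substrate_uptake
    mass_yield = molar_yield * molar_mass(PDO) / molar_mass(GLYCEROL)
    return molar_yield, mass_yield
```

In this notebook the call could be `product_yields(feasibility["production_flux"], 10.0)`, assuming glycerol is taken up at the 10 mmol/gDW/h bound set in Step 3. If the solver uses less than the bound, read the actual uptake flux from the solution instead.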

## Step 5: Optimize with Gene Knockouts

Now that the pathway is added, identify knockouts to enhance production.

In [None]:
knockout_optimization_task = """
# Gene Knockout Optimization for 1,3-PDO Production

## Context
We have added the 1,3-PDO pathway to the E. coli model.
The pathway is FUNCTIONAL and can produce 1,3-PDO.

## Task
Identify gene knockouts that will:
1. **Increase 1,3-PDO production**
2. **Couple production with growth** (ideal)
3. **Minimize competing pathways**

## Analysis to Perform

### Option 1: Use our metabolic_target_finder.py
```bash
# First save the modified model from the notebook's Python session:
#   cobra.io.write_sbml_model(model, "ecoli_with_pdo_pathway.xml")

# Then run analysis
python scripts/metabolic_target_finder.py \
    --model_file ecoli_with_pdo_pathway.xml \
    --output_dir ../pdo_production_analysis \
    --ko_methods single production \
    --target_metabolite 13ppd_c \
    --visualization
```

### Option 2: Manual analysis in Python
Test knockouts that might help:
- **Competing pathways**: Genes using glycerol for other purposes
- **Redox balance**: Genes affecting NADH availability
- **Growth coupling**: Make 1,3-PDO formation obligatory for biomass production

## Discussion Points
1. Which pathways compete for glycerol?
2. How to increase NADH availability?
3. Should we knock out alternative carbon sinks?
4. What about fermentation vs. respiration?

## Deliverable
- Top 10 gene knockout candidates
- Expected production improvement
- Metabolic rationale for each
"""

knockout_meeting = run_meeting(
    agent_team=PRODUCTION_TEAM,
    agenda=knockout_optimization_task,
    model="gpt-4o-2024-08-06",
    temperature=0.6,
    num_rounds=3
)

(discussions_dir / 'knockout_optimization').mkdir(exist_ok=True)
knockout_meeting.save(discussions_dir / 'knockout_optimization')
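
The deliverable asks for ranked knockout candidates with their expected improvement. One way to turn per-knockout simulation results into that list is sketched below. It assumes each knockout has already been simulated elsewhere. `KnockoutResult`, `rank_knockouts`, and the 10% growth cutoff are hypothetical choices rather than part of `metabolic_target_finder.py`, and the gene names and numbers in the usage lines are placeholders, not results.

```python
from dataclasses import dataclass


@dataclass
class KnockoutResult:
    gene: str
    production_flux: float  # mmol/gDW/h
    growth_rate: float      # 1/h


def rank_knockouts(results, base, min_growth_fraction=0.1, top_n=10):
    """Rank knockouts by production gain over the base strain.

    Knockouts that drop growth below min_growth_fraction of the base strain,
    or that do not raise production, are filtered out.
    """
    viable = [
        r for r in results
        if r.growth_rate >= min_growth_fraction * base.growth_rate
        and r.production_flux > base.production_flux
    ]
    viable.sort(key=lambda r: r.production_flux, reverse=True)
    return [(r.gene, r.production_flux - base.production_flux)
            for r in viable[:top_n]]


# Placeholder values for illustration only.
base = KnockoutResult("none", production_flux=2.0, growth_rate=0.8)
candidates = [
    KnockoutResult("geneA", 3.0, 0.6),
    KnockoutResult("geneB", 5.0, 0.01),  # high production but barely grows
    KnockoutResult("geneC", 1.5, 0.8),   # lowers production
    KnockoutResult("geneD", 4.0, 0.4),
]
ranking = rank_knockouts(candidates, base)
print(ranking)
```

Reporting the absolute gain instead of a fold change keeps the ranking defined even when the base strain produces nothing.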

## Step 6: Final Strain Design

Combine heterologous pathway + optimized knockouts.

In [None]:
final_design_task = """
# Final Engineered Strain Design

## Components

### 1. Heterologous Pathway
Genes to ADD (from K. pneumoniae):
- dhaB1, dhaB2, dhaB3: Glycerol dehydratase complex
- dhaT: 1,3-propanediol oxidoreductase

### 2. Gene Knockouts
[Based on previous optimization analysis]
- Knockout 1: [gene] - rationale
- Knockout 2: [gene] - rationale
- etc.

## Strain Construction Strategy

### Stage 1: Add Pathway
1. Clone dha genes into expression vector
2. Optimize codon usage for E. coli
3. Select appropriate promoter (constitutive vs. inducible)
4. Transform into E. coli
5. Verify 1,3-PDO production

### Stage 2: Add Knockouts
1. Use CRISPR-Cas9 or lambda Red recombineering
2. Start with single knockouts
3. Test each individually
4. Combine best performers
5. Measure production improvements

### Stage 3: Optimization
1. Fermentation condition optimization
2. Fed-batch strategies
3. Evolutionary adaptation if needed

## Expected Performance
Based on computational predictions:
- Base strain (pathway only, no knockouts): X mmol/gDW/h
- Optimized strain: Y mmol/gDW/h (Z-fold improvement)

## Deliverable
Complete strain specification:
- Genotype
- Plasmids/integrated genes
- Construction protocol
- Testing plan
"""

final_design_meeting = run_meeting(
    agent=METABOLIC_ENGINEER,
    task=final_design_task,
    critic=FEASIBILITY_CRITIC,
    model="gpt-4o-2024-08-06",
    temperature=0.5,
    num_rounds=2
)

(discussions_dir / 'final_strain_design').mkdir(exist_ok=True)
final_design_meeting.save(discussions_dir / 'final_strain_design')
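
The strain specification and the X, Y, and Z placeholders in the task can be tied together in a small record once the knockout list is final. This is only a sketch: `StrainDesign` and `fold_improvement` are hypothetical names, the genotype string is an informal shorthand and not a formal nomenclature, and the knockout names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class StrainDesign:
    host: str
    added_genes: list
    knockouts: list = field(default_factory=list)

    def genotype(self):
        """Informal shorthand: host, deletions, then the added gene cassette."""
        parts = [self.host]
        parts += [f"Δ{gene}" for gene in self.knockouts]
        parts.append("+ " + "-".join(self.added_genes))
        return " ".join(parts)


def fold_improvement(base_flux, optimized_flux):
    """Z in the task text: optimized production divided by base production."""
    if base_flux <= 0:
        raise ValueError("base_flux must be positive to compute a fold change")
    return optimized_flux / base_flux


# Placeholder knockouts; replace with the genes chosen in Step 5.
design = StrainDesign(
    host="E. coli",
    added_genes=["dhaB1", "dhaB2", "dhaB3", "dhaT"],
    knockouts=["geneA", "geneB"],
)
print(design.genotype())
```

Keeping the design as data makes it simple to regenerate the genotype line whenever the knockout set changes.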

## Summary

This workflow demonstrates an **end-to-end computational design cycle** for metabolic engineering:

### ✅ Pathway Design
- AI agents identify needed heterologous pathways
- Search literature for enzyme sources
- Design balanced reactions with cofactors

### ✅ Model Modification
- Add new metabolites and reactions
- Implement gene-protein-reaction rules
- Test pathway feasibility

### ✅ Production Optimization
- Identify gene knockouts to enhance production
- Couple production with growth
- Eliminate competing pathways

### ✅ Integrated Strain Design
- Combine heterologous genes + knockouts
- Plan construction strategy
- Predict performance improvements

---

**Key Advantage**: The agents draw on literature searches and domain expertise to propose new heterologous pathways, rather than only optimizing reactions already in the model.

**Saved in**: `discussions_pathway/` directory