# Virtual Lab: AI-Guided Metabolic Engineering

This notebook demonstrates how LLM agents collaborate to identify gene knockout targets using genome-scale metabolic models.

## Overview

The Virtual Lab assembles a team of AI agents with different expertise:
- **Principal Investigator**: Coordinates the project
- **Metabolic Engineer**: Designs knockout strategies
- **Systems Biologist**: Analyzes metabolic networks
- **Computational Biologist**: Implements COBRApy simulations
- **Experimental Biologist**: Assesses feasibility

The agents:
1. **Discuss** metabolic engineering strategies
2. **Select** an appropriate metabolic model
3. **Design** gene knockout experiments
4. **Interpret** flux balance analysis (FBA) and flux variability analysis (FVA) results
5. **Propose** validation experiments
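
For readers new to FBA and FVA, the standard textbook formulation is sketched here. The symbols are the conventional ones and are not names taken from this notebook's code: $S$ is the stoichiometric matrix (metabolites by reactions), $v$ the flux vector, $c$ the objective weights (typically selecting the biomass reaction), and $\gamma \in [0, 1]$ the fraction of the optimum that FVA must preserve.

$$
\begin{aligned}
\text{FBA:}\quad & Z^{*} = \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; v^{\min} \le v \le v^{\max} \\
\text{FVA:}\quad & \min_{v} v_i \;\text{ and }\; \max_{v} v_i
  \quad \text{s.t.}\quad S v = 0,\;\; v^{\min} \le v \le v^{\max},\;\; c^{\top} v \ge \gamma Z^{*}
\end{aligned}
$$

A gene knockout is usually simulated by setting both bounds to zero for every reaction that can no longer be catalyzed, and then solving the FBA problem again.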

---

In [None]:
import sys
sys.path.append('../src')
sys.path.append('.')

from virtual_lab import run_meeting
from metabolic_constants import (
    METABOLIC_TEAM,
    MODEL_SETUP_TEAM,
    ANALYSIS_TEAM,
    PRODUCTION_TEAM,
    DRUG_DISCOVERY_TEAM,
    INTERPRETATION_TEAM,
    PI,
    METABOLIC_ENGINEER,
    COMPUTATIONAL_BIOLOGIST,
    MODEL_CRITIC,
    FEASIBILITY_CRITIC
)

import pandas as pd
import json
from pathlib import Path

# Create discussions directory
discussions_dir = Path('discussions')
discussions_dir.mkdir(exist_ok=True)

print("Virtual Lab Metabolic Engineering initialized!")
print(f"Team size: {len(METABOLIC_TEAM)} agents")

## Step 1: Project Planning Meeting

The team discusses the metabolic engineering goal and strategy.

In [None]:
planning_agenda = """
# Metabolic Engineering Project Planning

## Project Goal
Design E. coli strains for improved bioproduction using gene knockout strategies.
(Can be adapted for: antibiotic targets, cancer metabolism, etc.)

## Approach
Use genome-scale metabolic model (GEM) with constraint-based modeling:
- Flux Balance Analysis (FBA)
- Gene knockout simulations
- Essential gene identification
- Growth-coupled production analysis

## Discussion Points
1. What is our specific engineering goal? (e.g., produce succinate or ethanol)
2. Which metabolic model should we use? (iML1515, textbook, custom)
3. What types of knockouts should we consider? (single, double, multiple)
4. How do we balance growth vs. production?
5. What experimental validation will be needed?

## Available Models
- textbook: E. coli core (137 genes, 95 reactions, 72 metabolites; fast)
- iML1515: E. coli genome-scale (1,515 genes, comprehensive)
- iJO1366: Earlier E. coli genome-scale model (predecessor of iML1515)
- Recon3D: Human (for drug targets)
- Custom SBML models

## Deliverables
- Clear engineering objective
- Selected metabolic model
- Analysis workflow plan
"""

planning_meeting = run_meeting(
    agent_team=MODEL_SETUP_TEAM,
    agenda=planning_agenda,
    model="gpt-4o-2024-08-06",
    temperature=0.7,
    num_rounds=3
)

(discussions_dir / 'project_planning').mkdir(exist_ok=True)
planning_meeting.save(discussions_dir / 'project_planning')

## Step 2: Model Selection and Validation

The team evaluates the chosen metabolic model.
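
One of the model qualities the team weighs in this step is the curation of gene-protein-reaction (GPR) mappings, because these rules decide which reactions are lost when a gene is deleted. The following is a minimal, standard-library sketch of how such a rule can be evaluated. The function name, the gene IDs, and the reaction IDs are hypothetical and chosen only for illustration; COBRApy performs an equivalent evaluation internally when a gene is knocked out.

```python
import re
from typing import Set

# Matches gene IDs and the boolean operators; parentheses are left untouched.
_TOKEN = re.compile(r"[A-Za-z0-9_.]+")


def reaction_is_active(gpr_rule: str, knocked_out: Set[str]) -> bool:
    """Return True if the reaction can still be catalyzed after the knockouts.

    `gpr_rule` uses the common 'and'/'or' syntax, e.g. "(geneA and geneB) or geneC".
    An empty rule means no gene association, so the reaction stays active.
    """
    if not gpr_rule.strip():
        return True

    def substitute(match):
        token = match.group(0)
        if token in ("and", "or"):
            return token
        # A gene contributes True while it is still present in the strain.
        return str(token not in knocked_out)

    expression = _TOKEN.sub(substitute, gpr_rule)
    # Refuse anything that is not a plain boolean expression before evaluating it.
    if not re.fullmatch(r"(?:True|False|and|or|[()\s])+", expression):
        raise ValueError(f"Unsupported GPR syntax: {gpr_rule!r}")
    return bool(eval(expression, {"__builtins__": {}}, {}))


# Toy rules (hypothetical): isozymes use 'or', enzyme complexes use 'and'.
toy_rules = {"RXN1": "geneA or geneB", "RXN2": "geneA and geneC", "RXN3": ""}
status = {rxn: reaction_is_active(rule, {"geneA"}) for rxn, rule in toy_rules.items()}
print(status)
```

In this toy case, deleting `geneA` leaves `RXN1` active because `geneB` encodes an isozyme, while `RXN2` is lost because its complex needs both subunits.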

In [None]:
model_agenda = """
# Metabolic Model Selection and Validation

## Context
Based on planning, we need to select and validate a metabolic model.

## Model Considerations
1. **Coverage**: Does it include pathways of interest?
2. **Validation**: Is it well-validated experimentally?
3. **Curation**: Quality of annotations and gene-protein-reaction mappings
4. **Computational cost**: Balance between detail and speed

## Discussion Points
1. Which model best fits our project goals?
2. Are there any gaps or limitations in the model?
3. Do we need to add/modify reactions?
4. What growth conditions should we simulate?

## Search PubMed
Please search for:
- Publications about the selected model
- Experimental validation of model predictions
- Prior metabolic engineering studies with this organism

## Deliverable
Justified model selection with known limitations and growth conditions.
"""

model_meeting = run_meeting(
    agent_team=MODEL_SETUP_TEAM,
    agenda=model_agenda,
    model="gpt-4o-2024-08-06",
    temperature=0.7,
    num_rounds=2
)

(discussions_dir / 'model_selection').mkdir(exist_ok=True)
model_meeting.save(discussions_dir / 'model_selection')

## Step 3: Execute Metabolic Analysis

The computational biologist runs the metabolic model analysis.
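
The analysis request asks for essential genes and for growth-reducing non-essential genes. A minimal sketch of that classification is shown here, assuming the growth threshold is interpreted as a fraction of wild-type growth. That interpretation, the function name, and the toy growth rates are assumptions for illustration; `metabolic_target_finder.py` may define its `--growth_threshold` differently.

```python
from typing import Dict, List


def classify_knockouts(
    wildtype_growth: float,
    knockout_growth: Dict[str, float],
    essential_fraction: float = 0.1,  # assumed meaning of the growth threshold
    tolerance: float = 1e-6,
) -> Dict[str, List[str]]:
    """Group genes by the effect of their single knockout on growth."""
    if wildtype_growth <= 0:
        raise ValueError("Wild-type growth must be positive")

    groups: Dict[str, List[str]] = {"essential": [], "growth_reducing": [], "neutral": []}
    # Sort by knockout growth so the most severe knockouts come first.
    for gene, growth in sorted(knockout_growth.items(), key=lambda item: item[1]):
        ratio = growth / wildtype_growth
        if ratio < essential_fraction:
            groups["essential"].append(gene)
        elif ratio < 1.0 - tolerance:
            groups["growth_reducing"].append(gene)
        else:
            groups["neutral"].append(gene)
    return groups


# Toy growth rates in 1/h (made up for illustration, not model output).
toy_growth = {"geneA": 0.0, "geneB": 0.40, "geneC": 0.80, "geneD": 0.05}
groups = classify_knockouts(wildtype_growth=0.80, knockout_growth=toy_growth)
print(groups)
```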

In [None]:
analysis_request = """
# Metabolic Model Analysis Implementation

## Task
Please implement the metabolic modeling analysis using our metabolic_target_finder.py script.

## Command to Run (Example with E. coli core model)
```bash
python scripts/metabolic_target_finder.py \
    --model_id textbook \
    --output_dir ../metabolic_results_agent \
    --ko_methods single essential fva \
    --growth_threshold 0.1 \
    --visualization
```

## For Production Engineering (if applicable)
```bash
python scripts/metabolic_target_finder.py \
    --model_id iML1515 \
    --output_dir ../metabolic_results_agent \
    --ko_methods single production \
    --target_metabolite succ_c \
    --visualization
```

## After Running
1. Examine wildtype growth rate
2. Review essential genes
3. Identify growth-reducing non-essential genes (potential targets)
4. Analyze FVA results for flexible reactions
5. Check production coupling (if applicable)

Please provide:
- Model statistics summary
- Number of essential vs. non-essential genes
- Top 20 knockout targets
- Key findings from FVA
"""

analysis_meeting = run_meeting(
    agent=COMPUTATIONAL_BIOLOGIST,
    task=analysis_request,
    critic=MODEL_CRITIC,
    model="gpt-4o-2024-08-06",
    temperature=0.3,
    num_rounds=2
)

(discussions_dir / 'analysis_execution').mkdir(exist_ok=True)
analysis_meeting.save(discussions_dir / 'analysis_execution')
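
The first example command in the analysis request can also be assembled as an argument list in Python, which avoids shell quoting problems if it is later passed to `subprocess.run`. This is only a sketch: the helper name `build_target_finder_command` is hypothetical, and the flags are simply the ones that appear in the agenda above, with whatever semantics the script gives them.

```python
import shlex
from typing import List, Optional, Sequence


def build_target_finder_command(
    model_id: str,
    output_dir: str,
    ko_methods: Sequence[str],
    growth_threshold: Optional[float] = None,
    target_metabolite: Optional[str] = None,
    visualization: bool = True,
) -> List[str]:
    """Build the argv list for the script call shown in the agenda."""
    argv = [
        "python", "scripts/metabolic_target_finder.py",
        "--model_id", model_id,
        "--output_dir", output_dir,
        "--ko_methods", *ko_methods,
    ]
    # Optional flags are appended only when a value is supplied.
    if growth_threshold is not None:
        argv += ["--growth_threshold", str(growth_threshold)]
    if target_metabolite is not None:
        argv += ["--target_metabolite", target_metabolite]
    if visualization:
        argv.append("--visualization")
    return argv


argv = build_target_finder_command(
    model_id="textbook",
    output_dir="../metabolic_results_agent",
    ko_methods=["single", "essential", "fva"],
    growth_threshold=0.1,
)
print(shlex.join(argv))
```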

## Step 4: Results Interpretation

The team interprets knockout predictions and flux distributions.
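
When reading FVA output, it often helps to label each reaction by how much freedom its flux has. The sketch below works on plain `(minimum, maximum)` pairs; the function name, the tolerance, and the toy ranges are assumptions for illustration. COBRApy's `flux_variability_analysis` returns a DataFrame with `minimum` and `maximum` columns that could be converted to this form.

```python
from typing import Dict, Tuple


def classify_flux_ranges(
    fva_ranges: Dict[str, Tuple[float, float]],
    tolerance: float = 1e-9,
) -> Dict[str, str]:
    """Label each reaction as 'blocked', 'fixed', or 'flexible'."""
    labels: Dict[str, str] = {}
    for reaction, (minimum, maximum) in fva_ranges.items():
        if maximum < minimum:
            raise ValueError(f"Invalid range for {reaction}")
        if abs(minimum) <= tolerance and abs(maximum) <= tolerance:
            labels[reaction] = "blocked"    # cannot carry any flux
        elif maximum - minimum <= tolerance:
            labels[reaction] = "fixed"      # flux fully determined at the optimum
        else:
            labels[reaction] = "flexible"   # alternative flux values are possible
    return labels


# Toy flux ranges (made up for illustration, not model output).
toy_ranges = {"RXN1": (0.0, 0.0), "RXN2": (7.5, 7.5), "RXN3": (-2.0, 10.0)}
labels = classify_flux_ranges(toy_ranges)
print(labels)
```

Flexible reactions are the ones where the network has alternative routes, which is relevant to the "Alternative Pathways" discussion point in this step.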

In [None]:
interpretation_agenda = """
# Metabolic Modeling Results Interpretation

## Results Summary
[Computational biologist will have provided analysis results]

Example findings:
- Essential genes: X genes
- Growth-reducing targets: Y genes
- Top targets: gene1, gene2, gene3, etc.
(Actual genes will come from analysis)

## Discussion Points
1. **Biological Interpretation**: Why do these knockouts affect growth?
2. **Metabolic Mechanisms**: What pathways are impacted?
3. **Alternative Pathways**: Are there metabolic bypasses?
4. **Bottlenecks**: What are the key limiting reactions?
5. **Engineering Strategy**: Which knockouts are most promising?

## Tasks for Team Members

**Systems Biologist**:
- Explain network-level effects of knockouts
- Identify affected subsystems and pathways
- Describe flux rerouting

**Metabolic Engineer**:
- Assess engineering potential
- Rank targets by promise
- Propose combination strategies

**Experimental Biologist**:
- Search PubMed for prior knockout studies
- Assess experimental feasibility
- Identify potential challenges

## Search PubMed
For top gene targets, search for:
- Prior knockout/knockdown studies
- Gene function and regulation
- Metabolic engineering applications

## Deliverable
Comprehensive interpretation with prioritized knockout targets.
"""

interpretation_meeting = run_meeting(
    agent_team=INTERPRETATION_TEAM,
    agenda=interpretation_agenda,
    model="gpt-4o-2024-08-06",
    temperature=0.7,
    num_rounds=3
)

(discussions_dir / 'results_interpretation').mkdir(exist_ok=True)
interpretation_meeting.save(discussions_dir / 'results_interpretation')

## Step 5: Strain Design Strategy

The metabolic engineer proposes specific strain designs.
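
One simple way to frame the growth versus production trade-off is to treat growth as a constraint and production as the ranking criterion. The sketch below does exactly that; the function name, the 50% growth cutoff, and the toy numbers are assumptions for illustration and are not values from this workflow.

```python
from typing import Dict, List, Tuple


def rank_designs(
    designs: Dict[str, Tuple[float, float]],
    wildtype_growth: float,
    min_growth_fraction: float = 0.5,  # hypothetical viability cutoff
) -> List[Tuple[str, float, float]]:
    """Keep designs that grow well enough, then rank them by production.

    `designs` maps a design name to (growth_rate, production_flux).
    """
    cutoff = min_growth_fraction * wildtype_growth
    viable = [
        (name, growth, production)
        for name, (growth, production) in designs.items()
        if growth >= cutoff
    ]
    # Highest production first; higher growth breaks ties.
    return sorted(viable, key=lambda design: (-design[2], -design[1]))


# Toy predictions (made up): knockout set -> (growth in 1/h, production flux).
toy_designs = {
    "geneA": (0.70, 2.0),
    "geneB": (0.30, 9.0),
    "geneA+geneC": (0.50, 6.0),
}
ranking = rank_designs(toy_designs, wildtype_growth=0.80)
print(ranking)
```

Here the `geneB` knockout has the highest production but is dropped because its growth falls below the cutoff, which is the kind of judgment the feasibility critic is asked to challenge.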

In [None]:
design_agenda = """
# Strain Design and Engineering Strategy

## Context
Based on computational predictions, we need to design specific strains.

## Design Considerations
1. **Single vs. Multiple Knockouts**: Start simple or combinatorial?
2. **Growth vs. Production Trade-off**: Balance viability and yield
3. **Genetic Stability**: Will knockouts be evolutionarily stable?
4. **Metabolic Burden**: Impact on cell fitness
5. **Compensatory Pathways**: Risk of evolution circumventing knockout

## Discussion Points
1. Which knockouts should be prioritized for experimental testing?
2. Should we test single knockouts first or go directly to combinations?
3. What backup strategies do we have if primary targets don't work?
4. How can we stabilize the engineered phenotype?
5. What are the practical genetic engineering considerations?

## For Bioproduction Projects
- How to couple growth with production?
- What fermentation conditions optimize yield?
- Are there any regulatory bottlenecks?

## For Drug Discovery Projects
- Which essential genes are most druggable?
- Are there synthetic lethal pairs?
- How specific are the targets (to avoid host toxicity)?

## Deliverable
Prioritized list of strain designs with experimental protocols.
"""

design_meeting = run_meeting(
    agent=METABOLIC_ENGINEER,
    task=design_agenda,
    critic=FEASIBILITY_CRITIC,
    model="gpt-4o-2024-08-06",
    temperature=0.6,
    num_rounds=2
)

(discussions_dir / 'strain_design').mkdir(exist_ok=True)
design_meeting.save(discussions_dir / 'strain_design')
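
The design agenda asks about synthetic lethal pairs, meaning two genes that are each dispensable alone but lethal when deleted together. A minimal sketch of the search is shown here. It takes any callable that returns growth for a set of knocked-out genes; the function names, the lethality cutoff, and the toy growth function are assumptions for illustration. With a real model, COBRApy's `double_gene_deletion` could supply the growth values instead.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple


def find_synthetic_lethal_pairs(
    genes: Sequence[str],
    growth_of: Callable[[FrozenSet[str]], float],
    wildtype_growth: float,
    lethal_fraction: float = 0.1,  # hypothetical cutoff for "no growth"
) -> List[Tuple[str, str]]:
    """Return gene pairs that are viable alone but lethal in combination."""
    cutoff = lethal_fraction * wildtype_growth
    # Genes that are essential on their own cannot form a synthetic lethal pair.
    viable_singles = [g for g in genes if growth_of(frozenset([g])) >= cutoff]
    return [
        (a, b)
        for a, b in combinations(viable_singles, 2)
        if growth_of(frozenset([a, b])) < cutoff
    ]


def toy_growth(knocked_out: FrozenSet[str]) -> float:
    """Toy network: geneC is essential; geneA and geneB feed parallel routes."""
    if "geneC" in knocked_out:
        return 0.0
    open_routes = ("geneA" not in knocked_out) + ("geneB" not in knocked_out)
    return 0.4 * open_routes  # wild type has both routes open: 0.8


pairs = find_synthetic_lethal_pairs(
    ["geneA", "geneB", "geneC", "geneD"], toy_growth, wildtype_growth=0.8
)
print(pairs)
```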

## Step 6: Experimental Validation Plan

The team plans experimental validation.
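
Growth assays are usually compared with model predictions through the specific growth rate, estimated as the slope of ln(OD600) against time during exponential phase. The sketch below fits that slope by ordinary least squares using only the standard library. The function name and the toy readings are made up for illustration, and the estimate is only meaningful for time points taken within exponential growth.

```python
import math
from typing import Sequence


def specific_growth_rate(times_h: Sequence[float], od600: Sequence[float]) -> float:
    """Least-squares slope of ln(OD600) versus time, in 1/h."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("Need at least two matching time/OD points")
    if any(od <= 0 for od in od600):
        raise ValueError("OD600 readings must be positive")

    log_od = [math.log(od) for od in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_od) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_od))
    if sxx == 0:
        raise ValueError("Time points must not all be identical")
    return sxy / sxx


# Toy readings: the culture doubles every hour.
mu = specific_growth_rate([0, 1, 2, 3], [0.05, 0.10, 0.20, 0.40])
doubling_time_h = math.log(2) / mu
print(f"mu = {mu:.3f} 1/h, doubling time = {doubling_time_h:.2f} h")
```

The resulting rate has the same units (1/h) as the growth rate predicted by FBA, so measured and predicted values for each knockout strain can be compared directly.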

In [None]:
validation_agenda = """
# Experimental Validation Plan

## Current Status
We have computational predictions for gene knockouts.

## Validation Experiments
1. **Construct Strains**: Generate knockouts (CRISPR, lambda Red recombineering, transposon mutagenesis)
2. **Growth Assays**: Measure growth rates in different media
3. **Production Assays**: Quantify target metabolite (HPLC, GC-MS)
4. **Omics Validation**: 13C metabolic flux analysis to test flux predictions, with transcriptomics/proteomics as supporting evidence
5. **Evolutionary Stability**: Serial passage experiments

## Discussion Points
1. What is the minimal experiment to test top predictions?
2. Which strains should be constructed first?
3. What growth conditions replicate model assumptions?
4. How to measure production/flux changes?
5. What are the success criteria?

## Search PubMed
Please search for:
- Standard protocols for strain construction
- Prior validation of similar computational predictions
- Relevant analytical methods

## Deliverable
Detailed experimental protocol with timeline and resources needed.
"""

validation_meeting = run_meeting(
    agent_team=METABOLIC_TEAM,
    agenda=validation_agenda,
    model="gpt-4o-2024-08-06",
    temperature=0.7,
    num_rounds=3
)

(discussions_dir / 'validation_planning').mkdir(exist_ok=True)
validation_meeting.save(discussions_dir / 'validation_planning')

## Step 7: Final Recommendations

The PI synthesizes all discussions.

In [None]:
final_agenda = """
# Final Metabolic Engineering Recommendations

## Project Summary
We have completed:
1. ✓ Project planning and model selection
2. ✓ Genome-scale metabolic modeling
3. ✓ Gene knockout target identification
4. ✓ Results interpretation
5. ✓ Strain design and validation planning

## Task for PI
Please synthesize all discussions and provide:

1. **Top 5 Engineering Targets**: With rationale for each
2. **Recommended Strain Designs**: Specific gene deletions to construct
3. **Key Insights**: Most important computational findings
4. **Experimental Priorities**: What to test first
5. **Success Metrics**: How to evaluate engineered strains
6. **Risk Mitigation**: Backup strategies if primary approach fails

## Consider
- Computational prediction confidence
- Experimental feasibility
- Expected impact (growth/production changes)
- Novelty vs. validation of known targets
- Resource requirements

## Deliverable
Executive summary with clear next steps for metabolic engineering.
"""

final_meeting = run_meeting(
    agent=PI,
    task=final_agenda,
    critic=FEASIBILITY_CRITIC,
    model="gpt-4o-2024-08-06",
    temperature=0.5,
    num_rounds=2
)

(discussions_dir / 'final_recommendations').mkdir(exist_ok=True)
final_meeting.save(discussions_dir / 'final_recommendations')

## Summary

This workflow demonstrates how AI agents can:
1. **Plan** metabolic engineering projects
2. **Select** appropriate metabolic models
3. **Execute** constraint-based modeling
4. **Interpret** flux distributions and knockout effects
5. **Design** engineered strains
6. **Plan** experimental validation

The agents use:
- **PubMed search** for literature context
- **Domain expertise** in metabolism and engineering
- **Systems thinking** for network-level understanding
- **Critical evaluation** of predictions

All discussions are saved in the `discussions/` directory.

---

**Next Steps:**
1. Review agent discussions in `discussions/`
2. Examine computational results in `metabolic_results_agent/`
3. Construct top-priority strains
4. Validate with growth/production assays
5. Iterate based on experimental results