# Gem-Flux MCP Server: Basic Workflow

This notebook demonstrates the complete workflow for building, gapfilling, and analyzing a metabolic model using the Gem-Flux MCP Server.

## Workflow Overview

1. **Build Media** - Create growth medium from compounds
2. **Build Model** - Build draft metabolic model from protein sequences
3. **Gapfill Model** - Add reactions to enable growth
4. **Run FBA** - Analyze flux distribution and growth rate

## Prerequisites

- Gem-Flux MCP Server installed and running
- Python 3.11 environment
- ModelSEEDpy (Fxe/dev fork)
- COBRApy

## Setup

First, let's import the necessary modules and initialize the MCP tools.

In [1]:
# Import Gem-Flux MCP tools
from gem_flux_mcp.tools.media_builder import build_media
from gem_flux_mcp.tools.build_model import build_model
from gem_flux_mcp.tools.gapfill_model import gapfill_model
from gem_flux_mcp.tools.run_fba import run_fba

# Import database and storage modules
from gem_flux_mcp.database.loader import load_compounds_database, load_reactions_database
from gem_flux_mcp.database.index import DatabaseIndex
from gem_flux_mcp.templates.loader import load_templates
from gem_flux_mcp.media.atp_loader import load_atp_media
from gem_flux_mcp.media.predefined_loader import load_predefined_media
from gem_flux_mcp.storage.models import clear_all_models
from gem_flux_mcp.storage.media import clear_all_media

# Import types for requests
from gem_flux_mcp.types import (
    BuildMediaRequest,
    BuildModelRequest,
    GapfillModelRequest,
    RunFBARequest
)

import json
from pathlib import Path

print("✓ Imports successful")

modelseedpy 0.4.3
✓ Imports successful


## Initialize Database and Templates

Load the ModelSEED database and templates required for model building.
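
Before loading, it can help to confirm that the expected data files are actually present. The following is a minimal sketch using only the standard library; `find_missing_files` is a hypothetical helper (not part of Gem-Flux), and the relative paths simply mirror the ones used in the loading cell.

```python
from pathlib import Path


def find_missing_files(base_dir, relative_paths):
    """Return the relative paths that do not exist as files under base_dir."""
    base = Path(base_dir)
    return [rel for rel in relative_paths if not (base / rel).is_file()]


# Paths mirror the ones used in the loading cell; adjust if your layout differs.
required = ["database/compounds.tsv", "database/reactions.tsv"]
missing = find_missing_files("../data", required)
if missing:
    print(f"Missing data files under ../data: {missing}")
else:
    print("All required data files found")
```

This only checks that the files exist; it does not validate their contents, so a loader can still reject a file that is present but incomplete.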

In [2]:
# Clear session storage
clear_all_models()
clear_all_media()
print("✓ Session storage cleared")

# Load database
database_dir = Path("../data/database")
compounds_path = database_dir / "compounds.tsv"
reactions_path = database_dir / "reactions.tsv"

print(f"Loading database from {database_dir}...")
compounds_df = load_compounds_database(str(compounds_path))
reactions_df = load_reactions_database(str(reactions_path))
db_index = DatabaseIndex(compounds_df, reactions_df)
print(f"✓ Loaded {len(compounds_df)} compounds and {len(reactions_df)} reactions")

# Load templates
template_dir = Path("../data/templates")
print(f"\nLoading templates from {template_dir}...")
templates = load_templates(str(template_dir))
print(f"✓ Loaded {len(templates)} templates: {list(templates.keys())}")

# Load ATP gapfilling media
print("\nLoading ATP gapfilling media...")
atp_media = load_atp_media()
print(f"✓ Loaded {len(atp_media)} ATP test media")

# Load predefined media
print("\nLoading predefined media library...")
predefined_media = load_predefined_media()
print(f"✓ Loaded {len(predefined_media)} predefined media: {list(predefined_media.keys())}")

✓ Session storage cleared
Loading database from ../data/database...
✓ Loaded 33992 compounds and 43774 reactions

Loading templates from ../data/templates...


DatabaseError: Template 'GramNegative' has no metabolites.
A valid template must contain at least one metabolite.

## Step 1: Build Media

Create a minimal glucose medium for aerobic growth. This medium contains:
- D-Glucose (carbon source)
- O2 (electron acceptor)
- Essential minerals and ions

We'll use the predefined `glucose_minimal_aerobic` media for convenience.

In [3]:
# Option 1: Use predefined media (recommended)
media_id = "glucose_minimal_aerobic"
print(f"Using predefined media: {media_id}")
print("✓ Media ready")

# Option 2: Create custom media
# Remove the enclosing triple quotes to create custom media instead:
"""
custom_media_request = BuildMediaRequest(
    compounds=[
        "cpd00027",  # D-Glucose
        "cpd00007",  # O2
        "cpd00001",  # H2O
        "cpd00009",  # Phosphate
        "cpd00011",  # CO2
        "cpd00067",  # H+
        "cpd00013",  # NH3
        "cpd00048",  # SO4
        "cpd00205",  # K+
        "cpd00254",  # Mg
        "cpd00971",  # Na+
        "cpd00063",  # Ca2+
        "cpd00099",  # Cl-
        "cpd10515",  # Fe2+
        "cpd00030",  # Mn2+
        "cpd00034",  # Zn2+
        "cpd00058",  # Cu2+
        "cpd00149"   # Co2+
    ],
    default_uptake=100.0,
    custom_bounds={
        "cpd00027": (-5.0, 100.0),   # Limit glucose uptake
        "cpd00007": (-10.0, 100.0)   # Aerobic conditions
    }
)

media_response = build_media(custom_media_request, db_index)
media_id = media_response.media_id

print(f"\n✓ Created custom media: {media_id}")
print(f"  Compounds: {media_response.num_compounds}")
print(f"  Type: {media_response.media_type}")
"""

Using predefined media: glucose_minimal_aerobic
✓ Media ready
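
To make the bound convention in the custom-media option concrete (negative lower bound = maximum uptake), here is a minimal sketch of how `default_uptake` and `custom_bounds` could combine into per-compound bounds. `resolve_media_bounds` is a hypothetical helper, and the default of `(-default_uptake, default_uptake)` is an assumption; the real `build_media` may behave differently.

```python
def resolve_media_bounds(compounds, default_uptake=100.0, custom_bounds=None):
    """Map each compound ID to a (lower, upper) exchange bound.

    ASSUMPTION: compounds without an explicit entry get
    (-default_uptake, default_uptake); the real build_media may differ.
    """
    custom_bounds = custom_bounds or {}
    unknown = set(custom_bounds) - set(compounds)
    if unknown:
        raise ValueError(f"custom_bounds for compounds not in media: {sorted(unknown)}")

    bounds = {}
    for cpd in compounds:
        lower, upper = custom_bounds.get(cpd, (-default_uptake, default_uptake))
        if lower > upper:
            raise ValueError(f"{cpd}: lower bound {lower} exceeds upper bound {upper}")
        bounds[cpd] = (float(lower), float(upper))
    return bounds


compounds = ["cpd00027", "cpd00007", "cpd00001"]  # D-Glucose, O2, H2O
bounds = resolve_media_bounds(
    compounds,
    default_uptake=100.0,
    custom_bounds={"cpd00027": (-5.0, 100.0), "cpd00007": (-10.0, 100.0)},
)
for cpd, (lower, upper) in bounds.items():
    print(f"{cpd}: uptake up to {-lower}, secretion up to {upper}")
```

Under this convention, tightening the glucose lower bound from -100 to -5 caps glucose uptake at 5 mmol/gDW/h, which in turn limits the growth rate FBA can predict.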



## Step 2: Build Model

Build a draft metabolic model from E. coli protein sequences using the GramNegative template.

The example uses abbreviated sequences for five glycolysis enzymes: hexokinase, phosphoglycerate kinase, GAPDH, enolase, and pyruvate kinase.

In [4]:
# Define example protein sequences (abbreviated for demonstration)
# In a real scenario, you would load these from a FASTA file or genome annotation
protein_sequences = {
    "hexokinase": "MKLVINLVGNSGLGKSTFTQRLINSLQIDEDVRKQLAELSALQRGVKVVLTGSKGVTT",
    "pgk": "MKQHKAMIVALERFRKEKRDAALLNLVRNPVADAGVIHYVDAKK",
    "gapdh": "MSVALERYGIDEVASIGGLVEVNNQYLNSSNGIIKQLLKKLKEK",
    "eno": "MGKVIASKLAGNKAPLYRHIADLAGNSQVSAFGPNAKIGDKIAEE",
    "pyrk": "MAILDSGIHNGIVEGLMTTVHSITATQKTVDGPSHKDWRGGRAAT"
}

# Build model request
build_request = BuildModelRequest(
    protein_sequences=protein_sequences,
    template="GramNegative",
    model_name="E_coli_demo",
    annotate_with_rast=False  # Use offline template matching
)

print("Building model...")
build_response = build_model(build_request, db_index)

model_id = build_response.model_id
print(f"\n✓ Model built: {model_id}")
print(f"  Reactions: {build_response.num_reactions}")
print(f"  Metabolites: {build_response.num_metabolites}")
print(f"  Genes: {build_response.num_genes}")
print(f"  Exchange reactions: {build_response.num_exchange_reactions}")
print(f"  Template: {build_response.template_used}")
print(f"  Compartments: {build_response.compartments}")
print(f"  Has biomass reaction: {build_response.has_biomass_reaction}")
print("\nNote: This is a DRAFT model (suffix: .draft)")
print("      It likely cannot predict growth without gapfilling.")

NameError: name 'BuildModelRequest' is not defined
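
The build cell notes that real sequences would normally come from a FASTA file. A minimal standard-library sketch of such a loader follows; `read_fasta` is a hypothetical helper, and it assumes the first whitespace-delimited token of each header is the protein ID.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {sequence_id: sequence} dict.

    ASSUMPTION: the ID is the first whitespace-delimited token after '>'.
    """
    sequences = {}
    current_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the record that just ended before starting a new one.
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header with no identifier")
            current_id = header[0]
            if current_id in sequences:
                raise ValueError(f"duplicate sequence ID: {current_id}")
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


# Two of the abbreviated demo sequences, written as FASTA text.
example = """>hexokinase demo record
MKLVINLVGNSGLGKSTFTQRLINSLQIDE
DVRKQLAELSALQRGVKVVLTGSKGVTT
>pgk
MKQHKAMIVALERFRKEKRDAALLNLVRNPVADAGVIHYVDAKK
"""
protein_sequences = read_fasta(example.splitlines())
print({name: len(seq) for name, seq in protein_sequences.items()})
```

Because an open file iterates over its lines, the same function can be called as `read_fasta(handle)` inside a `with open(...) as handle:` block.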

## Step 3: Gapfill Model

Gapfill the draft model to enable growth in the glucose minimal medium.

Gapfilling adds missing reactions from the template database to enable the model to produce biomass.

This process has two stages:
1. **ATP Correction** - Ensures ATP production pathways work
2. **Genome-Scale Gapfilling** - Adds reactions for target media and growth
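
To illustrate the idea of "adding missing reactions so the model can reach a target", here is a toy sketch. It is not the algorithm ModelSEEDpy uses (real gapfilling is an optimization over flux constraints); it uses simple reachability and a brute-force search, and all reaction and metabolite names are made up for the example.

```python
from itertools import combinations


def producible(reactions, seed_metabolites):
    """Network expansion: return all metabolites reachable from the seeds.

    Each reaction is a (substrates, products) pair of sets and can fire
    once all of its substrates are available.
    """
    available = set(seed_metabolites)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def toy_gapfill(model_rxns, candidate_rxns, seeds, target):
    """Return the smallest list of candidate IDs that makes target producible.

    Brute force over candidate subsets in order of increasing size;
    returns None if the target stays unreachable even with every candidate.
    """
    for size in range(len(candidate_rxns) + 1):
        for subset in combinations(sorted(candidate_rxns), size):
            rxns = list(model_rxns.values()) + [candidate_rxns[r] for r in subset]
            if target in producible(rxns, seeds):
                return list(subset)
    return None


# Hypothetical toy network: the draft model lacks the g6p -> pep step.
model = {"rxn_A": ({"glc"}, {"g6p"}), "rxn_C": ({"pep"}, {"pyr"})}
candidates = {"rxn_B": ({"g6p"}, {"pep"}), "rxn_X": ({"glc"}, {"xyl"})}
print(toy_gapfill(model, candidates, seeds={"glc"}, target="pyr"))
```

The toy picks `rxn_B` because it is the only candidate that connects the medium to the target; `rxn_X` is available but unnecessary, which mirrors the goal of adding as few reactions as possible.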

In [5]:
# Gapfill request
gapfill_request = GapfillModelRequest(
    model_id=model_id,
    media_id=media_id,
    target_growth_rate=0.01,  # Minimum viable growth
    gapfill_mode="complete"   # Full gapfilling (ATP + genome-scale)
)

print("Gapfilling model...")
print("This may take 1-5 minutes depending on model complexity.\n")

gapfill_response = gapfill_model(gapfill_request, db_index)

gapfilled_model_id = gapfill_response.model_id
print(f"\n✓ Model gapfilled: {gapfilled_model_id}")
print(f"  Original model: {gapfill_response.original_model_id}")
print(f"  Reactions added: {gapfill_response.num_reactions_added}")
print(f"  Growth rate before: {gapfill_response.growth_rate_before:.3f} hr⁻¹")
print(f"  Growth rate after: {gapfill_response.growth_rate_after:.3f} hr⁻¹")

# Display statistics
stats = gapfill_response.gapfill_statistics
print("\nGapfilling Statistics:")
print("  ATP Correction:")
print(f"    Media tested: {stats['atp_gapfill']['media_tested']}")
print(f"    Media feasible: {stats['atp_gapfill']['media_feasible']}")
print(f"    Reactions added: {stats['atp_gapfill']['reactions_added']}")
print("  Genome-Scale:")
print(f"    Reactions added: {stats['genome_gapfill']['reactions_added']}")
print(f"    Reactions reversed: {stats['genome_gapfill']['reactions_reversed']}")

# Display added reactions
if gapfill_response.reactions_added:
    print("\nAdded Reactions (showing first 5):")
    for rxn in gapfill_response.reactions_added[:5]:
        print(f"  {rxn['id']}: {rxn['name']}")
        print(f"    {rxn['equation']}")
        print(f"    Direction: {rxn['direction']}, Bounds: {rxn['bounds']}")

NameError: name 'GapfillModelRequest' is not defined

## Step 4: Run FBA

Execute Flux Balance Analysis to predict metabolic fluxes and growth rate.

FBA optimizes the biomass objective function subject to:
- Stoichiometric constraints (mass balance)
- Thermodynamic constraints (reaction directionality)
- Media constraints (nutrient availability)
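
The three constraint types above correspond to the standard FBA linear program, written here in generic notation (the symbols are not taken from the Gem-Flux code):

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0 \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j
\end{aligned}
```

Here $S$ is the stoichiometric matrix (metabolites by reactions), $v$ is the flux vector, and $c$ selects the objective (a 1 for the biomass reaction, 0 elsewhere). The equality $S\,v = 0$ is the mass-balance constraint; the bounds $l_j, u_j$ encode both reaction directionality (for example $l_j = 0$ for an irreversible reaction) and the medium (the lower bound of an exchange reaction caps uptake of that nutrient).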

In [6]:
# FBA request
fba_request = RunFBARequest(
    model_id=gapfilled_model_id,
    media_id=media_id,
    objective="bio1",      # Biomass reaction
    maximize=True,         # Maximize growth
    flux_threshold=1e-6    # Filter small fluxes
)

print("Running FBA...")
fba_response = run_fba(fba_request, db_index)

print("\n✓ FBA Complete")
print(f"  Status: {fba_response.status}")
print(f"  Objective value (growth rate): {fba_response.objective_value:.3f} hr⁻¹")
print(f"  Active reactions: {fba_response.active_reactions}")
print(f"  Total flux: {fba_response.total_flux:.1f} mmol/gDW/h")

# Display uptake fluxes
print("\nUptake Fluxes (top 5):")
for cpd_id, flux_info in list(fba_response.uptake_fluxes.items())[:5]:
    print(f"  {flux_info['name']} ({cpd_id}): {flux_info['flux']:.3f} mmol/gDW/h")

# Display secretion fluxes
print("\nSecretion Fluxes (top 5):")
for cpd_id, flux_info in list(fba_response.secretion_fluxes.items())[:5]:
    print(f"  {flux_info['name']} ({cpd_id}): {flux_info['flux']:.3f} mmol/gDW/h")

# Display top internal fluxes
print("\nTop Internal Fluxes (showing first 10):")
for flux_info in fba_response.top_fluxes[:10]:
    print(f"  {flux_info['reaction_name']} ({flux_info['reaction_id']}): {flux_info['flux']:.3f}")

# Display summary
summary = fba_response.summary
print("\nSummary:")
print(f"  Uptake reactions: {summary['uptake_reactions']}")
print(f"  Secretion reactions: {summary['secretion_reactions']}")
print(f"  Internal reactions: {summary['internal_reactions']}")

NameError: name 'RunFBARequest' is not defined

## Interpretation

The FBA results show:

1. **Growth Rate**: The predicted growth rate in hr⁻¹ (reciprocal hours)
   - Typical E. coli values: 0.8-1.0 hr⁻¹ in aerobic glucose minimal medium
   - The predicted value depends on the glucose uptake bound set in the medium

2. **Uptake Fluxes**: Nutrients consumed from the medium
   - Negative flux = uptake/consumption
   - Glucose and oxygen are typically the highest uptakes

3. **Secretion Fluxes**: Metabolic byproducts released
   - Positive flux = secretion/production
   - CO2 is typically the main secretion product in aerobic growth

4. **Internal Fluxes**: Metabolic pathway activity
   - Shows which reactions are active (flux > threshold)
   - High fluxes often indicate key pathways (glycolysis, TCA cycle)
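
The sign convention above can be sketched in a few lines. `split_exchange_fluxes` and `doubling_time_hours` are hypothetical helpers (not part of Gem-Flux), and the flux values are made up purely to illustrate the convention.

```python
import math


def split_exchange_fluxes(exchange_fluxes, threshold=1e-6):
    """Split exchange fluxes into uptake (negative) and secretion (positive).

    Fluxes whose absolute value is at or below threshold count as inactive.
    """
    uptake, secretion = {}, {}
    for cpd_id, flux in exchange_fluxes.items():
        if flux < -threshold:
            uptake[cpd_id] = flux
        elif flux > threshold:
            secretion[cpd_id] = flux
    return uptake, secretion


def doubling_time_hours(growth_rate):
    """Doubling time ln(2) / mu for a growth rate mu given in hr^-1."""
    if growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / growth_rate


# Hypothetical flux values, chosen only to illustrate the sign convention.
example = {"cpd00027": -5.0, "cpd00007": -10.0, "cpd00011": 12.0, "cpd00001": 1e-9}
uptake, secretion = split_exchange_fluxes(example)
print("Uptake:", uptake)
print("Secretion:", secretion)
print(f"Doubling time at 0.9 hr^-1: {doubling_time_hours(0.9):.2f} h")
```

Converting the growth rate to a doubling time is often the easiest way to sanity-check a prediction against laboratory experience.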

## Workflow Complete!

When every cell runs without errors, the workflow has:
1. ✓ Created growth media
2. ✓ Built a draft metabolic model
3. ✓ Gapfilled the model for growth
4. ✓ Analyzed flux distribution and growth rate

## Model States

Notice the model ID transformations:
- **Draft model**: `E_coli_demo.draft`
- **Gapfilled model**: `E_coli_demo.draft.gf`

The original draft model is preserved in storage, allowing you to compare before/after gapfilling.
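
As a small illustration of the naming scheme, the state of a model can be read off its ID suffix. `describe_model_id` is a hypothetical helper; it assumes only the two suffixes shown above and reports anything else as unknown.

```python
def describe_model_id(model_id):
    """Split a model ID into (base_name, state) from its suffix.

    ASSUMPTION: only the '.draft' and '.draft.gf' suffixes shown in this
    notebook are recognized; other IDs are reported as 'unknown'.
    """
    # Check the longer suffix first so '.draft.gf' is not mistaken for a draft.
    if model_id.endswith(".draft.gf"):
        return model_id[: -len(".draft.gf")], "gapfilled"
    if model_id.endswith(".draft"):
        return model_id[: -len(".draft")], "draft"
    return model_id, "unknown"


for mid in ("E_coli_demo.draft", "E_coli_demo.draft.gf"):
    base, state = describe_model_id(mid)
    print(f"{mid}: base={base}, state={state}")
```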

## Next Steps

Try modifying this workflow:
- Use different media (anaerobic, pyruvate)
- Build larger models with more proteins
- Analyze different objective functions
- Compare multiple gapfilling iterations

See other notebooks:
- `02_database_lookups.ipynb` - Explore compound/reaction databases
- `03_session_management.ipynb` - Manage models and media
- `04_error_handling.ipynb` - Handle common errors