# Gem-Flux MCP Server: Database Lookups

This notebook demonstrates how to search and retrieve compound and reaction information from the ModelSEED database.

## Database Overview

The ModelSEED database contains:
- **33,992 compounds** with IDs, names, formulas, and cross-references
- **43,774 reactions** with equations, EC numbers, and pathway information

## Available Tools

1. **Compound Tools**:
   - `get_compound_name` - Get detailed information for a specific compound ID
   - `search_compounds` - Search compounds by name, formula, or alias

2. **Reaction Tools**:
   - `get_reaction_name` - Get detailed information for a specific reaction ID
   - `search_reactions` - Search reactions by name, EC number, or pathway

## Setup

In [1]:
# Import database tools and their request types
from gem_flux_mcp.tools.compound_lookup import (
    get_compound_name, 
    search_compounds,
    GetCompoundNameRequest,
    SearchCompoundsRequest
)
from gem_flux_mcp.tools.reaction_lookup import (
    get_reaction_name, 
    search_reactions,
    GetReactionNameRequest,
    SearchReactionsRequest
)

# Import database loading
from gem_flux_mcp.database.loader import load_compounds_database, load_reactions_database
from gem_flux_mcp.database.index import DatabaseIndex

from pathlib import Path
import json

print("✓ Imports successful")

modelseedpy 0.4.3
✓ Imports successful


## Load Database

In [2]:
# Load database
database_dir = Path("../data/database")
compounds_path = database_dir / "compounds.tsv"
reactions_path = database_dir / "reactions.tsv"

print(f"Loading database from {database_dir}...")
compounds_df = load_compounds_database(str(compounds_path))
reactions_df = load_reactions_database(str(reactions_path))
db_index = DatabaseIndex(compounds_df, reactions_df)
print(f"✓ Loaded {len(compounds_df)} compounds and {len(reactions_df)} reactions")

Loading database from ../data/database...
✓ Loaded 33992 compounds and 43774 reactions


## Part 1: Compound Lookups

### Get Compound by ID

Use `get_compound_name` to retrieve detailed information for a specific compound ID.

In [3]:
# Look up glucose (cpd00027)
request = GetCompoundNameRequest(compound_id="cpd00027")
response = get_compound_name(request, db_index)

print("Compound Information:")
print(f"  ID: {response['id']}")
print(f"  Name: {response['name']}")
print(f"  Abbreviation: {response['abbreviation']}")
print(f"  Formula: {response['formula']}")
print(f"  Mass: {response['mass']} g/mol")
print(f"  Charge: {response['charge']}")
print(f"  InChI Key: {response['inchikey']}")
print(f"  SMILES: {response['smiles']}")

print(f"\n  Aliases: {response['aliases']}")
# External IDs field removed - not in response

Compound Information:
  ID: cpd00027
  Name: D-Glucose
  Abbreviation: glc-D
  Formula: C6H12O6
  Mass: 180.0 g/mol
  Charge: 0
  InChI Key: WQZGKKKJIJFFOK-GASJEMHNSA-N
  SMILES: OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O

  Aliases: {'Name': ['D-(+)-Glucose', 'D-Glucopyranose', 'D-Glucose', 'Dextrose', 'Glucopyranose', 'Glucose', 'Grape sugar'], 'AlgaGEM': ['S_D_45_Glucose_c', 'S_Glucose_c', 'S_Glucose_ext_b', 'S_Glucose_p'], 'AraGEM': ['S_D_45_Glucose_c'], 'BiGG': ['glc__D'], 'KEGG': ['C00031', 'C00293'], 'MetaCyc': ['Glucopyranose']}
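The `aliases` value printed above is a dictionary keyed by source database. The following is a minimal sketch of pulling cross-references out of such a dictionary. The helper name `external_ids` is hypothetical and not part of `gem_flux_mcp`, and the dictionary is an abridged copy of the output above.

```python
# Abridged copy of the `aliases` value printed above for cpd00027.
aliases = {
    "Name": ["D-(+)-Glucose", "D-Glucopyranose", "D-Glucose", "Dextrose"],
    "BiGG": ["glc__D"],
    "KEGG": ["C00031", "C00293"],
    "MetaCyc": ["Glucopyranose"],
}


def external_ids(aliases, source):
    """Return the identifiers listed for one source, or [] if it is absent.

    Hypothetical helper for illustration; not part of gem_flux_mcp.
    """
    return list(aliases.get(source, []))


kegg_ids = external_ids(aliases, "KEGG")
chebi_ids = external_ids(aliases, "ChEBI")  # this source is absent from the record
print(kegg_ids)
print(chebi_ids)
```

Returning an empty list for a missing source lets callers loop over the result without a separate existence check.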


### Search Compounds by Name

Use `search_compounds` to find compounds by name, formula, or alias.

In [4]:
# Search for glucose-related compounds
request = SearchCompoundsRequest(query="glucose", limit=10)
response = search_compounds(request, db_index)

print(f"Search Results: {response['num_results']} compounds found")
print(f"Truncated: {response['truncated']}\n")

for result in response['results']:
    print(f"  {result['id']}: {result['name']}")
    print(f"    Formula: {result['formula']}")
    print(f"    Match: {result['match_field']} ({result['match_type']})")
    print()

Search Results: 10 compounds found
Truncated: True

  cpd27165: Glucose
    Formula: C6H12O6
    Match: name (exact)

  cpd07132: 1,2,3,4-Tetragalloyl-alpha-D-glucose
    Formula: C34H28O22
    Match: name (partial)

  cpd02748: 1,2,3,6-Tetrakis-O-galloyl-beta-D-glucose
    Formula: C34H28O22
    Match: name (partial)

  cpd23643: 1,6-Anhydro-beta-D-glucose
    Formula: C6H10O5
    Match: name (partial)

  cpd28942: 1,6-anhydroglucose
    Formula: C6H10O5
    Match: name (partial)

  cpd07321: 1-Caffeoyl-beta-D-glucose
    Formula: C15H18O9
    Match: name (partial)

  cpd17856: 1-Feruloyl-D-glucose
    Formula: C16H20O9
    Match: name (partial)

  cpd02671: 1-O,2-O,6-O-Trigalloyl-beta-D-glucose
    Formula: C27H24O18
    Match: name (partial)

  cpd02532: 1-O,6-O-Digalloyl-beta-D-glucose
    Formula: C20H20O14
    Match: name (partial)

  cpd32975: 1-O-4-hydroxybenzoyl-beta-D-glucose
    Formula: C13H16O8
    Match: name (partial)



### Search Priority Demonstration

The search uses priority-based matching:
1. Exact ID match (highest priority)
2. Exact name match
3. Exact abbreviation match
4. Partial name match
5. Formula match
6. Alias match (lowest priority)
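The ranking above can be sketched in plain Python. This is an illustration of priority-based matching under stated assumptions, not the code `search_compounds` actually runs: the function names `match_priority` and `toy_search` are hypothetical, the three records are toy rows built from values shown in this notebook, and ties are broken alphabetically by name as a guess.

```python
# Toy records built from values printed in this notebook. An empty
# abbreviation means the value is not shown in the notebook output.
COMPOUNDS = [
    {"id": "cpd00027", "name": "D-Glucose", "abbreviation": "glc-D",
     "formula": "C6H12O6", "aliases": ["Dextrose", "Grape sugar"]},
    {"id": "cpd27165", "name": "Glucose", "abbreviation": "",
     "formula": "C6H12O6", "aliases": []},
    {"id": "cpd23643", "name": "1,6-Anhydro-beta-D-glucose", "abbreviation": "",
     "formula": "C6H10O5", "aliases": []},
]


def match_priority(record, query):
    """Return (priority, match_field, match_type), or None if nothing matches.

    Lower priority numbers rank first, mirroring the numbered list above.
    """
    q = query.lower()
    if record["id"].lower() == q:
        return (1, "id", "exact")
    if record["name"].lower() == q:
        return (2, "name", "exact")
    if record["abbreviation"] and record["abbreviation"].lower() == q:
        return (3, "abbreviation", "exact")
    if q in record["name"].lower():
        return (4, "name", "partial")
    if record["formula"].lower() == q:
        return (5, "formula", "exact")
    if any(q in alias.lower() for alias in record["aliases"]):
        return (6, "aliases", "partial")
    return None


def toy_search(records, query, limit=10):
    """Rank matching records by priority, then alphabetically by name."""
    hits = []
    for record in records:
        match = match_priority(record, query)
        if match is not None:
            hits.append((match[0], record["name"].lower(), record, match))
    hits.sort(key=lambda hit: (hit[0], hit[1]))
    return [
        {"id": r["id"], "name": r["name"], "match_field": m[1], "match_type": m[2]}
        for _, _, r, m in hits[:limit]
    ]


for hit in toy_search(COMPOUNDS, "glucose"):
    print(hit["id"], hit["name"], hit["match_field"], hit["match_type"])
```

Each record is assigned only its best (lowest-numbered) match, so a compound never appears twice in one result list.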

In [5]:
# Search by exact ID
request = SearchCompoundsRequest(query="cpd00027", limit=5)
response = search_compounds(request, db_index)
print("Search by exact ID (cpd00027):")
print(f"  First result: {response['results'][0]['name']}")
print(f"  Match type: {response['results'][0]['match_type']}\n")

# Search by formula
request = SearchCompoundsRequest(query="C6H12O6", limit=5)
response = search_compounds(request, db_index)
print("Search by formula (C6H12O6):")
for result in response['results'][:3]:
    print(f"  {result['name']} ({result['id']})")
print()

# Search with no results
request = SearchCompoundsRequest(query="nonexistent_compound_xyz", limit=10)
response = search_compounds(request, db_index)
print("Search with no results:")
print(f"  Num results: {response['num_results']}")
print(f"  Suggestions: {response['suggestions']}")

Search by exact ID (cpd00027):
  First result: D-Glucose
  Match type: exact

Search by formula (C6H12O6):
  1,4-beta-D-Mannooligosaccharide (cpd17398)
  1L-chiro-inositol (cpd25015)
  aldehydo-D-gulose (cpd34373)

Search with no results:
  Num results: 0
  Suggestions: ['Try a more general search term', 'Check spelling of compound name', 'Search by formula (e.g., C6H12O6)', 'Search by database ID from other sources (KEGG, BiGG)']
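Calling code usually needs to handle both the empty case and the capped case. The sketch below works on plain dictionaries shaped like the responses printed above. The helper `summarize_search` is hypothetical, and it assumes that `truncated` means more matches exist than `limit` allowed and that `suggestions` may be missing from non-empty responses.

```python
def summarize_search(response):
    """Return printable lines for a search response dictionary.

    Hypothetical helper; it assumes the keys shown in the notebook output.
    """
    if response.get("num_results", 0) == 0:
        lines = ["No matches."]
        lines += [f"  hint: {hint}" for hint in response.get("suggestions", [])]
        return lines
    lines = [f"{result['id']}: {result['name']}" for result in response["results"]]
    if response.get("truncated"):
        lines.append("(more matches may exist; raise `limit` to see them)")
    return lines


# Stand-in responses modelled on the outputs above.
empty = {"num_results": 0, "results": [],
         "suggestions": ["Try a more general search term"]}
capped = {"num_results": 1, "truncated": True,
          "results": [{"id": "cpd27165", "name": "Glucose"}]}

for line in summarize_search(empty) + summarize_search(capped):
    print(line)
```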


### Explore Common Metabolites

In [6]:
# Common metabolites to explore
common_metabolites = [
    "cpd00001",  # H2O
    "cpd00002",  # ATP
    "cpd00008",  # ADP
    "cpd00067",  # H+
    "cpd00020",  # Pyruvate
    "cpd00036",  # Succinate
    "cpd00024",  # 2-Oxoglutarate
]

print("Common Metabolites:")
for cpd_id in common_metabolites:
    request = GetCompoundNameRequest(compound_id=cpd_id)
    response = get_compound_name(request, db_index)
    print(f"  {cpd_id}: {response['name']} ({response['formula']})")

Common Metabolites:
  cpd00001: H2O (H2O)
  cpd00002: ATP (C10H13N5O13P3)
  cpd00008: ADP (C10H13N5O10P2)
  cpd00067: H+ (H)
  cpd00020: Pyruvate (C3H3O3)
  cpd00036: Succinate (C4H4O4)
  cpd00024: 2-Oxoglutarate (C5H4O5)


## Part 2: Reaction Lookups

### Get Reaction by ID

Use `get_reaction_name` to retrieve detailed information for a specific reaction ID.

In [7]:
# Look up pyruvate kinase (rxn00148)
request = GetReactionNameRequest(reaction_id="rxn00148")
response = get_reaction_name(request, db_index)

print("Reaction Information:")
print(f"  ID: {response['id']}")
print(f"  Name: {response['name']}")
print(f"  Abbreviation: {response['abbreviation']}")
print(f"  Equation (readable): {response['equation']}")
print(f"  Equation (with IDs): {response['equation_with_ids']}")
print(f"  Reversibility: {response['reversibility']}")
print(f"  Direction: {response['direction']}")
print(f"  Is transport: {response['is_transport']}")
print(f"  EC numbers: {response['ec_numbers']}")
print(f"  Pathways: {response['pathways']}")
if response['deltag'] is not None:
    print(f"  ΔG: {response['deltag']} ± {response['deltagerr']} kcal/mol")

print(f"\n  Aliases: {response['aliases']}")
# External IDs field removed - not in response

Reaction Information:
  ID: rxn00148
  Name: ATP:pyruvate 2-O-phosphotransferase
  Abbreviation: R00200
  Equation (readable): (1) ATP + (1) Pyruvate <=> (1) ADP + (1) Phosphoenolpyruvate + (1) H+
  Equation (with IDs): (1) cpd00002[0] + (1) cpd00020[0] <=> (1) cpd00008[0] + (1) cpd00061[0] + (1) cpd00067[0]
  Reversibility: reversible
  Direction: bidirectional
  Is transport: False
  EC numbers: ['2.7.1.40']
  Pathways: ['ANAEROFRUCAT-PWY', 'rn00010']
  ΔG: 6.53 ± 0.1 kcal/mol

  Aliases: {'AraCyc': ['PEPDEPHOS-RXN'], 'BiGG': ['CDC19', 'PYK', 'PYK2'], 'BrachyCyc': ['PEPDEPHOS-RXN'], 'KEGG': ['R00200'], 'MetaCyc': ['PEPDEPHOS-RXN'], 'Name': ['ATP:pyruvate 2-O-phosphotransferase', 'ATP:pyruvate O2-phosphotransferase', 'phosphoenol transphosphorylase', 'phosphoenolpyruvate kinase', 'pyruvate kinase']}
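The ID-based equation string printed above is regular enough to parse into stoichiometry. The sketch below is one way to do that with the standard library. The function `parse_equation` is hypothetical, the pattern is inferred only from the strings shown in this notebook, and the bracketed number is treated as a compartment index, which is an assumption.

```python
import re

# One term such as "(1) cpd00002[0]": coefficient, compound ID, bracketed index.
TERM = re.compile(r"\((?P<coef>[0-9.]+)\)\s+(?P<cpd>cpd\d+)\[(?P<comp>\d+)\]")
# Arrow tokens that appear in the outputs in this notebook.
ARROWS = ("<=>", "=>", "<=")


def parse_equation(equation_with_ids):
    """Split an ID-based equation into (reactants, arrow, products).

    Each side is a list of (coefficient, compound_id, compartment) tuples.
    """
    for arrow in ARROWS:
        # Surrounding spaces keep "<=>" from being confused with "<=" or "=>".
        left, sep, right = equation_with_ids.partition(f" {arrow} ")
        if sep:
            break
    else:
        raise ValueError(f"no arrow found in {equation_with_ids!r}")

    def side(text):
        return [(float(m["coef"]), m["cpd"], int(m["comp"]))
                for m in TERM.finditer(text)]

    return side(left), arrow, side(right)


equation = ("(1) cpd00002[0] + (1) cpd00020[0] <=> "
            "(1) cpd00008[0] + (1) cpd00061[0] + (1) cpd00067[0]")
reactants, arrow, products = parse_equation(equation)
print(reactants)
print(arrow)
print(products)
```

With the sides parsed, checking whether a reaction consumes a given compound reduces to a membership test on the compound IDs.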


### Search Reactions by Name

Use `search_reactions` to find reactions by name, EC number, or pathway.

In [8]:
# Search for hexokinase reactions
request = SearchReactionsRequest(query="hexokinase", limit=10)
response = search_reactions(request, db_index)

print(f"Search Results: {response['num_results']} reactions found")
print(f"Truncated: {response['truncated']}\n")

for result in response['results']:
    print(f"  {result['id']}: {result['name']}")
    print(f"    Equation: {result['equation']}")
    print(f"    EC: {result['ec_numbers']}")
    print(f"    Match: {result['match_field']} ({result['match_type']})")
    print()

Search Results: 10 reactions found
Truncated: True

  rxn19890: hexokinase
    Equation: (1) ATP + (1) D-Hexoses => (1) ADP + (1) H+ + (1) D-Hexose 6-phosphate
    EC: ['2.7.1.1']
    Match: name (exact)

  rxn19891: hexokinase
    Equation: (1) ATP + (1) D-Hexoses => (1) ADP + (1) H+ + (1) D-Hexose 6-phosphate
    EC: ['2.7.1.1']
    Match: name (exact)

  rxn34232: hexokinase
    Equation: (1) ATP + (1) D-Glucose => (1) ADP + (1) H+ + (1) D-glucose-6-phosphate
    EC: ['2.7.1.1', '2.7.1.2']
    Match: name (exact)

  rxn36734: hexokinase
    Equation: (1) ATP + (1) D-Allose => (1) ADP + (1) H+ + (1) D-Hexose 6-phosphate
    EC: ['2.7.1.1']
    Match: name (exact)

  rxn36735: hexokinase
    Equation: (1) ATP + (1) alpha-D-Glucose => (1) ADP + (1) H+ + (1) D-Hexose 6-phosphate
    EC: ['2.7.1.1']
    Match: name (exact)

  rxn36736: hexokinase
    Equation: (1) ATP + (1) beta-D-Fructose => (1) ADP + (1) H+ + (1) D-Hexose 6-phosphate
    EC: ['2.7.1.1']
    Match: name (exact)

  rxn36

### Search by EC Number

Find reactions by Enzyme Commission (EC) number. At most `limit` results are returned.

In [9]:
# Search by EC number (hexokinase: 2.7.1.1)
request = SearchReactionsRequest(query="2.7.1.1", limit=10)
response = search_reactions(request, db_index)

print(f"Reactions with EC 2.7.1.1: {response['num_results']} found\n")

for result in response['results']:
    print(f"  {result['id']}: {result['name']}")
    print(f"    {result['equation']}")
    print()

Reactions with EC 2.7.1.1: 10 found

  rxn40647: 
    (1) GTP + (1) Kanamycin A <=> (1) GDP + (1) H+ + (1) kanamycin A 2''-phosphate

  rxn40943: 
    (1) beta-D-Mannose[1] + (1) PTSH-PHOSPHORYLATED <=> (1) Protein-Histidines + (1) beta-D-mannopyranose 6-phosphate

  rxn41053: 
    (1) PPi + (1) Purine-Ribonucleosides <=> (1) Phosphate + (1) H+ + (1) 5'-Phosphomononucleotide

  rxn41211: 
    (1) ATP + (1) bGalNAc-13-bGlucNAc-14-Man-R <=> (1) ADP + (1) H+ + (1) bGalNAc-13-bGlucNAc-14-Man6P-R

  rxn41321: 
    (1) PTSH-PHOSPHORYLATED + (1) alpha-chitobiose[1] <=> (1) Protein-Histidines + (1) alpha-chitobiose 6'-phosphate

  rxn41325: 
    (1) 2-Deoxy-D-glucose[1] + (1) PTSH-PHOSPHORYLATED <=> (1) 2-Deoxy-D-glucose 6-phosphate + (1) Protein-Histidines

  rxn41748: 
    (1) ATP + (1) 6-deoxy-6-sulfo-D-fructose <=> (1) ADP + (1) H+ + (1) 6-deoxy-6-sulfo-D-fructose 1-phosphate

  rxn41856: 
    (1) N-acetyl-alpha-D-glucosamine[1] + (1) PTSH-PHOSPHORYLATED <=> (1) Protein-Histidines + (1) N-
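The EC search output does not print EC numbers, so this notebook does not show whether the query is matched exactly or as a substring. If an exact match is required, one option is to filter the returned results on the client side. The sketch below assumes each result carries an `ec_numbers` list, as in the name-search output earlier. The helper `with_exact_ec` and the second record are hypothetical.

```python
def with_exact_ec(results, ec_number):
    """Keep only results whose `ec_numbers` list contains `ec_number` exactly.

    Hypothetical helper; not part of gem_flux_mcp.
    """
    return [r for r in results if ec_number in r.get("ec_numbers", [])]


results = [
    # Copied from the hexokinase name-search output earlier in this notebook.
    {"id": "rxn34232", "name": "hexokinase", "ec_numbers": ["2.7.1.1", "2.7.1.2"]},
    # Hypothetical record whose EC number merely starts with "2.7.1.1".
    {"id": "rxn_example", "name": "hypothetical kinase", "ec_numbers": ["2.7.1.190"]},
]

exact = with_exact_ec(results, "2.7.1.1")
print([r["id"] for r in exact])
```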

### Search by Pathway

Find reactions associated with a metabolic pathway. The query is a free-text term, and at most `limit` results are returned.

In [10]:
# Search for glycolysis reactions
request = SearchReactionsRequest(query="glycolysis", limit=20)
response = search_reactions(request, db_index)

print(f"Glycolysis Reactions: {response['num_results']} found\n")

for result in response['results'][:10]:
    print(f"  {result['id']}: {result['name']}")
    print(f"    {result['equation']}")
    print()

Glycolysis Reactions: 20 found

  rxn39387: 
    (1) 2-Hydroxyethyl-ThPP + (1) Enzyme N6-(lipoyl)lysine <=> (1) TPP + (1) [Dihydrolipoyllysine-residue acetyltransferase] S-acetyldihydrolipoyllysine

  rxn41481: 
    (1) H2O + (1) ATP + (1) Pyruvate <=> (1) Phosphate + (1) AMP + (2) H+

  rxn41717: 
    (1) Succinate + (1) ETR-Quinones <=> (1) Fumarate + (1) ETR-Quinols

  rxn47178: 
    (1) ATP + (1) Pyruvate <=> (1) ADP + (1) H+

  rxn48263: 
    (1) L-Malate + (1) ETR-Quinones <=> (1) Oxaloacetate + (1) ETR-Quinols

  rxn00499: (S)-Lactate:NAD+ oxidoreductase
    (1) NAD + (1) L-Lactate <=> (1) NADH + (1) Pyruvate + (1) H+

  rxn19740: (S)-malate hydro-lyase
    (1) L-Malate <=> (1) H2O + (1) Fumarate

  rxn00799: (S)-malate hydro-lyase (fumarate-forming)
    (1) L-Malate <=> (1) H2O + (1) Fumarate

  rxn00248: (S)-malate:NAD+ oxidoreductase
    (1) NAD + (1) L-Malate <=> (1) NADH + (1) Oxaloacetate + (1) H+

  rxn37888: 1.2.1.13-RXN.e
    (1) NADP + (1) Phosphate + (1) Glyceraldehyd

### Explore Key Metabolic Reactions

In [11]:
# Key reactions in central metabolism
key_reactions = [
    "rxn00148",  # Pyruvate kinase (glycolysis)
    "rxn00200",  # 2-Oxoglutaramate amidohydrolase (alanine, aspartate and glutamate metabolism)
    "rxn00216",  # Hexokinase (glycolysis)
    "rxn00148",  # Pyruvate kinase (repeated entry)
    "rxn00256",  # Citrate synthase (TCA cycle)
]

print("Key Metabolic Reactions:")
for rxn_id in key_reactions:
    request = GetReactionNameRequest(reaction_id=rxn_id)
    response = get_reaction_name(request, db_index)
    print(f"\n  {rxn_id}: {response['name']}")
    print(f"    {response['equation']}")
    print(f"    EC: {response['ec_numbers']}")
    print(f"    Pathways: {response['pathways']}")

Key Metabolic Reactions:

  rxn00148: ATP:pyruvate 2-O-phosphotransferase
    (1) ATP + (1) Pyruvate <=> (1) ADP + (1) Phosphoenolpyruvate + (1) H+
    EC: ['2.7.1.40']
    Pathways: ['ANAEROFRUCAT-PWY', 'rn00010']

  rxn00200: 2-oxoglutaramate amidohydrolase
    (1) H2O + (1) 2-Oxoglutaramate => (1) NH3 + (1) 2-Oxoglutarate
    EC: ['3.5.1.-', '3.5.1.111', '3.5.1.3']
    Pathways: ['KEGG: rn00250 (Alanine, aspartate and glutamate metabolism)']

  rxn00216: ATP:D-glucose 6-phosphotransferase
    (1) ATP + (1) D-Glucose => (1) ADP + (1) H+ + (1) D-glucose-6-phosphate
    EC: ['2.7.1.1', '2.7.1.2']
    Pathways: ['ANAEROFRUCAT-PWY', 'rn00521']

  rxn00148: ATP:pyruvate 2-O-phosphotransferase
    (1) ATP + (1) Pyruvate <=> (1) ADP + (1) Phosphoenolpyruvate + (1) H+
    EC: ['2.7.1.40']
    Pathways: ['ANAEROFRUCAT-PWY', 'rn00010']

  rxn00256: acetyl-CoA:oxaloacetate C-acetyltransferase (thioester-hydrolysing)
    (1) CoA + (1) H+ + (1) Citrate <= (1) H2O + (1) Acetyl-CoA + (1) Oxaloaceta

## Part 3: Advanced Search Techniques

### Combining Searches

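Find a compound, then look up reactions related to it. Note that the second step passes the text "ATP" to `search_reactions` rather than the compound ID, so hits can include reactions whose name merely contains "ATP" (for example, 2-hydroxy-dATP diphosphohydrolase).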

In [12]:
# Step 1: Find ATP
compound_request = SearchCompoundsRequest(query="ATP", limit=1)
compound_response = search_compounds(compound_request, db_index)
atp_id = compound_response['results'][0]['id']
print(f"Found ATP: {atp_id}\n")

# Step 2: Text search for reactions matching "ATP" (the query is the string, not atp_id)
reaction_request = SearchReactionsRequest(query="ATP", limit=10)
reaction_response = search_reactions(reaction_request, db_index)

print(f"Reactions involving ATP: {reaction_response['num_results']} found\n")
for result in reaction_response['results'][:5]:
    print(f"  {result['name']}")
    print(f"    {result['equation']}")
    print()

Found ATP: cpd00002

Reactions involving ATP: 10 found

  (6S)-6-Hydroxy-1,4,5,6-tetrahydronicotinamide-adenine-dinucleotide hydro-lyase (ATP-hydrolysing)
    (1) ATP + (1) (6S)-6-beta-Hydroxy-1,4,5,6-tetrahydronicotinamide-adenine dinucleotide => (1) NADH + (1) ADP + (1) Phosphate + (1) H+

  (K++H+)-ATPase
    (1) H2O + (1) ATP + (1) K+[1] => (1) ADP + (1) Phosphate + (1) H+ + (1) K+

  1-(5-phospho-D-ribosyl)-ATP:diphosphate phospho-alpha-D-ribosyl-transferase
    (1) PPi + (1) H+ + (1) Phosphoribosyl-ATP <= (1) ATP + (1) PRPP

  2-hydroxy-dATP diphosphohydrolase
    (1) H2O + (1) 2-Hydroxy-dATP => (1) PPi + (2) H+ + (1) 2-Hydroxy-dAMP

  2-hydroxy-dATP diphosphohydrolase
    (1) H2O + (1) 2-hydroxy-dATP => (1) PPi + (2) H+ + (1) 2-Hydroxy-dAMP
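Some of the hits above match "ATP" only in the reaction name. To keep only reactions where ATP is an actual participant, the readable equation can be split into compound names. The sketch below does this for two results copied from the output above. The helper `participants` is hypothetical and assumes terms are separated by `" + "` with surrounding spaces, which holds for the equations shown here.

```python
import re

# Two results copied from the search output above.
results = [
    {"name": "(K++H+)-ATPase",
     "equation": "(1) H2O + (1) ATP + (1) K+[1] => "
                 "(1) ADP + (1) Phosphate + (1) H+ + (1) K+"},
    {"name": "2-hydroxy-dATP diphosphohydrolase",
     "equation": "(1) H2O + (1) 2-Hydroxy-dATP => "
                 "(1) PPi + (2) H+ + (1) 2-Hydroxy-dAMP"},
]


def participants(equation):
    """Return the set of compound names on either side of a readable equation."""
    names = set()
    for side in re.split(r"\s(?:<=>|=>|<=)\s", equation):
        for term in side.split(" + "):
            # Drop the "(n) " coefficient and any trailing "[k]" tag.
            name = re.sub(r"^\(\S+\)\s+", "", term.strip())
            name = re.sub(r"\[\d+\]$", "", name)
            names.add(name)
    return names


uses_atp = [r["name"] for r in results if "ATP" in participants(r["equation"])]
print(uses_atp)
```

Comparing whole names rather than substrings is what excludes `2-Hydroxy-dATP` here.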



### Performance Testing

ID lookups go through `DatabaseIndex` and are described as O(1). The benchmark below measures the average time per `get_compound_name` call over 1,000 lookups.

In [13]:
import time

# Test compound lookup performance
test_ids = ["cpd00001", "cpd00027", "cpd00002", "cpd00067", "cpd00020"] * 200

# perf_counter() is a monotonic, high-resolution clock suited to short timings
start = time.perf_counter()
for cpd_id in test_ids:
    request = GetCompoundNameRequest(compound_id=cpd_id)
    response = get_compound_name(request, db_index)
end = time.perf_counter()

total_time = end - start
avg_time = total_time / len(test_ids) * 1000  # Convert to ms

print(f"Compound Lookup Performance:")
print(f"  Total lookups: {len(test_ids)}")
print(f"  Total time: {total_time:.3f} seconds")
print(f"  Average time per lookup: {avg_time:.3f} ms")
print(f"  Throughput: {len(test_ids) / total_time:.0f} lookups/second")

Compound Lookup Performance:
  Total lookups: 1000
  Total time: 0.043 seconds
  Average time per lookup: 0.043 ms
  Throughput: 23149 lookups/second
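To illustrate why an index gives roughly constant-time lookups, the sketch below compares a dictionary keyed by ID with a linear scan over the same records. It assumes the index behaves like a dictionary, which this notebook does not confirm for `DatabaseIndex`. The records are synthetic and all names are hypothetical.

```python
import timeit

# Synthetic records standing in for database rows.
records = [{"id": f"cpd{i:05d}", "name": f"compound {i}"} for i in range(1, 30001)]

# Build the index once: one dictionary entry per ID.
by_id = {record["id"]: record for record in records}


def lookup_indexed(compound_id):
    """Average O(1): a single hash-table probe."""
    return by_id.get(compound_id)


def lookup_scan(compound_id):
    """O(n): walk the list until the ID is found."""
    for record in records:
        if record["id"] == compound_id:
            return record
    return None


target = "cpd29999"  # near the end of the list, a bad case for the scan
indexed_s = timeit.timeit(lambda: lookup_indexed(target), number=200)
scan_s = timeit.timeit(lambda: lookup_scan(target), number=200)
print(f"indexed: {indexed_s:.6f} s, scan: {scan_s:.6f} s for 200 lookups each")
```

Exact timings depend on the machine, but the scan's cost grows with the number of records while the dictionary's does not.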


## Summary

This notebook demonstrated:

1. **Compound Lookups**:
   - Get detailed compound information by ID
   - Search compounds by name, formula, or alias
   - Priority-based search ranking

2. **Reaction Lookups**:
   - Get detailed reaction information by ID
   - Search reactions by name, EC number, or pathway
   - Explore metabolic pathways

3. **Advanced Techniques**:
   - Combine compound and reaction searches
   - Performance testing (O(1) lookups)

## Key Takeaways

- **Fast lookups**: Database indexes enable O(1) compound/reaction retrieval
- **Priority search**: Exact matches ranked higher than partial matches
- **Rich metadata**: Cross-references to KEGG, BiGG, MetaCyc, ChEBI
- **Pathway information**: Connect reactions to metabolic pathways

## Next Steps

- Use database tools to explore media compositions (what compounds are in your media?)
- Use database tools to interpret FBA results (what reactions are active?)
- Build custom media based on compound searches

See other notebooks:
- `01_basic_workflow.ipynb` - Complete modeling workflow
- `03_session_management.ipynb` - Manage models and media
- `04_error_handling.ipynb` - Handle common errors