# Gem-Flux MCP Server: Database Lookups

This notebook demonstrates how to search and retrieve compound and reaction information from the ModelSEED database.

## Database Overview

The ModelSEED database contains:
- **33,993 compounds** with IDs, names, formulas, and cross-references
- **43,775 reactions** with equations, EC numbers, and pathway information

## Available Tools

1. **Compound Tools**:
   - `get_compound_name` - Get detailed information for a specific compound ID
   - `search_compounds` - Search compounds by name, formula, or alias

2. **Reaction Tools**:
   - `get_reaction_name` - Get detailed information for a specific reaction ID
   - `search_reactions` - Search reactions by name, EC number, or pathway

## Setup

In [1]:
# Import database tools
from gem_flux_mcp.tools.compound_lookup import get_compound_name, search_compounds
from gem_flux_mcp.tools.reaction_lookup import get_reaction_name, search_reactions

# Import database loading
from gem_flux_mcp.database.loader import load_compounds_database, load_reactions_database
from gem_flux_mcp.database.index import DatabaseIndex

# Import types
from gem_flux_mcp.types import (
    GetCompoundNameRequest,
    SearchCompoundsRequest,
    GetReactionNameRequest,
    SearchReactionsRequest
)

from pathlib import Path
import json

print("✓ Imports successful")

ModuleNotFoundError: No module named 'gem_flux_mcp'
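The `ModuleNotFoundError` above means the `gem_flux_mcp` package could not be imported from the kernel that ran this notebook, and every later `NameError` follows from that. A small pre-flight check, sketched here with only the standard library, makes the cause explicit before any other cell runs. The helper name `is_importable` is hypothetical and is not part of Gem-Flux.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Report the problem once, up front, instead of failing in every later cell
if not is_importable("gem_flux_mcp"):
    print("gem_flux_mcp is not importable in this kernel; install the package first.")
```

If the check fails, install the package into the same environment the kernel uses, then restart the kernel and rerun the imports.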

## Load Database

In [2]:
# Load database
database_dir = Path("../data/database")
compounds_path = database_dir / "compounds.tsv"
reactions_path = database_dir / "reactions.tsv"

print(f"Loading database from {database_dir}...")
compounds_df = load_compounds_database(str(compounds_path))
reactions_df = load_reactions_database(str(reactions_path))
db_index = DatabaseIndex(compounds_df, reactions_df)
print(f"✓ Loaded {len(compounds_df)} compounds and {len(reactions_df)} reactions")

NameError: name 'Path' is not defined
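Every later cell depends on `db_index`, so it can help to confirm that the TSV files exist before calling the loaders. The following is a minimal sketch that assumes the same file names used above; `missing_database_files` is a hypothetical helper, not a Gem-Flux function.

```python
from pathlib import Path


def missing_database_files(database_dir: Path,
                           filenames=("compounds.tsv", "reactions.tsv")) -> list:
    """Return the expected database files that do not exist under database_dir."""
    return [database_dir / name for name in filenames
            if not (database_dir / name).is_file()]


# Same relative directory as the cell above; adjust if the notebook lives elsewhere
missing = missing_database_files(Path("../data/database"))
if missing:
    print("Missing database files:", ", ".join(str(p) for p in missing))
```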

## Part 1: Compound Lookups

### Get Compound by ID

Use `get_compound_name` to retrieve detailed information for a specific compound ID.

In [3]:
# Lookup glucose (cpd00027)
request = GetCompoundNameRequest(compound_id="cpd00027")
response = get_compound_name(request, db_index)

print("Compound Information:")
print(f"  ID: {response.id}")
print(f"  Name: {response.name}")
print(f"  Abbreviation: {response.abbreviation}")
print(f"  Formula: {response.formula}")
print(f"  Mass: {response.mass} g/mol")
print(f"  Charge: {response.charge}")
print(f"  InChI Key: {response.inchikey}")
print(f"  SMILES: {response.smiles}")

print(f"\n  Aliases: {response.aliases}")
print(f"\n  External IDs:")
for db, ids in response.external_ids.items():
    print(f"    {db}: {ids}")

NameError: name 'GetCompoundNameRequest' is not defined

### Search Compounds by Name

Use `search_compounds` to find compounds by name, formula, or alias.

In [4]:
# Search for glucose-related compounds
request = SearchCompoundsRequest(query="glucose", limit=10)
response = search_compounds(request, db_index)

print(f"Search Results: {response.num_results} compounds found")
print(f"Truncated: {response.truncated}\n")

for result in response.results:
    print(f"  {result.id}: {result.name}")
    print(f"    Formula: {result.formula}")
    print(f"    Match: {result.match_field} ({result.match_type})")
    print()

NameError: name 'SearchCompoundsRequest' is not defined

### Search Priority Demonstration

The search uses priority-based matching:
1. Exact ID match (highest priority)
2. Exact name match
3. Exact abbreviation match
4. Partial name match
5. Formula match
6. Alias match (lowest priority)
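The ranking above can be sketched as a function that returns the priority of the first rule a query satisfies, followed by a stable sort on that priority. This is an illustration under assumptions, not the Gem-Flux implementation: the `Compound` class, `match_priority`, `search`, and the three toy records are all made up for this example, and the real tool may normalize or compare fields differently.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    id: str
    name: str
    abbreviation: str
    formula: str
    aliases: list = field(default_factory=list)


def match_priority(query: str, cpd: Compound):
    """Return the priority (1 = best) of the first rule the query satisfies, or None."""
    q = query.lower()
    rules = [
        (1, q == cpd.id.lower()),                               # exact ID
        (2, q == cpd.name.lower()),                             # exact name
        (3, q == cpd.abbreviation.lower()),                     # exact abbreviation
        (4, q in cpd.name.lower()),                             # partial name
        (5, q == cpd.formula.lower()),                          # formula
        (6, any(q in alias.lower() for alias in cpd.aliases)),  # alias
    ]
    for priority, matched in rules:
        if matched:
            return priority
    return None


def search(query: str, compounds: list, limit: int = 10) -> list:
    """Return matching compounds, best priority first."""
    scored = [(match_priority(query, c), c) for c in compounds]
    hits = [(p, c) for p, c in scored if p is not None]
    hits.sort(key=lambda pc: pc[0])  # stable: input order is kept within a priority
    return [c for _, c in hits[:limit]]


# Toy records for illustration only; they are not loaded from the database
toy = [
    Compound("cpd00027", "D-Glucose", "glc-D", "C6H12O6", ["Dextrose"]),
    Compound("cpd00079", "D-glucose-6-phosphate", "g6p", "C6H11O9P"),
    Compound("cpd00082", "D-Fructose", "fru", "C6H12O6"),
]
print([c.id for c in search("glucose", toy)])
```

Because each record is assigned the priority of the first rule it satisfies, a compound that matches by both ID and alias is ranked once, at the higher priority.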

In [5]:
# Search by exact ID
request = SearchCompoundsRequest(query="cpd00027", limit=5)
response = search_compounds(request, db_index)
print("Search by exact ID (cpd00027):")
print(f"  First result: {response.results[0].name}")
print(f"  Match type: {response.results[0].match_type}\n")

# Search by formula
request = SearchCompoundsRequest(query="C6H12O6", limit=5)
response = search_compounds(request, db_index)
print("Search by formula (C6H12O6):")
for result in response.results[:3]:
    print(f"  {result.name} ({result.id})")
print()

# Search with no results
request = SearchCompoundsRequest(query="nonexistent_compound_xyz", limit=10)
response = search_compounds(request, db_index)
print("Search with no results:")
print(f"  Num results: {response.num_results}")
print(f"  Suggestions: {response.suggestions}")

NameError: name 'SearchCompoundsRequest' is not defined
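The no-result case above prints `response.suggestions`, but this notebook does not show how those suggestions are produced. One common approach is fuzzy string matching. The sketch below uses the standard library's `difflib` and is only an assumption about how such a feature could work; `suggest_names` and the short name list are hypothetical.

```python
import difflib


def suggest_names(query: str, known_names: list,
                  max_suggestions: int = 3, cutoff: float = 0.6) -> list:
    """Return known names that are textually close to the query, best match first."""
    # Compare in lowercase, but return the names in their original casing
    lowered = {name.lower(): name for name in known_names}
    close = difflib.get_close_matches(
        query.lower(), list(lowered), n=max_suggestions, cutoff=cutoff
    )
    return [lowered[key] for key in close]


names = ["D-Glucose", "D-Fructose", "Pyruvate", "Succinate"]
print(suggest_names("glucoze", names))
```

Raising `cutoff` makes suggestions stricter; lowering it returns more, but noisier, candidates.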

### Explore Common Metabolites

In [6]:
# Common metabolites to explore
common_metabolites = [
    "cpd00001",  # H2O
    "cpd00002",  # ATP
    "cpd00008",  # ADP
    "cpd00067",  # H+
    "cpd00020",  # Pyruvate
    "cpd00036",  # Succinate
    "cpd00024",  # Acetyl-CoA
]

print("Common Metabolites:")
for cpd_id in common_metabolites:
    request = GetCompoundNameRequest(compound_id=cpd_id)
    response = get_compound_name(request, db_index)
    print(f"  {cpd_id}: {response.name} ({response.formula})")

Common Metabolites:


NameError: name 'GetCompoundNameRequest' is not defined

## Part 2: Reaction Lookups

### Get Reaction by ID

Use `get_reaction_name` to retrieve detailed information for a specific reaction ID.

In [7]:
# Lookup hexokinase (rxn00148)
request = GetReactionNameRequest(reaction_id="rxn00148")
response = get_reaction_name(request, db_index)

print("Reaction Information:")
print(f"  ID: {response.id}")
print(f"  Name: {response.name}")
print(f"  Abbreviation: {response.abbreviation}")
print(f"  Equation (readable): {response.equation}")
print(f"  Equation (with IDs): {response.equation_with_ids}")
print(f"  Reversibility: {response.reversibility}")
print(f"  Direction: {response.direction}")
print(f"  Is transport: {response.is_transport}")
print(f"  EC numbers: {response.ec_numbers}")
print(f"  Pathways: {response.pathways}")
if response.deltag is not None:
    print(f"  ΔG: {response.deltag} ± {response.deltagerr} kcal/mol")

print(f"\n  Aliases: {response.aliases}")
print(f"\n  External IDs:")
for db, ids in response.external_ids.items():
    print(f"    {db}: {ids}")

NameError: name 'GetReactionNameRequest' is not defined

### Search Reactions by Name

Use `search_reactions` to find reactions by name, EC number, or pathway.

In [8]:
# Search for hexokinase reactions
request = SearchReactionsRequest(query="hexokinase", limit=10)
response = search_reactions(request, db_index)

print(f"Search Results: {response.num_results} reactions found")
print(f"Truncated: {response.truncated}\n")

for result in response.results:
    print(f"  {result.id}: {result.name}")
    print(f"    Equation: {result.equation}")
    print(f"    EC: {result.ec_numbers}")
    print(f"    Match: {result.match_field} ({result.match_type})")
    print()

NameError: name 'SearchReactionsRequest' is not defined

### Search by EC Number

Find all reactions with a specific enzyme commission number.

In [9]:
# Search by EC number (hexokinase: 2.7.1.1)
request = SearchReactionsRequest(query="2.7.1.1", limit=10)
response = search_reactions(request, db_index)

print(f"Reactions with EC 2.7.1.1: {response.num_results} found\n")

for result in response.results:
    print(f"  {result.id}: {result.name}")
    print(f"    {result.equation}")
    print()

NameError: name 'SearchReactionsRequest' is not defined
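When searching by EC number, note that a plain substring test on `2.7.1.1` would also match `2.7.1.11`. Whether `search_reactions` guards against this is not stated here, so a component-wise comparison can be useful for post-filtering results. The helper below is a hypothetical sketch, not part of Gem-Flux.

```python
def ec_matches(query: str, ec_number: str) -> bool:
    """True if ec_number equals the query or falls under it as an EC class prefix."""
    # Drop empty and wildcard levels so "2.7.1.-" and "2.7.1." behave like "2.7.1"
    query_parts = [part for part in query.strip().split(".") if part not in ("", "-")]
    if not query_parts:
        return False  # an empty query should not match everything
    ec_parts = ec_number.strip().split(".")
    # Compare level by level instead of character by character
    return ec_parts[: len(query_parts)] == query_parts


for candidate in ["2.7.1.1", "2.7.1.11", "2.7.1.2"]:
    print(candidate, ec_matches("2.7.1.1", candidate))
```

A shorter query such as `2.7.1` then acts as a class-level filter and matches every EC number beneath it.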

### Search by Pathway

Find all reactions in a specific metabolic pathway.

In [10]:
# Search for glycolysis reactions
request = SearchReactionsRequest(query="glycolysis", limit=20)
response = search_reactions(request, db_index)

print(f"Glycolysis Reactions: {response.num_results} found\n")

for result in response.results[:10]:
    print(f"  {result.id}: {result.name}")
    print(f"    {result.equation}")
    print()

NameError: name 'SearchReactionsRequest' is not defined

### Explore Key Metabolic Reactions

In [11]:
# Key reactions in central metabolism
key_reactions = [
    "rxn00148",  # Hexokinase (glycolysis)
    "rxn00200",  # Phosphofructokinase (glycolysis)
    "rxn00216",  # Pyruvate kinase (glycolysis)
    "rxn00148",  # Labeled citrate synthase (TCA cycle), but this repeats the hexokinase ID above; confirm the ID with search_reactions
    "rxn00256",  # Isocitrate dehydrogenase (TCA cycle)
]

print("Key Metabolic Reactions:")
for rxn_id in key_reactions:
    request = GetReactionNameRequest(reaction_id=rxn_id)
    response = get_reaction_name(request, db_index)
    print(f"\n  {rxn_id}: {response.name}")
    print(f"    {response.equation}")
    print(f"    EC: {response.ec_numbers}")
    print(f"    Pathways: {response.pathways}")

Key Metabolic Reactions:


NameError: name 'GetReactionNameRequest' is not defined

## Part 3: Advanced Search Techniques

### Combining Searches

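Find a compound, then look up reactions related to it. Because `search_reactions` matches on reaction name, EC number, or pathway, the query below finds reactions whose text mentions ATP, not necessarily every reaction that consumes or produces it.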

In [12]:
# Step 1: Find ATP
compound_request = SearchCompoundsRequest(query="ATP", limit=1)
compound_response = search_compounds(compound_request, db_index)
atp_id = compound_response.results[0].id
print(f"Found ATP: {atp_id}\n")

# Step 2: Search for reactions using ATP
reaction_request = SearchReactionsRequest(query="ATP", limit=10)
reaction_response = search_reactions(reaction_request, db_index)

print(f"Reactions involving ATP: {reaction_response.num_results} found\n")
for result in reaction_response.results[:5]:
    print(f"  {result.name}")
    print(f"    {result.equation}")
    print()

NameError: name 'SearchCompoundsRequest' is not defined
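To find reactions that actually contain a compound, one option is to filter on the ID-based equation (the `equation_with_ids` field shown earlier) instead of searching by text. The sketch below assumes that compound IDs have the form `cpd` plus five digits and that equations embed them as plain text. The helper names and the two toy equations are made up for illustration.

```python
import re

# Assumed ID shape, based on the IDs used in this notebook (e.g. cpd00027)
COMPOUND_ID = re.compile(r"cpd\d{5}")


def compounds_in_equation(equation_with_ids: str) -> set:
    """Extract the set of compound IDs mentioned in an ID-based equation string."""
    return set(COMPOUND_ID.findall(equation_with_ids))


def reactions_using(compound_id: str, reactions: dict) -> list:
    """Return IDs of reactions whose equation mentions compound_id.

    reactions maps a reaction ID to its ID-based equation string.
    """
    return [rxn_id for rxn_id, equation in reactions.items()
            if compound_id in compounds_in_equation(equation)]


# Made-up equation strings for illustration only
toy_reactions = {
    "rxnA": "(1) cpd00002[0] + (1) cpd00027[0] <=> (1) cpd00008[0] + (1) cpd00079[0]",
    "rxnB": "(1) cpd00020[0] + (1) cpd00001[0] <=> (1) cpd00036[0]",
}
print(reactions_using("cpd00002", toy_reactions))
```

With the real tools, the same filter could be applied to the `equation_with_ids` value of each reaction returned by `get_reaction_name`, using the `atp_id` found in step 1.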

### Performance Testing

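ID lookups go through the in-memory `DatabaseIndex`, so each one is expected to take roughly constant time, O(1), regardless of database size. The cell below measures average lookup latency and throughput.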
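The internals of `DatabaseIndex` are not shown in this notebook. As a sketch of why an index gives constant-time lookups, the example below compares a dictionary keyed by ID with a linear scan over the same records. The records and function names are synthetic and exist only for this illustration.

```python
# Synthetic records standing in for database rows
records = [{"id": f"cpd{i:05d}", "name": f"compound {i}"} for i in range(1, 1001)]

# Build the index once: a single O(n) pass over the records
index_by_id = {rec["id"]: rec for rec in records}


def lookup_scan(compound_id: str):
    """Linear scan: cost grows with the number of records."""
    for rec in records:
        if rec["id"] == compound_id:
            return rec
    return None


def lookup_indexed(compound_id: str):
    """Hash lookup: average cost is independent of the number of records."""
    return index_by_id.get(compound_id)


# Both strategies return the same record; only the cost differs
print(lookup_indexed("cpd01000") == lookup_scan("cpd01000"))
```

The trade-off is memory and a one-time build cost in exchange for fast repeated lookups, which suits a database that is loaded once and queried many times.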

In [13]:
import time

# Test compound lookup performance
test_ids = ["cpd00001", "cpd00027", "cpd00002", "cpd00067", "cpd00020"] * 200

start = time.perf_counter()  # monotonic, high-resolution clock intended for timing
for cpd_id in test_ids:
    request = GetCompoundNameRequest(compound_id=cpd_id)
    response = get_compound_name(request, db_index)
end = time.perf_counter()

total_time = end - start
avg_time = total_time / len(test_ids) * 1000  # Convert to ms

print("Compound Lookup Performance:")
print(f"  Total lookups: {len(test_ids)}")
print(f"  Total time: {total_time:.3f} seconds")
print(f"  Average time per lookup: {avg_time:.3f} ms")
print(f"  Throughput: {len(test_ids) / total_time:.0f} lookups/second")

NameError: name 'GetCompoundNameRequest' is not defined

## Summary

This notebook demonstrated:

1. **Compound Lookups**:
   - Get detailed compound information by ID
   - Search compounds by name, formula, or alias
   - Priority-based search ranking

2. **Reaction Lookups**:
   - Get detailed reaction information by ID
   - Search reactions by name, EC number, or pathway
   - Explore metabolic pathways

3. **Advanced Techniques**:
   - Combine compound and reaction searches
   - Performance testing (O(1) lookups)

## Key Takeaways

- **Fast lookups**: Database indexes enable O(1) compound/reaction retrieval
- **Priority search**: Exact matches are ranked higher than partial matches
- **Rich metadata**: Cross-references to KEGG, BiGG, MetaCyc, ChEBI
- **Pathway information**: Connect reactions to metabolic pathways

## Next Steps

- Use database tools to explore media compositions (what compounds are in your media?)
- Use database tools to interpret FBA results (what reactions are active?)
- Build custom media based on compound searches

See other notebooks:
- `01_basic_workflow.ipynb` - Complete modeling workflow
- `03_session_management.ipynb` - Manage models and media
- `04_error_handling.ipynb` - Handle common errors