# Database Lookup Tools Demo

This notebook demonstrates the ModelSEED database lookup tools:
- `get_compound`: Look up compounds by ID
- `search_compounds`: Search compounds by name/formula
- `get_reaction`: Look up reactions by ID
- `search_reactions`: Search reactions by name/EC number

## Key Features

- O(1) lookup by ModelSEED IDs (cpd00027, rxn00148)
- Case-insensitive name search
- Formula and abbreviation search
- EC number search for reactions
- Database contains over 33,000 compounds and over 43,000 reactions

## Setup

First, let's import the necessary modules and load the ModelSEED database.

In [None]:
import sys
from pathlib import Path

# Add src to path
sys.path.insert(0, str(Path.cwd().parent.parent / "src"))

from gem_flux_mcp.database import load_compounds_database, load_reactions_database
from gem_flux_mcp.database.index import DatabaseIndex

In [None]:
# Load ModelSEED database
db_dir = Path.cwd().parent.parent / "data" / "database"
compounds_df = load_compounds_database(db_dir / "compounds.tsv")
reactions_df = load_reactions_database(db_dir / "reactions.tsv")
db_index = DatabaseIndex(compounds_df, reactions_df)

print(f"✅ Loaded {len(compounds_df)} compounds")
print(f"✅ Loaded {len(reactions_df)} reactions")
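
The loader functions above are specific to `gem_flux_mcp`, and this notebook does not show how `DatabaseIndex` stores its data. As a standalone illustration only, the sketch below reads a hypothetical two-row TSV excerpt with Python's `csv` module and keys each row by its `id` column. A dictionary keyed this way is one common way to get the average O(1) ID lookup listed under Key Features. The column names mirror the fields this notebook reads, and the rows are illustrative.

```python
import csv
import io

# Hypothetical two-row TSV excerpt; columns mirror the fields this notebook reads.
SAMPLE_TSV = (
    "id\tname\tformula\tmass\tcharge\n"
    "cpd00027\tD-Glucose\tC6H12O6\t180.16\t0\n"
    "cpd00001\tH2O\tH2O\t18.02\t0\n"
)


def build_id_index(tsv_text: str) -> dict[str, dict[str, str]]:
    """Map each row's `id` value to the whole row (all values stay strings)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["id"]: row for row in reader}


index = build_id_index(SAMPLE_TSV)

# A dict lookup is a hash-table access: average O(1) regardless of row count.
print(index["cpd00027"]["name"])  # D-Glucose
print(index.get("cpd99999"))      # None (unknown ID)
```

With plain `csv` parsing every value comes back as a string (for example `"180.16"`), so a numeric field such as `mass` would need converting with `float()` before formatting it with `:.2f`.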

## Example 1: Get Compound by ID

Look up a specific compound using its ModelSEED ID.
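
ModelSEED IDs used in this notebook share a fixed shape: `cpd` or `rxn` followed by five digits, as in `cpd00027` and `rxn00148`. If the index is keyed on exact strings, input such as `" CPD00027"` may simply come back as not found. A small normalization step can catch that early. `normalize_modelseed_id` below is a hypothetical helper, not part of `gem_flux_mcp`, and the five-digit pattern is an assumption based on the IDs shown here.

```python
import re

# Assumed ID shape: "cpd" or "rxn" followed by exactly five digits (e.g. cpd00027, rxn00148).
MODELSEED_ID = re.compile(r"(cpd|rxn)\d{5}")


def normalize_modelseed_id(raw: str) -> str:
    """Hypothetical helper: trim, lowercase, and validate a ModelSEED ID."""
    candidate = raw.strip().lower()
    if MODELSEED_ID.fullmatch(candidate) is None:
        raise ValueError(f"not a ModelSEED compound/reaction ID: {raw!r}")
    return candidate


print(normalize_modelseed_id("  CPD00027 "))  # cpd00027
```

The cleaned string can then be passed to `db_index.get_compound_by_id`.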

In [None]:
# Look up D-Glucose (cpd00027)
glucose = db_index.get_compound_by_id("cpd00027")

if glucose is not None:
    print("Compound ID: cpd00027")
    print(f"Name: {glucose['name']}")
    print(f"Formula: {glucose['formula']}")
    print(f"Mass: {glucose['mass']:.2f} Da")
    print(f"Charge: {glucose['charge']}")
    print(f"SMILES: {glucose['smiles']}")
else:
    print("Compound not found")

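## Example 2: Search Compounds by Name

Search compound names; matching is case-insensitive.

In [None]:
# Search for glucose compounds
glucose_compounds = db_index.search_compounds_by_name("glucose")

print(f"Found {len(glucose_compounds)} compounds matching 'glucose':\n")
for compound in glucose_compounds:
    # For a pandas Series row, .name is the row label (the ModelSEED ID),
    # while compound['name'] is the compound's name column
    print(f"  {compound.name}: {compound['name']} ({compound['formula']})")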

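Case-insensitive matching is typically done by folding both the query and each candidate name to a common case before comparing. The sketch below assumes substring matching over a placeholder ID-to-name table; it is not the `search_compounds_by_name` implementation, which this notebook does not show.

```python
# Placeholder ID -> name table (illustrative, not copied from the database).
NAMES = {
    "cpd_1": "D-Glucose",
    "cpd_2": "D-Glucose 6-phosphate",
    "cpd_3": "UDP-glucose",
    "cpd_4": "Pyruvate",
}


def search_by_name(names: dict[str, str], query: str) -> list[str]:
    """Return IDs whose name contains `query`, ignoring case (linear scan)."""
    needle = query.casefold()
    return [cid for cid, name in names.items() if needle in name.casefold()]


print(search_by_name(NAMES, "GLUCOSE"))  # ['cpd_1', 'cpd_2', 'cpd_3']
```

Unlike the ID lookup, this sketch scans every name, so its cost grows linearly with the number of compounds.
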
## Example 3: Search Compounds by Abbreviation
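
Whether abbreviation search matches exactly or by substring changes what `"atp"` returns, and this notebook does not say which one `search_compounds_by_abbreviation` does. The placeholder table below contrasts the two: a substring match also picks up `datp`. Both helper functions are hypothetical.

```python
# Placeholder abbreviation table (IDs are illustrative, not real ModelSEED rows).
ABBREVIATIONS = {
    "cpd_1": "atp",
    "cpd_2": "datp",
    "cpd_3": "adp",
    "cpd_4": "gtp",
}


def exact_matches(table: dict[str, str], query: str) -> list[str]:
    """IDs whose abbreviation equals the query, ignoring case."""
    q = query.casefold()
    return [cid for cid, abbr in table.items() if abbr.casefold() == q]


def substring_matches(table: dict[str, str], query: str) -> list[str]:
    """IDs whose abbreviation contains the query, ignoring case."""
    q = query.casefold()
    return [cid for cid, abbr in table.items() if q in abbr.casefold()]


print(exact_matches(ABBREVIATIONS, "ATP"))      # ['cpd_1']
print(substring_matches(ABBREVIATIONS, "ATP"))  # ['cpd_1', 'cpd_2']
```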

In [None]:
# Search for ATP by abbreviation
atp_compounds = db_index.search_compounds_by_abbreviation("atp")

print(f"Found {len(atp_compounds)} compounds with 'ATP' in abbreviation:\n")
for compound in atp_compounds[:5]:  # show the first 5 matches
    print(f"  {compound.name}: {compound['name']}")
    print(f"    Abbreviation: {compound['abbreviation']}")
    print(f"    Formula: {compound['formula']}")
    print()

## Example 4: Search Reactions by Name

In [None]:
# Search for kinase reactions
kinase_reactions = db_index.search_reactions_by_name("kinase")

print(f"Found {len(kinase_reactions)} reactions with 'kinase' in name:\n")
for reaction in kinase_reactions[:10]:  # show the first 10 matches
    print(f"  {reaction.name}: {reaction['name']}")

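A broad query such as "kinase" can return many hits. If you want the most relevant names first, one option is to rank them yourself. The sketch below is a hypothetical post-processing step (not part of `DatabaseIndex` as far as this notebook shows) that orders names by exact match, then prefix match, then any other substring match, breaking ties alphabetically. The reaction names are placeholders.

```python
def rank_by_relevance(names: list[str], query: str) -> list[str]:
    """Order names containing `query`: exact match, then prefix, then other substrings."""
    q = query.casefold()

    def sort_key(name: str) -> tuple[int, str]:
        folded = name.casefold()
        if folded == q:
            tier = 0  # exact match
        elif folded.startswith(q):
            tier = 1  # query is a prefix
        else:
            tier = 2  # query appears elsewhere in the name
        return (tier, folded)

    return sorted((n for n in names if q in n.casefold()), key=sort_key)


# Placeholder reaction names for illustration.
hits = ["Pyruvate kinase", "Kinase", "kinase activator", "Hexokinase", "Aldolase"]
print(rank_by_relevance(hits, "kinase"))
# ['Kinase', 'kinase activator', 'Hexokinase', 'Pyruvate kinase']
```
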
## Example 5: Search Reactions by EC Number

In [None]:
# Search for reactions with EC 2.7.1 (phosphotransferases with an alcohol group as acceptor)
transferase_reactions = db_index.search_reactions_by_ec_number("2.7.1")

print(f"Found {len(transferase_reactions)} reactions with EC 2.7.1.x:\n")
for reaction in transferase_reactions[:10]:  # show the first 10 matches
    print(f"  {reaction.name}: {reaction['name']}")
    print(f"    EC: {reaction['ec_numbers']}")
    print()

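EC numbers are hierarchical, so the query "2.7.1" should match 2.7.1.40 but not 2.7.10.1 or 2.7.11.1, even though a plain `str.startswith("2.7.1")` accepts all three. Whether the notebook's EC search guards against this is not shown here. The sketch below compares dot-separated components instead; the helper names and the placeholder reactions are hypothetical, and each reaction's EC numbers are passed as a list to avoid assuming how the `ec_numbers` field is delimited.

```python
def ec_matches(ec_number: str, query: str) -> bool:
    """True if `ec_number` falls under the EC class/subclass given by `query`.

    Compares dot-separated components, so "2.7.1" matches "2.7.1.40"
    but not "2.7.10.1" (a naive startswith check would accept both).
    """
    query_parts = query.strip().split(".")
    ec_parts = ec_number.strip().split(".")
    return ec_parts[: len(query_parts)] == query_parts


def filter_by_ec(reactions: dict[str, list[str]], query: str) -> list[str]:
    """Return reaction IDs with at least one EC number under `query`."""
    return [rid for rid, ecs in reactions.items() if any(ec_matches(ec, query) for ec in ecs)]


# Placeholder reactions mapped to their EC numbers (illustrative only).
SAMPLE = {
    "rxn_a": ["2.7.1.40"],
    "rxn_b": ["2.7.10.1"],
    "rxn_c": ["2.7.11.1", "2.7.1.1"],
    "rxn_d": [],
}
print(filter_by_ec(SAMPLE, "2.7.1"))   # ['rxn_a', 'rxn_c']
print("2.7.10.1".startswith("2.7.1"))  # True: why a plain prefix test over-matches
```
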
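## Example 6: Explore Metabolic Pathways

A name search only matches reaction names, so this returns reactions whose name mentions "glycolysis", not every reaction in the pathway.

In [None]:
# Find reactions whose name mentions glycolysis
glycolysis_reactions = db_index.search_reactions_by_name("glycolysis")

print(f"Found {len(glycolysis_reactions)} reactions with 'glycolysis' in name:\n")
for reaction in glycolysis_reactions:
    equation = reaction['equation']
    # Truncate long equations to 80 characters, marking the cut with "..."
    shown = equation if len(equation) <= 80 else equation[:77] + "..."
    print(f"  {reaction.name}: {reaction['name']}")
    print(f"    Equation: {shown}")
    print()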

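Because a name search only sees reaction names, a glycolytic reaction named after its enzyme (a kinase, say) will not appear unless its name contains the word. If reaction records carry pathway annotations, filtering on those is a closer fit. The sketch below assumes a hypothetical `pathways` field holding a list of labels; this notebook does not show whether the loaded reaction table has such a column or how it is formatted, and the records are placeholders.

```python
# Placeholder reaction records; the "pathways" field is a hypothetical annotation.
REACTIONS = {
    "rxn_a": {"name": "Pyruvate kinase", "pathways": ["Glycolysis"]},
    "rxn_b": {"name": "Hexokinase", "pathways": ["Glycolysis", "Starch and sucrose metabolism"]},
    "rxn_c": {"name": "Citrate synthase", "pathways": ["TCA cycle"]},
}


def reactions_in_pathway(reactions: dict[str, dict], pathway: str) -> list[str]:
    """IDs whose pathway labels contain `pathway` (case-insensitive substring)."""
    target = pathway.casefold()
    return [
        rid
        for rid, rec in reactions.items()
        if any(target in label.casefold() for label in rec.get("pathways", []))
    ]


def reactions_by_name(reactions: dict[str, dict], word: str) -> list[str]:
    """IDs whose name contains `word` (case-insensitive), for comparison."""
    target = word.casefold()
    return [rid for rid, rec in reactions.items() if target in rec["name"].casefold()]


print(reactions_by_name(REACTIONS, "glycolysis"))     # []
print(reactions_in_pathway(REACTIONS, "glycolysis"))  # ['rxn_a', 'rxn_b']
```
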
## Example 7: Cross-Reference Compounds and Reactions

In [None]:
# Search reaction names for "atp". This is a name-based approximation: it does not
# inspect stoichiometry, so reactions that use ATP without naming it are missed.
atp_reactions = db_index.search_reactions_by_name("atp")

print(f"Found {len(atp_reactions)} reactions with 'atp' in name\n")
print("Sample reactions:")
for reaction in atp_reactions[:5]:
    print(f"  {reaction.name}: {reaction['name']}")

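Matching "atp" against reaction names only approximates "reactions involving ATP", since it misses reactions that consume or produce ATP without naming it. A more direct cross-reference scans each reaction's equation for the compound ID and inverts the result into a compound-to-reactions index. The sketch assumes equations reference compounds as `cpdNNNNN[compartment]` (for example `(1) cpd00002[0]`); the reaction IDs and equations are placeholders, and cpd00002 stands in for ATP.

```python
import re
from collections import defaultdict

# Placeholder equations in an assumed "(coef) cpdNNNNN[compartment]" style.
EQUATIONS = {
    "rxn_a": "(1) cpd00002[0] + (1) cpd00027[0] <=> (1) cpd00008[0] + (1) cpd00079[0]",
    "rxn_b": "(1) cpd00001[0] + (1) cpd00002[0] <=> (1) cpd00008[0] + (1) cpd00009[0]",
    "rxn_c": "(1) cpd00020[0] <=> (1) cpd00020[1]",
}

COMPOUND_ID = re.compile(r"cpd\d{5}")


def build_compound_to_reactions(equations: dict[str, str]) -> dict[str, set[str]]:
    """Invert reaction -> equation into compound ID -> set of reaction IDs."""
    index: dict[str, set[str]] = defaultdict(set)
    for rxn_id, equation in equations.items():
        for cpd_id in COMPOUND_ID.findall(equation):
            index[cpd_id].add(rxn_id)
    return dict(index)  # plain dict, so unknown compounds raise KeyError


compound_index = build_compound_to_reactions(EQUATIONS)
print(sorted(compound_index["cpd00002"]))  # ['rxn_a', 'rxn_b']
print(sorted(compound_index["cpd00020"]))  # ['rxn_c']
```

Building the inverted index once turns each later "which reactions use X?" question into a single dictionary lookup.
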
## Summary

The database lookup tools provide:

### Compounds
✅ `get_compound_by_id(cpd_id)` - O(1) lookup by ID  
✅ `search_compounds_by_name(name)` - Case-insensitive name search  
✅ `search_compounds_by_abbreviation(abbr)` - Search by abbreviation  

### Reactions
✅ `get_reaction_by_id(rxn_id)` - O(1) lookup by ID  
✅ `search_reactions_by_name(name)` - Case-insensitive name search  
✅ `search_reactions_by_abbreviation(abbr)` - Search by abbreviation  
✅ `search_reactions_by_ec_number(ec_number)` - Search by EC classification  

### Database Stats
- **33,992 compounds** with chemical properties
- **43,774 reactions** with stoichiometry and EC numbers
- Aliases for cross-referencing to KEGG, BiGG, MetaCyc
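
Cross-referencing usually means going from an external ID (a KEGG or BiGG identifier, say) back to a ModelSEED ID. The minimal sketch below assumes aliases are stored as `Source: id1; id2|Source: id3` strings; the exact layout in the loaded table is not shown in this notebook, and the alias values are illustrative. `parse_aliases` and `build_reverse_alias_index` are hypothetical helpers.

```python
def parse_aliases(alias_field: str) -> dict[str, list[str]]:
    """Split an assumed 'Source: id1; id2|Source: id3' string into {source: [ids]}."""
    result: dict[str, list[str]] = {}
    for chunk in alias_field.split("|"):
        if ":" not in chunk:
            continue  # skip empty or malformed chunks
        source, _, ids = chunk.partition(":")
        result[source.strip()] = [i.strip() for i in ids.split(";") if i.strip()]
    return result


def build_reverse_alias_index(aliases_by_cpd: dict[str, str]) -> dict[tuple[str, str], str]:
    """Map (source, external ID) -> ModelSEED compound ID."""
    reverse: dict[tuple[str, str], str] = {}
    for cpd_id, field in aliases_by_cpd.items():
        for source, ids in parse_aliases(field).items():
            for ext_id in ids:
                reverse[(source, ext_id)] = cpd_id
    return reverse


SAMPLE_ALIASES = {
    "cpd00027": "KEGG: C00031|BiGG: glc__D",
    "cpd00001": "KEGG: C00001|BiGG: h2o",
}
reverse = build_reverse_alias_index(SAMPLE_ALIASES)
print(reverse[("KEGG", "C00031")])  # cpd00027
print(reverse[("BiGG", "h2o")])     # cpd00001
```

If the same external ID were attached to several ModelSEED compounds, this sketch would keep only the last one; mapping to a set of IDs instead would keep them all.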

**Next Steps**: Use compound/reaction IDs with `build_media` or explore metabolic pathways!