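# PDB MCP Server - Python Tutorial

This notebook ports the TypeScript [PDB MCP Server](https://github.com/Augmented-Nature/PDB-MCP-Server) to Python. It uses the `mcp` package (the official MCP Python SDK) and the RCSB Protein Data Bank web APIs.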

## What you will learn:
1. How to install and set up MCP in Python
2. How to define MCP **tools** (functions the AI can call)
3. How to define MCP **resources** (data the AI can read)
4. How to define MCP **prompts** (reusable templates)
5. How to query the RCSB Protein Data Bank API
6. How to run and test your MCP server

---

## Part 1: Installation

First, install the required packages.

In [None]:
# Install required packages (%pip installs into the environment of the running kernel)
%pip install "mcp[cli]" requests
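
The MCP Python SDK is still evolving. If an example here doesn't behave as described, first check which versions the install cell actually put into the kernel's environment. Here is a small stdlib-only check; `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for pkg in ("mcp", "requests"):
    print(f"{pkg}: {installed_version(pkg) or 'NOT INSTALLED'}")
```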

## Part 2: Understanding the PDB API

Before building MCP tools, let's understand the APIs we're wrapping.
We'll use three RCSB PDB services:

| API | Base URL | Purpose |
|-----|----------|----------|
| **Data API** | `https://data.rcsb.org/rest/v1` | Get info about specific structures |
| **Search API** | `https://search.rcsb.org/rcsbsearch/v2` | Search for structures by keyword |
| **File Download** | `https://files.rcsb.org/download` | Download coordinate files (PDB, mmCIF) |
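
Each base URL combines with an ID or a path in a predictable way. The sketch below shows only the URL patterns this tutorial uses. The helper names `entry_url`, `search_url` and `file_url` are hypothetical and not part of any library. The tutorial's code lowercases IDs, and the sketch does the same.

```python
PDB_DATA_API = "https://data.rcsb.org/rest/v1"
PDB_SEARCH_API = "https://search.rcsb.org/rcsbsearch/v2"
PDB_FILES_URL = "https://files.rcsb.org/download"


def entry_url(pdb_id: str) -> str:
    """Data API: metadata for one entry (GET)."""
    return f"{PDB_DATA_API}/core/entry/{pdb_id.lower()}"


def search_url() -> str:
    """Search API: every query is a JSON body POSTed to this one endpoint."""
    return f"{PDB_SEARCH_API}/query"


def file_url(pdb_id: str, extension: str = "cif") -> str:
    """File download: a coordinate file such as 'pdb' or 'cif' (GET)."""
    return f"{PDB_FILES_URL}/{pdb_id.lower()}.{extension}"


print(entry_url("1HBB"))
print(search_url())
print(file_url("1HBB", "pdb"))
```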

Let's test them directly first:

In [None]:
import requests
import json
import re
from typing import Optional

# API Base URLs (same as the TypeScript version's axios baseURL)
PDB_DATA_API = "https://data.rcsb.org/rest/v1"
PDB_SEARCH_API = "https://search.rcsb.org/rcsbsearch/v2"
PDB_FILES_URL = "https://files.rcsb.org/download"

HEADERS = {
    "User-Agent": "PDB-MCP-Server-Python/1.0.0",
    "Accept": "application/json",
}

print("API URLs configured!")

In [None]:
# Test 1: Get structure info for a famous protein (Hemoglobin - 1HBB)
response = requests.get(f"{PDB_DATA_API}/core/entry/1hbb", headers=HEADERS, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
data = response.json()

print("PDB ID: 1HBB")
print(f"Title: {data['struct']['title']}")
print(f"Method: {data['exptl'][0]['method']}")
print(f"Release Date: {data['rcsb_accession_info']['initial_release_date']}")
print(f"\nFull response has {len(data)} top-level keys:")
print(list(data.keys()))

In [None]:
# Test 2: Search for structures by keyword
search_query = {
    "query": {
        "type": "terminal",
        "service": "full_text",
        "parameters": {
            "value": "insulin"
        }
    },
    "return_type": "entry",
    "request_options": {
        "paginate": {"start": 0, "rows": 5},
        "results_content_type": ["experimental"],
        "sort": [{"sort_by": "score", "direction": "desc"}]
    }
}

response = requests.post(
    f"{PDB_SEARCH_API}/query",
    json=search_query,
    headers=HEADERS,
    timeout=30
)
response.raise_for_status()
results = response.json()

print(f"Search for 'insulin' found {results.get('total_count', 0)} total structures")
print("\nTop 5 results:")
for hit in results.get("result_set", []):
    print(f"  - {hit['identifier']} (score: {hit['score']:.2f})")
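
The search above requests only the first 5 hits through `paginate.start` and `paginate.rows`. To walk through more results, one approach is to step `start` forward by `rows` until you have covered `total_count`. This sketch covers only the offset arithmetic and makes no network calls. `page_windows` is a hypothetical helper.

```python
from typing import Iterator


def page_windows(total_count: int, rows: int) -> Iterator[dict]:
    """Yield 'paginate' dicts that together cover total_count hits."""
    if rows < 1:
        raise ValueError("rows must be >= 1")
    for start in range(0, max(total_count, 0), rows):
        yield {"start": start, "rows": rows}


# Each window would replace search_query["request_options"]["paginate"]
for window in page_windows(total_count=12, rows=5):
    print(window)
```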

## Part 3: Validation Helper

The TypeScript server used type guards such as `isValidPDBIdArgs` to check tool arguments. In Python, a plain validation function does the same job.

A valid PDB ID is:
- Exactly 4 characters
- Starts with a digit
- Followed by 3 alphanumeric characters
- Example: `1HBB`, `7S4S`, `3J9I`
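
One subtlety arises when turning these rules into a regex. With `re.match`, a pattern that ends in `$` also matches just before a trailing newline, so `'1HBB\n'` would pass. `re.fullmatch` only succeeds if the entire string matches. A quick demonstration:

```python
import re

PATTERN = r"[0-9][a-zA-Z0-9]{3}"

# '$' may match right before a final newline, so this finds a match
print(bool(re.match(PATTERN + "$", "1HBB\n")))   # True
# fullmatch requires the whole string to match the pattern
print(bool(re.fullmatch(PATTERN, "1HBB\n")))     # False
print(bool(re.fullmatch(PATTERN, "1HBB")))       # True
```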

In [None]:
def is_valid_pdb_id(pdb_id: str) -> bool:
    """Validate a PDB ID (4-character code starting with a digit)."""
    if not isinstance(pdb_id, str):
        return False
    # fullmatch (unlike match + '$') also rejects a trailing newline such as '1HBB\n'
    return re.fullmatch(r'[0-9][a-zA-Z0-9]{3}', pdb_id) is not None

# Test validation
print(f"'1HBB' valid? {is_valid_pdb_id('1HBB')}")   # True
print(f"'7S4S' valid? {is_valid_pdb_id('7S4S')}")   # True
print(f"'ABC'  valid? {is_valid_pdb_id('ABC')}")     # False - too short
print(f"'ABCD' valid? {is_valid_pdb_id('ABCD')}")   # False - doesn't start with digit

## Part 4: Building the 5 MCP Tools (Core of the Server)

The original TypeScript server defines 5 tools:

| # | Tool Name | What It Does |
|---|-----------|-------------|
| 1 | `search_structures` | Search PDB by keyword, filter by method/resolution |
| 2 | `get_structure_info` | Get detailed info for a specific PDB ID |
| 3 | `download_structure` | Download coordinates in PDB/mmCIF/XML format |
| 4 | `search_by_uniprot` | Find structures linked to a UniProt accession |
| 5 | `get_structure_quality` | Get validation metrics (R-factors, Ramachandran, etc.) |

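With the MCP Python SDK, each tool is simply a function decorated with `@mcp.tool()`. In this part we define the tools as plain functions so you can test them directly in the notebook. Part 5 adds the decorators when it writes the server file.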

---

### How TypeScript and Python MCP tools compare:

**TypeScript (original):**
```typescript
// Need to define schema manually
{
  name: 'search_structures',
  description: 'Search PDB database...',
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string', description: '...' },
    },
    required: ['query'],
  },
}
// Then write a separate handler function
```

**Python (our version):**
```python
@mcp.tool()
def search_structures(query: str, limit: int = 25) -> str:
    """Search PDB database..."""
    # just write the logic here
```

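This is much less boilerplate. FastMCP builds the tool's input schema from the function's type hints and default values and uses the docstring as the tool description, so you never write the JSON schema by hand.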
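
To make the idea concrete, here is a toy that uses `inspect` to derive a tool definition (name, description and a JSON-Schema-style `inputSchema`) from a plain function. It is an illustration only. It recognizes just `str`, `int`, `float` and `bool`, and falls back to `"string"` for anything else. FastMCP's real schema generation is more complete, so its exact output will differ. `toy_tool_definition` and `example_search` are hypothetical names.

```python
import inspect
from typing import get_type_hints

# Toy mapping from a few Python annotations to JSON Schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def toy_tool_definition(func) -> dict:
    """Build a name/description/inputSchema dict from a function signature (toy)."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> the caller must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def example_search(query: str, limit: int = 25) -> str:
    """Search PDB database..."""
    return ""


print(toy_tool_definition(example_search))
```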

### Tool 1: search_structures

Searches the PDB by keyword with optional filters for experimental method and resolution.
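
The `resolution_range` argument arrives as a string like `'1.0-2.0'`. The tool below silently skips any value it cannot parse. If you would rather reject bad input, and accept the bounds in either order, a stricter parser might look like this. `parse_resolution_range` is a hypothetical helper. The node it returns mirrors the range filter that `search_structures` builds.

```python
def parse_resolution_range(text: str) -> dict:
    """Turn 'lo-hi' (in Å) into a Search API range filter node, or raise ValueError."""
    parts = [p.strip() for p in text.split("-")]
    if len(parts) != 2:
        raise ValueError(f"Expected 'min-max', got {text!r}")
    # float() raises ValueError on non-numeric input; sorted() also accepts '2.0-1.0'
    low, high = sorted(float(p) for p in parts)
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": "rcsb_entry_info.resolution_combined",
            "operator": "range",
            "value": {"from": low, "to": high, "include_lower": True, "include_upper": True},
        },
    }


print(parse_resolution_range("2.0-1.0")["parameters"]["value"])
```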

In [None]:
def search_structures(
    query: str,
    limit: int = 25,
    sort_by: str = "score",
    experimental_method: Optional[str] = None,
    resolution_range: Optional[str] = None,
) -> str:
    """
    Search PDB database for protein structures by keyword, protein name, or PDB ID.
    
    Args:
        query: Search query (protein name, keyword, PDB ID, etc.)
        limit: Number of results to return (1-1000, default: 25)
        sort_by: 'score' (default) or a full attribute name, e.g.
            'rcsb_accession_info.initial_release_date'
        experimental_method: Exact method name to filter by, e.g.
            'X-RAY DIFFRACTION', 'ELECTRON MICROSCOPY', 'SOLUTION NMR'
        resolution_range: Resolution range filter in Å (e.g., '1.0-2.0')
    
    Returns:
        JSON string with search results
    """
    # Build the base search query (same structure as TypeScript version)
    search_body = {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": query}
        },
        "return_type": "entry",
        "request_options": {
            "paginate": {"start": 0, "rows": min(limit, 1000)},
            "results_content_type": ["experimental"],
            "sort": [{"sort_by": sort_by, "direction": "desc"}]
        }
    }
    
    # Add filters if provided (same logic as TypeScript version)
    filters = []
    
    if experimental_method:
        filters.append({
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "exptl.method",
                "operator": "exact_match",
                "value": experimental_method
            }
        })
    
    if resolution_range:
        parts = resolution_range.split("-")
        if len(parts) == 2:
            try:
                min_res, max_res = float(parts[0]), float(parts[1])
                filters.append({
                    "type": "terminal",
                    "service": "text",
                    "parameters": {
                        "attribute": "rcsb_entry_info.resolution_combined",
                        "operator": "range",
                        "value": {
                            "from": min_res,
                            "to": max_res,
                            "include_lower": True,
                            "include_upper": True
                        }
                    }
                })
            except ValueError:
                pass
    
    if filters:
        search_body["query"] = {
            "type": "group",
            "logical_operator": "and",
            "nodes": [search_body["query"]] + filters
        }
    
    try:
        response = requests.post(
            f"{PDB_SEARCH_API}/query",
            json=search_body,
            headers=HEADERS,
            timeout=30
        )
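        response.raise_for_status()
        if response.status_code == 204:  # the Search API returns an empty body when nothing matches
            return json.dumps({"total_count": 0, "result_set": []})
        return json.dumps(response.json(), indent=2)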
    except requests.RequestException as e:
        return json.dumps({"error": f"Search failed: {str(e)}"})


# === TEST IT ===
print("=" * 60)
print("TEST: Search for 'kinase' structures solved by X-RAY")
print("=" * 60)
result = search_structures("kinase", limit=3, experimental_method="X-RAY DIFFRACTION")
parsed = json.loads(result)
print(f"Total found: {parsed.get('total_count', 0)}")
for hit in parsed.get("result_set", []):
    print(f"  {hit['identifier']} - score: {hit['score']:.2f}")

### Tool 2: get_structure_info

Gets detailed information for a specific PDB structure. Can return JSON metadata or actual coordinate files.
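
`get_structure_info` maps its `format` argument either to the Data API or to a file-download URL. If you make that mapping an explicit table, you can reject unsupported formats before any request goes out. This is a sketch. The `FORMAT_EXTENSIONS` table and `structure_url` helper are hypothetical and cover only the four formats the tool's docstring lists.

```python
PDB_DATA_API = "https://data.rcsb.org/rest/v1"
PDB_FILES_URL = "https://files.rcsb.org/download"

# format argument -> file extension on files.rcsb.org (None means: use the Data API)
FORMAT_EXTENSIONS = {"json": None, "pdb": "pdb", "mmcif": "cif", "xml": "xml"}


def structure_url(pdb_id: str, fmt: str = "json") -> str:
    """Return the URL get_structure_info would fetch for this ID and format."""
    fmt = fmt.lower()
    if fmt not in FORMAT_EXTENSIONS:
        raise ValueError(f"Unsupported format {fmt!r}; choose from {sorted(FORMAT_EXTENSIONS)}")
    extension = FORMAT_EXTENSIONS[fmt]
    if extension is None:
        return f"{PDB_DATA_API}/core/entry/{pdb_id.lower()}"
    return f"{PDB_FILES_URL}/{pdb_id.lower()}.{extension}"


print(structure_url("1HBB", "mmcif"))
```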

In [None]:
def get_structure_info(
    pdb_id: str,
    format: str = "json"
) -> str:
    """
    Get detailed information for a specific PDB structure.
    
    Args:
        pdb_id: PDB ID (4-character code, e.g., '1ABC')
        format: Output format - 'json', 'pdb', 'mmcif', or 'xml'
    
    Returns:
        Structure information in the requested format
    """
    if not is_valid_pdb_id(pdb_id):
        return json.dumps({"error": f"Invalid PDB ID: {pdb_id}. Must be 4 characters starting with a digit."})
    
    pdb_id_lower = pdb_id.lower()
    
    try:
        if format == "json":
            # Get metadata from the Data API
            response = requests.get(
                f"{PDB_DATA_API}/core/entry/{pdb_id_lower}",
                headers=HEADERS,
                timeout=30
            )
            response.raise_for_status()
            return json.dumps(response.json(), indent=2)
        else:
            # Download coordinate file
            extension = "cif" if format == "mmcif" else format
            url = f"{PDB_FILES_URL}/{pdb_id_lower}.{extension}"
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response.text
    except requests.RequestException as e:
        return json.dumps({"error": f"Failed to fetch structure {pdb_id}: {str(e)}"})


# === TEST IT ===
print("=" * 60)
print("TEST: Get info for 7S4S (a cryoEM structure)")
print("=" * 60)
result = get_structure_info("7S4S", format="json")
data = json.loads(result)
if "error" not in data:
    print(f"Title: {data['struct']['title']}")
    print(f"Method: {data['exptl'][0]['method']}")
    if 'rcsb_entry_info' in data:
        res = data['rcsb_entry_info'].get('resolution_combined', [None])
        print(f"Resolution: {res}")
else:
    print(data)

### Tool 3: download_structure

Downloads coordinate files in various formats (PDB, mmCIF, XML). Can also download specific biological assemblies.
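
Biological assembly files use a different naming scheme for each format. Under RCSB's file-download conventions as we understand them, mmCIF assemblies are named like `1hbb-assembly1.cif`, while legacy PDB-format assemblies use a numbered extension like `1hbb.pdb1`. Treat this as an assumption, and check the RCSB download documentation if a URL returns 404. Here is a sketch with a hypothetical `assembly_url` helper:

```python
PDB_FILES_URL = "https://files.rcsb.org/download"


def assembly_url(pdb_id: str, assembly_id: int, fmt: str = "mmcif") -> str:
    """Build a biological-assembly download URL (naming scheme assumed, see above)."""
    pdb_id = pdb_id.lower()
    if fmt == "mmcif":
        return f"{PDB_FILES_URL}/{pdb_id}-assembly{assembly_id}.cif"
    if fmt == "pdb":
        return f"{PDB_FILES_URL}/{pdb_id}.pdb{assembly_id}"
    raise ValueError(f"No assembly file naming assumed for format {fmt!r}")


print(assembly_url("1HBB", 1))
print(assembly_url("1HBB", 1, "pdb"))
```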

In [None]:
def download_structure(
    pdb_id: str,
    format: str = "pdb",
    assembly_id: Optional[str] = None
) -> str:
    """
    Download structure coordinates in various formats.
    
    Args:
        pdb_id: PDB ID (4-character code)
        format: File format - 'pdb', 'mmcif', or 'xml'
        assembly_id: Biological assembly ID (optional, e.g. '1')
    
    Returns:
        Structure file content as string
    """
    if not is_valid_pdb_id(pdb_id):
        return json.dumps({"error": f"Invalid PDB ID: {pdb_id}"})
    
    pdb_id_lower = pdb_id.lower()
    extension = "cif" if format == "mmcif" else format
    
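    if assembly_id and format == "pdb":
        # Legacy PDB-format assemblies use a numbered extension, e.g. 1hbb.pdb1
        url = f"{PDB_FILES_URL}/{pdb_id_lower}.pdb{assembly_id}"
    elif assembly_id:
        # mmCIF assemblies are named like 1hbb-assembly1.cif
        url = f"{PDB_FILES_URL}/{pdb_id_lower}-assembly{assembly_id}.{extension}"
    else:
        url = f"{PDB_FILES_URL}/{pdb_id_lower}.{extension}"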
    
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        header = f"Structure file for {pdb_id} ({format.upper()} format)"
        if assembly_id:
            header += f" - Assembly {assembly_id}"
        return f"{header}:\n\n{response.text}"
    except requests.RequestException as e:
        return json.dumps({"error": f"Download failed: {str(e)}"})


# === TEST IT ===
print("=" * 60)
print("TEST: Download first 20 lines of 1HBB in PDB format")
print("=" * 60)
result = download_structure("1HBB", format="pdb")
# Just show first 20 lines to avoid flooding the notebook
lines = result.split("\n")
for line in lines[:20]:
    print(line)
print(f"\n... ({len(lines)} total lines)")

### Tool 4: search_by_uniprot

Finds PDB structures linked to a UniProt accession number. This is useful when you know the protein but want to find all solved structures.
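
`search_by_uniprot` passes the accession straight to the Search API. To catch typos early, and to avoid mistaking a PDB ID for a UniProt ID, you could check the input against the accession format that UniProt documents, written here as a regular expression. This is a sketch. `is_valid_uniprot_accession` is a hypothetical helper.

```python
import re

# Accession format published by UniProt (6- or 10-character accessions)
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_uniprot_accession(accession: str) -> bool:
    """True if the (stripped, upper-cased) string looks like a UniProt accession."""
    return UNIPROT_ACCESSION.fullmatch(accession.strip().upper()) is not None


for acc in ("P00533", "P01308", "A0A024R161", "1HBB"):
    print(acc, is_valid_uniprot_accession(acc))
```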

In [None]:
def search_by_uniprot(
    uniprot_id: str,
    limit: int = 25
) -> str:
    """
    Find PDB structures associated with a UniProt accession.
    
    Args:
        uniprot_id: UniProt accession number (e.g., 'P00533' for EGFR)
        limit: Number of results to return (1-1000, default: 25)
    
    Returns:
        JSON string with matching PDB entries
    """
    search_body = {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession",
                "operator": "exact_match",
                "value": uniprot_id
            }
        },
        "return_type": "entry",
        "request_options": {
            "paginate": {"start": 0, "rows": min(limit, 1000)},
            "results_content_type": ["experimental"]
        }
    }
    
    try:
        response = requests.post(
            f"{PDB_SEARCH_API}/query",
            json=search_body,
            headers=HEADERS,
            timeout=30
        )
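        response.raise_for_status()
        if response.status_code == 204:  # the Search API returns an empty body when nothing matches
            return json.dumps({"total_count": 0, "result_set": []})
        return json.dumps(response.json(), indent=2)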
    except requests.RequestException as e:
        return json.dumps({"error": f"UniProt search failed: {str(e)}"})


# === TEST IT ===
print("=" * 60)
print("TEST: Find structures for EGFR (UniProt: P00533)")
print("=" * 60)
result = search_by_uniprot("P00533", limit=5)
parsed = json.loads(result)
print(f"Total structures for EGFR: {parsed.get('total_count', 0)}")
for hit in parsed.get("result_set", [])[:5]:
    print(f"  {hit['identifier']}")

### Tool 5: get_structure_quality

Gets validation metrics for a structure (resolution, R-factors, Ramachandran stats, etc.).

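> **Note**: The original TypeScript version used random numbers as placeholders for some metrics. This Python version reads the real values that the RCSB Data API includes in each entry record: resolution, R-factors from `refine`, and the wwPDB validation summary (`pdbx_vrpt_summary`).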
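
All of these metrics come from a single entry record, so the extraction step can be a pure function that you test without the network. The sketch below assumes the field names used in `get_structure_quality`. The sample record is made up for illustration and is not real data for any entry. `extract_quality` is a hypothetical helper that also tolerates missing keys and empty lists.

```python
from typing import Any, Optional


def _first(items: Any) -> dict:
    """Return the first element of a list-valued field, or {} if absent or empty."""
    return items[0] if isinstance(items, list) and items else {}


def extract_quality(entry: dict) -> dict:
    """Pull method, resolution and R-factors out of a Data API entry dict."""
    refine = _first(entry.get("refine"))
    r_work = refine.get("ls_R_factor_R_work")
    r_free = refine.get("ls_R_factor_R_free")
    gap: Optional[float] = None
    if r_work is not None and r_free is not None:
        gap = round(r_free - r_work, 3)  # a large gap can hint at over-fitting
    return {
        "method": _first(entry.get("exptl")).get("method", "Unknown"),
        "resolution": entry.get("rcsb_entry_info", {}).get("resolution_combined"),
        "r_work": r_work,
        "r_free": r_free,
        "r_free_minus_r_work": gap,
    }


# Made-up record with the same shape as a Data API response (not real data)
sample = {
    "exptl": [{"method": "X-RAY DIFFRACTION"}],
    "rcsb_entry_info": {"resolution_combined": [2.0]},
    "refine": [{"ls_R_factor_R_work": 0.18, "ls_R_factor_R_free": 0.22}],
}
print(extract_quality(sample))
```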

In [None]:
def get_structure_quality(pdb_id: str) -> str:
    """
    Get structure quality metrics and validation data.
    
    Args:
        pdb_id: PDB ID (4-character code)
    
    Returns:
        JSON string with quality metrics
    """
    if not is_valid_pdb_id(pdb_id):
        return json.dumps({"error": f"Invalid PDB ID: {pdb_id}"})
    
    pdb_id_lower = pdb_id.lower()
    
    try:
        # Get entry data
        entry_response = requests.get(
            f"{PDB_DATA_API}/core/entry/{pdb_id_lower}",
            headers=HEADERS,
            timeout=30
        )
        entry_response.raise_for_status()
        entry_data = entry_response.json()
        
        # Extract quality metrics from the entry data
        quality_data = {
            "pdb_id": pdb_id_lower,
            "method": entry_data.get("exptl", [{}])[0].get("method", "Unknown"),
        }
        
        # Resolution (from rcsb_entry_info)
        entry_info = entry_data.get("rcsb_entry_info", {})
        quality_data["resolution"] = entry_info.get("resolution_combined", None)
        
        # R-factors (from refine data if available)
        refine = entry_data.get("refine", [{}])
        if refine:
            quality_data["r_work"] = refine[0].get("ls_R_factor_R_work", None)
            quality_data["r_free"] = refine[0].get("ls_R_factor_R_free", None)
        
        # Validation summary is part of the same entry record, so no second request is needed
        pdbx_vrpt = entry_data.get("pdbx_vrpt_summary", {})
        if pdbx_vrpt:
            quality_data["validation"] = {
                "clashscore": pdbx_vrpt.get("clashscore"),
                "ramachandran_outlier_percent": pdbx_vrpt.get("percent_ramachandran_outliers_full_length"),
                "rotamer_outlier_percent": pdbx_vrpt.get("percent_rotamer_outliers_full_length"),
            }
        
        return json.dumps(quality_data, indent=2)
    except requests.RequestException as e:
        return json.dumps({"error": f"Failed to fetch quality data: {str(e)}"})


# === TEST IT ===
print("=" * 60)
print("TEST: Quality metrics for 1HBB")
print("=" * 60)
result = get_structure_quality("1HBB")
print(result)

## Part 5: Putting It All Together as an MCP Server

Now we combine all 5 tools into a proper MCP server.

This is the equivalent of the entire `src/index.ts` file, but in Python.

> **Important**: The MCP server runs as a standalone process, not inside Jupyter.
> The cell below **writes the server file to disk**. You then run it from the terminal.
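
Under the hood, an MCP client talks to this process over stdio using JSON-RPC 2.0 messages, one JSON object per line. When the AI decides to call a tool, the client sends a `tools/call` request that names the tool and its arguments. The sketch below builds such a message with the standard library so you can see its shape. A real client first performs the `initialize` handshake, and the SDK handles all of this for you. `tools_call_request` is a hypothetical helper.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids only need to be unique within a session


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


print(tools_call_request("get_structure_quality", {"pdb_id": "1HBB"}))
```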

In [None]:
# This cell writes the complete MCP server to a Python file
# You can then run it from the terminal

server_code = '''#!/usr/bin/env python3
"""
PDB MCP Server - Python Version

A Model Context Protocol server that provides access to the
Protein Data Bank (PDB) for AI assistants like Claude.

Converted from: https://github.com/Augmented-Nature/PDB-MCP-Server
Original: TypeScript | This version: Python

Tools provided:
  1. search_structures    - Search PDB by keyword
  2. get_structure_info   - Get details for a PDB ID
  3. download_structure   - Download coordinate files
  4. search_by_uniprot    - Find structures by UniProt ID
  5. get_structure_quality - Get validation metrics
"""

import json
import re
from typing import Optional

import requests
from mcp.server.fastmcp import FastMCP

# ============================================================
# Server Setup
# ============================================================

mcp = FastMCP(
    "PDB Server",
    instructions="Access the Protein Data Bank (PDB) for protein structure search, retrieval, and validation."
)

# API Configuration
PDB_DATA_API = "https://data.rcsb.org/rest/v1"
PDB_SEARCH_API = "https://search.rcsb.org/rcsbsearch/v2"
PDB_FILES_URL = "https://files.rcsb.org/download"
HEADERS = {
    "User-Agent": "PDB-MCP-Server-Python/1.0.0",
    "Accept": "application/json",
}


# ============================================================
# Validation
# ============================================================

def is_valid_pdb_id(pdb_id: str) -> bool:
    """Validate a PDB ID (4-char code starting with a digit)."""
    return isinstance(pdb_id, str) and bool(re.match(r"^[0-9][a-zA-Z0-9]{3}$", pdb_id))


# ============================================================
# Tool 1: Search Structures
# ============================================================

@mcp.tool()
def search_structures(
    query: str,
    limit: int = 25,
    sort_by: str = "score",
    experimental_method: Optional[str] = None,
    resolution_range: Optional[str] = None,
) -> str:
    """
    Search PDB database for protein structures by keyword, protein name, or PDB ID.

    Args:
        query: Search query (protein name, keyword, PDB ID, etc.)
        limit: Number of results to return (1-1000, default: 25)
        sort_by: "score" (default) or a full attribute name, e.g. "rcsb_accession_info.initial_release_date"
        experimental_method: Exact method name, e.g. "X-RAY DIFFRACTION", "ELECTRON MICROSCOPY", "SOLUTION NMR"
        resolution_range: Resolution range filter in angstroms (e.g., "1.0-2.0")
    """
    search_body = {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": query},
        },
        "return_type": "entry",
        "request_options": {
            "paginate": {"start": 0, "rows": min(limit, 1000)},
            "results_content_type": ["experimental"],
            "sort": [{"sort_by": sort_by, "direction": "desc"}],
        },
    }

    filters = []
    if experimental_method:
        filters.append({
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "exptl.method",
                "operator": "exact_match",
                "value": experimental_method,
            },
        })

    if resolution_range:
        parts = resolution_range.split("-")
        if len(parts) == 2:
            try:
                min_res, max_res = float(parts[0]), float(parts[1])
                filters.append({
                    "type": "terminal",
                    "service": "text",
                    "parameters": {
                        "attribute": "rcsb_entry_info.resolution_combined",
                        "operator": "range",
                        "value": {
                            "from": min_res,
                            "to": max_res,
                            "include_lower": True,
                            "include_upper": True,
                        },
                    },
                })
            except ValueError:
                pass

    if filters:
        search_body["query"] = {
            "type": "group",
            "logical_operator": "and",
            "nodes": [search_body["query"]] + filters,
        }

    try:
        response = requests.post(
            f"{PDB_SEARCH_API}/query",
            json=search_body,
            headers=HEADERS,
            timeout=30,
        )
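        response.raise_for_status()
        if response.status_code == 204:  # the Search API returns an empty body when nothing matches
            return json.dumps({"total_count": 0, "result_set": []})
        return json.dumps(response.json(), indent=2)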
    except requests.RequestException as e:
        return json.dumps({"error": f"Search failed: {str(e)}"})


# ============================================================
# Tool 2: Get Structure Info
# ============================================================

@mcp.tool()
def get_structure_info(pdb_id: str, format: str = "json") -> str:
    """
    Get detailed information for a specific PDB structure.

    Args:
        pdb_id: PDB ID (4-character code, e.g., "1ABC")
        format: Output format - "json", "pdb", "mmcif", or "xml"
    """
    if not is_valid_pdb_id(pdb_id):
        return json.dumps({"error": f"Invalid PDB ID: {pdb_id}"})

    pdb_id_lower = pdb_id.lower()

    try:
        if format == "json":
            response = requests.get(
                f"{PDB_DATA_API}/core/entry/{pdb_id_lower}",
                headers=HEADERS,
                timeout=30,
            )
            response.raise_for_status()
            return json.dumps(response.json(), indent=2)
        else:
            extension = "cif" if format == "mmcif" else format
            url = f"{PDB_FILES_URL}/{pdb_id_lower}.{extension}"
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            return response.text
    except requests.RequestException as e:
        return json.dumps({"error": f"Failed to fetch structure {pdb_id}: {str(e)}"})


# ============================================================
# Tool 3: Download Structure
# ============================================================

@mcp.tool()
def download_structure(
    pdb_id: str,
    format: str = "pdb",
    assembly_id: Optional[str] = None,
) -> str:
    """
    Download structure coordinates in various formats.

    Args:
        pdb_id: PDB ID (4-character code)
        format: File format - "pdb", "mmcif", or "xml"
        assembly_id: Biological assembly ID (optional, e.g. "1")
    """
    if not is_valid_pdb_id(pdb_id):
        return json.dumps({"error": f"Invalid PDB ID: {pdb_id}"})

    pdb_id_lower = pdb_id.lower()
    extension = "cif" if format == "mmcif" else format

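    if assembly_id and format == "pdb":
        # Legacy PDB-format assemblies use a numbered extension, e.g. 1hbb.pdb1
        url = f"{PDB_FILES_URL}/{pdb_id_lower}.pdb{assembly_id}"
    elif assembly_id:
        # mmCIF assemblies are named like 1hbb-assembly1.cif
        url = f"{PDB_FILES_URL}/{pdb_id_lower}-assembly{assembly_id}.{extension}"
    else:
        url = f"{PDB_FILES_URL}/{pdb_id_lower}.{extension}"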

    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        header = f"Structure file for {pdb_id} ({format.upper()} format)"
        if assembly_id:
            header += f" - Assembly {assembly_id}"
        return f"{header}:\\n\\n{response.text}"
    except requests.RequestException as e:
        return json.dumps({"error": f"Download failed: {str(e)}"})


# ============================================================
# Tool 4: Search by UniProt
# ============================================================

@mcp.tool()
def search_by_uniprot(uniprot_id: str, limit: int = 25) -> str:
    """
    Find PDB structures associated with a UniProt accession.

    Args:
        uniprot_id: UniProt accession number (e.g., "P00533" for EGFR)
        limit: Number of results to return (1-1000, default: 25)
    """
    search_body = {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession",
                "operator": "exact_match",
                "value": uniprot_id,
            },
        },
        "return_type": "entry",
        "request_options": {
            "paginate": {"start": 0, "rows": min(limit, 1000)},
            "results_content_type": ["experimental"],
        },
    }

    try:
        response = requests.post(
            f"{PDB_SEARCH_API}/query",
            json=search_body,
            headers=HEADERS,
            timeout=30,
        )
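        response.raise_for_status()
        if response.status_code == 204:  # the Search API returns an empty body when nothing matches
            return json.dumps({"total_count": 0, "result_set": []})
        return json.dumps(response.json(), indent=2)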
    except requests.RequestException as e:
        return json.dumps({"error": f"UniProt search failed: {str(e)}"})


# ============================================================
# Tool 5: Get Structure Quality
# ============================================================

@mcp.tool()
def get_structure_quality(pdb_id: str) -> str:
    """
    Get structure quality metrics and validation data for a PDB structure.

    Args:
        pdb_id: PDB ID (4-character code)
    """
    if not is_valid_pdb_id(pdb_id):
        return json.dumps({"error": f"Invalid PDB ID: {pdb_id}"})

    pdb_id_lower = pdb_id.lower()

    try:
        response = requests.get(
            f"{PDB_DATA_API}/core/entry/{pdb_id_lower}",
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        entry_data = response.json()

        quality_data = {
            "pdb_id": pdb_id_lower,
            "method": entry_data.get("exptl", [{}])[0].get("method", "Unknown"),
            "resolution": entry_data.get("rcsb_entry_info", {}).get(
                "resolution_combined", None
            ),
        }

        refine = entry_data.get("refine", [{}])
        if refine:
            quality_data["r_work"] = refine[0].get("ls_R_factor_R_work", None)
            quality_data["r_free"] = refine[0].get("ls_R_factor_R_free", None)

        pdbx_vrpt = entry_data.get("pdbx_vrpt_summary", {})
        if pdbx_vrpt:
            quality_data["validation"] = {
                "clashscore": pdbx_vrpt.get("clashscore"),
                "ramachandran_outlier_percent": pdbx_vrpt.get(
                    "percent_ramachandran_outliers_full_length"
                ),
                "rotamer_outlier_percent": pdbx_vrpt.get(
                    "percent_rotamer_outliers_full_length"
                ),
            }

        return json.dumps(quality_data, indent=2)
    except requests.RequestException as e:
        return json.dumps({"error": f"Failed to fetch quality data: {str(e)}"})


# ============================================================
# Resources (data the AI can read, like URLs)
# ============================================================

@mcp.resource("pdb://structure/{pdb_id}")
def get_structure_resource(pdb_id: str) -> str:
    """Complete structure information for a PDB ID."""
    response = requests.get(
        f"{PDB_DATA_API}/core/entry/{pdb_id.lower()}",
        headers=HEADERS,
        timeout=30,
    )
    response.raise_for_status()
    return json.dumps(response.json(), indent=2)


@mcp.resource("pdb://coordinates/{pdb_id}")
def get_coordinates_resource(pdb_id: str) -> str:
    """Structure coordinates in PDB format."""
    response = requests.get(
        f"{PDB_FILES_URL}/{pdb_id.lower()}.pdb",
        timeout=30,
    )
    response.raise_for_status()
    return response.text


@mcp.resource("pdb://mmcif/{pdb_id}")
def get_mmcif_resource(pdb_id: str) -> str:
    """Structure data in mmCIF format."""
    response = requests.get(
        f"{PDB_FILES_URL}/{pdb_id.lower()}.cif",
        timeout=30,
    )
    response.raise_for_status()
    return response.text


# ============================================================
# Prompts (reusable templates for the AI)
# ============================================================

@mcp.prompt()
def analyze_structure(pdb_id: str) -> str:
    """Prompt to analyze a protein structure comprehensively."""
    return f"""Please analyze the protein structure with PDB ID {pdb_id}.
    
Use the available tools to:
1. Get the structure info (get_structure_info)
2. Check the quality metrics (get_structure_quality)
3. Search for related structures by UniProt ID if available

Provide a summary including:
- What protein/complex this is
- Experimental method and resolution
- Quality assessment
- Any notable features or ligands"""


@mcp.prompt()
def compare_methods(protein_name: str) -> str:
    """Prompt to compare X-ray and cryoEM structures of the same protein."""
    return f"""Search for structures of {protein_name} solved by both X-RAY DIFFRACTION 
and ELECTRON MICROSCOPY. Compare the available structures in terms of:
- Resolution
- Quality metrics
- What biological state they capture

Use search_structures with experimental_method filter for each method."""


# ============================================================
# Run the server
# ============================================================

if __name__ == "__main__":
    mcp.run()
'''

import os
import sys

# Write to file
with open("pdb_mcp_server.py", "w", encoding="utf-8") as f:
    f.write(server_code)

server_path = os.path.abspath("pdb_mcp_server.py")
print(f"Server file written to: {server_path}")
print("\nTo run it (it waits silently for an MCP client on stdin/stdout):")
print("  python pdb_mcp_server.py")
print("\nTo test with MCP Inspector:")
print("  npx @modelcontextprotocol/inspector python pdb_mcp_server.py")
print("\nTo register with Claude Code (using this notebook's Python interpreter):")
print(f"  claude mcp add pdb-server {sys.executable} {server_path}")
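
The server file also registers resources under URI templates such as `pdb://structure/{pdb_id}`. Conceptually, the server matches a requested URI against each template and passes the captured pieces to your function as keyword arguments. The toy below illustrates that idea with a regex. It is not FastMCP's actual implementation, and `match_uri_template` is a hypothetical name.

```python
import re
from typing import Optional


def match_uri_template(template: str, uri: str) -> Optional[dict]:
    """Return {param: value} if uri fits a template like 'pdb://structure/{pdb_id}'."""
    # Escape the literal parts, then turn each escaped {name} into a named capture group
    pattern = re.escape(template)
    pattern = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^/]+)", pattern)
    match = re.fullmatch(pattern, uri)
    return match.groupdict() if match else None


print(match_uri_template("pdb://structure/{pdb_id}", "pdb://structure/1hbb"))
```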

## Part 6: How to Connect This Server to Claude Code / Cursor

Once the server file is saved, you need to tell Claude Code or Cursor about it.

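### Option A: Claude Code

Register the server with the `claude mcp add` command printed above, or add it to a project-scoped `.mcp.json` file in your project root. (Claude Desktop uses the same `mcpServers` structure in its `claude_desktop_config.json`.)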

```json
{
  "mcpServers": {
    "pdb-server": {
      "command": "python",
      "args": ["/full/path/to/pdb_mcp_server.py"]
    }
  }
}
```

### Option B: Cursor

Add to `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` in your project:

```json
{
  "mcpServers": {
    "pdb-server": {
      "command": "python",
      "args": ["/full/path/to/pdb_mcp_server.py"]
    }
  }
}
```

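If plain `python` on your PATH is not the interpreter where you installed `mcp`, use that interpreter's full path as the `command`. Then restart Claude Code / Cursor, and the AI will be able to call your PDB tools.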
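
One way to avoid an interpreter mismatch is to generate the config entry from the notebook's own interpreter path. This is a sketch. `mcp_server_entry` is a hypothetical helper, and you still need to paste or merge the printed JSON into the right config file yourself.

```python
import json
import sys
from pathlib import Path


def mcp_server_entry(name: str, script: str) -> dict:
    """Build an 'mcpServers' block that runs script with this exact interpreter."""
    return {
        "mcpServers": {
            name: {
                "command": sys.executable,  # the Python running this notebook
                "args": [str(Path(script).resolve())],  # absolute path to the server file
            }
        }
    }


print(json.dumps(mcp_server_entry("pdb-server", "pdb_mcp_server.py"), indent=2))
```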

## Part 7: Interactive Demo - Try All 5 Tools

Let's test each tool interactively. These are the same functions the AI would call.
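
The search-style tools return the raw Search API JSON as a string, and the demos below repeat the same parsing each time. A small helper could handle that parsing and also surface the error payloads our tools return. `top_hits` is a hypothetical name, and it assumes the `result_set`, `identifier` and `score` keys used in the earlier cells.

```python
import json
from typing import List, Tuple


def top_hits(result_json: str, n: int = 5) -> List[Tuple[str, float]]:
    """Return up to n (identifier, score) pairs; raise if the tool returned an error."""
    parsed = json.loads(result_json)
    if "error" in parsed:
        raise RuntimeError(parsed["error"])
    return [
        (hit["identifier"], float(hit.get("score", 0.0)))
        for hit in parsed.get("result_set", [])[:n]
    ]
```

For example, `top_hits(search_structures("insulin", limit=5))` would give a list of `(identifier, score)` pairs.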

In [None]:
# Demo 1: Search for cryoEM structures of spike protein
print("=" * 60)
print("DEMO 1: Search for SARS-CoV-2 spike protein cryoEM structures")
print("=" * 60)

result = search_structures(
    query="SARS-CoV-2 spike protein",
    limit=5,
    experimental_method="ELECTRON MICROSCOPY"
)
parsed = json.loads(result)
print(f"\nTotal cryoEM spike structures: {parsed.get('total_count', 0)}")
print("\nTop 5:")
for hit in parsed.get("result_set", []):
    print(f"  PDB: {hit['identifier']}  |  Score: {hit['score']:.2f}")

In [None]:
# Demo 2: Get detailed info about a specific structure
print("=" * 60)
print("DEMO 2: Detailed info for 6VYB (SARS-CoV-2 Spike, open state)")
print("=" * 60)

result = get_structure_info("6VYB")
data = json.loads(result)

print(f"\nTitle: {data['struct']['title']}")
print(f"Method: {data['exptl'][0]['method']}")
print(f"Resolution: {data.get('rcsb_entry_info', {}).get('resolution_combined', 'N/A')}")
print(f"Polymer entities: {data.get('rcsb_entry_info', {}).get('polymer_entity_count', 'N/A')}")
print(f"Release date: {data.get('rcsb_accession_info', {}).get('initial_release_date', 'N/A')}")

In [None]:
# Demo 3: Download a structure (show first few ATOM lines)
print("=" * 60)
print("DEMO 3: Download 1HBB and show ATOM records")
print("=" * 60)

result = download_structure("1HBB", format="pdb")
lines = result.split("\n")

# Show header and first 5 ATOM lines
print(lines[0])  # Header from our function
atom_lines = [line for line in lines if line.startswith("ATOM")]
print(f"\nTotal ATOM records: {len(atom_lines)}")
print("\nFirst 5 ATOM lines:")
for line in atom_lines[:5]:
    print(f"  {line}")
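
Demo 3 filters lines that start with `ATOM`. PDB-format coordinate records are fixed-width, so fields are read by column position rather than by splitting on whitespace, which breaks when adjacent fields run together. The sketch below follows the standard ATOM record columns (1-based: atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54). `parse_atom_line` is a hypothetical helper.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using fixed PDB column positions (0-based slices)."""
    return Atom(
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


# Usage with the lines from Demo 3:
# atoms = [parse_atom_line(line) for line in atom_lines]
```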

In [None]:
# Demo 4: Find all structures for Human Insulin (UniProt P01308)
print("=" * 60)
print("DEMO 4: All PDB structures for Human Insulin (P01308)")
print("=" * 60)

result = search_by_uniprot("P01308", limit=10)
parsed = json.loads(result)
print(f"\nTotal insulin structures: {parsed.get('total_count', 0)}")
print("\nFirst 10:")
for hit in parsed.get("result_set", [])[:10]:
    print(f"  {hit['identifier']}")

In [None]:
# Demo 5: Quality check
print("=" * 60)
print("DEMO 5: Structure quality for 6VYB")
print("=" * 60)

result = get_structure_quality("6VYB")
print(result)

## Part 8: Summary - TypeScript vs Python Side-by-Side

| Concept | TypeScript (Original) | Python (Our Version) |
|---------|----------------------|---------------------|
| **Package** | `@modelcontextprotocol/sdk` | `mcp` (pip install) |
| **Server class** | `new Server({name, version}, {capabilities})` | `FastMCP("name")` |
| **Define a tool** | JSON schema + separate handler function | `@mcp.tool()` decorator on a function |
| **Tool inputs** | `inputSchema: {type: 'object', properties: {...}}` | Python type hints: `def f(x: str, n: int = 5)` |
| **Tool description** | `description: 'Search PDB...'` | Docstring: `"""Search PDB..."""` |
| **HTTP client** | `axios` | `requests` |
| **Resources** | `setRequestHandler(ReadResourceRequestSchema, ...)` | `@mcp.resource("pdb://...")` |
| **Transport** | `StdioServerTransport` | `mcp.run()` (stdio is the default transport) |
| **Run** | `npm run build && node build/index.js` | `python pdb_mcp_server.py` |
| **Config files** | `package.json`, `tsconfig.json` | None needed (just the .py file) |

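### Key takeaway: FastMCP removes most of the MCP boilerplate.

The original TypeScript server uses the SDK's low-level `Server` API, so every tool needs a hand-written JSON schema plus routing code in a request handler.

The Python server above keeps the same request-building logic but replaces the schemas and routing with decorators and type hints. (The TypeScript SDK also offers a higher-level `McpServer` API that narrows this gap.)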