# Medical Database Integration Pipeline

## Overview

This notebook demonstrates how to use Python/FastMCP MCP servers as data sources for medical database ingestion. It connects to a medical database MCP server via URL, ingests patient records, drug interactions, and clinical data, and then builds a healthcare knowledge graph from the results.

**IMPORTANT**: This implementation supports ONLY Python-based MCP servers and FastMCP servers. Users can bring their own Python/FastMCP MCP servers via URL connections.

### Modules Used (20+)

- **Ingestion**: MCPIngestor, ingest_mcp, DBIngestor, FileIngestor
- **Parsing**: MCPParser, JSONParser, StructuredDataParser, DocumentParser
- **Extraction**: NERExtractor, RelationExtractor, TripleExtractor, SemanticAnalyzer
- **KG**: GraphBuilder, GraphValidator, EntityResolver, GraphAnalyzer
- **Triple Store**: TripleStore, TripleManager, QueryEngine
- **Reasoning**: InferenceEngine, RuleManager, ExplanationGenerator
- **Quality**: KGQualityAssessor, ValidationEngine
- **Export**: JSONExporter, RDFExporter, OWLExporter, ReportGenerator
- **Visualization**: KGVisualizer, OntologyVisualizer, TemporalVisualizer

### Pipeline

**Connect to Medical MCP Server → Ingest Patient/Drug Data via MCP → Parse MCP Responses → Extract Medical Entities → Build Healthcare KG → Query & Analyze → Generate Reports → Visualize**

---

## Step 1: Connect to Medical Database MCP Server

Connect to a Python/FastMCP MCP server that provides medical database access via URL. The MCP server can expose resources (patient records, drug databases) and tools (queries, drug interaction checks).
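
Before connecting, it can help to decide which hosts should receive credentials by parsing the URL instead of checking for a substring. The sketch below is a minimal, standard-library-only illustration: the `AUTHENTICATED_HOSTS` allow-list, the `build_mcp_headers` helper, and the `MEDICAL_MCP_TOKEN` / `MEDICAL_MCP_API_KEY` environment variable names are assumptions made for this example and are not part of the MCP ingestor API.

```python
import os
from urllib.parse import urlparse

# Hypothetical allow-list of hosts that require credentials (assumption for illustration).
AUTHENTICATED_HOSTS = {"api.example.com"}


def build_mcp_headers(url, env=None):
    """Return auth headers only when the URL's host is on the allow-list."""
    env = os.environ if env is None else env
    host = urlparse(url).hostname or ""
    if host not in AUTHENTICATED_HOSTS:
        return {}
    headers = {}
    # MEDICAL_MCP_TOKEN and MEDICAL_MCP_API_KEY are hypothetical variable names.
    token = env.get("MEDICAL_MCP_TOKEN")
    api_key = env.get("MEDICAL_MCP_API_KEY")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    if api_key:
        headers["X-API-Key"] = api_key
    return headers


# A local server gets no credentials; the remote host gets only what is configured.
local_headers = build_mcp_headers("http://localhost:8000/mcp", env={})
remote_headers = build_mcp_headers(
    "https://api.example.com/medical-mcp",
    env={"MEDICAL_MCP_TOKEN": "t0ken"},
)
print(local_headers)
print(remote_headers)
```

Matching on the parsed hostname avoids sending a token to a URL that merely contains `api.example.com` somewhere in its path or query string, and reading secrets from the environment keeps them out of the notebook.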

In [None]:
from semantica.ingest import MCPIngestor, ingest_mcp
from semantica.parse import MCPParser, JSONParser, StructuredDataParser, DocumentParser
from semantica.semantic_extract import NERExtractor, RelationExtractor, TripleExtractor, SemanticAnalyzer
from semantica.kg import GraphBuilder, GraphValidator, EntityResolver, GraphAnalyzer
from semantica.triple_store import TripleStore, TripleManager, QueryEngine
from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.kg_qa import KGQualityAssessor, ValidationEngine
from semantica.export import JSONExporter, RDFExporter, OWLExporter, ReportGenerator
from semantica.visualization import KGVisualizer, OntologyVisualizer, TemporalVisualizer
import json
from datetime import datetime, timedelta

# Initialize MCP ingestor
mcp_ingestor = MCPIngestor()

# Connect to medical database MCP server via URL
# Replace with your actual MCP server URL
# Example: http://localhost:8000/mcp or https://api.example.com/medical-mcp
medical_mcp_url = "http://localhost:8000/mcp"

try:
    # Connect to MCP server with authentication (if required)
    mcp_ingestor.connect(
        "medical_server",
        url=medical_mcp_url,
        headers={
            "Authorization": "Bearer your_token",
            "X-API-Key": "your_api_key"
        } if "api.example.com" in medical_mcp_url else {}
    )
    print(f"✓ Connected to medical MCP server at {medical_mcp_url}")
    
    # List available resources (patient records, drug databases)
    resources = mcp_ingestor.list_available_resources("medical_server")
    print(f"\n📊 Available Resources ({len(resources)}):")
    for resource in resources[:5]:  # Show first 5
        print(f"  - {resource.uri}: {resource.name}")
        if resource.description:
            print(f"    {resource.description[:80]}...")
    
    # List available tools (queries, drug interaction checks)
    tools = mcp_ingestor.list_available_tools("medical_server")
    print(f"\n🔧 Available Tools ({len(tools)}):")
    for tool in tools[:5]:  # Show first 5
        print(f"  - {tool.name}: {tool.description or 'No description'}")
        
except Exception as e:
    print(f"⚠ Connection failed: {e}")
    print("Note: This example uses a placeholder URL. Replace with your actual MCP server URL.")
    print("For testing, you can use a mock MCP server or skip connection and use sample data below.")


## Step 2: Ingest Medical Data from MCP Server

Ingest patient records, drug interactions, and clinical data using both resource-based and tool-based methods.
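
Tool-based ingestion typically passes a date range as tool arguments. As a minimal sketch, the hypothetical helper `build_date_range` below computes both bounds from a single timezone-aware timestamp, so the start and end cannot drift apart and the ISO 8601 strings carry an explicit UTC offset. Whether a given MCP tool expects this exact argument shape is an assumption; check the tool's own schema.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_date_range(days_back: int, now: Optional[datetime] = None) -> dict:
    """Build an ISO 8601 date range ending at `now` (current UTC time by default)."""
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days_back)
    return {"start": start.isoformat(), "end": end.isoformat()}


# A fixed timestamp makes the example reproducible.
fixed_now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
arguments = {
    "patient_id": "P001",
    "date_range": build_date_range(365, now=fixed_now),
}
print(arguments["date_range"])
```

Passing `now` explicitly also makes the range easy to test, because the result no longer depends on the wall clock.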


In [None]:
# Initialize parsers
mcp_parser = MCPParser()
json_parser = JSONParser()
structured_parser = StructuredDataParser()

medical_data = []

# Method 1: Resource-based ingestion
# Ingest from MCP resources (patient records, drug databases)
try:
    # Example: Ingest patient records resource
    patient_data = mcp_ingestor.ingest_resources(
        "medical_server",
        resource_uris=["resource://patients/records", "resource://drugs/interactions"]
    )
    
    for item in patient_data:
        medical_data.append(item)
        print(f"✓ Ingested resource: {item.resource_uri}")
        
except Exception as e:
    print(f"⚠ Resource ingestion: {e}")

# Method 2: Tool-based ingestion
# Call MCP tools to retrieve data dynamically
try:
    # Example: Query patient records
    patient_records = mcp_ingestor.ingest_tool_output(
        "medical_server",
        tool_name="query_patient_records",
        arguments={
            "patient_id": "P001",
            "date_range": {
                "start": (datetime.now() - timedelta(days=365)).isoformat(),
                "end": datetime.now().isoformat()
            }
        }
    )
    
    if patient_records:
        medical_data.append(patient_records)
        print("✓ Retrieved patient records via tool")
        
    # Example: Check drug interactions
    drug_interactions = mcp_ingestor.ingest_tool_output(
        "medical_server",
        tool_name="check_drug_interactions",
        arguments={
            "medications": ["Lisinopril", "Metformin", "Aspirin"]
        }
    )
    
    if drug_interactions:
        medical_data.append(drug_interactions)
        print("✓ Retrieved drug interactions via tool")
        
except Exception as e:
    print(f"⚠ Tool-based ingestion: {e}")
    print("Note: Sample data is loaded below only if nothing was ingested")

# Sample medical data (if MCP server is not available)
if not medical_data:
    print("\n📝 Using sample medical data for demonstration:")
    sample_data = {
        "patient_records": [
            {
                "patient_id": "P001",
                "visit_date": (datetime.now() - timedelta(days=30)).isoformat(),
                "diagnosis": ["Hypertension", "Type 2 Diabetes"],
                "medications": ["Lisinopril 10mg", "Metformin 500mg"],
                "procedures": ["Blood Pressure Check", "HbA1c Test"],
                "doctor": "Dr. Smith",
                "notes": "Patient shows improvement in blood pressure control."
            },
            {
                "patient_id": "P002",
                "visit_date": (datetime.now() - timedelta(days=15)).isoformat(),
                "diagnosis": ["Asthma"],
                "medications": ["Albuterol Inhaler"],
                "procedures": ["Spirometry"],
                "doctor": "Dr. Johnson",
                "notes": "Asthma well controlled with current medication."
            }
        ],
        "drug_interactions": [
            {
                "drug1": "Lisinopril",
                "drug2": "Aspirin",
                "interaction_type": "moderate",
                "description": "May increase risk of kidney problems"
            },
            {
                "drug1": "Metformin",
                "drug2": "Alcohol",
                "interaction_type": "severe",
                "description": "May cause lactic acidosis"
            }
        ]
    }
    medical_data.append(sample_data)
    print(f"  Loaded {len(sample_data['patient_records'])} patient records")
    print(f"  Loaded {len(sample_data['drug_interactions'])} drug interactions")

print(f"\n📊 Total medical data items ingested: {len(medical_data)}")


## Step 3: Parse MCP Medical Data

Parse the medical data received from MCP server responses.
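
One detail worth checking when splitting parsed items is that a single payload may carry both a `patient_records` list and a `drug_interactions` list, so the two keys should be tested independently. The sketch below shows this with a hypothetical `split_medical_items` helper on plain dictionaries; the key names follow the sample data used in this notebook.

```python
def split_medical_items(items):
    """Separate parsed items into patient records and drug interactions."""
    patients, interactions = [], []
    for item in items:
        if not isinstance(item, dict):
            continue  # skip anything that is not a mapping
        # Container payloads: check both keys independently.
        patients.extend(item.get("patient_records", []))
        interactions.extend(item.get("drug_interactions", []))
        # Bare records returned directly, for example by a tool call.
        if "patient_id" in item:
            patients.append(item)
        if "drug1" in item and "drug2" in item:
            interactions.append(item)
    return patients, interactions


items = [
    {
        "patient_records": [{"patient_id": "P001"}],
        "drug_interactions": [{"drug1": "Lisinopril", "drug2": "Aspirin"}],
    },
    {"patient_id": "P002"},
    "not a dict",
]
patients, interactions = split_medical_items(items)
print(len(patients), len(interactions))
```

With the three items above, the helper collects two patient records and one drug interaction, and silently ignores the non-dictionary entry.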


In [None]:
parsed_medical_data = []

# Parse MCP responses
for data_item in medical_data:
    try:
        # Parse MCP response (handles JSON, text, binary)
        if isinstance(data_item, dict):
            parsed_item = data_item
        else:
            parsed_item = mcp_parser.parse_response(data_item, response_type="json")
        
        parsed_medical_data.append(parsed_item)
        
    except Exception as e:
        print(f"⚠ Parsing error: {e}")

# Extract patient records and drug interactions
patient_records = []
drug_interactions = []

for item in parsed_medical_data:
    if isinstance(item, dict):
        # Use independent checks: one payload may carry both collections
        if "patient_records" in item:
            patient_records.extend(item["patient_records"])
        if "drug_interactions" in item:
            drug_interactions.extend(item["drug_interactions"])
        # Bare records returned directly by a tool call
        if "patient_id" in item:
            patient_records.append(item)
        elif "drug1" in item:
            drug_interactions.append(item)

print(f"✓ Parsed {len(parsed_medical_data)} data items")
print(f"✓ Extracted {len(patient_records)} patient records")
print(f"✓ Extracted {len(drug_interactions)} drug interactions")


## Step 4: Extract Medical Entities and Relationships

Extract medical entities (patients, diagnoses, medications, procedures, doctors) and relationships from MCP data.
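
In the sample data, prescriptions include a dose ("Lisinopril 10mg") while interaction records use the bare drug name ("Lisinopril"), so the two will not refer to the same graph node unless names are normalized. The following sketch shows one possible normalization with a regular expression; the `split_medication` helper and the small list of dose units are illustrative assumptions, not a complete drug-name parser.

```python
import re

# Matches a trailing dose such as "10mg" or "2.5 mL" (illustrative, not exhaustive).
_DOSE_PATTERN = re.compile(
    r"\s+\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml)\b.*$", re.IGNORECASE
)


def split_medication(label):
    """Split 'Lisinopril 10mg' into ('Lisinopril', '10mg')."""
    label = label.strip()
    match = _DOSE_PATTERN.search(label)
    if match is None:
        return label, None  # no recognizable dose, keep the label as the name
    return label[:match.start()], match.group(0).strip()


def medication_entity(label):
    """Build a Medication entity keyed by the bare drug name."""
    name, dose = split_medication(label)
    properties = {"dose": dose} if dose else {}
    return {"id": name, "type": "Medication", "name": name, "properties": properties}


for label in ["Lisinopril 10mg", "Metformin 500mg", "Albuterol Inhaler"]:
    print(medication_entity(label))
```

Keeping the dose as a property rather than as part of the ID lets a `prescribed` edge and an `interacts_with` edge meet at the same medication node.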


In [None]:
# Extractors are initialized here; the structured fields below are mapped directly without them
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()
triple_extractor = TripleExtractor()
semantic_analyzer = SemanticAnalyzer()

medical_entities = []
medical_relationships = []

# Extract from patient records
for record in patient_records:
    if isinstance(record, dict):
        patient_id = record.get("patient_id", "")
        
        # Patient entity
        medical_entities.append({
            "id": patient_id,
            "type": "Patient",
            "name": patient_id,
            "properties": {"visit_date": record.get("visit_date", "")}
        })
        
        # Diagnoses
        for diagnosis in record.get("diagnosis", []):
            medical_entities.append({
                "id": diagnosis,
                "type": "Diagnosis",
                "name": diagnosis,
                "properties": {}
            })
            medical_relationships.append({
                "source": patient_id,
                "target": diagnosis,
                "type": "has_diagnosis",
                "properties": {"timestamp": record.get("visit_date", "")}
            })
        
        # Medications
        for medication in record.get("medications", []):
            medical_entities.append({
                "id": medication,
                "type": "Medication",
                "name": medication,
                "properties": {}
            })
            medical_relationships.append({
                "source": patient_id,
                "target": medication,
                "type": "prescribed",
                "properties": {"timestamp": record.get("visit_date", "")}
            })
        
        # Procedures
        for procedure in record.get("procedures", []):
            medical_entities.append({
                "id": procedure,
                "type": "Procedure",
                "name": procedure,
                "properties": {}
            })
            medical_relationships.append({
                "source": patient_id,
                "target": procedure,
                "type": "underwent",
                "properties": {"timestamp": record.get("visit_date", "")}
            })
        
        # Doctor
        doctor = record.get("doctor", "")
        if doctor:
            medical_entities.append({
                "id": doctor,
                "type": "Doctor",
                "name": doctor,
                "properties": {}
            })
            medical_relationships.append({
                "source": doctor,
                "target": patient_id,
                "type": "treats",
                "properties": {"timestamp": record.get("visit_date", "")}
            })

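# Extract from drug interactions
# Note: drug1/drug2 are bare drug names (e.g. "Lisinopril"), while the Medication entities
# above keep the dose (e.g. "Lisinopril 10mg"), so these IDs match only if names are normalized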
for interaction in drug_interactions:
    if isinstance(interaction, dict):
        drug1 = interaction.get("drug1", "")
        drug2 = interaction.get("drug2", "")
        interaction_type = interaction.get("interaction_type", "")
        
        if drug1 and drug2:
            medical_relationships.append({
                "source": drug1,
                "target": drug2,
                "type": "interacts_with",
                "properties": {
                    "interaction_type": interaction_type,
                    "description": interaction.get("description", "")
                }
            })

# Remove duplicates
seen_entities = set()
unique_entities = []
for entity in medical_entities:
    entity_key = (entity["id"], entity["type"])
    if entity_key not in seen_entities:
        seen_entities.add(entity_key)
        unique_entities.append(entity)

medical_entities = unique_entities

print(f"✓ Extracted {len(medical_entities)} medical entities")
print(f"  - Patients: {len([e for e in medical_entities if e['type'] == 'Patient'])}")
print(f"  - Diagnoses: {len([e for e in medical_entities if e['type'] == 'Diagnosis'])}")
print(f"  - Medications: {len([e for e in medical_entities if e['type'] == 'Medication'])}")
print(f"  - Procedures: {len([e for e in medical_entities if e['type'] == 'Procedure'])}")
print(f"  - Doctors: {len([e for e in medical_entities if e['type'] == 'Doctor'])}")
print(f"✓ Extracted {len(medical_relationships)} relationships")


## Step 5: Build Healthcare Knowledge Graph

Build a knowledge graph from the extracted medical entities and relationships, then store in triple store.
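
To make the data shapes concrete, the sketch below flattens relationship dictionaries into (subject, predicate, object) triples, checks that every edge endpoint has a matching entity, and computes a density value. It uses only plain Python. The helper names are hypothetical, and the density formula (edges divided by n(n-1) for a simple directed graph) is an assumption about what a graph analyzer might report, not a statement of how `GraphAnalyzer` computes it.

```python
def to_triples(relationships):
    """Flatten relationship dicts into (subject, predicate, object) tuples."""
    return [(rel["source"], rel["type"], rel["target"]) for rel in relationships]


def directed_density(num_nodes, num_edges):
    """Edges present divided by edges possible in a simple directed graph."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


entities = [
    {"id": "P001", "type": "Patient"},
    {"id": "Hypertension", "type": "Diagnosis"},
    {"id": "Lisinopril", "type": "Medication"},
    {"id": "Dr. Smith", "type": "Doctor"},
]
relationships = [
    {"source": "P001", "target": "Hypertension", "type": "has_diagnosis"},
    {"source": "P001", "target": "Lisinopril", "type": "prescribed"},
    {"source": "Dr. Smith", "target": "P001", "type": "treats"},
]

triples = to_triples(relationships)

# Endpoints that appear in an edge but have no entity record.
known_ids = {entity["id"] for entity in entities}
dangling = {node for s, _, o in triples for node in (s, o) if node not in known_ids}

density = directed_density(len(entities), len(triples))
print(triples[0])
print(dangling)
print(f"{density:.3f}")
```

With four nodes and three edges, the density is 3 / (4 × 3) = 0.250, and the dangling set is empty because every endpoint has an entity.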


In [None]:
builder = GraphBuilder()
graph_validator = GraphValidator()
entity_resolver = EntityResolver()
graph_analyzer = GraphAnalyzer()

# Build knowledge graph
medical_kg = builder.build(medical_entities, medical_relationships)

# Validate and resolve entities
validated_kg = graph_validator.validate(medical_kg)
resolved_kg = entity_resolver.resolve_entities(validated_kg)

# Analyze graph structure
metrics = graph_analyzer.compute_metrics(resolved_kg)

# Store in triple store
triple_store = TripleStore()
triple_manager = TripleManager()
query_engine = QueryEngine()

triple_store.add_knowledge_graph(resolved_kg)
triple_manager.manage_triples(resolved_kg)

print("✓ Built healthcare knowledge graph")
print(f"  Entities: {len(resolved_kg.get('entities', []))}")
print(f"  Relationships: {len(resolved_kg.get('relationships', []))}")
print(f"  Graph density: {metrics.get('density', 0):.3f}")
print("✓ Stored in triple store")


## Step 6: Query and Analyze Medical Data

Query the healthcare knowledge graph and analyze medical patterns.
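
As a plain-Python illustration of the kind of pattern such analysis looks for, the sketch below checks a patient's medication list against interaction records and returns the pairs at a given severity. The `find_contraindications` helper is hypothetical and independent of the inference engine; it assumes bare drug names without doses and matches them case-insensitively.

```python
from itertools import combinations


def find_contraindications(medications, interactions, severity="severe"):
    """Return interaction records whose two drugs both appear in `medications`."""
    # Index records of the requested severity by an unordered, lower-cased drug pair.
    index = {}
    for record in interactions:
        if record.get("interaction_type") == severity:
            pair = frozenset((record["drug1"].lower(), record["drug2"].lower()))
            index[pair] = record
    hits = []
    unique_meds = sorted({med.lower() for med in medications})
    for first, second in combinations(unique_meds, 2):
        record = index.get(frozenset((first, second)))
        if record is not None:
            hits.append(record)
    return hits


interactions = [
    {"drug1": "Lisinopril", "drug2": "Aspirin", "interaction_type": "moderate"},
    {"drug1": "Metformin", "drug2": "Alcohol", "interaction_type": "severe"},
]
medications = ["Lisinopril", "Metformin", "Aspirin"]

print(find_contraindications(medications, interactions))
print(find_contraindications(medications, interactions, severity="moderate"))
```

For this medication list no severe pair is present, because the severe record involves Alcohol, which the patient is not taking; the moderate Lisinopril and Aspirin pair is found when that severity is requested.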


In [None]:
# Query patient data
patient_query = query_engine.query(
    "SELECT ?patient ?diagnosis WHERE { ?patient has_diagnosis ?diagnosis }"
)

# Query drug interactions
interaction_query = query_engine.query(
    "SELECT ?drug1 ?drug2 WHERE { ?drug1 interacts_with ?drug2 }"
)

# Inference engine for medical rules
inference_engine = InferenceEngine()
rule_manager = RuleManager()
explanation_generator = ExplanationGenerator()

# Medical analysis rules
inference_engine.add_rule("IF has_diagnosis(Diabetes) AND prescribed(Metformin) THEN diabetes_treatment")
inference_engine.add_rule("IF interacts_with(Drug1, Drug2) AND interaction_type(severe) THEN contraindication")

# Add facts from medical data
for record in patient_records:
    if isinstance(record, dict):
        for diagnosis in record.get("diagnosis", []):
            inference_engine.add_fact({
                "patient": record.get("patient_id", ""),
                "diagnosis": diagnosis
            })
        for medication in record.get("medications", []):
            inference_engine.add_fact({
                "patient": record.get("patient_id", ""),
                "medication": medication
            })

for interaction in drug_interactions:
    if isinstance(interaction, dict):
        inference_engine.add_fact({
            "drug1": interaction.get("drug1", ""),
            "drug2": interaction.get("drug2", ""),
            "interaction_type": interaction.get("interaction_type", "")
        })

# Generate medical insights
medical_insights = inference_engine.forward_chain()

print("✓ Queried healthcare knowledge graph")
print(f"  Patient-diagnosis relationships: {len(patient_query.get('results', []))}")
print(f"  Drug interactions: {len(interaction_query.get('results', []))}")
print(f"  Medical insights: {len(medical_insights)}")

# Quality assessment
quality_assessor = KGQualityAssessor()
validation_engine = ValidationEngine()

quality_metrics = quality_assessor.assess_quality(resolved_kg)
validation_results = validation_engine.validate(resolved_kg)

print("✓ Quality assessment completed")
print(f"  Quality score: {quality_metrics.get('overall_score', 0):.2f}")


## Step 7: Export and Visualize

Export the healthcare knowledge graph and generate visualizations.
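
When exported files contain patient data, it may be preferable to write them into a temporary directory that is removed automatically. The sketch below uses only the standard library to write a small graph to JSON, read it back to confirm the round trip, and let `tempfile.TemporaryDirectory` clean up on exit. The `graph` dictionary is a stand-in for illustration and does not reflect the exporter's actual file layout.

```python
import json
import tempfile
from pathlib import Path

# Stand-in graph for illustration; the real exporter output may differ.
graph = {
    "entities": [
        {"id": "P001", "type": "Patient"},
        {"id": "Hypertension", "type": "Diagnosis"},
    ],
    "relationships": [
        {"source": "P001", "target": "Hypertension", "type": "has_diagnosis"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "medical_kg.json"
    path.write_text(json.dumps(graph, indent=2), encoding="utf-8")
    existed = path.exists()
    # Read the file back to confirm nothing was lost in serialization.
    restored = json.loads(path.read_text(encoding="utf-8"))

print(existed)
print(restored == graph)
print(path.exists())  # the directory and file are gone after the with-block
```

By contrast, a directory created with `tempfile.mkdtemp()` stays on disk until it is deleted explicitly, for example with `shutil.rmtree`.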


In [None]:
import tempfile
import os

temp_dir = tempfile.mkdtemp()

json_exporter = JSONExporter()
rdf_exporter = RDFExporter()
owl_exporter = OWLExporter()
report_generator = ReportGenerator()

# Export knowledge graph
json_exporter.export_knowledge_graph(resolved_kg, os.path.join(temp_dir, "medical_kg.json"))
rdf_exporter.export_knowledge_graph(resolved_kg, os.path.join(temp_dir, "medical_kg.rdf"))
owl_exporter.export_knowledge_graph(resolved_kg, os.path.join(temp_dir, "medical_kg.owl"))

# Generate report
report_data = {
    "summary": f"Medical database integration from MCP server identified {len(medical_insights)} insights",
    "patients": len([e for e in medical_entities if e['type'] == 'Patient']),
    "diagnoses": len([e for e in medical_entities if e['type'] == 'Diagnosis']),
    "medications": len([e for e in medical_entities if e['type'] == 'Medication']),
    "drug_interactions": len(drug_interactions),
    "insights": len(medical_insights)
}

report = report_generator.generate_report(report_data, format="markdown")

print("✓ Exported healthcare knowledge graph")
print(f"  JSON: {os.path.join(temp_dir, 'medical_kg.json')}")
print(f"  RDF: {os.path.join(temp_dir, 'medical_kg.rdf')}")
print(f"  OWL: {os.path.join(temp_dir, 'medical_kg.owl')}")
print(f"✓ Generated report ({len(report)} characters)")

# Visualize
kg_visualizer = KGVisualizer()
ontology_visualizer = OntologyVisualizer()
temporal_visualizer = TemporalVisualizer()

kg_viz = kg_visualizer.visualize_network(resolved_kg, output="interactive")
ontology_viz = ontology_visualizer.visualize_ontology(resolved_kg, output="interactive")
temporal_viz = temporal_visualizer.visualize_timeline(resolved_kg, output="interactive")

print("✓ Generated visualizations for healthcare knowledge graph")

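# Cleanup: Disconnect from MCP server
try:
    mcp_ingestor.disconnect("medical_server")
    print("\n✓ Disconnected from MCP server")
except Exception as e:
    # Report the failure instead of silently swallowing every exception
    print(f"\n⚠ Disconnect skipped: {e}")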

print("\n✅ Pipeline complete: MCP Server → Ingest → Parse → Extract → Build KG → Query → Export → Visualize")
print("📊 Total modules used: 20+")
