**Notice**: The `semantica.kg_qa` module is temporarily unavailable and will be reintroduced in a future release. Any quality assessment examples in this notebook are disabled.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/healthcare/01_Clinical_Reports_Processing.ipynb)

# Clinical Reports Processing Pipeline

## Overview

This notebook demonstrates a complete clinical reports processing pipeline: ingest clinical documents from multiple sources (EHR systems, HL7/FHIR APIs, medical databases), extract medical entities, build a knowledge graph, store it in a triple store, and query patient data.


**Documentation**: [API Reference](https://semantica.readthedocs.io/use-cases/)

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install semantica[all]
```

### Modules Used (20+)

- **Ingestion**: FileIngestor, WebIngestor, FeedIngestor, StreamIngestor, DBIngestor, EmailIngestor, RepoIngestor, MCPIngestor
- **Parsing**: DocumentParser, PDFParser, StructuredDataParser, CSVParser, MCPParser
- **Extraction**: NERExtractor, RelationExtractor, CoreferenceResolver, TripleExtractor
- **KG**: GraphBuilder, GraphValidator, EntityResolver, GraphAnalyzer
- **Triplet Store**: TripletStore, TripletManager, QueryEngine
- **Reasoning**: InferenceEngine, RuleManager, ExplanationGenerator
- **Quality**: KGQualityAssessor, ValidationEngine (part of `semantica.kg_qa`, which is temporarily unavailable; see the notice at the top)
- **Export**: JSONExporter, RDFExporter, OWLExporter, ReportGenerator
- **Visualization**: KGVisualizer, OntologyVisualizer, TemporalVisualizer

### Pipeline

**Clinical Documents (Files, APIs, DB, MCP) → Parse → Extract Medical Entities → Build Medical KG → Store in Triple Store → Query Patient Data → Generate Reports → Visualize**

---

## Step 1: Ingest Clinical Documents from Multiple Sources

Ingest clinical documents from EHR systems, HL7/FHIR APIs, and medical databases.
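
FHIR search endpoints such as the `Patient` URLs used in the cell below typically respond with a `Bundle` resource whose `entry` array wraps the individual resources. The following is a minimal standard-library sketch of pulling patient summaries out of such a response. The helper name `patients_from_bundle` and the inline sample payload are illustrative assumptions and are not part of Semantica.

```python
import json


def patients_from_bundle(bundle: dict) -> list:
    """Return id, name and birth date for each Patient resource in a FHIR Bundle."""
    patients = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        # Search bundles can mix resource types; keep only Patient resources.
        if resource.get("resourceType") != "Patient":
            continue
        names = resource.get("name") or [{}]
        first = names[0]
        full_name = " ".join(first.get("given", []) + [first.get("family", "")]).strip()
        patients.append({
            "id": resource.get("id", ""),
            "name": full_name,
            "birth_date": resource.get("birthDate", ""),
        })
    return patients


# Hypothetical payload shaped like a FHIR R4 search response.
sample_bundle = json.loads("""
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "P001",
                  "name": [{"family": "Doe", "given": ["Jane"]}],
                  "birthDate": "1970-01-01"}},
    {"resource": {"resourceType": "Observation", "id": "obs-1"}}
  ]
}
""")

fhir_patients = patients_from_bundle(sample_bundle)
print(fhir_patients)
```

Filtering on `resourceType` keeps the helper safe to run on mixed bundles, such as a search that also returns `Observation` entries.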


In [None]:
from semantica.ingest import FileIngestor, DBIngestor, StreamIngestor, WebIngestor, MCPIngestor, ingest_mcp
from semantica.parse import DocumentParser, PDFParser, StructuredDataParser, CSVParser, MCPParser
from semantica.semantic_extract import NERExtractor, RelationExtractor, CoreferenceResolver, TripleExtractor
from semantica.kg import GraphBuilder, GraphValidator, EntityResolver, GraphAnalyzer
from semantica.triple_store import TripleStore, TripleManager, QueryEngine
from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.export import JSONExporter, RDFExporter, OWLExporter, ReportGenerator
from semantica.visualization import KGVisualizer, OntologyVisualizer, TemporalVisualizer
import tempfile
import os
import json
from datetime import datetime, timedelta

file_ingestor = FileIngestor()
db_ingestor = DBIngestor()
stream_ingestor = StreamIngestor()
web_ingestor = WebIngestor()
mcp_ingestor = MCPIngestor()

document_parser = DocumentParser()
pdf_parser = PDFParser()
structured_parser = StructuredDataParser()
csv_parser = CSVParser()
mcp_parser = MCPParser()

# Real healthcare data sources
healthcare_apis = [
    "https://api.logicahealth.org/fhir/R4/Patient",  # Logica Health FHIR API
    "https://hapi.fhir.org/baseR4/Patient",  # HAPI FHIR Server
    "https://api.logicahealth.org/fhir/R4/Observation"  # FHIR Observations
]

medical_feeds = [
    "https://www.cdc.gov/rss.xml",  # CDC Health Alerts
    "https://www.who.int/rss-feeds/news-english.xml"  # WHO News
]

# Example database connection for clinical records (placeholder credentials; replace with your own and never commit real ones)
db_connection_string = "postgresql://user:password@localhost:5432/clinical_records_db"
db_query = "SELECT patient_id, visit_date, diagnosis, medication, procedure, doctor FROM clinical_visits WHERE visit_date > CURRENT_DATE - INTERVAL '1 year' ORDER BY visit_date DESC"

temp_dir = tempfile.mkdtemp()

# Sample clinical report data
clinical_report_file = os.path.join(temp_dir, "clinical_report.json")
clinical_data = {
    "patient_id": "P001",
    "visit_date": (datetime.now() - timedelta(days=30)).isoformat(),
    "diagnosis": ["Hypertension", "Type 2 Diabetes"],
    "medications": ["Lisinopril 10mg", "Metformin 500mg"],
    "procedures": ["Blood Pressure Check", "HbA1c Test"],
    "doctor": "Dr. Smith",
    "notes": "Patient shows improvement in blood pressure control. Continue current medications."
}

with open(clinical_report_file, 'w') as f:
    json.dump(clinical_data, f, indent=2)

file_objects = file_ingestor.ingest_file(clinical_report_file, read_content=True)
parsed_data = structured_parser.parse_data(clinical_report_file, data_format="json")

# Ingest from FHIR APIs
fhir_content_list = []
for api_url in healthcare_apis[:1]:
    api_content = web_ingestor.ingest_url(api_url)
    if api_content:
        fhir_content_list.append(api_content)
        print(f"  Ingested FHIR API: {api_url}")

# Optional: Ingest from MCP server
# Users can bring their own medical database MCP server via URL
mcp_clinical_data = []
# Connect to medical database MCP server via URL
# Example: http://localhost:8000/mcp or https://api.example.com/medical-mcp
medical_mcp_url = "http://localhost:8000/mcp"  # Replace with your MCP server URL

mcp_ingestor.connect(
    "clinical_mcp_server",
    url=medical_mcp_url,
    headers={
        "Authorization": "Bearer your_token",
        "X-API-Key": "your_api_key"
    } if "api.example.com" in medical_mcp_url else {}
)

# Ingest patient records from MCP server
mcp_data = mcp_ingestor.ingest_resources(
    "clinical_mcp_server",
    resource_uris=["resource://patients/records"]
)
mcp_clinical_data.extend(mcp_data)
print(f"  Ingested MCP resources: {len(mcp_data)}")

# Or use tool-based ingestion to query patient records
tool_data = mcp_ingestor.ingest_tool_output(
    "clinical_mcp_server",
    tool_name="query_patient_records",
    arguments={
        "patient_id": "P001",
        "date_range": {
            "start": (datetime.now() - timedelta(days=365)).isoformat(),
            "end": datetime.now().isoformat()
        }
    }
)
if tool_data:
    mcp_clinical_data.append(tool_data)
    print(f"  Retrieved tool data")

# Parse MCP responses and merge with existing clinical data
for mcp_item in mcp_clinical_data:
    parsed_mcp = mcp_parser.parse_response(mcp_item, response_type="json")
    if isinstance(parsed_mcp, dict):
        if "patient_records" in parsed_mcp:
            # Merge patient records from MCP
            if isinstance(parsed_data.get("data"), list):
                parsed_data["data"].extend(parsed_mcp.get("patient_records", []))
            elif isinstance(parsed_data.get("data"), dict):
                parsed_data["data"] = [parsed_data.get("data")] + parsed_mcp.get("patient_records", [])
            print(f"  Parsed MCP item")

mcp_ingestor.disconnect("clinical_mcp_server")
print(f"  Disconnected from MCP server")

print(f"\n📊 Ingestion Summary:")
print(f"  Clinical reports: {len([file_objects]) if file_objects else 0}")
print(f"  FHIR API sources: {len(fhir_content_list)}")
print(f"  Database sources: 0 (connection string defined, but no query is executed in this cell)")
print(f"  MCP server sources: {len(mcp_clinical_data)}")

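The cell above hard-codes database credentials in `db_connection_string`. One common alternative is to read the connection URL from an environment variable and to redact the password before logging it. The sketch below shows that approach. The variable name `CLINICAL_DB_URL` and the helpers `get_db_url` and `redact` are assumptions made for illustration.

```python
import os
from urllib.parse import urlsplit


def get_db_url(env_var: str = "CLINICAL_DB_URL") -> str:
    """Read the connection URL from the environment, failing loudly if it is missing."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"Set {env_var} to a PostgreSQL connection URL")
    return url


def redact(url: str) -> str:
    """Hide the password so the URL can be printed or logged."""
    parts = urlsplit(url)
    if parts.password:
        return url.replace(f":{parts.password}@", ":***@", 1)
    return url


# Demo only: seed the variable so this snippet runs standalone.
os.environ.setdefault(
    "CLINICAL_DB_URL",
    "postgresql://user:password@localhost:5432/clinical_records_db",
)
db_url = get_db_url()
print(redact(db_url))
```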

## Step 2: Extract Medical Entities

Extract medical entities (conditions, medications, procedures, doctors) from clinical reports.
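
The extraction cell below applies the same pattern to each list-valued field: one entity per value, plus a relationship linking it to the patient. A sketch of that pattern as a single helper follows. The function name `link_to_patient` and the `FIELD_MAP` table are illustrative, with field and relation names taken from the cell.

```python
def link_to_patient(record, field, entity_type, relation):
    """Build (entities, relationships) for one list-valued field of a clinical record."""
    patient_id = record.get("patient_id", "")
    timestamp = record.get("visit_date", "")
    entities, relationships = [], []
    for value in record.get(field, []):
        entities.append({"id": value, "type": entity_type, "name": value, "properties": {}})
        relationships.append({
            "source": patient_id,
            "target": value,
            "type": relation,
            "properties": {"timestamp": timestamp},
        })
    return entities, relationships


# (record field, entity type, relationship type)
FIELD_MAP = [
    ("diagnosis", "Diagnosis", "has_diagnosis"),
    ("medications", "Medication", "prescribed"),
    ("procedures", "Procedure", "underwent"),
]

sample_record = {
    "patient_id": "P001",
    "visit_date": "2024-01-15T09:30:00",
    "diagnosis": ["Hypertension", "Type 2 Diabetes"],
    "medications": ["Lisinopril 10mg", "Metformin 500mg"],
    "procedures": ["Blood Pressure Check", "HbA1c Test"],
}

all_entities, all_relationships = [], []
for field, entity_type, relation in FIELD_MAP:
    ents, rels = link_to_patient(sample_record, field, entity_type, relation)
    all_entities.extend(ents)
    all_relationships.extend(rels)

print(len(all_entities), len(all_relationships))
```

The doctor entity is left out of the table because its relationship points the other way (doctor to patient), so it does not fit the same helper without an extra direction flag.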


In [None]:
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()
coreference_resolver = CoreferenceResolver()
triple_extractor = TripleExtractor()

medical_entities = []
relationships = []

# Extract from clinical data
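# Step 1 reads parsed_data like a dict, so accept either a dict or an object exposing .data
parsed_payload = parsed_data.get("data") if isinstance(parsed_data, dict) else getattr(parsed_data, "data", None)
if parsed_payload:
    clinical = parsed_payload if isinstance(parsed_payload, dict) else parsed_payload[0] if isinstance(parsed_payload, list) else {}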
    
    if isinstance(clinical, dict):
        patient_id = clinical.get("patient_id", "")
        
        medical_entities.append({
            "id": patient_id,
            "type": "Patient",
            "name": patient_id,
            "properties": {}
        })
        
        # Diagnoses
        for diagnosis in clinical.get("diagnosis", []):
            medical_entities.append({
                "id": diagnosis,
                "type": "Diagnosis",
                "name": diagnosis,
                "properties": {}
            })
            relationships.append({
                "source": patient_id,
                "target": diagnosis,
                "type": "has_diagnosis",
                "properties": {"timestamp": clinical.get("visit_date", "")}
            })
        
        # Medications
        for medication in clinical.get("medications", []):
            medical_entities.append({
                "id": medication,
                "type": "Medication",
                "name": medication,
                "properties": {}
            })
            relationships.append({
                "source": patient_id,
                "target": medication,
                "type": "prescribed",
                "properties": {"timestamp": clinical.get("visit_date", "")}
            })
        
        # Procedures
        for procedure in clinical.get("procedures", []):
            medical_entities.append({
                "id": procedure,
                "type": "Procedure",
                "name": procedure,
                "properties": {}
            })
            relationships.append({
                "source": patient_id,
                "target": procedure,
                "type": "underwent",
                "properties": {"timestamp": clinical.get("visit_date", "")}
            })
        
        # Doctor
        doctor = clinical.get("doctor", "")
        if doctor:
            medical_entities.append({
                "id": doctor,
                "type": "Doctor",
                "name": doctor,
                "properties": {}
            })
            relationships.append({
                "source": doctor,
                "target": patient_id,
                "type": "treats",
                "properties": {"timestamp": clinical.get("visit_date", "")}
            })

print(f"Extracted {len(medical_entities)} medical entities")
print(f"Extracted {len(relationships)} relationships")


## Step 3: Build Medical Knowledge Graph

Build knowledge graph from medical entities and relationships.
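
The cell below reports a graph density. As a rough guide to what that number means, density for a directed graph without self-loops is commonly defined as m / (n(n - 1)), where n is the node count and m the edge count. The sketch computes it in plain Python for a graph shaped like the sample record. Whether `GraphAnalyzer.compute_metrics` uses this exact formula is an assumption.

```python
def directed_density(num_nodes: int, num_edges: int) -> float:
    """Fraction of possible directed edges (no self-loops) that are present."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# Sample record: 1 patient, 2 diagnoses, 2 medications, 2 procedures, 1 doctor
num_nodes = 1 + 2 + 2 + 2 + 1
# One edge per diagnosis, medication and procedure, plus doctor -> patient
num_edges = 2 + 2 + 2 + 1

density = directed_density(num_nodes, num_edges)
print(f"{density:.3f}")
```

A low value is expected here: the sample graph is star-shaped around the patient, so most node pairs are not directly connected.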


In [None]:
builder = GraphBuilder()
entity_resolver = EntityResolver()
graph_validator = GraphValidator()
graph_analyzer = GraphAnalyzer()

resolved_entities = entity_resolver.resolve(medical_entities)

medical_kg = builder.build(resolved_entities, relationships)

validation_result = graph_validator.validate(medical_kg)
metrics = graph_analyzer.compute_metrics(medical_kg)

print(f"Built medical knowledge graph")
print(f"  Entities: {len(medical_kg.get('entities', []))}")
print(f"  Relationships: {len(medical_kg.get('relationships', []))}")
print(f"  Graph valid: {validation_result.get('valid', False)}")
print(f"  Graph density: {metrics.get('density', 0):.3f}")


## Step 4: Store in Triple Store and Query Patient Data

Store knowledge graph in triple store and query patient information.
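
Conceptually, storing the graph in a triple store means turning each relationship into a (subject, predicate, object) triple, and the query in the cell below joins two triple patterns on the shared `?patient` variable. The plain-Python sketch that follows mimics that join on a small in-memory list. It is not how Semantica's `QueryEngine` is implemented, and the helper names are illustrative.

```python
from collections import defaultdict

sample_relationships = [
    {"source": "P001", "target": "Hypertension", "type": "has_diagnosis"},
    {"source": "P001", "target": "Type 2 Diabetes", "type": "has_diagnosis"},
    {"source": "P001", "target": "Lisinopril 10mg", "type": "prescribed"},
    {"source": "P001", "target": "Metformin 500mg", "type": "prescribed"},
    {"source": "Dr. Smith", "target": "P001", "type": "treats"},
]

# Each relationship becomes one (subject, predicate, object) triple.
triples = [(r["source"], r["type"], r["target"]) for r in sample_relationships]


def objects_by_subject(triples, predicate):
    """Index the objects of one predicate by their subject."""
    index = defaultdict(list)
    for s, p, o in triples:
        if p == predicate:
            index[s].append(o)
    return index


def diagnosis_medication_pairs(triples):
    """Join the two patterns on the patient, like the two-pattern query."""
    diagnoses = objects_by_subject(triples, "has_diagnosis")
    medications = objects_by_subject(triples, "prescribed")
    rows = []
    for patient, patient_diagnoses in diagnoses.items():
        for diagnosis in patient_diagnoses:
            for medication in medications.get(patient, []):
                rows.append({"patient": patient, "diagnosis": diagnosis, "medication": medication})
    return rows


rows = diagnosis_medication_pairs(triples)
for row in rows:
    print(row)
```

Because the two patterns share only the patient variable, the result is every diagnosis paired with every medication for that patient, not a clinically meaningful diagnosis-to-drug mapping.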


In [None]:
triple_store = TripleStore()
triple_manager = TripleManager()
query_engine = QueryEngine()
inference_engine = InferenceEngine()
rule_manager = RuleManager()
explanation_generator = ExplanationGenerator()

triple_store.store_knowledge_graph(medical_kg)

# Query patient data
patient_id = "P001"
patient_query = "SELECT * WHERE { ?patient :hasDiagnosis ?diagnosis . ?patient :prescribed ?medication }"

query_results = query_engine.query(patient_query, knowledge_graph=medical_kg)

# Medical inference rules
inference_engine.add_rule("IF patient has_diagnosis Hypertension AND patient prescribed Lisinopril THEN treatment_appropriate")
inference_engine.add_rule("IF patient has_diagnosis Diabetes AND patient prescribed Metformin THEN treatment_appropriate")

for relationship in relationships:
    if relationship.get("type") == "has_diagnosis":
        inference_engine.add_fact({
            "patient": relationship.get("source"),
            "diagnosis": relationship.get("target")
        })
    if relationship.get("type") == "prescribed":
        inference_engine.add_fact({
            "patient": relationship.get("source"),
            "medication": relationship.get("target")
        })

medical_insights = inference_engine.forward_chain()

print(f"Stored medical knowledge graph in triple store")
print(f"Query returned {len(query_results) if query_results else 0} results")
print(f"Generated {len(medical_insights)} medical insights")

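Forward chaining, as invoked by `forward_chain()` above, repeatedly applies rules whose premises are all satisfied by known facts until no new facts appear. A minimal sketch of that loop follows. The tuple-based facts and the ground (variable-free) rule representation are assumptions chosen for brevity, and they differ from the string rules passed to `InferenceEngine`. The third rule and its `on_guideline_therapy` label are hypothetical, included only to show one derived fact triggering another.

```python
def forward_chain(facts, rules):
    """Return the facts derived from `facts` by applying `rules` to a fixpoint.

    facts: set of tuples; rules: list of (premises, conclusion) pairs.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known - set(facts)


sample_facts = {
    ("P001", "has_diagnosis", "Hypertension"),
    ("P001", "prescribed", "Lisinopril"),
    ("P001", "has_diagnosis", "Diabetes"),
}

sample_rules = [
    ([("P001", "has_diagnosis", "Hypertension"), ("P001", "prescribed", "Lisinopril")],
     ("P001", "treatment_appropriate", "Hypertension")),
    ([("P001", "has_diagnosis", "Diabetes"), ("P001", "prescribed", "Metformin")],
     ("P001", "treatment_appropriate", "Diabetes")),
    # Hypothetical follow-on rule: fires only after the first rule has fired.
    ([("P001", "treatment_appropriate", "Hypertension")],
     ("P001", "flag", "on_guideline_therapy")),
]

derived = forward_chain(sample_facts, sample_rules)
for fact in sorted(derived):
    print(fact)
```

The diabetes rule does not fire in this sample because no Metformin prescription is among the facts, which illustrates that premises must match exactly.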

## Step 5: Generate Reports and Visualize

Generate clinical reports and visualize results.
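
The report step below passes a dictionary of counts to `ReportGenerator`. To illustrate the kind of output a Markdown report might contain, here is a standard-library sketch that renders such a dictionary as a two-column table. `render_markdown_report` is a hypothetical helper, and the actual layout produced by `generate_report` may differ.

```python
def render_markdown_report(title, data):
    """Render a flat dict as a Markdown heading plus a Metric/Value table."""
    lines = [f"# {title}", "", "| Metric | Value |", "| --- | --- |"]
    for key, value in data.items():
        label = key.replace("_", " ").capitalize()
        lines.append(f"| {label} | {value} |")
    return "\n".join(lines)


sample_report = {"patients_processed": 1, "diagnoses": 2, "medications": 2}
markdown = render_markdown_report("Clinical Report", sample_report)
print(markdown)
```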


In [None]:
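# KGQualityAssessor belongs to semantica.kg_qa, which is temporarily unavailable
# (see the notice at the top), so quality assessment is disabled here.
# quality_assessor = KGQualityAssessor()
json_exporter = JSONExporter()
rdf_exporter = RDFExporter()
owl_exporter = OWLExporter()
report_generator = ReportGenerator()

# quality_score = quality_assessor.assess_overall_quality(medical_kg)
quality_score = {}  # Placeholder so the report below still builds; overall_score falls back to 0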

json_exporter.export_knowledge_graph(medical_kg, os.path.join(temp_dir, "clinical_kg.json"))
rdf_exporter.export_knowledge_graph(medical_kg, os.path.join(temp_dir, "clinical_kg.rdf"))

report_data = {
    "summary": f"Clinical reports processing identified {len(medical_entities)} entities and {len(medical_insights)} insights",
    "patients_processed": len([e for e in medical_entities if e.get("type") == "Patient"]),
    "diagnoses": len([e for e in medical_entities if e.get("type") == "Diagnosis"]),
    "medications": len([e for e in medical_entities if e.get("type") == "Medication"]),
    "quality_score": quality_score.get('overall_score', 0)
}

report = report_generator.generate_report(report_data, format="markdown")

kg_visualizer = KGVisualizer()
ontology_visualizer = OntologyVisualizer()
temporal_visualizer = TemporalVisualizer()

kg_viz = kg_visualizer.visualize_network(medical_kg, output="interactive")
temporal_viz = temporal_visualizer.visualize_timeline(medical_kg, output="interactive")

print(f"Total modules used: 20+")
print(f"Pipeline complete: Clinical Documents → Parse → Extract → Build KG → Triple Store → Query → Reports → Visualize")
