# Medical Ontology MCP - Research Workflow

This notebook demonstrates a complete research workflow using the Medical Ontology MCP server for clinical coding and terminology analysis.

## Overview

We'll cover:
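1. Setting up the Medical Ontology MCP
2. Searching for medical terms and looking up specific codes
3. Batch processing clinical conditions
4. Analyzing and visualizing search results
5. Assigning ICD-10 codes to clinical diagnoses
6. Analyzing medications with RxNorm
7. Exporting results for statistical analysis
8. Summarizing the workflow and next steps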

## Prerequisites

- Medical ontology data (SNOMED CT, ICD-10, RxNorm, LOINC)
- Python packages: `pip install "medical-ontology-mcp[jupyter,research]"` (the quotes keep shells such as zsh from expanding the brackets)

## 1. Setup and Configuration

In [None]:
# Load the Medical Ontology MCP magic commands
%load_ext medical_ontology_mcp.jupyter_magic

# Configure data path (adjust to your data location)
%medical_config data_path=../data
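
Before pointing `%medical_config` at a directory, it can help to confirm that the ontology data is actually there. The sketch below assumes a hypothetical layout with one subdirectory per ontology. The folder names are illustrative and are not taken from the package documentation, so adjust them to match your data.

```python
from pathlib import Path

# Hypothetical layout: one subdirectory per ontology under the data path.
# These folder names are assumptions for illustration, not a documented layout.
EXPECTED_SUBDIRS = ("snomed", "icd10", "rxnorm", "loinc")

def missing_ontology_dirs(data_path, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that do not exist under data_path."""
    root = Path(data_path)
    return [name for name in expected if not (root / name).is_dir()]

missing = missing_ontology_dirs("../data")
if missing:
    print(f"Missing ontology folders: {', '.join(missing)}")
else:
    print("All expected ontology folders are present")
```

An empty list means every expected folder was found. The check says nothing about whether the files inside are complete or licensed correctly.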

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from medical_ontology_mcp.client import MedicalOntologyClient

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")

print("✅ Libraries imported successfully")

## 2. Basic Medical Term Search

In [None]:
# Search for diabetes-related terms
%medical_search diabetes mellitus

In [None]:
# Search in specific ontology
%medical_search --ontology ICD10 --limit 10 hypertension

In [None]:
# Look up specific codes
%medical_lookup ICD10 E11.9

In [None]:
%medical_lookup RxNorm 6809
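
Codes copied from spreadsheets often arrive in lowercase or without the dot, which can make a lookup fail for formatting reasons alone. The helper below is a minimal sketch that normalizes a code and checks it against the approximate ICD-10-CM shape. The function names are hypothetical, and a code that passes the shape check may still not exist in the ontology.

```python
import re

# Approximate shape of an ICD-10-CM code: a letter, a digit, one alphanumeric
# character, then an optional dot followed by 1-4 alphanumeric characters.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

def normalize_icd10(raw):
    """Uppercase a code and insert the dot after the third character if absent."""
    code = raw.strip().upper()
    if "." not in code and len(code) > 3:
        code = f"{code[:3]}.{code[3:]}"
    return code

def looks_like_icd10(raw):
    """True when the normalized code has a plausible ICD-10-CM shape."""
    return bool(ICD10_SHAPE.match(normalize_icd10(raw)))

for candidate in ["E11.9", "e119", "I10", "6809"]:
    print(candidate, "->", normalize_icd10(candidate), looks_like_icd10(candidate))
```

The RxNorm identifier `6809` fails the check because it does not start with a letter, which is a cheap way to catch a code sent to the wrong ontology.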

## 3. Batch Processing Clinical Conditions

Process multiple medical conditions at once for research analysis.

In [None]:
%%medical_batch --limit 3
diabetes mellitus
hypertension
chronic kidney disease
coronary artery disease
chronic obstructive pulmonary disease
heart failure
atrial fibrillation
depression
osteoarthritis
asthma

In [None]:
# Analyze the batch results
# Work on a copy, since a later %%medical_batch cell reuses _medical_batch_df
df = _medical_batch_df.copy()
print("📊 Batch Processing Summary:")
print(f"   Total queries: {df['query'].nunique()}")
print(f"   Total results: {len(df)}")
print(f"   Ontologies covered: {', '.join(df['ontology'].unique())}")

# Display results by ontology
ontology_counts = df.groupby('ontology').size()
print(f"\n📋 Results by Ontology:")
for ontology, count in ontology_counts.items():
    print(f"   {ontology}: {count} results")
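
Counts per ontology do not show which conditions are missing from which ontology. The sketch below lists those gaps using only the `query` and `ontology` columns referenced above. The rows are synthetic placeholders rather than real search output, and `coverage_gaps` is a hypothetical helper name.

```python
from collections import defaultdict

# Synthetic rows shaped like the columns used above (query, ontology, code).
# The codes are placeholders for illustration, not real search output.
rows = [
    {"query": "asthma", "ontology": "ICD10", "code": "X1"},
    {"query": "asthma", "ontology": "SNOMED", "code": "X2"},
    {"query": "depression", "ontology": "ICD10", "code": "X3"},
]

def coverage_gaps(rows, ontologies):
    """Map each query to the ontologies that returned no result for it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["query"]].add(row["ontology"])
    return {
        query: [o for o in ontologies if o not in found]
        for query, found in seen.items()
        if any(o not in found for o in ontologies)
    }

print(coverage_gaps(rows, ["ICD10", "SNOMED"]))
```

With the batch results, the same function could be called as `coverage_gaps(df.to_dict('records'), df['ontology'].unique())`. Queries that returned nothing in any ontology never appear in the DataFrame, so they have to be checked against the original query list.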

## 4. Research Analysis and Visualization

In [None]:
# Create visualization of search results
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Results by ontology
ontology_counts.plot(kind='bar', ax=axes[0], color='skyblue')
axes[0].set_title('Medical Codes Found by Ontology')
axes[0].set_ylabel('Number of Codes')
axes[0].tick_params(axis='x', rotation=45)

# Results by condition
condition_counts = df.groupby('query').size().sort_values(ascending=False)
condition_counts.head(8).plot(kind='barh', ax=axes[1], color='lightcoral')
axes[1].set_title('Medical Codes Found by Condition')
axes[1].set_xlabel('Number of Codes')

plt.tight_layout()
plt.show()

print("📈 Visualization complete")

## 5. Clinical Data Processing Example

This example builds a small simulated dataset of clinical notes, then assigns an ICD-10 code to each patient's primary and secondary diagnosis.

In [None]:
# Simulate clinical notes data
clinical_notes = [
    {
        'patient_id': 'P001',
        'note': 'Patient presents with type 2 diabetes mellitus and hypertension. Blood pressure elevated.',
        'primary_diagnosis': 'diabetes mellitus type 2',
        'secondary_diagnosis': 'hypertension'
    },
    {
        'patient_id': 'P002',
        'note': 'Acute myocardial infarction with subsequent heart failure. Patient stable.',
        'primary_diagnosis': 'myocardial infarction',
        'secondary_diagnosis': 'heart failure'
    },
    {
        'patient_id': 'P003',
        'note': 'Chronic kidney disease stage 4 with anemia. Requires dialysis planning.',
        'primary_diagnosis': 'chronic kidney disease',
        'secondary_diagnosis': 'anemia'
    },
    {
        'patient_id': 'P004',
        'note': 'Pneumonia with COPD exacerbation. Started on antibiotics and bronchodilators.',
        'primary_diagnosis': 'pneumonia',
        'secondary_diagnosis': 'COPD exacerbation'
    }
]

clinical_df = pd.DataFrame(clinical_notes)
print("📋 Clinical Notes Dataset:")
display(clinical_df)

In [None]:
# Look up an ICD-10 code for each primary and secondary diagnosis
async def process_clinical_data(clinical_df):
    """Return (primary_codes, secondary_codes) for the rows of clinical_df."""
    client = MedicalOntologyClient(data_path='../data')
    await client.initialize()

    async def first_icd10_code(term):
        # Take the top-ranked ICD-10 match, or a sentinel when nothing matches
        results = await client.search(term, ['ICD10'], limit=1)
        if results['ICD10']:
            return results['ICD10'][0]['code']
        return 'Not found'

    primary_codes = []
    secondary_codes = []
    try:
        for _, row in clinical_df.iterrows():
            primary_codes.append(await first_icd10_code(row['primary_diagnosis']))
            secondary_codes.append(await first_icd10_code(row['secondary_diagnosis']))
    finally:
        # Release the client even if a search raises
        await client.close()

    return primary_codes, secondary_codes

# Jupyter supports top-level await, so no explicit event loop is needed
primary_codes, secondary_codes = await process_clinical_data(clinical_df)

# Add codes to dataframe
clinical_df['primary_icd10'] = primary_codes
clinical_df['secondary_icd10'] = secondary_codes

print("✅ Clinical data processed with ICD-10 codes:")
display(clinical_df[['patient_id', 'primary_diagnosis', 'primary_icd10', 'secondary_diagnosis', 'secondary_icd10']])
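
The loop above awaits one search at a time and repeats the search whenever a diagnosis recurs. On larger datasets, one option is to search each distinct term once and run the searches concurrently. The sketch below shows that pattern against `StubClient`, a hypothetical stand-in that only mimics the shape of the `search` call used above. Whether the real client tolerates concurrent searches is an assumption to verify first.

```python
import asyncio

class StubClient:
    """Hypothetical stand-in that mimics the search() call shape used above."""

    # Placeholder lookup table for illustration only
    _TABLE = {"hypertension": "I10", "heart failure": "I50.9"}

    def __init__(self):
        self.calls = 0

    async def search(self, term, ontologies, limit=1):
        self.calls += 1
        await asyncio.sleep(0)  # yield control, as real I/O would
        code = self._TABLE.get(term)
        hits = [{"code": code}] if code else []
        return {name: hits[:limit] for name in ontologies}

async def code_terms(client, terms):
    """Search each distinct term once, concurrently, and return term -> code."""
    unique_terms = sorted(set(terms))
    results = await asyncio.gather(
        *(client.search(term, ["ICD10"], limit=1) for term in unique_terms)
    )
    return {
        term: (result["ICD10"][0]["code"] if result["ICD10"] else "Not found")
        for term, result in zip(unique_terms, results)
    }

def demo():
    """Run the sketch from a plain script and report how many searches ran."""
    client = StubClient()
    terms = ["hypertension", "heart failure", "hypertension", "anemia"]
    return asyncio.run(code_terms(client, terms)), client.calls
```

In a script, `demo()` returns the mapping together with the number of searches issued, which is three for the four input terms. Inside a notebook, call `await code_terms(client, terms)` directly, because `asyncio.run` cannot be used while the notebook's event loop is running. The resulting dictionary can then be applied to a column with `clinical_df['primary_diagnosis'].map(mapping)`.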

## 6. Medication Analysis with RxNorm

Search for common diabetes medications in RxNorm.

In [None]:
%%medical_batch --ontology RxNorm --limit 2
metformin
insulin
glipizide
linagliptin
empagliflozin

In [None]:
# Analyze medication results
medication_df = _medical_batch_df
print(f"💊 Medication Analysis:")
print(f"   Medications searched: {medication_df['query'].nunique()}")
print(f"   RxNorm codes found: {len(medication_df)}")

# Group by medication
med_summary = medication_df.groupby('query').agg({
    'code': 'count',
    'preferred_term': lambda x: '; '.join(x)
}).rename(columns={'code': 'codes_found', 'preferred_term': 'formulations'})

display(med_summary)

## 7. Export for Statistical Analysis

In [None]:
# Prepare data for export to R/STATA/SPSS
export_df = clinical_df.copy()

# Create binary indicators for common conditions
conditions = ['diabetes', 'hypertension', 'heart failure', 'kidney disease']

for condition in conditions:
    export_df[f'has_{condition.replace(" ", "_")}'] = (
        export_df['primary_diagnosis'].str.contains(condition, case=False) |
        export_df['secondary_diagnosis'].str.contains(condition, case=False)
    ).astype(int)

# Add ICD-10 chapter information
def get_icd_chapter(code):
    """Extract ICD-10 chapter from code"""
    if pd.isna(code) or code == 'Not found':
        return None
    if code.startswith('E'):
        return 'Endocrine'
    elif code.startswith('I'):
        return 'Circulatory'
    elif code.startswith('N'):
        return 'Genitourinary'
    elif code.startswith('J'):
        return 'Respiratory'
    else:
        return 'Other'

export_df['primary_chapter'] = export_df['primary_icd10'].apply(get_icd_chapter)
export_df['secondary_chapter'] = export_df['secondary_icd10'].apply(get_icd_chapter)

print("📊 Data prepared for statistical analysis:")
display(export_df)
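
Matching on the first letter alone is a rough approximation, because some letters span more than one ICD-10 chapter. In ICD-10-CM, for example, `D` codes up to `D49` belong to Neoplasms while `D50` to `D89` cover diseases of the blood, so an anemia code would fall into "Other" above. The sketch below compares the three-character category against a partial range table instead. The table lists only a few chapters with shortened names, and `icd10_chapter` is a hypothetical name.

```python
# Partial table of ICD-10-CM chapter ranges, keyed by three-character category.
# Only a few chapters are listed; extend it from the official code set as needed.
CHAPTER_RANGES = [
    ("A00", "B99", "Infectious and parasitic diseases"),
    ("C00", "D49", "Neoplasms"),
    ("D50", "D89", "Blood and immune mechanism"),
    ("E00", "E89", "Endocrine, nutritional and metabolic"),
    ("I00", "I99", "Circulatory"),
    ("J00", "J99", "Respiratory"),
    ("N00", "N99", "Genitourinary"),
]

def icd10_chapter(code):
    """Return the chapter name for a code, 'Other' if unlisted, None if uncoded."""
    if not isinstance(code, str) or code == "Not found":
        return None
    category = code.strip().upper()[:3]
    for start, end, name in CHAPTER_RANGES:
        # Plain string comparison works because categories share one fixed shape
        if start <= category <= end:
            return name
    return "Other"

for code in ["E11.9", "D64.9", "D12.6", "Not found"]:
    print(code, "->", icd10_chapter(code))
```

It can be applied the same way as the function above, for example `export_df['primary_icd10'].apply(icd10_chapter)`.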

In [None]:
# Export to CSV and Excel (to_excel needs an Excel engine such as openpyxl installed)
export_df.to_csv('clinical_data_coded.csv', index=False)
export_df.to_excel('clinical_data_coded.xlsx', index=False)

# Create STATA-compatible variable labels
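stata_labels = {
    'patient_id': 'Patient Identifier',
    'primary_icd10': 'Primary Diagnosis ICD-10 Code',
    'secondary_icd10': 'Secondary Diagnosis ICD-10 Code',
    'has_diabetes': 'Has Diabetes (0=No, 1=Yes)',
    'has_hypertension': 'Has Hypertension (0=No, 1=Yes)',
    'has_heart_failure': 'Has Heart Failure (0=No, 1=Yes)',
    'has_kidney_disease': 'Has Kidney Disease (0=No, 1=Yes)',
    'primary_chapter': 'ICD-10 Chapter for Primary Diagnosis',
    'secondary_chapter': 'ICD-10 Chapter for Secondary Diagnosis'
}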

# Save labels to file
with open('variable_labels.txt', 'w') as f:
    for var, label in stata_labels.items():
        f.write(f'{var}: {label}\n')

print("✅ Data exported to:")
print("   - clinical_data_coded.csv (for R/Python)")
print("   - clinical_data_coded.xlsx (for Excel)")
print("   - variable_labels.txt (for STATA/SPSS)")
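
The `var: label` text file is readable by people, but Stata cannot run it directly. One alternative, sketched below, is to write the labels as `label variable` commands that can be saved as a do-file. `stata_label_commands` is a hypothetical helper, and the two labels are copied from the dictionary above only to keep the example self-contained.

```python
def stata_label_commands(labels):
    """Build one Stata `label variable` command per entry in labels."""
    commands = []
    for var, label in labels.items():
        # Stata delimits the label with double quotes, so swap any embedded ones
        safe = label.replace('"', "'")
        commands.append(f'label variable {var} "{safe}"')
    return commands

example_labels = {
    "patient_id": "Patient Identifier",
    "has_diabetes": "Has Diabetes (0=No, 1=Yes)",
}

do_file_text = "\n".join(stata_label_commands(example_labels)) + "\n"
print(do_file_text)
```

Writing `do_file_text` to a file such as `variable_labels.do` lets the labels be applied in Stata after importing the CSV. pandas' `DataFrame.to_stata` also accepts a `variable_labels` argument if a `.dta` file is preferred. Stata limits variable labels to 80 characters.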

## 8. Research Summary and Next Steps

In [None]:
# Generate research summary
summary = {
    'total_patients': len(clinical_df),
    'conditions_analyzed': len(conditions),
    # Count matched codes across both the primary and secondary columns
    'icd10_codes_found': int((clinical_df[['primary_icd10', 'secondary_icd10']] != 'Not found').sum().sum()),
    'medications_searched': medication_df['query'].nunique() if 'medication_df' in locals() else 0,
    'ontologies_used': ['ICD-10', 'RxNorm', 'SNOMED CT', 'LOINC']
}

print("📋 Research Workflow Summary:")
print("="*50)
for key, value in summary.items():
    if isinstance(value, list):
        print(f"{key.replace('_', ' ').title()}: {', '.join(value)}")
    else:
        print(f"{key.replace('_', ' ').title()}: {value}")

print("\n🎯 Recommended Next Steps:")
print("1. Scale up to larger clinical datasets")
print("2. Implement cross-ontology mapping")
print("3. Add natural language processing for free-text notes")
print("4. Integrate with electronic health record systems")
print("5. Develop predictive models using coded data")

print("\n📖 Additional Resources:")
print("- Medical Ontology MCP Documentation: https://medical-ontology-mcp.readthedocs.io/")
print("- FAIR Data Principles: https://www.go-fair.org/fair-principles/")
print("- Clinical Research Informatics: https://www.ncbi.nlm.nih.gov/pmc/")

## Citation

If you use this workflow in your research, please cite:

```bibtex
@software{medical_ontology_mcp,
  author = {Medical Informatics Research Team},
  title = {Medical Ontology MCP Server},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/sajor2000/mcp_medicalterminology}
}
```

---

**Note**: This software is for research purposes only and should not be used for clinical decision making.