# MedScrub MCP-Powered Workflow 🤖

**AI-Assisted Healthcare Data Analysis with Claude Code**

This notebook demonstrates MedScrub's **unique capability**: combining Jupyter notebooks with Claude Code (via MCP) for AI-assisted healthcare data analysis where PHI is automatically de-identified. The hands-on steps below use Claude Desktop as the MCP client.

## What Makes This Special? ⭐

**Traditional approach:**
1. Manually write code to de-identify data
2. Manually write analysis code
3. Risk PHI exposure when asking AI for help

**MCP-powered approach:**
1. Ask Claude to analyze healthcare data
2. Claude automatically de-identifies PHI using MCP tools
3. Claude performs analysis on safe data
4. Zero PHI exposure to AI services

## Prerequisites

- ✅ **Completed:** [01_quickstart_api.ipynb](./01_quickstart_api.ipynb)
- ✅ **MedScrub MCP Server:** Installed and configured (see [SETUP.md](../SETUP.md))
- ✅ **Claude Desktop:** Running with MCP connection

**Time:** 15-20 minutes

---
## Part 1: Understanding MCP (Model Context Protocol)

### What is MCP?

**MCP (Model Context Protocol)** is an open standard that allows AI assistants like Claude to:
- Access local tools and resources
- Interact with your development environment
- Execute functions on your behalf

### MedScrub MCP Server

The MedScrub MCP server provides 6 tools to Claude:

| Tool | Purpose |
|------|--------|
| `medscrub__deidentify_fhir` | De-identify FHIR resources |
| `medscrub__reidentify_fhir` | Restore original FHIR data |
| `medscrub__deidentify_text` | De-identify clinical text |
| `medscrub__reidentify_text` | Restore original text |
| `medscrub__get_session_info` | Get session metadata |
| `medscrub__list_phi_types` | List all PHI types detected |
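
The de-identify/re-identify pairs in the table imply a reversible mapping that is looked up by session. The toy sketch below illustrates that idea only; it is not MedScrub's algorithm. The function names, the `[EMAIL_n]` placeholder format, and the in-memory `_SESSIONS` store are all assumptions made for illustration, and it handles email addresses only.

```python
import re
import uuid

# In-memory stand-in for server-side session storage (hypothetical).
_SESSIONS = {}

# Deliberately simple email pattern; real PHI detection covers far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_deidentify_text(text):
    """Replace each email address with a placeholder and remember the mapping."""
    session_id = str(uuid.uuid4())
    mapping = {}

    def swap(match):
        token = f"[EMAIL_{len(mapping) + 1}]"
        mapping[token] = match.group(0)
        return token

    scrubbed = EMAIL_RE.sub(swap, text)
    _SESSIONS[session_id] = mapping
    return scrubbed, session_id


def toy_reidentify_text(scrubbed, session_id):
    """Restore the original values using the mapping stored for the session."""
    for token, original in _SESSIONS[session_id].items():
        scrubbed = scrubbed.replace(token, original)
    return scrubbed


# Synthetic note, not real patient data.
note = "Contact jane.roe@example.com about the follow-up visit."
scrubbed, sid = toy_deidentify_text(note)
print(scrubbed)
print(toy_reidentify_text(scrubbed, sid) == note)
```

The point of the sketch is that the session ID is the only link back to the original values, which is why the notebook stores it alongside the de-identified resource.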

### How It Works

```
You (in Jupyter):
  "I have patient data that needs analysis"
  
      ↓
      
Claude Desktop:
  1. Reads your Jupyter notebook
  2. Sees patient data contains PHI
  3. Automatically calls medscrub__deidentify_fhir
  4. Receives de-identified data
  5. Performs analysis safely
  6. Returns insights (no PHI exposed)
```
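
At the protocol level, step 3 of the flow above is an MCP tool invocation: a JSON-RPC 2.0 request with method `tools/call`, carrying the tool name and its arguments. The sketch below builds such a message. The `resource` argument key is an assumption (the server's actual input schema is not shown in this notebook), and the tool name is copied from the table above, so the name registered on the wire may differ.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Synthetic resource; the "resource" key is a hypothetical argument name.
request = build_tool_call(
    1,
    "medscrub__deidentify_fhir",
    {"resource": {"resourceType": "Patient", "id": "example"}},
)
print(json.dumps(request, indent=2))
```

In practice the MCP client constructs and sends this message for you; the sketch is only meant to show what "calls a tool" means underneath.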

---
## Part 2: Setup and Verify MCP Connection

First, let's set up our environment and verify MCP is working.

In [None]:
# Standard setup
import os
import json
from dotenv import load_dotenv
from medscrub_client import MedScrubClient

# Load environment
load_dotenv()
jwt_token = os.getenv('MEDSCRUB_JWT_TOKEN')
api_url = os.getenv('MEDSCRUB_API_URL', 'https://api.medscrub.dev')

# Initialize client (for direct API calls when needed)
client = MedScrubClient(jwt_token=jwt_token, api_url=api_url)

print("✓ Environment loaded")
print("\n📌 IMPORTANT: This notebook works best with Claude Desktop open!")
print("   Open Claude Desktop and verify MCP connection.")
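
The setup cell above passes whatever `os.getenv` returns straight to the client, so a missing `MEDSCRUB_JWT_TOKEN` would only surface later as an authentication error. A small guard like the one below can fail earlier with a clearer message. The helper name `require_env` is hypothetical and not part of `medscrub_client`.

```python
import os


def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if unset or blank."""
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Demo with an explicit mapping so the example does not depend on your shell.
demo_env = {"MEDSCRUB_JWT_TOKEN": "example-token"}
print(require_env("MEDSCRUB_JWT_TOKEN", env=demo_env))
```

In the notebook you would call `require_env('MEDSCRUB_JWT_TOKEN')` after `load_dotenv()` and pass the result to `MedScrubClient`.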

### ✋ Verify MCP Connection in Claude Desktop

**Before continuing, test your MCP connection:**

1. Open **Claude Desktop** (not the web app)
2. Start a new conversation
3. Ask Claude: `"What MCP servers are connected?"`

**Expected response:**
```
I can see the following MCP servers:
- medscrub
```

**If you don't see `medscrub`:**
- Follow the setup guide: [SETUP.md](../SETUP.md)
- Common fix: Restart Claude Desktop completely
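
If restarting does not help, it can be worth checking that the server is actually registered in Claude Desktop's configuration file (on macOS this is typically `~/Library/Application Support/Claude/claude_desktop_config.json`; see SETUP.md for the authoritative location). The sketch below checks a config string for an entry under `mcpServers`. The `command` and `args` values in the example are placeholders, not MedScrub's documented settings.

```python
import json


def has_mcp_server(config_text, server_name):
    """Return True if `server_name` is registered under "mcpServers"."""
    try:
        config = json.loads(config_text)
    except json.JSONDecodeError:
        return False
    servers = config.get("mcpServers", {}) if isinstance(config, dict) else {}
    return server_name in servers


# Illustrative config; command and args are placeholders.
example_config = """
{
  "mcpServers": {
    "medscrub": {"command": "npx", "args": ["-y", "@medscrub/mcp"]}
  }
}
"""
print(has_mcp_server(example_config, "medscrub"))
print(has_mcp_server("{}", "medscrub"))
```

To check your own setup, read the config file's contents and pass them as `config_text`.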

**✅ Once verified, continue to Part 3 below.**

---
## Part 3: Load Sample Patient Data

Let's load a sample patient for analysis. This data contains PHI.

In [None]:
# Load sample patient
with open('sample_data/patient_john_doe.json', 'r') as f:
    patient = json.load(f)

# Display patient info
print("Sample Patient Resource (contains PHI):")
print("="*60)
print(f"Name: {patient['name'][0]['text']}")
print(f"DOB: {patient['birthDate']}")
print(f"Gender: {patient['gender']}")
print(f"Phone: {patient['telecom'][0]['value']}")
print(f"Email: {patient['telecom'][2]['value']}")
print(f"Address: {patient['address'][0]['line'][0]}, {patient['address'][0]['city']}")
print(f"MRN: {patient['identifier'][0]['value']}")
print("="*60)
print("\n⚠️ This data contains PHI and should NOT be sent to AI without de-identification!")
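
The cell above reads phone and email by position (`telecom[0]`, `telecom[2]`), which only works if every resource lists its contact points in the same order. FHIR contact points carry a `system` field (`phone`, `email`, and so on), so a lookup by system is more robust. The helper name `first_telecom` and the sample data below are made up for illustration.

```python
def first_telecom(patient, system):
    """Return the first telecom value whose `system` matches, or None."""
    for contact in patient.get("telecom", []):
        if contact.get("system") == system:
            return contact.get("value")
    return None


# Synthetic Patient fragment for illustration only.
synthetic_patient = {
    "resourceType": "Patient",
    "telecom": [
        {"system": "phone", "value": "555-0100", "use": "home"},
        {"system": "phone", "value": "555-0101", "use": "work"},
        {"system": "email", "value": "patient@example.com"},
    ],
}
print(first_telecom(synthetic_patient, "email"))
print(first_telecom(synthetic_patient, "fax"))
```

Returning `None` for a missing system avoids the `IndexError` that positional access raises on resources with fewer contact points.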

---
## Part 4: AI-Assisted De-identification (The Magic!) ✨

Now for the **game-changing workflow**:

### Traditional Approach (Manual)
```python
# You write this code:
result = client.deidentify_fhir(patient)
deidentified = result['deidentifiedResource']
session_id = result['sessionId']
# ... more manual code
```

### MCP-Powered Approach (AI-Assisted)

**Just ask Claude in Claude Desktop:**

```
I have a FHIR Patient resource in my Jupyter notebook (the 'patient' variable).
Please use medscrub__deidentify_fhir to remove all PHI from it.

[paste the patient JSON from above]
```

**Claude will:**
1. ✅ Automatically call `medscrub__deidentify_fhir` tool
2. ✅ Send your patient data to MedScrub API (via MCP)
3. ✅ Receive de-identified resource + session ID
4. ✅ Show you the results

**Try it now!** 👆

### Store the De-identified Result

After Claude de-identifies the data, you can store the result in this notebook:

In [None]:
# OPTION A: Use Claude's result (paste the de-identified resource here)
# deidentified_patient = { ... paste from Claude ... }
# session_id = "... paste session ID from Claude ..."

# OPTION B: Use direct API call (traditional approach)
result = client.deidentify_fhir(patient)
deidentified_patient = result['deidentifiedResource']
session_id = result['sessionId']

print("✓ De-identified patient resource stored")
print(f"Session ID: {session_id}")
print(f"\nDe-identified name: {deidentified_patient['name'][0]['family']}")
print(f"Original name was: {patient['name'][0]['text']}")
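
Before sharing a de-identified resource, a quick sanity check is to confirm that PHI values you already know about no longer appear anywhere in it. The sketch below does a plain substring search over the serialized JSON. It is a rough smoke test under that assumption, not a substitute for a proper de-identification review, and `find_leaks` is a hypothetical helper name.

```python
import json


def find_leaks(known_phi_values, deidentified_resource):
    """Return the known PHI strings that still appear in the serialized resource."""
    serialized = json.dumps(deidentified_resource, ensure_ascii=False)
    return [value for value in known_phi_values if value and value in serialized]


# Synthetic example: the name was replaced, the birth date was not.
synthetic_scrubbed = {"name": [{"text": "[NAME_1]"}], "birthDate": "1980-01-15"}
leaks = find_leaks(["John Doe", "1980-01-15"], synthetic_scrubbed)
print(leaks)
```

In the notebook, the known values could be the name, phone, email, and MRN printed in Part 3, checked against `deidentified_patient`.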

---
## Part 5: AI-Assisted Data Analysis

Now that data is de-identified, let's use Claude for analysis.

### Example Analysis Tasks

**Ask Claude in Claude Desktop:**

#### Task 1: Extract Key Information
```
From the de-identified patient data, extract:
- Age (calculated from birth date)
- Gender
- Number of contact methods
- Address state

Show as a summary table.
```

#### Task 2: Generate Analysis Code
```
Write Python code to:
1. Load multiple de-identified patient resources
2. Calculate average age
3. Count patients by gender
4. Identify most common state

Make it work with the 'deidentified_patient' variable.
```

#### Task 3: Create Visualization
```
Create a bar chart showing:
- Number of phone contacts
- Number of email contacts

Use matplotlib and make it look professional.
```

**Try one or more tasks above!** Claude will generate code you can run in the cell below.

In [None]:
# Run Claude-generated analysis code here
# (paste code from Claude Desktop)

# Example: Simple analysis
from datetime import datetime

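# Calculate age from the de-identified resource, not the original `patient`
# (assumes the de-identified birthDate is still an ISO date; some policies shift or generalize dates)
birth_date = datetime.strptime(deidentified_patient['birthDate'], '%Y-%m-%d')
today = datetime.now()
# Subtract one year if this year's birthday has not happened yet
age = today.year - birth_date.year - ((today.month, today.day) < (birth_date.month, birth_date.day))

print("Patient Analysis (De-identified):")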
print(f"Age: {age} years")
print(f"Gender: {deidentified_patient['gender']}")
print(f"Contact methods: {len(deidentified_patient['telecom'])}")
print(f"State: {deidentified_patient['address'][0]['state']}")
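
Age in whole years is easy to get slightly wrong around birthdays; dividing a day count by 365, for instance, ignores leap days. A minimal sketch of a calendar-based calculation follows. The function name `age_on` is hypothetical, and passing `today` explicitly keeps the result reproducible.

```python
from datetime import date


def age_on(birth_date, today):
    """Whole years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


# Synthetic birth date for illustration.
born = date.fromisoformat("1980-06-15")
print(age_on(born, date(2024, 6, 14)))  # day before the birthday
print(age_on(born, date(2024, 6, 15)))  # on the birthday
```

The day before the 44th birthday this returns 43, whereas a day count divided by 365 would already report 44.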

---
## Part 6: Multi-Patient Cohort Analysis

Let's scale this up to analyze multiple patients.

### Scenario: Diabetes Patient Cohort

Imagine you have 10 diabetic patients and want to analyze medication patterns.

In [None]:
# Create a small cohort (for demonstration)
# In practice, you'd load from a database or file

patient_cohort = [
    patient,  # Our sample patient
    # In real scenario: load more patients here
]

print(f"Cohort size: {len(patient_cohort)} patients")
print("\n💡 With MCP, you can ask Claude:")
print('"De-identify all patients in the cohort and analyze demographics"')

### Ask Claude to Analyze the Cohort

**In Claude Desktop:**
```
I have a patient cohort in my Jupyter notebook.

Please:
1. De-identify each patient using medscrub__deidentify_fhir
2. Extract demographics (age, gender, state)
3. Calculate summary statistics
4. Create a pandas DataFrame

Use the 'patient_cohort' variable.
```

Claude will generate code like this:

In [None]:
# Claude-generated cohort analysis (example)
from datetime import datetime

import pandas as pd

# De-identify all patients
deidentified_cohort = []
for p in patient_cohort:
    result = client.deidentify_fhir(p)
    deidentified_cohort.append(result['deidentifiedResource'])

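# Extract demographics from each de-identified resource
# (assumes birthDate survives de-identification as an ISO date)
demographics = []
today = datetime.now()
for p in deidentified_cohort:
    # Use this patient's own birth date, not the first patient's
    birth_date = datetime.strptime(p['birthDate'], '%Y-%m-%d')
    age = today.year - birth_date.year - ((today.month, today.day) < (birth_date.month, birth_date.day))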
    
    demographics.append({
        'age': age,
        'gender': p['gender'],
        'state': p['address'][0]['state'],
        'contact_methods': len(p['telecom'])
    })

# Create DataFrame
df = pd.DataFrame(demographics)

print("Cohort Demographics (De-identified):")
print(df)
print(f"\nAverage age: {df['age'].mean():.1f} years")
print(f"Gender distribution:\n{df['gender'].value_counts()}")
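
The same summary statistics can be computed with the standard library alone, which is handy for checking the pandas output on a small cohort. This is a sketch under the assumption that each row has `age`, `gender`, and `state` keys, as in the `demographics` list above; the rows below are synthetic and `summarize` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean


def summarize(rows):
    """Summarize a non-empty list of demographic rows."""
    genders = Counter(row["gender"] for row in rows)
    states = Counter(row["state"] for row in rows)
    return {
        "n": len(rows),
        "mean_age": mean(row["age"] for row in rows),
        "gender_counts": dict(genders),
        "top_state": states.most_common(1)[0][0],
    }


# Synthetic rows for illustration only.
rows = [
    {"age": 40, "gender": "male", "state": "CA"},
    {"age": 50, "gender": "female", "state": "CA"},
    {"age": 60, "gender": "female", "state": "NY"},
]
summary = summarize(rows)
print(summary)
```

Note that `summarize` raises on an empty list, so guard for that case if a cohort filter can return no patients.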

---
## Part 7: Re-identification When Needed

After analysis, you may need to restore original PHI (e.g., to contact a patient).

### AI-Assisted Re-identification

In [None]:
# OPTION A: Ask Claude in Claude Desktop
# "Use medscrub__reidentify_fhir to restore the original patient data"
# Claude will use the session_id to restore PHI

# OPTION B: Direct API call
reidentified = client.reidentify_fhir(deidentified_patient, session_id)
original_patient = reidentified['reidentifiedResource']

print("Re-identified Patient:")
print(f"Name: {original_patient['name'][0]['text']}")
print(f"Email: {original_patient['telecom'][2]['value']}")
print(f"Phone: {original_patient['telecom'][0]['value']}")

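# Verify that the round trip restored the original resource
if original_patient == patient:
    print("\n✓ Perfect match! The re-identified resource equals the original.")
else:
    print("\n⚠️ Mismatch: the re-identified resource differs from the original.")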
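
When the equality check above fails, it helps to know which fields differ rather than just that something does. The sketch below compares two resources at the top level only; the helper name `changed_keys` and the sample dictionaries are made up for illustration.

```python
def changed_keys(original, restored):
    """Return the sorted top-level keys whose values differ between two resources."""
    keys = set(original) | set(restored)
    return sorted(key for key in keys if original.get(key) != restored.get(key))


# Synthetic resources for illustration only.
before = {"resourceType": "Patient", "gender": "male", "birthDate": "1980-01-15"}
after = {"resourceType": "Patient", "gender": "male", "birthDate": "1980-01-16", "meta": {}}
print(changed_keys(before, after))
```

In the notebook this would be `changed_keys(patient, original_patient)`; an empty list means the round trip preserved every top-level field.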

---
## Part 8: Real-World Workflow Example

### Complete Clinical Research Workflow

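Here's an end-to-end walkthrough of the workflow. Steps 2 and 5 call the MedScrub API; the cohort size and clinical statistics printed in the other steps are illustrative placeholders, not computed results: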

In [None]:
print("REAL-WORLD WORKFLOW: Clinical Research Study")
print("="*60)

# Step 1: Load patient cohort (from database, EHR, etc.)
print("\n1. Load patient cohort from database")
print("   → Loaded 10 diabetic patients")

# Step 2: De-identify all patient data
print("\n2. De-identify patient data (MCP or API)")
result = client.deidentify_fhir(patient)
session_id = result['sessionId']
print(f"   → Session ID: {session_id}")
print(f"   → PHI fields removed: {len(result.get('detectedPHI', []))}")

# Step 3: Share with AI for analysis (SAFE!)
print("\n3. ✅ SAFE to analyze with AI")
print("   → Ask Claude to analyze medication patterns")
print("   → Ask Claude to identify trends")
print("   → Generate visualizations")

# Step 4: Generate insights
print("\n4. AI generates insights (no PHI exposed)")
print("   → Most common medication: Metformin (80%)")
print("   → Average HbA1c: 7.8%")
print("   → Age range: 35-65 years")

# Step 5: Re-identify for follow-up (when needed)
print("\n5. Re-identify specific patients for follow-up")
reidentified = client.reidentify_fhir(result['deidentifiedResource'], session_id)
print(f"   → Contact patient: {reidentified['reidentifiedResource']['name'][0]['text']}")
print(f"   → Email: {reidentified['reidentifiedResource']['telecom'][2]['value']}")

# Step 6: Publish research (de-identified data only)
print("\n6. Publish research with de-identified dataset")
print("   → Dataset is HIPAA-compliant")
print("   → Safe for public sharing")

print("\n" + "="*60)
print("This workflow enables HIPAA-compliant AI-assisted research!")
print("="*60)

---
## Part 9: MCP vs Direct API - When to Use Each

### Use MCP + Claude Code When:
✅ Exploratory data analysis
✅ Need AI help with code generation
✅ Complex analysis requiring multiple steps
✅ Learning and experimentation
✅ Interactive workflows

**Example:**
```
Ask Claude: "Analyze this patient cohort, identify outliers,
create visualizations, and suggest next steps for research"
```

### Use Direct API When:
✅ Production ETL pipelines
✅ Automated batch processing
✅ Backend services
✅ Scheduled jobs
✅ High-volume processing

**Example:**
```python
# Automated nightly de-identification
# (`db` and `save_to_research_database` are placeholders for your own data layer)
for record in db.get_new_patients():
    result = client.deidentify_fhir(record)
    save_to_research_database(result)
```
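
For unattended batch jobs, one malformed record should usually not abort the whole run. A minimal sketch of that pattern follows: it takes the de-identification function as a parameter and collects failures for later review. `deidentify_batch` and `fake_deidentify` are hypothetical; in the notebook you would pass `client.deidentify_fhir` instead of the stand-in.

```python
def deidentify_batch(resources, deidentify):
    """Apply `deidentify` to each resource, collecting failures instead of aborting."""
    results, failures = [], []
    for index, resource in enumerate(resources):
        try:
            results.append(deidentify(resource))
        except Exception as exc:  # broad on purpose: one bad record should not stop the run
            failures.append((index, str(exc)))
    return results, failures


def fake_deidentify(resource):
    """Stand-in for a real API call; rejects resources without a resourceType."""
    if "resourceType" not in resource:
        raise ValueError("missing resourceType")
    return {"deidentifiedResource": {"resourceType": resource["resourceType"]}, "sessionId": "demo"}


results, failures = deidentify_batch([{"resourceType": "Patient"}, {}], fake_deidentify)
print(len(results), failures)
```

Recording the index of each failure makes it possible to retry or inspect only the records that did not go through.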

### Best of Both Worlds
Use MCP for exploration → Convert to direct API calls for production!

---
## 🎉 You've Mastered MCP-Powered Workflows!

### What You Learned:
- ✅ What MCP is and how it works
- ✅ How to use Claude Code with MedScrub MCP tools
- ✅ AI-assisted de-identification
- ✅ AI-assisted data analysis
- ✅ Multi-patient cohort workflows
- ✅ When to use MCP vs Direct API

### Key Takeaway:
**MedScrub + MCP + Claude Code = Safe AI-assisted healthcare data analysis**

This is the **only FHIR de-identification tool** with native MCP integration, giving you a unique competitive advantage.

## Next Steps

1. **Explore all FHIR resources:** [03_fhir_resources.ipynb](./03_fhir_resources.ipynb)
2. **Real data science project:** [04_data_science_workflow.ipynb](./04_data_science_workflow.ipynb)
3. **Hackathon demo:** [05_mcp_demo_script.ipynb](./05_mcp_demo_script.ipynb)
4. **Synthea FHIR integration:** Combine with Synthea FHIR MCP for 117 realistic patients

## Resources

- **MCP Setup Guide:** [SETUP.md](../SETUP.md)
- **MCP Server Docs:** [@medscrub/mcp on npm](https://www.npmjs.com/package/@medscrub/mcp)
- **Demo Commands:** [MCP Demo Examples](https://medscrub.dev/docs/mcp)
- **MCP Protocol:** [modelcontextprotocol.io](https://modelcontextprotocol.io)

---

**Happy AI-assisted healthcare data analysis! 🏥🤖**