# MedScrub Quick Start - API Integration

**Welcome to MedScrub!** This notebook gets you up and running with FHIR de-identification in about 5-10 minutes.

## What You'll Learn
1. Connect to the MedScrub API
2. De-identify a FHIR Patient resource
3. Understand sessions and tokenization
4. Re-identify data when needed

## Prerequisites
- **JWT Token** from [medscrub.dev/playground](https://medscrub.dev/playground)
- `.env` file with your token (see `.env.example`)

**Time:** 5-10 minutes

---
## Step 1: Setup and Installation

First, let's import the required libraries and load our credentials.

In [None]:
# Import libraries
import os
import json
from dotenv import load_dotenv
from medscrub_client import (
    MedScrubAuthError,
    MedScrubClient,
    MedScrubError,
    MedScrubRateLimitError,
)

# Load environment variables from .env file
load_dotenv()

# Get credentials
jwt_token = os.getenv('MEDSCRUB_JWT_TOKEN')
api_url = os.getenv('MEDSCRUB_API_URL', 'https://api.medscrub.dev')

if not jwt_token:
    raise ValueError(
        "JWT token not found! Please:\n"
        "1. Copy .env.example to .env\n"
        "2. Get your JWT token from https://medscrub.dev/playground\n"
        "3. Add it to .env file"
    )

print("✓ Environment loaded successfully")
print(f"API URL: {api_url}")
# The check above guarantees jwt_token is set; print only a short prefix
print(f"Token: {jwt_token[:20]}...")

---
## Step 2: Initialize MedScrub Client

Create a client instance and test the connection.

In [None]:
# Initialize the MedScrub client
client = MedScrubClient(jwt_token=jwt_token, api_url=api_url)

# Test the connection
try:
    health = client.health_check()
    print("✓ Connected to MedScrub API")
    print(f"Status: {health['status']}")
    print(f"Version: {health['version']}")
except MedScrubAuthError as e:
    print(f"❌ Authentication failed: {e}")
    print("Get a new JWT token from https://medscrub.dev/playground")
except Exception as e:
    print(f"❌ Connection failed: {e}")
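
If the connection test reports an authentication failure, one quick local check is whether the token has simply expired. The sketch below assumes the MedScrub token is a standard three-part JWT carrying an `exp` claim, which this notebook does not confirm; `jwt_expiry` and `is_expired` are hypothetical helpers, not part of `medscrub_client`. The payload is decoded without verifying the signature, so treat the result as a hint only.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) of a JWT, or None if absent.

    The payload is decoded WITHOUT signature verification, so this is a
    local convenience check and never an authentication step.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload_b64 = parts[1]
    # base64url payloads usually omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token, now=None):
    """True if the token carries an `exp` claim that is not in the future."""
    exp = jwt_expiry(token)
    now = time.time() if now is None else now
    return exp is not None and exp <= now
```

In this notebook you could call `is_expired(jwt_token)` before creating the client and fetch a fresh token from the playground if it returns `True`.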

---
## Step 3: Create a Sample FHIR Patient Resource

Let's create a FHIR Patient resource with PHI that needs to be de-identified.

In [None]:
# Sample FHIR Patient resource with PHI
patient = {
    "resourceType": "Patient",
    "id": "example-patient-001",
    "name": [
        {
            "use": "official",
            "family": "Smith",
            "given": ["John", "Robert"],
            "text": "John Robert Smith"
        }
    ],
    "gender": "male",
    "birthDate": "1985-03-15",
    "address": [
        {
            "use": "home",
            "line": ["123 Main Street", "Apartment 4B"],
            "city": "Boston",
            "state": "MA",
            "postalCode": "02134",
            "country": "USA"
        }
    ],
    "telecom": [
        {
            "system": "phone",
            "value": "617-555-1234",
            "use": "home"
        },
        {
            "system": "email",
            "value": "john.smith@example.com",
            "use": "home"
        }
    ],
    "identifier": [
        {
            "system": "http://hospital.example.org/patients",
            "value": "MRN-12345"
        }
    ]
}

# Display the original patient resource
print("Original FHIR Patient Resource:")
print(json.dumps(patient, indent=2))

---
## Step 4: De-identify the Patient Resource

Now let's de-identify this patient resource using MedScrub. All PHI will be replaced with reversible tokens.

In [None]:
# De-identify the patient resource
try:
    print("De-identifying FHIR Patient resource...\n")
    result = client.deidentify_fhir(patient)
    
    # Extract results
    deidentified_resource = result['deidentifiedResource']
    session_id = result['sessionId']
    detected_phi = result.get('detectedPHI', [])
    processing_time = result.get('processingTime', 0)
    
    # Display results
    print("✓ De-identification successful!\n")
    print(f"Session ID: {session_id}")
    print(f"PHI fields detected: {len(detected_phi)}")
    print(f"Processing time: {processing_time}ms")
    # Static figure hardcoded in this notebook, not read from the API response
    print("Accuracy: 99.9% (FHIR structured data)\n")
    
    # Show which PHI fields were detected
    print("Detected PHI fields:")
    for phi in detected_phi[:5]:  # Show first 5
        print(f"  - {phi.get('fieldPath', 'N/A')}: {phi.get('phiType', 'N/A')}")
    if len(detected_phi) > 5:
        print(f"  ... and {len(detected_phi) - 5} more")
    
except MedScrubRateLimitError as e:
    print(f"❌ Rate limit exceeded: {e}")
    print(f"Retry after: {e.retry_after} seconds")
except MedScrubError as e:
    print(f"❌ De-identification failed: {e}")
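
The cell above only reports a rate limit. If you would rather wait and try again, a small retry helper is one option. This is a minimal sketch, not something `medscrub_client` provides: `call_with_retry` is a hypothetical name, and the only thing it relies on from this notebook is that the rate-limit exception exposes `retry_after` in seconds, as printed above.

```python
import time


def call_with_retry(func, retry_on, max_attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call func(); on a `retry_on` exception, wait and try again.

    Waits for the exception's `retry_after` attribute (seconds) when it is
    present, otherwise for `default_wait`. Re-raises after the last attempt.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            wait = getattr(exc, "retry_after", None)
            sleep(default_wait if wait is None else float(wait))
```

With the names used in this notebook, a call would look like `call_with_retry(lambda: client.deidentify_fhir(patient), MedScrubRateLimitError)`. The `sleep` parameter is injectable so the helper can be tested without actually waiting.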

---
## Step 5: View the De-identified Resource

Let's see what the de-identified patient looks like. Notice how PHI has been replaced with tokens.

In [None]:
# Display the de-identified resource
print("De-identified FHIR Patient Resource:")
print(json.dumps(deidentified_resource, indent=2))

print("\n" + "="*60)
print("Notice: All PHI fields have been replaced with tokens")
print("Example: 'Smith' → '[FHIR_NAME_xyz123]'")
print("="*60)
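
To see exactly which fields were tokenized, you can walk the resource and collect every placeholder. The sketch below assumes tokens follow the bracketed shape of the example above (`[FHIR_NAME_xyz123]`); the real token format may differ, so adjust `TOKEN_PATTERN` to what your output actually shows. `find_tokens` is a hypothetical helper written for this notebook.

```python
import re

# Assumed placeholder shape, based only on the example '[FHIR_NAME_xyz123]'.
TOKEN_PATTERN = re.compile(r"\[FHIR_[A-Z]+_[A-Za-z0-9]+\]")


def find_tokens(node, path=""):
    """Yield (path, token) pairs for every placeholder found in string values."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_tokens(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_tokens(value, f"{path}[{index}]")
    elif isinstance(node, str):
        for token in TOKEN_PATTERN.findall(node):
            yield path, token
```

Running `list(find_tokens(deidentified_resource))` gives a flat list of paths and tokens, which is easier to scan than the full JSON dump.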

---
## Step 6: Understanding Sessions

MedScrub uses **session-based tokenization**:
- Each de-identification creates or reuses a session
- The session ID is used to map tokens back to the original PHI
- Sessions expire after 24 hours (default)
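
To make the idea concrete, here is a toy, in-memory model of session-scoped reversible tokenization. It is an illustration only and not how MedScrub is implemented: `ToyTokenVault`, its token format, and the rule that a repeated value reuses its token within a session are all assumptions made for this sketch.

```python
import uuid


class ToyTokenVault:
    """In-memory illustration of session-scoped, reversible tokenization."""

    def __init__(self):
        # session_id -> {"forward": {(phi_type, value): token},
        #                "reverse": {token: value}}
        self._sessions = {}

    def new_session(self):
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"forward": {}, "reverse": {}}
        return session_id

    def tokenize(self, session_id, phi_type, value):
        session = self._sessions[session_id]
        key = (phi_type, value)
        # Reuse the existing token so a value maps to one token per session.
        if key not in session["forward"]:
            token = f"[{phi_type}_{uuid.uuid4().hex[:8]}]"
            session["forward"][key] = token
            session["reverse"][token] = value
        return session["forward"][key]

    def detokenize(self, session_id, token):
        return self._sessions[session_id]["reverse"][token]

    def delete_session(self, session_id):
        # After deletion the tokens can no longer be mapped back.
        del self._sessions[session_id]
```

The point to take away is that a token is meaningless without its session: the same value gets a different token in another session, and deleting the session removes the only mapping back to the original PHI.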

Let's query the session to see metadata.

In [None]:
# Get session information
try:
    session_info = client.get_session_info(session_id)
    
    print("Session Information:")
    print(f"  Session ID: {session_info['sessionId']}")
    print(f"  Token count: {session_info['tokenCount']}")
    print(f"  Created: {session_info['createdAt']}")
    print(f"  Expires: {session_info['expiresAt']}")
    print(f"  Hours remaining: {session_info['hoursRemaining']:.1f}h")
    
    if session_info.get('isExpiringSoon'):
        print("\n⚠️ Warning: Session expires soon (< 2 hours remaining)")
        
except MedScrubError as e:
    print(f"❌ Error fetching session: {e}")

---
## Step 7: Re-identify the Patient Resource

When you need the original PHI back (e.g., after AI analysis), use re-identification.

**Use case:** You de-identified data → sent to AI/LLM for analysis → now need to restore original patient details.

In [None]:
# Re-identify the de-identified resource
try:
    print("Re-identifying FHIR Patient resource...\n")
    reidentified = client.reidentify_fhir(deidentified_resource, session_id)
    
    original_resource = reidentified['reidentifiedResource']
    
    print("✓ Re-identification successful!\n")
    print("Original Patient Resource (restored):")
    print(json.dumps(original_resource, indent=2))
    
    # Verify it matches the original
    if original_resource == patient:
        print("\n✓ Perfect match: the re-identified resource equals the original.")
    else:
        print("\n⚠️ Warning: Re-identified resource differs from original")
        
except MedScrubError as e:
    print(f"❌ Re-identification failed: {e}")
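
If the comparison above reports a difference, it helps to know where the two resources diverge. The helper below is a minimal sketch for that; `diff_paths` is a hypothetical function written for this notebook, and it works on any pair of JSON-like dicts and lists.

```python
def diff_paths(expected, actual, path=""):
    """Return the paths at which two JSON-like structures differ."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}" if path else key
            if key not in expected or key not in actual:
                # Key present on one side only.
                diffs.append(child)
            else:
                diffs.extend(diff_paths(expected[key], actual[key], child))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [path]
        diffs = []
        for index, (left, right) in enumerate(zip(expected, actual)):
            diffs.extend(diff_paths(left, right, f"{path}[{index}]"))
        return diffs
    # Scalars, or values of mismatched types.
    return [] if expected == actual else [path]
```

Calling `diff_paths(patient, original_resource)` returns an empty list on a clean round trip, or the paths of the fields that came back changed.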

---
## Step 8: Complete Workflow Example

Here's the complete de-identification → analysis → re-identification workflow:

In [None]:
# Complete workflow
print("=" * 60)
print("COMPLETE WORKFLOW: De-identify → Analyze → Re-identify")
print("=" * 60 + "\n")

# 1. Start with PHI
print("1. Original patient: John Robert Smith")
print(f"   Email: {patient['telecom'][1]['value']}")
print(f"   Phone: {patient['telecom'][0]['value']}\n")

# 2. De-identify
print("2. De-identify (replace PHI with tokens)")
print(f"   Patient: {deidentified_resource['name'][0]['family']}")
print(f"   Email: {deidentified_resource['telecom'][1]['value']}")
print(f"   Phone: {deidentified_resource['telecom'][0]['value']}\n")

# 3. Safe to analyze with AI
print("3. ✓ SAFE to send to AI/LLM (no PHI exposed)")
print("   → You could now analyze this with Claude, GPT, etc.\n")

# 4. Re-identify when needed
print("4. Re-identify (restore original PHI)")
print(f"   Patient: {original_resource['name'][0]['given'][0]} {original_resource['name'][0]['family']}")
print(f"   Email: {original_resource['telecom'][1]['value']}")
print(f"   Phone: {original_resource['telecom'][0]['value']}\n")

print("=" * 60)
print("This workflow enables HIPAA-compliant AI/LLM usage!")
print("=" * 60)
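
The cell above replays results from earlier steps. If you want the same flow as one reusable function, a sketch could look like the following. It uses only the client methods and response keys already shown in this notebook (`deidentify_fhir`, `reidentify_fhir`, `delete_session`, `sessionId`, `deidentifiedResource`, `reidentifiedResource`); `run_workflow` and the `analyze` callable are hypothetical names standing in for your own analysis step.

```python
def run_workflow(client, resource, analyze):
    """De-identify `resource`, run `analyze` on the tokenized copy, restore PHI.

    The session is deleted in `finally`, so it is cleaned up even when the
    analysis step raises.
    """
    result = client.deidentify_fhir(resource)
    session_id = result["sessionId"]
    deidentified = result["deidentifiedResource"]
    try:
        # `analyze` only ever sees the tokenized resource.
        analysis = analyze(deidentified)
        restored = client.reidentify_fhir(deidentified, session_id)
        return analysis, restored["reidentifiedResource"]
    finally:
        client.delete_session(session_id)
```

Because deleting a session makes its tokens unrecoverable, this version suits one-shot jobs. If you need to re-identify later, keep the session ID and skip the cleanup instead.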

---
## Step 9: Cleanup (Optional)

Delete the session when you're done to free up resources.

In [None]:
# Delete the session (optional)
try:
    client.delete_session(session_id)  # return value is not needed here
    print(f"✓ Session {session_id} deleted successfully")
    print("\nNote: Once deleted, you cannot re-identify data from this session.")
except MedScrubError as e:
    print(f"❌ Error deleting session: {e}")

---
## 🎉 Congratulations!

You've successfully:
- ✓ Connected to the MedScrub API
- ✓ De-identified a FHIR Patient resource
- ✓ Understood session-based tokenization
- ✓ Re-identified the data
- ✓ Completed a full HIPAA-compliant AI workflow

## Next Steps

1. **Try more FHIR resources:** [03_fhir_resources.ipynb](./03_fhir_resources.ipynb)
2. **AI-assisted workflows:** [02_mcp_powered_workflow.ipynb](./02_mcp_powered_workflow.ipynb)
3. **Real data science workflow:** [04_data_science_workflow.ipynb](./04_data_science_workflow.ipynb)
4. **Hackathon demo:** [05_mcp_demo_script.ipynb](./05_mcp_demo_script.ipynb)

## Resources

- **Documentation:** [medscrub.dev/docs](https://medscrub.dev/docs)
- **API Reference:** [medscrub.dev/docs/api](https://medscrub.dev/docs/api)
- **MCP Server:** [@medscrub/mcp on npm](https://www.npmjs.com/package/@medscrub/mcp)
- **Support:** support@medscrub.dev

---

**Happy de-identifying! 🏥**