<img src="https://raw.githubusercontent.com/imohealth/solution-engineering/refs/heads/updates/Ambient%20AI%20Solution/PythonNotebooks/static/imo_health.png?token=GHSAT0AAAAAADSTDGZJSTP4JZ4FRZXRUZUW2LYZ3WA" alt="IMO Health Logo" width="300"/>

---

## Setup and Configuration

Import the required libraries and the project configuration. The SOAP note from Step 1 is loaded in the next section.

In [13]:
import sys
import os

# Add parent directory to path
sys.path.append(os.path.dirname(os.path.abspath('')))

import json
import requests
from typing import Dict, List, Any
from datetime import datetime

# Import configuration
import config

print("✓ Libraries imported successfully")

## Load SOAP Note from Step 1

Load the SOAP note generated in the previous step.

In [14]:
# Load SOAP note from Step 1
soap_file = 'soap_note_output.json'

try:
    with open(soap_file, 'r') as f:
        soap_note = json.load(f)
    
    print("✓ SOAP note loaded successfully")
    print(f"\nSOAP Note Sections:")
    print(f"  Subjective: {len(soap_note['subjective'])} characters")
    print(f"  Objective: {len(soap_note['objective'])} characters")
    print(f"  Assessment: {len(soap_note['assessment'])} characters")
    print(f"  Plan: {len(soap_note['plan'])} characters")
    
except FileNotFoundError:
    print(f"✗ Error: {soap_file} not found")
    print("  Please run Step 1 notebook first to generate the SOAP note")
    raise  # Stop here: every later cell depends on soap_note
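
The cells that follow index `soap_note` by its four section keys, so a missing or empty section would only surface later as a `KeyError` or as an empty extraction request. A minimal sketch of an up-front check is shown here; the helper name `missing_soap_sections` and the sample dictionary are illustrative and not part of the original notebook.

```python
REQUIRED_SECTIONS = ("subjective", "objective", "assessment", "plan")

def missing_soap_sections(note):
    """Return the required SOAP section names that are absent or empty."""
    return [
        name for name in REQUIRED_SECTIONS
        if not isinstance(note.get(name), str) or not note[name].strip()
    ]

# Hypothetical sample data, for illustration only
sample_note = {
    "subjective": "Chest pain for two hours.",
    "objective": "BP 150/90.",
    "assessment": "",
}

print(missing_soap_sections(sample_note))
```

Running the same check on the loaded `soap_note` before building the extraction text would make a malformed Step 1 output fail early with a readable message.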

## Prepare Text for Entity Extraction

We'll extract entities from the **Assessment** and **Plan** sections, because they hold the diagnoses and the planned treatments, tests, and medications.

In [15]:
# Combine Assessment and Plan sections
assessment_plan_text = soap_note['assessment'] + '\n\n' + soap_note['plan']

print("Text for Entity Extraction:")
print("=" * 80)
print(assessment_plan_text)
print("=" * 80)
print(f"\nTotal length: {len(assessment_plan_text)} characters")

## Configure IMO API Access

Set up authentication for the IMO Entity Extraction API.

In [None]:
import time

class IMOAuthenticator:
    """Handle IMO API authentication."""
    
    def __init__(self):
        self.auth_url = getattr(config, 'imo_auth_url', "https://auth.imohealth.com/oauth/token")
        self.client_id = config.imo_entity_extraction_client_id
        # Use the Entity Extraction secret that pairs with the client ID above
        self.client_secret = config.imo_entity_extraction_client_secret
        self.access_token = None
        self.token_expiry = None
    
    def get_access_token(self):
        """Get or refresh OAuth access token."""
        # Return cached token if still valid
        if self.access_token and self.token_expiry and time.time() < self.token_expiry:
            return self.access_token
        
        print("Requesting new access token from IMO OAuth endpoint...")
        
        headers = {'Content-Type': 'application/json'}
        payload = {
            'grant_type': 'client_credentials',
            'client_id': self.client_id,
            'client_secret': self.client_secret,
            'audience': 'https://api.imohealth.com'
        }
        
        try:
            response = requests.post(self.auth_url, headers=headers, json=payload, timeout=30)
            
            if response.status_code == 200:
                result = response.json()
                self.access_token = result.get('access_token')
                expires_in = result.get('expires_in', 3600)
                self.token_expiry = time.time() + expires_in - 60
                print(f"✓ Access token obtained (expires in {expires_in}s)")
                return self.access_token
            else:
                print(f"✗ OAuth Error: {response.status_code} - {response.text}")
                return None
        except Exception as e:
            print(f"✗ Error getting access token: {str(e)}")
            return None

# Initialize authenticator
authenticator = IMOAuthenticator()
print("✓ IMO Authenticator initialized")
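
The caching rule in `get_access_token` (reuse the token until 60 seconds before it expires) can be exercised without network access. The sketch below isolates that rule; `CachedToken`, `fetch`, and `clock` are hypothetical names introduced for illustration, and the fake fetch function stands in for the OAuth request.

```python
import time

class CachedToken:
    """Cache a token and refresh it shortly before it expires."""

    def __init__(self, fetch, clock=time.time, safety_margin=60):
        self._fetch = fetch          # callable returning (token, expires_in_seconds)
        self._clock = clock          # injectable so tests can control time
        self._margin = safety_margin
        self._token = None
        self._expiry = 0.0

    def get(self):
        # Serve the cached token while it is still inside its validity window
        if self._token is not None and self._clock() < self._expiry:
            return self._token
        self._token, expires_in = self._fetch()
        self._expiry = self._clock() + expires_in - self._margin
        return self._token

# Fake fetch function and clock, for illustration only
calls = []

def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}", 3600

now = [1000.0]
cache = CachedToken(fake_fetch, clock=lambda: now[0])

first = cache.get()    # triggers a fetch
second = cache.get()   # served from the cache
now[0] += 3541         # move past the 3540 s validity window
third = cache.get()    # triggers a second fetch

print(first, second, third, len(calls))
```

Injecting the clock keeps the expiry arithmetic testable; the notebook's class calls `time.time()` directly, which is simpler but harder to verify.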

## Extract Entities with IMO API

Call the IMO Entity Extraction API to identify medical entities in the text.
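
Before calling the live API, it can help to see the shape of data the parsing code expects. The sketch below uses a hypothetical response whose fields (`text`, `assertion`, `semantic`, `begin`, `end`) mirror the ones this step reads from each entity; the sample values and the helper `group_present_entities` are assumptions for illustration, not documented API output.

```python
# Hypothetical response, shaped after the fields this step reads from each entity
mock_response = {
    "entities": [
        {"text": "hypertension", "assertion": "present", "semantic": "problem", "begin": 0, "end": 12},
        {"text": "chest pain", "assertion": "absent", "semantic": "problem", "begin": 25, "end": 35},
        {"text": "lisinopril", "assertion": "present", "semantic": "medication", "begin": 44, "end": 54},
    ]
}

def group_present_entities(response):
    """Group entities asserted as present by their lower-cased semantic type."""
    grouped = {}
    for entity in response.get("entities", []):
        if entity.get("assertion", "").lower() != "present":
            continue  # negated, historical, or otherwise non-present mentions are dropped
        semantic = entity.get("semantic", "").lower()
        grouped.setdefault(semantic, []).append(entity["text"])
    return grouped

print(group_present_entities(mock_response))
```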

In [None]:
def extract_entities_with_context(text, context_chars=200):
    """
    Extract medical entities from text using IMO Entity Extraction API.
    
    Args:
        text (str): Medical text to analyze
        context_chars (int): Number of characters to capture around each entity
        
    Returns:
        dict: Categorized entities with context
    """
    # Get access token
    access_token = authenticator.get_access_token()
    if not access_token:
        raise RuntimeError("Failed to obtain IMO API access token")
    
    # API endpoint
    entity_extraction_url = getattr(
        config,
        'imo_entity_extraction_url',
        "https://api.imohealth.com/entityextraction/pipelines/imo-clinical-comprehensive"
    )
    
    # Prepare request
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {access_token}'
    }
    
    payload = {'text': text}
    
    print(f"\nCalling IMO Entity Extraction API...")
    print(f"Endpoint: {entity_extraction_url}")
    print(f"Text length: {len(text)} characters")
    
    try:
        response = requests.post(
            entity_extraction_url,
            headers=headers,
            json=payload,
            timeout=30
        )
        
        if response.status_code == 200:
            result = response.json()
            print(f"✓ Successfully extracted entities")
            print(f"  Total entities found: {len(result.get('entities', []))}")
            
            # Parse and categorize entities
            return parse_entities_with_context(result, text, context_chars)
        else:
            print(f"✗ API Error: {response.status_code} - {response.text}")
            raise Exception(f"API returned status {response.status_code}")
            
    except Exception as e:
        print(f"✗ Error calling Entity Extraction API: {str(e)}")
        raise

def extract_context(text, offset, length, context_window=200):
    """
    Extract context around an entity in the text.
    
    Args:
        text (str): Full text
        offset (int): Starting position of entity
        length (int): Length of entity
        context_window (int): Number of characters before/after to include
        
    Returns:
        str: Context string
    """
    if not text:
        return ""
    
    start = max(0, offset - context_window)
    end = min(len(text), offset + length + context_window)
    
    context = text[start:end].strip()
    
    # Add ellipsis if context is truncated
    if start > 0:
        context = "..." + context
    if end < len(text):
        context = context + "..."
    
    return context

def parse_entities_with_context(api_response, original_text, context_chars=200):
    """
    Parse API response and extract context around each entity.
    
    Args:
        api_response (dict): IMO API response
        original_text (str): Original text for context extraction
        context_chars (int): Characters to capture around entity
        
    Returns:
        dict: Categorized entities with context
    """
    entities = {
        'problems': [],
        'procedures': [],
        'medications': [],
        'labs': []
    }
    
    # Ignore generic/administrative terms
    ignore_patterns = [
        'review test results', 'patient education', 'lifestyle',
        'education', 'review', 'follow-up', 'follow up',
        'appointment', 'monitoring', 'discussion', 'counseling',
        'instructions', 'recommendations', 'assessment', 'plan'
    ]
    
    for entity in api_response.get('entities', []):
        # Only include entities with assertion "present"
        assertion = entity.get('assertion', '').lower()
        if assertion != 'present':
            print(f"  Skipping entity '{entity.get('text', '')}' with assertion '{assertion}'")
            continue
        
        # Skip generic entities (whole-word match, so 'plan' does not hit 'transplant')
        entity_text = entity.get('text', '').lower().strip()
        if any(f" {pattern} " in f" {entity_text} " for pattern in ignore_patterns):
            print(f"  Ignoring generic entity: '{entity.get('text', '')}'")
            continue
        
        # Get entity position
        category = entity.get('semantic', '').lower()
        offset = entity.get('begin', 0)
        end_offset = entity.get('end', 0)
        length = end_offset - offset
        
        # Extract context around the entity
        context = extract_context(original_text, offset, length, context_window=context_chars)
        
        # Extract IMO code from codemaps
        imo_code = ''
        imo_description = ''
        confidence = 0.0
        
        if 'codemaps' in entity and 'imo' in entity['codemaps']:
            imo_data = entity['codemaps']['imo']
            imo_code = imo_data.get('lexical_code', '')
            imo_description = imo_data.get('lexical_title', '')
            confidence = float(imo_data.get('confidence', 0.0))
        
        # Create entity record
        entity_record = {
            'text': entity.get('text', ''),
            'code': imo_code,
            'code_system': 'IMO',
            'description': imo_description,
            'offset': offset,
            'length': length,
            'confidence': confidence,
            'context': context,
            'context_length': len(context),
            'entity_id': entity.get('id', ''),
            'semantic': entity.get('semantic', ''),
            'assertion': assertion,
            'codemaps': entity.get('codemaps', {})
        }
        
        # Map categories based on semantic type
        if 'problem' in category or 'condition' in category or 'diagnosis' in category:
            entities['problems'].append(entity_record)
        elif 'procedure' in category:
            entities['procedures'].append(entity_record)
        elif 'medication' in category or 'drug' in category:
            entities['medications'].append(entity_record)
        elif 'lab' in category or 'observation' in category or 'test' in category:
            entities['labs'].append(entity_record)
        else:
            # Unknown categories are logged and skipped rather than defaulted to problems
            print(f"  Unknown category '{category}' for entity '{entity.get('text', '')}'")
    
    return entities

# Extract entities
extracted_entities = extract_entities_with_context(assessment_plan_text, context_chars=200)

print("\n" + "=" * 80)
print("ENTITY EXTRACTION SUMMARY")
print("=" * 80)
print(f"Problems: {len(extracted_entities['problems'])}")
print(f"Procedures: {len(extracted_entities['procedures'])}")
print(f"Medications: {len(extracted_entities['medications'])}")
print(f"Labs: {len(extracted_entities['labs'])}")
print(f"\nTotal entities: {sum(len(v) for v in extracted_entities.values())}")
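
Substring matching against an ignore list can catch genuine clinical terms: `'plan'` occurs inside `'transplant'`, for example, which is why whole-word matching is the safer choice for this filter. The sketch below compares the two approaches on a shortened pattern list; the function names and sample terms are illustrative only.

```python
ignore_patterns = ["plan", "review", "follow-up"]

def matches_substring(entity_text, patterns):
    """True if any pattern occurs anywhere inside the entity text."""
    return any(p in entity_text for p in patterns)

def matches_whole_words(entity_text, patterns):
    """True only if a pattern appears as complete, space-delimited words."""
    padded = f" {entity_text} "
    return any(f" {p} " in padded for p in patterns)

results = {}
for term in ["kidney transplant", "care plan", "medication review"]:
    results[term] = (
        matches_substring(term, ignore_patterns),
        matches_whole_words(term, ignore_patterns),
    )
    print(term, results[term])
```

Only the whole-word check keeps "kidney transplant" while still filtering the two administrative phrases. Space padding assumes entity text is delimited by spaces; text with attached punctuation would need a regular expression with word boundaries instead.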

## Display Extracted Entities with Context

Show all extracted entities organized by category, including their contextual information.

In [None]:
def display_entities(entities_dict):
    """Display entities in a readable format."""
    
    for category, entity_list in entities_dict.items():
        if not entity_list:
            continue
        
        print("\n" + "=" * 80)
        print(f"{category.upper()} ({len(entity_list)} entities)")
        print("=" * 80)
        
        for i, entity in enumerate(entity_list, 1):
            print(f"\n{i}. {entity['text']}")
            print(f"   Semantic: {entity.get('semantic', 'N/A')}")
            print(f"   Assertion: {entity['assertion']}")
            print(f"   Confidence: {entity.get('confidence', 0):.2f}")
            
            if entity.get('code'):
                print(f"   IMO Code: {entity['code']}")
            if entity.get('description'):
                print(f"   IMO Description: {entity['description']}")
            
            print(f"\n   Context ({entity['context_length']} chars):")
            print(f"   {entity['context']}")
            
            # Display codemaps if available
            if entity.get('codemaps'):
                print(f"\n   Available Code Systems:")
                for system in entity['codemaps'].keys():
                    print(f"     - {system.upper()}")
            
            print("-" * 80)

# Display all entities
display_entities(extracted_entities)

## Analyze Context Quality

Evaluate the quality and usefulness of extracted contexts.

In [None]:
import statistics

def analyze_context_quality(entities_dict):
    """Analyze the quality of extracted contexts."""
    
    all_contexts = []
    for entity_list in entities_dict.values():
        for entity in entity_list:
            all_contexts.append(entity['context_length'])
    
    if all_contexts:
        print("Context Quality Analysis:")
        print("=" * 80)
        print(f"Total entities with context: {len(all_contexts)}")
        print(f"Average context length: {statistics.mean(all_contexts):.1f} characters")
        print(f"Median context length: {statistics.median(all_contexts):.1f} characters")
        print(f"Min context length: {min(all_contexts)} characters")
        print(f"Max context length: {max(all_contexts)} characters")
        print(f"\nContext spans roughly {statistics.mean(all_contexts)/2:.0f} chars on each side of the entity midpoint (lengths include the entity text and any '...' markers)")
    else:
        print("No entities found")

analyze_context_quality(extracted_entities)

## Save Extracted Entities

Save the entities with context to a JSON file for use in Step 3 (Normalization).

In [None]:
# Create output structure
entities_output = {
    'entities': extracted_entities,
    'extraction_metadata': {
        'total_entities': sum(len(v) for v in extracted_entities.values()),
        'problems_count': len(extracted_entities['problems']),
        'procedures_count': len(extracted_entities['procedures']),
        'medications_count': len(extracted_entities['medications']),
        'labs_count': len(extracted_entities['labs']),
        'context_chars': 200,
        'extracted_at': datetime.now().isoformat(),
        'source_text_length': len(assessment_plan_text)
    }
}

# Save to file
output_file = 'extracted_entities_output.json'
with open(output_file, 'w') as f:
    json.dump(entities_output, f, indent=2)

print(f"✓ Extracted entities saved to: {output_file}")
print(f"\nOutput includes:")
print(f"  - {entities_output['extraction_metadata']['total_entities']} entities")
print(f"  - Context for each entity (up to 200 chars on each side)")
print(f"  - IMO codes and titles")
print(f"  - Standard code mappings (ICD-10, SNOMED, etc.)")
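
A consumer of this file, such as the Step 3 notebook, could verify it before use. The sketch below writes a small hypothetical file with the same top-level layout as `entities_output` to a temporary directory and reads it back, checking the stored count against the data; `load_entities` and the sample content are assumptions for illustration.

```python
import json
import os
import tempfile

def load_entities(path):
    """Load saved entities and check the stored total against the actual count."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    actual = sum(len(items) for items in data["entities"].values())
    expected = data["extraction_metadata"]["total_entities"]
    if actual != expected:
        raise ValueError(f"Expected {expected} entities, found {actual}")
    return data["entities"]

# Hypothetical file content with the same top-level layout as entities_output
sample = {
    "entities": {
        "problems": [{"text": "hypertension"}],
        "procedures": [],
        "medications": [{"text": "lisinopril"}],
        "labs": [],
    },
    "extraction_metadata": {"total_entities": 2},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extracted_entities_output.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=2)
    loaded = load_entities(path)

non_empty = sorted(name for name, items in loaded.items() if items)
print(non_empty)
```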

## Summary

### What We Accomplished

1. ✓ Loaded SOAP note from Step 1
2. ✓ Extracted Assessment and Plan sections for entity analysis
3. ✓ Called IMO Entity Extraction API
4. ✓ Categorized entities into problems, procedures, medications, and labs
5. ✓ Captured up to 200 characters of context on each side of each entity
6. ✓ Filtered out generic/administrative terms
7. ✓ Saved entities with context for normalization

### Context Examples

Context helps with:
- **Clinical relevance**: "chest pain" with context "heavy pressure radiating to left arm" vs. "resolved chest pain"
- **Specificity**: "diabetes" with context "type 2 diabetes mellitus for 8 years"
- **Relationships**: "aspirin" with context "aspirin 325mg chewed immediately"
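
The effect of context on a bare entity can be seen with a small, self-contained version of the windowing logic used in this step. The sentence below is a hypothetical example, and `context_window` is an illustrative name that reproduces the same clamp-and-ellipsis behaviour with a much smaller window.

```python
def context_window(text, begin, end, window):
    """Return up to `window` characters on each side of text[begin:end]."""
    start = max(0, begin - window)
    stop = min(len(text), end + window)
    snippet = text[start:stop].strip()
    # Mark truncation so readers know the snippet is partial
    if start > 0:
        snippet = "..." + snippet
    if stop < len(text):
        snippet = snippet + "..."
    return snippet

# Hypothetical sentence, for illustration only
note = "Patient reports chest pain, described as heavy pressure radiating to left arm."
begin = note.index("chest pain")
end = begin + len("chest pain")

narrow = context_window(note, begin, end, window=10)
wide = context_window(note, begin, end, window=200)
print(narrow)
print(wide)
```

With the narrow window the qualifying phrase is cut off, while the wide window returns the whole sentence without ellipses, which is the trade-off the `context_chars` parameter controls.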

### Next Steps

The extracted entities will be used in **Step 3: Normalization with Enrichment**, where we'll:
- Normalize entities to standard medical terminologies
- Enrich with additional clinical metadata
- Use IMO Precision Normalize API
- Prepare data for diagnostic specificity workflow

### Key Takeaways

- **IMO Entity Extraction** returns each medical entity with its semantic type, assertion status, text offsets, and IMO code mappings
- **Context** is crucial for understanding clinical intent
- **Assertion filtering** keeps only entities asserted as present and drops all others
- **Categorization** organizes entities by clinical domain