# Criteria Validation Testing Notebook

This notebook demonstrates how to use the criteria validation module for healthcare/insurance prior authorization validation.

The criteria validation module:
1. Processes user history documents from S3
2. Validates them against configurable criteria questions
3. Generates recommendations (Pass/Fail/Information Not Found)
4. Supports async processing with rate limiting
5. Tracks costs and metering data

> **Note**: This notebook uses AWS services including S3 and Bedrock. You need valid AWS credentials with appropriate permissions.

## 1. Install Dependencies

In [None]:
# Auto-reload modules
%load_ext autoreload
%autoreload 2

ROOTDIR="../"

# First uninstall existing package
%pip uninstall -y idp_common

# Install the IDP common package including criteria validation
%pip install -q -e "{ROOTDIR}/lib/idp_common_pkg[all]"

# Install additional dependencies including nest_asyncio for Jupyter async support
%pip install -q pydantic nest_asyncio

# Check installed version
%pip show idp_common | grep -E "Version|Location"

## 2. Import Libraries and Set Up Environment

In [None]:
import os
import json
import time
import boto3
import logging
import datetime
import asyncio
from typing import Dict, Any

# Fix for Jupyter async event loop conflicts
import nest_asyncio
nest_asyncio.apply()

# Import criteria validation module
from idp_common.criteria_validation import CriteriaValidationService, CriteriaValidationResult

# Configure logging
logging.basicConfig(level=logging.INFO)
logging.getLogger('idp_common.criteria_validation').setLevel(logging.DEBUG)
logging.getLogger('idp_common.bedrock').setLevel(logging.INFO)

# Set environment variables
os.environ['METRIC_NAMESPACE'] = 'IDP-CriteriaValidation-Test'
os.environ['AWS_REGION'] = boto3.session.Session().region_name or 'us-east-1'

# Get AWS account ID for unique bucket names
sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()["Account"]
region = os.environ['AWS_REGION']

# Create unique bucket names
user_history_bucket = os.getenv("CRITERIA_USER_HISTORY_BUCKET", f"criteria-validation-user-history-{account_id}-{region}")
criteria_bucket = os.getenv("CRITERIA_BUCKET", f"criteria-validation-criteria-{account_id}-{region}")
output_bucket = os.getenv("CRITERIA_OUTPUT_BUCKET", f"criteria-validation-output-{account_id}-{region}")

print("Environment setup:")
print(f"AWS_REGION: {os.environ.get('AWS_REGION')}")
print(f"User History bucket: {user_history_bucket}")
print(f"Criteria bucket: {criteria_bucket}")
print(f"Output bucket: {output_bucket}")
print("\n✅ Async event loop patched for Jupyter compatibility")

## 3. Set Up S3 Buckets

In [None]:
# Create S3 client
s3_client = boto3.client('s3')

# Function to create bucket if it doesn't exist
def ensure_bucket_exists(bucket_name):
    try:
        s3_client.head_bucket(Bucket=bucket_name)
        print(f"Bucket {bucket_name} already exists")
    except Exception:
        try:
            if region == 'us-east-1':
                s3_client.create_bucket(Bucket=bucket_name)
            else:
                s3_client.create_bucket(
                    Bucket=bucket_name,
                    CreateBucketConfiguration={'LocationConstraint': region}
                )
            print(f"Created bucket: {bucket_name}")
            
            # Wait for bucket to be accessible
            waiter = s3_client.get_waiter('bucket_exists')
            waiter.wait(Bucket=bucket_name)
        except Exception as e:
            print(f"Error creating bucket {bucket_name}: {str(e)}")
            raise

# Ensure all buckets exist
ensure_bucket_exists(user_history_bucket)
ensure_bucket_exists(criteria_bucket)
ensure_bucket_exists(output_bucket)

## 4. Create Sample Data

In [None]:
# Create sample user history
sample_user_history = """Patient: John Doe
Date: 2024-01-15

Medical History:
The patient has been diagnosed with rheumatoid arthritis (RA) and has failed treatment with methotrexate and two TNF inhibitors. 
The treating physician, Dr. Sarah Johnson, has recommended starting immunotherapy with infliximab.

Treatment Plan:
- Infliximab will be administered at the infusion center under direct supervision of trained medical staff
- The facility is equipped with emergency response equipment including epinephrine for anaphylaxis treatment
- Initial dose: 3 mg/kg at 0, 2, and 6 weeks, then every 8 weeks
- Pre-medication with antihistamines and corticosteroids as per protocol

Facility Information:
The treatment will be provided at Memorial Hospital Infusion Center, which has 24/7 emergency support and trained nursing staff.
"""

# Create sample criteria
sample_criteria = {
    "criteria": [
        "Will the immunotherapy be administered under the supervision of an appropriately trained physician?",
        "Is the facility equipped to treat anaphylaxis?",
        "Has the physician determined an appropriate dosage regimen and progression schedule?",
        "Are there adequate safety protocols in place for infusion reactions?"
    ]
}

# Upload sample data
request_id = "TEST-" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
request_prefix = "Prior-Auth"

# Upload user history
user_history_key = f"{request_prefix}-{request_id}/extracted_text/patient_history.txt"
s3_client.put_object(
    Bucket=user_history_bucket,
    Key=user_history_key,
    Body=sample_user_history.encode('utf-8')
)
print(f"Uploaded user history to: s3://{user_history_bucket}/{user_history_key}")

# Upload criteria
criteria_key = "administration_requirements.json"
s3_client.put_object(
    Bucket=criteria_bucket,
    Key=criteria_key,
    Body=json.dumps(sample_criteria).encode('utf-8')
)
print(f"Uploaded criteria to: s3://{criteria_bucket}/{criteria_key}")
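
Before uploading a criteria file, it can help to check that it has the shape this notebook uses: a JSON object with a `criteria` key holding a list of question strings. The helper below is a minimal sketch of such a check. The name `load_criteria_questions` is hypothetical and is not part of `idp_common`; the module may apply its own validation.

```python
import json
from typing import List


def load_criteria_questions(raw_json: str) -> List[str]:
    """Parse a criteria document and return its questions.

    Assumes the layout used in this notebook: {"criteria": ["question", ...]}.
    """
    document = json.loads(raw_json)
    questions = document.get("criteria") if isinstance(document, dict) else None
    if not isinstance(questions, list) or not questions:
        raise ValueError('expected a non-empty list under the "criteria" key')
    for index, question in enumerate(questions):
        if not isinstance(question, str) or not question.strip():
            raise ValueError(f"criteria[{index}] must be a non-empty string")
    # Return the questions with surrounding whitespace removed
    return [question.strip() for question in questions]


example = json.dumps({"criteria": ["Is the facility equipped to treat anaphylaxis?"]})
print(load_criteria_questions(example))
```

Calling it on `json.dumps(sample_criteria)` before the `put_object` call would surface a malformed file locally rather than during validation.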

## 5. Configure Criteria Validation Service

In [None]:
# Configuration for criteria validation
validation_config = {
    # Model configuration
    "model_id": "us.anthropic.claude-3-5-sonnet-20240620-v1:0",
    "temperature": 0.0,
    "top_k": 20,
    "top_p": 0.01,
    "max_tokens": 4096,
    
    # Bucket configuration
    "request_bucket": user_history_bucket,
    "request_history_prefix": request_prefix,
    "criteria_bucket": criteria_bucket,
    "output_bucket": output_bucket,
    
    # Criteria types to validate
    "criteria_types": ["administration_requirements"],
    
    # Recommendation options
    "recommendation_options": """Pass: The requirement criteria are fully met.
Fail: The requirement is partially met or requires additional information.
Information Not Found: No relevant data exists in the user history.""",
    
    # Prompts from original system
    "system_prompt": """You are a specialized insurance evaluator tasked with determining the eligibility of insurance coverage based on a patient's user history and a set of criteria.
Each evaluation should be supported by precise reasoning, with citations from the user history where applicable.""",
    
    "task_prompt": """Consider the patient's user history information provided inside <user_history></user_history> XML tags.

<user_history>
 <source_filepath>
 {source_filepath}
 </source_filepath>

 <content>
 {content}
 </content>
</user_history>

<criteria>
<criteria_type>
{criteria_type}
</criteria_type>

<question>
{question}
</question>
</criteria>

<instruction>
Evaluate the patient's insurance eligibility for each question provided inside <question></question> XML tags and the patient's user history information provided inside <user_history></user_history> XML tags.

Your Task:

For each question, provide:

Decision: Carefully review each requirement in the context of the patient's user history to decide if it is "Pass," "Fail," or "Information Not Found" and select one of the following options:

{recommendation_options}

Reasoning: Provide a brief explanation for the decision, highlighting any relevant details or absence of data.
Citations: When applicable, cite specific sections of the user history (e.g., page numbers, sections, S3 URI) that support your decision.

JSON response format:
{{
 "criteria_type" : "question/criteria type mentioned inside <criteria_type></criteria_type> XML tags",
 "source_file" : ["list of source_filepath that supports the recommendation"],
 "question" : "question description",
 "Recommendation" : "This should be one of the following: Pass, Fail, or Information Not Found",
 "Reasoning" : "Provide a thorough explanation, reasoning, and any citations from the source_file in a single paragraph without line breaks"
}}
All fields must be included in the JSON response, even if some values are unavailable (leave them as empty strings if necessary).
Ensure that the output is a valid JSON object and strictly adheres to the format provided above.
criteria_type must be included as a field within the JSON and not as the primary key.
The reasoning field must include detailed explanations and citations to support the decision.

</instruction>

Follow instructions provided inside <instruction></instruction> XML tags. 
Provide the output in JSON format inside <response></response> XML tags. Do not include any space after the <response> tag or before the </response> tag.""",
    
    # Async processing configuration
    "criteria_validation": {
        "semaphore": 3,  # Concurrent request limit
        "max_chunk_size": 180000,  # Max tokens per chunk
        "token_size": 4,  # Average chars per token
        "overlap_percentage": 10,  # Chunk overlap %
    }
}

print("Configuration created for criteria validation")
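
The three chunking settings above interact: `max_chunk_size` is a token budget, `token_size` converts it to an approximate character count, and `overlap_percentage` controls how much consecutive chunks share. The sketch below illustrates one way those settings could translate into overlapping character windows. It is an assumption for illustration only; the module's actual chunking logic is not shown in this notebook, and `chunk_text` is a hypothetical helper.

```python
from typing import List


def chunk_text(text: str, max_chunk_size: int, token_size: int,
               overlap_percentage: int) -> List[str]:
    """Split text into character windows that overlap by a fixed percentage."""
    chunk_chars = max_chunk_size * token_size  # token budget -> approximate characters
    overlap_chars = chunk_chars * overlap_percentage // 100
    step = chunk_chars - overlap_chars  # distance between chunk start positions
    if chunk_chars <= 0 or step <= 0:
        raise ValueError("chunk size must be positive and overlap below 100%")

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Same values as the "criteria_validation" block in the configuration above
settings = {"max_chunk_size": 180000, "token_size": 4, "overlap_percentage": 10}
print([len(chunk) for chunk in chunk_text("x" * 1_000_000, **settings)])
```

With these values a chunk holds roughly 720,000 characters, so the short sample history in this notebook would fit in a single chunk.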

## 6. Run Criteria Validation (Fixed for Jupyter)

In [None]:
# Initialize the service
criteria_service = CriteriaValidationService(
    region=region,
    config=validation_config
)

print(f"\nValidating request: {request_id}")
print("This may take a few moments...")

# Run validation - now works in Jupyter thanks to nest_asyncio
start_time = time.time()
result = criteria_service.validate_request(
    request_id=request_id,
    config=validation_config
)
validation_time = time.time() - start_time

print(f"\nValidation completed in {validation_time:.2f} seconds")
print(f"Request ID: {result.request_id}")
print(f"Criteria Type: {result.criteria_type}")
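
The `semaphore` value in the configuration caps how many requests run at once. The following is a minimal sketch of that general pattern using `asyncio.Semaphore`. It is not the service's implementation: `run_limited` and `fake_evaluate` are hypothetical names, and `fake_evaluate` stands in for a real model call.

```python
import asyncio


async def run_limited(items, worker, limit):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        # Each task waits here until one of the `limit` slots is free
        async with semaphore:
            return await worker(item)

    # gather preserves the order of the input items in its result list
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_evaluate(question):
    # Stand-in for a model call; a real worker would call Bedrock here
    await asyncio.sleep(0.01)
    return {"question": question, "Recommendation": "Pass"}


questions = [f"question {n}" for n in range(5)]
results = asyncio.run(run_limited(questions, fake_evaluate, limit=3))
print(len(results))
```

Inside Jupyter, `asyncio.run` works here only because `nest_asyncio.apply()` was called earlier; without it, `await run_limited(...)` in a cell is the usual alternative.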

## 7. Display Results

In [None]:
# Display validation responses
print("\n=== Validation Results ===")

# Helper function to parse S3 URIs
def parse_s3_uri(uri):
    parts = uri.replace("s3://", "").split("/")
    bucket = parts[0]
    key = "/".join(parts[1:])
    return bucket, key

# Read results from S3
if result.metadata and 'output_uris' in result.metadata:
    for output_uri in result.metadata['output_uris']:
        print(f"\nReading results from: {output_uri}")
        
        # Parse S3 URI and read content
        bucket, key = parse_s3_uri(output_uri)
        response = s3_client.get_object(Bucket=bucket, Key=key)
        content = response['Body'].read().decode('utf-8')
        responses = json.loads(content)
        
        # Display each validation response
        for idx, response_item in enumerate(responses):
            print(f"\n--- Criteria {idx + 1} ---")
            print(f"Question: {response_item.get('question', 'N/A')}")
            print(f"Recommendation: {response_item.get('Recommendation', 'N/A')}")
            print(f"Reasoning: {response_item.get('Reasoning', 'N/A')}")
            print(f"Source Files: {response_item.get('source_file', [])}")
else:
    print("No validation results found in metadata")
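
The task prompt asks the model to wrap its JSON answer in `<response></response>` tags. The results read from S3 above are already parsed, but when debugging a raw model reply it can be useful to extract that JSON by hand. The sketch below shows one way to do so; `extract_response_json` is a hypothetical helper, and the module's own parser may behave differently.

```python
import json
import re

# Non-greedy match so only the first <response> block is captured
RESPONSE_PATTERN = re.compile(r"<response>(.*?)</response>", re.DOTALL)


def extract_response_json(model_output: str):
    """Return the JSON payload found inside the first <response> block."""
    match = RESPONSE_PATTERN.search(model_output)
    if match is None:
        raise ValueError("no <response>...</response> block found")
    return json.loads(match.group(1).strip())


sample_output = (
    'Some preamble. <response>{"criteria_type": "administration_requirements", '
    '"Recommendation": "Pass"}</response>'
)
print(extract_response_json(sample_output)["Recommendation"])
```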

## 8. Display Metering and Cost Information

In [None]:
# Display metering information with support for nested model-specific structure
print("\n=== Token Usage ===")
if result.metering:
    # First try the old flat structure for backward compatibility
    if 'total_input_tokens' in result.metering:
        print(f"Total Input Tokens: {result.metering.get('total_input_tokens', 0):,}")
        print(f"Total Output Tokens: {result.metering.get('total_output_tokens', 0):,}")
    else:
        # Handle new nested structure: {model_key: {inputTokens: X, outputTokens: Y, totalTokens: Z}}
        total_input_tokens = 0
        total_output_tokens = 0
        total_tokens = 0
        
        print("\nPer-Model Token Usage:")
        for model_key, usage in result.metering.items():
            if isinstance(usage, dict) and ('inputTokens' in usage or 'outputTokens' in usage):
                input_tokens = usage.get('inputTokens', 0)
                output_tokens = usage.get('outputTokens', 0)
                model_total = usage.get('totalTokens', input_tokens + output_tokens)
                
                # Extract model name from the key for cleaner display
                model_name = model_key.split('/')[-1] if '/' in model_key else model_key
                print(f"  {model_name}:")
                print(f"    Input Tokens: {input_tokens:,}")
                print(f"    Output Tokens: {output_tokens:,}")
                print(f"    Total Tokens: {model_total:,}")
                
                # Add to totals
                total_input_tokens += input_tokens
                total_output_tokens += output_tokens
                total_tokens += model_total
        
        print("\nTotal Across All Models:")
        print(f"  Total Input Tokens: {total_input_tokens:,}")
        print(f"  Total Output Tokens: {total_output_tokens:,}")
        print(f"  Grand Total Tokens: {total_tokens:,}")
    
    # Display per-criteria usage if available (legacy structure)
    criteria_usage = result.metering.get('criteria_usage', {})
    if criteria_usage:
        print("\nPer-Criteria Usage:")
        for criteria_type, usage in criteria_usage.items():
            print(f"  {criteria_type}:")
            print(f"    Input Tokens: {usage.get('input_tokens', 0):,}")
            print(f"    Output Tokens: {usage.get('output_tokens', 0):,}")
else:
    print("No metering data available")

# Display detailed metering structure for debugging (optional)
print("\n=== Debug: Raw Metering Data Structure ===")
if result.metering:
    print(json.dumps(result.metering, indent=2))
else:
    print("No raw metering data to display")

# Display timing information
print("\n=== Timing Information ===")
if result.metadata and 'timing' in result.metadata:
    timing = result.metadata['timing']
    print(f"Total Duration: {timing.get('total_duration', 0):.2f} seconds")
    
    # Display per-criteria timing
    criteria_timing = timing.get('criteria_processing_time', [])
    if criteria_timing:
        print("\nPer-Criteria Processing Time:")
        for item in criteria_timing:
            print(f"  {item['criteria_type']}: {item['duration']:.2f} seconds")
else:
    print("No timing data available")
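
The cell above reports token counts but not a dollar figure. A rough cost estimate can be derived from the nested metering structure, as sketched below. The helper `estimate_cost` is hypothetical, it assumes a single price pair applies to every model key, and the prices in the example are placeholders rather than actual Bedrock pricing; substitute the current rates for your model and region.

```python
from typing import Dict


def estimate_cost(metering: Dict[str, dict], price_per_1k_input: float,
                  price_per_1k_output: float) -> float:
    """Estimate cost from the layout {model_key: {inputTokens, outputTokens}}."""
    total = 0.0
    for usage in metering.values():
        if not isinstance(usage, dict):
            continue  # skip entries that are not per-model usage records
        total += usage.get("inputTokens", 0) / 1000 * price_per_1k_input
        total += usage.get("outputTokens", 0) / 1000 * price_per_1k_output
    return total


# Placeholder metering data and placeholder prices, for illustration only
example_metering = {"example-model": {"inputTokens": 2000, "outputTokens": 500}}
print(f"Estimated cost: {estimate_cost(example_metering, 1.0, 2.0):.2f}")
```

With real data, `result.metering` would replace `example_metering`, provided it uses the nested structure handled above.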

## 9. Clean Up (Optional)

In [None]:
# Function to empty a bucket and then delete it
def delete_bucket_objects(bucket_name):
    try:
        # Page through the listing: list_objects_v2 returns at most 1,000 keys per call
        paginator = s3_client.get_paginator('list_objects_v2')
        deleted_any = False
        for page in paginator.paginate(Bucket=bucket_name):
            objects = [{'Key': obj['Key']} for obj in page.get('Contents', [])]
            if objects:
                s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': objects})
                deleted_any = True
        if deleted_any:
            print(f"Deleted all objects in bucket {bucket_name}")
        else:
            print(f"Bucket {bucket_name} is already empty")

        # Delete the now-empty bucket
        s3_client.delete_bucket(Bucket=bucket_name)
        print(f"Deleted bucket {bucket_name}")
    except Exception as e:
        print(f"Error cleaning up bucket {bucket_name}: {str(e)}")

# Uncomment the following lines to delete the buckets
# print("Cleaning up resources...")
# delete_bucket_objects(user_history_bucket)
# delete_bucket_objects(criteria_bucket)
# delete_bucket_objects(output_bucket)
# print("Cleanup complete")

print("\n✅ Notebook completed successfully!")
print("Uncomment the cleanup section above to delete the test S3 buckets.")

## Conclusion

This notebook demonstrates the criteria validation module capabilities:

1. **Async Processing** - Concurrent evaluation of multiple criteria questions
2. **Rate Limiting** - Built-in semaphore control for API rate limits
3. **Chunking** - Automatic text chunking for large documents
4. **Cost Tracking** - Comprehensive token usage and metering
5. **Pydantic Validation** - Strong data validation for inputs/outputs
6. **S3 Integration** - Seamless reading/writing of validation data
7. **Jupyter Compatibility** - Fixed async event loop conflicts with nest_asyncio

Key benefits:
- **Scalability** - Process multiple criteria types concurrently
- **Reliability** - Built-in error handling and retry logic
- **Consistency** - Uses the common Bedrock client for standardized LLM interactions
- **Flexibility** - Configurable prompts and criteria types
- **Traceability** - Complete audit trail with source file citations

The module is designed for healthcare/insurance prior authorization validation but can be adapted for other business rule validation use cases.
