# Document Extraction Testing Framework

This notebook uses the new YAML-based testing framework for document extraction validation.

## Features:
- **YAML-only configuration**: Everything controlled from `config/test_config.yaml`
- **Real Bedrock integration**: Calls Bedrock through the same shared functions used in production
- **3 test cases**: Field accuracy, blank detection, count validation
- **Multiple document types**: CECRL, CERL, RUT, RUB, ACC (when available)
- **Hierarchical controls**: Enable or disable document types, categories, individual documents, and test cases

## Usage:
- **No code changes needed**: Modify only `config/test_config.yaml`
- **Run any test combination**: Controlled by YAML configuration
- **Compare prompt versions**: Specify versions to test in YAML
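
The hierarchical controls can be pictured as a chain of checks in which a document runs only when every level above it is switched on. The following is a minimal sketch under stated assumptions: the YAML is assumed to parse into a dict with `settings`, `categories`, and `documents` sections, and `is_enabled` is a hypothetical helper, not the framework's actual API.

```python
# Hypothetical sketch: the real TestManager may resolve these flags differently.
def is_enabled(config: dict, doc_type: str, category: str, doc_key: str) -> bool:
    """Return True only if the document type, category, and document are all enabled."""
    type_on = config.get("settings", {}).get(doc_type, False)
    category_on = config.get("categories", {}).get(category, {}).get("enabled", False)
    document_on = config.get("documents", {}).get(doc_key, {}).get("enabled", False)
    return bool(type_on and category_on and document_on)


# Plain-dict stand-in for what config/test_config.yaml might parse to (assumed shape)
sample_config = {
    "settings": {"CECRL": True, "CERL": False},
    "categories": {"field_accuracy": {"enabled": True}},
    "documents": {"us_passport_venezuela": {"enabled": True}},
}

# The CECRL type is on, so the document runs; the CERL type is off, so it does not.
print(is_enabled(sample_config, "CECRL", "field_accuracy", "us_passport_venezuela"))
print(is_enabled(sample_config, "CERL", "field_accuracy", "us_passport_venezuela"))
```

Treating a missing key as disabled keeps a typo in the YAML from silently enabling a test.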

In [None]:
# =============================================================================
# NEW YAML-BASED FRAMEWORK SETUP
# =============================================================================

import sys
sys.path.append('src')

from test_manager import TestManager
import logging

# Setup logging
logging.basicConfig(level=logging.INFO)

print("🚀 Loading new YAML-based testing framework...")
print("📁 Configuration file: config/test_config.yaml")
print("🔧 No hardcoded test data - everything in YAML!")

# Initialize test manager - everything loads from YAML
tm = TestManager("config/test_config.yaml")
print("✅ Test Manager initialized with YAML configuration")

## 📋 Current Configuration

See what tests are enabled and will be executed:

In [None]:
# Show execution plan - what will actually run
print("📋 Current Execution Plan:")
tm.show_execution_plan()

print("\n📊 Enabled Test Summary:")
enabled_tests = tm.get_enabled_tests()
for test_case, documents in enabled_tests.items():
    print(f"  {test_case}: {len(documents)} documents")
    for doc in documents[:3]:  # Show first 3
        print(f"    - {doc}")
    if len(documents) > 3:
        print(f"    ... and {len(documents) - 3} more")

## 🧪 Run Field Accuracy Test

Check that specific extracted field values match their expected values (the main test case):

In [None]:
# Run field accuracy test - main test case
print("🧪 Running Field Accuracy Test...")
print("🎯 Tests: nationality fixes, name field corrections, field accuracy\n")

results = tm.run_test_case("field_accuracy_test")
tm.show_results_summary(results)
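
To make the idea of a field accuracy check concrete, here is a rough sketch of comparing extracted values against expected ones. `check_field_accuracy` and the sample values are hypothetical illustrations; the framework's own comparison logic is not shown in this notebook and may differ.

```python
# Hypothetical sketch of a field accuracy comparison (not the framework's code).
def check_field_accuracy(extracted: dict, expected: dict) -> dict:
    """Compare each expected field against the extracted value and collect mismatches."""
    mismatches = {}
    for field, want in expected.items():
        got = extracted.get(field)  # a missing field is treated as None
        if got != want:
            mismatches[field] = {"expected": want, "actual": got}
    return {"passed": not mismatches, "mismatches": mismatches}


# Made-up extraction output with the nationality error this test targets
extracted = {"nationality": "Estados Unidos", "firstName": "Maria"}
expected = {"nationality": "Venezuela", "firstName": "Maria"}

result = check_field_accuracy(extracted, expected)
print(result["passed"])
print(result["mismatches"])
```

Only fields listed in `expected` are checked, so extra extracted fields do not cause a failure in this sketch.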

## 🚀 Run All Enabled Tests

Run all enabled test cases (controlled by YAML configuration):

In [None]:
# Run all enabled tests
print("🚀 Running All Enabled Tests...")
print("📊 Testing multiple prompt versions, documents, and validation scenarios\n")

all_results = tm.run_all_enabled()
tm.show_results_summary(all_results)
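
Running several prompt versions against several documents amounts to iterating over their cross product. The sketch below shows one way to enumerate those combinations; `build_run_matrix` is a hypothetical name, and how `run_all_enabled` actually schedules work is an assumption.

```python
from itertools import product


# Hypothetical sketch: enumerate (prompt version, document) pairs to execute.
def build_run_matrix(prompt_versions: list, document_keys: list) -> list:
    """Return one run descriptor per prompt version and document combination."""
    return [
        {"prompt_version": version, "document": doc_key}
        for version, doc_key in product(prompt_versions, document_keys)
    ]


# Versions as they appear in the prompts_to_test example; the document key is a sample
runs = build_run_matrix(["v2.1.0", "v2.2.1"], ["us_passport_venezuela"])
for run in runs:
    print(run)
```

The number of model calls grows multiplicatively, which is worth keeping in mind before enabling many documents and versions at once.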

## ⚙️ Configuration Examples

The framework is controlled entirely by `config/test_config.yaml`. Here are common configuration patterns:

In [None]:
# Display common configuration examples
print("⚙️ Common Configuration Patterns:")
print("="*50)

print("\n1️⃣ Test Only CECRL Documents:")
print("""
settings:
  CECRL: true
  CERL: false
  RUT: false
""")

print("\n2️⃣ Compare Specific Prompt Versions:")
print("""
test_cases:
  field_accuracy_test:
    prompts_to_test: ["v2.1.0", "v2.2.1"]
""")

print("\n3️⃣ Focus on Specific Test Case:")
print("""
categories:
  field_accuracy:
    enabled: true
  blank_detection:
    enabled: false
  count_validation:
    enabled: false
""")

print("\n4️⃣ Test Single Document:")
print("""
documents:
  us_passport_venezuela:
    enabled: true
  others:
    enabled: false
""")

print("\n📝 To change configuration: Edit config/test_config.yaml and re-run cells!")
print("🔄 No notebook code changes needed - everything is in YAML!")
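
One pitfall with `enabled` flags is that a quoted YAML value such as `enabled: "false"` parses to a non-empty string, which a truthiness check like `cat_info.get('enabled', False)` treats as enabled. The sketch below scans an already-parsed config for such values; `find_non_boolean_flags` is a hypothetical helper and the sample dict only mimics the assumed config shape.

```python
# Hypothetical sketch: flag 'enabled' values that are not real booleans.
def find_non_boolean_flags(section: dict, path: str = "") -> list:
    """Recursively collect dotted paths whose 'enabled' value is not a bool."""
    problems = []
    for key, value in section.items():
        current = f"{path}.{key}" if path else str(key)
        if key == "enabled" and not isinstance(value, bool):
            problems.append(current)
        elif isinstance(value, dict):
            problems.extend(find_non_boolean_flags(value, current))
    return problems


# Stand-in for a parsed config; the second flag was quoted in YAML, so it is a string
parsed = {
    "categories": {
        "field_accuracy": {"enabled": True},
        "blank_detection": {"enabled": "false"},
    }
}

print(find_non_boolean_flags(parsed))
```

Running a check like this right after loading the YAML would surface the problem before any test executes.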

## 📊 Test Case Details

The framework supports 3 core test cases:

In [None]:
# Show detailed information about test cases
print("📊 Framework Test Cases:")
print("="*40)

print("\n1️⃣ Field Accuracy Test:")
print("   🎯 Purpose: Test wrong/incomplete/swapped field data")
print("   ✅ Validation: Compare extracted vs expected specific values")
print("   📋 Example: nationality should be 'Venezuela' not 'Estados Unidos'")

print("\n2️⃣ Blank Detection Test:")
print("   🎯 Purpose: Document has data but model returns blank/empty")
print("   ✅ Validation: Uses schema to check minimum required fields extracted")
print("   📋 Example: Should extract firstName but model returns empty")

print("\n3️⃣ Count Validation Test:")
print("   🎯 Purpose: Should find N entities but finds different count")
print("   ✅ Validation: Count items in arrays (like relatedParties)")
print("   📋 Example: Should find 5 people, found 3")

print("\n🔧 Current Status:")
config = tm.config_loader
categories = config.get_categories()
for cat_name, cat_info in categories.items():
    status = "✅ ENABLED" if cat_info.get('enabled', False) else "❌ DISABLED"
    print(f"   {cat_name}: {status}")

print("\n📝 To enable/disable: Modify categories section in test_config.yaml")
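
The blank detection and count validation checks described above can be sketched in a few lines. Both functions are hypothetical illustrations: the framework reads its required fields from a schema, whereas here they are passed in directly, and the sample data is made up.

```python
# Hypothetical sketches of the two checks (not the framework's implementation).
def check_blank_detection(extracted: dict, required_fields: list) -> list:
    """Return the required fields that are missing, None, or blank strings."""
    blanks = []
    for field in required_fields:
        value = extracted.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            blanks.append(field)
    return blanks


def check_count(extracted: dict, array_field: str, expected_count: int) -> dict:
    """Compare the length of an array field with the expected entity count."""
    actual = len(extracted.get(array_field) or [])
    return {"expected": expected_count, "actual": actual, "passed": actual == expected_count}


# Made-up extraction: firstName came back empty and only 3 of 5 people were found
extracted = {
    "firstName": "",
    "relatedParties": [{"name": "A"}, {"name": "B"}, {"name": "C"}],
}

print(check_blank_detection(extracted, ["firstName"]))
print(check_count(extracted, "relatedParties", 5))
```

Whitespace-only strings count as blank in this sketch, since a model that returns `" "` has not really extracted a value.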

## 🎯 Framework Features

Key advantages of the new YAML-based approach:

In [None]:
# Show framework capabilities and status
print("🎯 Framework Features & Status:")
print("="*45)

print("\n✅ YAML-Only Configuration")
print("   📁 File: config/test_config.yaml")
print("   🔧 No hardcoded test data in notebooks")

print("\n✅ Real Bedrock Integration")
print("   🤖 Model: us.amazon.nova-pro-v1:0")
print("   📡 Uses production shared functions")
print("   💾 Real S3 document downloads")

print("\n✅ Hierarchical Enable/Disable")
print("   🏢 Document types: CECRL, CERL, RUT, RUB, ACC")
print("   📂 Categories: field_accuracy, blank_detection, count_validation")
print("   📄 Individual documents and test cases")

print("\n✅ Multi-Version Prompt Testing")
# Prompt versions configured for the field accuracy test
test_cases = tm.config_loader.get_test_cases()
versions = test_cases.get('field_accuracy_test', {}).get('prompts_to_test', [])
print(f"   📝 Available versions: {versions}")

print("\n✅ Schema-Based Validation")
print("   🔍 Field accuracy: Compare extracted vs expected values")
print("   🚫 Blank detection: Check required fields extracted")
print("   🔢 Count validation: Verify entity counts")

print("\n📊 Current Configuration:")
settings = tm.config_loader.get_settings()
for doc_type in ['CECRL', 'CERL', 'RUT', 'RUB', 'ACC']:
    status = "✅" if settings.get(doc_type, False) else "❌"
    print(f"   {status} {doc_type}")

print("\n🎮 Next Steps:")
print("   1. Modify config/test_config.yaml for different scenarios")
print("   2. Re-run notebook cells to test changes")
print("   3. Add new document types when test data available")
print("   4. No code changes needed - pure YAML configuration!")

In [None]:
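# Load latest comparison report
import json
from pathlib import Path

# Matches the report location listed under "File Outputs"
COMPARISON_DIR = Path("outputs/comparison")

# NOTE: print_summary is not defined in this notebook; it is assumed to come
# from the framework's reporting helpers and must be imported before this cell runs.
report_path = COMPARISON_DIR / "comparison_report.json"
if report_path.exists():
    with open(report_path) as f:
        report = json.load(f)

    print_summary(report)
else:
    print("❌ No comparison report found. Run tests first.")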

## 🔍 Detailed Field Analysis

Examine specific field changes in detail:

In [None]:
# Detailed analysis of field changes
def show_detailed_changes(document_key: str):
    """Show detailed field changes for a specific document."""
    report_path = COMPARISON_DIR / "comparison_report.json"
    if not report_path.exists():
        print("❌ No comparison report found. Run tests first.")
        return

    with open(report_path) as f:
        report = json.load(f)

    if document_key not in report["document_comparisons"]:
        print(f"❌ Document '{document_key}' not found in report")
        return

    comparison = report["document_comparisons"][document_key]

    print(f"\n🔍 Detailed Analysis: {comparison['document_name']}")
    print("="*50)

    print(f"📄 Description: {comparison['description']}")
    print(f"🎯 Expected fixes: {comparison['expected_fixes']}")

    print("\n📊 Field-by-Field Comparison:")
    print("-"*30)

    for field, change_data in comparison["field_changes"].items():
        if change_data.get("changed", False):
            print(f"🔄 {field}:")
            print(f"   v2.0.0: '{change_data['old_value']}'")
            print(f"   v2.1.0: '{change_data['new_value']}'")
            print()
        else:
            print(f"✅ {field}: '{change_data['value']}' (unchanged)")

# Example usage:
show_detailed_changes("us_passport_venezuela")
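
The `field_changes` entries read by `show_detailed_changes` have two shapes: changed fields carry `old_value` and `new_value`, unchanged fields carry `value`. As an illustration of how such a structure could be produced from two extraction results, here is a minimal sketch; `diff_fields` is a hypothetical helper and the sample values are made up, so the framework's report builder may work differently.

```python
# Hypothetical sketch: build a field_changes-style dict from two extraction results.
def diff_fields(old: dict, new: dict) -> dict:
    """Describe, field by field, whether the value changed between two results."""
    changes = {}
    for field in sorted(set(old) | set(new)):
        old_value, new_value = old.get(field), new.get(field)
        if old_value != new_value:
            changes[field] = {"changed": True, "old_value": old_value, "new_value": new_value}
        else:
            changes[field] = {"changed": False, "value": new_value}
    return changes


# Made-up results for the older and newer prompt versions
before = {"nationality": "Estados Unidos", "firstName": "Maria"}
after = {"nationality": "Venezuela", "firstName": "Maria"}

changes = diff_fields(before, after)
for field, change in changes.items():
    print(field, change)
```

Fields present in only one result show up as changed, with `None` standing in for the missing side.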

## 📁 File Outputs

All test results are saved to:

- `outputs/batch_results.json` - Raw test results
- `outputs/comparison/comparison_report.json` - Formatted comparison report
- `outputs/before/` - v2.0.0 results by document
- `outputs/after/` - v2.1.0 results by document
- `outputs/test_log.txt` - Execution logs
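
For ad hoc inspection, the per-document results can be loaded into a dict keyed by document. The sketch below assumes each result is stored as `<document_key>.json` inside `outputs/before/` or `outputs/after/`; that file naming is an assumption, and `load_results` is a hypothetical helper. It runs against a temporary directory so it does not depend on real outputs.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical sketch: assumes one <document_key>.json file per document.
def load_results(directory: Path) -> dict:
    """Map each document key (the file stem) to its parsed JSON result."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("*.json"))
    }


# Demo against a throwaway directory that mimics outputs/before/
with tempfile.TemporaryDirectory() as tmp:
    before_dir = Path(tmp) / "before"
    before_dir.mkdir()
    sample = {"nationality": "Estados Unidos"}  # made-up result
    (before_dir / "us_passport_venezuela.json").write_text(
        json.dumps(sample), encoding="utf-8"
    )
    loaded = load_results(before_dir)

print(loaded)
```

With real data, the same call would be `load_results(Path("outputs/before"))`, provided the naming assumption holds.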

## 🔧 Adding New Test Documents

To test additional documents, modify `TEST_DOCUMENTS` in `config.py`:

```python
TEST_DOCUMENTS["new_document"] = {
    "name": "Description",
    "description": "What should be fixed",
    "s3_path": "s3://bucket/path/file.pdf", 
    "expected_fixes": {"field": "expected_value"}
}
```
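
Before adding an entry, it can help to sanity-check its shape. The following sketch validates the four keys shown in the example above and the form of `s3_path`; `validate_test_document` and `REQUIRED_KEYS` are hypothetical names, and the framework may enforce different or additional rules.

```python
from urllib.parse import urlparse

# Keys taken from the example entry above; treating all four as required is an assumption
REQUIRED_KEYS = {"name", "description", "s3_path", "expected_fixes"}


def validate_test_document(entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means the entry looks valid."""
    errors = []
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    parsed = urlparse(entry.get("s3_path", ""))
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        errors.append("s3_path must look like s3://bucket/key")
    if not isinstance(entry.get("expected_fixes"), dict):
        errors.append("expected_fixes must be a dict")
    return errors


good_entry = {
    "name": "Description",
    "description": "What should be fixed",
    "s3_path": "s3://bucket/path/file.pdf",
    "expected_fixes": {"field": "expected_value"},
}
bad_entry = {"name": "Description", "s3_path": "bucket/path/file.pdf"}

print(validate_test_document(good_entry))
print(validate_test_document(bad_entry))
```

Returning every problem at once, rather than raising on the first, makes it quicker to fix a new entry in one pass.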