# Fiddler Sensitive Information Guardrail Quick Start Guide

## Goal
Learn how to detect and protect sensitive information in text data using Fiddler's Sensitive Information Guardrail, including PII, PHI, and custom entity detection.

## Sensitive Information Guardrail Observability
The [Sensitive Information Guardrail](https://docs.fiddler.ai/product-guide/llm-monitoring/guardrails#fast-personally-identifiable-information-pii-guardrails) guards against data leakage by detecting and flagging sensitive information across multiple categories. It helps organizations comply with privacy regulations such as GDPR, HIPAA, and CCPA by identifying personally identifiable information (PII), protected health information (PHI), and custom-defined sensitive entities. The guardrail integrates with the rest of the Fiddler platform, enabling monitoring, alerting, and compliance reporting for sensitive information detection across your AI and ML applications and data pipelines.

Key capabilities include real-time detection of 35+ PII entity types, 7 PHI entity types, and user-defined custom entities. The guardrail applies a 0.1 confidence threshold and keeps at most the top 1,024 entities per request, which makes it suitable for both real-time applications and batch processing workflows.
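A 0.1 threshold keeps low-confidence matches, so applications that need fewer false positives may want to filter further on the client side. The following is a minimal sketch, assuming each detected entity is a dictionary with a `score` field (as the helper functions later in this guide expect). The function name `filter_by_score`, the 0.5 cutoff, and the sample entities are illustrative and not part of the Fiddler API.

```python
from typing import Any, Dict, List


def filter_by_score(
    entities: List[Dict[str, Any]], min_score: float = 0.5
) -> List[Dict[str, Any]]:
    """Keep only entities whose confidence score is at least min_score."""
    return [e for e in entities if e.get("score", 0.0) >= min_score]


# Hypothetical entities shaped like the guardrail output
# (start, end, text, label, score); labels and scores are made up.
example_entities = [
    {"start": 4, "end": 12, "text": "John Doe", "label": "person", "score": 0.97},
    {"start": 30, "end": 35, "text": "Maple", "label": "location", "score": 0.18},
]

high_confidence = filter_by_score(example_entities, min_score=0.5)
print([e["text"] for e in high_confidence])
```

The right cutoff depends on how costly a missed detection is compared with a false alarm, so it is worth tuning against your own data.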

## About Fiddler
Fiddler is the all-in-one AI Observability and Security platform for responsible AI. Monitoring and analytics capabilities provide a common language, centralized controls, and actionable insights to operationalize production ML models, GenAI, AI agents, and LLM applications with trust. An integral part of the platform, the Fiddler Trust Service provides quality and moderation controls for LLM applications. Powered by cost-effective, task-specific, and scalable Fiddler-developed trust models—including cloud and VPC deployments for secure environments—it delivers the fastest guardrails in the industry. Fortune 500 organizations utilize Fiddler to scale LLM and ML deployments, delivering high-performance AI, reducing costs, and ensuring responsible governance.


### Getting Started

1. Connect to Fiddler
2. Define Helper Functions
3. Example: PII Detection (Default Configuration)
4. Example: PHI Detection for Healthcare Data
5. Example: Combined PII + PHI Detection
6. Example: Custom Entity Detection
7. Batch Processing for Datasets
8. Summary and Best Practices

## 0. Imports


In [None]:
%pip install -q fiddler-client

import json
import pandas as pd
import requests
import time
from typing import Any, Dict, List, Optional, Tuple
import fiddler as fdl

print(f"Running Fiddler Python client version {fdl.__version__}")

## 1. Connect to Fiddler

Before calling the Sensitive Information Guardrail, connect to your Fiddler environment using the Fiddler Python client.

---

**We need a couple of pieces of information to get started.**
1. The URL you're using to connect to Fiddler
2. Your authorization token

Your authorization token can be found by navigating to the **Credentials** tab on the **Settings** page of your Fiddler environment.

In [None]:
URL = ''  # Make sure to include the full URL (including https:// e.g. 'https://your_company_name.fiddler.ai').
TOKEN = '' 
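Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. One alternative is to read both values from environment variables. The sketch below assumes variables named `FIDDLER_URL` and `FIDDLER_TOKEN`; these names, and the helper `load_fiddler_credentials`, are chosen for this example and are not something the Fiddler client reads on its own.

```python
import os


def load_fiddler_credentials(env=os.environ):
    """Read the base URL and token from a mapping, failing early if either is missing."""
    # Strip a trailing slash so paths can be appended to the URL cleanly.
    url = env.get("FIDDLER_URL", "").rstrip("/")
    token = env.get("FIDDLER_TOKEN", "")
    missing = [
        name
        for name, value in (("FIDDLER_URL", url), ("FIDDLER_TOKEN", token))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return url, token


# Example with an explicit mapping instead of the real environment.
url, token = load_fiddler_credentials(
    {"FIDDLER_URL": "https://your_company_name.fiddler.ai/", "FIDDLER_TOKEN": "example-token"}
)
print(url)
```

In the notebook, `URL, TOKEN = load_fiddler_credentials()` would then replace the two hard-coded assignments.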

The following constants are used throughout this example notebook; change them as needed for your own data:

In [None]:
# API Configuration for Sensitive Information Guardrail
SENSITIVE_INFORMATION_URL = f"{URL}/v3/guardrails/sensitive-information"
FIDDLER_HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# Sample data for testing different entity types
SAMPLE_PII_TEXT = """
I'm John Doe and I live at 1234 Maple Street, Springfield, IL 62704. 
You can reach me at john.doe@email.com or call me at (217) 555-1234.
My social security number is 123-45-6789, and I was born on January 15, 1987.
"""

SAMPLE_PHI_TEXT = """
Patient report: John Smith was prescribed metformin for his diabetes condition.
His health insurance number is HI-987654321, and medical record shows 
serial number MED-2024-001 for his glucose monitor device.
"""

Next we use these credentials to connect to the Fiddler API:

In [None]:
# Validation for Sensitive Information Guardrail
assert TOKEN != "", "Please set your Fiddler API token"
assert URL != "", "Please set your Fiddler base URL"

# Connect to Fiddler
fdl.init(url=URL, token=TOKEN)

print("✅ Connected to Fiddler successfully!")
print(f"📡 Sensitive Information Guardrail URL: {SENSITIVE_INFORMATION_URL}")

## 2. Define Helper Functions

These helper functions will be used throughout the notebook to interact with the Sensitive Information Guardrail API.

In [None]:
def get_sensitive_information_response(
    text: str, 
    entity_categories: str | List[str] = 'PII',
    custom_entities: Optional[List[str]] = None,
) -> Tuple[Dict[str, Any], float]:
    """
    Invokes the Sensitive Information Guardrail with configurable entity detection.
    
    Args:
        text (str): Input text to analyze
        entity_categories (str | List[str]): Detection categories ('PII', 'PHI', 'Custom Entities')
        custom_entities (List[str], optional): Custom entity list when using 'Custom Entities'
    
    Returns:
        Tuple[Dict[str, Any], float]: Full API response and latency in seconds
    """
    # Prepare request payload
    data = {'input': text}
    
    # Add entity configuration if different from default
    if entity_categories != 'PII' or custom_entities:
        data['entity_categories'] = entity_categories
        if custom_entities:
            data['custom_entities'] = custom_entities
    
    start_time = time.monotonic()
    
    try:
        response = requests.post(
            SENSITIVE_INFORMATION_URL,
            headers=FIDDLER_HEADERS,
            json={'data': data},
            timeout=30,  # Fail instead of hanging indefinitely; adjust as needed
        )
        response.raise_for_status()
        return response.json(), (time.monotonic() - start_time)
    
    except (requests.exceptions.RequestException, json.JSONDecodeError) as e:
        print(f'❌ API call failed: {e}')
        return {}, (time.monotonic() - start_time)


def extract_entities(api_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """
    Extract entities from the API response.
    
    Args:
        api_response: Full API response dictionary
        
    Returns:
        List of detected entities with start, end, text, label, score
    """
    return api_response.get('fdl_sensitive_information_scores', [])


def print_detection_results(
    entities: List[Dict[str, Any]], latency: float, text: Optional[str] = None
) -> None:
    """
    Pretty print detection results.
    
    Args:
        entities: List of detected entities
        latency: API call latency in seconds
        text: Original input text (optional; currently unused, reserved for highlighting)
    """
    print(f"\n🔍 **Detection Results** (⏱️ {latency:.3f}s)")
    print(f"📊 **Total Entities Found:** {len(entities)}\n")
    
    if not entities:
        print("✅ No sensitive information detected.")
        return
    
    # Group by entity type
    by_type = {}
    for entity in entities:
        label = entity.get('label', 'unknown')
        if label not in by_type:
            by_type[label] = []
        by_type[label].append(entity)
    
    # Print grouped results
    for label, group in sorted(by_type.items()):
        print(f"🏷️  **{label.upper()}** ({len(group)} found):")
        for entity in group:
            score = entity.get('score', 0)
            text_span = entity.get('text', '')
            start = entity.get('start', 0)
            end = entity.get('end', 0)
            print(f"   • '{text_span}' (confidence: {score:.3f}, position: {start}-{end})")
        print()

print("✅ Core functions defined successfully!")
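Because each entity carries `start` and `end` character offsets, detection results can also drive redaction. The following is a sketch, not a Fiddler-provided utility: `redact_text` is a hypothetical helper, and it assumes `start` and `end` are zero-based offsets into the original string with `end` exclusive, which you should verify against real responses. The sample entities and labels are made up.

```python
from typing import Any, Dict, List


def redact_text(text: str, entities: List[Dict[str, Any]]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    redacted = text
    last_start = len(text)
    # Work from the end of the string backwards so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = entity["start"], entity["end"]
        if end > last_start:
            # Skip spans that overlap one that was already replaced.
            continue
        placeholder = f"[{entity.get('label', 'unknown').upper()}]"
        redacted = redacted[:start] + placeholder + redacted[end:]
        last_start = start
    return redacted


# Hypothetical entities; offsets were computed by hand for this sample string.
sample = "Email john.doe@email.com or call (217) 555-1234."
sample_entities = [
    {"start": 6, "end": 24, "text": "john.doe@email.com", "label": "email", "score": 0.99},
    {"start": 33, "end": 47, "text": "(217) 555-1234", "label": "phone number", "score": 0.98},
]
print(redact_text(sample, sample_entities))
```

Replacing from the end of the string first means each substitution leaves the offsets of the remaining spans unchanged, so no offset bookkeeping is needed.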


## 3. Example: PII Detection (Default Configuration)

In [None]:
# Sample text with various PII types
sample_text = """
I'm John Doe and I live at 1234 Maple Street, Springfield, IL 62704. 
You can reach me at john.doe@email.com or call me at (217) 555-1234.
My social security number is 123-45-6789, and I was born on January 15, 1987.
My credit card number is 4111 1111 1111 1111 with CVV 123.
For official documents, my passport number is X1234567.
"""

print("🧪 **Testing PII Detection (Default Configuration)**")
print("📄 Input Text:")
print(sample_text)

# Call the API with default PII configuration
response, latency = get_sensitive_information_response(sample_text)
entities = extract_entities(response)

# Display results
print_detection_results(entities, latency, sample_text)


## 4. Example: PHI Detection for Healthcare Data

In [None]:
# Sample text with PHI information
healthcare_text = """
Patient report: John Smith was prescribed metformin for his diabetes condition.
His health insurance number is HI-987654321, and medical record shows 
serial number MED-2024-001 for his glucose monitor device.
Birth certificate number is BC-IL-1987-001234.
Current medication includes aspirin and lisinopril for blood pressure management.
"""

print("🏥 **Testing PHI Detection for Healthcare Data**")
print("📄 Input Text:")
print(healthcare_text)

# Call the API with PHI configuration
response, latency = get_sensitive_information_response(
    healthcare_text, entity_categories="PHI"
)
entities = extract_entities(response)

# Display results
print_detection_results(entities, latency, healthcare_text)


## 5. Example: Combined PII + PHI Detection

In [None]:
# Sample text with both PII and PHI
combined_text = """
Patient: Sarah Johnson, DOB: 03/15/1985, SSN: 456-78-9012
Address: 5678 Oak Avenue, Chicago, IL 60611
Contact: sarah.j@email.com, (312) 555-9876
Insurance: Health Plan ID HI-555-2024, Policy BC-CH-1985-5678
Current medications: insulin, metformin
Medical condition: Type 2 diabetes, hypertension
Device serial number: GLU-2024-789
"""

print("🔄 **Testing Combined PII + PHI Detection**")
print("📄 Input Text:")
print(combined_text)

# Call the API with combined configuration
response, latency = get_sensitive_information_response(
    combined_text, 
    entity_categories=['PII', 'PHI']
)
entities = extract_entities(response)

# Display results
print_detection_results(entities, latency, combined_text)


## 6. Example: Custom Entity Detection

In [None]:
# Sample text with custom entities
custom_text = """
Employee ID: EMP-2024-001, Badge Number: BD-789456
Project code: PROJ-AI-2024, Server hostname: srv-prod-01
API key: sk-abc123xyz789, Database connection string: mysql://user:pass@host
Internal ticket: TICK-2024-5678, Customer reference: CUST-VIP-001
"""

# Define custom entities for this organization
custom_entities = [
    'employee id',
    'badge number', 
    'project code',
    'api key',
    'server hostname',
    'database connection',
    'ticket number',
    'customer reference'
]

print("🎯 **Testing Custom Entity Detection**")
print(f"🏷️ Custom entities: {custom_entities}")
print("📄 Input Text:")
print(custom_text)

# Call the API with custom entity configuration
response, latency = get_sensitive_information_response(
    custom_text, 
    entity_categories='Custom Entities',
    custom_entities=custom_entities
)
entities = extract_entities(response)

# Display results
print_detection_results(entities, latency, custom_text)
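The three configurations shown so far differ only in the JSON body sent to the endpoint. As a sketch that mirrors the payload logic inside `get_sensitive_information_response` (the function name `build_payload` is introduced here for illustration), the request bodies can be inspected without making a network call:

```python
import json
from typing import Any, Dict, List, Optional, Union


def build_payload(
    text: str,
    entity_categories: Union[str, List[str]] = "PII",
    custom_entities: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build the request body the same way the helper in this guide does."""
    data: Dict[str, Any] = {"input": text}
    # The helper omits entity_categories for the default PII-only case.
    if entity_categories != "PII" or custom_entities:
        data["entity_categories"] = entity_categories
        if custom_entities:
            data["custom_entities"] = custom_entities
    return {"data": data}


print(json.dumps(build_payload("some text"), indent=2))
print(json.dumps(build_payload("some text", ["PII", "PHI"]), indent=2))
print(json.dumps(build_payload("some text", "Custom Entities", ["employee id"]), indent=2))
```

Printing the body this way is a quick check that a custom entity list is actually being sent before debugging unexpected detection results.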


## 7. Batch Processing for Datasets

To process multiple texts, the function below calls the guardrail once per text and collects the results in a DataFrame:


In [None]:
def process_dataset_batch(
    texts: List[str], 
    entity_categories: str | List[str] = 'PII',
    custom_entities: Optional[List[str]] = None,
    max_records: int = 10
) -> pd.DataFrame:
    """
    Process a batch of texts through the sensitive information guardrail.
    
    Args:
        texts: List of input texts to process
        entity_categories: Entity categories to detect
        custom_entities: Custom entity list if applicable
        max_records: Maximum number of records to process
        
    Returns:
        DataFrame with results including entities, counts, and latency
    """
    results = []
    total_latency = 0.0
    
    print(f"🔄 Processing {min(len(texts), max_records)} texts...")
    
    for i, text in enumerate(texts[:max_records]):
        if not text or not isinstance(text, str):
            continue
            
        print(f"📝 Processing text {i+1}/{min(len(texts), max_records)}...", end=" ")
        
        # Get API response
        response, latency = get_sensitive_information_response(
            text, entity_categories, custom_entities
        )
        entities = extract_entities(response)
        
        # Aggregate by entity type
        entity_counts = {}
        for entity in entities:
            label = entity.get('label', 'unknown')
            entity_counts[label] = entity_counts.get(label, 0) + 1
        
        results.append({
            'text_id': i,
            'text_length': len(text),
            'total_entities': len(entities),
            'entity_counts': entity_counts,
            'entities': entities,
            'latency_seconds': latency,
            'text_preview': text[:100] + '...' if len(text) > 100 else text
        })
        
        total_latency += latency
        print(f"✅ {len(entities)} entities found ({latency:.3f}s)")
    
    df = pd.DataFrame(results)
    
    if len(df) > 0:
        print("\n📊 **Batch Processing Summary:**")
        print(f"   • Processed: {len(df)} texts")
        print(f"   • Total entities: {df['total_entities'].sum()}")
        print(f"   • Average entities per text: {df['total_entities'].mean():.1f}")
        print(f"   • Total latency: {total_latency:.3f}s")
        print(f"   • Average latency: {df['latency_seconds'].mean():.3f}s per text")
    
    return df


# Sample dataset for batch processing
sample_dataset = [
    "Contact John Smith at john.smith@company.com or call (555) 123-4567.",
    "Patient ID: 12345, prescribed metformin for diabetes management.",
    "SSN: 123-45-6789, Credit Card: 4111 1111 1111 1111, CVV: 123",
    "Address: 123 Main St, Anytown, ST 12345. DOB: 01/15/1990",
    "Health insurance: HI-789456, Medical condition: hypertension",
    "API Key: sk-abc123xyz, Employee ID: EMP-001, Badge: BD-789",
    "Email: user@domain.com, Phone: +1-800-555-0199, Website: https://example.com",
    "Passport: X1234567, Driver's License: D987654321, IBAN: GB82WEST12345698765432"
]

print("📊 **Testing Batch Processing with Sample Dataset**")
print(f"📄 Dataset size: {len(sample_dataset)} texts\n")

# Process with PII detection
results_df = process_dataset_batch(
    sample_dataset, 
    entity_categories='PII',
    max_records=10
)
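The batch results keep per-text label counts in a dictionary column, which is awkward to aggregate directly. Below is a sketch of rolling those counts up across the whole batch, using plain Python on records shaped like the rows built in `process_dataset_batch`. For a real run, `results_df.to_dict('records')` would supply the records. `summarize_labels` is a hypothetical helper, and the sample rows and labels are made up.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_labels(records: List[Dict[str, Any]]) -> Counter:
    """Sum the per-text entity_counts dictionaries into one Counter."""
    totals: Counter = Counter()
    for record in records:
        # Counter.update adds counts rather than replacing them.
        totals.update(record.get("entity_counts", {}))
    return totals


# Made-up rows with the same keys as the batch results.
example_records = [
    {"text_id": 0, "entity_counts": {"email": 1, "person": 1}},
    {"text_id": 1, "entity_counts": {"person": 1}},
    {"text_id": 2, "entity_counts": {}},
]

for label, count in summarize_labels(example_records).most_common():
    print(f"{label}: {count}")
```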


## 8. Summary and Best Practices

### 🎯 Key Takeaways:

1. **Flexible Configuration**: Choose from PII, PHI, or custom entities based on your use case
2. **High Performance**: 0.1 confidence threshold with top-1024 entity filtering
3. **Comprehensive Coverage**: 35+ PII and 7 PHI entity types supported
4. **Easy Integration**: Simple API calls with configurable parameters

### 🛡️ Best Practices:

- **Start with defaults**: Use PII detection first, then add PHI/custom as needed
- **Batch processing**: Process multiple texts efficiently for large datasets
- **Error handling**: Implement retry logic for production systems
- **Monitor performance**: Track latency and detection rates
- **Custom entities**: Define organization-specific sensitive data types
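The helper in this guide returns an empty dictionary on failure and does not retry. One way to add retries is a small wrapper with exponential backoff. This is a generic sketch, not a Fiddler API: `call_with_retries`, its parameters, and the choice to retry on any `Exception` are assumptions to adapt. In practice you would likely retry only on timeouts and on 429 or 5xx responses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff when it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Usage with a stand-in function that fails twice before succeeding.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the delays instead of actually sleeping.
delays = []
result = call_with_retries(flaky, sleep=delays.append)
print(result, delays)
```

Because `get_sensitive_information_response` catches request errors itself, it would need to re-raise them (or the wrapper would need to treat an empty response as a failure) before this pattern applies.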

### 🔗 Next Steps:

- Integrate into your data pipelines
- Set up monitoring and alerting
- Customize entity lists for your domain
- Implement automated redaction workflows
- Establish compliance reporting

---

**🚀 Happy detecting! You can now use Fiddler's Sensitive Information Guardrail to find sensitive data before it leaks.**
