# 🛡️ Engineering Trustworthy AI: The Decoupled Shield Pattern

**Red Hat Summit Connect 2025**

## What You'll Learn

Today we're going to solve a critical problem: **How do you prevent AI agents from leaking sensitive data?**

We'll demonstrate:
1. **The Trust Gap** - Why unprotected AI is a compliance nightmare
2. **The Decoupled Shield Pattern** - Architectural safety vs. model behavior
3. **Defense-in-Depth** - Multiple layers of protection
4. **Real-World Impact** - What this means for your organization

Let's start by setting up our environment.

In [None]:
# Install dependencies
!pip install llama-stack-client pandas ipywidgets -q

# Core imports
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from uuid import uuid4
import logging

# Demo helper utilities
try:
    from shield_demo_helpers import (
        ShieldMetrics,
        show_hero_banner,
        show_problem_statement,
        show_attack_surface,
        show_result_card,
        show_comparison_matrix,
        show_compliance_savings,
        create_interactive_tester,
        TEST_PROMPTS,
        SHIELD_CONFIG
    )
except ImportError:
    print("⚠️ Helper file not found. Make sure shield_demo_helpers.py is in the same directory.")
    print("   Only the banner and result cards have fallbacks; cells that use")
    print("   ShieldMetrics, TEST_PROMPTS, or SHIELD_CONFIG will fail without it.")
    # Minimal text-only fallbacks for the two display helpers
    def show_hero_banner(): print("🛡️ Engineering Trustworthy AI Demo")
    def show_result_card(t, s, m, d=None): print(f"{t}: {m}")

# Suppress verbose HTTP logging
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("llama_stack_client").setLevel(logging.WARNING)

print("✅ Environment ready")

In [None]:
# Display welcome banner
show_hero_banner()

## Configuration

Update these values to match your OpenShift deployment:

In [None]:
# Configure your deployment
LLAMASTACK_URL = "http://llamastack-trustyai-fms-service.summit-connect-2025.svc.cluster.local:8321"
MODEL_NAME = "tinyllama-1b"

# Initialize client and metrics tracker
try:
    client = LlamaStackClient(base_url=LLAMASTACK_URL)
    metrics = ShieldMetrics()
    
    # Quick connectivity test
    models = list(client.models.list())
    
    print(f"✅ Connected to LlamaStack at {LLAMASTACK_URL}")
    print(f"📊 Using model: {MODEL_NAME}")
    print(f"🔍 Found {len(models)} available models")
    
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print(f"\nTroubleshooting:")
    print(f"1. Verify the LlamaStack service is running:")
    print(f"   oc get pods -n summit-connect-2025 | grep llamastack")
    print(f"2. Check if you need to use the external route instead:")
    print(f"   oc get route llamastack-trustyai-fms -n summit-connect-2025")
    raise

---

## Part 1: The Trust Gap - Why Unprotected AI is Risky

### The Problem

Imagine you've deployed a customer service chatbot. It's helpful, fast, and users love it. But here's the issue:

**What happens when a user accidentally shares their email, SSN, or credit card?**

Without protection:
- ❌ PII flows into logs and databases
- ❌ Sensitive data may be used for model training
- ❌ Compliance violations (GDPR, HIPAA, PCI-DSS)
- ❌ No audit trail of what was exposed

Let's see this in action with a **completely unprotected agent**:

In [None]:
# Show what an unprotected system looks like
show_attack_surface({
    'name': '❌ Unprotected Agent (Current Reality)',
    'input': False,
    'output': False
})

In [None]:
# Test with a risky prompt containing multiple types of PII
risky_prompt = TEST_PROMPTS["pii_multiple"]

print("📨 User sends this message:")
print(f"   '{risky_prompt['prompt']}'\n")
print("🔍 What happens in an unprotected system?\n")

try:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are a helpful customer service agent."},
            {"role": "user", "content": risky_prompt['prompt']}
        ],
        stream=False
    )
    
    response_text = response.choices[0].message.content
    
    show_result_card(
        "⚠️ PROCESSED WITHOUT VALIDATION",
        "error",
        response_text[:200] + "...",
        """<strong>The Problem:</strong>
        <ul>
            <li>Email, SSN, and credit card were sent to the model</li>
            <li>This data is now in logs, metrics, and possibly training data</li>
            <li>No way to know if PII was exposed</li>
            <li>Potential compliance violation</li>
        </ul>"""
    )
    
    print("\n💀 This is the trust gap. Let's fix it.")
    
except Exception as e:
    print(f"❌ Error: {e}")

---

## Part 2: The Solution - TrustyAI Shields

### What Are Shields?

Shields are **independent validation services** that check messages for policy violations. Think of them like security checkpoints:

```
User Message → [Shield Checkpoint] → ✅/❌ → Model
```

### Why TrustyAI?

TrustyAI provides:
- **Fast, deterministic detection** - Regex-based PII detection in milliseconds
- **Enterprise-ready** - Runs as a service in OpenShift
- **Configurable** - Add/remove detectors via configuration
- **Observable** - Complete audit trail of all detections
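
To make "regex-based detection" concrete, here is a minimal sketch for the three PII types this demo targets (email, SSN, credit card). The patterns and the `detect_pii` helper are illustrative assumptions, not the detectors TrustyAI actually ships, and they are far too simple for production use.

```python
import re

# Illustrative patterns only (assumption): not TrustyAI's real detectors.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}

def detect_pii(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )

print(detect_pii("How do I reset my password?"))
print(detect_pii("My email is test@example.com and my SSN is 123-45-6789"))
```

Because the check is a fixed set of patterns, the same input always yields the same verdict, which is what makes this style of detection deterministic and easy to audit.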

### What We're Detecting

Our shield will catch:
- 📧 Email addresses
- 🔢 Social Security Numbers (SSN)
- 💳 Credit card numbers

Let's register it:

In [None]:
# Register the TrustyAI PII shield
print("🛡️ Registering TrustyAI shield...\n")

try:
    client.shields.register(**SHIELD_CONFIG)
    
    show_result_card(
        "✅ Shield Registered Successfully",
        "allowed",
        "TrustyAI PII shield is now active and protecting your agents",
        """<strong>Configuration:</strong>
        <ul>
            <li>Provider: TrustyAI FMS</li>
            <li>Detectors: Email, SSN, Credit Card</li>
            <li>Confidence Threshold: 0.8 (80%)</li>
            <li>Applies to: User messages, System messages, and Model outputs</li>
        </ul>"""
    )
    
    # Verify registration
    shields = list(client.shields.list())
    print(f"\n📊 Total registered shields: {len(shields)}")
    for shield in shields:
        print(f"   - {shield.identifier} ({shield.provider_id})")
    
except Exception as e:
    if "already exists" in str(e).lower():
        show_result_card(
            "ℹ️ Shield Already Registered",
            "allowed",
            "The PII shield is already active - we're ready to go!"
        )
    else:
        print(f"❌ Registration failed: {e}")
        print(f"\nPossible causes:")
        print(f"1. TrustyAI service not running")
        print(f"2. Network connectivity issue")
        print(f"3. Configuration mismatch")
        raise

### Quick Shield Test

Before we use the shield in an agent, let's verify it works:

In [None]:
# Test the shield directly
print("🧪 Testing shield with sample inputs...\n")

test_cases = [
    ("Safe message", "How do I reset my password?", False),
    ("Email PII", "My email is test@example.com", True),
    ("SSN PII", "My SSN is 123-45-6789", True),
]

all_passed = True
for name, message, should_block in test_cases:
    result = client.safety.run_shield(
        shield_id="pii_shield",
        messages=[{"role": "user", "content": message}],
        params={}
    )
    
    blocked = result.violation is not None
    passed = blocked == should_block
    all_passed = all_passed and passed
    status = "✅" if passed else "❌"
    
    print(f"{status} {name}: {'BLOCKED' if blocked else 'ALLOWED'} (Expected: {'BLOCK' if should_block else 'ALLOW'})")

# Report success only if every case matched its expectation
if all_passed:
    print("\n✅ Shield is working correctly!")
else:
    print("\n❌ Some shield results did not match expectations.")

---

## Part 3: Input Shields - First Line of Defense

### What is an Input Shield?

An input shield validates **user messages BEFORE they reach the model**. This is your first line of defense.

**Benefits:**
- 🛡️ Prevents PII from entering logs
- 🛡️ Stops sensitive data from reaching the model
- 🛡️ No PII in training data
- 🛡️ Immediate feedback to users

**Flow:**
```
User Message → [INPUT SHIELD] → ✅/❌ → Model → Response
                     ↓
              Blocks PII here
```
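
The flow above can be sketched without any server. In this hedged example, `SSN_PATTERN`, `echo_model`, and `guarded_call` are hypothetical stand-ins (not LlamaStack or TrustyAI APIs); the point is only that the check runs first and the model is never called when it fails.

```python
import re

# Hypothetical stand-in for a shield: flags US-style SSNs only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

prompts_seen_by_model = []  # lets us verify what actually reached the model

def echo_model(prompt):
    """Stub model: records the prompt and echoes it back."""
    prompts_seen_by_model.append(prompt)
    return f"Echo: {prompt}"

def guarded_call(prompt):
    """Check the prompt first; invoke the model only when the check passes."""
    if SSN_PATTERN.search(prompt):
        return {"blocked": True, "response": None}
    return {"blocked": False, "response": echo_model(prompt)}

print(guarded_call("My SSN is 123-45-6789"))
print(guarded_call("How do I reset my password?"))
print(prompts_seen_by_model)
```

Only the safe prompt ends up in `prompts_seen_by_model`, which is the property an input shield is meant to guarantee.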

In [None]:
# Visualize input shield protection
show_attack_surface({
    'name': '🛡️ Input Shield Only',
    'input': True,
    'output': False
})

# Create agent with input shield
agent_input = Agent(
    client,
    model=MODEL_NAME,
    instructions='You are a helpful customer service agent.',
    input_shields=['pii_shield'],  # 🛡️ Input validation enabled
    output_shields=[],              # ❌ Output validation disabled
    enable_session_persistence=False,
    sampling_params={'max_tokens': 512}
)

print("✅ Agent created with INPUT shield protection")

In [None]:
# Test input shield with PII
print("🧪 Testing INPUT shield with risky message...\n")

risky_message = TEST_PROMPTS["pii_multiple"]["prompt"]
print(f"📨 User sends: {risky_message[:70]}...\n")

session = agent_input.create_session(f"session-{uuid4()}")
response = agent_input.create_turn(
    messages=[{"role": "user", "content": risky_message}],
    session_id=session,
    stream=False
)

# Analyze what happened
print("🔍 What happened:\n")
for i, step in enumerate(response.steps, 1):
    if step.step_type == 'shield_call':
        if step.violation:
            print(f"   Step {i}: INPUT Shield → ⛔ BLOCKED")
            print(f"            Reason: {step.violation.user_message}")
            
            show_result_card(
                "🛡️ INPUT SHIELD BLOCKED THE REQUEST",
                "blocked",
                "PII detected and blocked before reaching the model",
                """<strong>What was protected:</strong>
                <ul>
                    <li>✅ PII never reached the model</li>
                    <li>✅ No sensitive data in logs</li>
                    <li>✅ User notified immediately</li>
                    <li>✅ Compliance requirement met</li>
                </ul>"""
            )
            metrics.record(blocked=True, pii_type="input_multiple")
            break
        else:
            print(f"   Step {i}: INPUT Shield → ✅ Passed")
    elif step.step_type == 'inference':
        # An inference step in the trace means the shield did not block the request
        print(f"   Step {i}: Model Processing → 🤖 Model was invoked (request was not blocked)")

In [None]:
# Test with a safe message
print("\n🧪 Now testing with a SAFE message...\n")

safe_message = TEST_PROMPTS["normal"]["prompt"]
print(f"📨 User sends: {safe_message}\n")

session = agent_input.create_session(f"session-{uuid4()}")
response = agent_input.create_turn(
    messages=[{"role": "user", "content": safe_message}],
    session_id=session,
    stream=False
)

print("🔍 What happened:\n")
for i, step in enumerate(response.steps, 1):
    if step.step_type == 'shield_call':
        print(f"   Step {i}: INPUT Shield → ✅ Passed validation")
    elif step.step_type == 'inference':
        print(f"   Step {i}: Model Processing → 🤖 Generated response")

if hasattr(response, 'output_message') and response.output_message:
    print(f"\n💬 Agent Response: {response.output_message.content[:150]}...")
    
    show_result_card(
        "✅ SAFE MESSAGE PROCESSED NORMALLY",
        "allowed",
        "Shield validated the message and allowed it through",
        "Input shields don't block legitimate business interactions - only PII violations."
    )
    metrics.record(blocked=False)

print("\n🎯 Key Point: Input shields protect WITHOUT disrupting normal operations")

---

## Part 4: Output Shields - Second Line of Defense

### What is an Output Shield?

An output shield validates **model responses BEFORE they reach users**. This catches:

- 🛡️ Models accidentally generating PII
- 🛡️ Models hallucinating realistic-looking sensitive data
- 🛡️ Prompt injection attacks that leak data

**Flow:**
```
User Message → Model → [OUTPUT SHIELD] → ✅/❌ → User
                              ↓
                       Blocks PII here
```
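
As a rough sketch of the flow above: the user's prompt here contains no PII, yet the reply does, so only a check on the model's output can catch it. `EMAIL_PATTERN`, `leaky_model`, and `respond` are hypothetical names used for illustration, not part of any real API.

```python
import re

# Illustrative email pattern (assumption), standing in for an output shield.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def leaky_model(prompt):
    """Stub model that invents a contact address the user never supplied."""
    return "Sure! Sample record: Jane Doe, jane.doe@example.com"

def respond(prompt):
    """Generate a reply, then withhold it if the output check finds an email."""
    draft = leaky_model(prompt)
    if EMAIL_PATTERN.search(draft):
        return "Response withheld: possible PII detected in model output."
    return draft

user_prompt = "Generate one example customer record."
print(EMAIL_PATTERN.search(user_prompt) is None)  # the prompt itself is clean
print(respond(user_prompt))
```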

In [None]:
# Visualize output shield protection
show_attack_surface({
    'name': '🛡️ Output Shield Only',
    'input': False,
    'output': True
})

# Create agent with output shield
agent_output = Agent(
    client,
    model=MODEL_NAME,
    instructions='You are a helpful customer service agent.',
    input_shields=[],               # ❌ Input validation disabled
    output_shields=['pii_shield'],  # 🛡️ Output validation enabled
    enable_session_persistence=False,
    sampling_params={'max_tokens': 512}
)

print("✅ Agent created with OUTPUT shield protection")

In [None]:
# Test output shield
print("🧪 Testing OUTPUT shield...\n")
print("   Asking model to generate test data with PII...\n")

# This prompt might cause the model to generate PII
test_prompt = "Generate 3 example customer records with names, emails, and phone numbers for testing."
print(f"📨 User sends: {test_prompt}\n")

session = agent_output.create_session(f"session-{uuid4()}")
response = agent_output.create_turn(
    messages=[{"role": "user", "content": test_prompt}],
    session_id=session,
    stream=False
)

print("🔍 What happened:\n")
for i, step in enumerate(response.steps, 1):
    if step.step_type == 'inference':
        print(f"   Step {i}: Model Processing → 🤖 Generated response")
    elif step.step_type == 'shield_call':
        if step.violation:
            print(f"   Step {i}: OUTPUT Shield → ⛔ BLOCKED")
            print(f"            Reason: {step.violation.user_message}")
            
            show_result_card(
                "🛡️ OUTPUT SHIELD BLOCKED MODEL RESPONSE",
                "blocked",
                "Model generated PII but it was caught before reaching the user",
                """<strong>What was protected:</strong>
                <ul>
                    <li>✅ Prevented model-generated PII from being exposed</li>
                    <li>✅ User never saw sensitive data</li>
                    <li>✅ Caught hallucinated PII</li>
                    <li>✅ Defense against prompt injection</li>
                </ul>"""
            )
            metrics.record(blocked=True, pii_type="output_generated")
            break
        else:
            print(f"   Step {i}: OUTPUT Shield → ✅ Passed")

print("\n🎯 Key Point: Output shields catch what the model generates, not just what users send")

---

## Part 5: Defense-in-Depth - Maximum Protection

### The Complete Solution

The most secure approach uses **BOTH input AND output shields**:

```
User Message → [INPUT SHIELD] → Model → [OUTPUT SHIELD] → User
                     ↓                         ↓
              Blocks inbound PII      Blocks outbound PII
```
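
The two-checkpoint flow above can be sketched as a single function that reports which layer, if any, stopped the turn. Everything here (`PII_PATTERN`, `run_turn`, and the two stub models) is a hypothetical illustration rather than the LlamaStack agent implementation.

```python
import re

# Illustrative combined pattern (assumption): SSN or email address.
PII_PATTERN = re.compile(
    r"\b\d{3}-\d{2}-\d{4}\b|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
)

def run_turn(prompt, model):
    """Return (stage, text), where stage is 'input', 'output', or 'allowed'."""
    if PII_PATTERN.search(prompt):   # layer 1: inbound check
        return ("input", None)
    reply = model(prompt)
    if PII_PATTERN.search(reply):    # layer 2: outbound check
        return ("output", None)
    return ("allowed", reply)

def polite_model(prompt):
    """Stub model whose reply contains no PII."""
    return "Use the reset link on the login page."

def leaky_model(prompt):
    """Stub model whose reply contains an email address."""
    return "Contact jane.doe@example.com for help."

print(run_turn("My SSN is 123-45-6789", polite_model))
print(run_turn("How do I reset my password?", leaky_model))
print(run_turn("How do I reset my password?", polite_model))
```

Each layer catches a case the other cannot see: the first call is stopped on the way in, the second on the way out, and only the third reaches the user.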

**Why Both?**
- 🔒 Complete coverage of the attack surface
- 🔒 Defense against both user mistakes AND model issues
- 🔒 Helps contain prompt injection attacks that try to extract PII
- 🔒 Multiple layers = multiple chances to catch violations

In [None]:
# Visualize complete protection
show_attack_surface({
    'name': '🛡️🛡️ Both Shields - Maximum Security (Recommended)',
    'input': True,
    'output': True
})

# Create fully protected agent
agent_both = Agent(
    client,
    model=MODEL_NAME,
    instructions='You are a helpful customer service agent.',
    input_shields=['pii_shield'],   # 🛡️ Input validation
    output_shields=['pii_shield'],  # 🛡️ Output validation
    enable_session_persistence=False,
    sampling_params={'max_tokens': 512}
)

print("✅ Agent created with COMPLETE shield protection")
print("🔒 This is the recommended configuration for production")

---

## Part 6: The Complete Picture

Let's see how all three configurations compare:

In [None]:
show_comparison_matrix()

---

## Part 7: Real-World Impact

Why does this matter for your organization?

In [None]:
show_compliance_savings()

---

## Part 8: Try It Yourself

Now you can test the shield with your own messages:

In [None]:
create_interactive_tester(client, MODEL_NAME)

---

## Summary & Key Takeaways

### What We Demonstrated

1. **The Trust Gap** - Unprotected AI agents are a compliance nightmare
2. **Decoupled Shield Pattern** - Safety as an independent service
3. **Defense-in-Depth** - Multiple layers of protection
4. **Real Impact** - Faster development, better compliance, lower risk

### The Decoupled Shield Advantage

| Traditional Approach | Decoupled Shield Pattern |
|---------------------|-------------------------|
| ❌ Safety logic embedded in code | ✅ External shield service |
| ❌ Hard to verify and test | ✅ Independently testable |
| ❌ Not reusable across agents | ✅ One shield, unlimited agents |
| ❌ Code changes to update rules | ✅ Configuration changes only |
| ❌ No central audit trail | ✅ Complete observability |
| ❌ Vendor lock-in | ✅ Swap providers anytime |
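
One way to picture the right-hand column is an agent that depends only on a small shield interface. The sketch below is an assumption-laden illustration: `Shield`, `RegexSsnShield`, `KeywordShield`, and `DemoAgent` are hypothetical names, not classes from LlamaStack or TrustyAI.

```python
import re
from typing import Protocol

class Shield(Protocol):
    """Anything with a violates() method can act as a shield."""
    def violates(self, text: str) -> bool: ...

class RegexSsnShield:
    """Hypothetical provider A: flags US-style SSNs."""
    _pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

    def violates(self, text: str) -> bool:
        return bool(self._pattern.search(text))

class KeywordShield:
    """Hypothetical provider B: flags any banned keyword."""
    def __init__(self, banned: set) -> None:
        self._banned = {word.lower() for word in banned}

    def violates(self, text: str) -> bool:
        lowered = text.lower()
        return any(word in lowered for word in self._banned)

class DemoAgent:
    """The agent knows only the Shield interface, not the provider behind it."""
    def __init__(self, shield: Shield) -> None:
        self._shield = shield

    def handle(self, message: str) -> str:
        return "BLOCKED" if self._shield.violates(message) else "ALLOWED"

# Swapping providers changes the policy without touching DemoAgent.
print(DemoAgent(RegexSsnShield()).handle("My SSN is 123-45-6789"))
print(DemoAgent(KeywordShield({"codename"})).handle("My SSN is 123-45-6789"))
```

Since the policy lives behind the interface, each shield can be tested on its own and reused by any number of agents.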

### Next Steps

**Deploy this in your environment:**
```bash
git clone <your-repo>
oc apply -k demo/
```

**Learn more:**
- TrustyAI: https://trustyai.org
- LlamaStack: https://llama-stack.readthedocs.io/
- OpenShift AI: https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai

---

### Remember:

**Trustworthiness is an engineered quality of the system, not a model property.**

*Thank you for attending Red Hat Summit Connect 2025!*

In [None]:
# Display final metrics
print("\n" + "="*60)
print("📊 Demo Session Metrics")
print("="*60)
metrics.display()