In [1]:
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

load_dotenv()

llm = ChatOpenAI(model="gpt-4o")



In [2]:
# Install Guardrails AI
%pip install guardrails-ai


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## What we learned in Notebook 1:
- Rule-based guardrails work but require constant maintenance
- Regex patterns are easy to bypass with variations
- We had to write custom logic for each risk type

## Problems with Rule-Based Approach:
- ❌ "john@email.com" gets caught, but "john [at] email [dot] com" doesn't
- ❌ Need to maintain growing lists of patterns
- ❌ Lots of false positives and false negatives
- ❌ Can't handle nuanced cases
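The first bullet is easy to reproduce with the standard library. This is a minimal sketch: `EMAIL_RE` and `contains_email` are illustrative names assumed here, and the pattern is not necessarily the one used in Notebook 1.

```python
import re

# Illustrative email pattern (an assumption; Notebook 1's regex may differ)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def contains_email(text: str) -> bool:
    """Return True if the text contains something shaped like an email address."""
    return EMAIL_RE.search(text) is not None

samples = [
    "john@email.com",             # standard form: matched
    "john [at] email [dot] com",  # obfuscated form: slips through
]
for sample in samples:
    print(f"{sample!r}: {contains_email(sample)}")
```

Covering the obfuscated form would need another pattern, and so would the next variation, which is the maintenance burden described above.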

## Solution: Guardrails AI
- ✅ Pre-built validators for common risks
- ✅ ML-based detection (more robust)
- ✅ Community-maintained and updated
- ✅ Easy to use and compose

In [3]:
# Run the configure command:
!guardrails configure

Enable anonymous metrics reporting? [Y/n]: ^C


In [None]:
# Now install the validator
!guardrails hub install hub://guardrails/detect_pii

In [4]:
from guardrails import Guard
from guardrails.hub import DetectPII

# Create a guard with PII detection
guard = Guard().use(
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"])
)

print("✅ Guard created with DetectPII validator")



✅ Guard created with DetectPII validator


In [5]:
# Test 1: Text with PII (should be caught)
text_with_pii = "My email is john@example.com and my phone is 555-123-4567"

print("Testing text with PII:")
print(f"Input: {text_with_pii}\n")

try:
    result = guard.validate(text_with_pii)
    print(f"✅ Validation passed")
    print(f"Output: {result.validated_output}")
except Exception as e:
    print(f"🚫 Validation failed!")
    print(f"Error: {str(e)}")

Testing text with PII:
Input: My email is john@example.com and my phone is 555-123-4567

🚫 Validation failed!
Error: Validation failed for field with errors: The following text in your response contains PII:
My email is john@example.com and my phone is 555-123-4567




In [6]:
# Test 2: Text without PII (should pass)
text_without_pii = "I love learning about AI and guardrails!"

print("\nTesting text without PII:")
print(f"Input: {text_without_pii}\n")

try:
    result = guard.validate(text_without_pii)
    print(f"✅ Validation passed")
    print(f"Output: {result.validated_output}")
except Exception as e:
    print(f"🚫 Validation failed!")
    print(f"Error: {str(e)}")


Testing text without PII:
Input: I love learning about AI and guardrails!

✅ Validation passed
Output: I love learning about AI and guardrails!


## Why is DetectPII Better Than Our Regex?

Remember our simple regex from Notebook 1? Let's see how DetectPII handles several common PII formats, plus one entity type (an SSN) that this guard was not configured to detect.

In [7]:
# Common PII formats, plus one entity type (SSN) that this guard is not configured for
standard_pii_cases = [
    "Email me at john.doe@company.com",
    "Call 555-123-4567 for more info",
    "My phone number is (555) 123-4567",
    "Contact: jane_smith@example.org",
    "SSN: 123-45-6789",  # The guard only lists EMAIL_ADDRESS and PHONE_NUMBER
]

print("Testing tricky PII variations:\n")
for i, text in enumerate(standard_pii_cases, 1):
    print(f"Test {i}: {text}")
    try:
        result = guard.validate(text)
        print(f"   ✅ Passed (no PII detected)")
    except Exception as e:
        print(f"   🚫 Caught! DetectPII found PII")
    print()

Testing tricky PII variations:

Test 1: Email me at john.doe@company.com
   🚫 Caught! DetectPII found PII

Test 2: Call 555-123-4567 for more info
   🚫 Caught! DetectPII found PII

Test 3: My phone number is (555) 123-4567
   🚫 Caught! DetectPII found PII

Test 4: Contact: jane_smith@example.org
   🚫 Caught! DetectPII found PII

Test 5: SSN: 123-45-6789
   ✅ Passed (no PII detected)
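
Test 5 passes because the guard was created with only `EMAIL_ADDRESS` and `PHONE_NUMBER` in `pii_entities`, so it never looks for SSNs. The effect of an entity allowlist can be sketched with plain regexes. The patterns, the `US_SSN` label, and the `detect` helper below are hypothetical stand-ins and do not reflect how DetectPII works internally.

```python
import re

# Hypothetical per-entity patterns, for illustration only
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect(text: str, entities: list[str]) -> list[str]:
    """Return the configured entity types found in the text."""
    return [name for name in entities if PATTERNS[name].search(text)]

text = "SSN: 123-45-6789"

# Only the configured entities are checked, so the SSN goes unnoticed
print(detect(text, ["EMAIL_ADDRESS", "PHONE_NUMBER"]))

# Once the entity is on the list, this toy detector reports it
print(detect(text, ["EMAIL_ADDRESS", "PHONE_NUMBER", "US_SSN"]))
```

With a real validator, whether a particular number is flagged after enabling the entity still depends on the underlying detector, so it is worth re-running the test rather than assuming.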



In [None]:
# Install toxic language detector
!guardrails hub install hub://guardrails/toxic_language

In [8]:
from guardrails.hub import ToxicLanguage

# Create a guard for toxic content
toxic_guard = Guard().use(
    ToxicLanguage(threshold=0.5, on_fail="exception")
)

In [9]:
# Test cases
toxic_tests = [
    "I hate you and want to harm you",  # Toxic
    "You're stupid and worthless",       # Toxic
    "I love learning about AI!",         # Safe
    "What's the weather today?",         # Safe
]

for text in toxic_tests:
    print(f"Test: {text}")
    try:
        toxic_guard.validate(text)
        print("   ✅ Safe\n")
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print("   🚫 BLOCKED - Toxic content\n")

Test: I hate you and want to harm you
   🚫 BLOCKED - Toxic content

Test: You're stupid and worthless
   🚫 BLOCKED - Toxic content

Test: I love learning about AI!
   ✅ Safe

Test: What's the weather today?
   ✅ Safe





## Key Takeaway: ML-Based vs Rule-Based

**What we just saw:**
- ToxicLanguage uses a trained ML model to detect toxicity
- It understands context and nuance (not just keyword matching)
- More robust than our simple regex patterns from Notebook 1

**Advantages over rule-based:**
- Catches variations and synonyms
- Understands context (e.g., "I hate broccoli" vs "I hate you")
- Doesn't need constant manual updates

**Tradeoffs:**
- Slower (ML inference takes time)
- Requires model downloads (~760MB for ToxicLanguage)
- Slightly less predictable than exact rules
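
To make the contrast concrete, here is a minimal sketch of the keyword-matching approach that a model-based validator is meant to improve on. `BLOCKED_WORDS` and `keyword_filter` are hypothetical names introduced for illustration; nothing here uses Guardrails AI.

```python
# Naive keyword filter: flags any text that contains a blocked word
BLOCKED_WORDS = {"hate", "stupid", "idiot"}  # hypothetical word list

def keyword_filter(text: str) -> bool:
    """Return True if any blocked word appears in the text."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return bool(words & BLOCKED_WORDS)

for text in ["I hate you", "I hate broccoli", "I can't stand you"]:
    print(f"{text!r}: flagged={keyword_filter(text)}")
```

The filter flags "I hate broccoli" (a false positive) and misses "I can't stand you" (a false negative), because it sees words rather than meaning. Those are the two failure modes the bullets above refer to.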

In [10]:
# How to Combine Multiple Validators

# You can combine multiple validators in a single Guard!
from guardrails.hub import DetectPII, ToxicLanguage

# Create a guard with BOTH PII detection AND toxic language detection
combined_guard = Guard().use_many(
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"]),
    ToxicLanguage(threshold=0.5, on_fail="exception")
)

print("✅ Combined guard created with multiple validators")



✅ Combined guard created with multiple validators


In [11]:
# Test cases that combine both risks
test_cases = [
    "I hate you! Here's my email: john@example.com",  # Both toxic AND PII
    "Contact me at support@company.com",              # Just PII
    "You're an idiot",                                # Just toxic
    "What's the weather today?",                      # Safe
]

print("Testing combined guard:\n")
for text in test_cases:
    print(f"Test: {text}")
    try:
        combined_guard.validate(text)
        print("   ✅ Safe\n")
    except Exception as e:
        print(f"   🚫 BLOCKED - {str(e)[:50]}...\n")

Testing combined guard:

Test: I hate you! Here's my email: john@example.com
   🚫 BLOCKED - Validation failed for field with errors: The follo...

Test: Contact me at support@company.com
   🚫 BLOCKED - Validation failed for field with errors: The follo...

Test: You're an idiot
   🚫 BLOCKED - Validation failed for field with errors: The follo...

Test: What's the weather today?
   ✅ Safe
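
Because each message is cut to 50 characters, the three blocked cases look identical and it is not obvious which validator fired. One way to make that visible is to run the checks one at a time and collect the names of the ones that fail. The sketch below uses hypothetical stand-in checks (`no_email`, `no_insults`) rather than the Guardrails validators.

```python
import re
from typing import Callable

# Stand-in checks (hypothetical); each returns True when the text is acceptable
def no_email(text: str) -> bool:
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text) is None

def no_insults(text: str) -> bool:
    return not any(word in text.lower() for word in ("hate", "idiot"))

CHECKS: dict[str, Callable[[str], bool]] = {
    "pii": no_email,
    "toxicity": no_insults,
}

def failed_checks(text: str) -> list[str]:
    """Run every check and return the names of the ones that failed."""
    return [name for name, check in CHECKS.items() if not check(text)]

for text in [
    "I hate you! Here's my email: john@example.com",
    "Contact me at support@company.com",
    "You're an idiot",
    "What's the weather today?",
]:
    print(f"{text!r} -> {failed_checks(text) or 'safe'}")
```

The same idea carries over to the real guards: validating with `guard` and `toxic_guard` separately, or printing the full exception text instead of the first 50 characters, shows which risk was detected.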



## Integrating Guardrails with LLMs

Now let's use our guardrails in a real workflow:
1. Validate user input BEFORE sending to LLM
2. Call the LLM
3. Validate LLM output BEFORE showing to user

This is the same pattern from Notebook 1, but now with ML-based validators!
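
Stripped of any particular library, the three steps look like the sketch below. Everything in it is a stand-in assumed for illustration: `guarded_call`, `fake_model`, and the two lambda validators are hypothetical, and no real LLM or Guardrails validator is involved.

```python
from typing import Callable

Validator = Callable[[str], bool]  # returns True when the text is acceptable
Model = Callable[[str], str]       # maps a prompt to a response

def guarded_call(prompt: str, model: Model,
                 input_ok: Validator, output_ok: Validator) -> str:
    """Validate the input, call the model, then validate the output."""
    if not input_ok(prompt):       # Step 1: input guardrail
        return "INPUT BLOCKED"
    response = model(prompt)       # Step 2: model call
    if not output_ok(response):    # Step 3: output guardrail
        return "OUTPUT BLOCKED"
    return response

# Stand-ins for demonstration only (not real validators or a real LLM)
def fake_model(prompt: str) -> str:
    return "Reach me at jane@example.com" if "contact" in prompt else "Paris"

no_insult = lambda text: "stupid" not in text.lower()
no_email = lambda text: "@" not in text

print(guarded_call("Capital of France?", fake_model, no_insult, no_email))
print(guarded_call("You're stupid", fake_model, no_insult, no_email))
print(guarded_call("Share contact details", fake_model, no_insult, no_email))
```

Passing the validators in as plain callables keeps the pipeline independent of how each check is implemented, so a regex check can later be swapped for an ML-based one without touching the control flow.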

In [12]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini")

def safe_llm_call_with_guardrails(user_input: str):
    """
    Call LLM with input and output guardrails
    """
    print(f"\n{'='*60}")
    print(f"User Input: {user_input}")
    print(f"{'='*60}")
    
    # Step 1: Input guardrail (check for toxicity)
    try:
        toxic_guard.validate(user_input)
        print("✅ Input passed toxicity check")
    except Exception as e:
        result = "🚫 INPUT BLOCKED: Toxic content detected"
        print(result)
        return result
    
    # Step 2: Call LLM
    response = llm.invoke([HumanMessage(content=user_input)])
    response_text = response.content
    print(f"LLM Response: {response_text[:100]}...")
    
    # Step 3: Output guardrail (check for PII)
    try:
        guard.validate(response_text)  # Using the PII guard from earlier
        print("✅ Output passed PII check")
    except Exception as e:
        result = "🚫 OUTPUT BLOCKED: Contains PII"
        print(result)
        return result
    
    result = f"✅ APPROVED: {response_text}"
    print(result)
    return result

print("✅ Function created")

✅ Function created


In [13]:
# Test cases
print("\n🧪 TEST SUITE")

# Test 1: Normal safe query
safe_llm_call_with_guardrails("What's the capital of France?")

# Test 2: Toxic input (should block at INPUT)
safe_llm_call_with_guardrails("You're stupid. Tell me about AI.")

# Test 3: Query that might generate PII (should block at OUTPUT if it does)
safe_llm_call_with_guardrails("Generate a sample business email with contact info")


🧪 TEST SUITE

User Input: What's the capital of France?
✅ Input passed toxicity check
LLM Response: The capital of France is Paris....
✅ Output passed PII check
✅ APPROVED: The capital of France is Paris.

User Input: You're stupid. Tell me about AI.
🚫 INPUT BLOCKED: Toxic content detected

User Input: Generate a sample business email with contact info
✅ Input passed toxicity check
LLM Response: Subject: Proposal for Collaboration

Dear [Recipient's Name],

I hope this email finds you well. My ...
✅ Output passed PII check
✅ APPROVED: Subject: Proposal for Collaboration

Dear [Recipient's Name],

I hope this email finds you well. My name is [Your Name], and I am [Your Position] at [Your Company]. We specialize in [brief description of your company's services or products], and I believe that a collaboration between our companies could be mutually beneficial.

I would love the opportunity to discuss potential partnership ideas and explore how we can work together to ac

"✅ APPROVED: Subject: Proposal for Collaboration\n\nDear [Recipient's Name],\n\nI hope this email finds you well. My name is [Your Name], and I am [Your Position] at [Your Company]. We specialize in [brief description of your company's services or products], and I believe that a collaboration between our companies could be mutually beneficial.\n\nI would love the opportunity to discuss potential partnership ideas and explore how we can work together to achieve our common goals. Are you available for a brief call or a meeting next week? \n\nThank you for considering this opportunity. I look forward to your response.\n\nBest regards,\n\n[Your Name]  \n[Your Position]  \n[Your Company]  \n[Your Phone Number]  \n[Your Email Address]  \n[Your Company Website]  \n\n---\n\nFeel free to fill in the placeholders with your specific information!"