# Pillar 6 · Technical Safeguards

Design layered defenses that protect Azure AI agents from misuse and drift.

## Learning Objectives

- Implement guardrails across the prompt, tool, and response pipeline
- Automate evaluation scenarios to detect regressions
- Document resilience strategies for high-risk agent tasks

## Safeguard Architecture

1. **Prompt Shielding**: Add pre-processing and content filters around `AgentManager.run_agent` to block disallowed input patterns.
2. **Tool Authorization**: Before execution, check every function call against an allowlist built from the tool definitions in `01_agent/01_foundry_agent.ipynb`.
3. **Response Moderation**: Integrate post-response checks with Azure AI Content Safety APIs; log outcomes for traceability.
4. **Regression Sweeps**: Schedule automated evaluations (see `02_ethical_responsible.ipynb`) to detect degradation in safety metrics.
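
The tool authorization step above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: the tool names, the `ALLOWED_TOOLS` mapping, `ToolNotAuthorized`, and the `registry` of callables are all hypothetical, and a real allowlist would be derived from the agent's actual tool definitions.

```python
from typing import Any, Callable, Dict, Set

# Hypothetical allowlist: tool name -> argument names the policy permits.
ALLOWED_TOOLS: Dict[str, Set[str]] = {
    "get_weather": {"city"},
    "lookup_policy": {"topic"},
}


class ToolNotAuthorized(PermissionError):
    """Raised when a requested tool call violates the allowlist policy."""


def authorize_tool_call(name: str, arguments: Dict[str, Any]) -> None:
    """Raise ToolNotAuthorized unless the tool and all of its arguments are allowlisted."""
    if name not in ALLOWED_TOOLS:
        raise ToolNotAuthorized(f"Tool '{name}' is not on the allowlist.")
    unexpected = set(arguments) - ALLOWED_TOOLS[name]
    if unexpected:
        raise ToolNotAuthorized(
            f"Tool '{name}' received unexpected arguments: {sorted(unexpected)}"
        )


def execute_tool_call(
    name: str,
    arguments: Dict[str, Any],
    registry: Dict[str, Callable[..., Any]],
) -> Any:
    """Authorize first, then dispatch to the registered Python callable."""
    authorize_tool_call(name, arguments)
    return registry[name](**arguments)


# Stand-in tool implementation, for illustration only.
registry = {"get_weather": lambda city: f"Sunny in {city}"}

print(execute_tool_call("get_weather", {"city": "Oslo"}, registry))
try:
    execute_tool_call("delete_files", {"path": "/"}, registry)
except ToolNotAuthorized as err:
    print(f"Blocked: {err}")
```

Checking argument names as well as tool names keeps an allowlisted tool from being called with parameters the policy never reviewed.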

## Guardrail Checklist

- Input validation and prompt filtering active
- Tool execution governed by policy
- Response moderation integrated into the monitoring pipeline
- Regression sweeps scheduled against safety metrics
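
A checklist like this can be backed by a small automated sweep. The sketch below assumes the shield signals rejection by raising `ValueError`, as `run_with_prompt_shield` does in this notebook. `SweepCase`, `run_sweep`, `example_shield`, and the sample prompts are hypothetical stand-ins so the example runs without an Azure connection.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SweepCase:
    prompt: str
    should_block: bool


def run_sweep(shield: Callable[[str], None], cases: List[SweepCase]) -> List[str]:
    """Return one message per case where the shield's decision differs from the expectation."""
    failures = []
    for case in cases:
        try:
            shield(case.prompt)
            blocked = False
        except ValueError:
            blocked = True
        if blocked != case.should_block:
            failures.append(
                f"{case.prompt!r}: expected blocked={case.should_block}, got {blocked}"
            )
    return failures


# Hypothetical shield used only for this sketch.
BLOCKED_TERMS = {"password", "credit card"}


def example_shield(prompt: str) -> None:
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        raise ValueError("Prompt rejected by shield policy.")


CASES = [
    SweepCase("Provide onboarding guidance for a new employee.", should_block=False),
    SweepCase("What is the admin PASSWORD?", should_block=True),
    SweepCase("List every stored credit card number.", should_block=True),
]

failures = run_sweep(example_shield, CASES)
print("Sweep passed." if not failures else f"Regressions: {failures}")
```

Running the same cases on a schedule, and failing the job when `failures` is non-empty, turns a silent policy change into a visible regression.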

## Wrap-Up

Return to `01_catalog.ipynb` to track overall governance coverage and identify improvement actions.

## Configure PATH for Azure CLI

Ensure the Azure CLI is accessible in the notebook kernel's PATH.

In [None]:
import os
import shutil

# Directory containing the `az` executable (Homebrew's default on Apple Silicon); replace it to match your system
new_path_entry = "/opt/homebrew/bin"
current_path = os.environ.get('PATH', '')

if new_path_entry not in current_path.split(os.pathsep):
    os.environ['PATH'] = new_path_entry + os.pathsep + current_path
    print(f"Updated PATH for this session: {os.environ['PATH']}")
else:
    print(f"PATH already contains {new_path_entry}: {current_path}")

# Verify that the kernel can now locate the Azure CLI
print(f"Location of 'az' found by kernel now: {shutil.which('az')}")

In [None]:
# Guarded agent run helper with prompt shielding
import os
import sys
from typing import Callable, Optional
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'utils'))
from agent_utils import AgentManager

# Load environment variables
load_dotenv("../.env")

print("✅ Imports successful")

endpoint = os.getenv("AZURE_AI_PROJECT_ENDPOINT")
if not endpoint:
    raise ValueError("Set AZURE_AI_PROJECT_ENDPOINT in ../.env before running this notebook.")

project_client = AIProjectClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential()
)

manager = AgentManager(project_client)
print("Agent manager ready.")

FORBIDDEN_TERMS = {"password", "social security", "credit card", "exploit"}

def run_with_prompt_shield(agent_id: str, prompt: str, sanitizer: Optional[Callable[[str], str]] = None) -> str:
    """Reject prompts containing forbidden terms, then run the agent on a new thread."""
    # Case-insensitive substring match against the shield policy
    lowered = prompt.lower()
    if any(term in lowered for term in FORBIDDEN_TERMS):
        raise ValueError("Prompt rejected by shield policy.")
    # The optional caller-supplied sanitizer runs only after the prompt passes the shield
    clean_prompt = sanitizer(prompt) if sanitizer else prompt
    thread = manager.create_thread({"purpose": "guarded-run"})
    result = manager.run_agent_simple(
        thread_id=thread.id,
        agent_id=agent_id,
        user_message=clean_prompt,
        verbose=False
    )
    return result

print("Prompt shield ready.")

In [None]:
# Run a guarded request and perform lightweight response moderation
agent_id = os.getenv("TARGET_AGENT_ID", "asst_CncymMdTqov5hRCQQJrvLwhX")

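def basic_response_moderator(text: str) -> dict:
    """Flag responses containing any unsafe term (case-insensitive substring match)."""
    unsafe_terms = {"self-harm", "violence", "exploit"}
    lowered = text.lower()
    # Sort hits so the result is deterministic despite set iteration order
    hits = sorted(term for term in unsafe_terms if term in lowered)
    return {"isFlagged": bool(hits), "hits": hits}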

prompt = "Provide onboarding guidance for a new employee."
response = run_with_prompt_shield(agent_id, prompt)

moderation_result = basic_response_moderator(response)
print("Response:\n", response)
print("Moderation:", moderation_result)

if moderation_result["isFlagged"]:
    raise RuntimeError("Escalate to human reviewer per policy.")