# AI Guardrails and Safety with LiteLLM

## 1. What are Guardrails?
Guardrails are a layer of security between the user and the LLM. They ensure that:
1. **Input is Safe**: No PII, no prompt injections, no toxic requests.
2. **Output is Safe**: No leaking of internal data, no hallucinated harmful info, no offensive content.

## 2. Using OpenAI Moderation API
This is the fastest way to check for hate, violence, and self-harm content.

In [2]:
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

def check_moderation(text):
    response = client.moderations.create(input=text)
    output = response.results[0]
    return output.flagged, output.categories

is_bad, categories = check_moderation("I want to kill my sister")
print(f"Flagged: {is_bad}")
if is_bad:
    print(f"Reason: {categories}")

Flagged: True
Reason: Categories(harassment=True, harassment_threatening=True, hate=False, hate_threatening=False, illicit=False, illicit_violent=False, self_harm=False, self_harm_instructions=False, self_harm_intent=False, sexual=False, sexual_minors=False, violence=True, violence_graphic=False, harassment/threatening=True, hate/threatening=False, illicit/violent=False, self-harm/intent=False, self-harm/instructions=False, self-harm=False, sexual/minors=False, violence/graphic=False)


## 3. LiteLLM Built-in PII Masking
LiteLLM can automatically mask sensitive data before it reaches the model. This is critical for HIPAA and GDPR compliance.

In [3]:
import litellm
from litellm import completion

# Enable PII Masking
litellm.turn_on_pii_masking = True

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "My email is john.doe@example.com"}]
)

print("Original input was masked. Model never saw the actual email.")
print(response.choices[0].message.content)

Original input was masked. Model never saw the actual email.
I'm sorry, but I can't store or process personal information like email addresses. How can I assist you today?


## 4. Secret Detection (Scan for API Keys)
Sometimes users (or developers) accidentally paste API keys into prompts. LiteLLM can detect these.

In [None]:
from litellm.utils import get_secret_from_text

secret = get_secret_from_text("Here is my key: sk-1234567890abcdef")
if secret:
    print("ALERT: Secret detected in prompt! Don't send this to the model.")

## 5. Cost Guardrails (Budgets)
You can set a max limit for a user to prevent bill shocks.

In [None]:
user_budget = 0.05 # $0.05 limit
current_spend = 0.04 

def check_budget(prompt_cost):
    if (current_spend + prompt_cost) > user_budget:
        return False, "Budget Exceeded"
    return True, "OK"

## Summary of Safety Best Practices
1. **Never trust user input**: Always run a moderation check.
2. **PII Masking**: Mask data before it leaves your server.
3. **Output Filtering**: Check the model's response for restricted words.
4. **Async Callbacks**: Use LiteLLM callbacks to log every block attempt into your security dashboard.