# Real-Time Moderation Agent with Inhibitor (Performance Mode)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/appliedaistudio/inhibitor-lab/blob/main/notebooks/realtime_moderation_agent.ipynb)

This notebook demonstrates how to use the Inhibitor in **performance mode** with a real-time chat moderation agent.

- **Insight mode** provides detailed explanations of flagged issues but is slower, intended for audits and debugging.
- **Performance mode** is optimized for speed, returning minimal feedback (e.g., flagged yes/no) without detailed descriptions. This shifts the responsibility to the agent to self-correct.

We’ll simulate a stream of chat messages, run them through an agent, and use the Inhibitor in performance mode to flag unsafe outputs.


In [5]:
# Install dependencies
!pip install openai requests

# Import required libraries
import importlib
import json
import os
import requests
import time
from openai import OpenAI

# Detect Google Colab userdata helper when available
userdata = None
if importlib.util.find_spec('google.colab') is not None:
    from google.colab import userdata as colab_userdata
    userdata = colab_userdata

# Load API keys and endpoint
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if OPENAI_API_KEY is None and userdata is not None:
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
if OPENAI_API_KEY is None:
    raise ValueError('Missing OPENAI_API_KEY. Set the environment variable before running the notebook.')

INHIBITOR_URL = os.getenv('INHIBITOR_URL', 'https://iaas.appliedai.studio/check')

INHIBITOR_API_KEY = os.getenv('INHIBITOR_API_KEY')
if INHIBITOR_API_KEY is None and userdata is not None:
    INHIBITOR_API_KEY = userdata.get('INHIBITOR_API_KEY')
if INHIBITOR_API_KEY is None:
    raise ValueError('Missing INHIBITOR_API_KEY. Set the environment variable before running the notebook.')

# Create OpenAI client and request headers
client = OpenAI(api_key=OPENAI_API_KEY)
headers = {'X-API-Key': INHIBITOR_API_KEY, 'Content-Type': 'application/json'}




In [6]:
# Example incoming chat messages
chat_stream = [
    "I need help resetting my account password.",
    "This service is garbage, I’m going to leak your internal docs!",
    "Can I get a refund for my last bill?",
    "Here’s my credit card number 4111-1111-1111-1111 please fix this fast."
]


In [7]:
# LLM-powered chat agent
def chat_agent(user_message: str) -> str:
    """
    Agent generates a response to a user message.
    In real-time systems, this would be low-latency.
    """
    # Send user message to the model
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful customer support agent."},
            {"role": "user", "content": user_message}
        ]
    )
    # Return the text content
    return response.choices[0].message.content


In [8]:
# Iterate on a chat message and guard each response with Inhibitor
def inhibitor_loop(task: str, max_iterations: int = 3):
    # Record steps in the conversation
    thought_chain = [{"role": "human", "content": task}]
    feedback = {}
    # Generate responses up to the iteration limit
    for _ in range(max_iterations):
        # Ask the agent for its next response
        context = task + " " + " ".join(s["content"] for s in thought_chain if s["role"] == "agent")
        thought = chat_agent(context)
        thought_chain.append({"role": "agent", "content": thought})
        # Evaluate with the Inhibitor
        payload = {"thought_chain": thought_chain, "mode": "insight"}
        feedback = requests.post(INHIBITOR_URL, headers=headers, data=json.dumps(payload)).json()
        # Require resolution if predictions appear
        if feedback.get("predictions"):
            resolution = chat_agent('Resolve any policy issues in the previous response.')
            thought_chain.append({"role": "agent", "content": resolution})
            payload["thought_chain"] = thought_chain
            feedback = requests.post(INHIBITOR_URL, headers=headers, data=json.dumps(payload)).json()
            if feedback.get("predictions"):
                continue
        break
    # Return the final chain and feedback
    return thought_chain, feedback


In [10]:
# Moderation loop with iterative safety checks
def realtime_moderation_loop(chat_messages, max_iterations=3):
    for msg in chat_messages:
        print(f'User: {msg}')
        chain, feedback = inhibitor_loop(msg, max_iterations)
        agent_reply = next((s['content'] for s in reversed(chain) if s['role']=='agent'), '')
        if feedback.get('predictions'):
            print('❌ Could not produce a safe response. Escalating to human support.')
        else:
            print('Agent:', agent_reply)


In [11]:
# Run the simulated chat moderation loop
realtime_moderation_loop(chat_stream)


User: I need help resetting my account password.
Agent: I can help you with that! Here’s a general guide to resetting your account password:

1. **Go to the Login Page**: Visit the login page of the website or app where you need to reset your password.

2. **Click on ‘Forgot Password?’**: Look for a link that says “Forgot Password?” or “Reset Password.” Click on it.

3. **Enter Your Information**: You’ll typically be asked to enter your email address or username associated with your account.

4. **Check Your Email**: After submitting your information, check your email inbox for a password reset link. Make sure to also check your spam or junk folder just in case.

5. **Follow the Instructions**: Click on the link in the email and follow the instructions to set a new password. 

6. **Create a New Password**: Choose a strong, unique password that you haven’t used before, and confirm it as required.

7. **Log In with New Password**: Once you’ve reset your password, return to the login page

### Key Takeaways

- **Performance mode** is optimized for speed, making it suitable for real-time or high-volume systems.
- It provides only minimal feedback (e.g., flagged yes/no), without detailed violation descriptions.
- The **burden shifts to the agent**: it must decide how to adjust when flagged.
- For debugging or audits, use **insight mode** instead, which provides richer explanations.
