Cell 1: Install Dependencies & Basic Imports

In this cell, we install the necessary packages and import required modules.
We install langgraph, langchain-openai, pinecone-client, and the CLIP model from GitHub,
as well as opencv-python and torch.

In [8]:
!pip install --upgrade langgraph langchain-openai pinecone-client
!pip install git+https://github.com/openai/CLIP.git
!pip install opencv-python torch

import os
os.environ["USER_AGENT"] = "your_user_agent"  # Set your user agent

# Imports for text moderation (using ChatOpenAI)
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Imports for image moderation (using CLIP)
import torch, clip
from PIL import Image

# Imports for agent creation (using langgraph)
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from langchain_core.messages import AIMessage

Collecting git+https://github.com/openai/CLIP.git
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-req-build-yh1cgl1i
  Running command git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git /tmp/pip-req-build-yh1cgl1i
  Resolved https://github.com/openai/CLIP.git to commit dcba3cb2e2827b402d2701e7e1c7d9fed8a20ef1
  Preparing metadata (setup.py) ... done
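
After installation, a quick importability check can catch a failed or partial install before later cells fail. The sketch below uses only the standard library; the helper name `missing_modules` and the module list are illustrative assumptions based on the imports in this cell, not part of the original notebook.

```python
import importlib.util

# Top-level module names imported by this notebook (assumed list for illustration).
REQUIRED_MODULES = ["langgraph", "langchain_openai", "langchain_core", "clip", "torch", "PIL"]

def missing_modules(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If the check reports missing modules, rerunning the pip commands above (and restarting the runtime if needed) is the usual remedy.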


Cell 2: Initialize Models for Moderation

 Here we initialize two models:
 1. A ChatOpenAI model for text moderation (e.g., GPT‑4).
 2. A CLIP model for image moderation.
 Replace "YOUR_OPENAI_API_KEY" with your actual OpenAI API key.

In [12]:
OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"
model_text = ChatOpenAI(
    model_name="gpt-4",
    openai_api_key=OPENAI_API_KEY,
    temperature=0.0
)

# Initialize the CLIP model (using ViT-B/32) for image processing.
# The model will use CUDA if available, otherwise CPU.
clip_device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, clip_preprocess = clip.load("ViT-B/32", device=clip_device)
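
Hardcoding the key in the notebook makes it easy to leak by accident. As a minimal sketch, assuming the key is stored in an environment variable named `OPENAI_API_KEY` (the helper name `load_api_key` is hypothetical), the key could be read and checked like this:

```python
import os

# The placeholder string used in the cell above.
PLACEHOLDER = "YOUR_OPENAI_API_KEY"

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment and reject empty or placeholder values."""
    key = os.environ.get(env_var, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the model."
        )
    return key
```

With this helper, `OPENAI_API_KEY = load_api_key()` could replace the literal assignment above.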

Cell 3: Define Moderation Functions

 In this cell we define two key functions:
 1. moderate_review_tool: Analyzes a review text using GPT-4 to check for offensive language,
    aggression, insults, and calls for violence. It returns a rejection message if any issues are found.
 2. moderate_image_tool: Loads an image and computes its embedding with CLIP, then compares it to a set
    of predefined descriptions of inappropriate content. If the similarity for any description exceeds
    a threshold, the image is flagged.

In [13]:
def moderate_review_tool(review: str) -> str:
    """
    Moderates text reviews.
    Analyzes the input text for offensive language, aggression, insults, and calls for violence.
    Returns a rejection message with an explanation if any issues are detected,
    otherwise confirms that the review passed moderation.
    """
    try:
        prompt = f"""
You are a content moderator. Analyze the following text for the presence of:
- Offensive language,
- Aggression or insults,
- Implicit metaphorical insults,
- Calls for violence.

If you find any of these elements, politely refuse to publish the review, explain what was found,
and suggest a more appropriate version. If the text is clean, answer: "The review passed moderation and can be published."

Review:
{review}

Answer:
"""
        response = model_text.invoke([HumanMessage(content=prompt)])
        return response.content.strip()
    except Exception as e:
        return f"Error moderating text: {str(e)}"

def moderate_image_tool(image_path: str) -> str:
    """
    Moderates images.
    Loads an image, computes its embedding using CLIP, and compares it against a set of predefined
    descriptions of inappropriate content.
    If the similarity with any description exceeds the threshold, the image is flagged as inappropriate.
    """
    try:
        image = Image.open(image_path).convert("RGB")
        image_input = clip_preprocess(image).unsqueeze(0).to(clip_device)
        # Predefined descriptions of inappropriate content
        prompts = [
            "pornographic content",
            "explicit sexual content",
            "extremist propaganda",
            "fascist symbols",
            "graphic violence",
            "topless woman",
            "nudity",
            "bare breasts",
            "exposed breasts"
        ]
        text_inputs = torch.cat([clip.tokenize(prompt) for prompt in prompts]).to(clip_device)
        with torch.no_grad():
            image_features = clip_model.encode_image(image_input)
            text_features = clip_model.encode_text(text_inputs)
        # Normalize features to compute cosine similarity
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        similarities = (image_features @ text_features.T).squeeze(0)
        threshold = 0.25  # Adjust this threshold as needed
        flagged_prompts = [prompts[i] for i, sim in enumerate(similarities) if sim > threshold]
        if flagged_prompts:
            return (f"Inappropriate content detected: {', '.join(flagged_prompts)}. "
                    "The image cannot be published.")
        else:
            return "The image passed moderation and can be published."
    except Exception as e:
        return f"Error moderating image: {str(e)}"
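
The thresholding step in moderate_image_tool can be illustrated without loading CLIP. The sketch below uses small hand-made vectors in place of real embeddings (the vectors and the helper names `normalize` and `flag_prompts` are assumptions for illustration) and applies the same logic: normalize, take the dot product, and flag every prompt whose cosine similarity exceeds the threshold.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def flag_prompts(image_vec, text_vecs, prompts, threshold=0.25):
    """Return the prompts whose cosine similarity with the image vector exceeds the threshold."""
    img = normalize(image_vec)
    flagged = []
    for prompt, vec in zip(prompts, text_vecs):
        similarity = sum(a * b for a, b in zip(img, normalize(vec)))
        if similarity > threshold:
            flagged.append(prompt)
    return flagged

# Toy 2-D "embeddings": only the first text vector points in a similar direction to the image.
toy_prompts = ["graphic violence", "extremist propaganda", "fascist symbols"]
toy_text_vecs = [[1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
toy_image_vec = [1.0, 0.0]

print(flag_prompts(toy_image_vec, toy_text_vecs, toy_prompts))
```

Real CLIP similarities between an image and a text prompt tend to fall in a narrow range, so the threshold of 0.25 should be tuned on labeled examples rather than taken as given.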

Cell 4: Wrap Functions into Structured Tools
==============================================================================
 In this cell, we define Pydantic schemas for the input data and wrap our moderation functions
 as StructuredTool objects. This allows the agent to call them based on commands.

In [14]:
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool

class ModerateReviewInput(BaseModel):
    review: str = Field(..., description="The review text to be moderated.")

class ModerateImageInput(BaseModel):
    image_path: str = Field(..., description="The file path to the image to be moderated.")

moderate_review_tool_structured = StructuredTool.from_function(
    func=moderate_review_tool,
    name="moderate_review_tool",
    description=(
        "Moderates text reviews by checking for offensive language, aggression, and calls for violence. "
        "Returns a rejection with an explanation if issues are found."
    ),
    args_schema=ModerateReviewInput
)

moderate_image_tool_structured = StructuredTool.from_function(
    func=moderate_image_tool,
    name="moderate_image_tool",
    description=(
        "Moderates images by comparing them with predefined descriptions of inappropriate content (e.g., pornography, extremism, fascist symbols, violence). "
        "Returns a message if inappropriate content is detected."
    ),
    args_schema=ModerateImageInput
)
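
Both tools return plain strings, so code that calls them directly may want a boolean decision. A minimal sketch, assuming the fixed pass sentences defined in Cell 3 (the helper name `is_approved` is hypothetical): anything other than an exact pass sentence, including error messages, is treated as not approved. For text reviews this is only a heuristic, since the model may phrase its answer differently.

```python
# Pass sentences copied from the two moderation functions in Cell 3.
PASS_MESSAGES = (
    "The review passed moderation and can be published.",
    "The image passed moderation and can be published.",
)

def is_approved(tool_output: str) -> bool:
    """Return True only when the tool output is exactly one of the pass sentences."""
    return tool_output.strip() in PASS_MESSAGES

print(is_approved("The image passed moderation and can be published."))
print(is_approved("Inappropriate content detected: nudity. The image cannot be published."))
```

Treating unknown or error outputs as rejections is the conservative choice for a moderation pipeline.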

Cell 5: Create the Moderation Agent
 ==============================================================================
 Now we create an agent that uses only our two moderation tools.
 We initialize a MemorySaver to keep track of the conversation history and use a StateGraph to manage dialogue state.


In [15]:
tools = [
    moderate_review_tool_structured,
    moderate_image_tool_structured
]

# Initialize agent memory for conversation tracking.
memory = MemorySaver()

# Create the agent with a system prompt focused on content moderation.
agent_executor = create_react_agent(
    model=model_text,  # We use the text model for processing commands
    tools=tools,
    checkpointer=memory,
    # Note: newer LangGraph releases may expect `prompt=` here instead of `state_modifier=`.
    state_modifier=(
        "You are a content moderator. Your task is to check text reviews and images for inappropriate content. "
        "Use the provided moderation tools and return an appropriate message based on your analysis."
    )
)

# Define the conversation state schema.
from typing import Sequence, TypedDict
from typing_extensions import Annotated

class State(TypedDict):
    messages: Annotated[Sequence, add_messages]

def call_agent(state: State):
    response = agent_executor.invoke({"messages": state["messages"]})
    return {"messages": state["messages"] + response["messages"]}

# Create the state graph and compile the agent.
workflow = StateGraph(state_schema=State)
workflow.add_node("agent", call_agent)
workflow.add_edge(START, "agent")
app = workflow.compile(checkpointer=memory)
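
The `add_messages` annotation on the state tells LangGraph how to combine a node's returned messages with those already stored: new messages are appended, and a message whose ID matches an existing one replaces it. The following is a simplified stand-in using plain dictionaries, not LangGraph's implementation; the function name `merge_by_id` and the dict-based message shape are assumptions made for illustration.

```python
def merge_by_id(existing, new):
    """Append new messages, replacing any existing message that shares the same ID."""
    merged = list(existing)
    index_by_id = {message["id"]: i for i, message in enumerate(merged)}
    for message in new:
        if message["id"] in index_by_id:
            # Same ID: overwrite in place instead of duplicating.
            merged[index_by_id[message["id"]]] = message
        else:
            index_by_id[message["id"]] = len(merged)
            merged.append(message)
    return merged

history = [{"id": "1", "role": "human", "content": "Great product!"}]
update = [
    {"id": "1", "role": "human", "content": "Great product!"},
    {"id": "2", "role": "ai", "content": "The review passed moderation and can be published."},
]
print(merge_by_id(history, update))
```

Under these merge-by-ID semantics, resending a message that is already in the history leaves a single copy rather than two.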

Cell 6: Interactive Chat for Testing the Moderation Agent
 ==============================================================================
 This cell launches an interactive console chat.
 You can enter a command such as:
   Use moderate_image_tool {"image_path": "/path/to/image.jpg"}
 or type any review text for the agent to moderate.

In [16]:
def interactive_chat():
    print("Content Moderation Agent is running. Type 'exit' to quit.")
    state = {"messages": []}
    config = {"configurable": {"thread_id": "moderation_session_1"}}
    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            print("Session terminated.")
            break
        # Wrap the input in a HumanMessage (imported in Cell 1) and add it to the conversation state.
        human_message = HumanMessage(content=user_input)
        state["messages"].append(human_message)
        try:
            result = app.invoke({"messages": state["messages"]}, config=config)
            ai_response = result["messages"][-1]
            state["messages"].append(ai_response)
            print(f"Agent: {ai_response.content}")
        except Exception as e:
            print(f"Error calling the agent: {e}")

if __name__ == "__main__":
    interactive_chat()

Content Moderation Agent is running. Type 'exit' to quit.
You: /content/drive/MyDrive/Guapos/640.gallery4_8761197d325541f926b015130fbb6bd0.jpg
Agent: The image you've submitted contains inappropriate content, specifically nudity. As per our community guidelines, we cannot publish this image. Please submit a different image that adheres to our content policies.
You: exit
Session terminated.


Final Notes
Concept Overview:
This notebook is designed as a "recipe" for building a content moderation agent. It demonstrates how to use GPT‑4 via ChatOpenAI to moderate text reviews and CLIP to moderate images.

Agent Structure:
The agent uses LangGraph's MemorySaver and StateGraph to manage dialogue state, making it easy to integrate into an interactive session.

Usage:
To test image moderation, input a command in the chat window such as:

 Use moderate_image_tool {"image_path": "/path/to/your/image.jpg"}

For text moderation, simply type a review message.
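
When the image path contains quotes, backslashes, or other special characters, typing the JSON by hand is error-prone. As a small sketch (the helper name `build_image_command` is hypothetical), the command string can be built with `json.dumps` so that the path is always escaped correctly:

```python
import json

def build_image_command(image_path: str) -> str:
    """Build the chat command for image moderation with a properly escaped JSON payload."""
    payload = json.dumps({"image_path": image_path})
    return f"Use moderate_image_tool {payload}"

print(build_image_command("/path/to/your/image.jpg"))
```

The resulting string can be pasted into the chat prompt or passed to the agent as the content of a HumanMessage.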