# **Guardrails**

Summary: OpenAI have also included guardrails in the Agents SDK. These come as input guardrails and output guardrails, the input_guardrail checks that the input going into your LLM is "safe" and the output_guardrail checks that the output from your LLM is "safe".

Guardrails run in parallel to your agents, enabling you to do checks and validations of user input. For example, imagine you have an agent that uses a very smart (and hence slow/expensive) model to help with customer requests. You wouldn't want malicious users to ask the model to help them with their math homework. So, you can run a guardrail with a fast/cheap model. If the guardrail detects malicious usage, it can immediately raise an error, which stops the expensive model from running and saves you time/money.

There are two kinds of guardrails:

1. Input guardrails run on the initial user input
2. Output guardrails run on the final agent output

Reference:
https://openai.github.io/openai-agents-python/guardrails/


## Input guardrails

Input guardrails run in 3 steps:

1. First, the guardrail receives the same input passed to the agent.
2. Next, the guardrail function runs to produce a GuardrailFunctionOutput, which is then wrapped in an InputGuardrailResult
3. Finally, we check if .tripwire_triggered is true. If true, an InputGuardrailTripwireTriggered exception is raised, so you can appropriately respond to the user or handle the exception.

Input guardrails are intended to run on user input, so an agent's guardrails only run if the agent is the first agent. You might wonder, why is the guardrails property on the agent instead of passed to Runner.run? It's because guardrails tend to be related to the actual Agent - you'd run different guardrails for different agents, so colocating the code is useful for readability.

## Tripwires

If the input or output fails the guardrail, the Guardrail can signal this with a tripwire. As soon as we see a guardrail that has triggered the tripwires, we immediately raise a {Input,Output}GuardrailTripwireTriggered exception and halt the Agent execution.

# Install openai-agents SDK

In [13]:
!pip install -Uq openai-agents pydantic

# Make your Notebook capable of running asynchronous functions.
Both Jupyter notebooks and Python’s asyncio library utilize event loops, but they serve different purposes and can sometimes interfere with each other.

The nest_asyncio library allows the existing event loop to accept nested event loops, enabling asyncio code to run within environments that already have an event loop, such as Jupyter notebooks.

In summary, both Jupyter notebooks and Python’s asyncio library utilize event loops to manage asynchronous operations. When working within Jupyter notebooks, it’s essential to be aware of the existing event loop to effectively run asyncio code without conflicts.

In [14]:
import nest_asyncio
nest_asyncio.apply()

In [15]:
from pydantic import BaseModel
from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    OutputGuardrailTripwireTriggered,
    RunContextWrapper,
    Runner,
    TResponseInputItem,
    input_guardrail,
    output_guardrail,
    AsyncOpenAI,
    OpenAIChatCompletionsModel,
    RunConfig
)
from google.colab import userdata


In [16]:
gemini_api_key = userdata.get("GEMINI_API_KEY")




In [17]:
#Reference: https://ai.google.dev/gemini-api/docs/openai
external_client = AsyncOpenAI(
    api_key=gemini_api_key,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

model = OpenAIChatCompletionsModel(
    model="gemini-2.0-flash",
    openai_client=external_client
)

config = RunConfig(
    model=model,
    model_provider=external_client,
    tracing_disabled=True
)


## Implementation of Input Guardrail:
Checks that the input going into your LLM is "safe"

In [18]:
class MathHomeworkOutput(BaseModel):
    is_math_homework: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking you to do their math homework.",
    output_type=MathHomeworkOutput,
    model=model
)

out = Runner.run_sync(guardrail_agent,"What is the capital of Pakistan")
print(out.final_output)
print(out.final_output.model_dump())
print(out.final_output.is_math_homework)
print(out.final_output.reasoning)
print(out.final_output.model_dump()['is_math_homework'])
print(out.final_output.model_dump()['reasoning'])



is_math_homework=False reasoning='The user is asking a question about geography, not math.'
{'is_math_homework': False, 'reasoning': 'The user is asking a question about geography, not math.'}
False
The user is asking a question about geography, not math.
False
The user is asking a question about geography, not math.


In [27]:

@input_guardrail
async def math_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context, run_config = config)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math_homework is False,
        # tripwire_triggered=result.final_output.is_math_homework,
    )

In [28]:
agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[math_guardrail],
)

In [29]:
# This should trip the guardrail

try:
    result = await Runner.run(agent, "Hello, can you help me solve for x: 2x + 3 = 11?", run_config = config)
    print("Guardrail didn't trip - this is unexpected")
    print(result.final_output)

except InputGuardrailTripwireTriggered:
    print("Math homework guardrail tripped")


Guardrail didn't trip - this is unexpected
I can definitely help you with that! Here's how to solve for x in the equation 2x + 3 = 11:

1.  **Isolate the term with x:**
    *   Subtract 3 from both sides of the equation:
        *   2x + 3 - 3 = 11 - 3
        *   2x = 8

2.  **Solve for x:**
    *   Divide both sides of the equation by 2:
        *   2x / 2 = 8 / 2
        *   x = 4

Therefore, the solution is **x = 4**.

Let me know if you have any other questions!



In [30]:
try:
    result = await Runner.run(agent, "Hello", run_config = config)
    print(result.final_output)

except InputGuardrailTripwireTriggered:
    print("Math homework guardrail tripped")

Math homework guardrail tripped


In [31]:
try:
    result = await Runner.run(agent, "can you solve 2+3 for me", run_config = config)
    print(result.final_output)

except InputGuardrailTripwireTriggered:
    print("Math homework guardrail tripped")

2 + 3 = 5



## Implementation of Onput Guardrail:
The output guardrail checks that the output from your LLM is "safe".

In [32]:
class MessageOutput(BaseModel):
    response: str

class MathOutput(BaseModel):
    is_math: bool
    reasoning: str

guardrail_agent2 = Agent(
    name="Guardrail check",
    instructions="Check if the output includes any math.",
    output_type=MathOutput,
)

@output_guardrail
async def math_guardrail2(
    ctx: RunContextWrapper, agent: Agent, output: MessageOutput
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent2, output.response, context=ctx.context, run_config = config)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math,
    )

In [33]:
agent2 = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    output_guardrails=[math_guardrail2],
    output_type=MessageOutput,
)

In [34]:
# This should trip the guardrail
try:
    await Runner.run(agent2, "Hello, can you help me solve for x: 2x + 3 = 11?", run_config = config)
    print("Guardrail didn't trip - this is unexpected")

except OutputGuardrailTripwireTriggered:
    print("Math output guardrail tripped")

Math output guardrail tripped


In [38]:
try:
    a = await Runner.run(agent2, "Hello", run_config = config)
    print("Guardrail didn't trip - this is unexpected")
    print(a.final_output.model_dump())
    print(a.final_output.response)

except OutputGuardrailTripwireTriggered:
    print("Math output guardrail tripped")

Guardrail didn't trip - this is unexpected
{'response': 'Hello! How can I help you today?'}
Hello! How can I help you today?
