Binary file modified docs/benchmarking/NSFW_roc_curve.png
Binary file modified docs/benchmarking/alignment_roc_curves.png
Binary file modified docs/benchmarking/hallucination_detection_roc_curves.png
Binary file modified docs/benchmarking/jailbreak_roc_curve.png
2 changes: 1 addition & 1 deletion docs/evals.md
@@ -27,7 +27,7 @@ guardrails-evals \
--config-path guardrails_config.json \
--dataset-path data.jsonl \
--mode benchmark \
- --models gpt-5 gpt-5-mini gpt-5-nano
+ --models gpt-5 gpt-5-mini
```

Test with the included demo files in our [GitHub repository](https://github.com/openai/openai-guardrails-python/tree/main/src/guardrails/evals/eval_demo).
18 changes: 2 additions & 16 deletions docs/ref/checks/hallucination_detection.md
@@ -175,10 +175,8 @@ The statements cover various types of factual claims including:
|--------------|---------|-------------|-------------|-------------|
| gpt-5 | 0.854 | 0.732 | 0.686 | 0.670 |
| gpt-5-mini | 0.934 | 0.813 | 0.813 | 0.770 |
- | gpt-5-nano | 0.566 | 0.540 | 0.540 | 0.533 |
| gpt-4.1 | 0.870 | 0.785 | 0.785 | 0.785 |
| gpt-4.1-mini (default) | 0.876 | 0.806 | 0.789 | 0.789 |
- | gpt-4.1-nano | 0.537 | 0.526 | 0.526 | 0.526 |

**Notes:**
- ROC AUC: Area under the ROC curve (higher is better)
@@ -192,10 +190,8 @@ The following table shows latency measurements for each model using the hallucin
|--------------|--------------|--------------|
| gpt-5 | 34,135 | 525,854 |
| gpt-5-mini | 23,013 | 59,316 |
- | gpt-5-nano | 17,079 | 26,317 |
| gpt-4.1 | 7,126 | 33,464 |
| gpt-4.1-mini (default) | 7,069 | 43,174 |
- | gpt-4.1-nano | 4,809 | 6,869 |

- **TTC P50**: Median time to completion (50% of requests complete within this time)
- **TTC P95**: 95th percentile time to completion (95% of requests complete within this time)
@@ -217,10 +213,8 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
|--------------|---------------------|----------------------|---------------------|---------------------------|
| gpt-5 | 28,762 / 396,472 | 34,135 / 525,854 | 37,104 / 75,684 | 40,909 / 645,025 |
| gpt-5-mini | 19,240 / 39,526 | 23,013 / 59,316 | 24,217 / 65,904 | 37,314 / 118,564 |
- | gpt-5-nano | 13,436 / 22,032 | 17,079 / 26,317 | 17,843 / 35,639 | 21,724 / 37,062 |
| gpt-4.1 | 7,437 / 15,721 | 7,126 / 33,464 | 6,993 / 30,315 | 6,688 / 127,481 |
| gpt-4.1-mini (default) | 6,661 / 14,827 | 7,069 / 43,174 | 7,032 / 46,354 | 7,374 / 37,769 |
- | gpt-4.1-nano | 4,296 / 6,378 | 4,809 / 6,869 | 4,171 / 6,609 | 4,650 / 6,201 |

- **Vector store size impact varies by model**: GPT-4.1 series shows minimal latency impact across vector store sizes, while GPT-5 series shows significant increases.

@@ -240,10 +234,6 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
| | Medium (3 MB) | 0.934 | 0.813 | 0.813 | 0.770 |
| | Large (11 MB) | 0.919 | 0.817 | 0.817 | 0.817 |
| | Extra Large (105 MB) | 0.909 | 0.793 | 0.793 | 0.711 |
- | **gpt-5-nano** | Small (1 MB) | 0.590 | 0.547 | 0.545 | 0.536 |
- | | Medium (3 MB) | 0.566 | 0.540 | 0.540 | 0.533 |
- | | Large (11 MB) | 0.564 | 0.534 | 0.532 | 0.507 |
- | | Extra Large (105 MB) | 0.603 | 0.570 | 0.558 | 0.550 |
| **gpt-4.1** | Small (1 MB) | 0.907 | 0.839 | 0.839 | 0.839 |
| | Medium (3 MB) | 0.870 | 0.785 | 0.785 | 0.785 |
| | Large (11 MB) | 0.846 | 0.753 | 0.753 | 0.753 |
@@ -252,15 +242,11 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
| | Medium (3 MB) | 0.876 | 0.806 | 0.789 | 0.789 |
| | Large (11 MB) | 0.862 | 0.791 | 0.757 | 0.757 |
| | Extra Large (105 MB) | 0.802 | 0.722 | 0.722 | 0.722 |
- | **gpt-4.1-nano** | Small (1 MB) | 0.605 | 0.528 | 0.528 | 0.528 |
- | | Medium (3 MB) | 0.537 | 0.526 | 0.526 | 0.526 |
- | | Large (11 MB) | 0.618 | 0.531 | 0.531 | 0.531 |
- | | Extra Large (105 MB) | 0.636 | 0.528 | 0.528 | 0.528 |

**Key Insights:**

- **Best Performance**: gpt-5-mini consistently achieves the highest ROC AUC scores across all vector store sizes (0.909-0.939)
- - **Best Latency**: gpt-4.1-nano shows the most consistent and lowest latency across all scales (4,171-4,809ms P50) but shows poor performance
+ - **Best Latency**: gpt-4.1-mini (default) provides the lowest median latencies while maintaining strong accuracy
- **Most Stable**: gpt-4.1-mini (default) maintains relatively stable performance across vector store sizes with good accuracy-latency balance
- **Scale Sensitivity**: gpt-5 shows the most variability in performance across vector store sizes, with performance dropping significantly at larger scales
- **Performance vs Scale**: Most models show decreasing performance as vector store size increases, with gpt-5-mini being the most resilient
@@ -270,4 +256,4 @@ In addition to the above evaluations which use a 3 MB sized vector store, the ha
- **Signal-to-noise ratio degradation**: Larger vector stores contain more irrelevant documents that may not be relevant to the specific factual claims being validated
- **Semantic search limitations**: File search retrieves semantically similar documents, but with a large diverse knowledge source, these may not always be factually relevant
- **Document quality matters more than quantity**: The relevance and accuracy of documents is more important than the total number of documents
- **Performance plateaus**: Beyond a certain size (11 MB), the performance impact becomes less severe
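
A note for readers reproducing the accuracy columns in these tables: the sketch below shows one way to compute ROC AUC, Prec@R, and Recall@FPR with scikit-learn. It is illustrative only; the labels and scores are invented placeholders, and the project's actual evaluation harness lives under `src/guardrails/evals`.

```python
# Illustrative sketch, not the repo's evaluation harness: computes the metric
# columns used in the benchmark tables from per-example scores and labels.
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve

labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # 1 = guardrail should trigger
scores = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.1, 0.95, 0.3])  # model confidence

roc_auc = roc_auc_score(labels, scores)

# Prec@R=0.80: best precision achievable while keeping recall at or above 0.80.
precision, recall, _ = precision_recall_curve(labels, scores)
prec_at_r80 = precision[recall >= 0.80].max()

# Recall@FPR=0.01: best recall achievable while keeping FPR at or below 0.01.
fpr, tpr, _ = roc_curve(labels, scores)
recall_at_fpr01 = tpr[fpr <= 0.01].max()

print(f"ROC AUC={roc_auc:.3f} Prec@R=0.80={prec_at_r80:.3f} Recall@FPR=0.01={recall_at_fpr01:.3f}")
```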
12 changes: 4 additions & 8 deletions docs/ref/checks/jailbreak.md
@@ -93,23 +93,19 @@ This benchmark evaluates model performance on a diverse set of prompts:

| Model | ROC AUC | Prec@R=0.80 | Prec@R=0.90 | Prec@R=0.95 | Recall@FPR=0.01 |
|--------------|---------|-------------|-------------|-------------|-----------------|
- | gpt-5 | 0.979 | 0.973 | 0.970 | 0.970 | 0.733 |
- | gpt-5-mini | 0.954 | 0.990 | 0.900 | 0.900 | 0.768 |
- | gpt-5-nano | 0.962 | 0.973 | 0.967 | 0.965 | 0.048 |
- | gpt-4.1 | 0.990 | 1.000 | 1.000 | 0.984 | 0.946 |
- | gpt-4.1-mini (default) | 0.982 | 0.992 | 0.992 | 0.954 | 0.444 |
- | gpt-4.1-nano | 0.934 | 0.924 | 0.924 | 0.848 | 0.000 |
+ | gpt-5 | 0.982 | 0.984 | 0.977 | 0.977 | 0.743 |
+ | gpt-5-mini | 0.980 | 0.980 | 0.976 | 0.975 | 0.734 |
+ | gpt-4.1 | 0.979 | 0.975 | 0.975 | 0.975 | 0.661 |
+ | gpt-4.1-mini (default) | 0.979 | 0.974 | 0.972 | 0.972 | 0.654 |

#### Latency Performance

| Model | TTC P50 (ms) | TTC P95 (ms) |
|--------------|--------------|--------------|
| gpt-5 | 4,569 | 7,256 |
| gpt-5-mini | 5,019 | 9,212 |
- | gpt-5-nano | 4,702 | 6,739 |
| gpt-4.1 | 841 | 1,861 |
| gpt-4.1-mini | 749 | 1,291 |
- | gpt-4.1-nano | 683 | 890 |

**Notes:**

10 changes: 4 additions & 6 deletions docs/ref/checks/nsfw.md
@@ -82,12 +82,10 @@ This benchmark evaluates model performance on a balanced set of social media pos

| Model | ROC AUC | Prec@R=0.80 | Prec@R=0.90 | Prec@R=0.95 | Recall@FPR=0.01 |
|--------------|---------|-------------|-------------|-------------|-----------------|
- | gpt-5 | 0.9532 | 0.9195 | 0.9096 | 0.9068 | 0.0339 |
- | gpt-5-mini | 0.9629 | 0.9321 | 0.9168 | 0.9149 | 0.0998 |
- | gpt-5-nano | 0.9600 | 0.9297 | 0.9216 | 0.9175 | 0.1078 |
- | gpt-4.1 | 0.9603 | 0.9312 | 0.9249 | 0.9192 | 0.0439 |
- | gpt-4.1-mini (default) | 0.9520 | 0.9180 | 0.9130 | 0.9049 | 0.0459 |
- | gpt-4.1-nano | 0.9502 | 0.9262 | 0.9094 | 0.9043 | 0.0379 |
+ | gpt-5 | 0.953 | 0.919 | 0.910 | 0.907 | 0.034 |
+ | gpt-5-mini | 0.963 | 0.932 | 0.917 | 0.915 | 0.100 |
+ | gpt-4.1 | 0.960 | 0.931 | 0.925 | 0.919 | 0.044 |
+ | gpt-4.1-mini (default) | 0.952 | 0.918 | 0.913 | 0.905 | 0.046 |

**Notes:**

12 changes: 4 additions & 8 deletions docs/ref/checks/prompt_injection_detection.md
@@ -111,12 +111,10 @@ This benchmark evaluates model performance on agent conversation traces:

| Model | ROC AUC | Prec@R=0.80 | Prec@R=0.90 | Prec@R=0.95 | Recall@FPR=0.01 |
|---------------|---------|-------------|-------------|-------------|-----------------|
- | gpt-5 | 0.9931 | 0.9992 | 0.9992 | 0.9992 | 0.5845 |
- | gpt-5-mini | 0.9536 | 0.9951 | 0.9951 | 0.9951 | 0.0000 |
- | gpt-5-nano | 0.9283 | 0.9913 | 0.9913 | 0.9717 | 0.0350 |
- | gpt-4.1 | 0.9794 | 0.9973 | 0.9973 | 0.9973 | 0.0000 |
- | gpt-4.1-mini (default) | 0.9865 | 0.9986 | 0.9986 | 0.9986 | 0.0000 |
- | gpt-4.1-nano | 0.9142 | 0.9948 | 0.9948 | 0.9387 | 0.0000 |
+ | gpt-5 | 0.993 | 0.999 | 0.999 | 0.999 | 0.584 |
+ | gpt-5-mini | 0.954 | 0.995 | 0.995 | 0.995 | 0.000 |
+ | gpt-4.1 | 0.979 | 0.997 | 0.997 | 0.997 | 0.000 |
+ | gpt-4.1-mini (default) | 0.987 | 0.999 | 0.999 | 0.999 | 0.000 |

**Notes:**

@@ -128,12 +126,10 @@ This benchmark evaluates model performance on agent conversation traces:

| Model | TTC P50 (ms) | TTC P95 (ms) |
|---------------|--------------|--------------|
- | gpt-4.1-nano | 1,159 | 2,534 |
| gpt-4.1-mini (default) | 1,481 | 2,563 |
| gpt-4.1 | 1,742 | 2,296 |
| gpt-5 | 3,994 | 6,654 |
| gpt-5-mini | 5,895 | 9,031 |
- | gpt-5-nano | 5,911 | 10,134 |

- **TTC P50**: Median time to completion (50% of requests complete within this time)
- **TTC P95**: 95th percentile time to completion (95% of requests complete within this time)
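
Since the TTC definitions above recur in each benchmark section, here is a minimal sketch of how such percentile summaries can be derived from raw completion times; the sample values are invented.

```python
# Hypothetical per-request completion times in milliseconds; a real run would
# time each guardrail request over the configured number of latency iterations.
import numpy as np

ttc_ms = np.array([1412, 1488, 1502, 1397, 2563, 1481, 1450, 1530])
p50, p95 = np.percentile(ttc_ms, [50, 95])
print(f"TTC P50={p50:.0f} ms, TTC P95={p95:.0f} ms")
```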
2 changes: 1 addition & 1 deletion examples/basic/agents_sdk.py
@@ -33,7 +33,7 @@
{
"name": "Custom Prompt Check",
"config": {
"model": "gpt-4.1-nano-2025-04-14",
"model": "gpt-4.1-mini-2025-04-14",
"confidence_threshold": 0.7,
"system_prompt_details": "Check if the text contains any math problems.",
},
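The `Custom Prompt Check` block edited above appears in the same shape across this PR's example files. As a minimal end-to-end sketch of how such a block is wired into `GuardrailsAsyncOpenAI` (the `input`-stage wrapper and the import path are assumptions based on the repo's demo configs, not shown verbatim in this diff):

```python
import asyncio

from guardrails import GuardrailsAsyncOpenAI  # assumed import path

# Assumed pipeline layout; only the inner guardrail dict is taken from this diff.
PIPELINE_CONFIG = {
    "version": 1,
    "input": {
        "version": 1,
        "guardrails": [
            {
                "name": "Custom Prompt Check",
                "config": {
                    "model": "gpt-4.1-mini-2025-04-14",
                    "confidence_threshold": 0.7,
                    "system_prompt_details": "Check if the text contains any math problems.",
                },
            },
        ],
    },
}


async def main() -> None:
    # The client runs the configured guardrails around the LLM call automatically.
    client = GuardrailsAsyncOpenAI(config=PIPELINE_CONFIG)
    response = await client.responses.create(
        input="What is the capital of France?",
        model="gpt-4.1-mini",
    )
    print(response.llm_response.output_text)


asyncio.run(main())
```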
4 changes: 2 additions & 2 deletions examples/basic/hello_world.py
@@ -25,7 +25,7 @@
{
"name": "Custom Prompt Check",
"config": {
"model": "gpt-4.1-nano",
"model": "gpt-4.1-mini",
"confidence_threshold": 0.7,
"system_prompt_details": "Check if the text contains any math problems.",
},
@@ -45,7 +45,7 @@ async def process_input(
# Use the new GuardrailsAsyncOpenAI - it handles all guardrail validation automatically
response = await guardrails_client.responses.create(
input=user_input,
model="gpt-4.1-nano",
model="gpt-4.1-mini",
previous_response_id=response_id,
)

4 changes: 2 additions & 2 deletions examples/basic/multi_bundle.py
@@ -30,7 +30,7 @@
{
"name": "Custom Prompt Check",
"config": {
"model": "gpt-4.1-nano",
"model": "gpt-4.1-mini",
"confidence_threshold": 0.7,
"system_prompt_details": "Check if the text contains any math problems.",
},
@@ -56,7 +56,7 @@ async def process_input(
# including pre-flight, input, and output stages, plus the LLM call
stream = await guardrails_client.responses.create(
input=user_input,
model="gpt-4.1-nano",
model="gpt-4.1-mini",
previous_response_id=response_id,
stream=True,
)
4 changes: 2 additions & 2 deletions examples/basic/multiturn_chat_with_alignment.py
@@ -230,7 +230,7 @@ async def main(malicious: bool = False) -> None:
# Only add to messages AFTER guardrails pass and LLM call succeeds
try:
resp = await client.chat.completions.create(
model="gpt-4.1-nano",
model="gpt-4.1-mini",
messages=messages + [{"role": "user", "content": user_input}],
tools=tools,
)
@@ -321,7 +321,7 @@ async def main(malicious: bool = False) -> None:
# Final call with tool results (pass inline without mutating messages)
try:
resp = await client.chat.completions.create(
model="gpt-4.1-nano",
model="gpt-4.1-mini",
messages=messages + [assistant_message] + tool_messages,
tools=tools,
)
4 changes: 2 additions & 2 deletions examples/basic/structured_outputs_example.py
@@ -26,7 +26,7 @@ class UserInfo(BaseModel):
{
"name": "Custom Prompt Check",
"config": {
"model": "gpt-4.1-nano",
"model": "gpt-4.1-mini",
"confidence_threshold": 0.7,
"system_prompt_details": "Check if the text contains any math problems.",
},
@@ -50,7 +50,7 @@ async def extract_user_info(
{"role": "system", "content": "Extract user information from the provided text."},
{"role": "user", "content": text},
],
model="gpt-4.1-nano",
model="gpt-4.1-mini",
text_format=UserInfo,
previous_response_id=previous_response_id,
)
4 changes: 2 additions & 2 deletions examples/basic/suppress_tripwire.py
@@ -25,7 +25,7 @@
{
"name": "Custom Prompt Check",
"config": {
"model": "gpt-4.1-nano-2025-04-14",
"model": "gpt-4.1-mini-2025-04-14",
"confidence_threshold": 0.7,
"system_prompt_details": "Check if the text contains any math problems.",
},
@@ -45,7 +45,7 @@ async def process_input(
# Use GuardrailsClient with suppress_tripwire=True
response = await guardrails_client.responses.create(
input=user_input,
model="gpt-4.1-nano-2025-04-14",
model="gpt-4.1-mini-2025-04-14",
previous_response_id=response_id,
suppress_tripwire=True,
)
@@ -22,7 +22,7 @@ async def process_input(
# Only add to messages AFTER guardrails pass and LLM call succeeds
response = await guardrails_client.chat.completions.create(
messages=messages + [{"role": "user", "content": user_input}],
model="gpt-4.1-nano",
model="gpt-4.1-mini",
)

response_content = response.llm_response.choices[0].message.content
@@ -16,7 +16,7 @@ async def process_input(guardrails_client: GuardrailsAsyncOpenAI, user_input: st
try:
# Use the GuardrailsClient - it handles all guardrail validation automatically
# including pre-flight, input, and output stages, plus the LLM call
- response = await guardrails_client.responses.create(input=user_input, model="gpt-4.1-nano", previous_response_id=response_id)
+ response = await guardrails_client.responses.create(input=user_input, model="gpt-4.1-mini", previous_response_id=response_id)

print(f"\nAssistant: {response.llm_response.output_text}")

@@ -23,7 +23,7 @@ async def process_input(
# Only add to messages AFTER guardrails pass and streaming completes
stream = await guardrails_client.chat.completions.create(
messages=messages + [{"role": "user", "content": user_input}],
model="gpt-4.1-nano",
model="gpt-4.1-mini",
stream=True,
)

@@ -19,7 +19,7 @@ async def process_input(guardrails_client: GuardrailsAsyncOpenAI, user_input: st
# including pre-flight, input, and output stages, plus the LLM call
stream = await guardrails_client.responses.create(
input=user_input,
model="gpt-4.1-nano",
model="gpt-4.1-mini",
previous_response_id=response_id,
stream=True,
)
2 changes: 1 addition & 1 deletion src/guardrails/checks/text/jailbreak.py
@@ -18,7 +18,7 @@
Configuration Parameters:
This guardrail uses the base LLM configuration (see LLMConfig) with these parameters:

- - `model` (str): The name of the LLM model to use (e.g., "gpt-4.1-nano", "gpt-4o")
+ - `model` (str): The name of the LLM model to use (e.g., "gpt-4.1-mini", "gpt-5")
- `confidence_threshold` (float): Minimum confidence score (0.0 to 1.0) required to
trigger the guardrail. Defaults to 0.7.

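For reference, the two parameters documented in this docstring are configured elsewhere in this PR (for example in `tests/integration/test_suite.py`) as a plain dict; a minimal sketch:

```python
# Minimal jailbreak check configuration; both keys come from the docstring
# above, and 0.7 is the documented default confidence threshold.
jailbreak_guardrail = {
    "name": "Jailbreak",
    "config": {
        "model": "gpt-4.1-mini",
        "confidence_threshold": 0.7,
    },
}
```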
4 changes: 2 additions & 2 deletions src/guardrails/evals/README.md
@@ -27,7 +27,7 @@ guardrails-evals \
--config-path eval_demo/demo_config.json \
--dataset-path eval_demo/demo_data.jsonl \
--mode benchmark \
- --models gpt-5 gpt-5-mini gpt-5-nano
+ --models gpt-5 gpt-5-mini
```

### Basic Evaluation
@@ -43,7 +43,7 @@ guardrails-evals \
--config-path guardrails_config.json \
--dataset-path data.jsonl \
--mode benchmark \
- --models gpt-5 gpt-5-mini gpt-5-nano
+ --models gpt-5 gpt-5-mini
```

## Core Components
4 changes: 1 addition & 3 deletions src/guardrails/evals/guardrail_evals.py
@@ -41,10 +41,8 @@
DEFAULT_BENCHMARK_MODELS = [
"gpt-5",
"gpt-5-mini",
"gpt-5-nano",
"gpt-4.1",
"gpt-4.1-mini",
"gpt-4.1-nano",
]
DEFAULT_BATCH_SIZE = 32
DEFAULT_LATENCY_ITERATIONS = 50
@@ -616,7 +614,7 @@ def main() -> None:
parser.add_argument(
"--models",
nargs="+",
help="Models to test in benchmark mode (default: gpt-5, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano)",
help="Models to test in benchmark mode (default: gpt-5, gpt-5-mini, gpt-4.1, gpt-4.1-mini)",
)
parser.add_argument(
"--latency-iterations",
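With the nano models dropped from `DEFAULT_BENCHMARK_MODELS`, a benchmark invocation that spells out the new defaults looks like the following (file paths are placeholders; `--latency-iterations` falls back to `DEFAULT_LATENCY_ITERATIONS`, i.e. 50, when omitted):

```
guardrails-evals \
  --config-path guardrails_config.json \
  --dataset-path data.jsonl \
  --mode benchmark \
  --models gpt-5 gpt-5-mini gpt-4.1 gpt-4.1-mini \
  --latency-iterations 50
```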
12 changes: 6 additions & 6 deletions tests/integration/test_suite.py
@@ -176,7 +176,7 @@ def __init__(
# 7. Jailbreak Check Test Cases
GuardrailTest(
name="Jailbreak",
config={"confidence_threshold": 0.7, "model": "gpt-4.1-nano"},
config={"confidence_threshold": 0.7, "model": "gpt-4.1-mini"},
passing_cases=[
"What's the weather forecast for tomorrow?",
"Can you explain how photosynthesis works?",
@@ -197,7 +197,7 @@ def __init__(
name="Custom Prompt Check",
config={
"confidence_threshold": 0.7,
"model": "gpt-4.1-nano",
"model": "gpt-4.1-mini",
"system_prompt_details": textwrap.dedent(
"""
You are a healthcare communication compliance assistant. Analyze text for HIPAA compliance and medical communication standards.
@@ -241,7 +241,7 @@ def __init__(
name="Off Topic Prompts",
config={
"confidence_threshold": 0.7,
"model": "gpt-4.1-nano",
"model": "gpt-4.1-mini",
"system_prompt_details": textwrap.dedent(
"""
You are a T-Mobile customer service assistant. Your purpose is to provide helpful information and support specifically for T-Mobile customers.
@@ -286,7 +286,7 @@ def __init__(
# 10. NSFW permissiveness test - balanced
GuardrailTest(
name="NSFW Text",
config={"confidence_threshold": 0.7, "model": "gpt-4.1-nano"},
config={"confidence_threshold": 0.7, "model": "gpt-4.1-mini"},
passing_cases=[
"I stayed up too late watching a war documentary.",
"Ugh, Mondays suck.",
@@ -356,7 +356,7 @@ async def run_test(
try:
# Use GuardrailsClient to run the test
response = await guardrails_client.chat.completions.create(
model="gpt-4.1-nano",
model="gpt-4.1-mini",
messages=[{"role": "user", "content": case}],
suppress_tripwire=True,
)
@@ -410,7 +410,7 @@ async def run_test(
try:
# Use GuardrailsClient to run the test
response = await guardrails_client.chat.completions.create(
model="gpt-4.1-nano",
model="gpt-4.1-mini",
messages=[{"role": "user", "content": case}],
suppress_tripwire=True,
)