Merged
8 changes: 7 additions & 1 deletion docs/ref/checks/custom_prompt_check.md
@@ -10,7 +10,8 @@ Implements custom content checks using configurable LLM prompts. Uses your custo
"config": {
"model": "gpt-5",
"confidence_threshold": 0.7,
"system_prompt_details": "Determine if the user's request needs to be escalated to a senior support agent. Indications of escalation include: ...",
"include_reasoning": false
}
}
```
@@ -20,6 +21,10 @@ Implements custom content checks using configurable LLM prompts. Uses your custo
- **`model`** (required): Model to use for the check (e.g., "gpt-5")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`system_prompt_details`** (required): Custom instructions defining the content detection criteria
- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
> **Reviewer comment (Collaborator):** Can we include something about how this influences classifier performance?
- When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
- When `true`: Additionally returns detailed reasoning for its decisions
- **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

## Implementation Notes

@@ -42,3 +47,4 @@ Returns a `GuardrailResult` with the following `info` dictionary:
- **`flagged`**: Whether the custom validation criteria were met
- **`confidence`**: Confidence score (0.0 to 1.0) for the validation
- **`threshold`**: The confidence threshold that was configured
- **`reason`**: Explanation of why the input was flagged (or not flagged) - *only included when `include_reasoning=true`*
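As an illustration, a flagged escalation request with `include_reasoning: true` might produce an `info` payload like the following (values and wording are hypothetical):

```json
{
  "flagged": true,
  "confidence": 0.84,
  "threshold": 0.7,
  "reason": "The user has asked three times for a refund exception and now demands a manager, matching the configured escalation criteria."
}
```

With `include_reasoning: false`, the same result would contain only `flagged`, `confidence`, and `threshold`.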
23 changes: 15 additions & 8 deletions docs/ref/checks/hallucination_detection.md
@@ -14,7 +14,8 @@ Flags model text containing factual claims that are clearly contradicted or not
"config": {
"model": "gpt-4.1-mini",
"confidence_threshold": 0.7,
"knowledge_source": "vs_abc123",
"include_reasoning": false
}
}
```
@@ -24,6 +25,10 @@ Flags model text containing factual claims that are clearly contradicted or not
- **`model`** (required): OpenAI model to use for validation (e.g., "gpt-4.1-mini")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`knowledge_source`** (required): OpenAI vector store ID starting with "vs_" containing reference documents
- **`include_reasoning`** (optional): Whether to include detailed reasoning fields in the output (default: `false`)
- When `false`: Returns only `flagged` and `confidence` to save tokens
- When `true`: Additionally returns `reasoning`, `hallucination_type`, `hallucinated_statements`, and `verified_statements`
- Recommended: Keep disabled for production (default); enable for development/debugging

### Tuning guidance

@@ -103,7 +108,9 @@ See [`examples/`](https://github.com/openai/openai-guardrails-js/tree/main/examp

## What It Returns

Returns a `GuardrailResult` with the following `info` dictionary.

**With `include_reasoning=true`:**

```json
{
@@ -118,15 +125,15 @@ Returns a `GuardrailResult` with the following `info` dictionary:
}
```
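**With `include_reasoning=false`** (a minimal sketch; the model generates only the essential fields, and the configured `threshold` is passed through):

```json
{
  "flagged": true,
  "confidence": 0.9,
  "threshold": 0.7
}
```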

### Fields

- **`flagged`**: Whether the content was flagged as potentially hallucinated
- **`confidence`**: Confidence score (0.0 to 1.0) for the detection
- **`threshold`**: The confidence threshold that was configured
- **`reasoning`**: Explanation of why the content was flagged - *only included when `include_reasoning=true`*
- **`hallucination_type`**: Type of issue detected (e.g., "factual_error", "unsupported_claim", "none") - *only included when `include_reasoning=true`*
- **`hallucinated_statements`**: Specific statements that are contradicted or unsupported - *only included when `include_reasoning=true`*
- **`verified_statements`**: Statements that are supported by your documents - *only included when `include_reasoning=true`*

## Benchmark Results

9 changes: 7 additions & 2 deletions docs/ref/checks/jailbreak.md
@@ -33,7 +33,8 @@ Detects attempts to bypass safety or policy constraints via manipulation (prompt
"name": "Jailbreak",
"config": {
"model": "gpt-4.1-mini",
"confidence_threshold": 0.7,
"include_reasoning": false
}
}
```
@@ -42,6 +43,10 @@ Detects attempts to bypass safety or policy constraints via manipulation (prompt

- **`model`** (required): Model to use for detection (e.g., "gpt-4.1-mini")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
- When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
- When `true`: Additionally returns detailed reasoning for its decisions
- **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

### Tuning guidance

@@ -68,7 +73,7 @@ Returns a `GuardrailResult` with the following `info` dictionary:
- **`flagged`**: Whether a jailbreak attempt was detected
- **`confidence`**: Confidence score (0.0 to 1.0) for the detection
- **`threshold`**: The confidence threshold that was configured
- **`reason`**: Natural language rationale describing why the request was (or was not) flagged - *only included when `include_reasoning=true`*
- **`used_conversation_history`**: Indicates whether prior conversation turns were included
- **`checked_text`**: JSON payload containing the conversation slice and latest input analyzed
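As a sketch, a flagged jailbreak attempt with `include_reasoning=true` might yield an `info` dictionary along these lines (values are hypothetical and `checked_text` is abbreviated):

```json
{
  "flagged": true,
  "confidence": 0.91,
  "threshold": 0.7,
  "reason": "The input instructs the model to ignore its system prompt and adopt an unrestricted persona.",
  "used_conversation_history": true,
  "checked_text": "{ ...conversation slice and latest input... }"
}
```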

7 changes: 6 additions & 1 deletion docs/ref/checks/llm_base.md
@@ -11,7 +11,8 @@ Base configuration for LLM-based guardrails. Provides common configuration optio
"name": "NSFW Text", // or "Jailbreak", "Hallucination Detection", etc.
"config": {
"model": "gpt-5",
"confidence_threshold": 0.7,
"include_reasoning": false
}
}
```
@@ -20,6 +21,10 @@ Base configuration for LLM-based guardrails. Provides common configuration optio

- **`model`** (required): OpenAI model to use for the check (e.g., "gpt-5")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
- When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
- When `true`: Additionally returns detailed reasoning for its decisions
- **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging
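For development or debugging, the same base configuration can simply flip the flag (a sketch reusing the example above; any LLM-based check name can be substituted):

```json
{
  "name": "NSFW Text",
  "config": {
    "model": "gpt-5",
    "confidence_threshold": 0.7,
    "include_reasoning": true
  }
}
```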

## What It Does

8 changes: 7 additions & 1 deletion docs/ref/checks/nsfw.md
@@ -20,7 +20,8 @@ Flags workplace‑inappropriate model outputs: explicit sexual content, profanit
"name": "NSFW Text",
"config": {
"model": "gpt-4.1-mini",
"confidence_threshold": 0.7,
"include_reasoning": false
}
}
```
@@ -29,6 +30,10 @@ Flags workplace‑inappropriate model outputs: explicit sexual content, profanit

- **`model`** (required): Model to use for detection (e.g., "gpt-4.1-mini")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
- When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
- When `true`: Additionally returns detailed reasoning for its decisions
- **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

### Tuning guidance

@@ -51,6 +56,7 @@ Returns a `GuardrailResult` with the following `info` dictionary:
- **`flagged`**: Whether NSFW content was detected
- **`confidence`**: Confidence score (0.0 to 1.0) for the detection
- **`threshold`**: The confidence threshold that was configured
- **`reason`**: Explanation of why the input was flagged (or not flagged) - *only included when `include_reasoning=true`*
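For instance, a flagged output with reasoning enabled might return the following (illustrative values):

```json
{
  "flagged": true,
  "confidence": 0.88,
  "threshold": 0.7,
  "reason": "The output contains explicit profanity directed at the user."
}
```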

### Examples

13 changes: 9 additions & 4 deletions docs/ref/checks/off_topic_prompts.md
@@ -10,7 +10,8 @@ Ensures content stays within defined business scope using LLM analysis. Flags co
"config": {
"model": "gpt-5",
"confidence_threshold": 0.7,
"system_prompt_details": "Customer support for our e-commerce platform. Topics include order status, returns, shipping, and product questions.",
"include_reasoning": false
}
}
```
@@ -20,6 +21,10 @@ Ensures content stays within defined business scope using LLM analysis. Flags co
- **`model`** (required): Model to use for analysis (e.g., "gpt-5")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`system_prompt_details`** (required): Description of your business scope and acceptable topics
- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
- When `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
- When `true`: Additionally returns detailed reasoning for its decisions
- **Use Case**: Keep disabled for production to minimize costs; enable for development and debugging

## Implementation Notes

@@ -40,7 +45,7 @@ Returns a `GuardrailResult` with the following `info` dictionary:
}
```

- **`flagged`**: Whether the content is off-topic (outside your business scope)
- **`confidence`**: Confidence score (0.0 to 1.0) for the assessment
- **`threshold`**: The confidence threshold that was configured
- **`business_scope`**: Copy of the scope provided in configuration
- **`reason`**: Explanation of why the input was flagged (or not flagged) - *only included when `include_reasoning=true`*
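Putting this together, an off-topic request against the e-commerce scope above might return the following when `include_reasoning=true` (hypothetical values):

```json
{
  "flagged": true,
  "confidence": 0.86,
  "threshold": 0.7,
  "business_scope": "Customer support for our e-commerce platform. Topics include order status, returns, shipping, and product questions.",
  "reason": "The user asked for stock-trading advice, which is unrelated to the configured support scope."
}
```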
11 changes: 8 additions & 3 deletions docs/ref/checks/prompt_injection_detection.md
@@ -31,7 +31,8 @@ After tool execution, the prompt injection detection check validates that the re
"name": "Prompt Injection Detection",
"config": {
"model": "gpt-4.1-mini",
"confidence_threshold": 0.7,
"include_reasoning": false
}
}
```
@@ -40,6 +41,10 @@ After tool execution, the prompt injection detection check validates that the re

- **`model`** (required): Model to use for prompt injection detection analysis (e.g., "gpt-4.1-mini")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`include_reasoning`** (optional): Whether to include detailed reasoning fields (`observation` and `evidence`) in the output (default: `false`)
- When `false`: Returns only `flagged` and `confidence` to save tokens
- When `true`: Additionally returns the `observation` and `evidence` fields
- Recommended: Keep disabled for production (default); enable for development/debugging

**Flags as MISALIGNED:**

@@ -85,15 +90,15 @@ Returns a `GuardrailResult` with the following `info` dictionary:
}
```

- **`flagged`**: Whether the action is misaligned (boolean)
- **`confidence`**: Confidence score (0.0 to 1.0) that the action is misaligned
- **`threshold`**: The confidence threshold that was configured
- **`user_goal`**: The tracked user intent from conversation
- **`action`**: The list of function calls or tool outputs analyzed for alignment
- **`recent_messages`**: Most recent conversation slice evaluated during the check
- **`recent_messages_json`**: JSON-serialized snapshot of the recent conversation slice
- **`observation`**: What the AI action is doing - *only included when `include_reasoning=true`*
- **`evidence`**: Specific evidence from conversation history that supports the decision (null when aligned) - *only included when `include_reasoning=true`*
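A hypothetical misaligned result with `include_reasoning=true` might look like this (values are illustrative, and the shapes of `action` and `recent_messages` are simplified here):

```json
{
  "flagged": true,
  "confidence": 0.9,
  "threshold": 0.7,
  "user_goal": "Summarize my latest order confirmation email",
  "action": [{ "type": "function_call", "name": "send_email", "arguments": "{ ... }" }],
  "recent_messages": [{ "role": "user", "content": "Summarize my latest order confirmation email" }],
  "recent_messages_json": "[ ... ]",
  "observation": "The assistant is attempting to forward the user's email to an external address that was never requested.",
  "evidence": "A tool output contained an instruction to forward the message to an unknown address, which does not appear in the user's request."
}
```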

## Benchmark Results
