@steven10a
Collaborator

Allow users to toggle reasoning on and off for the LLM-based guardrails via the config file

    • include_reasoning (optional): Whether to include reasoning/explanation fields in the guardrail output (default: false)
      • When false: The LLM only generates the essential fields (flagged and confidence), reducing token generation costs
      • When true: The LLM additionally returns detailed reasoning for its decisions
      • Use Case: Keep disabled for production to minimize costs; enable for development and debugging
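
As a rough illustration of the use case above, the following sketch builds a guardrail config that enables reasoning everywhere except production. The field names `model`, `confidence_threshold`, and `include_reasoning` come from the docs in this PR; the interface name, the helper `buildGuardrailConfig`, and the threshold value are assumptions made for the example, not part of the library.

```typescript
// Illustrative config shape. Field names follow the PR docs; the interface
// name and the helper below are hypothetical.
interface LLMGuardrailConfig {
  model: string;
  confidence_threshold: number; // 0.0 to 1.0
  include_reasoning?: boolean; // optional, treated as false when omitted
}

// Hypothetical helper: turn reasoning on for every environment except production.
function buildGuardrailConfig(env: string): LLMGuardrailConfig {
  return {
    model: "gpt-5",
    confidence_threshold: 0.7, // arbitrary value chosen for the example
    include_reasoning: env !== "production",
  };
}

const devConfig = buildGuardrailConfig("development");
const prodConfig = buildGuardrailConfig("production");
console.log(devConfig.include_reasoning, prodConfig.include_reasoning);
```

Deriving the flag from the environment keeps the cheaper output as the production default while leaving explanations available during debugging.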

Updated docs and tests

Copilot AI review requested due to automatic review settings December 10, 2025 22:19

Copilot AI left a comment

Pull request overview

This pull request adds the include_reasoning configuration parameter to LLM-based guardrails, allowing users to toggle detailed reasoning output on or off. The feature defaults to false to minimize token generation costs in production, while enabling it for development and debugging provides detailed explanations of guardrail decisions.

Key changes:

  • Added include_reasoning boolean parameter (default: false) to LLMConfig base configuration
  • Implemented conditional reasoning output across all LLM guardrails, with custom reasoning fields for specialized guardrails
  • Updated documentation across all affected guardrails with consistent descriptions and examples
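
The conditional output selection described above could look roughly like the sketch below. `LLMReasoningOutput` is named in this PR; the `LLMOutput` name, the `reason` field name, and the `parseGuardrailOutput` helper are assumptions for illustration and may not match the actual implementation in `src/checks/llm-base.ts`.

```typescript
// Base output: the two essential fields named in the PR description.
interface LLMOutput {
  flagged: boolean;
  confidence: number;
}

// Reasoning output: the base fields plus an explanation.
// The field name `reason` is an assumption.
interface LLMReasoningOutput extends LLMOutput {
  reason: string;
}

// Hypothetical parser: validates the raw model response and keeps the
// explanation only when reasoning is enabled.
function parseGuardrailOutput(
  rawJson: string,
  includeReasoning: boolean = false,
): LLMOutput | LLMReasoningOutput {
  const data = JSON.parse(rawJson) as Record<string, unknown>;
  const flagged = data["flagged"];
  const confidence = data["confidence"];
  const reason = data["reason"];
  if (typeof flagged !== "boolean" || typeof confidence !== "number") {
    throw new Error("expected boolean `flagged` and numeric `confidence`");
  }
  const base: LLMOutput = { flagged, confidence };
  if (!includeReasoning) return base;
  return { ...base, reason: typeof reason === "string" ? reason : "" };
}

const raw = '{"flagged": true, "confidence": 0.92, "reason": "example explanation"}';
console.log(parseGuardrailOutput(raw));
console.log(parseGuardrailOutput(raw, true));
```

In the real feature the savings come from asking the model for fewer fields in the first place; this sketch only shows the selection between the two output shapes.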

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/checks/llm-base.ts | Added include_reasoning config field and LLMReasoningOutput schema; updated createLLMCheckFn to conditionally select output model based on reasoning parameter |
| src/checks/user-defined-llm.ts | Removed custom output schema; now delegates to createLLMCheckFn for automatic reasoning handling |
| src/checks/topical-alignment.ts | Removed custom output schema reference; uses base implementation for reasoning |
| src/checks/nsfw.ts | Updated to use automatic reasoning handling via base implementation |
| src/checks/jailbreak.ts | Added conditional output model selection based on include_reasoning parameter |
| src/checks/prompt_injection_detection.ts | Added include_reasoning config field with custom reasoning fields (observation and evidence); implements conditional prompt building and result field inclusion |
| src/checks/hallucination-detection.ts | Added include_reasoning config field with custom reasoning fields (reasoning, hallucination_type, hallucinated_statements, verified_statements); implements conditional prompt building and result field inclusion |
| src/tests/unit/llm-base.test.ts | Added comprehensive tests for include_reasoning config and LLMReasoningOutput schema |
| src/tests/unit/prompt_injection_detection.test.ts | Added tests verifying inclusion/exclusion of reasoning fields based on config; updated existing tests to enable reasoning |
| src/tests/unit/checks/user-defined-llm.test.ts | Updated test to verify reasoning field support when enabled |
| src/tests/unit/checks/jailbreak.test.ts | Updated tests to enable include_reasoning for consistency |
| src/tests/unit/checks/hallucination-detection.test.ts | New comprehensive test file covering include_reasoning behavior, error handling, and tripwire logic |
| docs/ref/checks/prompt_injection_detection.md | Updated with include_reasoning parameter documentation and output field descriptions |
| docs/ref/checks/off_topic_prompts.md | Added include_reasoning parameter documentation and updated output field descriptions |
| docs/ref/checks/nsfw.md | Added include_reasoning parameter documentation |
| docs/ref/checks/llm_base.md | Added include_reasoning parameter documentation to base config |
| docs/ref/checks/jailbreak.md | Added include_reasoning parameter documentation |
| docs/ref/checks/hallucination_detection.md | Added include_reasoning parameter documentation with example and detailed field descriptions |
| docs/ref/checks/custom_prompt_check.md | Added include_reasoning parameter documentation |
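
For the specialized guardrails, "conditional prompt building and result field inclusion" might be sketched as follows, using the hallucination detection field names listed in the summary. The interface, both helper names, the prompt wording, and the fallback values are assumptions for illustration only, not the code in `src/checks/hallucination-detection.ts`.

```typescript
// Hypothetical shape of the model's analysis; the four optional field names
// are the custom reasoning fields listed for hallucination detection.
interface HallucinationAnalysis {
  flagged: boolean;
  confidence: number;
  reasoning?: string;
  hallucination_type?: string | null;
  hallucinated_statements?: string[];
  verified_statements?: string[];
}

// Conditional prompt building: request the extra fields only when enabled.
function buildOutputInstructions(includeReasoning: boolean): string {
  const fields = ["flagged", "confidence"];
  if (includeReasoning) {
    fields.push(
      "reasoning",
      "hallucination_type",
      "hallucinated_statements",
      "verified_statements",
    );
  }
  return `Respond with a JSON object containing only: ${fields.join(", ")}.`;
}

// Conditional result field inclusion: copy the extra fields only when enabled.
function buildResultInfo(
  analysis: HallucinationAnalysis,
  includeReasoning: boolean,
): Record<string, unknown> {
  const info: Record<string, unknown> = {
    flagged: analysis.flagged,
    confidence: analysis.confidence,
  };
  if (includeReasoning) {
    info["reasoning"] = analysis.reasoning ?? "";
    info["hallucination_type"] = analysis.hallucination_type ?? null;
    info["hallucinated_statements"] = analysis.hallucinated_statements ?? [];
    info["verified_statements"] = analysis.verified_statements ?? [];
  }
  return info;
}

const sample: HallucinationAnalysis = {
  flagged: true,
  confidence: 0.8,
  reasoning: "example explanation",
};
console.log(buildOutputInstructions(false));
console.log(buildResultInfo(sample, true));
```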

@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@steven10a requested a review from Copilot December 10, 2025 22:57
@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated no new comments.


@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

Collaborator

@gabor-openai left a comment

TY!

- **`model`** (required): Model to use for the check (e.g., "gpt-5")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`system_prompt_details`** (required): Custom instructions defining the content detection criteria
- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
Collaborator

Can we include something about how this influences classifier performance?

@gabor-openai merged commit 43d9f2b into main on Dec 12, 2025
1 check passed