parameterize LLM returning reasoning #54
Conversation
Pull request overview
This pull request adds the `include_reasoning` configuration parameter to LLM-based guardrails, allowing users to toggle detailed reasoning output on or off. The feature defaults to `false` to minimize token-generation costs in production; enabling it during development and debugging yields detailed explanations of guardrail decisions.
Key changes:
- Added `include_reasoning` boolean parameter (default: `false`) to the `LLMConfig` base configuration
- Implemented conditional reasoning output across all LLM guardrails, with custom reasoning fields for specialized guardrails
- Updated documentation across all affected guardrails with consistent descriptions and examples
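As a minimal sketch of the conditional reasoning output described above (the interface and function names here are illustrative assumptions, not the repository's actual `createLLMCheckFn` implementation):

```typescript
// Illustrative sketch: select the output shape based on include_reasoning.
interface LLMOutput {
  flagged: boolean;
  confidence: number;
}

interface LLMReasoningOutput extends LLMOutput {
  reason: string;
}

interface LLMConfig {
  model: string;
  confidence_threshold: number;
  include_reasoning?: boolean; // defaults to false
}

function buildResult(
  config: LLMConfig,
  flagged: boolean,
  confidence: number,
  reason: string,
): LLMOutput | LLMReasoningOutput {
  const base: LLMOutput = { flagged, confidence };
  // Only attach the reasoning field when explicitly enabled, so the
  // default path asks the model to generate fewer tokens.
  return config.include_reasoning ? { ...base, reason } : base;
}

const result = buildResult(
  { model: "gpt-5", confidence_threshold: 0.7 },
  true,
  0.9,
  "matched a known jailbreak pattern",
);
console.log("reason" in result); // false: reasoning omitted by default
```

The same pattern applies per guardrail: specialized checks swap in their own reasoning fields (e.g., `observation`/`evidence` for prompt-injection detection) while the base fields stay fixed.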
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/checks/llm-base.ts | Added include_reasoning config field and LLMReasoningOutput schema; updated createLLMCheckFn to conditionally select output model based on reasoning parameter |
| src/checks/user-defined-llm.ts | Removed custom output schema; now delegates to createLLMCheckFn for automatic reasoning handling |
| src/checks/topical-alignment.ts | Removed custom output schema reference; uses base implementation for reasoning |
| src/checks/nsfw.ts | Updated to use automatic reasoning handling via base implementation |
| src/checks/jailbreak.ts | Added conditional output model selection based on include_reasoning parameter |
| src/checks/prompt_injection_detection.ts | Added include_reasoning config field with custom reasoning fields (observation and evidence); implements conditional prompt building and result field inclusion |
| src/checks/hallucination-detection.ts | Added include_reasoning config field with custom reasoning fields (reasoning, hallucination_type, hallucinated_statements, verified_statements); implements conditional prompt building and result field inclusion |
| src/tests/unit/llm-base.test.ts | Added comprehensive tests for include_reasoning config and LLMReasoningOutput schema |
| src/tests/unit/prompt_injection_detection.test.ts | Added tests verifying inclusion/exclusion of reasoning fields based on config; updated existing tests to enable reasoning |
| src/tests/unit/checks/user-defined-llm.test.ts | Updated test to verify reasoning field support when enabled |
| src/tests/unit/checks/jailbreak.test.ts | Updated tests to enable include_reasoning for consistency |
| src/tests/unit/checks/hallucination-detection.test.ts | New comprehensive test file covering include_reasoning behavior, error handling, and tripwire logic |
| docs/ref/checks/prompt_injection_detection.md | Updated with include_reasoning parameter documentation and output field descriptions |
| docs/ref/checks/off_topic_prompts.md | Added include_reasoning parameter documentation and updated output field descriptions |
| docs/ref/checks/nsfw.md | Added include_reasoning parameter documentation |
| docs/ref/checks/llm_base.md | Added include_reasoning parameter documentation to base config |
| docs/ref/checks/jailbreak.md | Added include_reasoning parameter documentation |
| docs/ref/checks/hallucination_detection.md | Added include_reasoning parameter documentation with example and detailed field descriptions |
| docs/ref/checks/custom_prompt_check.md | Added include_reasoning parameter documentation |
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated no new comments.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
@codex review
Codex Review: Didn't find any major issues. 🚀
gabor-openai left a comment
TY!
- **`model`** (required): Model to use for the check (e.g., "gpt-5")
- **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
- **`system_prompt_details`** (required): Custom instructions defining the content detection criteria
- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
Can we include something about how this influences classifier performance?
Allow users to toggle reasoning on and off for the LLM-based guardrails via the config file:
- `include_reasoning` (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)
  - `false`: The LLM only generates the essential fields (`flagged` and `confidence`), reducing token generation costs
  - `true`: Additionally returns detailed reasoning for its decisions
- Updated docs and tests
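For illustration, enabling reasoning for a single guardrail in a config file might look roughly like this. The surrounding structure is an assumption for this sketch; only `model`, `confidence_threshold`, and `include_reasoning` are taken from the parameter docs above:

```json
{
  "guardrails": [
    {
      "name": "Jailbreak",
      "config": {
        "model": "gpt-5",
        "confidence_threshold": 0.7,
        "include_reasoning": true
      }
    }
  ]
}
```

With `include_reasoning` omitted or set to `false`, the same check would return only `flagged` and `confidence`.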