parameterize LLM returning reasoning #54

steven10a · 2025-12-10T22:19:35Z

Allow users to toggle reasoning on and off for the LLM-based guardrails via the config file.

- include_reasoning (optional): Whether to include reasoning/explanation fields in the guardrail output (default: false)
  - When false: the LLM generates only the essential fields (flagged and confidence), reducing token generation costs
  - When true: the LLM additionally returns detailed reasoning for its decisions
  - Use case: keep disabled in production to minimize costs; enable for development and debugging
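To make the opt-in default concrete, here is a minimal TypeScript sketch of how an optional `include_reasoning` flag might be resolved when a guardrail config is loaded. The `LLMConfigInput` and `resolveLLMConfig` names and the threshold check are assumptions for illustration, not the repository's actual config loader.

```typescript
// Hypothetical config shapes; only the field names model, confidence_threshold
// and include_reasoning come from the PR. Everything else is illustrative.
interface LLMConfigInput {
  model: string;
  confidence_threshold: number;
  include_reasoning?: boolean; // optional in the config file
}

interface ResolvedLLMConfig {
  model: string;
  confidence_threshold: number;
  include_reasoning: boolean; // always concrete after resolution
}

function resolveLLMConfig(input: LLMConfigInput): ResolvedLLMConfig {
  if (input.confidence_threshold < 0 || input.confidence_threshold > 1) {
    throw new RangeError("confidence_threshold must be between 0.0 and 1.0");
  }
  return {
    model: input.model,
    confidence_threshold: input.confidence_threshold,
    // Reasoning is opt-in: omitting the flag keeps output to flagged + confidence.
    include_reasoning: input.include_reasoning ?? false,
  };
}

const prod = resolveLLMConfig({ model: "gpt-5", confidence_threshold: 0.7 });
const dev = resolveLLMConfig({ model: "gpt-5", confidence_threshold: 0.7, include_reasoning: true });
console.log(prod.include_reasoning, dev.include_reasoning); // false true
```

Resolving the flag once at load time means downstream check code can treat `include_reasoning` as a plain boolean instead of re-applying the default everywhere.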

Updated docs and tests

Copilot

Pull request overview

This pull request adds the include_reasoning configuration parameter to LLM-based guardrails, allowing users to toggle detailed reasoning output on or off. The feature defaults to false to minimize token generation costs in production, while enabling it for development and debugging provides detailed explanations of guardrail decisions.

Key changes:

- Added include_reasoning boolean parameter (default: false) to LLMConfig base configuration
- Implemented conditional reasoning output across all LLM guardrails, with custom reasoning fields for specialized guardrails
- Updated documentation across all affected guardrails with consistent descriptions and examples
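The "conditional output model" idea can be sketched as picking one of two structured-output shapes based on the flag. In this sketch the type names `LLMOutput` and `LLMReasoningOutput` echo the PR summary, but the `reason` field name, the JSON-schema shape, and the `outputSchemaFor`/`parseOutput` helpers are assumptions, not the code in `llm-base.ts`.

```typescript
// Hypothetical sketch: base output vs. reasoning output.
interface LLMOutput {
  flagged: boolean;
  confidence: number;
}

interface LLMReasoningOutput extends LLMOutput {
  reason: string; // assumed field name
}

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required: string[];
  additionalProperties: false;
}

// Select the schema the model is asked to fill in.
function outputSchemaFor(includeReasoning: boolean): ObjectSchema {
  const properties: Record<string, { type: string }> = {
    flagged: { type: "boolean" },
    confidence: { type: "number" },
  };
  if (includeReasoning) {
    properties.reason = { type: "string" };
  }
  return { type: "object", properties, required: Object.keys(properties), additionalProperties: false };
}

// Validate the model's JSON reply against the selected shape.
function parseOutput(raw: string, includeReasoning: boolean): LLMOutput | LLMReasoningOutput {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const flagged = data.flagged;
  const confidence = data.confidence;
  if (typeof flagged !== "boolean" || typeof confidence !== "number") {
    throw new TypeError("response must contain boolean flagged and numeric confidence");
  }
  const base: LLMOutput = { flagged, confidence };
  if (!includeReasoning) {
    return base; // any extra fields the model emitted are dropped
  }
  const reason = data.reason;
  if (typeof reason !== "string") {
    throw new TypeError("response must contain a string reason when reasoning is enabled");
  }
  return { ...base, reason };
}
```

Because the reasoning field is absent from the schema when the flag is off, the model is never asked to generate it, which is where the token savings described above would come from.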

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/checks/llm-base.ts | Added `include_reasoning` config field and `LLMReasoningOutput` schema; updated `createLLMCheckFn` to conditionally select output model based on reasoning parameter |
| src/checks/user-defined-llm.ts | Removed custom output schema; now delegates to `createLLMCheckFn` for automatic reasoning handling |
| src/checks/topical-alignment.ts | Removed custom output schema reference; uses base implementation for reasoning |
| src/checks/nsfw.ts | Updated to use automatic reasoning handling via base implementation |
| src/checks/jailbreak.ts | Added conditional output model selection based on `include_reasoning` parameter |
| src/checks/prompt_injection_detection.ts | Added `include_reasoning` config field with custom reasoning fields (`observation` and `evidence`); implements conditional prompt building and result field inclusion |
| src/checks/hallucination-detection.ts | Added `include_reasoning` config field with custom reasoning fields (`reasoning`, `hallucination_type`, `hallucinated_statements`, `verified_statements`); implements conditional prompt building and result field inclusion |
| src/tests/unit/llm-base.test.ts | Added comprehensive tests for `include_reasoning` config and `LLMReasoningOutput` schema |
| src/tests/unit/prompt_injection_detection.test.ts | Added tests verifying inclusion/exclusion of reasoning fields based on config; updated existing tests to enable reasoning |
| src/tests/unit/checks/user-defined-llm.test.ts | Updated test to verify reasoning field support when enabled |
| src/tests/unit/checks/jailbreak.test.ts | Updated tests to enable `include_reasoning` for consistency |
| src/tests/unit/checks/hallucination-detection.test.ts | New comprehensive test file covering `include_reasoning` behavior, error handling, and tripwire logic |
| docs/ref/checks/prompt_injection_detection.md | Updated with `include_reasoning` parameter documentation and output field descriptions |
| docs/ref/checks/off_topic_prompts.md | Added `include_reasoning` parameter documentation and updated output field descriptions |
| docs/ref/checks/nsfw.md | Added `include_reasoning` parameter documentation |
| docs/ref/checks/llm_base.md | Added `include_reasoning` parameter documentation to base config |
| docs/ref/checks/jailbreak.md | Added `include_reasoning` parameter documentation |
| docs/ref/checks/hallucination_detection.md | Added `include_reasoning` parameter documentation with example and detailed field descriptions |
| docs/ref/checks/custom_prompt_check.md | Added `include_reasoning` parameter documentation |
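The "conditional prompt building and result field inclusion" described for the specialized guardrails can be sketched with a per-guardrail field list. The field names come from the table above; `REASONING_FIELDS`, `outputInstructions`, and `buildResultInfo` are hypothetical helpers, not the functions in `prompt_injection_detection.ts` or `hallucination-detection.ts`.

```typescript
// Guardrail-specific reasoning fields, as listed in the file summary.
const REASONING_FIELDS = {
  prompt_injection_detection: ["observation", "evidence"],
  hallucination_detection: [
    "reasoning",
    "hallucination_type",
    "hallucinated_statements",
    "verified_statements",
  ],
} as const;

type GuardrailName = keyof typeof REASONING_FIELDS;

// Conditional prompt building: request reasoning fields only when enabled.
function outputInstructions(guardrail: GuardrailName, includeReasoning: boolean): string {
  const fields: string[] = ["flagged", "confidence"];
  if (includeReasoning) {
    fields.push(...REASONING_FIELDS[guardrail]);
  }
  return `Respond with a JSON object containing exactly these fields: ${fields.join(", ")}.`;
}

// Conditional result field inclusion: copy reasoning fields into the result
// only when enabled and actually present in the model output.
function buildResultInfo(
  guardrail: GuardrailName,
  output: Record<string, unknown>,
  includeReasoning: boolean,
): Record<string, unknown> {
  const info: Record<string, unknown> = {
    flagged: output.flagged,
    confidence: output.confidence,
  };
  if (includeReasoning) {
    for (const field of REASONING_FIELDS[guardrail]) {
      if (field in output) {
        info[field] = output[field];
      }
    }
  }
  return info;
}
```

Keeping the field list in one place ties the prompt and the result shape together, so enabling reasoning cannot request fields that the result builder then silently ignores.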

src/checks/prompt_injection_detection.ts

src/checks/jailbreak.ts

steven10a · 2025-12-10T22:29:35Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

src/checks/prompt_injection_detection.ts

steven10a · 2025-12-10T22:57:21Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

src/checks/prompt_injection_detection.ts

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated no new comments.

steven10a · 2025-12-10T23:09:25Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

src/checks/prompt_injection_detection.ts

steven10a · 2025-12-10T23:19:37Z

@codex review

chatgpt-codex-connector · 2025-12-10T23:25:49Z

Codex Review: Didn't find any major issues. 🚀

gabor-openai

TY!

gabor-openai · 2025-12-12T17:55:01Z

docs/ref/checks/custom_prompt_check.md

 - **`model`** (required): Model to use for the check (e.g., "gpt-5")
 - **`confidence_threshold`** (required): Minimum confidence score to trigger tripwire (0.0 to 1.0)
 - **`system_prompt_details`** (required): Custom instructions defining the content detection criteria
+- **`include_reasoning`** (optional): Whether to include reasoning/explanation fields in the guardrail output (default: `false`)


Can we include something about how this influences classifier performance?

parameterize LLM returning reasoning

628aa82

Copilot AI review requested due to automatic review settings December 10, 2025 22:19

Copilot started reviewing on behalf of steven10a December 10, 2025 22:20

Copilot AI reviewed Dec 10, 2025

src/checks/prompt_injection_detection.ts (outdated, resolved)

src/checks/jailbreak.ts (resolved)

chatgpt-codex-connector bot reviewed Dec 10, 2025

src/checks/prompt_injection_detection.ts (outdated, resolved)

Preserve reason field in error fallback message

715df1c

steven10a requested a review from Copilot December 10, 2025 22:57

Copilot started reviewing on behalf of steven10a December 10, 2025 22:57

chatgpt-codex-connector bot reviewed Dec 10, 2025

src/checks/prompt_injection_detection.ts (outdated, resolved)

Copilot AI reviewed Dec 10, 2025

Making new param optional

4bda9b6

chatgpt-codex-connector bot reviewed Dec 10, 2025

src/checks/prompt_injection_detection.ts (resolved)

Fix prompt injection reporting errors

b1a75bc

steven10a requested a review from gabor-openai December 10, 2025 23:27

gabor-openai approved these changes Dec 12, 2025

gabor-openai merged commit 43d9f2b into main Dec 12, 2025
1 check passed

3 participants