Pass Agent level guardrails conversation history #48
steven10a commented Nov 19, 2025
- Pass conversation history to guardrails that need them when using Agents
- Further optimized the jailbreak (JB) system prompt
- Updated tests
Pull Request Overview
This PR enhances guardrail functionality by passing conversation history to guardrails that require it when using Agents, optimizes the jailbreak detection system prompt with clearer banned content categories, and updates corresponding tests.
- Adds dual access pattern for conversation history (both property and method) for improved compatibility
- Optimizes performance by conditionally loading conversation history only when needed
- Expands jailbreak system prompt with explicit banned content categories
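The first two points above can be illustrated with a minimal sketch. It is not the PR's actual code: only the `conversationHistory` property name comes from this PR, while `getConversationHistory`, `usesConversationHistory`, `createContextWithHistory`, `buildContext`, and the `Message` shape are hypothetical names chosen for illustration. The loader is synchronous here to keep the example short; a real implementation may well be asynchronous.

```typescript
// Hypothetical message shape; the real types live in src/types.ts.
type Message = { role: string; content: string };

// Dual access pattern: the same history is reachable as a property and via a method.
interface ContextWithHistory {
  readonly conversationHistory: Message[];
  getConversationHistory(): Message[]; // assumed name for the existing method accessor
}

function createContextWithHistory(history: Message[]): ContextWithHistory {
  return {
    conversationHistory: history,
    getConversationHistory: () => history,
  };
}

// Hypothetical guardrail descriptor with a flag marking it as conversation-aware.
interface GuardrailSpec {
  name: string;
  usesConversationHistory: boolean;
}

// Conditional loading: call the (possibly expensive) loader only if some guardrail needs history.
function buildContext(
  guardrails: GuardrailSpec[],
  loadHistory: () => Message[],
): ContextWithHistory | null {
  const needsHistory = guardrails.some((g) => g.usesConversationHistory);
  return needsHistory ? createContextWithHistory(loadHistory()) : null;
}
```

Exposing both accessors lets callers written against either style keep working, and the `some` check avoids loading history when no configured guardrail would read it.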
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/types.ts | Adds conversationHistory property to GuardrailLLMContextWithHistory interface for direct access alongside existing method accessor |
| src/evals/core/async-engine.ts | Introduces extractTextFromContent helper, removes prompt injection-specific logic, and adds conversation history extraction for non-conversation-aware guardrails |
| src/checks/jailbreak.ts | Adds comprehensive "BANNED CONTENT CATEGORIES" section to system prompt for clearer detection guidance |
| src/base-client.ts | Updates createContextWithConversation to expose conversation history via both property and method |
| src/agents.ts | Refactors context creation and adds optimization to conditionally load conversation history |
| src/tests/unit/evals/async-engine.test.ts | Adds test coverage for multi-part content extraction in non-conversation-aware guardrails |
| src/tests/unit/base-client.test.ts | Adds test verifying dual access pattern for conversation history |
| src/tests/unit/agents.test.ts | Updates test to mark guardrail as conversation-aware to trigger proper context creation |
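As a rough illustration of the multi-part content extraction mentioned for `src/evals/core/async-engine.ts`, a helper of this kind might look like the sketch below. The function name comes from the table, but the body, the `ContentPart` shape, and the choice to join parts with a space are assumptions; the helper in the PR may behave differently.

```typescript
// Assumed shape of one content part; non-text parts (e.g. images) carry no `text` field.
type ContentPart = { type: string; text?: string };
type Content = string | ContentPart[] | null | undefined;

// Sketch: flatten message content into plain text for guardrails that only accept a string.
function extractTextFromContent(content: Content): string {
  if (typeof content === "string") return content; // plain string content passes through
  if (!Array.isArray(content)) return ""; // null or undefined yields an empty string
  return content
    .map((part) => (typeof part.text === "string" ? part.text : ""))
    .filter((text) => text.length > 0) // drop parts without text
    .join(" ");
}
```

A call such as `extractTextFromContent([{ type: "input_text", text: "a" }, { type: "image" }])` would skip the image part and return only the text.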
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
@codex review
Codex Review: Didn't find any major issues. You're on a roll.