Update PI sys prompt and new eval #16
Pull Request Overview
Updates prompt injection detection to use richer context and incremental support, and aligns evals with the new behavior.
- Revamps the prompt and analysis logic to evaluate all LLM actions since the last user message, removing last_checked_index usage.
- Adds incremental guardrail execution in the async engine for the Prompt Injection Detection guardrail, including conversation parsing.
- Updates unit tests and expected observations accordingly.
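To make the first bullet concrete, here is a minimal sketch of selecting every LLM action that follows the most recent user message. The `Message` shape and the `actionsSinceLastUser` helper are hypothetical names chosen for illustration; the PR's actual types and parsing logic may differ.

```typescript
// Hypothetical message shape; not taken from the PR's code.
type Role = "user" | "assistant" | "tool";

interface Message {
  role: Role;
  content: string;
}

// Return every message that follows the most recent user message.
// Assumption: if the history contains no user message, nothing is selected.
function actionsSinceLastUser(history: Message[]): Message[] {
  let lastUser = -1;
  // Scan backwards so we stop at the latest user turn.
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].role === "user") {
      lastUser = i;
      break;
    }
  }
  if (lastUser === -1) return [];
  // Everything after the last user message is, by construction, non-user.
  return history.slice(lastUser + 1);
}
```

Because the selection is recomputed from the history on every call, a helper like this needs no stored cursor, which is consistent with dropping `last_checked_index`.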
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/evals/core/async-engine.ts | Runs the prompt injection guardrail with incremental history slices; adds conversation parsing and serialization helpers. |
| src/checks/prompt_injection_detection.ts | Replaces prompt, reworks parsing and selection of actionable messages, builds new analysis prompt, and removes last_checked_index logic. |
| src/tests/unit/prompt_injection_detection.test.ts | Adjusts tests to new skip messages and confirms no lastCheckedIndex updates. |
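The "incremental history slices" mentioned for `async-engine.ts` could be pictured as growing prefixes of the conversation, each handed to the guardrail in turn. The sketch below is only an assumption about that shape: `incrementalSlices` and `firstFlaggedTurn` are hypothetical helpers, and the real engine presumably awaits an asynchronous guardrail call rather than a synchronous predicate.

```typescript
// Build growing prefixes of a conversation: [m0], [m0, m1], [m0, m1, m2], ...
// Hypothetical helper; the PR's engine may slice differently.
function incrementalSlices<T>(conversation: readonly T[]): T[][] {
  const slices: T[][] = [];
  for (let end = 1; end <= conversation.length; end++) {
    slices.push(conversation.slice(0, end));
  }
  return slices;
}

// Return the index of the first turn at which the check flags the history,
// or -1 if it never does. `check` stands in for the guardrail (assumption).
function firstFlaggedTurn<T>(
  conversation: readonly T[],
  check: (history: T[]) => boolean,
): number {
  const slices = incrementalSlices(conversation);
  for (let i = 0; i < slices.length; i++) {
    if (check(slices[i])) return i;
  }
  return -1;
}
```

Evaluating prefixes this way lets an eval record the turn at which a guardrail first trips, not only whether it trips on the full trace.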
Pull Request Overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
- **Data type**: Internal synthetic dataset simulating realistic agent traces
- **Test scenarios**: Multi-turn conversations with function calls and tool outputs
- **Synthetic dataset**: 1,000 samples with 500 positive cases (50% prevalence) simulating realistic agent traces
- **AgentDojo dataset**: 1,046 samples from AgentDojo's workspace, travel, banking, and Slack suite combined with the "important_instructions" attack (949 positive cases, 97 negative samples)
In a follow-up PR, could you include a link to this dataset?