Update PI sys prompt and new eval #16

steven10a · 2025-10-16T01:09:00Z

Updated the system prompt for prompt injection detection.
Updated the docs with the new eval results.
Removed last_checked_index; the check now uses all LLM actions since the last user message for more context.
Updated the eval tool to run prompt injection detection on multi-turn conversations incrementally, one step at a time.
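
For reviewers, a minimal sketch of what "all LLM actions since the last user message" could look like. The `ConversationItem` shape, the function name `itemsSinceLastUser`, and the fall-back to the full history when no user message exists are all assumptions for illustration; they are not taken from `prompt_injection_detection.ts`.

```typescript
// Hypothetical message shape; the real types in the repository may differ.
interface ConversationItem {
  role?: "user" | "assistant" | "system" | "tool";
  type?: string; // e.g. "function_call" or "function_call_output"
  content?: string;
}

// Return every item that follows the most recent user message.
// Assumption: if there is no user message, the whole history is returned.
function itemsSinceLastUser(history: ConversationItem[]): ConversationItem[] {
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i]?.role === "user") {
      return history.slice(i + 1);
    }
  }
  return history.slice();
}
```

Unlike a stored index, this selection is recomputed from the conversation on every call, so no state has to be carried between checks.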

Copilot

Pull Request Overview

Updates prompt injection detection to use richer context and incremental support, and aligns evals with the new behavior.

Revamps the prompt and analysis logic to evaluate all LLM actions since the last user message, removing last_checked_index usage.
Adds incremental guardrail execution in the async engine for the Prompt Injection Detection guardrail, including conversation parsing.
Updates unit tests and expected observations accordingly.
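
The incremental execution described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `async-engine.ts`: the names `incrementalSlices`, `runIncrementally`, and `GuardrailCheck` are hypothetical, and it assumes the guardrail is run once per growing prefix of the conversation.

```typescript
// Hypothetical signature for a guardrail check: resolves to true when the
// guardrail flags the given slice of history.
type GuardrailCheck<T> = (historySlice: T[]) => Promise<boolean>;

// Build the growing prefixes of a conversation: [h0], [h0, h1], ...
function incrementalSlices<T>(history: T[]): T[][] {
  const slices: T[][] = [];
  for (let end = 1; end <= history.length; end++) {
    slices.push(history.slice(0, end));
  }
  return slices;
}

// Run the check on each prefix in order and collect one result per step.
async function runIncrementally<T>(
  history: T[],
  check: GuardrailCheck<T>,
): Promise<boolean[]> {
  const results: boolean[] = [];
  for (const slice of incrementalSlices(history)) {
    results.push(await check(slice));
  }
  return results;
}
```

Running the steps sequentially mirrors how the guardrail would see a live conversation: each call receives only what had happened up to that point.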

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

File	Description
src/evals/core/async-engine.ts	Runs the prompt injection guardrail with incremental history slices; adds conversation parsing and serialization helpers.
src/checks/prompt_injection_detection.ts	Replaces prompt, reworks parsing and selection of actionable messages, builds new analysis prompt, and removes last_checked_index logic.
src/tests/unit/prompt_injection_detection.test.ts	Adjusts tests to new skip messages and confirms no lastCheckedIndex updates.

src/checks/prompt_injection_detection.ts

src/evals/core/async-engine.ts

src/checks/prompt_injection_detection.ts

Copilot

Pull Request Overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.

src/checks/prompt_injection_detection.ts

src/evals/core/async-engine.ts

src/checks/prompt_injection_detection.ts

src/evals/core/async-engine.ts

docs/ref/checks/prompt_injection_detection.md

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

gabor-openai · 2025-10-16T17:38:29Z

docs/ref/checks/prompt_injection_detection.md

- **Data type**: Internal synthetic dataset simulating realistic agent traces
- **Test scenarios**: Multi-turn conversations with function calls and tool outputs
+- **Synthetic dataset**: 1,000 samples with 500 positive cases (50% prevalence) simulating realistic agent traces
+- **AgentDojo dataset**: 1,046 samples from AgentDojo's workspace, travel, banking, and Slack suite combined with the "important_instructions" attack (949 positive cases, 97 negative samples)


In a follow-up PR, could you include a link to this dataset?

update PI sys prompt and new eval

d99a3ae

Copilot AI review requested due to automatic review settings October 16, 2025 01:09

removed last checked index

e5471b0

Copilot AI reviewed Oct 16, 2025

addressing copilot comments

666f4ee

steven10a requested review from Copilot and gabor-openai October 16, 2025 14:16

Updating PI docs with results

3ff410a

steven10a requested review from Copilot and removed request for Copilot October 16, 2025 15:07

Copilot AI reviewed Oct 16, 2025

extract constants, added conversation util

eba2e7c

steven10a requested a review from Copilot October 16, 2025 15:37

Copilot AI reviewed Oct 16, 2025

gabor-openai approved these changes Oct 16, 2025

gabor-openai merged commit d75cfb3 into main Oct 16, 2025
1 check passed
