@steven10a steven10a commented Oct 16, 2025

  • Updated the system prompt for prompt injection detection
  • Updated with new eval results
  • Removed last_checked_index; the check now uses all of the LLM actions since the last user message for more context
  • Updated the eval tool to run prompt injection detection on multi-turn conversations incrementally, one step at a time
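
As a rough illustration of the "all LLM actions since the last user message" selection described above, here is a minimal sketch. The `ConversationItem` shape and the `actionsSinceLastUserMessage` helper are hypothetical names chosen for this example, not the code in this PR, and returning the whole history when no user message exists is an assumption.

```typescript
// Hypothetical item shape for illustration; the real guardrail types may differ.
interface ConversationItem {
  role?: "user" | "assistant" | "system";
  type?: "message" | "function_call" | "function_call_output";
  name?: string;
  content?: string;
}

// Returns every item recorded after the most recent user message.
// Assumption: if the history has no user message, the whole history is returned.
function actionsSinceLastUserMessage(
  history: ConversationItem[],
): ConversationItem[] {
  let lastUserIndex = -1;
  // Scan backwards so the first match is the latest user message.
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i]?.role === "user") {
      lastUserIndex = i;
      break;
    }
  }
  return history.slice(lastUserIndex + 1);
}
```

Compared with tracking a stored index, this recomputes the window from the conversation itself on every call, so no state has to be carried between checks.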

@Copilot Copilot AI review requested due to automatic review settings October 16, 2025 01:09

@Copilot Copilot AI left a comment

Pull Request Overview

Updates prompt injection detection to use richer context and support incremental evaluation, and aligns the evals with the new behavior.

  • Revamps the prompt and analysis logic to evaluate all LLM actions since the last user message, removing last_checked_index usage.
  • Adds incremental guardrail execution in the async engine for the Prompt Injection Detection guardrail, including conversation parsing.
  • Updates unit tests and expected observations accordingly.
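
The incremental execution mentioned above can be pictured roughly as follows. This is a sketch under stated assumptions, not the engine's actual code: `TraceItem`, `StepCheck`, and `runIncrementally` are hypothetical names, and stopping at the first flagged step is an assumption about the desired behavior.

```typescript
// Hypothetical types for illustration; the real engine's types may differ.
interface TraceItem {
  role?: string;
  type?: string;
  content?: string;
}

type StepCheck = (prefix: TraceItem[]) => Promise<{ tripwireTriggered: boolean }>;

interface IncrementalResult {
  triggeredAtStep: number | null; // zero-based index of the first flagged step
  stepsRun: number; // number of history slices evaluated
}

// Runs the check on growing slices of the conversation: items [0..0], [0..1], ...
// Assumption: evaluation stops as soon as one slice trips the guardrail.
async function runIncrementally(
  history: TraceItem[],
  check: StepCheck,
): Promise<IncrementalResult> {
  for (let end = 1; end <= history.length; end++) {
    const result = await check(history.slice(0, end));
    if (result.tripwireTriggered) {
      return { triggeredAtStep: end - 1, stepsRun: end };
    }
  }
  return { triggeredAtStep: null, stepsRun: history.length };
}
```

Evaluating each prefix mimics how the guardrail would see a live conversation, where later turns are not yet available when an earlier tool call is checked.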

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/evals/core/async-engine.ts | Runs the prompt injection guardrail with incremental history slices; adds conversation parsing and serialization helpers. |
| src/checks/prompt_injection_detection.ts | Replaces prompt, reworks parsing and selection of actionable messages, builds new analysis prompt, and removes last_checked_index logic. |
| src/tests/unit/prompt_injection_detection.test.ts | Adjusts tests to new skip messages and confirms no lastCheckedIndex updates. |

@steven10a steven10a requested review from Copilot and removed request for Copilot October 16, 2025 15:07

@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 6 comments.


@steven10a steven10a requested a review from Copilot October 16, 2025 15:37

@Copilot Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

- **Data type**: Internal synthetic dataset simulating realistic agent traces
- **Test scenarios**: Multi-turn conversations with function calls and tool outputs
- **Synthetic dataset**: 1,000 samples with 500 positive cases (50% prevalence) simulating realistic agent traces
- **AgentDojo dataset**: 1,046 samples from AgentDojo's workspace, travel, banking, and Slack suites, combined with the "important_instructions" attack (949 positive cases, 97 negative cases)

In a follow-up PR, could you include a link to this dataset?

@gabor-openai gabor-openai merged commit d75cfb3 into main Oct 16, 2025
1 check passed