Collaborator

@steven10a steven10a commented Nov 5, 2025

Update PII guardrail to be more robust against adversarial inputs

  • Implemented Unicode normalization
  • Added the ability to decode Base64, URL, and Hex encoded content for entity detection
  • Added an optional config parameter, detect_encoded_pii, that turns on encoded-content detection
  • Added CVV/BIC detection and email-in-URL contexts
  • Updated tests
  • Updated docs
  • Removed checked_field from result object for guardrails that don't use it
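
As a rough illustration of the encoded-content detection listed above, the sketch below decodes whitespace-separated tokens as Base64, hex, or URL encoding and checks the decoded text for an email address. It is not the code from this PR: the function names, the email regex, and the minimum-length thresholds are all assumptions chosen for the example, and `atob` only yields correct text for ASCII payloads.

```typescript
// Hypothetical sketch; names and thresholds are illustrative, not from src/checks/pii.ts.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Decode a token as Base64 if it has the right alphabet and length; otherwise null.
function tryBase64(token: string): string | null {
  if (!/^[A-Za-z0-9+/]{16,}={0,2}$/.test(token) || token.length % 4 !== 0) return null;
  try {
    return atob(token); // binary string; adequate for ASCII payloads
  } catch {
    return null;
  }
}

// Decode a token as hex byte pairs (at least 8 bytes); otherwise null.
function tryHex(token: string): string | null {
  if (!/^(?:[0-9a-fA-F]{2}){8,}$/.test(token)) return null;
  let out = '';
  for (let i = 0; i < token.length; i += 2) {
    out += String.fromCharCode(parseInt(token.slice(i, i + 2), 16));
  }
  return out;
}

// Decode a token as percent-encoding if it contains at least one %XX escape.
function tryUrl(token: string): string | null {
  if (!/%[0-9a-fA-F]{2}/.test(token)) return null;
  try {
    return decodeURIComponent(token);
  } catch {
    return null; // malformed escape sequence
  }
}

// Return the original (still encoded) tokens whose decoded form contains an email.
function findEncodedEmails(text: string): string[] {
  const hits: string[] = [];
  for (const token of text.split(/\s+/)) {
    for (const decode of [tryBase64, tryHex, tryUrl]) {
      const decoded = decode(token);
      if (decoded !== null && EMAIL_PATTERN.test(decoded)) {
        hits.push(token);
        break;
      }
    }
  }
  return hits;
}
```

Reporting the encoded token rather than the decoded value keeps the match aligned with the input text, which is what a masking step would need to replace.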

Copilot AI review requested due to automatic review settings November 5, 2025 18:04

Copilot AI left a comment

Pull Request Overview

This PR enhances the PII detection guardrail with advanced detection capabilities including Unicode normalization, encoded PII detection (Base64, URL-encoded, hex), and additional entity types (CVV, BIC_SWIFT, KR_RRN). The implementation refactors the detection logic from a simple pattern-matching approach to a more robust system supporting multiple patterns per entity and prioritized span replacement.

  • Adds encoded PII detection with configurable detect_encoded_pii flag
  • Introduces Unicode normalization (NFKC) and zero-width character stripping to prevent obfuscation bypasses
  • Adds new PII entity types: CVV (credit card security codes), BIC_SWIFT (bank codes), and KR_RRN (Korean Resident Registration Numbers)
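
The normalization step described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function name and the exact set of zero-width code points are assumptions. It relies only on the standard `String.prototype.normalize` method.

```typescript
// Hypothetical sketch; the character list is an assumption, not taken from src/checks/pii.ts.
// Zero-width space, non-joiner, joiner, word joiner, and byte-order mark.
const ZERO_WIDTH_CHARS = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Fold compatibility forms (for example fullwidth letters) to their canonical
// equivalents with NFKC, then drop invisible characters that could split a match.
function normalizeForDetection(text: string): string {
  return text.normalize('NFKC').replace(ZERO_WIDTH_CHARS, '');
}
```

Because normalization can change the string's length, match offsets found in the normalized text do not necessarily line up with the original input; a masking step built on this sketch would need to account for that.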

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/checks/pii.ts | Core implementation: refactored detection logic to support multiple patterns per entity, added encoded PII detection functions, Unicode normalization, and new entity patterns |
| src/tests/unit/checks/pii.test.ts | Comprehensive test coverage for new features including KR_RRN validation, Unicode normalization, encoded PII detection, and new entity types |
| examples/basic/pii_mask_example.ts | New interactive example demonstrating PII masking in pre-flight stage and blocking in output stage with encoded PII detection enabled |
| docs/ref/checks/pii.md | Updated documentation describing new detection capabilities, encoded PII feature, and expanded entity list |
| .gitignore | Minor comment clarification |

@steven10a steven10a requested a review from Copilot November 5, 2025 21:22

Copilot AI left a comment

Copilot reviewed 36 out of 36 changed files in this pull request and generated 1 comment.


@steven10a steven10a requested a review from Copilot November 5, 2025 21:48

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.


- anonymized_text: checkedText, // Legacy compatibility
- checked_text: checkedText, // Primary field for preflight modifications
+ anonymized_text: checkedText,
+ checked_text: checkedText,

Copilot AI Nov 5, 2025

The checked_text field is being set unconditionally in the PII guardrail, but according to the PR's purpose (making checked_text optional and only including it when content is modified), this should only be included when hasPii is true. Currently, it's set to the original text when no PII is found, which contradicts the design goal of omitting checked_text when no modifications occur.

Suggested change
- checked_text: checkedText,
+ ...(hasPii ? { checked_text: checkedText } : {}),

Collaborator Author

WAI (working as intended). The design is to always set the checked_text field.
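
To make the two positions in this thread concrete, the sketch below contrasts a result object that always carries `checked_text` with one that includes it only when PII was found, as in the suggested change. The `PiiInfo` interface, the `pii_detected` field, and both builder functions are hypothetical names invented for the example; only `checked_text` and `hasPii` come from the discussion.

```typescript
// Hypothetical result shape; only checked_text appears in the PR discussion.
interface PiiInfo {
  pii_detected: boolean;
  checked_text?: string;
}

// Always-present variant (the behavior the author describes as intended):
// checked_text holds the masked text, or the original text when nothing was masked.
function buildInfoAlways(original: string, masked: string, hasPii: boolean): PiiInfo {
  return { pii_detected: hasPii, checked_text: hasPii ? masked : original };
}

// Conditional variant (the reviewer's suggestion): the key is omitted
// entirely when no PII was found.
function buildInfoConditional(masked: string, hasPii: boolean): PiiInfo {
  return { pii_detected: hasPii, ...(hasPii ? { checked_text: masked } : {}) };
}
```

With the always-present variant, callers can read `checked_text` without checking for its existence; with the conditional variant, the presence of the key itself signals that the text was modified.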

Comment on lines +205 to +207
if (typeof result.info.checked_text === 'string' && !maskedTextOverride) {
  maskedTextOverride = result.info.checked_text;
}

Copilot AI Nov 5, 2025

The condition !maskedTextOverride means only the first preflight result with checked_text will be used. If multiple PII guardrails run in sequence and both mask different entities, later masks will be ignored. This could lead to incomplete PII masking. Consider either merging masked texts or documenting this first-wins behavior.

Collaborator Author

WAI. There will not be multiple PII guardrails running.
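
The first-wins behavior discussed in this thread can be reproduced in isolation. The sketch below wraps the quoted condition in a loop; the `GuardrailResult` interface and the `pickMaskedText` function are assumptions made for the example, and the surrounding code in the PR may differ.

```typescript
// Hypothetical result shape; only info.checked_text appears in the quoted snippet.
interface GuardrailResult {
  info: { checked_text?: unknown };
}

// Returns the checked_text of the first result that provides a non-empty string.
// Later results are ignored once an override has been set.
function pickMaskedText(results: GuardrailResult[]): string | undefined {
  let maskedTextOverride: string | undefined;
  for (const result of results) {
    if (typeof result.info.checked_text === 'string' && !maskedTextOverride) {
      maskedTextOverride = result.info.checked_text;
    }
  }
  return maskedTextOverride;
}
```

One property of the `!maskedTextOverride` check worth knowing: an empty string is falsy, so if the first result's `checked_text` is `''`, a later result can still replace it.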

@steven10a steven10a requested a review from Copilot November 5, 2025 22:34

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated no new comments.


Collaborator

@gabor-openai gabor-openai left a comment

LGTM

@gabor-openai gabor-openai merged commit 54e5806 into main Nov 6, 2025
7 checks passed
@steven10a steven10a deleted the dev/steven/pii_update branch November 10, 2025 16:02