Skip to content

Conversation

@steven10a
Copy link
Collaborator

@steven10a steven10a commented Oct 29, 2025

  • updated PII to check KNN_RRN and TH_TNIN (current version doesn't support TH_TNIN)
    Presidio did not support these even though the documentation claimed to, so we implemented them ourselves
  • Updated Presidio to latest version with KR_RRN support
  • Added tests

Copilot AI review requested due to automatic review settings October 29, 2025 01:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for Korean Resident Registration Numbers (KR_RRN) and Thai National Identification Numbers (TH_TNIN) to the PII detection guardrail, implementing custom recognizers with checksum validation to reduce false positives.

  • Implements custom pattern recognizers for KR_RRN and TH_TNIN with checksum validation algorithms
  • Registers custom recognizers with the Presidio analyzer engine
  • Adds comprehensive test coverage for the new entity types including detection, masking, and checksum validation

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/guardrails/checks/text/pii.py Adds custom recognizers for KR_RRN and TH_TNIN with checksum validation, registers them with the analyzer engine, and updates PIIEntity enum
tests/unit/checks/test_pii.py Adds comprehensive test coverage for Korean and Thai entity detection, masking, blocking modes, and checksum validation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@steven10a steven10a requested a review from Copilot October 29, 2025 01:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@omri374
Copy link

omri374 commented Oct 30, 2025

Why not contribute to presidio instead?

@gabor-openai
Copy link
Collaborator

Why not contribute to presidio instead?

Hi @omri374, thanks for your question! It looks like Presidio actually has Korean identifier support, so our plan is to use that (in this PR or a different one).

@steven10a steven10a requested a review from Copilot October 30, 2025 18:13
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


registry = RecognizerRegistry(supported_languages=["en"])
registry.load_predefined_recognizers(languages=["en"], nlp_engine=nlp_engine)
registry.add_recognizer(KrRrnRecognizer(supported_language="en"))
Copy link

Copilot AI Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KrRrnRecognizer is being added with supported_language=\"en\" but Korean RRNs are Korean-specific entities. While this may work for pattern matching, consider whether this should be registered with supported_language=\"ko\" or both languages. If the analyzer engine only supports English (line 127), document why the Korean recognizer is registered under English language support.

Copilot uses AI. Check for mistakes.
@gabor-openai gabor-openai changed the title add korean and thai identifiers Add Korean RNN identifiers to PII mask / block Oct 30, 2025
Copy link
Collaborator

@gabor-openai gabor-openai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY

@gabor-openai gabor-openai merged commit 45e958c into main Oct 30, 2025
9 checks passed
@gabor-openai gabor-openai deleted the dev/steven/update_presidio branch October 30, 2025 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants