Add Korean RRN identifiers to PII mask / block #29
Pull Request Overview
This PR adds support for Korean Resident Registration Numbers (KR_RRN) and Thai National Identification Numbers (TH_TNIN) to the PII detection guardrail, implementing custom recognizers with checksum validation to reduce false positives.
- Implements custom pattern recognizers for KR_RRN and TH_TNIN with checksum validation algorithms
- Registers custom recognizers with the Presidio analyzer engine
- Adds comprehensive test coverage for the new entity types including detection, masking, and checksum validation
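The checksum validation mentioned above can be sketched as follows. This is a minimal illustration using the commonly documented mod-11 weighting schemes for both identifiers, not the code from this PR; the function names (`is_valid_kr_rrn`, `is_valid_th_tnin`) and the separator handling are assumptions made for the example.

```python
import re

# Weights applied to the first 12 digits of each 13-digit identifier.
KR_RRN_WEIGHTS = (2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4, 5)
TH_TNIN_WEIGHTS = tuple(range(13, 1, -1))  # 13, 12, ..., 2


def _passes_mod11(candidate: str, weights: tuple) -> bool:
    """Return True if the 13th digit matches the weighted mod-11 check digit."""
    # Hypothetical normalisation: drop hyphens and whitespace before checking.
    cleaned = re.sub(r"[\s-]", "", candidate)
    if not re.fullmatch(r"[0-9]{13}", cleaned):
        return False
    digits = [int(ch) for ch in cleaned]
    total = sum(d * w for d, w in zip(digits[:12], weights))
    expected = (11 - total % 11) % 10
    return digits[12] == expected


def is_valid_kr_rrn(candidate: str) -> bool:
    """Checksum test for a Korean Resident Registration Number candidate."""
    return _passes_mod11(candidate, KR_RRN_WEIGHTS)


def is_valid_th_tnin(candidate: str) -> bool:
    """Checksum test for a Thai National Identification Number candidate."""
    return _passes_mod11(candidate, TH_TNIN_WEIGHTS)
```

A recognizer would typically run a regular expression for the 13-digit shape first and call a check like this only on the matches, so that arbitrary 13-digit strings are not flagged as PII.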
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/guardrails/checks/text/pii.py | Adds custom recognizers for KR_RRN and TH_TNIN with checksum validation, registers them with the analyzer engine, and updates PIIEntity enum |
| tests/unit/checks/test_pii.py | Adds comprehensive test coverage for Korean and Thai entity detection, masking, blocking modes, and checksum validation |
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Why not contribute to presidio instead?
Hi @omri374, thanks for your question! It looks like Presidio actually has Korean identifier support, so our plan is to use that (in this PR or a different one).
Pull Request Overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
registry = RecognizerRegistry(supported_languages=["en"])
registry.load_predefined_recognizers(languages=["en"], nlp_engine=nlp_engine)
registry.add_recognizer(KrRrnRecognizer(supported_language="en"))
Copilot
AI
Oct 30, 2025
The `KrRrnRecognizer` is being added with `supported_language="en"`, but Korean RRNs are Korean-specific entities. While this may work for pattern matching, consider whether it should be registered with `supported_language="ko"` or with both languages. If the analyzer engine only supports English (line 127), document why the Korean recognizer is registered under English language support.
gabor-openai
left a comment
TY
KR_RRN and TH_TNIN support: Presidio did not support these even though the documentation claimed to, so we implemented them ourselves (current version doesn't support TH_TNIN).