feat: add Turkish National ID (TR_NATIONAL_ID) recognizer #1995

Merged
SharonHart merged 3 commits into microsoft:main from mrcuren:feat/turkey-national-id-recognizer on Apr 23, 2026
Conversation

mrcuren commented Apr 20, 2026

Adds Turkish National ID (TCKN / TC Kimlik Numarası) recognizer to Presidio Analyzer.

The TCKN is an 11-digit identification number issued to Turkish citizens and foreign residents. It is a primary personal identifier (PII) under KVKK, the Turkish Personal Data Protection Law.

Features:

  • Pattern recognition for 11-digit numbers starting with 1-9
  • Algorithmic validation of 10th and 11th digits per official NVI specification
  • Context words for higher confidence detection (TC Kimlik, TCKN, etc.)
  • Disabled by default as per country-specific recognizer guidelines

Issue reference

Part of #1973

Testing

  • Added test_tr_national_id_recognizer.py with >=90% coverage on changed lines
  • Tests include valid checksums, invalid checksums, and false positive checks (short/long inputs, non-digits)
  • All existing tests continue to pass
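
The false-positive cases listed above (short/long inputs, non-digits) can be illustrated at the pattern stage alone. The sketch below is an assumption about how such a pattern might look; the regex and the name `find_candidates` are hypothetical and are not copied from the PR or its test file.

```python
import re

# Hypothetical candidate pattern: exactly 11 ASCII digits, first digit 1-9,
# and not embedded in a longer run of digits (guarded by the lookarounds).
TCKN_CANDIDATE = re.compile(r"(?<![0-9])[1-9][0-9]{10}(?![0-9])")


def find_candidates(text: str) -> list[str]:
    """Return all substrings that look like a TCKN, before any checksum check."""
    return TCKN_CANDIDATE.findall(text)
```

A pattern match only yields a candidate; a checksum step would still be needed to reject 11-digit numbers with invalid check digits.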

Checklist

  • I have reviewed the contribution guidelines
  • My code follows the project style guidelines (ruff, pytest)
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md under the Unreleased section
  • I have updated the supported_entities.md documentation
  • I have added my recognizer to default_recognizers.yaml with enabled: false
  • I have added my recognizer to __init__.py and __all__

- Add TrNationalIdRecognizer with NVI checksum validation
- Add country_specific/turkey/ directory structure
- Add unit tests with valid/invalid TCKN cases
- Update default_recognizers.yaml, __init__.py, supported_entities.md, CHANGELOG.md

Part of microsoft#1973

mrcuren commented Apr 20, 2026

@microsoft-github-policy-service agree

mrcuren commented Apr 20, 2026

Thanks for the CLA approval.

This PR is now ready for review. Happy to make any changes if needed, especially around naming or checksum validation logic.

Copilot AI left a comment

Pull request overview

Adds a new country-specific predefined recognizer to Presidio Analyzer for detecting Turkish National Identification Numbers (TCKN / TR_NATIONAL_ID), including checksum validation and registration in the default recognizer configuration and public docs.

Changes:

  • Introduces TrNationalIdRecognizer with regex detection + checksum validation and context terms.
  • Registers the recognizer in predefined_recognizers/__init__.py and conf/default_recognizers.yaml (disabled by default).
  • Adds unit tests and updates supported_entities.md and CHANGELOG.md.
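
To make the role of context terms concrete, here is a deliberately simplified Python sketch of raising a match's confidence when a context word appears shortly before it. It is a stand-in for illustration only: it is not Presidio's context enhancer, and the word list, score values, window size, and the name `score_with_context` are all assumptions.

```python
# Illustrative values only; the PR's actual context list and scores
# are not shown in this thread.
CONTEXT_WORDS = ("tc kimlik", "tckn")
BASE_SCORE = 0.3
BOOSTED_SCORE = 0.65


def score_with_context(text: str, start: int, window: int = 40) -> float:
    """Return a higher score if a context word precedes the match at `start`."""
    # Look only at the `window` characters immediately before the match.
    preceding = text[max(0, start - window):start].casefold()
    if any(word in preceding for word in CONTEXT_WORDS):
        return BOOSTED_SCORE
    return BASE_SCORE
```

The same 11-digit number would then score higher in "TC Kimlik No: 10000000146" than in "Order 10000000146", which is the intent behind pairing a pattern with context words.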

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/turkey/tr_national_id_recognizer.py | Implements the TR national ID recognizer (pattern + checksum validation + context). |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/turkey/__init__.py | Exposes the Turkey-specific recognizer module. |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py | Exports TrNationalIdRecognizer from the predefined recognizers package. |
| presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml | Adds recognizer to default registry (disabled by default). |
| presidio-analyzer/tests/test_tr_national_id_recognizer.py | Adds unit tests for detection and checksum validation. |
| docs/supported_entities.md | Documents TR_NATIONAL_ID under a new “Turkey” section. |
| CHANGELOG.md | Notes the addition under the Unreleased Analyzer additions. |

Comment thread presidio-analyzer/tests/test_tr_national_id_recognizer.py

mrcuren commented Apr 22, 2026

@copilot Thank you for the review!

Regarding the NVI algorithm source: I've added a reference to the official NVI portal (https://tckimlik.nvi.gov.tr/) in the docstring at lines 15-16 to provide a concrete source for the checksum validation logic.

Regarding the false-positive tests: I've updated the PR description to remove the phone number reference since those test cases aren't included in the test suite. The current tests cover the essential false-positive scenarios (short/long inputs, non-digits, invalid checksums).

Happy to make any additional adjustments if needed!

@SharonHart SharonHart merged commit 286af66 into microsoft:main Apr 23, 2026
34 checks passed