Fix incorrect PESEL checksum validation in PlPeselRecognizer by BlaiseCz · Pull Request #1520 · microsoft/presidio

BlaiseCz · 2025-01-31T11:29:43Z

Bug Description

The PESEL checksum validation in PlPeselRecognizer.validate_result() is incorrect. The current implementation does not correctly compute the control digit, leading to false negatives, where valid PESEL numbers are incorrectly rejected.

This affects Presidio's ability to correctly recognize and validate PESEL numbers, impacting anonymization and sensitive data detection.

To Reproduce

Run the following test:

from presidio_analyzer.predefined_recognizers import PlPeselRecognizer

pesel_recognizer = PlPeselRecognizer()

valid_pesel = "44051401359"  # This is a valid PESEL
print(pesel_recognizer.validate_result(valid_pesel))  # Expected: True, Actual: False

Note: if unsure, verify the number with this checker: https://kalkulatory.gofin.pl/kalkulatory/sprawdzanie-pesel-weryfikacja-pesel

Observed Behavior

The function returns False for a valid PESEL due to incorrect checksum computation.

Expected Behavior

A valid PESEL (with the correct checksum) should return True.

Root Cause: Incorrect Checksum Calculation

The issue lies in the final checksum validation step. The existing code:

checksum = sum(digit * weight for digit, weight in zip(digits[:10], weights))
checksum %= 10

return checksum == digits[10]  # ❌ Incorrect final checksum check!

This compares the weighted sum modulo 10 directly to the last digit of the PESEL. The last digit is the control digit, which is the complement of that value: (10 - checksum) % 10.
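As a worked check of the arithmetic, the sketch below steps through the sample number from "To Reproduce". It is a standalone illustration, not the Presidio source; the names `WEIGHTS` and `weighted_sum` are hypothetical and chosen only for this example.

```python
# Weights quoted in this PR for the first ten PESEL digits.
WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]


def weighted_sum(pesel: str) -> int:
    """Sum of the first ten digits, each multiplied by its weight.

    Hypothetical helper for illustration only.
    """
    return sum(int(ch) * w for ch, w in zip(pesel[:10], WEIGHTS))


pesel = "44051401359"
total = weighted_sum(pesel)            # 4+12+0+45+1+12+0+9+3+15 = 101
checksum = total % 10                  # 1
control_digit = (10 - checksum) % 10   # 9
last_digit = int(pesel[10])            # 9

print(checksum == last_digit)       # existing comparison: False
print(control_digit == last_digit)  # comparison against the control digit: True
```

The existing comparison fails here because it checks 1 against 9, while the control digit derived from the same sum is 9.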

Proposed Fix

The correct formula to compute the PESEL checksum is:

def validate_result(self, pattern_text: str) -> bool:  # noqa D102
    if len(pattern_text) != 11 or not pattern_text.isdigit():
        return False  # Ensure the input is a valid 11-digit number

    digits = [int(digit) for digit in pattern_text]
    weights = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]  # Correct weights

    checksum = sum(digit * weight for digit, weight in zip(digits[:10], weights)) % 10
    check_digit = (10 - checksum) % 10  # ✅ Corrected final checksum computation

    return check_digit == digits[10]  # ✅ Now correctly compares with the last digit

Why This Fix Works

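Applies the missing control-digit step: the weighted sum modulo 10 is converted with (10 - checksum) % 10 before the comparison.
Accepts an 11-digit string only when its last digit equals the computed control digit.
Fixes the false negatives, and rejects numbers that passed the old comparison without a matching control digit.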
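To make the difference between the two comparisons concrete, here is a minimal side-by-side sketch. It assumes the logic quoted under "Root Cause" and "Proposed Fix"; `old_check` and `new_check` are hypothetical standalone functions, not methods of `PlPeselRecognizer`, and every number other than the sample from "To Reproduce" was constructed for illustration.

```python
WEIGHTS = [1, 3, 7, 9, 1, 3, 7, 9, 1, 3]


def old_check(pesel: str) -> bool:
    """Comparison quoted under "Root Cause": raw checksum vs. last digit."""
    digits = [int(ch) for ch in pesel]
    checksum = sum(d * w for d, w in zip(digits[:10], WEIGHTS)) % 10
    return checksum == digits[10]


def new_check(pesel: str) -> bool:
    """Comparison from "Proposed Fix": control digit vs. last digit."""
    if len(pesel) != 11 or not pesel.isdigit():
        return False  # not an 11-digit number
    digits = [int(ch) for ch in pesel]
    checksum = sum(d * w for d, w in zip(digits[:10], WEIGHTS)) % 10
    return (10 - checksum) % 10 == digits[10]


# 11-digit inputs only: old_check has no length guard.
for pesel in ["44051401359", "44051401351", "44051401335"]:
    print(pesel, old_check(pesel), new_check(pesel))

# Malformed input is rejected by the guard in new_check.
print(new_check("4405140135"), new_check("4405140135x"))
```

The first number is rejected by the old comparison and accepted by the new one. The second has the raw checksum as its last digit, so only the old comparison accepts it. The third has a checksum of 5, one of the two values (0 and 5) for which both comparisons agree.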

Additional Context

This issue impacts Polish users relying on PESEL validation in Presidio.
The bug affects data masking and validation accuracy.
Fixing this aligns the recognizer with the official PESEL control-digit rule.

omri374 · 2025-02-04T07:54:27Z

Thanks!

omri374 · 2025-02-04T07:54:32Z

/azp run

azure-pipelines · 2025-02-04T07:54:46Z

Azure Pipelines successfully started running 1 pipeline(s).

omri374 · 2025-02-05T06:21:06Z

Note that we have another issue affecting the CI (to be fixed in #1522); however, the PESEL recognizer tests are failing too. Would you mind taking a look?

omri374 · 2025-02-06T01:58:02Z

/azp run

azure-pipelines · 2025-02-06T01:58:15Z

Azure Pipelines successfully started running 1 pipeline(s).

omri374 · 2025-03-31T18:34:08Z

/azp run

azure-pipelines · 2025-03-31T18:34:21Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull Request Overview

The PR fixes an incorrect PESEL checksum validation in the PlPeselRecognizer that was causing valid PESEL numbers to be falsely rejected.

Added input validation to check for an 11-digit number.
Updated the checksum calculation to correctly compute the control digit.
Modified the final check to compare the computed control digit with the last digit of the PESEL.

omri374 · 2025-04-06T16:01:11Z

@BlaiseCz this is a great addition. Would you be interested in fixing the tests and merging this into the package?

omri374 · 2025-06-01T07:46:36Z

Closing due to missing tests. Feel free to re-open if you're interested in completing the tests or would like support from the maintenance team.

Fix incorrect PESEL checksum validation in PlPeselRecognizer

b05d826

Merge branch 'main' into fix-pesel-checksum

5a5e054

omri374 added 4 commits March 3, 2025 11:01

Merge branch 'main' into fix-pesel-checksum

4c5a736

Merge branch 'main' into fix-pesel-checksum

caa85ab

Merge branch 'main' into fix-pesel-checksum

769cac1

Merge branch 'main' into fix-pesel-checksum

7b6aba1

omri374 requested a review from Copilot March 31, 2025 18:34

Copilot AI reviewed Mar 31, 2025

omri374 requested a review from Copilot March 31, 2025 18:44

Copilot AI reviewed Mar 31, 2025

Merge branch 'main' into fix-pesel-checksum

2198cb4

omri374 closed this Jun 1, 2025

BlaiseCz wants to merge 7 commits into microsoft:main from BlaiseCz:fix-pesel-checksum

3 participants
