Fix incorrect PESEL checksum validation in PlPeselRecognizer by sienioApius · Pull Request #1998 · microsoft/presidio

sienioApius · 2026-04-23T12:01:58Z

Change Description

The Polish PESEL check-digit algorithm is `check = (10 - weighted_sum % 10) % 10`
(see https://en.wikipedia.org/wiki/PESEL#Check_digit), but
`PlPeselRecognizer.validate_result` currently compares the raw
`weighted_sum % 10` to the check digit, which rejects real valid PESEL numbers.
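
To make the difference between the two comparisons concrete, here is a minimal sketch. It assumes the standard PESEL weights 1, 3, 7, 9, 1, 3, 7, 9, 1, 3 from the linked Wikipedia section; `weighted_sum`, `old_check`, and `new_check` are hypothetical helper names for illustration, not Presidio's actual code.

```python
# Standard PESEL weights for the first ten digits (assumed from the linked reference).
WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def weighted_sum(pesel: str) -> int:
    """Sum of the first ten digits, each multiplied by its weight."""
    return sum(int(d) * w for d, w in zip(pesel[:10], WEIGHTS))


def old_check(pesel: str) -> bool:
    # The comparison described as the bug: raw remainder vs. check digit.
    return weighted_sum(pesel) % 10 == int(pesel[10])


def new_check(pesel: str) -> bool:
    # Corrected formula: check = (10 - weighted_sum % 10) % 10.
    return (10 - weighted_sum(pesel) % 10) % 10 == int(pesel[10])


pesel = "44051401458"
print(weighted_sum(pesel))  # 102
print(old_check(pesel))     # False: 102 % 10 is 2, but the check digit is 8
print(new_check(pesel))     # True: (10 - 2) % 10 is 8
```

The two comparisons agree only when the remainder is 0 or 5, since those are the only values equal to their own complement modulo 10.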

Example — 44051401458 is the canonical example used in official Polish
documentation and is algorithmically valid, but today:

>>> from presidio_analyzer.predefined_recognizers import PlPeselRecognizer
>>> PlPeselRecognizer().validate_result("44051401458")
False   # expected True

This PR:

- Corrects the check-digit formula.
- Guards against non-11-digit / non-numeric inputs (previously, a regex-produced
  string shorter than 11 characters would raise IndexError).
- Replaces the existing test fixtures, which were generated using the buggy
  formula and would not pass real-world PESEL validators, with PESEL numbers
  that are valid under the true algorithm, plus negative cases covering wrong
  check digits, wrong length, and non-digit input.
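
The guard and the corrected formula could be combined roughly as follows. This is a sketch under stated assumptions, not the merged implementation: `is_valid_pesel` is a hypothetical standalone function, and the weights are the standard ones from the linked reference.

```python
# Standard PESEL weights for the first ten digits (assumed from the linked reference).
WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def is_valid_pesel(candidate: str) -> bool:
    """Hypothetical helper: True only for 11 ASCII digits with a matching check digit."""
    # Guard first, so short or non-numeric input returns False instead of raising.
    if len(candidate) != 11 or not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    # zip stops after ten weights, so the eleventh digit is excluded from the sum.
    weighted = sum(d * w for d, w in zip(digits, WEIGHTS))
    return (10 - weighted % 10) % 10 == digits[10]


print(is_valid_pesel("44051401458"))  # True
print(is_valid_pesel("44051401459"))  # False: wrong check digit
print(is_valid_pesel("4405140145"))   # False: only 10 digits
print(is_valid_pesel("4405140145X"))  # False: non-digit character
```

Checking length and digit-only content before indexing is what avoids the IndexError mentioned above.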

The previous "valid" fixture 11111111114 has a real check digit of 6, not
4. The updated fixtures (44051401458, 02070803628, 11111111116) verify
against the actual PESEL standard.
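
The fixture claims can be re-derived by hand. The sketch below recomputes the check digit from the first ten digits of each fixture; `expected_check_digit` is a hypothetical helper, and the weights are assumed to be the standard ones from the linked reference.

```python
# Standard PESEL weights for the first ten digits (assumed from the linked reference).
WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def expected_check_digit(first_ten: str) -> int:
    """Check digit implied by the first ten digits of a PESEL."""
    weighted = sum(int(d) * w for d, w in zip(first_ten, WEIGHTS))
    return (10 - weighted % 10) % 10


# Three updated fixtures, followed by the previous "valid" fixture.
for fixture in ("44051401458", "02070803628", "11111111116", "11111111114"):
    body, stated = fixture[:10], int(fixture[10])
    computed = expected_check_digit(body)
    print(fixture, computed, computed == stated)
# Computed digits are 8, 8, 6, 6; only 11111111114 fails to match its stated digit.
```

For the all-ones prefix the weighted sum is 44, so the raw remainder is 4 while the real check digit is 6, which is why the old fixture passed under the buggy comparison.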

Issue reference

This completes the work started in #1520 (closed on 2025-06-01 due to missing
tests). Credit to the original author, @BlaiseCz, for the analysis.

Checklist

- I have reviewed the contribution guidelines
- I have signed the CLA (if required) (will sign when CLA bot prompts)
- My code includes unit tests
- All unit tests and lint checks pass locally (pytest tests/test_pl_pesel_recognizer.py → 18 passed)
- My PR contains documentation updates / additions if required

The PESEL check-digit algorithm is `check = (10 - weighted_sum % 10) % 10` (https://en.wikipedia.org/wiki/PESEL#Check_digit). The previous implementation compared the raw `weighted_sum % 10` to the check digit, which incorrectly rejects valid PESEL numbers such as 44051401458 (the canonical example cited in official Polish documentation) and accepts a valid number only when its check digit happens to be 0 or 5.

This completes microsoft#1520, which was closed due to missing test coverage.

Changes:
- Correct the check-digit formula in `validate_result`.
- Guard against non-11-digit / non-numeric inputs.
- Replace the previous test fixtures (which relied on the buggy formula) with PESELs that are valid under the real algorithm, plus negative cases covering bad check digits, wrong length, and non-digit characters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sienioApius · 2026-04-23T12:04:24Z

@microsoft-github-policy-service agree company="APIUS Technologies S.A."

github-actions Bot added the external label Apr 23, 2026

omri374 approved these changes Apr 23, 2026


sienioApius and others added 2 commits April 23, 2026 15:57

Merge branch 'main' into fix/pl-pesel-checksum

67c63c9

Merge branch 'main' into fix/pl-pesel-checksum

ddf759e

omri374 merged commit 453bebc into microsoft:main Apr 28, 2026
34 checks passed

omri374 merged 3 commits into microsoft:main from sienioApius:fix/pl-pesel-checksum
