TD1 parsing fails for extended document numbers shorter than 13 characters (position 15 = <) #9

@mavispuford

Description

MrzCode.Parse() fails when parsing a TD1 MRZ with a document number of 10–11 characters. The library's TD1FirstLineLongDocument class (added in #7 for Belgian ID cards) only handles the case where the extended number fills positions 6–18 exactly, that is, a 12-character document number plus the < filler at position 15. ICAO Doc 9303, however, places the last character of a long document number anywhere in positions 16–28 (see ICAO Reference below), so the extended format covers lengths from 10 to 22 characters.

Root Cause

When the document number exceeds 9 characters, position 15 of the upper line is < and the number continues in positions 16–28. The check digit immediately follows the last character of the document number.

The library has two code paths:

  1. TD1FirstLine — expects the check digit at position 15: ([A-Z0-9<]{9})([0-9]{1}). Fails because position 15 is <.

  2. TD1FirstLineLongDocument — captures 13 characters as the document number (positions 6–18) and expects the check digit at position 19: ([A-Z0-9<]{13})([0-9]{1}). For a 10-char number, this incorrectly consumes the check digit as part of the document number, then fails because position 19 is <.

Neither path handles document numbers of 10–11 characters, where the check digit falls at positions 17–18.
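
The two failures can be reproduced outside the library with the regex fragments quoted above. This is a minimal Python sketch, not the library's C# code: applying each fragment at position 6 (index 5) is an assumption about how the full line patterns are anchored, and the 12-character sample line is a made-up case included only for contrast.

```python
import re

# Upper line from this issue (10-character document number).
FAILING_LINE = "IDUTO123456789<07" + "<" * 13
# Hypothetical upper line with a 12-character number (check digit at position 19).
TWELVE_CHAR_LINE = "IDUTO123456789<0122" + "<" * 11

# Fragments quoted in the issue for the two line types.
SHORT_FRAGMENT = re.compile(r"([A-Z0-9<]{9})([0-9]{1})")
LONG_FRAGMENT = re.compile(r"([A-Z0-9<]{13})([0-9]{1})")


def match_at_position_6(fragment, line):
    """Apply a fragment starting at MRZ position 6 (0-based index 5)."""
    return fragment.match(line, 5)


for name, line in (("10 chars", FAILING_LINE), ("12 chars", TWELVE_CHAR_LINE)):
    short = match_at_position_6(SHORT_FRAGMENT, line)
    long_ = match_at_position_6(LONG_FRAGMENT, line)
    print(name, "short:", bool(short), "long:", bool(long_))
```

With the 10-character line, the short fragment stops at the < in position 15 and the long fragment stops at the < in position 19, so neither matches. The 12-character line matches only the long fragment, and its first group includes the position-15 filler.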

MRZ That Fails

IDUTO123456789<07<<<<<<<<<<<<<
8601012M3001019UTO<<<<<<<<<<<6
SPECIMEN<<JANE<<<<<<<<<<<<<<<<

This is a valid TD1 MRZ with a 10-character document number 1234567890. Per ICAO Doc 9303 Part 5, Section 4.2.4:

  • Positions 6–14 (upper line): 123456789 (first 9 characters of the document number)
  • Position 15: < (filler; excluded from check digit calculation per spec)
  • Position 16: 0 (10th and final character of the document number)
  • Position 17: 7 (check digit calculated over positions 6–14 + 16, i.e. 1234567890)
  • Positions 18–30: <<<<<<<<<<<<< (remaining optional data)
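
The check digit in the breakdown above can be verified by hand. The sketch below uses the standard ICAO 7-3-1 weighting (digits keep their value, A–Z map to 10–35, < counts as 0); that algorithm comes from ICAO Doc 9303 rather than from this issue, and the function names are illustrative only.

```python
def char_value(ch: str) -> int:
    """Numeric value of one MRZ character for check digit purposes."""
    if ch == "<":
        return 0
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    raise ValueError(f"invalid MRZ character: {ch!r}")


def icao_check_digit(data: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(data)) % 10


upper_line = "IDUTO123456789<07" + "<" * 13
# Positions 6-14 plus position 16; position 15 (the filler) is skipped.
document_number = upper_line[5:14] + upper_line[15:16]
print(document_number, icao_check_digit(document_number))  # 1234567890 7
```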

Comparison with supported Belgian format

The existing TD1DocumentLongMrzCodeTest uses a 12-character document number where the check digit falls exactly at position 19 — this works. But any document number of 10–11 characters places the check digit at position 17–18, falling inside the 13-char capture group, causing both regex paths to fail.

Expected Behavior

MrzCode.Parse() should successfully parse the MRZ and return:

  • DocumentType: ID
  • CountryCode: UTO
  • DocumentNumber: 1234567890
  • BirthDate: 860101
  • Sex: M
  • ExpiryDate: 300101
  • Nationality: UTO
  • Names: SPECIMEN, JANE

Suggested Fix

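The general solution per the ICAO spec is: when position 15 is <, read the document number continuation starting at position 16 (it may extend up to position 28), then expect the check digit at the position immediately following the last character of the continuation. The check digit is itself followed by <, so the first < after position 15 marks the end of the continuation-plus-check-digit run. This would cover every extended document number length, from 10 to 22 characters.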

Alternatively, add additional line types for 10-char and 11-char document numbers (similar to how TD1FirstLineLongDocument handles 12-char numbers).
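
The first, general approach could look roughly like the following. This is a Python sketch under stated assumptions, not a patch for the library: the function names are hypothetical, the continuation is assumed to contain no < characters, and the check digit uses the standard ICAO 7-3-1 weighting.

```python
def icao_check_digit(data: str) -> int:
    """ICAO 7-3-1 check digit: digits as-is, A-Z = 10-35, '<' = 0."""
    def value(ch: str) -> int:
        if ch == "<":
            return 0
        if "0" <= ch <= "9":
            return int(ch)
        if "A" <= ch <= "Z":
            return ord(ch) - ord("A") + 10
        raise ValueError(f"invalid MRZ character: {ch!r}")
    return sum(value(c) * (7, 3, 1)[i % 3] for i, c in enumerate(data)) % 10


def parse_td1_document_number(upper_line: str) -> str:
    """Extract and verify the document number from a TD1 upper line."""
    if len(upper_line) != 30:
        raise ValueError("TD1 upper line must be 30 characters")
    head = upper_line[5:14]      # positions 6-14
    marker = upper_line[14]      # position 15
    if marker != "<":
        # Short form: position 15 holds the check digit over positions 6-14.
        number, checked, check = head.rstrip("<"), head, marker
    else:
        # Long form: the run up to the first '<' in positions 16-30 is the
        # continuation followed by the check digit.
        segment, filler, _ = upper_line[15:30].partition("<")
        if not filler or len(segment) < 2:
            raise ValueError("missing long document number or check digit")
        number = head + segment[:-1]
        checked, check = number, segment[-1]
    if check not in "0123456789" or int(check) != icao_check_digit(checked):
        raise ValueError("document number check digit mismatch")
    return number


print(parse_td1_document_number("IDUTO123456789<07" + "<" * 13))  # 1234567890
```

Because the run must end in a < within positions 16–30, the continuation can never exceed 13 characters, which matches the 16–28 range in the ICAO text without a separate length check.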

ICAO Reference

ICAO Doc 9303, Part 5 — Machine Readable Official Travel Documents (PDF)

Section 4.2.4, "Check digits in the MRZ":

Long document number check digit — Character positions used to calculate check digit: 6–14, 16–28. Note: Position 15 contains '<' and is excluded from the check digit calculation. The position of the last digit of a long document number is in the range of 16–28. Since the check digit follows the last digit of the document number, its position is in the range of 17–29. The check digit is followed by '<'.
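
The position ranges in that passage translate directly into document number lengths. The short Python sketch below (helper name hypothetical) derives the check digit position for each possible length of a long document number, assuming 9 characters in positions 6–14 and the remainder starting at position 16.

```python
FIRST_CONTINUATION_POS = 16
LAST_CONTINUATION_POS = 28


def check_digit_position(length: int) -> int:
    """1-based position of the check digit for a long document number."""
    continuation = length - 9  # characters stored after the position-15 filler
    max_continuation = LAST_CONTINUATION_POS - FIRST_CONTINUATION_POS + 1
    if not 1 <= continuation <= max_continuation:
        raise ValueError("not a valid long document number length")
    last_char_pos = FIRST_CONTINUATION_POS + continuation - 1
    return last_char_pos + 1


for length in range(10, 23):
    print(length, check_digit_position(length))
```

A 10-character number puts the check digit at position 17, a 12-character number at position 19, and the longest case, 22 characters, at position 29.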

Environment

  • MRZCodeParser NuGet package (v0.5.0)
  • .NET 10
