Description
MrzCode.Parse() fails when parsing a TD1 MRZ with a document number of 10–11 characters. The library's TD1FirstLineLongDocument class (added in #7 for Belgian ID cards) only handles document numbers that fill positions 6–18 (approximately 12–13 chars), but the ICAO Doc 9303 standard allows any length from 10 to 13 characters in the extended format.
Root Cause
When the document number exceeds 9 characters, position 15 of the upper line is < and the number continues in positions 16+. The check digit immediately follows the last digit of the document number.
The library has two code paths:
- TD1FirstLine: expects the check digit at position 15, using ([A-Z0-9<]{9})([0-9]{1}). This fails because position 15 is <.
- TD1FirstLineLongDocument: captures 13 characters as the document number (positions 6–18) and expects the check digit at position 19, using ([A-Z0-9<]{13})([0-9]{1}). For a 10-char number, this incorrectly consumes the check digit as part of the document number, then fails because position 19 is <.
Neither path handles document numbers of 10–11 characters, where the check digit falls at positions 17–18.
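To make the failure concrete, here is a small sketch that applies the two regex fragments quoted above to the upper line of the failing MRZ. It is written in Python rather than the library's C#, and it assumes the fragments are applied starting at position 6; that anchoring is my guess at how the library lines them up, not its actual code. The 12-character line is a made-up contrast case, not taken from the library's tests.

```python
import re

# Fragments quoted in this issue. Matching them from index 5 (position 6)
# is an assumption for illustration, not the library's real implementation.
SHORT_FORM = re.compile(r"([A-Z0-9<]{9})([0-9]{1})")
LONG_FORM = re.compile(r"([A-Z0-9<]{13})([0-9]{1})")

# Upper line of the failing MRZ (10-character document number).
TEN_CHAR_LINE = "IDUTO123456789<07<<<<<<<<<<<<<"
# Hypothetical 12-character number, where the check digit lands on position 19.
TWELVE_CHAR_LINE = "IDUTO123456789<0122" + "<" * 11


def try_match(pattern, line):
    """Return the captured groups, or None when the fragment does not match."""
    match = pattern.match(line, 5)  # start matching at position 6 (index 5)
    return match.groups() if match else None


short_result = try_match(SHORT_FORM, TEN_CHAR_LINE)
long_result = try_match(LONG_FORM, TEN_CHAR_LINE)
twelve_result = try_match(LONG_FORM, TWELVE_CHAR_LINE)

print(short_result)   # None: position 15 is '<', not a digit
print(long_result)    # None: position 19 is '<', not a digit
print(twelve_result)  # ('123456789<012', '2')
```

Under that assumption, only the 12-character layout matches, which is consistent with the behaviour described here.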
MRZ That Fails
IDUTO123456789<07<<<<<<<<<<<<<
8601012M3001019UTO<<<<<<<<<<<6
SPECIMEN<<JANE<<<<<<<<<<<<<<<<
This is a valid TD1 MRZ with a 10-character document number 1234567890. Per ICAO Doc 9303 Part 5, Section 4.2.4:
- Positions 6–14 (upper line): 123456789 (first 9 characters of the document number)
- Position 15: < (filler; excluded from check digit calculation per spec)
- Position 16: 0 (10th and final character of the document number)
- Position 17: 7 (check digit calculated over positions 6–14 + 16, i.e. 1234567890)
- Positions 18–30: <<<<<<<<<<<<< (remaining optional data)
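The check digit in the breakdown above can be reproduced with the standard Doc 9303 rule: repeating weights 7, 3, 1, digits at face value, letters A–Z as 10–35, filler as 0, and the sum taken modulo 10. The sketch below is in Python and the function names are mine, not the library's.

```python
def char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value for check digit purposes."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Compute the check digit with the repeating 7-3-1 weighting."""
    weights = (7, 3, 1)
    total = sum(char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10


upper_line = "IDUTO123456789<07<<<<<<<<<<<<<"
# Positions 6-14 plus position 16; the '<' at position 15 is skipped.
document_number = upper_line[5:14] + upper_line[15:16]

print(document_number, check_digit(document_number))  # 1234567890 7
```

The same function gives 2 for the birth date 860101 and 9 for the expiry date 300101, matching the middle line of the sample.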
Comparison with supported Belgian format
The existing TD1DocumentLongMrzCodeTest uses a 12-character document number where the check digit falls exactly at position 19 — this works. But any document number of 10–11 characters places the check digit at position 17–18, falling inside the 13-char capture group, causing both regex paths to fail.
Expected Behavior
MrzCode.Parse() should successfully parse the MRZ and return:
DocumentType: ID
CountryCode: UTO
DocumentNumber: 1234567890
BirthDate: 860101
Sex: M
ExpiryDate: 300101
Nationality: UTO
Names: SPECIMEN, JANE
Suggested Fix
The general solution per the ICAO spec is: when position 15 is <, read the document number continuation starting at position 16 (it may extend as far as position 28), then expect the check digit at the position immediately following the last character of the document number. The check digit itself is followed by <. This would cover document numbers of any length from 10 to 13 characters.
Alternatively, add additional line types for 10-char and 11-char document numbers (similar to how TD1FirstLineLongDocument handles 12–13 chars).
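A rough sketch of the general approach, in Python rather than C#, might look like the following. The function name and error messages are hypothetical, and it only splits the field; it does not verify the check digit or touch the rest of the line.

```python
from __future__ import annotations


def split_long_document_number(upper_line: str) -> tuple[str, str]:
    """Return (document_number, check_digit) for a long-form TD1 upper line."""
    if len(upper_line) != 30:
        raise ValueError("a TD1 line has exactly 30 characters")
    if upper_line[14] != "<":
        raise ValueError("position 15 is not '<', so this is not the long form")

    # Positions 16-30: continuation, then the check digit, then at least one '<'.
    tail = upper_line[15:]
    end = tail.find("<")
    if end < 2:
        # Covers: no '<' at all, an empty run, or a run with no room for
        # both a continuation character and a check digit.
        raise ValueError("expected continuation and check digit followed by '<'")

    run = tail[:end]
    if not "0" <= run[-1] <= "9":
        raise ValueError("check digit must be numeric")

    # First 9 characters (positions 6-14) plus the continuation without
    # its trailing check digit.
    return upper_line[5:14] + run[:-1], run[-1]


number, digit = split_long_document_number("IDUTO123456789<07<<<<<<<<<<<<<")
print(number, digit)  # 1234567890 7
```

Because the split is driven by the first < after position 15 rather than by a fixed width, the same code handles the 12-character layout that already works today, so it could replace the fixed-width long-document path instead of sitting next to it.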
ICAO Reference
ICAO Doc 9303, Part 5 — Machine Readable Official Travel Documents (PDF)
Section 4.2.4, "Check digits in the MRZ":
Long document number check digit — Character positions used to calculate check digit: 6–14, 16–28. Note: Position 15 contains '<' and is excluded from the check digit calculation. The position of the last digit of a long document number is in the range of 16–28. Since the check digit follows the last digit of the document number, its position is in the range of 17–29. The check digit is followed by '<'.
Environment
- MRZCodeParser NuGet package (v0.5.0)
- .NET 10