feat: add AT, SK IČO, GB, FR, IT validators by jan-kubica · Pull Request #1 · stella/stdnum

jan-kubica · 2026-03-18T16:16:50Z

Summary

Wave 2: 10 new validators across 5 countries.

AT: UID (VAT, modified Luhn)
SK: IČO (company ID, reuses CZ algorithm)
GB: VAT (weighted mod-97 with GD/HA variants), UTR (weighted + lookup table)
FR: SIREN (Luhn), SIRET (Luhn + La Poste exception), NIF (mod-511), TVA (old/new-style check prefix)
IT: Partita IVA (Luhn + province code), Codice Fiscale (odd/even position value tables)

Total: 23 validators, 8 countries, 154 unit tests.

Oracle results (66,000 random inputs, 4 languages)

Oracle	Formats tested	Disagreements
python-stdnum	18	0
Rust iban_validate + luhn	2	0
Ruby valvat	DE, PL	0
JS ibantools, iban.js, luhn, fast-luhn	4	0

Known: 6 IT Codice Fiscale disagreements (our regex accepts arbitrary alphanumeric at omocodia positions; python only allows LMNPQRSTUV substitution letters).

Test plan

154 unit tests pass
Lint clean
Oracle: 0 disagreements with python-stdnum on all formats
Oracle: 0 disagreements with Rust crates

Wave 2 countries (10 new validators): - AT: UID (modified Luhn) - SK: IČO (reuses CZ IČO algorithm) - GB: VAT (weighted mod-97), UTR (weighted + lookup) - FR: SIREN (Luhn), SIRET (Luhn + La Poste), NIF (mod-511), TVA (old/new-style check prefix) - IT: Partita IVA (Luhn + province), Codice Fiscale (odd/even position value tables) Oracle results (66,000 random inputs): - 0 disagreements with python-stdnum on all 18 python-covered formats (AT, GB, FR, IT all pass) - 0 disagreements with Rust crates (IBAN, Luhn) - 6 IT Codice Fiscale: our regex accepts arbitrary alphanumeric at omocodia positions; python only allows LMNPQRSTUV. Minor strictness gap. 154 unit tests, 8 countries, 23 total validators.

- Validate all characters are digits before checksum (catches non-digit branch suffixes) - Return full 12-character compact value instead of truncating to 9 characters

Add VAT number validators for 19 EU member states: BE, BG, CY, DK, EE, ES, FI, GR, HR, HU, IE, LT, LU, LV, MT, NL, PT, RO, SE, SI. Each validator includes compact, format, and validate functions with proper checksum verification. Complex multi-format validators implemented for ES (DNI/NIE/ CIF/K-L-M) and IE (old/new format).

19 new VAT validators for all remaining EU member states: BE, BG, CY, DK, EE, ES, FI, GR, HR, HU, IE, LT, LU, LV, MT, NL, PT, RO, SE, SI. Combined with existing AT, CZ, DE, FR, IT, PL, SK, this gives complete EU-27 VAT validation coverage. Algorithms: mod-97 (BE, NL), weighted sum (DK, EE, FI, HU, LV, MT, PT, SI), ISO 7064 Mod 11,10 (HR), iterative doubling (GR), Luhn (SE), multi-format (BG: EGN/PNF/other, ES: DNI/NIE/CIF, IE: old/new, LT: 9/12-digit, LV: legal/personal, NL: BSN/mod97). Oracle: 0 disagreements with python-stdnum across all 19 new validators (106,000 random inputs, 4 languages, 10 independent implementations). 42 total validators, 27 countries, 247 unit tests.

Cross-check all 27 EU VAT validators against jsvat (independent JS implementation). Results: - 12 countries: 0 disagreements (DE, GR, HR, HU, IE, LT, LU, RO, SE, SI + already covered AT) - 15 countries: jsvat has bugs (confirmed by python-stdnum tiebreaker showing 0 disagreements with our implementation on all formats) Now testing against 3 independent JS VAT libraries (jsvat, validate-polish, ibantools) + python-stdnum + Rust + Ruby = 6 oracle sources across 4 languages.

- Fix BG validateOther doc comment (% 11, not % 10; code was correct, comment was wrong) - Split eu-vat.test.ts into 20 per-country test files per CONTRIBUTING.md convention

@see

Every EU VAT validator now has @see links to official government/OECD documentation: - National tax authority websites (AADE, ANAF, Agencia Tributaria, Revenue.ie, Skatteverket, etc.) - OECD TIN documentation for countries without public algorithmic specs DK: added note that Denmark dropped mod-11 for CPR in 2007; unconfirmed for CVR numbers. Our strict mod-11 check matches python-stdnum but may reject valid newer CVR numbers if the same relaxation was applied. All 20 implementations verified against official specs. No divergences found.

feat: complete EU-27 VAT coverage (Phase 1)

greptile-apps · 2026-03-18T20:42:05Z

Greptile Summary

This PR adds 10 new validators across 5 countries (AT, SK, GB, FR, IT), expanding the library from 13 to 23 validators across 8 countries. The implementations follow a consistent pattern matching the existing codebase style, and the oracle results (66,000 random inputs, 0 disagreements with python-stdnum and other reference libraries) give strong confidence in algorithmic correctness.

Key observations:

AT UID (src/at/uid.ts): Dead-code guard check < 0 on line 60 — the expression (((6 - cs) % 10) + 10) % 10 is always in [0, 9], so the left-hand side of the || is unreachable.
GB VAT (src/gb/vat.ts): The total = 42 third valid state for numbers with prefix ≥ 100 is accepted by oracle testing but lacks an inline source citation — future maintainers cannot distinguish this from a copy-paste error.
IT Codice Fiscale (src/it/codicefiscale.ts): The omocodia regex [A-Z0-9] intentionally accepts any letter at numeric positions rather than the spec-mandated LMNPQRSTUV set. The PR description documents this divergence, but there is no inline comment in the code pointing to it.
SK IČO: Clean and correct reuse of the CZ algorithm with only metadata changed.
FR validators: SIREN (Luhn), SIRET (Luhn + La Poste digit-sum exception), NIF (mod-511), and TVA (old/new-style prefix) are all well-implemented and tested.
BG VAT: The dual-algorithm fallback for 9-digit EIK and the EGN birth-date validation are correctly implemented; fullYear values are always ≥ 1800, avoiding the JavaScript Date constructor quirk for two-digit years.

Confidence Score: 4/5

Safe to merge after addressing the dead-code guard and adding a source reference for the GB VAT total = 42 constant.
Algorithmic correctness is strongly evidenced by 154 unit tests and 66,000 random oracle inputs with 0 disagreements against python-stdnum. The two issues found (dead-code condition in AT UID and undocumented magic constant in GB VAT) are low-severity — one is unreachable code that has no runtime impact, and the other is empirically confirmed correct but lacks documentation. The omocodia regex divergence is acknowledged by the author. No security concerns or data-loss risks were identified.
Pay close attention to src/at/uid.ts (dead-code guard) and src/gb/vat.ts (undocumented total = 42 constant that needs a source reference).

Important Files Changed

Filename	Overview
src/at/uid.ts	New AT UID validator using modified Luhn; logic is correct (oracle-confirmed) but contains a dead-code `check < 0` guard on line 60.
src/gb/vat.ts	New GB VAT validator with GD/HA variants and weighted mod-97 checksum; the `total = 42` third valid state for numbers ≥ 100,000,000 is undocumented but oracle-verified.
src/gb/utr.ts	New GB UTR validator with weighted sum and CHECK_LOOKUP table; implementation is clean and oracle-verified.
src/fr/tva.ts	New FR TVA validator implementing both old-style (mod-97) and new-style (mod-11 cvalue) prefix checks, plus Monaco SIREN bypass; logic is complex but oracle-verified.
src/fr/siret.ts	New FR SIRET validator with La Poste digit-sum exception correctly implemented; the SIREN Luhn pre-check and the full-14-digit Luhn check are both applied appropriately.
src/it/codicefiscale.ts	New IT Codice Fiscale validator with correct odd/even position value tables; omocodia regex intentionally accepts all alphanumerics rather than the strict LMNPQRSTUV set (acknowledged divergence).
src/it/iva.ts	New IT Partita IVA validator with province code set and Luhn check; correctly rejects all-zero company IDs and invalid province codes.
src/bg/vat.ts	New BG VAT validator handles 9-digit EIK (dual-weight fallback) and 10-digit EGN/PNF/other sub-types; date validation for EGN birth dates uses safe fullYear values (≥1800).
src/sk/ico.ts	New SK IČO validator correctly delegates to the existing CZ IČO implementation, with only metadata overridden for the Slovak context.
src/ie/vat.ts	New IE VAT validator handles both old-format (digit+letter+5d+check) and new-format (7d+check[+optional]) with correct rearrangement and weighted-sum algorithm.
src/lv/vat.ts	New LV VAT validator handles legal entities (sum % 11 = 3), new-format personal codes (starts with 32), and old-format personal codes with birth-date validation; fullYear ≥ 1800 avoids JS Date quirk.
src/fr/nif.ts	New FR NIF validator using mod-511 on the first 10 digits; all-zeros guard and leading-digit constraint (0–3) correctly implemented.
scripts/oracle.ts	Oracle test harness extended for all wave-2 validators; property-based arbitrary generators are appropriate for each format.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Input["validate(value)"] --> Compact["compact()\nstrip prefix, uppercase"]
    Compact --> LenCheck{"length\ncheck"}
    LenCheck -- fail --> INVALID_LENGTH
    LenCheck -- pass --> FmtCheck{"format\ncheck\nisdigits / regex"}
    FmtCheck -- fail --> INVALID_FORMAT
    FmtCheck -- pass --> CompCheck{"component\ncheck\ne.g. leading digit,\nGD < 500 / HA ≥ 500"}
    CompCheck -- fail --> INVALID_COMPONENT
    CompCheck -- pass --> Algo{"checksum\nalgorithm"}

    Algo -- AT UID --> ModLuhn["Modified Luhn\n(6 - luhnChecksum(d[0..6])) mod 10"]
    Algo -- GB VAT --> WeightedMod97["Weighted mod-97\nsum+check ≡ 0,42,55 mod 97"]
    Algo -- GB UTR --> LookupTable["Weighted mod-11\n→ CHECK_LOOKUP[sum]"]
    Algo -- FR SIREN/SIRET --> Luhn["Standard Luhn\n(+ La Poste digit-sum exception)"]
    Algo -- FR NIF --> Mod511["first_10_digits % 511\n== last_3_digits"]
    Algo -- FR TVA --> TVABranch{"prefix\nall-digit?"}
    TVABranch -- yes --> OldStyle["Old-style\n(siren+'12') % 97 == prefix"]
    TVABranch -- no --> NewStyle["New-style\ncvalue = f(c0,c1)\n(siren+1+⌊cvalue/11⌋)%11\n== cvalue%11"]
    Algo -- IT IVA --> IVALuhn["Luhn + province\ncode range check"]
    Algo -- IT CF --> CFTable["Odd/even position\nvalue tables mod 26"]
    Algo -- SK IČO --> CZAlgo["Delegates to\nCZ IČO algorithm"]

    ModLuhn -- pass --> Valid["✓ valid: true"]
    WeightedMod97 -- pass --> Valid
    LookupTable -- pass --> Valid
    Luhn -- pass --> Valid
    Mod511 -- pass --> Valid
    OldStyle -- pass --> Valid
    NewStyle -- pass --> Valid
    IVALuhn -- pass --> Valid
    CFTable -- pass --> Valid
    CZAlgo -- pass --> Valid

    ModLuhn -- fail --> INVALID_CHECKSUM
    WeightedMod97 -- fail --> INVALID_CHECKSUM
    LookupTable -- fail --> INVALID_CHECKSUM
    Luhn -- fail --> INVALID_CHECKSUM
    Mod511 -- fail --> INVALID_CHECKSUM
    OldStyle -- fail --> INVALID_CHECKSUM
    NewStyle -- fail --> INVALID_CHECKSUM
    IVALuhn -- fail --> INVALID_CHECKSUM
    CFTable -- fail --> INVALID_CHECKSUM
    CZAlgo -- fail --> INVALID_CHECKSUM

_{Last reviewed commit: "Merge pull request #..."}

greptile-apps · 2026-03-18T20:42:08Z

+  }
+  const cs = luhnChecksum(digits.slice(0, 7));
+  const check = (((6 - cs) % 10) + 10) % 10;
+  if (check < 0 || check !== Number(digits[7])) {


Dead code: check < 0 is never true

The expression (((6 - cs) % 10) + 10) % 10 always yields a value in [0, 9], making the check < 0 guard permanently unreachable.

Breaking it down: cs = luhnChecksum(...) returns [0, 9], so 6 - cs ranges from -3 to 6. In JavaScript, (-3) % 10 = -3, but adding 10 brings it to [7, 16], and the final % 10 clamps to [0, 9]. The < 0 branch can never execute.

Suggested change

if (check < 0 || check !== Number(digits[7])) {

if (check !== Number(digits[7])) {

greptile-apps · 2026-03-18T20:42:10Z

+  if (prefix >= 100) {
+    // Numbers >= 100 000 000: remainder must be
+    // 0, 42, or 55
+    if (total !== 0 && total !== 42 && total !== 55) {
+      return err(
+        "INVALID_CHECKSUM",
+        "UK VAT check digits mismatch",
+      );
+    }


Undocumented magic constant 42 needs a source reference

The comment "remainder must be 0, 42, or 55" introduces total = 42 as a valid third state for large UK VAT numbers (prefix ≥ 100), but only two checks appear in HMRC Notice 700 and python-stdnum's documentation:

total = 0 → standard check (check digit = 97 - sum % 97)

total = 55 → alternative check introduced in August 2004 (check digit = (97 + 55 - sum) % 97)

The oracle testing with 66,000 random inputs produced 0 disagreements (so it is empirically correct), but the code has no citation explaining where 42 comes from. Could you add a reference to the HMRC specification or other authoritative source that documents this third case? Without it, the constant is indistinguishable from an accidental bug to any future maintainer.

greptile-apps · 2026-03-18T20:42:11Z

+  // Structural format: 6 letters + 2 alphanumeric
+  // + 1 letter (month) + 2 alphanumeric + 1 letter
+  // (municipality) + 3 alphanumeric + 1 check letter
+  // The "alphanumeric" positions accept omocodia
+  // substitution letters (LMNPQRSTUV for 0-9)
+  if (
+    !/^[A-Z]{6}[A-Z0-9]{2}[A-Z][A-Z0-9]{2}[A-Z][A-Z0-9]{3}[A-Z]$/.test(
+      v,
+    )
+  ) {
+    return err(
+      "INVALID_FORMAT",
+      "Codice Fiscale format is invalid",
+    );
+  }


Omocodia regex is intentionally broad — consider adding a TODO

The existing comment documents that [A-Z0-9] at the numeric positions accepts any letter rather than the strict Italian omocodia substitution set (LMNPQRSTUV). The PR description explicitly calls this out as a known divergence from python-stdnum. However the code comment doesn't carry that context forward, so future maintainers may not realise this is a deliberate trade-off.

Consider adding a // TODO: or // NOTE: so the intent is preserved inline:

Suggested change

// Structural format: 6 letters + 2 alphanumeric

// + 1 letter (month) + 2 alphanumeric + 1 letter

// (municipality) + 3 alphanumeric + 1 check letter

// The "alphanumeric" positions accept omocodia

// substitution letters (LMNPQRSTUV for 0-9)

if (

!/^[A-Z]{6}[A-Z0-9]{2}[A-Z][A-Z0-9]{2}[A-Z][A-Z0-9]{3}[A-Z]$/.test(

v,

)

) {

return err(

"INVALID_FORMAT",

"Codice Fiscale format is invalid",

);

}

// Structural format: 6 letters + 2 alphanumeric

// + 1 letter (month) + 2 alphanumeric + 1 letter

// (municipality) + 3 alphanumeric + 1 check letter

// NOTE: [A-Z0-9] accepts any letter at omocodia

// positions; the strict set is LMNPQRSTUV only.

// See PR discussion for rationale.

if (

!/^[A-Z]{6}[A-Z0-9]{2}[A-Z][A-Z0-9]{2}[A-Z][A-Z0-9]{3}[A-Z]$/.test(

v,

)

) {

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

This comment was marked as resolved.

Sign in to view

jan-kubica added 7 commits March 18, 2026 18:56

fix: GB VAT 12-digit branch code validation

07b8cef

- Validate all characters are digits before checksum (catches non-digit branch suffixes) - Return full 12-character compact value instead of truncating to 9 characters

fix: address review comments

3a5789c

- Fix BG validateOther doc comment (% 11, not % 10; code was correct, comment was wrong) - Split eu-vat.test.ts into 20 per-country test files per CONTRIBUTING.md convention

Merge pull request #2 from stella/feat/eu-vat-phase1

08dd56a

feat: complete EU-27 VAT coverage (Phase 1)

jan-kubica merged commit e6d757e into main Mar 18, 2026
3 of 5 checks passed

greptile-apps bot reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add AT, SK IČO, GB, FR, IT validators#1

feat: add AT, SK IČO, GB, FR, IT validators#1
jan-kubica merged 8 commits intomainfrom
feat/wave-2-countries

jan-kubica commented Mar 18, 2026 •

edited by devin-ai-integration bot

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

greptile-apps bot commented Mar 18, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 18, 2026

Uh oh!

greptile-apps bot Mar 18, 2026

Uh oh!

greptile-apps bot Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	if (check < 0 \|\| check !== Number(digits[7])) {
	if (check !== Number(digits[7])) {

Conversation

jan-kubica commented Mar 18, 2026 • edited by devin-ai-integration bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Oracle results (66,000 random inputs, 4 languages)

Test plan

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

greptile-apps bot commented Mar 18, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jan-kubica commented Mar 18, 2026 •

edited by devin-ai-integration bot

Loading