Skip to content

feat: strengthen validator correctness and repo hygiene#77

Merged
jan-kubica merged 2 commits intomainfrom
codex/stdnum-quality-correctness-pass
Mar 26, 2026
Merged

feat: strengthen validator correctness and repo hygiene#77
jan-kubica merged 2 commits intomainfrom
codex/stdnum-quality-correctness-pass

Conversation

@jan-kubica
Copy link
Copy Markdown
Contributor

@jan-kubica jan-kubica commented Mar 26, 2026

Summary

This PR strengthens stdnum across four areas:

  • fixes repo hygiene by adding typecheck/build/export checks to CI and aligning the build config
  • promotes parse() to a typed first-class capability in the public validator API
  • improves property testing by preferring examples and generate() seeds and filling generator gaps
  • tightens several validator/oracle correctness edges using canonical country sources

What changed

  • Added typed parsed identifier support via Validator<TParsed>, ParsedBirthDate, ParsedIdentifier, and ParsableValidator.
  • Updated parse-capable validators to expose typed parse() on the public validator object.
  • Reworked parse tests to auto-discover public parsers.
  • Added typecheck and sync-exports:check scripts and wired them into CI and prepublishOnly.
  • Switched tsdown from deprecated bundle: false to unbundle: true.
  • Updated property tests to prefer examples and generate() over random lucky discovery.
  • Added/fixed generator coverage and metadata in multiple validators.
  • Updated Dutch VAT to the current structural NL#########B## format.
  • Tightened source-backed rules for:
    • ee.ik
    • si.emso
    • lv.vat
    • cz.rc / sk.rc
  • Documented canonical source rationale inline next to the validators and oracle exclusions that depend on it.
  • Pruned oracle pairings that were comparing against stale or mismatched third-party implementations.

Verification

Passed locally:

  • bun run typecheck
  • bun run lint
  • bun test
  • bun run sync-exports:check
  • targeted regression suites for cz, ee, lv, si, sk

Follow-up

  • no-non-null-assertion is still disabled. I left that as a separate cleanup because the repo still has broad postfix ! usage and it deserves a focused pass rather than being mixed into this PR.
  • The oracle suite is improved but not fully green yet; remaining hotspots still need stronger official vectors before changing logic further.

Open with Devin

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 26, 2026

Greptile Summary

This PR strengthens stdnum across four well-scoped areas: it promotes parse() to a typed first-class public API via a generic Validator<TParsed>, adds CI/build/export hygiene gates, improves property-test seeding by preferring examples and generate() over lucky random discovery, and tightens validator correctness for cz.rc/sk.rc, ee.ik, lv.vat, nl.vat, and si.emso against canonical official sources.\n\nKey changes:\n\n- Types (src/types.ts): New ParsedBirthDate, ParsedIdentifier, and ParsableValidator types; Validator<TParsed> enforces parse presence via a conditional intersection type.\n- cz.rc / sk.rc: Replaces the fragile (mm % 50) % 20 || mm % 50 month decode with an explicit decodeMonth that restricts +20/+70 overflow offsets to 10-digit post-2004 numbers only; sk.rc now re-exports generate.\n- ee.ik: Drops speculative century digits 7/8 per the OECD Estonia TIN sheet; centuryMap trimmed accordingly.\n- lv.vat: Splits into three explicit branches (new-format personal 3x, entity 4/5/9, legacy birth-date), restricts legacy century digit to 0–2, removes the old checkPersonal from new-format codes per PMLP docs, and the generator now covers both 3x and entity shapes.\n- nl.vat: Removes the BSN and MOD97 checksum checks entirely; validation is now purely structural (\\d{9}B\\d{2}), matching the cited official guidance that the digits are assigned without checksum derivation.\n- si.emso: Adds a future-date guard (correctly normalised to local midnight with setHours(0,0,0,0)) and a register-code restriction (50–59).\n- CI: New build job adds typecheck, build, and sync-exports:check gates before publish; prepublishOnly mirrors these.\n- Oracle: Stale pairings (jsvat/valvat for nl.vat, lv.vat, es.vat; Luhn npm/rust oracles; ruby-ssn sk.rc) are pruned with inline rationale; semantically correct shaped generators replace generic digit-date stubs for cz.rc, ee.ik, bg.egn, si.emso, lv.vat, and cy.vat.\n- Parse tests: Auto-discovery now uses public validator objects from src/index (not filesystem glob); all existing specific date/gender assertions are preserved.

Confidence Score: 5/5

Safe to merge; the single remaining P2 note is a pre-existing style nit that the PR already fixed in a sibling file.

All prior review concerns are addressed: the si.emso timezone fix is correct, it/codicefiscale Math.random() is fixed, lv.vat now generates 3x codes, and specific parse assertions are preserved. The only remaining item is an identical Math.random() inconsistency in cz/rc.ts generate() that is pre-existing and carries no correctness risk. Validator logic changes are source-backed with inline rationale, CI gates are strengthened, and the type-system changes are sound.

src/cz/rc.ts line 174 has a trivial Math.random() nit; no files require blocking attention.

Important Files Changed

Filename Overview
src/types.ts Adds ParsedBirthDate, ParsedIdentifier, ParsableValidator, and makes Validator generic over TParsed; the conditional parse property typing is correct.
src/cz/rc.ts Replaces fragile (mm % 50) % 20
src/sk/rc.ts Now re-exports generate from cz/rc and adds parse, generate, and lengths to the public validator object; delegation pattern is clean.
src/nl/vat.ts Removes BSN and MOD97 checksum checks in favour of purely structural validation; intentional per cited official guidance.
src/lv/vat.ts Splits validation into three explicit branches (new personal 3x, entity 4/5/9, legacy personal), adds century-digit bounds check (0-2 only), and now generates both new-format personal and entity codes.
src/si/emso.ts Adds future-date guard with correct local-midnight normalization and register-code restriction (50-59); both changes are source-backed.
src/ee/ik.ts Restricts first digit to 1-6 (drops speculative 7/8 century values) per OECD Estonia TIN sheet.
src/it/codicefiscale.ts Adds generate() and exposes parse on the public validator object; prior Math.random() coin-flip replaced with randomInt(0, 1) === 0.
src/eu/vat.ts Adds generate() that delegates to each member validator's own generator; defensive guards are unreachable in practice but harmless.
test/parse.test.ts Auto-discovery now uses public validator objects from src/index instead of filesystem glob; specific date/gender assertions for all changed validators are preserved.
scripts/oracle.ts Removes stale oracle pairings and adds semantically correct shaped generators for cz.rc, ee.ik, bg.egn, si.emso, lv.vat, cy.vat with inline rationale comments.
.github/workflows/ci.yml Adds build job running typecheck, build, and sync-exports:check; gated on trust-check and required before publish.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["lv.vat validate(v)"] --> B{"v[0]==='3' &&\nv[1] in '2'-'9'?"}
    B -- yes --> C["✅ New personal code\n(syntactic only, no checksum)"]
    B -- no --> D{"v[0] in\n'4','5','9'?"}
    D -- yes --> E["weightedSum check\n(mod 11 = 3)"]
    E -- pass --> F["✅ Entity code"]
    E -- fail --> G["❌ INVALID_CHECKSUM"]
    D -- no --> H["Legacy personal code\nDDMMYY + century 0-2"]
    H --> I{"validateBirthDate?\ncentury ∈ 0-2?"}
    I -- fail --> J["❌ INVALID_COMPONENT"]
    I -- pass --> K{"checkPersonal\n(weighted sum)"}
    K -- pass --> L["✅ Legacy personal"]
    K -- fail --> M["❌ INVALID_CHECKSUM"]

    N["cz.rc decodeMonth(mm, year, len)"] --> O{"len=10 &&\nyear≥2004?"}
    O -- yes --> P["candidates: 0, 50, 20, 70"]
    O -- no --> Q["candidates: 0, 50"]
    P --> R["first offset where\nmm-offset ∈ 1-12"]
    Q --> R
    R -- none match --> S["❌ INVALID_COMPONENT"]
    R -- found --> T["✅ decoded month"]
Loading

Reviews (2): Last reviewed commit: "Address bot review feedback" | Re-trigger Greptile

greptile-apps[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@jan-kubica jan-kubica merged commit 58c345f into main Mar 26, 2026
5 of 7 checks passed
Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 7 additional findings in Devin Review.

Open in Devin Review

Comment thread scripts/oracle.ts
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Multiple oracle libraries removed as cross-validators with detailed inline rationale

The PR removes several oracle pairings: cy.vat from jsvat/valvat, lv.vat from jsvat/valvat, nl.vat from python-stdnum/jsvat/valvat, bg.egn from stdnum-js/ruby-ssn, es.vat from jsvat/valvat, cz.dic from python-stdnum/jsvat, and sk.rc from ruby-ssn. Each removal has an inline comment explaining the disagreement (e.g., Cyprus TFA ranges, Latvia post-2017 format, GRAO EGN rules). This reduces oracle coverage but the rationale cites authoritative government sources over third-party libraries. Worth auditing the specific official sources to confirm the claimed discrepancies.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant