feat: Add Codex skill support and restore Bun workflows #79
jan-kubica wants to merge 120 commits into main
Validate, compact, and format standard identifiers. Pure TypeScript, zero dependencies, tree-shakeable.
Validators:
- CZ: IČO (company ID), DIČ (VAT), RČ (birth number)
- SK: RČ (birth number), IČ DPH (VAT)
- DE: USt-IdNr. (VAT), IdNr (personal tax ID)
- International: IBAN, credit card (Luhn), LEI
Checksum algorithms: Luhn, mod-97 (ISO 7064), weighted sum, ISO 7064 Mod 11,10. Unicode normalization for OCR/PDF artifacts. Per-identifier entry points for tree-shaking. 76 tests.
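As a reference point for the checksum algorithms listed above, here is a minimal Luhn sketch. The function name `luhnValid` is hypothetical and this is not the package's actual implementation; it only illustrates the standard algorithm.

```typescript
// Luhn check: walking from the rightmost digit, double every second digit,
// subtract 9 from any result above 9, and require the total to be a
// multiple of 10.
export function luhnValid(value: string): boolean {
  if (!/^\d+$/.test(value)) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = value.length - 1; i >= 0; i--) {
    let d = value.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}
```

The sketch expects an already compacted digit string; stripping spaces and dashes would happen in a separate compact step.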
- NIP: Polish VAT number, weighted checksum - PESEL: Polish national ID, date + gender encoding with century offsets, weighted checksum - REGON: Polish business register, 9-digit and 14-digit variants with separate checksums - Add banner image to README - 96 tests
Oracle script (scripts/oracle.ts) cross-checks all validators against python-stdnum on 2000 random inputs per format.
Found and fixed:
- CZ DIČ: special entity checksum skips the leading "6" (was including it). Also fixed JS negative modulo in check digit computation.
- CZ RČ: off-by-one in 9-digit year boundary (>= 1980, not > 1980). Also fixed 10-digit checksum to use front mod 11 mod 10 (handles remainder=10 → check digit 0). Removed overly strict month encoding range check.
- SK IČ DPH: added third-digit format rule (must be 2/3/4/7/8/9). Also accept valid birth numbers as DPH (matches python-stdnum).
Result: 0 disagreements on CZ, SK, DE, PL, Luhn. IBAN has known BBAN format gap (documented).
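The "JS negative modulo" fix mentioned above refers to a general JavaScript pitfall: `%` keeps the sign of the dividend. A minimal sketch of the usual workaround follows; the helper names `mod` and `checkDigit` are hypothetical, and the formula is a generic illustration rather than the actual CZ DIČ rule.

```typescript
// JavaScript's % is a remainder, not a mathematical modulo:
// -3 % 11 evaluates to -3, whereas Python's -3 % 11 is 8.
export function mod(n: number, m: number): number {
  return ((n % m) + m) % m;
}

// Hypothetical check-digit rule of the shape "(base - sum) mod m".
// Without the normalization above, base < sum would yield a negative digit.
export function checkDigit(sum: number, base: number, m: number): number {
  return mod(base - sum, m);
}
```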
Every validator now has @see links to official government or standards body documentation: - CZ IČO: czso.cz (already had) - CZ DIČ: mfcr.cz DPH register - CZ/SK RČ: mvcr.cz + Law 133/2000 Sb. - SK IČ DPH: financnasprava.sk - DE USt-IdNr.: bzst.de + format PDF - DE IdNr: bzst.de - PL NIP: biznes.gov.pl + OECD TIN PDF - PL PESEL: OECD TIN PDF - PL REGON: bip.stat.gov.pl - IBAN: ISO 13616 (already had) - Luhn: ISO/IEC 7812-1 (already had) - LEI: ISO 17442 (already had)
IBAN: add BBAN format regex for 70+ countries (sourced from SWIFT IBAN Registry). Now validates country-specific BBAN structure, not just mod-97. 0 disagreements with ibantools (111K downloads/wk).
Oracle: add JS-based oracles that run without Python:
- validate-polish: PL NIP, PESEL, REGON
- ibantools: IBAN with BBAN format
Python oracle is now optional (runs if .venv exists). JS oracles always run.
Found 1 validate-polish bug: accepts PESEL with month encoding 40 (decodes to month 0). Both python-stdnum and @stll/stdnum correctly reject.
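For context, the mod-97 part of IBAN validation can be sketched as below. This is an illustrative sketch with a hypothetical function name (`ibanMod97Valid`); it covers only the checksum and deliberately omits the country-specific BBAN structure that this commit adds.

```typescript
// ISO 13616 mod-97 check only. Country-specific BBAN format rules are
// intentionally not modeled here.
export function ibanMod97Valid(input: string): boolean {
  const s = input.replace(/\s+/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{1,30}$/.test(s)) return false;
  // Move the country code and check digits to the end.
  const rearranged = s.slice(4) + s.slice(0, 4);
  let rem = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    // Digits map to 0..9, letters A..Z map to 10..35.
    const v = code >= 65 ? code - 55 : code - 48;
    // Fold each value into the running remainder to stay in safe integers.
    rem = (v > 9 ? rem * 100 + v : rem * 10 + v) % 97;
  }
  return rem === 1;
}
```

A value can pass this check and still be rejected once the BBAN layout for its country is enforced, which is the gap the commit closes.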
iban.js, jsvat) Now testing against 7 independent implementations: - python-stdnum (Python): 0 disagreements - validate-polish (JS): 5 (their PESEL bug) - ibantools (JS): 0 - iban.js (JS): 0 - luhn npm (JS): 0 - fast-luhn (JS): 0 - jsvat CZ/DE/PL (JS): DE=0, CZ=150 (their bug), PL=29 (their bug) All disagreements confirmed as bugs in the other libraries by tiebreaking against python-stdnum.
Oracle now tests against 10 independent implementations across 4 languages: JS (always): validate-polish, ibantools, iban.js, luhn, fast-luhn, jsvat Python (optional): python-stdnum Rust (optional): iban_validate, luhn crates Ruby (optional): valvat gem Results on 48,000 random inputs: - 0 disagreements with python-stdnum (all formats) - 0 disagreements with Rust crates (IBAN, Luhn) - 0 disagreements with valvat (DE VAT, PL NIP) - 0 disagreements with ibantools, iban.js, luhn, fast-luhn Known bugs found in other libraries: - validate-polish: accepts PESEL month 0 - jsvat: lax CZ DIČ and PL NIP validation - valvat: no CZ IČO checksum (syntax-only)
Wave 2 countries (10 new validators): - AT: UID (modified Luhn) - SK: IČO (reuses CZ IČO algorithm) - GB: VAT (weighted mod-97), UTR (weighted + lookup) - FR: SIREN (Luhn), SIRET (Luhn + La Poste), NIF (mod-511), TVA (old/new-style check prefix) - IT: Partita IVA (Luhn + province), Codice Fiscale (odd/even position value tables) Oracle results (66,000 random inputs): - 0 disagreements with python-stdnum on all 18 python-covered formats (AT, GB, FR, IT all pass) - 0 disagreements with Rust crates (IBAN, Luhn) - 6 IT Codice Fiscale: our regex accepts arbitrary alphanumeric at omocodia positions; python only allows LMNPQRSTUV. Minor strictness gap. 154 unit tests, 8 countries, 23 total validators.
- Validate all characters are digits before checksum (catches non-digit branch suffixes) - Return full 12-character compact value instead of truncating to 9 characters
Add VAT number validators for 19 EU member states: BE, BG, CY, DK, EE, ES, FI, GR, HR, HU, IE, LT, LU, LV, MT, NL, PT, RO, SE, SI. Each validator includes compact, format, and validate functions with proper checksum verification. Complex multi-format validators implemented for ES (DNI/NIE/ CIF/K-L-M) and IE (old/new format).
19 new VAT validators for all remaining EU member states: BE, BG, CY, DK, EE, ES, FI, GR, HR, HU, IE, LT, LU, LV, MT, NL, PT, RO, SE, SI. Combined with existing AT, CZ, DE, FR, IT, PL, SK, this gives complete EU-27 VAT validation coverage. Algorithms: mod-97 (BE, NL), weighted sum (DK, EE, FI, HU, LV, MT, PT, SI), ISO 7064 Mod 11,10 (HR), iterative doubling (GR), Luhn (SE), multi-format (BG: EGN/PNF/other, ES: DNI/NIE/CIF, IE: old/new, LT: 9/12-digit, LV: legal/personal, NL: BSN/mod97). Oracle: 0 disagreements with python-stdnum across all 19 new validators (106,000 random inputs, 4 languages, 10 independent implementations). 42 total validators, 27 countries, 247 unit tests.
Cross-check all 27 EU VAT validators against jsvat (independent JS implementation). Results: - 12 countries: 0 disagreements (DE, GR, HR, HU, IE, LT, LU, RO, SE, SI + already covered AT) - 15 countries: jsvat has bugs (confirmed by python-stdnum tiebreaker showing 0 disagreements with our implementation on all formats) Now testing against 3 independent JS VAT libraries (jsvat, validate-polish, ibantools) + python-stdnum + Rust + Ruby = 6 oracle sources across 4 languages.
- Fix BG validateOther doc comment (% 11, not % 10; code was correct, comment was wrong) - Split eu-vat.test.ts into 20 per-country test files per CONTRIBUTING.md convention
Every EU VAT validator now has @see links to official government/OECD documentation: - National tax authority websites (AADE, ANAF, Agencia Tributaria, Revenue.ie, Skatteverket, etc.) - OECD TIN documentation for countries without public algorithmic specs DK: added note that Denmark dropped mod-11 for CPR in 2007; unconfirmed for CVR numbers. Our strict mod-11 check matches python-stdnum but may reject valid newer CVR numbers if the same relaxation was applied. All 20 implementations verified against official specs. No divergences found.
feat: complete EU-27 VAT coverage (Phase 1)
feat: add AT, SK IČO, GB, FR, IT validators
14 new personal identification validators: - BE: NN (National Number, mod-97 dual-century) - BG: EGN (Unified Civil Number, weighted checksum) - DK: CPR (Personal ID, date-only, no checksum) - EE: IK (Isikukood, two-pass weighted checksum) - ES: DNI (National ID, mod-23 letter) - ES: NIE (Foreigner ID, prefix replacement + DNI) - FI: HETU (Personal ID, mod-31 alphanumeric check) - GR: AMKA (Social Security, Luhn) - IE: PPS (Personal Public Service, mod-23) - LT: Asmens kodas (Personal Code, reuses EE IK) - NL: BSN (Citizen Service Number, 11-proof) - RO: CNP (Personal Numeric Code, weighted + county) - SE: Personnummer (Personal ID, Luhn + birth date) - SI: EMŠO (Master Citizen Number, weighted mod-11) Oracle results (200,000 random inputs): - python-stdnum: 0 disagreements on all 14 - stdnum-js (JS): disagreements on 9 countries (confirmed as stdnum-js bugs by python tiebreak) - Rust, Ruby, other JS oracles: 0 56 total validators, 27 countries, 310 unit tests.
- Fix IE PPS length check: max 9, not 10 - Fix BE NN error message: "0..12" not "1..12" (month 0 is valid for counter-exhaustion) - Add ES NIE to oracle cross-validation
- NL BSN: reject all-zeros "000000000" (python-stdnum rejects it; our mod-11 check incorrectly passed since 0 % 11 === 0)
- RO CNP: accept gender digit 9 (foreigners with temporary residence; python-stdnum accepts it)
- DK CPR: remove future date rejection (python-stdnum does not enforce this; CPR numbers can be pre-assigned for future births)
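The all-zeros case is easy to see in a small sketch of the 11-proof. The function name `bsnValid` is hypothetical and the code is an illustration using the commonly documented weights (9 down to 2, then -1), not the package's source.

```typescript
// Dutch BSN "11-proof": weighted sum with weights 9..2 and -1 on the last
// digit must be divisible by 11.
export function bsnValid(value: string): boolean {
  if (!/^\d{9}$/.test(value)) return false;
  // Explicit guard: "000000000" gives a weighted sum of 0, and 0 % 11 === 0,
  // so the checksum alone would accept it.
  if (/^0+$/.test(value)) return false;
  const weights = [9, 8, 7, 6, 5, 4, 3, 2, -1];
  let sum = 0;
  for (let i = 0; i < 9; i++) {
    sum += (value.charCodeAt(i) - 48) * (weights[i] ?? 0);
  }
  return sum % 11 === 0;
}
```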
Added boundary value injection to oracle (all-zeros, all-nines, off-by-one lengths, repeated digits). This immediately caught 3 bugs:
- FR NIF: incorrectly rejected all-zeros (python-stdnum accepts: 0 % 511 == 0)
- BE VAT: incorrectly accepted all-zeros (python-stdnum rejects)
- NL VAT: missing zero-padding for numeric part (8-digit inputs like "41442283B01" must pad to "041442283B01" before validation)
Oracle `digs()` generator now mixes 70% random values with 30% targeted edge cases (Hypothesis strategy pattern). Every `digs(n)` call injects all-zeros, all-nines, sequential digits, single repeated digits, and off-by-one lengths.
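One way such a generator could look is sketched below. Only the name `digs` and the 70/30 split come from the commit message; the `edgeCases` helper, the injectable `rng` parameter, and the choice to pick a single edge case per call are assumptions made for illustration.

```typescript
type Rng = () => number; // returns a float in [0, 1)

// Targeted edge cases for an n-digit field (hypothetical helper).
export function edgeCases(n: number): string[] {
  const cases = ["0".repeat(n), "9".repeat(n)];
  // Sequential digits: 1, 2, 3, ... wrapping after 9.
  let seq = "";
  for (let i = 0; i < n; i++) seq += String((i + 1) % 10);
  cases.push(seq);
  // Single repeated digits 1..8 (0 and 9 are already covered).
  for (let d = 1; d <= 8; d++) cases.push(String(d).repeat(n));
  // Off-by-one lengths.
  if (n > 1) cases.push("1".repeat(n - 1));
  cases.push("1".repeat(n + 1));
  return cases;
}

// Roughly 30% of calls return an edge case, the rest return random digits.
export function digs(n: number, rng: Rng = Math.random): string {
  if (rng() < 0.3) {
    const cases = edgeCases(n);
    return cases[Math.floor(rng() * cases.length)] ?? "0".repeat(n);
  }
  let out = "";
  for (let i = 0; i < n; i++) out += String(Math.floor(rng() * 10));
  return out;
}
```

Passing a deterministic `rng` makes a run reproducible, which helps when a disagreement with an oracle needs to be replayed.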
Belgian VAT numbers before 2007 were 9 digits.
Official SPF Finances spec says older 9-digit
numbers should start with a leading zero. Added
zero-padding in compact().
Verified against:
- Official: finance.belgium.be (pre-2007 format)
- python-stdnum: compact('990246769') → '0990246769'
- jsvat: accepts both 9 and 10-digit forms
- Oracle: 5/5 runs with 0 disagreements
This bug was found by the Hypothesis-style edge
case injection (digs(9) generates 9-digit values
that exercise the padding path).
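The padding step can be illustrated with a short sketch. The function name `compactBeVat` and the set of stripped separators are assumptions; only the 9-digit to 10-digit zero-padding behavior is taken from the commit message.

```typescript
// Hypothetical compact step for Belgian VAT numbers: strip separators and an
// optional "BE" prefix, then left-pad legacy 9-digit numbers with a zero.
export function compactBeVat(value: string): string {
  let s = value.replace(/[\s.\-/]/g, "").toUpperCase();
  if (s.slice(0, 2) === "BE") s = s.slice(2);
  if (s.length === 9) s = "0" + s;
  return s;
}
```

With this sketch, `compactBeVat("990246769")` yields `"0990246769"`, matching the python-stdnum result quoted above.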
Mutant testing: for each valid value, corrupt single digits and verify the checksum rejects them. Proves checksum strength per algorithm:
- 100% detection: IBAN (mod-97), Luhn, DE VAT (ISO 7064), NL BSN (11-proof), HR OIB, PL NIP, FR SIREN, IT IVA, BE NN
- ~96-98%: CZ IČO, CZ RČ, EE IK, SI EMŠO, GB UTR (inherent mod-11 limitation, not bugs)
Also:
- Bump default sample count from 2K to 10K
- Configurable via ORACLE_SAMPLES env var
- Mutant escapes are informational, not failures
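The mutant-testing idea can be sketched as follows. The helper `mutantEscapes` is hypothetical, and the Luhn function is inlined only to keep the example self-contained; the oracle's real structure may differ.

```typescript
// Minimal Luhn validator used as the system under test.
function luhn(value: string): boolean {
  if (!/^\d+$/.test(value)) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = value.length - 1; i >= 0; i--) {
    let d = value.charCodeAt(i) - 48;
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// For a known-valid value, replace each digit with every other digit and
// count how many of those mutants the validator still accepts.
export function mutantEscapes(
  valid: string,
  isValid: (v: string) => boolean,
): { total: number; escaped: number } {
  let total = 0;
  let escaped = 0;
  for (let i = 0; i < valid.length; i++) {
    const ch = valid.charAt(i);
    if (ch < "0" || ch > "9") continue; // only mutate digit positions
    for (let d = 0; d <= 9; d++) {
      if (String(d) === ch) continue;
      const mutant = valid.slice(0, i) + String(d) + valid.slice(i + 1);
      total++;
      if (isValid(mutant)) escaped++;
    }
  }
  return { total, escaped };
}

export const luhnReport = mutantEscapes("79927398713", luhn);
```

The detection rate is `1 - escaped / total`; for Luhn every single-digit substitution changes the sum modulo 10, so no mutant escapes.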
Extract duplicated code into shared modules:
- _util/date.ts: isValidDate (was in 11 files)
- _util/result.ts: err() helper (was in 56 files)
- _checksums/mod1110.ts: ISO 7064 Mod 11,10 (was in de/vat, de/idnr, hr/vat)
- Replace 14 inline weighted-sum loops with shared weightedSum (LV personal kept inline: non-zero initial sum incompatible with shared fn)
- Hoist centuryMap to module level in ee/ik, ro/cnp
- Fix import paths in de/vat, de/idnr (relative → #util/* aliases)
- Restore DK CPR future date rejection (python-stdnum does enforce it, contrary to earlier claim)
311 tests pass, oracle verified.
- SI EMŠO: fix year threshold from 800 to 900 per official JMBG standard (Wikipedia, JMBG spec). python-stdnum uses 800 but the standard says 900. No practical difference (800-899 range has no living citizens) but matches the official spec. - Oracle: expand IE PPS to cover 9-char new format (7 digits + check letter + A/B/H) - Oracle: expand FI HETU to cover all 13 separators (+, -, Y, X, W, V, U, A, B, C, D, E, F) - Oracle: expand SE Personnummer to cover + separator and 12-digit format
feat: add EU personal ID validators (Phase 2)
8 new validators for Switzerland, Norway, Iceland: - CH: UID (enterprise ID, weighted mod-11), VAT (UID + MWST/TVA/IVA/TPV suffix), SSN/AHV (EAN-13 checksum, 756 prefix) - NO: Organisasjonsnummer (weighted mod-11), MVA (orgnr + MVA suffix), Fødselsnummer (two check digits, D/H-number support, century from serial) - IS: Kennitala (weighted mod-11, org day+40), VSK (format-only, 5-6 digits) Oracle: 0 disagreements with python-stdnum on all 5 checksum-based validators (2000 samples each). 64 total validators, 30 countries, 359 tests.
- NO MVA: uppercase before prefix check (fixes mixed-case "No" input not being stripped) - CH UID: simplify compact to just clean+uppercase (remove redundant prefix reconstruction) - IS Kennitala: fix doc comment weights (doc said [3,2,7,6,5,4,3,2,1,0] but code correctly has weights only on first 8 digits)
Cross-check all EEA/EFTA validators against: - stdnum-js (CH UID, CH SSN, NO Orgnr, NO Fødselsnummer, IS Kennitala) - jsvat (CH VAT, NO VAT) Results: 0 disagreements with python-stdnum and stdnum-js (except IS Kennitala: 60 stdnum-js bugs, confirmed by python tiebreaker). Also verified IS Kennitala century digit 8 against official Þjóðskrá documentation (skra.is): only digits 0 and 9 are valid. Digit 8 is not documented in the official spec.
format() now checks for MVA suffix before slicing, preventing garbled output when called with input that lacks the suffix.
* feat: add generate() to structural validators * fix: address review comments - us/itin: use randomInt instead of Math.random for group selection (was causing unused import) - us/ssn: add blacklist check in generate() loop to prevent producing known-invalid SSNs; drop unused randomDigits import - de/handelsreg, us/ein: use randomInt for array selection, consistent with all other generators
* fix: resolve all strict TypeScript errors (noUncheckedIndexedAccess)
- Use `.charAt()` instead of bracket indexing on strings to avoid `string | undefined` return type
- Add `?? 0` fallback for readonly tuple/array weight lookups in bounded loops
- Add explicit undefined guard for `cy/vat` checkLetter
- Fix `de/idnr` digit distribution counter to avoid unchecked indexed increment
- Remove unused imports: `randomDigits` (ba/jmbg, ro/cnp), `isdigits` (mu/brn)
All 4184 tests pass. The tsconfig already had `strict: true` and `noUncheckedIndexedAccess: true` enabled; these fixes make the codebase actually clean under those settings.
* fix: port full Stella strict tsconfig (noImplicitOverride, noImplicitReturns, useUnknownInCatchVariables)
* chore: bump to 0.1.1
* fix: remove redundant strictNullChecks and useUnknownInCatchVariables
Both are already implied by strict: true. Keeping them explicitly was misleading, as Greptile flagged in review.
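The two patterns named in the first commit (`.charAt()` and the `?? 0` fallback) look roughly like this in a weighted-sum loop. The function and the weights are illustrative assumptions, not code from the repository.

```typescript
// Illustrative weights; under noUncheckedIndexedAccess, WEIGHTS[i] with a
// plain number index is typed as "number | undefined".
const WEIGHTS = [8, 7, 6, 5, 4, 3, 2] as const;

export function weightedSum(digits: string): number {
  let sum = 0;
  for (let i = 0; i < WEIGHTS.length; i++) {
    // digits[i] would be "string | undefined"; charAt always returns a
    // string ("" when out of range), and Number("") is 0.
    const d = Number(digits.charAt(i));
    // The loop bound guarantees a hit, so the fallback never fires at
    // runtime. It exists only to satisfy the type checker.
    sum += d * (WEIGHTS[i] ?? 0);
  }
  return sum;
}
```

The alternative would be a non-null assertion (`WEIGHTS[i]!`), which some lint configurations disallow; the fallback keeps the code free of assertions.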
* feat: add compiled dist output via tsup Add tsup build step to emit .js + .d.ts so consumers with skipLibCheck: true get clean builds. Bump to 0.2.0. * fix: align main and default export to compiled dist output Both `main` and the `default` export condition were still pointing to `src/index.ts`, which breaks non-ESM and legacy-resolution consumers that cannot process raw TypeScript. * fix: point all 177 sub-path exports to dist/ Only the root export "." was updated to use conditional exports with dist/ paths. All other sub-path exports still pointed to ./src/*.ts, causing compiled consumers to receive raw TypeScript. - Update sync-exports.ts to generate conditional export objects with types/import/default fields pointing to dist/ - Update tsup.config.ts to build all src/**/*.ts entry points (bundle: false) with es2022 target for Node 18 compatibility - Add tsconfig.build.json for tsc --emitDeclarationOnly (tsup DTS generation runs out of memory with 177+ entries) - Update imports map (#checksums/*, #util/*) to resolve to dist/ - Change build script to: tsup (JS) + tsc (declarations) * fix: conditional exports (types/import/default) for all 177 sub-paths - All exports now resolve to dist/ for compiled consumers - tsup builds all entry points (bundle: false, es2022) - tsc emits .d.ts via tsconfig.build.json - Fixes Greptile concern: sub-path imports no longer serve raw .ts
Align with Stella's strict TypeScript settings. This was the only missing compiler option.
* chore: migrate from tsup to tsdown * fix: restore bundle: false and simplify dts config
* docs: update README to reflect current API surface The README was severely outdated: documented 4 countries (97 supported), 3 international validators (7 exist), a minimal Validator type (missing 7 optional properties), and 76 tests (now 4,185). Updated all sections to match the actual codebase. * fix: correct detectNetwork usage and country count - detectNetwork is a top-level named export, not a method on the creditcard validator object - Country count is 96 (eu is in International section, not a country namespace)
* fix: use per-group char classes in toRegex to prevent false positives
toRegex() inferred a single character class for all positions from
examples. For validators where the compact form mixes letters and
digits in distinct positions (e.g., German SVNR "12010188M011"),
this produced [A-Z0-9] for every group. The overly broad pattern
matched all-caps prose like "OF NOVEMBER 6" as a valid candidate,
consuming the span and preventing the correct date pattern from
firing.
Add inferPerGroupInfo() which derives per-group character classes
from the formatted output. For SVNR, "12 010188 M 01 1" now
produces \d{2} \d{6} [A-Z]{1} \d{2} \d{1} instead of [A-Z0-9]
for all positions. Letter-only groups before the first digit group
(format-prepended prefixes like "CHE") are still excluded.
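A rough sketch of the per-group inference described above follows. The names `classOfGroup` and `inferGroupClasses` are hypothetical stand-ins, and the handling of mixed letter/digit groups is an assumption; the real `inferPerGroupInfo()` may differ in both shape and detail.

```typescript
// Narrowest character class that covers a single formatted group.
function classOfGroup(group: string): string | null {
  if (/^\d+$/.test(group)) return "\\d";
  if (/^[A-Z]+$/.test(group)) return "[A-Z]";
  if (/^[A-Z0-9]+$/.test(group)) return "[A-Z0-9]";
  return null;
}

// Split a formatted example on separators and emit one class per group.
export function inferGroupClasses(formatted: string): string[] {
  const parts: string[] = [];
  let seenDigitGroup = false;
  for (const raw of formatted.split(/[^A-Za-z0-9]+/)) {
    if (raw.length === 0) continue;
    const cls = classOfGroup(raw.toUpperCase());
    if (cls === null) continue;
    // Letter-only groups before the first digit-bearing group are treated
    // as format-prepended prefixes (such as "CHE") and skipped.
    if (cls === "[A-Z]" && !seenDigitGroup) continue;
    seenDigitGroup = true; // assumption: mixed groups also count here
    parts.push(`${cls}{${raw.length}}`);
  }
  return parts;
}

export const svnrPattern = new RegExp(
  "^" + inferGroupClasses("12 010188 M 01 1").join(" ") + "$",
);
```

Here `svnrPattern` matches `"12 010188 M 01 1"` but not `"OF NOVEMBER 6"`, because the first group now requires two digits rather than any two characters from `[A-Z0-9]`.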
* fix: address review comments
- Extract shared charClassFor helper; inferCharClass now delegates
to it instead of duplicating the letter/digit scanning logic
- Add regression tests for de.svnr: per-group char class matching
and all-caps prose rejection
* feat: strengthen validator correctness and repo hygiene * Address bot review feedback
* Set up shared AI commands * Sync shared AI command files * Clarify AI sync prerequisites
| Filename | Overview |
|---|---|
| package.json | Merges duplicate scripts blocks into one and adds link-codex entry; bumps tsdown devDependency from ^0.12.4 to ^0.21.6. |
| tsdown.config.ts | Adds fixedExtension: false to keep .js/.d.ts output extensions aligned with package.json exports after the tsdown 0.21 upgrade. |
| scripts/link-codex-skills.sh | New wrapper script with two-stage guard (submodule directory presence + target file existence) before delegating to the shared submodule script. |
| scripts/sync-ai-skills.sh | Improved guard logic: now checks for directory existence and non-empty state before checking for the target file, with clearer error messages for each case. |
| CONTRIBUTING.md | Adds sync layout diagram, bun run link-codex documentation, and clarifies CODEX_SKILL_PREFIX usage. |
| bun.lock | Lock file regenerated for the tsdown ^0.12.4 → ^0.21.6 bump; pulls in @babel/generator@8.0.0-rc.3 and related transitive dependencies as part of the new rolldown-plugin-dts dependency tree. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["bun run sync-ai"] --> B["scripts/sync-ai-skills.sh"]
B --> C{".ai/shared dir\npresent & non-empty?"}
C -- No --> D["exit 1\n(submodule not init'd)"]
C -- Yes --> E{".ai/shared/scripts/\nsync-ai-skills.sh exists?"}
E -- No --> F["exit 1\n(incompatible submodule)"]
E -- Yes --> G["bash .ai/shared/scripts/sync-ai-skills.sh ."]
G --> H[".claude/commands/skill.md"]
G --> I[".agents/skills/skill/SKILL.md"]
J["bun run link-codex"] --> K["scripts/link-codex-skills.sh"]
K --> L{".ai/shared dir\npresent & non-empty?"}
L -- No --> M["exit 1\n(submodule not init'd)"]
L -- Yes --> N{".ai/shared/scripts/\nlink-codex-skills.sh exists?"}
N -- No --> O["exit 1\n(incompatible submodule)"]
N -- Yes --> P["bash .ai/shared/scripts/link-codex-skills.sh ."]
P --> Q["CODEX_HOME/skills/stdnum-skill -> SKILL.md"]
Reviews (4): Last reviewed commit: "docs: explain tsdown fixedExtension sett..."
48008ab to fdae3a0
What changed
This branch adds the repo's Codex/AI command and skill wiring, including the shared AI submodule integration, generated skill layouts for Claude and Codex, contributor documentation updates, and the new `link-codex` helper script.
It also fixes a manifest regression introduced during that work: `package.json` had ended up with two `scripts` objects. Bun only surfaced the first one, which broke `bun run lint`, `bun run typecheck`, `bun run build`, and the `prepublishOnly` release path.
Why
The branch is meant to make the repo's AI helper workflow reproducible for contributors using both Claude-style commands and Codex-style skills. While reviewing the repo state, the duplicate `scripts` key showed up as a blocker because it made the documented Bun workflow and the Bun-driven publish path unreliable.
Impact
- Contributors can now use the documented Bun commands again.
- CI/build tooling that depends on `bun run` can resolve the expected scripts.
- The Codex skill sync/link workflow stays available without shadowing the rest of the package automation.
Root cause
The new Codex helper scripts were added in a second top-level `scripts` block instead of being merged into the existing one. JSON accepts the duplicate key, but Bun did not behave like npm here, so the broken state was easy to miss unless you exercised the Bun-native commands directly.
Validation
- `bun run lint`
- `bun run typecheck`
- `bun test`
- `bun run prepublishOnly`
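To make the root cause concrete, here is a small sketch of why a duplicate top-level key is easy to miss and how it could be detected. The manifest content and the helper `duplicateTopLevelKeys` are hypothetical; the only behavior relied on is that JavaScript's `JSON.parse` silently keeps the last duplicate, whereas this PR reports that Bun surfaced the first one.

```typescript
// Hypothetical manifest with the same shape as the regression.
const manifest = `{
  "scripts": { "lint": "eslint .", "build": "tsdown" },
  "scripts": { "link-codex": "bash scripts/link-codex-skills.sh" }
}`;

// JSON.parse raises no error and keeps only the last "scripts" object.
export const parsed = JSON.parse(manifest) as { scripts: Record<string, string> };

// Scan raw JSON text and report keys that appear more than once at depth 1.
export function duplicateTopLevelKeys(json: string): string[] {
  const seen = new Set<string>();
  const dupes = new Set<string>();
  let depth = 0;
  let i = 0;
  while (i < json.length) {
    const ch = json.charAt(i);
    if (ch === '"') {
      // Read a string literal, skipping over backslash escapes.
      let j = i + 1;
      while (j < json.length && json.charAt(j) !== '"') {
        j += json.charAt(j) === "\\" ? 2 : 1;
      }
      const raw = json.slice(i + 1, j);
      // A string at depth 1 that is followed by ":" is a top-level key.
      let k = j + 1;
      while (k < json.length && /\s/.test(json.charAt(k))) k++;
      if (depth === 1 && json.charAt(k) === ":") {
        if (seen.has(raw)) dupes.add(raw);
        seen.add(raw);
      }
      i = j + 1;
      continue;
    }
    if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") depth--;
    i++;
  }
  return Array.from(dupes);
}
```

Because different tools resolve the duplicate differently, a check like this in CI would flag the problem regardless of which copy a given runtime happens to read.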