Add Finnish recognizers: FI_HETU (henkilötunnus) and FI_BUSINESS_ID (Y-tunnus) #4
Merged
…Y-tunnus)

Finland had no pattern recognizers, so Finnish national identifiers slipped through. This adds two, following the existing regex + context + score design (no engine changes):

- FI_HETU: DDMMYY + century marker (+, -, A–F, U–Y) + 3-digit individual number + control character. The control char is restricted to "0123456789ABCDEFHJKLMNPRSTUVWXY" (no G/I/O/Q) and the date is validated by the regex, so the shape is highly specific: base score 0.85.
- FI_BUSINESS_ID (Y-tunnus): 7 digits + hyphen + check digit. The bare shape is generic, so base score 0.4 leans on the +0.35 context boost (mirrors DE_TAX_ID, FR_CNI).

Registered in entity-types.ts (SUPPORTED_ENTITIES + TAG_NAMES: FI_HETU, FI_YID) and README coverage updated.

Tests: tests/finnish-recognizers-test.ts (11 assertions) covers valid HETU across 1800s/1900s/2000s century markers, rejection of invalid day/month and forbidden control chars, Y-tunnus detection with context, and rejection of wrong shapes. Run with `npx tsx tests/finnish-recognizers-test.ts`; all pass.

Did not run the full ONNX-dependent build locally; CI (tsc --noEmit + smoke) covers integration.
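For readers who want to see the two shapes concretely, here is a minimal sketch of regexes that fit the description in the commit message above. The PR's actual patterns are not shown in this thread, so the constant names, the `findAll` helper, and the exact day/month alternations are assumptions. The date check here only constrains the day to 01-31 and the month to 01-12; it does not reject impossible dates such as 31 February, and it does not verify the control character arithmetically.

```typescript
// Hypothetical names: the PR's real patterns and identifiers may differ.

// DDMMYY (day 01-31, month 01-12) + century marker (+, -, A-F, U-Y)
// + 3-digit individual number + control character.
// The final class spells out "0123456789ABCDEFHJKLMNPRSTUVWXY" (no G/I/O/Q/Z).
const FI_HETU_PATTERN =
  /\b(0[1-9]|[12]\d|3[01])(0[1-9]|1[0-2])\d{2}[+\-A-FU-Y]\d{3}[0-9A-FHJ-NPR-Y]\b/g;

// Y-tunnus: 7 digits + hyphen + check digit. Generic on its own,
// which is why the PR gives it a low base score.
const FI_BUSINESS_ID_PATTERN = /\b\d{7}-\d\b/g;

// Return every substring of `text` that matches `pattern`.
// `pattern` must carry the `g` flag, as required by String.prototype.matchAll.
function findAll(pattern: RegExp, text: string): string[] {
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

// Usage: shape checks only, no checksum validation.
console.log(findAll(FI_HETU_PATTERN, "Hetu: 131052-308T"));
console.log(findAll(FI_HETU_PATTERN, "Hetu: 131052-308G")); // G is not a valid control char
console.log(findAll(FI_BUSINESS_ID_PATTERN, "Y-tunnus 0112038-9"));
```

The word boundaries (`\b`) keep the patterns from matching inside longer digit runs, for example an 8-digit number followed by a hyphen and a digit.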
akunikkola added a commit to akunikkola/claude-for-legal-finland that referenced this pull request on May 27, 2026
…path

Wire PII Shield (local PII anonymization: "PII never enters the API") into the toolkit as a recommended companion:

- tietosuoja/CLAUDE.md: new "Anonymisointi ennen analyysiä" ("Anonymization before analysis") guardrail
- tietosuoja/README.md: install paths (official .mcpb/CLI) + how to get Finnish HETU/Y-tunnus recognizers immediately by building our fork (akunikkola/PII-Shield feat/finnish-recognizers), pending upstream gregmos/PII-Shield#4
- juristi/CLAUDE.md: confidentiality guardrail now points to PII Shield

Not added to .mcp.json: PII Shield ships as a .mcpb extension / CLI, not a portable npx MCP server, and the FI recognizers live in our fork until the PR merges, so it's a local, machine-bound stdio setup documented in the README rather than a portable plugin connector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Thanks for the contribution! Clean PR: recognizers follow the existing … Merging via squash. Will ship in v2.2.0 shortly.
gregmos added a commit that referenced this pull request on May 28, 2026
Bump package, manifest, VERSION constant, README asset links, and smoke-banner from 2.1.0 to 2.2.0. New in this release: Finnish recognizers FI_HETU and FI_BUSINESS_ID (PR #4 by @akunikkola). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
akunikkola added a commit to akunikkola/claude-for-legal-finland that referenced this pull request on May 28, 2026
gregmos/PII-Shield#4 (Finnish HETU + Y-tunnus recognizers) was merged and released as v2.2.0 on 2026-05-28. Official npm `pii-shield@2.2.0` and the v2.2.0 .mcpb releases now ship FI_HETU and FI_BUSINESS_ID natively, so no fork is needed.

- tietosuoja/README.md: replace the "build from fork" stopgap with three official install paths: A) .mcpb in Claude Desktop (recommended), B) npm install -g pii-shield (CLI), C) build server bundle for a local stdio MCP entry. Cite PR #4 as the merged origin of FI support.
- tietosuoja/CLAUDE.md: update Finnish-identifiers note: v2.2.0 supports FI_HETU and FI_BUSINESS_ID natively; still verify results.

Local PII-Shield checkout updated to upstream main (v2.2.0) and rebuilt; the user-scope pii-shield MCP keeps working at the same path.
What

Adds two Finnish national-identifier recognizers. Finland had no pattern recognizers, so Finnish PII (personal identity code, business ID) slipped through.

- FI_HETU, henkilötunnus (personal identity code): DDMMYY + century marker (+, -, A–F, U–Y) + 3-digit individual number + control character. The control char is restricted to 0123456789ABCDEFHJKLMNPRSTUVWXY (no G/I/O/Q) and the date is validated by the regex, so the shape is highly specific: base score 0.85.
- FI_BUSINESS_ID, Y-tunnus: 7 digits + hyphen + check digit. The bare shape (`\d{7}-\d`) is generic, so base score 0.4 leans on the +0.35 context boost, the same approach as DE_TAX_ID, FR_CNI, etc.

Design

Follows the existing regex + context + score recognizer design exactly, with no engine changes. Registered in entity-types.ts (SUPPORTED_ENTITIES + TAG_NAMES: FI_HETU, FI_YID).

Tests

nodejs-v2/tests/finnish-recognizers-test.ts (11 assertions, same standalone style as the other tests):

- valid HETU across 1800s/1900s/2000s century markers
- rejection of invalid day/month and forbidden control chars (e.g. G)
- Y-tunnus detection with y-tunnus / yritystunnus context
- rejection of wrong shapes

Run: `npx tsx tests/finnish-recognizers-test.ts` → 11 passed, 0 failed.

Context: contributed while building an open-source Finnish legal toolkit for Claude that recommends PII Shield for anonymisation; Finnish identifier coverage was the one gap.
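To illustrate how a low base score can lean on a context boost, here is a rough sketch of the arithmetic for the Y-tunnus case: 0.4 for a bare match, 0.4 + 0.35 = 0.75 when a context word appears nearby. The 0.4 and 0.35 values and the two context words come from this PR; everything else (the `Match` type, the `scoreBusinessIds` function, the 40-character look-back window, and checking only the text before the match) is an assumption for illustration, not PII Shield's actual engine.

```typescript
// Hypothetical shape of a scored match; the real engine's types are not shown in this PR.
interface Match {
  entity: string;
  text: string;
  start: number;
  end: number;
  score: number;
}

const BASE_SCORE = 0.4; // from the PR description
const CONTEXT_BOOST = 0.35; // from the PR description
const CONTEXT_WORDS = ["y-tunnus", "yritystunnus"]; // context terms named in the tests
const WINDOW = 40; // assumption: how many characters before the match to inspect

function scoreBusinessIds(text: string): Match[] {
  const pattern = /\b\d{7}-\d\b/g;
  const results: Match[] = [];
  for (const m of text.matchAll(pattern)) {
    const start = m.index ?? 0;
    const end = start + m[0].length;
    // Lowercase only the look-back slice so indices stay aligned with `text`.
    const before = text.slice(Math.max(0, start - WINDOW), start).toLowerCase();
    const hasContext = CONTEXT_WORDS.some((word) => before.includes(word));
    // Cap at 1 so a boosted score never exceeds full confidence.
    const score = Math.min(1, BASE_SCORE + (hasContext ? CONTEXT_BOOST : 0));
    results.push({ entity: "FI_BUSINESS_ID", text: m[0], start, end, score });
  }
  return results;
}

// Usage: the same number scores differently with and without context.
console.log(scoreBusinessIds("Y-tunnus: 0112038-9"));
console.log(scoreBusinessIds("Ref 0112038-9"));
```

With this split, a bare `\d{7}-\d` match stays below a typical detection threshold, while the same string next to "Y-tunnus" or "yritystunnus" rises to roughly the confidence range of the more specific FI_HETU shape.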