Fixes 27911: recognizer inclusion based on language#27919
Merged
Conversation
harshach
approved these changes
May 5, 2026
Contributor
🟡 Playwright Results — all passed (9 flaky)✅ 4003 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 86 skipped
🟡 9 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
mohittilala
approved these changes
May 6, 2026
Code Review ✅ ApprovedRefactors PII recognition logic by centralizing language validation and ensuring language-agnostic recognizers are correctly included. Includes new regression tests to verify consistent behavior across language settings; no issues found. OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
Contributor
|
Changes have been cherry-picked to the 1.12.7 branch. |
github-actions Bot
pushed a commit
that referenced
this pull request
May 7, 2026
(cherry picked from commit adcdd34)
Contributor
|
Changes have been cherry-picked to the 1.13 branch. |
github-actions Bot
pushed a commit
that referenced
this pull request
May 7, 2026
(cherry picked from commit adcdd34)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
Fixes #27911
TagAnalyzer.get_recognizers_by()was silently dropping recognizers configured withsupportedLanguage = any(language-agnostic) whenever the auto-classification agent ran with a specific language (e.g.en).The old two-clause filter:
evaluated
"any" != "en"→Truefor any-language recognizers, causing them to be skipped regardless of intent. Theanysentinel is supposed to mean "this recognizer works for all languages".Changes
ingestion/src/metadata/pii/tag_analyzer.py: extracted the language-filter condition into a_supports_language(created)method with correct positive logic — a recognizer is included if the agent is inany-mode, OR the recognizer itself isany-language, OR the languages match exactly.ingestion/tests/unit/metadata/pii/test_language_filtering.py: addedTestAnyLanguageRecognizerPassthroughwith three regression tests covering the fixed case (any-recognizer + specific-language agent), the already-passing case (any-recognizer +any-agent), and the existing exclusion behavior (mismatched specific languages).Test plan
pytest ingestion/tests/unit/metadata/pii/test_language_filtering.py::TestAnyLanguageRecognizerPassthrough -v— 3 new tests passpytest ingestion/tests/unit/metadata/pii/ -v— full PII unit suite passes (no regressions)