Fixes 27911: recognizer inclusion based on language #27919

Merged: edg956 merged 1 commit into main from issue-27911 on May 7, 2026

edg956 (Contributor) commented May 5, 2026

Summary

Fixes #27911

TagAnalyzer.get_recognizers_by() was silently dropping recognizers configured with supportedLanguage = any (language-agnostic) whenever the auto-classification agent ran with a specific language (e.g. en).

The old two-clause filter:

if (
    self._language is not ClassificationLanguage.any
    and created.supported_language != self._language.value
):
    continue

For an any-language recognizer under an en agent, the second clause evaluated "any" != "en" → True, so the recognizer was skipped regardless of intent. The any sentinel is supposed to mean "this recognizer works for all languages".
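To make the failure concrete, the following minimal sketch reproduces the old skip condition outside TagAnalyzer. ClassificationLanguage here is a hypothetical stand-in enum (only its any member and the .value comparison come from the snippet above; the en and es members are assumptions), and FakeRecognizer and old_is_skipped are illustrative names that do not exist in the codebase.

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical stand-ins: the real ClassificationLanguage enum and recognizer
# objects live in the ingestion package and may differ in detail.
class ClassificationLanguage(Enum):
    any = "any"
    en = "en"
    es = "es"


@dataclass
class FakeRecognizer:
    name: str
    supported_language: str  # plain string, compared against the enum's .value


def old_is_skipped(agent_language: ClassificationLanguage, created: FakeRecognizer) -> bool:
    """Mirror of the old two-clause filter: True means the loop would `continue`."""
    return (
        agent_language is not ClassificationLanguage.any
        and created.supported_language != agent_language.value
    )


agnostic = FakeRecognizer("email", supported_language="any")
print(old_is_skipped(ClassificationLanguage.en, agnostic))   # True: wrongly skipped
print(old_is_skipped(ClassificationLanguage.any, agnostic))  # False: kept
```

The bug only shows up when both clauses are true: the agent must be pinned to a specific language, and the recognizer's "any" string then never equals that language's value.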

Changes

  • ingestion/src/metadata/pii/tag_analyzer.py: extracted the language-filter condition into a _supports_language(created) method with correct positive logic: a recognizer is included if the agent is in any-mode, OR the recognizer itself is any-language, OR the languages match exactly.
  • ingestion/tests/unit/metadata/pii/test_language_filtering.py: added TestAnyLanguageRecognizerPassthrough with three regression tests covering the fixed case (any-recognizer + specific-language agent), the already-passing case (any-recognizer + any-agent), and the existing exclusion behavior (mismatched specific languages).
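The three-way inclusion rule can be sketched as a standalone predicate. This is an illustration of the described logic, not the actual body of _supports_language: the real method is on TagAnalyzer and reads the agent language from self._language, whereas supports_language, filter_recognizers, FakeRecognizer, and the en/es enum members below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from enum import Enum


class ClassificationLanguage(Enum):  # hypothetical stand-in for the real enum
    any = "any"
    en = "en"
    es = "es"


@dataclass
class FakeRecognizer:  # hypothetical stand-in for a created recognizer
    name: str
    supported_language: str


def supports_language(agent_language: ClassificationLanguage, created: FakeRecognizer) -> bool:
    """Positive inclusion check: any one of the three conditions is enough."""
    return (
        agent_language is ClassificationLanguage.any  # agent accepts every language
        or created.supported_language == ClassificationLanguage.any.value  # language-agnostic
        or created.supported_language == agent_language.value  # exact match
    )


def filter_recognizers(agent_language, recognizers):
    # Keep only recognizers that apply to the agent's language.
    return [r for r in recognizers if supports_language(agent_language, r)]


recognizers = [
    FakeRecognizer("email", "any"),
    FakeRecognizer("us_ssn", "en"),
    FakeRecognizer("es_nif", "es"),
]
print([r.name for r in filter_recognizers(ClassificationLanguage.en, recognizers)])
# ['email', 'us_ssn']
```

Expressing the rule as "include if" rather than "skip unless" keeps each accepted case on its own line, which makes the any-language passthrough explicit instead of an accidental side effect of a negated comparison.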

Test plan

  • pytest ingestion/tests/unit/metadata/pii/test_language_filtering.py::TestAnyLanguageRecognizerPassthrough -v: 3 new tests pass
  • pytest ingestion/tests/unit/metadata/pii/ -v: full PII unit suite passes (no regressions)
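The three regression cases can be expressed as plain pytest-style test functions. This is a self-contained sketch, not the contents of TestAnyLanguageRecognizerPassthrough: it exercises a hypothetical supports_language predicate and FakeRecognizer stand-in rather than a real TagAnalyzer, and the en/es enum members are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class ClassificationLanguage(Enum):  # hypothetical stand-in
    any = "any"
    en = "en"
    es = "es"


@dataclass
class FakeRecognizer:  # hypothetical stand-in
    supported_language: str


def supports_language(agent, created):
    # Positive inclusion rule: any-mode agent, any-language recognizer, or exact match.
    return (
        agent is ClassificationLanguage.any
        or created.supported_language == ClassificationLanguage.any.value
        or created.supported_language == agent.value
    )


def test_any_recognizer_included_for_specific_language_agent():
    # The fixed case: the old filter skipped this combination.
    assert supports_language(ClassificationLanguage.en, FakeRecognizer("any"))


def test_any_recognizer_included_for_any_agent():
    # Already passed before the fix; guards against regressions.
    assert supports_language(ClassificationLanguage.any, FakeRecognizer("any"))


def test_mismatched_specific_languages_excluded():
    # Existing exclusion behavior must be preserved.
    assert not supports_language(ClassificationLanguage.en, FakeRecognizer("es"))


if __name__ == "__main__":
    # Allows running without pytest; pytest would also collect the test_* functions.
    test_any_recognizer_included_for_specific_language_agent()
    test_any_recognizer_included_for_any_agent()
    test_mismatched_specific_languages_excluded()
    print("3 passed")
```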

edg956 requested a review from a team as a code owner May 5, 2026 20:11
github-actions Bot added the "safe to test" label May 5, 2026
edg956 added the "To release" label May 5, 2026
edg956 self-assigned this May 5, 2026
github-actions Bot commented May 5, 2026

🟡 Playwright Results — all passed (9 flaky)

✅ 4003 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 750 0 2 8
🟡 Shard 3 760 0 1 7
🟡 Shard 4 773 0 2 18
🟡 Shard 5 686 0 1 41
🟡 Shard 6 736 0 2 8
🟡 9 flaky test(s) (passed on retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/DomainAdvanced.spec.ts › Remove multiple assets from domain at once (shard 4, 1 retry)
  • Pages/EntityDataConsumer.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/GlossaryImportExport.spec.ts › Glossary Bulk Import Export (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

gitar-bot Bot commented May 6, 2026

Code Review ✅ Approved

Refactors PII recognition logic by centralizing language validation and ensuring language-agnostic recognizers are correctly included. Includes new regression tests to verify consistent behavior across language settings; no issues found.

sonarqubecloud Bot commented May 6, 2026

edg956 merged commit adcdd34 into main May 7, 2026 (48 checks passed)
edg956 deleted the issue-27911 branch May 7, 2026 00:16
github-actions Bot commented May 7, 2026

Changes have been cherry-picked to the 1.12.7 branch.

github-actions Bot pushed a commit that referenced this pull request May 7, 2026
github-actions Bot commented May 7, 2026

Changes have been cherry-picked to the 1.13 branch.

github-actions Bot pushed a commit that referenced this pull request May 7, 2026
Labels

governance, Ingestion, safe to test (Add this label to run secure GitHub workflows on PRs), To release (Will cherry-pick this PR into the release branch)

Development

Successfully merging this pull request may close these issues.

Auto-classification skips 'any'-language recognizers when agent runs with a specific language (e.g. 'en')

3 participants