fix: tokenize _ in ngram#24935
Conversation
|
|
gitar review this PR |
Code Review SummaryI've reviewed PR #24935. This is a well-executed bug fix for Elasticsearch tokenization. Here's my assessment: ✅ Strengths1. Precise Configuration Fix
2. Excellent Test Coverage
3. Documentation Quality
🔍 What This FixesProblem: Wildcard query Solution: Preserve underscores in edge n-gram tokens so 💡 Minor Suggestions
✅ Verdict: ApprovedThis is production-ready code with proper testing and documentation. The fix is well-targeted and the implementation follows OpenMetadata conventions. |



Describe your changes:
Fixes
I worked on ... because ...
Type of change:
Checklist:
Fixes <issue-number>: <short explanation>Summary by Gitar
edge_ngram_tokenizerintest_case_index_mapping.jsonto preserve underscores viacustom_token_chars: "_"*api_e*matchedapi_service_entity_test(now only matches substrings containing "api_e")test_listTestCasesFromSearch_wildcardQuery(107 lines) validating wildcard patterns with strategic test case namesThis will update automatically on new commits.