Support Unicode characters in keywords check #46
Pull Request Overview
This PR enhances the keywords check so that keyword matching works with Unicode characters. The previous implementation used `\b` word boundaries, which do not work correctly with non-ASCII text such as Chinese, Arabic, or Cyrillic, because JavaScript's `\b` only treats `[A-Za-z0-9_]` as word characters.
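The failure mode described above can be reproduced with a small sketch. The helper names `escapeRegExp` and `matchesWithWordBoundary` are hypothetical and only approximate a `\b`-based check; they are not taken from `keywords.ts`.

```typescript
// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical \b-based matcher. Without the `i` flag, \b is defined in terms
// of \w, which only covers [A-Za-z0-9_], even when the `u` flag is set.
function matchesWithWordBoundary(text: string, keyword: string): boolean {
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, "u").test(text);
}

console.log(matchesWithWordBoundary("say hello world", "hello")); // true
console.log(matchesWithWordBoundary("скажи привет миру", "привет")); // false
console.log(matchesWithWordBoundary("他说 你好", "你好")); // false
```

The Cyrillic and Chinese keywords never match, even when surrounded by spaces, because no position next to a non-ASCII letter counts as a `\b` boundary.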
Key Changes:
- Replaced simple `\b` word boundaries with Unicode-aware boundary detection using `\p{L}` and `\p{N}` patterns
- Implemented per-keyword boundary logic that handles keywords starting or ending with punctuation
- Added comprehensive test coverage for Unicode characters, mixed scripts, and edge cases with special characters
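The first two changes can be sketched roughly as follows. This is an illustration of the technique under stated assumptions, not the PR's code: `isWordChar`, `buildKeywordRegex`, and `containsKeyword` are hypothetical names, and the sketch assumes case-sensitive matching.

```typescript
// Hypothetical sketch of Unicode-aware keyword matching (not the PR's code).

// Treat Unicode letters (\p{L}) and numbers (\p{N}) as word characters.
const WORD_CLASS = "[\\p{L}\\p{N}]";
const WORD_CHAR = new RegExp(`^${WORD_CLASS}$`, "u");

function isWordChar(ch: string): boolean {
  return WORD_CHAR.test(ch);
}

// Escape regex metacharacters so the keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildKeywordRegex(keyword: string): RegExp {
  // Split by code point so characters outside the BMP are not cut in half.
  const chars = Array.from(keyword);
  const first = chars[0] ?? "";
  const last = chars[chars.length - 1] ?? "";
  // Require a boundary only on a side where the keyword itself has a word
  // character; "C++" therefore gets a lookbehind but no lookahead.
  const before = isWordChar(first) ? `(?<!${WORD_CLASS})` : "";
  const after = isWordChar(last) ? `(?!${WORD_CLASS})` : "";
  return new RegExp(`${before}${escapeRegExp(keyword)}${after}`, "u");
}

function containsKeyword(text: string, keyword: string): boolean {
  return buildKeywordRegex(keyword).test(text);
}

console.log(containsKeyword("скажи привет миру", "привет")); // true
console.log(containsKeyword("приветствую всех", "привет")); // false: partial word
console.log(containsKeyword("مرحبا بالعالم", "مرحبا")); // true
console.log(containsKeyword("你好 世界", "你好")); // true
console.log(containsKeyword("C++ developer", "C++")); // true
```

One consequence of this design: `\p{L}` also covers CJK ideographs, so a Chinese keyword embedded in an unbroken run of characters (for example `你好` inside `他说你好`) does not match in this sketch. Scripts written without spaces may need a different boundary policy.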
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/checks/keywords.ts | Implements Unicode-aware word boundary detection with lookahead/lookbehind assertions and adds helper function to determine word characters |
| src/tests/unit/checks/keywords-urls.test.ts | Adds extensive test cases covering partial word matching, Unicode scripts (Chinese, Arabic, Cyrillic), special character handling, and mixed script scenarios |
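One subtlety a word-character helper has to handle is that JavaScript strings are indexed by UTF-16 code units, so characters outside the Basic Multilingual Plane (such as the CJK Extension B ideograph `𠮷`) are split in two by `keyword[0]`. The sketch below is hypothetical (`isWordChar` and `edgeChars` are illustrative names, not the PR's helper) and shows why such a helper should inspect whole code points.

```typescript
// Hypothetical helper: is this single code point a Unicode letter or number?
function isWordChar(ch: string): boolean {
  return /^[\p{L}\p{N}]$/u.test(ch);
}

// Hypothetical helper: first and last code points of a keyword.
function edgeChars(keyword: string): [string, string] {
  const codePoints = Array.from(keyword);
  return [codePoints[0] ?? "", codePoints[codePoints.length - 1] ?? ""];
}

// "𠮷" (U+20BB7) is a letter outside the BMP and takes two UTF-16 code units.
const keyword = "𠮷野家";

console.log(keyword.length); // 4
console.log(isWordChar(keyword[0])); // false: keyword[0] is a lone surrogate
console.log(isWordChar(edgeChars(keyword)[0])); // true
```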
@codex please review
Codex Review: Didn't find any major issues. Keep them coming!
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Adds Unicode support for keyword search, as suggested by @yehorkardash in PR 41.
`\b` in regex doesn't work well with Unicode characters. Replaced it with `\p` (Unicode property escapes) to be compatible with Unicode characters.
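A point worth keeping in mind when switching to `\p`: Unicode property escapes are only recognized in Unicode-aware mode (the `u` or `v` flag). Without the flag, `\p` is a legacy identity escape for the letter `p`, so the pattern silently means something else. A minimal sketch, using the standard `RegExp` constructor:

```typescript
// With the `u` flag, \p{L} matches any Unicode letter.
const unicodeAware = new RegExp("\\p{L}", "u");
// Without it, \p{L} is parsed as the literal text "p{L}".
const legacy = new RegExp("\\p{L}");

console.log(unicodeAware.test("привет")); // true
console.log(legacy.test("привет")); // false
console.log(legacy.test("p{L}")); // true
```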