
fix: use word boundaries for issue label matching #55

Merged
harlan-zw merged 1 commit into main from fix/issue-label-word-boundary
Mar 24, 2026



@oritwoen
Collaborator

@oritwoen oritwoen commented Mar 24, 2026

labelMatchesAny was using String.includes() to check whether a label contained a keyword, which caused substring false positives. Labels like "debug" matched the keyword "bug", "docker" matched "doc", and "prefix" matched "fix"; all of them were misclassified in the type quota system and took slots away from actual bug and docs issues.
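To make the false positives concrete, here is a minimal comparison of substring matching against a `\b`-anchored pattern. The helper names `substringMatch` and `wordMatch` are illustrative only and do not come from the PR; the sketch assumes keywords contain no RegExp metacharacters.

```typescript
// Substring matching: true whenever the keyword appears anywhere in the label.
const substringMatch = (label: string, keyword: string): boolean =>
  label.includes(keyword)

// Word-boundary matching: the keyword must start and end at a \b boundary.
// Assumes the keyword contains no RegExp metacharacters.
const wordMatch = (label: string, keyword: string): boolean =>
  new RegExp(`\\b${keyword}\\b`).test(label)

console.log(substringMatch('debug', 'bug'))  // true (false positive)
console.log(wordMatch('debug', 'bug'))       // false
console.log(substringMatch('docker', 'doc')) // true (false positive)
console.log(wordMatch('docker', 'doc'))      // false
console.log(substringMatch('prefix', 'fix')) // true (false positive)
console.log(wordMatch('prefix', 'fix'))      // false
console.log(wordMatch('bug', 'bug'))         // true
```

In each false-positive case the keyword is preceded or followed by a letter, which is a word character, so `\b` cannot match there.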

Switched to a single pre-compiled regex with \b word boundaries per keyword set, cached via WeakMap so a set that doesn't change is compiled only once. Emoji-prefixed labels like ":lady_beetle: bug" still match correctly, since \b matches between the space and the start of the keyword.
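A minimal sketch of the cached-regex approach, assuming keyword sets are `ReadonlySet<string>` objects with stable identity (a WeakMap needs object keys) and that matching is case-insensitive. The names `labelRegexCache`, `getLabelRegex`, and `labelMatchesAny` appear in the PR, but these bodies, `escapeRegExp`, and the contents of `BUG_KEYWORDS` are hypothetical, not the merged code.

```typescript
// Hypothetical cache: one compiled RegExp per keyword-set object.
const labelRegexCache = new WeakMap<ReadonlySet<string>, RegExp>()

// Escape RegExp metacharacters so each keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

/** Build (once per keyword set) a single alternation such as /\b(?:bug|crash)\b/i. */
function getLabelRegex(keywords: ReadonlySet<string>): RegExp {
  let re = labelRegexCache.get(keywords)
  if (!re) {
    // An empty set gets a pattern that can never match ("(?!)").
    const source = keywords.size > 0
      ? Array.from(keywords, escapeRegExp).join('|')
      : '(?!)'
    // Assumption: labels are matched case-insensitively.
    re = new RegExp(`\\b(?:${source})\\b`, 'i')
    labelRegexCache.set(keywords, re)
  }
  return re
}

/** True if the label contains one of the keywords as a whole word. */
function labelMatchesAny(label: string, keywords: ReadonlySet<string>): boolean {
  return getLabelRegex(keywords).test(label)
}

// Usage with a hypothetical keyword set:
const BUG_KEYWORDS: ReadonlySet<string> = new Set(['bug', 'crash'])
console.log(labelMatchesAny(':lady_beetle: bug', BUG_KEYWORDS)) // true
console.log(labelMatchesAny('debug', BUG_KEYWORDS))             // false
```

The regex has no `g` flag, so `test()` keeps no `lastIndex` state between calls and the cached instance can be reused safely.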

Added 15 unit tests covering the edge cases.

Summary by CodeRabbit

  • Bug Fixes

    • Improved label matching behavior to reduce false positives by using word boundary detection instead of substring matching when filtering issues by labels.
  • Tests

    • Added unit test coverage for label matching and issue classification to validate accuracy and ensure proper handling of edge cases.

labelMatchesAny was using String.includes() to match labels against
keyword sets, causing false positives on substrings. A label like
"debug" matched keyword "bug", "docker" matched "doc", and "prefix"
matched "fix", misclassifying issues in the type quota system.

Switched to a single pre-compiled regex with \b word boundaries per
keyword set (cached via WeakMap). Emoji-prefixed labels like
":lady_beetle: bug" still match correctly since \b triggers at the
colon/space boundary.
@oritwoen oritwoen self-assigned this Mar 24, 2026
@oritwoen oritwoen requested a review from harlan-zw March 24, 2026 14:17
@coderabbitai

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough


The changes refactor label matching logic in issue handling to use word-boundary regex patterns instead of substring matching, improving accuracy for keyword identification. The labelMatchesAny function is exported and backed by a regex cache for performance. A comprehensive test suite validates the updated matching behavior and issue classification logic.

Changes

Label Matching Logic (src/sources/issues.ts): Implemented regex-based word-boundary matching with caching via labelRegexCache and getLabelRegex(). Replaced substring matching in labelMatchesAny with exact or word-boundary pattern matching. Exported the function and updated JSDoc to reflect the new behavior and elimination of substring false positives.

Test Coverage (test/unit/issues.test.ts): Added comprehensive unit tests for labelMatchesAny and classifyIssue. Tests validate exact matches, emoji-prefixed labels, hyphen-separated forms, case-sensitivity, keyword boundary detection (e.g., doc vs docker), and correct issue classification priorities (bug > question > other).
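The classification priority described above (bug > question > other) can be sketched as an ordered series of checks. This is a hypothetical illustration: the `IssueType` union, the keyword lists, and the idea that `classifyIssue` takes a plain array of label strings are assumptions, not the PR's actual signature.

```typescript
// Hypothetical sketch: the first matching category wins, in priority order.
type IssueType = 'bug' | 'question' | 'other'

// Illustrative keyword lists (assumed plain words, no RegExp metacharacters).
const BUG_KEYWORDS = ['bug', 'crash', 'regression']
const QUESTION_KEYWORDS = ['question', 'support']

// Whole-word, case-insensitive match of any keyword in any label.
function anyLabelMatches(labels: readonly string[], keywords: readonly string[]): boolean {
  const re = new RegExp(`\\b(?:${keywords.join('|')})\\b`, 'i')
  return labels.some(label => re.test(label))
}

function classifyIssue(labels: readonly string[]): IssueType {
  if (anyLabelMatches(labels, BUG_KEYWORDS)) return 'bug'
  if (anyLabelMatches(labels, QUESTION_KEYWORDS)) return 'question'
  return 'other'
}

console.log(classifyIssue(['question', 'bug'])) // 'bug' (bug outranks question)
console.log(classifyIssue(['debug']))           // 'other' (no substring match)
```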

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop, hop—the labels now align!
Word boundaries drawn with regex fine,
No more false whiskers from partial strings,
Just exact matches—oh, the clarity it brings!
With tests as thorough as a burrow deep,
These labels will truth forever keep.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
Title Check: ✅ Passed. The title directly and accurately summarizes the main change: implementing word boundaries for issue label matching to fix substring false positives.



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
test/unit/issues.test.ts (1)

20-23: Verify emoji shortcode matching is intentional behavior.

Line 22 tests that :bug: report matches the keyword bug. This works because the colons in :bug: create word boundaries, so the "bug" inside the shortcode matches. If the intent is only to match labels like :lady_beetle: bug (emoji followed by keyword), note that the current regex also matches any shortcode in which a keyword is delimited by non-word characters (e.g., :bug: matches "bug"). A shortcode like :firefix: would not match "fix", however, because the "e" before "fix" is a word character, so \b does not match there.

If this is the desired behavior, consider adding a brief comment in the test explaining that emoji shortcodes containing keywords are intentionally matched.
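A quick check of how `\b` treats shortcode-style labels. The label strings are illustrative and not taken from the test file; note that `_` counts as a word character in JavaScript regexes, so it does not create a boundary.

```typescript
// Single-keyword patterns; no "g" flag, so test() is stateless.
const bugRe = /\bbug\b/i
const fixRe = /\bfix\b/i

const cases: Array<[RegExp, string]> = [
  [bugRe, ':lady_beetle: bug'], // matches: space before "bug", end of string after
  [bugRe, ':bug: report'],      // matches: ":" on both sides of "bug"
  [fixRe, ':firefix:'],         // no match: "e" before "fix" is a word character
  [fixRe, ':bug_fix:'],         // no match: "_" before "fix" is a word character
  [fixRe, ':quick-fix:'],       // matches: "-" before "fix", ":" after
]

for (const [re, label] of cases) {
  console.log(label, re.test(label))
}
```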

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/unit/issues.test.ts` around lines 20 - 23: The test currently asserts that labelMatchesAny(':bug: report', BUG_KEYWORDS) returns true, because the regex treats colons as word boundaries and matches "bug" inside an emoji shortcode. To address the review, either (preferred) add a short inline comment above the test case explaining that emoji shortcodes containing keywords (e.g., ':bug:') are intentionally considered matches for labelMatchesAny against BUG_KEYWORDS, or (if unintended) update the matching logic in labelMatchesAny to ignore text inside colon-delimited shortcodes (i.e., skip or strip /:[^:]+:/ sequences before applying BUG_KEYWORDS) so ':bug:' no longer triggers a match.
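If the shortcode match were unintended, the "strip shortcodes first" option could look roughly like this. It is a hypothetical sketch: `matchesIgnoringShortcodes` is not a function in the PR, and the pattern tightens the suggested `/:[^:]+:/` to `/:[^:\s]+:/` so that ordinary text between two colons (e.g. "a: b: c") is not removed.

```typescript
// Colon-delimited shortcode with no spaces inside, e.g. ":bug:" or ":lady_beetle:".
const SHORTCODE = /:[^:\s]+:/g

// Hypothetical helper: remove shortcodes, then apply a whole-word keyword match.
// Keywords are assumed to be plain words without RegExp metacharacters.
function matchesIgnoringShortcodes(label: string, keywords: readonly string[]): boolean {
  // Replace with a space so text on either side of a shortcode is not glued together.
  const stripped = label.replace(SHORTCODE, ' ')
  const re = new RegExp(`\\b(?:${keywords.join('|')})\\b`, 'i')
  return re.test(stripped)
}

console.log(matchesIgnoringShortcodes(':bug: report', ['bug']))      // false
console.log(matchesIgnoringShortcodes(':lady_beetle: bug', ['bug'])) // true
```

The trade-off is that a label consisting only of an emoji, such as ":bug:", would no longer be classified as a bug.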


📥 Commits

Reviewing files that changed from the base of the PR and between 8b54a9a and c84ca54.

📒 Files selected for processing (2)
  • src/sources/issues.ts
  • test/unit/issues.test.ts

@harlan-zw harlan-zw merged commit c3f56b5 into main Mar 24, 2026
2 checks passed
@harlan-zw
Collaborator

Nice one

@harlan-zw harlan-zw deleted the fix/issue-label-word-boundary branch March 24, 2026 15:12