…atical operators

Extends the Step 3 regex in hardenUnicodeText() to cover U+2061 (FUNCTION APPLICATION), U+2062 (INVISIBLE TIMES), U+2063 (INVISIBLE SEPARATOR), and U+2064 (INVISIBLE PLUS). These Unicode Cf characters are invisible in all renderers, not removed by NFKC normalization, and were previously passing through sanitizeContent() unchanged, allowing secret-like patterns fragmented with invisible operators to bypass static regex detection while remaining visually identical.

Also updates both copies of threat_detection.md to instruct the LLM to check for invisible-operator fragmentation alongside existing encoded-representation and homoglyph checks.

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/94dfa88e-fdbd-4476-a118-3d070e17dbc0
Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>
Pull request overview
Updates Unicode hardening and related threat-detection guidance to prevent secrets/payloads from being visually preserved while byte-wise altered using invisible mathematical operator format characters (U+2061–U+2064).
Changes:
- Extend `hardenUnicodeText()` to strip U+2061–U+2064 (invisible mathematical operators) alongside existing zero-width/format characters.
- Add unit tests covering each operator and multi-operator removal, including a fragmentation bypass scenario.
- Update threat detection prompt docs to explicitly call out “Invisible Operator Fragmentation” as a secret-leak evasion pattern.
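The "visually preserved while byte-wise altered" property can be checked directly. The following is a minimal sketch, not code from this pull request; the variable names are illustrative, and it relies only on the standard `TextEncoder` API and Unicode property escapes.

```javascript
// Illustrative only: none of these names come from the repository.
const plain = "SECRET";
// Insert U+2063 (INVISIBLE SEPARATOR) between every letter.
const fragmented = plain.split("").join("\u2063");

// Compare UTF-8 byte lengths: each inserted operator encodes to 3 bytes.
const utf8 = new TextEncoder();
const plainBytes = utf8.encode(plain).length;
const fragmentedBytes = utf8.encode(fragmented).length;

// All four invisible operators belong to general category Cf (Format).
const isFormatChar = /^\p{Cf}$/u;
const allCf = ["\u2061", "\u2062", "\u2063", "\u2064"].every(ch => isFormatChar.test(ch));

console.log(plainBytes, fragmentedBytes); // 6 21
console.log(allCf); // true
console.log(plain === fragmented); // false
```

The two strings render the same in most contexts, yet they differ in length and content, which is why a byte-oriented pattern match misses the fragmented form.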
| File | Description |
|---|---|
| `actions/setup/js/sanitize_content_core.cjs` | Expands the “strip invisible characters” regex to remove U+2061–U+2064 during Unicode hardening. |
| `actions/setup/js/sanitize_content.test.cjs` | Adds tests verifying removal of U+2061–U+2064 and a fragmentation-style bypass case. |
| `pkg/workflow/prompts/threat_detection.md` | Documents invisible-operator fragmentation as a secret leak detection heuristic. |
| `actions/setup/md/threat_detection.md` | Mirrors the same threat-detection documentation update for the setup action. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 1
    it("should strip U+2061-U+2064 used to fragment a secret-like marker", () => {
      // Simulate a secret fragmented with invisible operators to bypass static detection
      const marker = "SECRET";
      const fragmented = marker.split("").join("\u2061");
      const result = sanitizeContent(fragmented);
      expect(result).toBe(marker);
    });
Test description says it strips U+2061–U+2064 fragmentation, but the test only inserts U+2061. Either adjust the title/assertions to match U+2061 specifically, or parameterize the test to cover U+2062–U+2064 as well so the name reflects what’s actually validated.
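One way to follow the parameterization suggestion is sketched below. It is an assumption-laden illustration rather than the fix that was applied: `stripInvisibleOperators` is a hypothetical stand-in for the repository's `sanitizeContent()` and models only the invisible-operator stripping, and a plain loop replaces the test runner so the snippet runs on its own.

```javascript
// Hypothetical stand-in for sanitizeContent(); it models only the removal
// of U+2061-U+2064 and none of the other sanitization steps.
function stripInvisibleOperators(text) {
  return text.replace(/[\u2061-\u2064]/g, "");
}

const marker = "SECRET";
const operators = ["\u2061", "\u2062", "\u2063", "\u2064"];

// One case per operator, so the test name matches what is actually validated.
const results = operators.map(op => {
  const fragmented = marker.split("").join(op);
  return {
    codePoint: "U+" + op.codePointAt(0).toString(16).toUpperCase(),
    passed: stripInvisibleOperators(fragmented) === marker,
  };
});

for (const { codePoint, passed } of results) {
  console.log(`${codePoint}: ${passed ? "ok" : "FAILED"}`);
}
```

In a Jest- or Vitest-style suite, the same `operators` array could feed `it.each`, with the real `sanitizeContent()` in place of the stand-in.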
🧪 Test Quality Sentinel Report
Test Quality Score: 85/100
✅ Excellent test quality
Test Classification Details (6 tests)
Flagged Tests: Requires Review
No tests require mandatory review. One advisory note is included below.
ℹ️ Per-character cluster (tests 1–4), advisory only
Classification: Design tests, minor duplication
Language Support
Tests analyzed:
Scoring Breakdown
Verdict
📖 Understanding Test Classifications
Design Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract: the promises it makes to its users and collaborators.
References: §24834484682
@copilot review all comments
…ally

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/acd023ee-4af1-44c5-947d-31f18788242f
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
U+2061–U+2064 (FUNCTION APPLICATION, INVISIBLE TIMES, INVISIBLE SEPARATOR, INVISIBLE PLUS) are Unicode `Cf` format characters that are invisible in all renderers, survive NFKC normalization, and were passing through `sanitizeContent()` unchanged. A secret fragmented by inserting these characters between every character is byte-distinct from the plain pattern but visually identical, defeating static regex detection.

Changes
- `sanitize_content_core.cjs`: Extend the Step 3 regex in `hardenUnicodeText()` to cover U+2061–U+2064 via range notation.
- `sanitize_content.test.cjs`: Add tests for each of U+2061–U+2064 individually, a fragmentation-bypass scenario (marker split with `\u2061` reassembles to plaintext), and multi-character removal.
- `threat_detection.md` (both `actions/setup/md/` and `pkg/workflow/prompts/`): Add an Invisible Operator Fragmentation bullet under the Secret Leak section so the LLM evaluator is explicitly instructed to recognize this bypass pattern alongside encoded representations and homoglyph substitution.
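The bypass and the range-based fix described above can be sketched as follows. This is a simplified illustration under stated assumptions: the actual Step 3 regex in `hardenUnicodeText()` covers other characters that are not shown here, and `INVISIBLE_OPERATORS`, `stripInvisibleOperators`, and `detector` are hypothetical names.

```javascript
// Hypothetical, reduced version of the stripping step: only the new range.
const INVISIBLE_OPERATORS = /[\u2061-\u2064]/g;

function stripInvisibleOperators(text) {
  return text.replace(INVISIBLE_OPERATORS, "");
}

// Stand-in for a static secret-detection pattern (an assumption for this sketch).
const detector = /SECRET/;

// Fragment the marker with U+2062 (INVISIBLE TIMES) between every letter.
const fragmented = "SECRET".split("").join("\u2062");

// The fragmented form slips past the static pattern.
const detectedBefore = detector.test(fragmented);

// NFKC normalization leaves the invisible operators in place.
const survivesNfkc = fragmented.normalize("NFKC") === fragmented;

// After stripping the range, the pattern matches again.
const detectedAfter = detector.test(stripInvisibleOperators(fragmented));

console.log(detectedBefore); // false
console.log(survivesNfkc); // true
console.log(detectedAfter); // true
```

Range notation keeps the character class compact, and because the four code points are contiguous it covers exactly U+2061 through U+2064.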