fix: strip U+2061–U+2064 invisible mathematical operators in hardenUnicodeText by Copilot · Pull Request #28037 · github/gh-aw

Copilot · 2026-04-23T11:45:07Z

U+2061–U+2064 (FUNCTION APPLICATION, INVISIBLE TIMES, INVISIBLE SEPARATOR, INVISIBLE PLUS) are Unicode Cf format characters that are invisible in all renderers, survive NFKC normalization, and were passing through sanitizeContent() unchanged. A secret fragmented by inserting one of these characters between each pair of adjacent characters is byte-distinct from the plain pattern but visually identical, which defeats static regex detection.
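The bypass can be reproduced in a few lines. This is a minimal sketch, not code from the PR: the marker `SECRET` and the `detector` regex are hypothetical stand-ins for whatever pattern a static scanner would look for.

```javascript
// Hypothetical marker and detector, for illustration only.
const marker = "SECRET";
const detector = /SECRET/;

// Insert U+2061 (FUNCTION APPLICATION) between every pair of characters.
const fragmented = marker.split("").join("\u2061");

// The fragmented string is longer than the marker: 6 letters plus 5 operators.
const fragmentedLength = fragmented.length; // 11

// The plain pattern matches the marker but not the fragmented form.
const matchesPlain = detector.test(marker); // true
const matchesFragmented = detector.test(fragmented); // false

// NFKC leaves the operators in place, so normalization alone does not help.
const survivesNfkc = fragmented.normalize("NFKC") === fragmented; // true
```

Because the operators carry no compatibility decomposition, they have to be stripped explicitly before any pattern matching runs.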

Changes

sanitize_content_core.cjs — Extend the Step 3 regex in hardenUnicodeText() to cover U+2061–U+2064 via range notation:

// Before
result = result.replace(/[\u00AD\u034F\u200B\u200C\u200D\u200E\u200F\u2060\uFEFF]/g, "");
// After
result = result.replace(/[\u00AD\u034F\u200B-\u200F\u2060-\u2064\uFEFF]/g, "");

sanitize_content.test.cjs — Add tests for each of U+2061–U+2064 individually, a fragmentation-bypass scenario (marker split with \u2061 reassembles to plaintext), and multi-character removal.
threat_detection.md (both actions/setup/md/ and pkg/workflow/prompts/) — Add an Invisible Operator Fragmentation bullet under the Secret Leak section so the LLM evaluator is explicitly instructed to recognize this bypass pattern alongside encoded representations and homoglyph substitution.
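The effect of the regex change can be checked in isolation. The sketch below copies the two character classes from the before/after lines of the diff; the `strip` helper and the sample strings are assumptions made for illustration and are not part of hardenUnicodeText().

```javascript
// Character classes copied from the "Before" and "After" lines of the diff.
const BEFORE = /[\u00AD\u034F\u200B\u200C\u200D\u200E\u200F\u2060\uFEFF]/g;
const AFTER = /[\u00AD\u034F\u200B-\u200F\u2060-\u2064\uFEFF]/g;

// Hypothetical helper: apply one stripping pattern to a string.
function strip(text, pattern) {
  return text.replace(pattern, "");
}

// U+2062 (INVISIBLE TIMES) sits between the two words.
const sample = "Hello\u2062World";

const beforeResult = strip(sample, BEFORE); // operator still present
const afterResult = strip(sample, AFTER); // "HelloWorld"

// Characters covered by the old class are still removed by the new one,
// since \u200B-\u200F is the same set written as a range.
const zeroWidthResult = strip("Hello\u200BWorld", AFTER); // "HelloWorld"
```

The range form `\u2060-\u2064` keeps U+2060 (WORD JOINER) from the old class and adds the four operators that follow it.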

…atical operators Extends the Step 3 regex in hardenUnicodeText() to cover U+2061 (FUNCTION APPLICATION), U+2062 (INVISIBLE TIMES), U+2063 (INVISIBLE SEPARATOR), and U+2064 (INVISIBLE PLUS). These Unicode Cf characters are invisible in all renderers, not removed by NFKC normalization, and were previously passing through sanitizeContent() unchanged — allowing secret-like patterns fragmented with invisible operators to bypass static regex detection while remaining visually identical. Also updates both copies of threat_detection.md to instruct the LLM to check for invisible-operator fragmentation alongside existing encoded-representation and homoglyph checks. Agent-Logs-Url: https://github.com/github/gh-aw/sessions/94dfa88e-fdbd-4476-a118-3d070e17dbc0 Co-authored-by: szabta89 <1330202+szabta89@users.noreply.github.com>

Copilot

Pull request overview

Updates Unicode hardening and related threat-detection guidance to prevent secrets/payloads from being visually preserved while byte-wise altered using invisible mathematical operator format characters (U+2061–U+2064).

Changes:

Extend hardenUnicodeText() to strip U+2061–U+2064 (invisible mathematical operators) alongside existing zero-width/format characters.
Add unit tests covering each operator and multi-operator removal, including a fragmentation bypass scenario.
Update threat detection prompt docs to explicitly call out “Invisible Operator Fragmentation” as a secret-leak evasion pattern.

Show a summary per file

File	Description
`actions/setup/js/sanitize_content_core.cjs`	Expands the “strip invisible characters” regex to remove U+2061–U+2064 during Unicode hardening.
`actions/setup/js/sanitize_content.test.cjs`	Adds tests verifying removal of U+2061–U+2064 and a fragmentation-style bypass case.
`pkg/workflow/prompts/threat_detection.md`	Documents invisible-operator fragmentation as a secret leak detection heuristic.
`actions/setup/md/threat_detection.md`	Mirrors the same threat-detection documentation update for the setup action.

Copilot's findings

Files reviewed: 4/4 changed files
Comments generated: 1

Copilot · 2026-04-23T12:14:34Z

+      it("should strip U+2061-U+2064 used to fragment a secret-like marker", () => {
+        // Simulate a secret fragmented with invisible operators to bypass static detection
+        const marker = "SECRET";
+        const fragmented = marker.split("").join("\u2061");
+        const result = sanitizeContent(fragmented);
+        expect(result).toBe(marker);


Test description says it strips U+2061–U+2064 fragmentation, but the test only inserts U+2061. Either adjust the title/assertions to match U+2061 specifically, or parameterize the test to cover U+2062–U+2064 as well so the name reflects what’s actually validated.
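One way to make the test name match what is validated is to run the same fragmentation check once per code point. The sketch below uses a plain loop so it runs without a test framework; `stripInvisibleOperators` is a hypothetical stand-in, since the real sanitizeContent() does more than this one replacement. In vitest the same table could presumably be fed to `it.each`.

```javascript
// Stand-in for the hardening step; the real sanitizeContent() does more than this.
const stripInvisibleOperators = (text) => text.replace(/[\u2061-\u2064]/g, "");

// The four invisible mathematical operators named in the PR.
const OPERATORS = ["\u2061", "\u2062", "\u2063", "\u2064"];
const marker = "SECRET";

// Fragment the marker with each operator in turn and record whether
// stripping restores the original plaintext.
const results = OPERATORS.map((op) => {
  const fragmented = marker.split("").join(op);
  return {
    codePoint: "U+" + op.codePointAt(0).toString(16).toUpperCase(),
    restored: stripInvisibleOperators(fragmented) === marker,
  };
});
```

Each entry in `results` corresponds to one parameterized case, so a failure would point at the specific code point that slipped through.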

github-actions · 2026-04-23T12:16:00Z

🧪 Test Quality Sentinel Report

Test Quality Score: 85/100

✅ Excellent test quality

Metric	Value
New/modified tests analyzed	6
✅ Design tests (behavioral contracts)	6 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	6 (100%)
Duplicate test clusters	1
Test inflation detected	Yes (38 test lines / 5 prod lines ≈ 7.6:1)
🚨 Coding-guideline violations	None

Test Classification Details

Test	File	Classification	Issues Detected
`should remove invisible mathematical operator FUNCTION APPLICATION (U+2061)`	`sanitize_content.test.cjs`	✅ Design	Part of per-character cluster (tests 1–4)
`should remove invisible mathematical operator INVISIBLE TIMES (U+2062)`	`sanitize_content.test.cjs`	✅ Design	Part of per-character cluster (tests 1–4)
`should remove invisible mathematical operator INVISIBLE SEPARATOR (U+2063)`	`sanitize_content.test.cjs`	✅ Design	Part of per-character cluster (tests 1–4)
`should remove invisible mathematical operator INVISIBLE PLUS (U+2064)`	`sanitize_content.test.cjs`	✅ Design	Part of per-character cluster (tests 1–4)
`should strip U+2061-U+2064 used to fragment a secret-like marker`	`sanitize_content.test.cjs`	✅ Design	Security-focused adversarial scenario
`should remove multiple invisible mathematical operators`	`sanitize_content.test.cjs`	✅ Design	Multi-character combination coverage

Flagged Tests — Requires Review

No tests require mandatory review. One advisory note is included below.

ℹ️ Per-character cluster (tests 1–4) — advisory only

Classification: Design tests, minor duplication
Observation: Four tests assert identical behavior (expect(sanitizeContent("Hello" + char + "World")).toBe("HelloWorld")) for U+2061–U+2064 individually. This pattern triggers the duplicate-cluster heuristic (3+ tests, same structure, different constants).
Impact on score: −5 pts from the Low Duplication component.
Optional consolidation: A single table-driven test iterating over the four code points would be equally expressive and more concise. However, explicit per-character tests are a legitimate and readable style for a security-hardening change, so this is purely advisory — no action required.
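The optional consolidation could look roughly like the following. It is a sketch under assumptions: the `CASES` table and the `strip` stand-in are hypothetical, and only the character names and the `"Hello" + char + "World"` shape come from the report above.

```javascript
// Hypothetical table of the four code points and their Unicode names.
const CASES = [
  { char: "\u2061", name: "FUNCTION APPLICATION" },
  { char: "\u2062", name: "INVISIBLE TIMES" },
  { char: "\u2063", name: "INVISIBLE SEPARATOR" },
  { char: "\u2064", name: "INVISIBLE PLUS" },
];

// Stand-in for sanitizeContent(), limited to the behavior under test.
const strip = (text) => text.replace(/[\u2061-\u2064]/g, "");

// One generated title and one outcome per row replaces four hand-written tests.
const titles = CASES.map((c) => {
  const hex = c.char.codePointAt(0).toString(16).toUpperCase();
  return `should remove invisible mathematical operator ${c.name} (U+${hex})`;
});
const outcomes = CASES.map((c) => strip("Hello" + c.char + "World"));
```

The generated titles match the four existing test names, so the consolidated form would not lose any information in test output.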

Language Support

Tests analyzed:

🐹 Go (*_test.go): 0 tests — no Go test files changed
🟨 JavaScript (*.test.cjs): 6 tests (vitest)

Scoring Breakdown

Component	Raw	Weight	Score
Behavioral Coverage	6/6 = 100%	40 pts	40
Error/Edge Case Coverage	6/6 = 100%	30 pts	30
Low Duplication	1 duplicate cluster	20 pts	15
Proportional Growth	38 test lines / 5 prod lines = 7.6:1	10 pts	0
Total			85

Note on inflation: the 7.6:1 ratio reflects the deliberate choice to add one explicit test per new Unicode code point, which is appropriate for a security-hardening change. The penalty is applied mechanically per the rubric but does not reflect a real quality concern here.
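The total and the inflation ratio follow directly from the table. The sketch below only re-adds the reported component scores; how the rubric maps each raw value to a score is not stated in the report and is not modeled here.

```javascript
// Component scores as reported in the Scoring Breakdown table.
const components = [
  { name: "Behavioral Coverage", weight: 40, score: 40 },
  { name: "Error/Edge Case Coverage", weight: 30, score: 30 },
  { name: "Low Duplication", weight: 20, score: 15 },
  { name: "Proportional Growth", weight: 10, score: 0 },
];

// Sum of scores and of maximum weights.
const total = components.reduce((sum, c) => sum + c.score, 0); // 85
const maximum = components.reduce((sum, c) => sum + c.weight, 0); // 100

// Test lines added per production line changed.
const inflationRatio = 38 / 5; // 7.6
```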

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). All 6 tests verify observable, behavioral contracts against adversarial Unicode input.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §24834484682

🧪 Test quality analysis by Test Quality Sentinel · ● 327.8K · ◷

github-actions

✅ Test Quality Sentinel: 85/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 6 new tests verify observable behavioral contracts against adversarial invisible Unicode characters.

pelikhan · 2026-04-23T12:18:43Z

@copilot review all comments

Copilot · 2026-04-23T12:22:55Z

@copilot review all comments

The fragmentation test has been parameterized to run once for each of U+2061–U+2064, so the test name now accurately reflects what's validated. Commit b7b627a.

Initial plan

2b0e3c1

Copilot AI assigned Copilot and szabta89 Apr 23, 2026

Copilot started work on behalf of szabta89 April 23, 2026 11:45

Copilot AI linked an issue Apr 23, 2026 that may be closed by this pull request

sanitize_content_core: extend hardenUnicodeText to strip U+2061–U+2064 Invisible Mathematical Operators #28036

Closed

Copilot AI changed the title ~~[WIP] Update hardenUnicodeText to strip invisible mathematical operators~~ fix: strip U+2061–U+2064 invisible mathematical operators in hardenUnicodeText Apr 23, 2026

Copilot finished work on behalf of szabta89 April 23, 2026 11:53

Copilot AI requested a review from szabta89 April 23, 2026 11:53

pelikhan marked this pull request as ready for review April 23, 2026 12:11

Copilot AI review requested due to automatic review settings April 23, 2026 12:11

Copilot started reviewing on behalf of pelikhan April 23, 2026 12:12

Copilot AI reviewed Apr 23, 2026

github-actions Bot mentioned this pull request Apr 23, 2026

[aw] No-Op Runs #27675

Closed

github-actions Bot approved these changes Apr 23, 2026

Copilot started work on behalf of pelikhan April 23, 2026 12:18

test: parameterize fragmentation test to cover U+2061-U+2064 individu…

b7b627a

…ally Agent-Logs-Url: https://github.com/github/gh-aw/sessions/acd023ee-4af1-44c5-947d-31f18788242f Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan April 23, 2026 12:24

Copilot AI requested a review from pelikhan April 23, 2026 12:24

pelikhan merged commit e0c7b2a into main Apr 23, 2026
20 of 24 checks passed

pelikhan deleted the copilot/extend-harden-unicode-text-stripping branch April 23, 2026 12:31

pelikhan merged 3 commits into main from copilot/extend-harden-unicode-text-stripping
