fix(pager): preserve git log colors by benvinegar · Pull Request #381 · modem-dev/hunk

benvinegar · 2026-05-28T20:15:50Z

Summary

Preserve safe ANSI SGR styling when hunk pager falls back to a terminal text pager for non-diff content
Continue stripping unsafe terminal controls such as OSC, DCS, clear-screen, cursor movement, and raw control characters
Add regression coverage for Git log colors and sanitizer behavior

Testing

bun test
bun run typecheck
Recreated issue Configuring hunk pager as git core.pager strips colors from other git commands #379 in tmux and confirmed git -c core.pager='bun run /home/bentlegen/Projects/hunk-a/src/main.tsx -- pager' log ... now preserves colors

Fixes #379

This PR description was generated by Pi using GPT-5

greptile-apps · 2026-05-28T20:19:53Z

Greptile Summary

This PR fixes a regression where hunk pager, when falling back to a plain-text pager for non-diff content (e.g. git log), was stripping all ANSI SGR color codes alongside the unsafe terminal controls it was intentionally removing. The fix adds a preserveAnsiStyle option to sanitizeTerminalText and enables it only on the TTY pager path.

terminalText.ts gains a preserveAnsiStyle flag that tokenises recognised SGR sequences (matching /^\\x1b\\[[0-9;:]*m$/) before sanitisation and splices them back in afterwards, while continuing to drop OSC, DCS, CSI non-SGR, and raw control characters.
pager.ts splits the original single sanitizeTerminalText call into two: the non-TTY path strips everything (unchanged semantics), and the TTY pager path now passes preserveAnsiStyle: true so colours survive.
New regression tests cover SGR preservation, non-SGR stripping, and the pager integration path.

Confidence Score: 3/5

Safe to merge for the vast majority of real-world git output, but the token scheme has a gap for crafted input in the security-sensitive sanitiser.

The core logic and pager integration are correct. The one concern is in terminalText.ts: the PUA characters chosen as token delimiters (U+F0000, U+F0001) are not removed before tokenisation, so a git commit message containing those characters could cause the restoration step to inject SGR sequences derived from other parts of the same output. This is a real defect in code that is explicitly designed to handle untrusted terminal content — the function's own docstring says 'Normalize untrusted terminal-bound text' — even though the practical impact is limited to cosmetic colour injection. A one-line pre-filter closes the gap completely.

src/lib/terminalText.ts — specifically the token placeholder scheme and restoration loop in sanitizeTerminalText.

Important Files Changed

Filename	Overview
src/lib/terminalText.ts	Adds preserveAnsiStyle option using a PUA-character token scheme; the token delimiters (U+F0000, U+F0001) are not stripped before tokenisation, so crafted input containing those characters can survive sanitisation and be replaced with ANSI escape sequences.
src/core/pager.ts	Correctly splits sanitisation into a strip-all non-TTY path and a color-preserving TTY pager path; logic is straightforward and correct.
src/lib/terminalText.test.ts	New tests verify SGR preservation, non-SGR stripping, and combined behaviour; coverage is solid for the happy path.
src/core/pager.test.ts	Adds a well-structured integration test that verifies SGR colours pass through the pager and CSI_CLEAR_SCREEN is still stripped.
CHANGELOG.md	Entry accurately describes the fix; no issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[pagePlainText input] --> B{stdout.isTTY?}
    B -- No --> C[sanitizeTerminalText\nstrip ALL escapes]
    C --> D[write to stdout]
    B -- Yes --> E[sanitizeTerminalText\npreserveAnsiStyle=true]
    E --> F{contains control codes?}
    F -- No --> G[return unchanged]
    F -- Yes --> H[sevenBitControlStrings replace]
    H --> I{SGR sequence?}
    I -- Yes --> J[store + emit token]
    I -- No --> K[emit empty string]
    J --> L[strip c1 controls\nand control chars]
    K --> L
    L --> M[restore tokens to\noriginal SGR sequences]
    M --> N[safeText with SGR colors]
    N --> O[spawn pager]
    O --> P[write safeText to pager stdin]

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/lib/terminalText.ts:38-50
The token delimiters `\u{f0000}` and `\u{f0001}` are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (`\x00-\x1f`, `\x7f-\x9f`). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.

```suggestion
  const preservedStyles: string[] = [];
  const preserveStyle = (sequence: string) => {
    if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
      return "";
    }

    const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
    preservedStyles.push(sequence);
    return token;
  };

  // Strip the token-delimiter characters before tokenising so that crafted input
  // cannot produce token strings that the restoration loop would later replace
  // with ANSI escape sequences from elsewhere in the same string.
  const tokenSafeText = preserveAnsiStyle
    ? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
    : text;
  let sanitized = tokenSafeText
    .replace(sevenBitControlStrings, preserveStyle)
```

_{Reviews (1): Last reviewed commit: "fix(pager): preserve git log colors" | Re-trigger Greptile}

greptile-apps · 2026-05-28T20:19:57Z

+  const preservedStyles: string[] = [];
+  const preserveStyle = (sequence: string) => {
+    if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
+      return "";
+    }
+
+    const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
+    preservedStyles.push(sequence);
+    return token;
+  };

-  return text
-    .replace(sevenBitControlStrings, "")
+  let sanitized = text
+    .replace(sevenBitControlStrings, preserveStyle)


The token delimiters \u{f0000} and \u{f0001} are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (\x00-\x1f, \x7f-\x9f). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.

Suggested change

const preservedStyles: string[] = [];

const preserveStyle = (sequence: string) => {

if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {

return "";

}

const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;

preservedStyles.push(sequence);

return token;

};

return text

.replace(sevenBitControlStrings, "")

let sanitized = text

.replace(sevenBitControlStrings, preserveStyle)

const preservedStyles: string[] = [];

const preserveStyle = (sequence: string) => {

if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {

return "";

}

const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;

preservedStyles.push(sequence);

return token;

};

// Strip the token-delimiter characters before tokenising so that crafted input

// cannot produce token strings that the restoration loop would later replace

// with ANSI escape sequences from elsewhere in the same string.

const tokenSafeText = preserveAnsiStyle

? text.replace(/[\u{f0000}\u{f0001}]/gu, "")

: text;

let sanitized = tokenSafeText

.replace(sevenBitControlStrings, preserveStyle)

Prompt To Fix With AI

This is a comment left during a code review. Path: src/lib/terminalText.ts Line: 38-50 Comment: The token delimiters `\u{f0000}` and `\u{f0001}` are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (`\x00-\x1f`, `\x7f-\x9f`). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input. ```suggestion const preservedStyles: string[] = []; const preserveStyle = (sequence: string) => { if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) { return ""; } const token = `\u{f0000}${preservedStyles.length}\u{f0001}`; preservedStyles.push(sequence); return token; }; // Strip the token-delimiter characters before tokenising so that crafted input // cannot produce token strings that the restoration loop would later replace // with ANSI escape sequences from elsewhere in the same string. const tokenSafeText = preserveAnsiStyle ? text.replace(/[\u{f0000}\u{f0001}]/gu, "") : text; let sanitized = tokenSafeText .replace(sevenBitControlStrings, preserveStyle) ``` How can I resolve this? If you propose a fix, please make it concise.

Good catch — fixed by stripping the internal placeholder delimiters from untrusted input before style-token restoration, and added a regression test covering the crafted-token case.

Also updated the Nix workflow/flake cache config so CI actually uses the nix-community Cachix substituter instead of trying to build bun2nix from crates.io.

This comment was generated by Pi using GPT-5

fix(pager): preserve git log colors

2d8a420

greptile-apps Bot reviewed May 28, 2026

View reviewed changes

fix(pager): harden ansi style restoration

da3181e

benvinegar merged commit e453a75 into main May 29, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(pager): preserve git log colors#381

fix(pager): preserve git log colors#381
benvinegar merged 2 commits into
mainfrom
fix/pager-preserve-git-colors

benvinegar commented May 28, 2026

Uh oh!

greptile-apps Bot commented May 28, 2026

Uh oh!

greptile-apps Bot May 28, 2026

Uh oh!

benvinegar May 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

benvinegar commented May 28, 2026

Summary

Testing

Uh oh!

greptile-apps Bot commented May 28, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

benvinegar May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant