Skip to content

fix(pager): preserve git log colors#381

Merged
benvinegar merged 2 commits into
mainfrom
fix/pager-preserve-git-colors
May 29, 2026
Merged

fix(pager): preserve git log colors#381
benvinegar merged 2 commits into
mainfrom
fix/pager-preserve-git-colors

Conversation

@benvinegar
Copy link
Copy Markdown
Member

Summary

  • Preserve safe ANSI SGR styling when hunk pager falls back to a terminal text pager for non-diff content
  • Continue stripping unsafe terminal controls such as OSC, DCS, clear-screen, cursor movement, and raw control characters
  • Add regression coverage for Git log colors and sanitizer behavior

Testing

Fixes #379

This PR description was generated by Pi using GPT-5

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR fixes a regression where hunk pager, when falling back to a plain-text pager for non-diff content (e.g. git log), was stripping all ANSI SGR color codes alongside the unsafe terminal controls it was intentionally removing. The fix adds a preserveAnsiStyle option to sanitizeTerminalText and enables it only on the TTY pager path.

  • terminalText.ts gains a preserveAnsiStyle flag that tokenises recognised SGR sequences (matching /^\\x1b\\[[0-9;:]*m$/) before sanitisation and splices them back in afterwards, while continuing to drop OSC, DCS, CSI non-SGR, and raw control characters.
  • pager.ts splits the original single sanitizeTerminalText call into two: the non-TTY path strips everything (unchanged semantics), and the TTY pager path now passes preserveAnsiStyle: true so colours survive.
  • New regression tests cover SGR preservation, non-SGR stripping, and the pager integration path.

Confidence Score: 3/5

Safe to merge for the vast majority of real-world git output, but the token scheme has a gap for crafted input in the security-sensitive sanitiser.

The core logic and pager integration are correct. The one concern is in terminalText.ts: the PUA characters chosen as token delimiters (U+F0000, U+F0001) are not removed before tokenisation, so a git commit message containing those characters could cause the restoration step to inject SGR sequences derived from other parts of the same output. This is a real defect in code that is explicitly designed to handle untrusted terminal content — the function's own docstring says 'Normalize untrusted terminal-bound text' — even though the practical impact is limited to cosmetic colour injection. A one-line pre-filter closes the gap completely.

src/lib/terminalText.ts — specifically the token placeholder scheme and restoration loop in sanitizeTerminalText.

Important Files Changed

Filename Overview
src/lib/terminalText.ts Adds preserveAnsiStyle option using a PUA-character token scheme; the token delimiters (U+F0000, U+F0001) are not stripped before tokenisation, so crafted input containing those characters can survive sanitisation and be replaced with ANSI escape sequences.
src/core/pager.ts Correctly splits sanitisation into a strip-all non-TTY path and a color-preserving TTY pager path; logic is straightforward and correct.
src/lib/terminalText.test.ts New tests verify SGR preservation, non-SGR stripping, and combined behaviour; coverage is solid for the happy path.
src/core/pager.test.ts Adds a well-structured integration test that verifies SGR colours pass through the pager and CSI_CLEAR_SCREEN is still stripped.
CHANGELOG.md Entry accurately describes the fix; no issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[pagePlainText input] --> B{stdout.isTTY?}
    B -- No --> C[sanitizeTerminalText\nstrip ALL escapes]
    C --> D[write to stdout]
    B -- Yes --> E[sanitizeTerminalText\npreserveAnsiStyle=true]
    E --> F{contains control codes?}
    F -- No --> G[return unchanged]
    F -- Yes --> H[sevenBitControlStrings replace]
    H --> I{SGR sequence?}
    I -- Yes --> J[store + emit token]
    I -- No --> K[emit empty string]
    J --> L[strip c1 controls\nand control chars]
    K --> L
    L --> M[restore tokens to\noriginal SGR sequences]
    M --> N[safeText with SGR colors]
    N --> O[spawn pager]
    O --> P[write safeText to pager stdin]
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
src/lib/terminalText.ts:38-50
The token delimiters `\u{f0000}` and `\u{f0001}` are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (`\x00-\x1f`, `\x7f-\x9f`). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.

```suggestion
  const preservedStyles: string[] = [];
  const preserveStyle = (sequence: string) => {
    if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
      return "";
    }

    const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
    preservedStyles.push(sequence);
    return token;
  };

  // Strip the token-delimiter characters before tokenising so that crafted input
  // cannot produce token strings that the restoration loop would later replace
  // with ANSI escape sequences from elsewhere in the same string.
  const tokenSafeText = preserveAnsiStyle
    ? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
    : text;
  let sanitized = tokenSafeText
    .replace(sevenBitControlStrings, preserveStyle)
```

Reviews (1): Last reviewed commit: "fix(pager): preserve git log colors" | Re-trigger Greptile

Comment thread src/lib/terminalText.ts
Comment on lines +38 to +50
const preservedStyles: string[] = [];
const preserveStyle = (sequence: string) => {
if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
return "";
}

const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
preservedStyles.push(sequence);
return token;
};

return text
.replace(sevenBitControlStrings, "")
let sanitized = text
.replace(sevenBitControlStrings, preserveStyle)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The token delimiters \u{f0000} and \u{f0001} are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (\x00-\x1f, \x7f-\x9f). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.

Suggested change
const preservedStyles: string[] = [];
const preserveStyle = (sequence: string) => {
if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
return "";
}
const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
preservedStyles.push(sequence);
return token;
};
return text
.replace(sevenBitControlStrings, "")
let sanitized = text
.replace(sevenBitControlStrings, preserveStyle)
const preservedStyles: string[] = [];
const preserveStyle = (sequence: string) => {
if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
return "";
}
const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
preservedStyles.push(sequence);
return token;
};
// Strip the token-delimiter characters before tokenising so that crafted input
// cannot produce token strings that the restoration loop would later replace
// with ANSI escape sequences from elsewhere in the same string.
const tokenSafeText = preserveAnsiStyle
? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
: text;
let sanitized = tokenSafeText
.replace(sevenBitControlStrings, preserveStyle)
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/terminalText.ts
Line: 38-50

Comment:
The token delimiters `\u{f0000}` and `\u{f0001}` are Unicode Supplementary Private Use Area characters and are not covered by any of the sanitiser's control-character regexes (`\x00-\x1f`, `\x7f-\x9f`). If the untrusted input already contains these characters — for example, in a git commit message whose author embedded U+F0000 intentionally — the restoration loop will replace them with the ANSI SGR sequences collected from legitimate colour codes elsewhere in the same string, effectively injecting escape sequences that were not at that position in the original input. Since the function is explicitly documented as handling "untrusted" text, stripping the token-delimiter characters before tokenisation closes this gap without any visible difference for real-world input.

```suggestion
  const preservedStyles: string[] = [];
  const preserveStyle = (sequence: string) => {
    if (!preserveAnsiStyle || !/^\x1b\[[0-9;:]*m$/.test(sequence)) {
      return "";
    }

    const token = `\u{f0000}${preservedStyles.length}\u{f0001}`;
    preservedStyles.push(sequence);
    return token;
  };

  // Strip the token-delimiter characters before tokenising so that crafted input
  // cannot produce token strings that the restoration loop would later replace
  // with ANSI escape sequences from elsewhere in the same string.
  const tokenSafeText = preserveAnsiStyle
    ? text.replace(/[\u{f0000}\u{f0001}]/gu, "")
    : text;
  let sanitized = tokenSafeText
    .replace(sevenBitControlStrings, preserveStyle)
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed by stripping the internal placeholder delimiters from untrusted input before style-token restoration, and added a regression test covering the crafted-token case.

Also updated the Nix workflow/flake cache config so CI actually uses the nix-community Cachix substituter instead of trying to build bun2nix from crates.io.

This comment was generated by Pi using GPT-5

@benvinegar benvinegar merged commit e453a75 into main May 29, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Configuring hunk pager as git core.pager strips colors from other git commands

1 participant