CLI feature request: experimental hash-anchored edit mode to reduce edit failures and retries #12987

@ndycode

What variant of Codex are you using?

CLI

What feature would you like to see?

I'd like Codex CLI to support an experimental hash-anchored edit mode (alongside apply_patch) to improve edit reliability, especially when models struggle with strict patch formatting.

The core idea is to let read/search tools return stable per-line anchors (short content-hash IDs), and let an edit tool target those anchors directly instead of reproducing exact old text.
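To make the anchor idea concrete, here is a minimal sketch of how read output could be annotated, following the `lineNumber:hash|content` example format from the proposed scope. The function names, the choice of SHA-256, and the 6-character digest length are all assumptions for illustration; none of this is an existing Codex CLI API.

```python
import hashlib


def line_anchor(line_number: int, content: str, digest_chars: int = 6) -> str:
    """Build a hypothetical anchor like '12:3fa9c1' from a line's number and content."""
    # Assumption: a truncated SHA-256 hex digest is "short enough" for a model to copy.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:digest_chars]
    return f"{line_number}:{digest}"


def render_with_anchors(text: str) -> list[str]:
    """Prefix every line with its anchor, in 'lineNumber:hash|content' form."""
    return [
        f"{line_anchor(number, line)}|{line}"
        for number, line in enumerate(text.splitlines(), start=1)
    ]


if __name__ == "__main__":
    for row in render_with_anchors("def add(a, b):\n    return a + b"):
        print(row)
```

Because the hash depends only on the line's content, the anchor stays the same until that line changes, and the model never has to reproduce the old text verbatim to target it.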

Research-driven motivation (from the Harness Problem write-up):

  • Benchmark shape: 16 models, 3 edit formats, 180 tasks per run, 3 runs.
  • Reported patch-format failure rates were high for several non-Codex models (examples in the post: Grok 4 at 50.7%, GLM-4.7 at 46.2%).
  • The write-up reports that hashline-style anchored edits matched or beat replace-style edits for most tested models.
  • Reported impact examples: Grok Code Fast 1 from 6.7% to 68.3%, and Grok 4 Fast output tokens down 61% due to fewer edit retry loops.

Proposed CLI PoC scope:

  1. Add an experimental edit tool (or mode), e.g. apply_anchored_edit, behind a feature flag.
  2. Return line anchors from read/search output (example format: lineNumber:hash|content).
  3. Support operations like replace_range, insert_after, and delete_range using anchor IDs.
  4. Enforce optimistic concurrency: if anchors no longer match current file content, reject safely with a clear conflict message.
  5. Keep full backward compatibility by preserving apply_patch and making anchored mode opt-in (or an adaptive fallback).

Why this matters:

Suggested acceptance criteria:

  1. Add a benchmark mode for edit-tool reliability (failure rate + retries + token usage).
  2. Demonstrate reduced failed edit attempts vs. patch-only baseline on the same task set.
  3. Show no correctness regressions on existing Codex CLI edit workflows.
  4. Keep the feature gated/experimental until reliability data is strong.
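As a rough illustration of criterion 1, the three metrics could be aggregated per edit format as in the sketch below. The `EditAttempt` record and its fields are hypothetical; an actual benchmark mode would define its own schema.

```python
from dataclasses import dataclass


@dataclass
class EditAttempt:
    """Hypothetical per-task record emitted by a benchmark run."""
    task_id: str
    succeeded: bool
    retries: int
    output_tokens: int


def summarize(attempts: list[EditAttempt]) -> dict[str, float]:
    """Aggregate failure rate, mean retries, and total output tokens for one run."""
    if not attempts:
        raise ValueError("no attempts to summarize")
    total = len(attempts)
    failures = sum(1 for attempt in attempts if not attempt.succeeded)
    return {
        "failure_rate": failures / total,
        "mean_retries": sum(attempt.retries for attempt in attempts) / total,
        "total_output_tokens": sum(attempt.output_tokens for attempt in attempts),
    }
```

Running `summarize` once on the patch-only baseline and once on the anchored mode, over the same task set, would give the side-by-side comparison that criterion 2 asks for.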

Additional information

Primary reference: https://blog.can.ac/2026/02/12/the-harness-problem/

Related prior issue: #11601

If maintainers prefer a minimal first step, a benchmark-only branch (no default behavior change) would still be very useful to validate feasibility in Codex CLI.
