CLI feature request: experimental hash-anchored edit mode to reduce edit failures and retries #12987

### What variant of Codex are you using?

CLI

### What feature would you like to see?

I'd like Codex CLI to support an **experimental hash-anchored edit mode** (alongside `apply_patch`) to improve edit reliability, especially when models struggle with strict patch formatting.

The core idea is to let read/search tools return stable per-line anchors (short content-hash IDs), and let an edit tool target those anchors directly instead of reproducing exact old text.

Motivation, from the benchmark in "The Harness Problem" write-up:

- Benchmark setup: 16 models, 3 edit formats, 180 tasks per run, 3 runs.
- Reported patch-format failure rates were high for several non-Codex models (examples in the post: Grok 4 at 50.7%, GLM-4.7 at 46.2%).
- The write-up reports that hashline-style anchored edits matched or beat replace-style edits for most tested models.
- Reported impact examples: Grok Code Fast 1 from 6.7% to 68.3%, and Grok 4 Fast output tokens down 61% due to fewer edit retry loops.

Proposed CLI PoC scope:

1. Add an experimental edit tool (or mode), e.g. `apply_anchored_edit`, behind a feature flag.
2. Return line anchors from read/search output (example format: `lineNumber:hash|content`).
3. Support operations like `replace_range`, `insert_after`, and `delete_range` using anchor IDs.
4. Enforce optimistic concurrency: if anchors no longer match current file content, reject safely with a clear conflict message.
5. Keep full backward compatibility by preserving `apply_patch` and using anchored mode as opt-in (or adaptive fallback).
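To make items 2 to 4 concrete, here is a minimal Rust sketch of how anchors could be rendered, resolved, and checked. It is illustrative only: `line_hash`, `render_anchored`, `Anchor`, `Op`, and `apply_edit` are hypothetical names, not existing Codex APIs, and the choice of a 32-bit FNV-1a hash truncated to 4 hex characters is an assumption rather than anything the write-up prescribes. An anchor is `lineNumber:hash`; an edit applies only if every anchor it names still points at a line whose current content has that hash, otherwise the whole edit is rejected with a conflict message.

```rust
// Hypothetical sketch of hash-anchored edits; not an existing Codex API.

/// 32-bit FNV-1a over the line's bytes, truncated to 4 hex chars (assumption).
/// Short hashes can collide; pairing them with the line number narrows the match.
fn line_hash(line: &str) -> String {
    let mut h: u32 = 0x811c_9dc5;
    for b in line.bytes() {
        h ^= u32::from(b);
        h = h.wrapping_mul(0x0100_0193);
    }
    format!("{:04x}", h & 0xffff)
}

/// Render file content in the `lineNumber:hash|content` format.
fn render_anchored(text: &str) -> String {
    text.lines()
        .enumerate()
        .map(|(i, l)| format!("{}:{}|{}", i + 1, line_hash(l), l))
        .collect::<Vec<_>>()
        .join("\n")
}

#[derive(Debug, Clone, PartialEq)]
struct Anchor {
    line: usize, // 1-based line number
    hash: String,
}

/// Parse an anchor ID such as "2:ab12".
fn parse_anchor(id: &str) -> Option<Anchor> {
    let (line, hash) = id.split_once(':')?;
    Some(Anchor { line: line.parse().ok()?, hash: hash.to_string() })
}

#[derive(Debug, Clone)]
enum Op {
    ReplaceRange { start: Anchor, end: Anchor, new_lines: Vec<String> },
    InsertAfter { anchor: Anchor, new_lines: Vec<String> },
    DeleteRange { start: Anchor, end: Anchor },
}

/// Optimistic concurrency check: the anchor resolves only if the line still
/// exists and its current content hashes to the value the model saw.
fn resolve(lines: &[String], a: &Anchor) -> Result<usize, String> {
    if a.line == 0 {
        return Err("conflict: line numbers are 1-based".to_string());
    }
    match lines.get(a.line - 1) {
        Some(cur) if line_hash(cur) == a.hash => Ok(a.line - 1),
        Some(_) => Err(format!("conflict: line {} changed since it was read; re-read the file", a.line)),
        None => Err(format!("conflict: line {} no longer exists; re-read the file", a.line)),
    }
}

fn resolve_range(lines: &[String], start: &Anchor, end: &Anchor) -> Result<(usize, usize), String> {
    let (s, e) = (resolve(lines, start)?, resolve(lines, end)?);
    if s > e {
        return Err("conflict: start anchor comes after end anchor".to_string());
    }
    Ok((s, e))
}

/// Apply one operation, or reject it without modifying anything.
fn apply_edit(text: &str, op: &Op) -> Result<String, String> {
    let mut lines: Vec<String> = text.lines().map(String::from).collect();
    match op {
        Op::ReplaceRange { start, end, new_lines } => {
            let (s, e) = resolve_range(&lines, start, end)?;
            lines.splice(s..=e, new_lines.iter().cloned());
        }
        Op::InsertAfter { anchor, new_lines } => {
            let i = resolve(&lines, anchor)?;
            lines.splice(i + 1..i + 1, new_lines.iter().cloned());
        }
        Op::DeleteRange { start, end } => {
            let (s, e) = resolve_range(&lines, start, end)?;
            lines.drain(s..=e);
        }
    }
    let mut out = lines.join("\n");
    if text.ends_with('\n') {
        out.push('\n'); // keep the original trailing newline
    }
    Ok(out)
}

fn main() {
    let src = "fn main() {\n    println!(\"hi\");\n}\n";
    println!("{}", render_anchored(src));
}
```

Because each anchor pairs a line number with a content hash, repeated lines such as `}` stay distinguishable, and an edit that lands above a target line shifts its numbering so the stale anchor is rejected rather than silently applied to the wrong place. A real implementation would probably want a longer or stable-by-spec hash and a conflict response that includes fresh anchors so the model can retry in one step.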

Why this matters:

- Edit failures and retry loops are still a practical reliability issue in real workflows.
- Existing Codex issues show recurring pain around patch/edit reliability and fallback behavior (for example: #9661, #10330).
- A prior request for this concept (#11601) was closed as not planned; this request adds concrete rollout guardrails and measurement criteria.

Suggested acceptance criteria:

1. Add a benchmark mode for edit-tool reliability (failure rate + retries + token usage).
2. Demonstrate reduced failed edit attempts vs. patch-only baseline on the same task set.
3. Show no correctness regressions on existing Codex CLI edit workflows.
4. Keep the feature gated/experimental until reliability data is strong.
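For criterion 1, one possible way a benchmark mode could aggregate the three metrics per run is sketched below, so that a patch-only baseline and an anchored run over the same task set can be compared directly. The `Attempt` record and the metric definitions are assumptions for illustration: failure rate is failed edit calls over all edit calls, and a retry is any edit call that immediately follows a failed call on the same task (attempts are assumed to be grouped by task in chronological order).

```rust
use std::collections::HashSet;

/// One edit-tool call recorded by a hypothetical benchmark harness.
struct Attempt {
    task_id: u32,
    succeeded: bool,    // did the edit tool accept and apply the edit?
    output_tokens: u64, // model output tokens spent on this call
}

#[derive(Debug)]
struct Summary {
    failure_rate: f64,     // failed edit calls / all edit calls
    retries_per_task: f64, // calls that follow a failure, averaged over tasks
    tokens_per_task: f64,  // output tokens averaged over tasks
}

/// Summarize one run; attempts must be grouped by task in chronological order.
fn summarize(attempts: &[Attempt]) -> Option<Summary> {
    if attempts.is_empty() {
        return None;
    }
    let tasks = attempts.iter().map(|a| a.task_id).collect::<HashSet<_>>().len() as f64;
    let total = attempts.len() as f64;
    let failed = attempts.iter().filter(|a| !a.succeeded).count() as f64;
    // A retry is a call that directly follows a failed call on the same task.
    let retries = attempts
        .windows(2)
        .filter(|w| w[0].task_id == w[1].task_id && !w[0].succeeded)
        .count() as f64;
    let tokens: u64 = attempts.iter().map(|a| a.output_tokens).sum();
    Some(Summary {
        failure_rate: failed / total,
        retries_per_task: retries / tasks,
        tokens_per_task: tokens as f64 / tasks,
    })
}

fn main() {
    // Toy run: task 1 fails once then succeeds; task 2 succeeds first try.
    let run = vec![
        Attempt { task_id: 1, succeeded: false, output_tokens: 100 },
        Attempt { task_id: 1, succeeded: true, output_tokens: 50 },
        Attempt { task_id: 2, succeeded: true, output_tokens: 30 },
    ];
    println!("{:?}", summarize(&run));
}
```

Running the same summary over both a patch-only run and an anchored run gives the side-by-side comparison criterion 2 asks for; criterion 3 (correctness) would still need task-level checks, which this sketch does not cover.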

### Additional information

Primary reference: https://blog.can.ac/2026/02/12/the-harness-problem/

Related prior issue: https://github.com/openai/codex/issues/11601

If maintainers prefer a minimal first step, a benchmark-only branch (no default behavior change) would still be very useful to validate feasibility in Codex CLI.
