Feature request: Model-callable reasoning effort control ("Reasoning Gearbox") for auto mode

### What variant of Codex are you using?

CLI / TUI (and Codex app)

### What feature would you like to see?

## Summary

Add a model-callable action that lets the model itself set the reasoning effort
used for subsequent turns, in either direction, with a stated reason and an
optional scope. This is intended as a primitive that complements the
harness-side "Auto" tiering proposed in #8649: heuristics outside the model
choose a starting tier, and the model can re-tune it as evidence about the
current task accumulates.

The intuition is "match reasoning effort to local uncertainty, risk, and
reversibility" rather than "escalate when stuck." A long agentic task is not
uniform: it may contain mechanical edits, medium-difficulty debugging, and
high-consequence architectural choices. Reasoning effort that is right for one
phase is wrong for another.

## Problem / Motivation

Today, reasoning effort is set before a task and remains effectively fixed for
its duration, with the user as the only party who can change it mid-session
(via Alt+, / Alt+. in the TUI, /model, or the equivalent app control). This
has two failure modes:

1. Under-allocation. A task that started simple uncovers ambiguity, conflicting
   evidence, or hidden coupling. The model continues at the original tier and
   produces shallow patches, repeated failed attempts, or wrong architectural
   decisions.

2. Over-allocation. A task pinned to a high tier for safety burns tokens and
   latency on long mechanical stretches where the next action is "run the
   test" or "rename three call sites." High effort can also cause its own
   failures: over-planning, second-guessing clear errors, wandering into
   architecture when a typo fix is needed, larger patches than necessary.

Issue #8649 proposes harness-driven Auto: the harness inspects external signals
(diff size, failure loops, intent keywords) and picks a tier per turn. That is
useful, but the signals available outside the model are coarse. The model
itself often has the earliest and clearest signal that the local task shape
has changed. This proposal adds the action that lets the model express that
signal directly.

The two proposals compose: harness-side Auto can choose the starting tier and
provide outside-view safeguards; the model-callable control handles
inside-view re-tuning between turns. Either can ship without the other.

## Proposed solution

Add a new model-callable action (working name codex.set_reasoning_effort)
that declares the reasoning effort the model wants applied to its upcoming
turns.

Conceptual shape (not prescriptive):

- An effort value drawn from the tiers the active model supports.
- A scope expressing how long the change should persist: a single upcoming
  turn, until a stated condition, or for the rest of the session.
- A free-form reason explaining why the change is being made.
- Optionally, an exit condition describing what would prompt the model to
  re-tune again.
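
To make the conceptual shape concrete, here is a minimal Python sketch of the action's arguments. Everything named here is hypothetical and for illustration only: the tier names, the `Scope` values, the `scope_condition` field, and the validation rules are assumptions, not part of any existing Codex API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

# Hypothetical tier names, ordered lowest to highest.
# The real set would come from the active model.
SUPPORTED_TIERS = ("low", "medium", "high")


class Scope(Enum):
    NEXT_TURN = "next_turn"              # a single upcoming turn
    UNTIL_CONDITION = "until_condition"  # until a stated condition holds
    SESSION = "session"                  # the rest of the session


@dataclass(frozen=True)
class EffortRequest:
    effort: str                            # requested tier
    scope: Scope                           # how long the change should persist
    reason: str                            # free-form explanation
    scope_condition: Optional[str] = None  # only used with UNTIL_CONDITION
    exit_condition: Optional[str] = None   # what would prompt another re-tune

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means well-formed."""
        problems = []
        if self.effort not in SUPPORTED_TIERS:
            problems.append(f"unsupported effort: {self.effort!r}")
        if not self.reason.strip():
            problems.append("reason must not be empty")
        if self.scope is Scope.UNTIL_CONDITION and not self.scope_condition:
            problems.append("until_condition scope needs a scope_condition")
        return problems
```

A request such as `EffortRequest("high", Scope.NEXT_TURN, "conflicting evidence in test output")` validates cleanly, while an unknown tier or an empty reason is reported rather than raised, which leaves the harness free to decide how to respond.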

Activation is gated:

- Off by default; opt-in via configuration.
- User-set caps bound the range the model is allowed to choose
  (analogous to auto_min / auto_max in #8649).
- Manual user controls (Alt+, / Alt+. and equivalents) remain authoritative
  and can override or pin at any time.

The action is information for the harness about the model's intent. The
harness retains full discretion over what to do with it: clamp, accept,
reject, log.
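
One possible reading of the gating rules, sketched in Python. The `Policy` fields, the tier names, and the precedence order (disabled, then user pin, then caps) are assumptions made for illustration; the proposal leaves the exact policy to the harness.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier names, ordered lowest to highest.
TIERS = ("low", "medium", "high")


@dataclass(frozen=True)
class Policy:
    enabled: bool = False           # off by default; opt-in via configuration
    min_tier: str = "low"           # user-set lower cap
    max_tier: str = "high"          # user-set upper cap
    user_pin: Optional[str] = None  # tier pinned manually by the user, if any


@dataclass(frozen=True)
class Decision:
    outcome: str            # "accepted", "clamped", or "rejected"
    applied: Optional[str]  # tier for the next request; None if rejected
    note: str               # human-readable explanation, suitable for logging


def resolve(requested: str, policy: Policy) -> Decision:
    """Decide what the harness does with a model-requested tier."""
    if not policy.enabled:
        return Decision("rejected", None, "model-set effort is disabled")
    if policy.user_pin is not None:
        # Manual user controls stay authoritative.
        return Decision("rejected", None, f"user pinned effort to {policy.user_pin}")
    if requested not in TIERS:
        return Decision("rejected", None, f"unknown tier {requested!r}")
    lo = TIERS.index(policy.min_tier)
    hi = TIERS.index(policy.max_tier)
    applied = TIERS[min(max(TIERS.index(requested), lo), hi)]
    if applied != requested:
        bounds = f"[{policy.min_tier}, {policy.max_tier}]"
        return Decision("clamped", applied, f"{requested} is outside {bounds}")
    return Decision("accepted", applied, "within caps")
```

With `Policy(enabled=True, max_tier="medium")`, a request for `"high"` comes back clamped to `"medium"`; with the default policy every request is rejected, matching the off-by-default requirement.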

## Behavior characteristics worth specifying

These are design questions where the right answer matters more than the
implementation:

- Bidirectionality. The action explicitly supports downshifts as well as
  upshifts.

- Effect timing. A change applied via this action affects the next request
  built by the harness, not the current generation. Any user-facing
  description should state this explicitly.

- Visibility. Each accepted change should be visible to the user at the moment
  it happens (one-line note in the TUI / inline event in the app) and the
  current effective tier should be visible at all times, with a marker for
  whether it was set by the user or by the model.

- Interaction with plan mode. Codex already has plan_mode_reasoning_effort.
  The proposal needs an explicit answer to whether the action is available
  during planning, during execution, both, or neither - and what happens to a
  model-set tier across the plan / execution boundary. Three coherent options
  exist (execution-only, independent gearboxes per phase, single gearbox
  across phases); the right default is a design call.

- Interaction with subagents. Subagents already accept per-agent
  model_reasoning_effort. Whether a subagent can re-tune itself, and whether
  changes propagate to or from the parent, is also a design call.

- Interaction with the harness-driven Auto policy in #8649. If both are
  enabled, the harness's choice should serve as the starting tier and the
  model's choice should be the in-task adjustment, with caps applied to both.

- Oscillation. With bidirectional control and a model that may misjudge, rapid
  flipping is possible. Two complementary safeguards seem reasonable: a cap on
  shifts per task, set high enough that only pathological runs reach it, as a
  circuit breaker; and a soft detection signal that surfaces oscillation
  patterns to the user without blocking further changes. Hard cooldowns are
  probably the wrong shape: they suppress signal that future improvements
  (including any RL on this control) would want to learn from.
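
The two oscillation safeguards could be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: the tier names, the default cap of 50, the window of 4, and the definition of oscillation (every recent move reverses the previous one) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical tier names, ordered lowest to highest.
TIERS = ("low", "medium", "high")


@dataclass
class ShiftMonitor:
    max_shifts_per_task: int = 50  # assumed value; deliberately high
    reversal_window: int = 4       # assumed: number of recent shifts inspected
    history: List[str] = field(default_factory=list)  # applied tiers, in order

    def record(self, tier: str) -> Dict[str, bool]:
        """Record an applied shift and report both safeguard signals."""
        self.history.append(tier)
        # Circuit breaker: once the cap is reached, stop accepting shifts.
        tripped = len(self.history) >= self.max_shifts_per_task
        # Soft signal: informs the user, never blocks on its own.
        return {"allow_more": not tripped, "oscillating": self._oscillating()}

    def _oscillating(self) -> bool:
        recent = [TIERS.index(t) for t in self.history[-self.reversal_window:]]
        if len(recent) < self.reversal_window:
            return False
        deltas = [b - a for a, b in zip(recent, recent[1:])]
        if len(deltas) < 2:
            return False
        # Oscillation: each move goes in the opposite direction of the last.
        return all(d1 * d2 < 0 for d1, d2 in zip(deltas, deltas[1:]))
```

Keeping the two signals separate reflects the point above: the soft signal can be surfaced and logged without suppressing the shifts themselves, while the cap exists only to stop runaway behavior.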

## Why this is worth shipping even before any model is trained on it

Without training, behavior on this action will likely be uneven: some models
will use it well, others poorly, others not at all. Having the interface in
the open-source harness is valuable regardless:

- It defines a stable contract that future training can target.
- It produces structured telemetry (when, why, and to what effect models
  re-tune themselves) that is otherwise unobtainable.
- It establishes a convergence point that other harnesses can adopt.
- It is opt-in, so it does nothing for users who do not enable it.

Framing the PR as "interface plus prompted fallback" (rather than "calibrated
behavior on day one") avoids the failure mode where reviewers expect the
control to already be reliable.

## Acceptance criteria

- The action exists, is opt-in, and is bounded by user-configurable caps.
- Changes initiated by the action take effect on the next request and are
  visible in the UI of every Codex surface that exposes reasoning effort.
- Each change emits a structured event including the requested tier, the
  applied tier (after clamping), the scope, the model's stated reason, and
  surrounding turn context, so the data is useful for evaluation and for any
  future training run.
- Behavior at the plan / execution boundary is documented and tested.
- Manual user controls remain authoritative.
- The feature is off by default and documented as an interface intended to
  evolve, not a finished behavior.
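
As an illustration of the structured event in the criteria above, here is a minimal Python sketch. The class name, field names, and JSON layout are hypothetical; `turn_index` and `previous_tier` stand in for the "surrounding turn context", whose real contents are left open by this proposal.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EffortShiftEvent:
    turn_index: int        # assumed stand-in for surrounding turn context
    previous_tier: str     # tier in effect before the change
    requested_tier: str    # what the model asked for
    applied_tier: str      # what the harness applied after clamping
    scope: str             # e.g. "next_turn", "until_condition", "session"
    reason: str            # the model's stated reason
    set_by: str = "model"  # lets the UI distinguish model-set from user-set
    exit_condition: Optional[str] = None

    @property
    def clamped(self) -> bool:
        """True when the harness applied a different tier than requested."""
        return self.requested_tier != self.applied_tier

    def to_json(self) -> str:
        payload = asdict(self)
        payload["clamped"] = self.clamped  # derived field, added for consumers
        return json.dumps(payload, sort_keys=True)
```

Recording both the requested and the applied tier keeps the model's original intent available for evaluation even when caps override it.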

## Surfaces

This proposal is harness-level and would naturally surface in CLI / TUI, the
Codex app, and the IDE extension. The model-facing contract is the same in
all three; the user-facing visibility (status indicator, shift events,
settings) needs to land on each surface.

## Related

- #8649 - "Auto" reasoning effort (dynamic tiering) for Codex CLI. This
  proposal is complementary: harness-side Auto chooses an initial tier and
  applies outside-view safeguards; the model-callable control handles
  inside-view re-tuning.
- #19877 - Warn before switching models in a long-running session. Same UX
  family: the user has standing controls for in-session reasoning/model
  changes and benefits from clear visibility when those change.

### Additional information

Happy to prepare a focused PR for the harness-side primitive (action surface,
controller state, request wiring, configuration, telemetry) if the design
direction is acceptable, with surface-specific UI work tracked separately so
each surface team can land its own piece.
