Plan Mode needs defined planning semantics and lifecycle, not only plan-shaped output #30937

## Summary

Codex Plan Mode can fail not only because an individual plan is weak, but also because the product does not clearly define what a plan is after it is written.

This issue is a cross-cutting survey of several existing Plan/Goal/compaction/hook/instruction-following issues, plus one missing design question: **what are the semantics and lifecycle of a Plan Mode plan?**

This is not just a request to make plans longer, more visible, or more binding. The deeper problem is that Codex currently risks treating "write a plan" as the goal, rather than treating planning as a lifecycle with validity, assumptions, revision rules, implementation linkage, and completion criteria.

## Why this is not only a prompt-quality issue

There is already related prompt/context work, especially:

- #23481: Plan Mode should research facts and decisions before proposing a plan.
- #11321: Plan Mode asks redundant or already-answered questions.
- #23274: Codex can fall into planning/documentation loops instead of executing the requested work.

Those are important, but they do not fully cover the lifecycle question.

Even if Plan Mode explores first and writes a plausible plan, what does the accepted plan mean during implementation?

Is it:

- a fixed execution contract?
- a hypothesis to be tested?
- a handoff artifact?
- a living plan that can be revised?
- a checklist?
- a goal completion contract?
- something else?

Without a defined answer, Plan Mode can fail in two opposite ways:

- If the initial plan is silently abandoned after implementation discoveries, why was the plan created?
- If Codex follows a stale or bad plan rigidly, Plan Mode can be worse than no plan.

## Problem 1: plan-shaped output can be mistaken for actual planning

A useful plan is not just a structured response. For many tasks, the plan should be the result of actual thinking and investigation.

For example, architecture planning should reason about:

- whether the design will work;
- what can fail;
- scalability and maintainability;
- integration boundaries;
- migration and rollback risk;
- alternatives and why they were rejected.

If implementation is expected to require trial and error, the plan should acknowledge that too. A plan that pretends all decisions are known up front may look organized while still being misleading.

This overlaps with #23481, but this issue is broader: even after research-first planning, Codex still needs semantics for what the resulting plan means during later execution.

## Problem 2: one generic Plan Mode cannot fit every task type

Different task families need different planning protocols.

A debugging task should not be planned like a greenfield feature. Debugging often needs reproduction, evidence collection, hypotheses, root-cause proof, minimal fix strategy, verification, and cleanup. This is the point made by #25933: Plan and Goal are useful, but they do not fully replace an evidence-driven Debug Mode.

A production/product-building task has another shape. #21446 asks for Codex to help users reach production-ready outcomes rather than demos, and #26556 proposes user modes and claim gates so Codex does not claim completion from weak evidence.

A research/spec-compilation task has yet another shape. #30394 shows that when user-provided plans/specs are the authority, lossy summarization or extraction can collapse the user's design into an agent-generated reduced plan.

So the issue is not only that Plan Mode needs "better plans." It needs a way to select or enforce the right kind of planning discipline for the task.
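To make "task-specific planning discipline" concrete, here is a minimal sketch that assumes a hypothetical table of required phases per task family. `TaskKind`, `REQUIRED_PHASES`, and `missing_phases` are illustrative names, not Codex APIs. The debugging phases mirror the steps listed above; the feature and spec-compilation phases are placeholders.

```python
from enum import Enum


class TaskKind(Enum):
    DEBUG = "debug"
    FEATURE = "feature"
    SPEC_COMPILATION = "spec_compilation"


# Hypothetical required phases per task family. DEBUG follows the
# evidence-driven steps described above; the others are placeholders.
REQUIRED_PHASES: dict[TaskKind, list[str]] = {
    TaskKind.DEBUG: [
        "reproduce", "collect_evidence", "hypothesize",
        "prove_root_cause", "minimal_fix", "verify", "cleanup",
    ],
    TaskKind.FEATURE: ["design", "alternatives", "implement", "verify"],
    TaskKind.SPEC_COMPILATION: ["ingest_user_spec", "trace_requirements", "verify_coverage"],
}


def missing_phases(kind: TaskKind, plan_phases: list[str]) -> list[str]:
    """Return the required phases this plan omits, in protocol order."""
    present = set(plan_phases)
    return [p for p in REQUIRED_PHASES[kind] if p not in present]
```

For example, a debugging plan that jumps straight from reproduction to a fix would be flagged as missing evidence collection, hypotheses, root-cause proof, and cleanup, instead of being accepted because it merely looks like a plan.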

## Problem 3: plan validity and re-planning are undefined

Many real tasks cannot be fully decided before implementation begins.

New facts are discovered while editing, running tests, inspecting runtime behavior, resolving integration details, or learning that an assumption was wrong. In those cases, Plan Mode needs a defined policy for:

- when the original plan remains valid;
- when it must be revised;
- how deviations are recorded and justified;
- whether user approval is required before changing direction;
- how the active goal relates to a revised plan;
- what happens when the original goal becomes impossible, unsafe, or no longer desirable;
- how completion is checked against the user's intent and accepted constraints.

This is where the phrase "execution contract" needs careful treatment. A visible or persistent plan is only one possible mechanism. The product first needs to define what kind of contract, if any, an accepted plan represents.
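The policy questions in this section can be framed as a plan lifecycle with explicit states, recorded assumptions, and legal transitions. The sketch below is hypothetical: the state names, the transition table, and the rule that a revised plan needs user re-approval are one arbitrary choice among many, not existing Codex behavior.

```python
from dataclasses import dataclass, field
from enum import Enum


class PlanState(Enum):
    DRAFT = "draft"
    ACCEPTED = "accepted"
    EXECUTING = "executing"
    NEEDS_REVISION = "needs_revision"
    COMPLETED = "completed"
    ABANDONED = "abandoned"


# Hypothetical transition table: any transition not listed here is rejected.
TRANSITIONS: dict[PlanState, set[PlanState]] = {
    PlanState.DRAFT: {PlanState.ACCEPTED, PlanState.ABANDONED},
    PlanState.ACCEPTED: {PlanState.EXECUTING, PlanState.ABANDONED},
    PlanState.EXECUTING: {PlanState.NEEDS_REVISION, PlanState.COMPLETED, PlanState.ABANDONED},
    PlanState.NEEDS_REVISION: {PlanState.ACCEPTED, PlanState.ABANDONED},
    PlanState.COMPLETED: set(),
    PlanState.ABANDONED: set(),
}


@dataclass
class Plan:
    state: PlanState = PlanState.DRAFT
    # Assumption text -> whether it is still believed to hold.
    assumptions: dict[str, bool] = field(default_factory=dict)
    # Every transition is recorded with its reason, so deviations stay visible.
    history: list[str] = field(default_factory=list)

    def transition(self, new: PlanState, reason: str, user_approved: bool = False) -> None:
        if new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        # One possible policy: a revised plan must be re-approved before execution resumes.
        if self.state is PlanState.NEEDS_REVISION and new is PlanState.ACCEPTED and not user_approved:
            raise PermissionError("revised plan needs user approval")
        self.history.append(f"{self.state.value} -> {new.value}: {reason}")
        self.state = new

    def invalidate(self, assumption: str, evidence: str) -> None:
        """Mark an assumption as failed; an executing plan then needs revision."""
        self.assumptions[assumption] = False
        if self.state is PlanState.EXECUTING:
            self.transition(PlanState.NEEDS_REVISION, f"assumption failed: {assumption} ({evidence})")
```

The point is not this particular table but that each question above (when the plan stays valid, when it must be revised, who approves a change, how deviations are recorded) maps to an explicit, checkable rule rather than to model discretion.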

## Problem 4: `/plan` and `/goal` handoff issues are related but not sufficient

There are already related issues about connecting plans to goals:

- #24218: when `/goal` is used in Plan Mode, defer goal creation until the plan is finalized.
- #27139: allow promoting an approved `/plan` into a `/goal` with clean context.

Those are useful handoff improvements, but they still mostly assume that an approved plan can become the authoritative execution plan.

The missing question is what happens after that, when implementation discovers that the plan is incomplete, stale, wrong, or impossible.

If Codex can freely switch the goal, the goal loses meaning. If it cannot revise the plan, it may follow a bad plan. If it silently ignores the plan, Plan Mode becomes decorative.
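One way to resolve this tension is to treat the goal as the stable anchor and the plan as revisable beneath it. The hypothetical `ActiveWork` record below (not a Codex type) allows plan steps to change only with a recorded justification, while a goal change requires explicit user approval.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveWork:
    goal: str
    plan_steps: list[str]
    # Audit trail of plan revisions and goal changes.
    revisions: list[str] = field(default_factory=list)

    def revise_plan(self, new_steps: list[str], justification: str) -> None:
        """Plan steps may change, but only with a recorded justification."""
        if not justification.strip():
            raise ValueError("plan revision needs a justification")
        self.revisions.append(f"plan revised: {justification}")
        self.plan_steps = list(new_steps)

    def change_goal(self, new_goal: str, user_approved: bool) -> None:
        """The goal is the stable anchor; changing it requires explicit approval."""
        if not user_approved:
            raise PermissionError("goal change requires user approval")
        self.revisions.append(f"goal changed: {self.goal!r} -> {new_goal!r}")
        self.goal = new_goal
```

Under this split, a bad plan can be fixed without losing the goal's meaning, and the plan can never be silently ignored because every change leaves a trace.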

## Problem 5: state, compaction, and long-running execution make this worse

Long-running tasks need plan/goal/current-task continuity. Several existing issues show adjacent state-continuity failures:

- #29356: compaction loses operational continuity and recent task state.
- #28925: Codex repeatedly re-analyzes after context compaction instead of executing.
- #30859: completed steered prompts can be treated as current tasks after compaction when a goal is active.
- #18920 and #13932: plan/checklist visibility and preservation are fragile.

These issues are not the same as Plan Mode semantics, but they show why a plan cannot remain ordinary chat text if it is supposed to guide later work.

A plan may need first-class structured state, durable enough to survive compaction/resume, but that state still needs clear semantics before hooks or UI can solve the problem.
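As a minimal sketch, assume the active plan is kept outside the chat transcript as a small structured record that is re-read after compaction or resume instead of being reconstructed from summarized chat text. `PlanSnapshot` and `Step` are hypothetical names. Note that serialization alone does not supply semantics: the `status` strings still need defined meanings.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    text: str
    status: str  # e.g. "pending", "done", "stale"; meanings must be defined by the product


@dataclass
class PlanSnapshot:
    goal: str
    steps: list[Step]
    current_step: int
    open_assumptions: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Step dataclasses.
        return json.dumps(asdict(self), indent=2)

    @staticmethod
    def from_json(raw: str) -> "PlanSnapshot":
        data = json.loads(raw)
        data["steps"] = [Step(**s) for s in data["steps"]]
        return PlanSnapshot(**data)
```

After compaction, the harness could restore the snapshot and know exactly which step is current, rather than relying on the model to infer it from a lossy summary.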

## Problem 6: user intent and instruction fidelity are part of the failure mode

Plan Mode also depends on understanding the user's actual intent.

If the user already provided a design, constraints, or a full specification, Codex should not silently replace that with a reduced agent interpretation. #30394 is especially relevant here: large user-provided planning/spec corpora can be collapsed into lossy summaries, and later implementation may follow the agent's reduced plan rather than the user's source material.
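One mechanical guard against lossy collapse is requirement traceability: if each item in the user's spec has an ID and each plan step cites the IDs it covers, dropped or invented requirements become visible. The sketch below assumes such IDs exist; the function names are hypothetical.

```python
def untraced_requirements(spec: dict[str, str], plan: list[tuple[str, set[str]]]) -> list[str]:
    """IDs of user-spec requirements that no plan step claims to cover.

    spec: requirement ID -> requirement text from the user's source material.
    plan: (step text, set of requirement IDs that step covers).
    """
    covered: set[str] = set()
    for _step, ids in plan:
        covered |= ids
    return sorted(rid for rid in spec if rid not in covered)


def unknown_references(spec: dict[str, str], plan: list[tuple[str, set[str]]]) -> list[str]:
    """IDs cited by plan steps that do not exist in the user's spec."""
    cited: set[str] = set().union(*(ids for _step, ids in plan))
    return sorted(cited - set(spec))
```

A non-empty result from either function is a signal that the agent's plan has drifted from the user's source material and should be reconciled before implementation.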

There may also be a harness/prompt-stack component. #27587 argues that Codex's built-in instruction stack can contain conflicting defaults, such as verifying first while also strongly preferring assumptions and execution. That kind of conflict can encourage premature assumptions, arbitrary interpretation, and under-clarification.

So this should not be treated as only a model-quality complaint. Some of the responsibility may belong to Codex's mode design, prompt stack, state model, and harness behavior.

## Relationship to hooks / harness proposals

Hooks and harnesses may be part of the eventual solution, but they are not the whole issue.

- #24547 proposes task and plan lifecycle hooks plus external harness-driven plan updates.
- #21753 tracks broader hook parity and lifecycle observability.

Those are useful because an external harness may need to observe or update plan state. But before external tools can reliably enforce or update a plan, Codex needs to define what a plan state means.

For example, an external harness should not have to guess whether a completed checklist item is history, an obligation, a hypothesis, or a stale step that should be revised.
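A hypothetical sketch of what removing that guesswork could look like: each plan item carries an explicit role, so an external harness can enforce only real obligations and detect stale steps. `StepRole`, `PlanItem`, and both helper functions are illustrative, not part of Codex or any hook API.

```python
from dataclasses import dataclass
from enum import Enum


class StepRole(Enum):
    HISTORY = "history"        # done; kept for the record only
    OBLIGATION = "obligation"  # must be satisfied before completion
    HYPOTHESIS = "hypothesis"  # to be tested; may be discarded
    STALE = "stale"            # superseded; should be revised or dropped


@dataclass
class PlanItem:
    text: str
    role: StepRole
    checked: bool = False


def outstanding_obligations(items: list[PlanItem]) -> list[str]:
    """What a harness could safely enforce: unchecked obligations only."""
    return [i.text for i in items if i.role is StepRole.OBLIGATION and not i.checked]


def needs_replanning(items: list[PlanItem]) -> bool:
    """Any stale item means the plan should be revised before continuing."""
    return any(i.role is StepRole.STALE for i in items)
```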

## Non-goals

This issue is not asking Codex to always follow the initial plan rigidly.

It is not only asking for a persistent plan UI, a live plan file, or a copy-plan workflow.

It is not only asking for `/plan` -> `/goal` promotion.

It is not only asking for more hooks.

Those may be useful mechanisms, but the first problem is semantic: Codex needs to define what a Plan Mode plan is, how it is validated, how it evolves, and how it relates to implementation and completion.

## Desired direction

Codex should treat Plan Mode as a planning lifecycle, not only as a text-generation mode.

At minimum, the product should define:

- plan artifact types or semantics;
- task-specific planning protocols;
- accepted-plan state;
- validity assumptions and invalidation conditions;
- revision/re-planning behavior;
- relation between plan, goal, execution, and completion;
- behavior when a goal becomes impossible or unsafe;
- how user-provided designs/specs remain authoritative;
- how compaction/resume preserves active plan and operational state;
- how completion claims are checked against user intent, plan state, and evidence.
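The last item could be expressed as a completion gate that combines plan state, user intent, and evidence. This is a hypothetical sketch: `CompletionCheck` and `completion_blockers` are illustrative names, and how each field would be populated is exactly what the product would need to define.

```python
from dataclasses import dataclass, field


@dataclass
class CompletionCheck:
    open_obligations: list[str] = field(default_factory=list)
    untraced_requirements: list[str] = field(default_factory=list)
    unresolved_invalid_assumptions: list[str] = field(default_factory=list)
    # Claim -> evidence supporting it (e.g. a test run or a diff reference).
    evidence: dict[str, str] = field(default_factory=dict)


def completion_blockers(check: CompletionCheck, claims: list[str]) -> list[str]:
    """Everything that should stop a 'done' claim; an empty list means the claim may stand."""
    blockers = [f"open obligation: {o}" for o in check.open_obligations]
    blockers += [f"user requirement not covered: {r}" for r in check.untraced_requirements]
    blockers += [f"unresolved invalid assumption: {a}" for a in check.unresolved_invalid_assumptions]
    blockers += [f"no evidence for claim: {c}" for c in claims if not check.evidence.get(c)]
    return blockers
```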

A plan is useful only if both the model and the harness know what role it plays after it is written.
