Summary
Codex Plan Mode can fail not only because an individual plan is weak, but because the product does not clearly define what a plan is after it is written.
This issue is a cross-cutting survey of several existing Plan/Goal/compaction/hook/instruction-following issues, plus one missing design question: what are the semantics and lifecycle of a Plan Mode plan?
This is not just a request to make plans longer, more visible, or more binding. The deeper problem is that Codex currently risks treating "write a plan" as the goal, rather than treating planning as a lifecycle with validity, assumptions, revision rules, implementation linkage, and completion criteria.
Why this is not only a prompt-quality issue
There is already related prompt/context work, especially:
Those are important, but they do not fully cover the lifecycle question.
Even if Plan Mode explores first and writes a plausible plan, what does the accepted plan mean during implementation?
Is it:
- a fixed execution contract?
- a hypothesis to be tested?
- a handoff artifact?
- a living plan that can be revised?
- a checklist?
- a goal completion contract?
- something else?
Without a defined answer, Plan Mode can fail in two opposite ways:
- If the initial plan is silently abandoned once implementation uncovers new facts, the plan served no purpose.
- If Codex follows a stale or bad plan rigidly, Plan Mode can be worse than no plan.
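One way to make the question concrete is to attach an explicit role to every accepted plan and let that role decide what happens when implementation departs from it. The sketch below is purely illustrative: `PlanRole` and `on_deviation` are hypothetical names, not part of any Codex API, and the policies are placeholders for whatever the product decides.

```python
from enum import Enum


class PlanRole(Enum):
    """Hypothetical roles an accepted plan could play (not a Codex API)."""
    FIXED_CONTRACT = "fixed_contract"
    HYPOTHESIS = "hypothesis"
    HANDOFF = "handoff"
    LIVING_PLAN = "living_plan"
    CHECKLIST = "checklist"
    GOAL_CONTRACT = "goal_contract"


def on_deviation(role: PlanRole) -> str:
    """Return an illustrative policy for when execution departs from the plan."""
    if role in (PlanRole.FIXED_CONTRACT, PlanRole.GOAL_CONTRACT):
        # Contract-like plans should not be changed unilaterally.
        return "stop and ask the user"
    if role in (PlanRole.HYPOTHESIS, PlanRole.LIVING_PLAN):
        # Revisable plans may change, but the change must be visible.
        return "revise the plan and record why"
    # Handoff artifacts and checklists are treated as advisory here.
    return "note the deviation and continue"
```

With a declared role, neither failure mode above is silent: a contract-like plan forces a question to the user, and a revisable plan forces a recorded revision.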
Problem 1: plan-shaped output can be mistaken for actual planning
A useful plan is not just a structured response. For many tasks, the plan should be the result of actual thinking and investigation.
For example, architecture planning should reason about:
- whether the design will work;
- what can fail;
- scalability and maintainability;
- integration boundaries;
- migration and rollback risk;
- alternatives and why they were rejected.
If implementation is expected to require trial and error, the plan should acknowledge that too. A plan that pretends all decisions are known up front may look organized while still being misleading.
This overlaps with #23481, but this issue is broader: even after research-first planning, Codex still needs semantics for what the resulting plan means during later execution.
Problem 2: one generic Plan Mode cannot fit every task type
Different task families need different planning protocols.
A debugging task should not be planned like a greenfield feature. Debugging often needs reproduction, evidence collection, hypotheses, root-cause proof, minimal fix strategy, verification, and cleanup. This is the point made by #25933: Plan and Goal are useful, but they do not fully replace an evidence-driven Debug Mode.
A production/product-building task has another shape. #21446 asks for Codex to help users reach production-ready outcomes rather than demos, and #26556 proposes user modes and claim gates so Codex does not claim completion from weak evidence.
A research/spec-compilation task has yet another shape. #30394 shows that when user-provided plans/specs are the authority, lossy summarization or extraction can collapse the user's design into an agent-generated reduced plan.
So the issue is not only that Plan Mode needs "better plans." It needs a way to select or enforce the right kind of planning discipline for the task.
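As a rough illustration of "planning discipline per task type", a harness could check a draft plan against the sections its task family requires. The section names below are taken from the debugging and architecture examples in this issue; the table, the function name, and the idea of a flat section list are assumptions made for the sketch.

```python
# Hypothetical required sections per task family (illustrative only).
REQUIRED_SECTIONS = {
    "debugging": [
        "reproduction", "evidence", "hypotheses", "root_cause",
        "minimal_fix", "verification", "cleanup",
    ],
    "architecture": [
        "feasibility", "failure_modes", "scalability",
        "integration_boundaries", "migration_rollback", "alternatives",
    ],
}


def missing_sections(task_family: str, plan_sections: list[str]) -> list[str]:
    """List the required sections a draft plan does not yet cover.

    Unknown task families have no requirements in this sketch.
    """
    required = REQUIRED_SECTIONS.get(task_family, [])
    present = set(plan_sections)
    return [name for name in required if name not in present]
```

For example, a debugging plan that only contains `reproduction` and `hypotheses` would be reported as missing evidence collection, root-cause proof, a minimal fix, verification, and cleanup.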
Problem 3: plan validity and re-planning are undefined
Many real tasks cannot be fully decided before implementation begins.
New facts are discovered while editing, running tests, inspecting runtime behavior, resolving integration details, or learning that an assumption was wrong. In those cases, Plan Mode needs a defined policy for:
- when the original plan remains valid;
- when it must be revised;
- how deviations are recorded and justified;
- whether user approval is required before changing direction;
- how the active goal relates to a revised plan;
- what happens when the original goal becomes impossible, unsafe, or no longer desirable;
- how completion is checked against the user's intent and accepted constraints.
This is where the phrase "execution contract" needs careful treatment. A visible or persistent plan is only one possible mechanism. The product first needs to define what kind of contract, if any, an accepted plan represents.
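The policy questions above can be sketched as a small state object: a plan carries named assumptions, becomes invalid when one is disproved, and can only change direction with recorded justification and user approval. All names here (`AcceptedPlan`, `Revision`, `revise`) are hypothetical, and the approval rule is one possible choice rather than a recommendation.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    reason: str               # why the plan changed
    changed_direction: bool   # True if the approach changed, not just details
    user_approved: bool = False


@dataclass
class AcceptedPlan:
    steps: list[str]
    assumptions: dict[str, bool]  # assumption text -> still believed true
    revisions: list[Revision] = field(default_factory=list)

    def is_valid(self) -> bool:
        """The plan stays valid only while every assumption still holds."""
        return all(self.assumptions.values())

    def invalidate(self, assumption: str) -> None:
        """Mark an assumption as disproved by an implementation discovery."""
        if assumption not in self.assumptions:
            raise KeyError(assumption)
        self.assumptions[assumption] = False

    def revise(self, new_steps: list[str], new_assumptions: list[str],
               reason: str, changed_direction: bool,
               user_approved: bool = False) -> None:
        """Replace steps and assumptions, recording the justification."""
        if changed_direction and not user_approved:
            raise PermissionError("direction change needs user approval")
        self.revisions.append(Revision(reason, changed_direction, user_approved))
        self.steps = list(new_steps)
        self.assumptions = {name: True for name in new_assumptions}
```

The point of the sketch is that deviations leave a trail: the revision log answers "why is the current plan different from the accepted one?" without relying on chat history.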
Problem 4: /plan and /goal handoff issues are related but not sufficient
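There are already related issues about connecting plans to goals:
- … /goal is used in Plan Mode, defer goal creation until the plan is finalized.
- #27139: allow promoting an approved /plan into a /goal with clean context.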
Those are useful handoff improvements, but they still mostly assume that an approved plan can become the authoritative execution plan.
The missing question is what happens after that, when implementation discovers that the plan is incomplete, stale, wrong, or impossible.
If Codex can freely switch the goal, the goal loses meaning. If it cannot revise the plan, it may follow a bad plan. If it silently ignores the plan, Plan Mode becomes decorative.
Problem 5: state, compaction, and long-running execution make this worse
Long-running tasks need plan/goal/current-task continuity. Several existing issues show adjacent state-continuity failures:
These issues are not the same as Plan Mode semantics, but they show why a plan cannot remain ordinary chat text if it is supposed to guide later work.
A plan may need first-class structured state, durable enough to survive compaction/resume, but that state still needs clear semantics before hooks or UI can solve the problem.
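A minimal sketch of what "durable enough to survive compaction/resume" could mean in practice: plan state is kept as structured data that can be written out and read back without loss, instead of being reconstructed from summarized chat text. The `PlanState` fields and the JSON format are assumptions for illustration, not a description of how Codex stores anything.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PlanState:
    goal: str
    steps: list[str]
    current_step: int   # index into steps
    revision: int       # incremented on every accepted re-plan


def snapshot(state: PlanState) -> str:
    """Serialize plan state so it can outlive the conversation context."""
    return json.dumps(asdict(state), sort_keys=True)


def restore(blob: str) -> PlanState:
    """Rebuild plan state after compaction or resume."""
    return PlanState(**json.loads(blob))
```

A round trip through `snapshot` and `restore` returns an equal `PlanState`, which is exactly the property that a prose summary of the plan cannot guarantee. It does not, by itself, say what the restored plan obliges Codex to do.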
Problem 6: user intent and instruction fidelity are part of the failure mode
Plan Mode also depends on understanding the user's actual intent.
If the user already provided a design, constraints, or a full specification, Codex should not silently replace that with a reduced agent interpretation. #30394 is especially relevant here: large user-provided planning/spec corpora can be collapsed into lossy summaries, and later implementation may follow the agent's reduced plan rather than the user's source material.
There may also be a harness/prompt-stack component. #27587 argues that Codex's built-in instruction stack can contain conflicting defaults, such as verifying first while also strongly preferring assumptions and execution. That kind of conflict can encourage premature assumptions, arbitrary interpretation, and under-clarification.
So this should not be treated as only a model-quality complaint. Some of the responsibility may belong to Codex's mode design, prompt stack, state model, and harness behavior.
Relationship to hooks / harness proposals
Hooks and harnesses may be part of the eventual solution, but they are not the whole issue.
Those are useful because an external harness may need to observe or update plan state. But before external tools can reliably enforce or update a plan, Codex needs to define what a plan state means.
For example, an external harness should not have to guess whether a completed checklist item is history, an obligation, a hypothesis, or a stale step that should be revised.
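To illustrate the checklist ambiguity, the sketch below gives each step an explicit meaning alongside its checkbox, so a harness can decide what still needs attention without guessing. `StepMeaning`, `Step`, and `needs_attention` are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass
from enum import Enum


class StepMeaning(Enum):
    HISTORY = "history"        # a record of what happened
    OBLIGATION = "obligation"  # must be done before completion
    HYPOTHESIS = "hypothesis"  # to be tested, may be discarded
    STALE = "stale"            # no longer matches reality, needs revision


@dataclass
class Step:
    text: str
    checked: bool
    meaning: StepMeaning


def needs_attention(steps: list[Step]) -> list[str]:
    """Return steps a harness should act on: stale ones and open obligations."""
    return [
        s.text for s in steps
        if s.meaning is StepMeaning.STALE
        or (s.meaning is StepMeaning.OBLIGATION and not s.checked)
    ]
```

Note that a checked item can still need attention if it has gone stale, which a bare checkbox cannot express.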
Non-goals
This issue is not asking Codex to always follow the initial plan rigidly.
It is not only asking for a persistent plan UI, a live plan file, or a copy-plan workflow.
It is not only asking for /plan -> /goal promotion.
It is not only asking for more hooks.
Those may be useful mechanisms, but the first problem is semantic: Codex needs to define what a Plan Mode plan is, how it is validated, how it evolves, and how it relates to implementation and completion.
Desired direction
Codex should treat Plan Mode as a planning lifecycle, not only as a text-generation mode.
At minimum, the product should define:
- plan artifact types or semantics;
- task-specific planning protocols;
- accepted-plan state;
- validity assumptions and invalidation conditions;
- revision/re-planning behavior;
- relation between plan, goal, execution, and completion;
- behavior when a goal becomes impossible or unsafe;
- how user-provided designs/specs remain authoritative;
- how compaction/resume preserves active plan and operational state;
- how completion claims are checked against user intent, plan state, and evidence.
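The last item could look roughly like the following check, which refuses a completion claim while the plan is invalid or any completion criterion lacks evidence. The function name, the criteria-to-evidence mapping, and the wording of the blockers are all assumptions for the sake of the sketch.

```python
def completion_blockers(criteria: dict[str, list[str]],
                        plan_valid: bool) -> list[str]:
    """Return reasons a completion claim should not be made yet.

    criteria maps each completion criterion to the evidence collected
    for it (test runs, command output, user confirmation, ...).
    """
    blockers = []
    if not plan_valid:
        blockers.append("plan has invalidated assumptions")
    for criterion, evidence in criteria.items():
        if not evidence:
            blockers.append(f"no evidence for: {criterion}")
    return blockers
```

An empty result means the claim is at least backed by something; a non-empty result gives the user a concrete list instead of an unsupported "done".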
A plan is useful only if both the model and the harness know what role it plays after it is written.