What version of the Codex App are you using (From “About Codex” dialog)?
Codex Cloud
What subscription do you have?
Pro
What platform is your computer?
MacOS/iOS
What issue are you seeing?
Product: Codex Cloud
Failure class: task-runner / orchestration / validation-order reliability
Impact: unsafe for governed release workflows
Expected behavior: no PR or completion state until clean-tree validation passes
Actual behavior: partial validation, skipped gates, repeated OOB repair phases
Repro evidence: command logs, commit hashes, failed gates, corrected prompts
What steps can reproduce the bug?
Title:
Codex Cloud violates explicit release-gate instructions and treats incomplete validation as task completion
Product:
Codex Cloud / Codex PR workflow
Repository:
https://github.com/irgordon/ajentic
Issue type:
Task-runner / orchestration / validation-order reliability
Impact:
Codex Cloud is unreliable for governed release workflows. It repeatedly produces or attempts phase completion while required validation gates are incomplete, skipped, dirty-tree-only, or failed. This causes repeated out-of-band repair phases and makes Codex unsafe to use as an autonomous release worker.
Expected behavior:
When the prompt explicitly requires validation before PR creation, Codex should not create PR metadata or report task completion until all required gates pass.
For this repository, completion requires:
- direct validation passes,
- changes are committed,
- git status --short returns no output,
- CARGO_TARGET_DIR=/tmp/<phase-target> ./scripts/check.sh passes from a clean committed tree,
- git status --short still returns no output,
- only then should Codex call PR creation / make_pr.
A dirty-tree ./scripts/check.sh failure is not an acceptable final result because the script intentionally enforces an initial clean-tree gate.
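The post-commit part of that sequence can be sketched as a small shell gate. This is an illustration under stated assumptions, not code from the repository: the function names tree_is_clean and completion_gate and the CHECK_CMD override are hypothetical, and only git status --short, CARGO_TARGET_DIR, and ./scripts/check.sh come from the report.

```shell
# Sketch only: function names and the CHECK_CMD override are hypothetical.

# Succeeds only when `git status --short` runs and prints nothing.
tree_is_clean() {
  local out
  out="$(git status --short)" || return 1
  [ -z "$out" ]
}

# Runs the post-commit gate in the required order; stops at the first failure.
# $1: directory to use as CARGO_TARGET_DIR (for example /tmp/<phase-target>).
completion_gate() {
  local target_dir="$1"
  local check_cmd="${CHECK_CMD:-./scripts/check.sh}"

  tree_is_clean || { echo "FAILED: git status --short (before check)" >&2; return 1; }
  CARGO_TARGET_DIR="$target_dir" "$check_cmd" || { echo "FAILED: $check_cmd" >&2; return 1; }
  tree_is_clean || { echo "FAILED: git status --short (after check)" >&2; return 1; }
  echo "GATE PASSED"
}
```

Called after the commit as completion_gate /tmp/ajentic-phase-184-target, a non-zero exit status means PR creation and any success summary should be withheld.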
Actual behavior:
Across multiple AJENTIC release-path phases, Codex Cloud repeatedly:
- ran partial validation,
- reported dirty-tree ./scripts/check.sh failures as expected/acceptable,
- skipped post-commit clean-tree validation,
- committed or created PR metadata before the authoritative validation gate passed,
- missed required tests,
- introduced formatting/typecheck failures,
- claimed checklist/changelog completion before the repo state proved it,
- and in one case required a follow-up reconciliation because expected Phase 184 implementation files were absent or not discoverable by the required scan.
Minimal reproduction pattern:
- Use a repository with a strict validation script that fails unless the working tree is clean at start and end.
- Give Codex Cloud a phase prompt requiring:
  - code changes,
  - cargo fmt --check,
  - Rust tests,
  - clippy,
  - TypeScript typecheck/lint/build/tests,
  - commit,
  - clean-tree ./scripts/check.sh,
  - clean final git status,
  - no PR before clean-tree validation.
- Include explicit language:
  - “Do not call make_pr before clean-tree ./scripts/check.sh passes.”
  - “A dirty-tree ./scripts/check.sh failure is never a valid final validation result.”
  - “Direct cargo/npm checks do not substitute for ./scripts/check.sh.”
- Observe Codex often stops after partial validation, reports dirty-tree failure as expected, commits anyway, or records PR metadata without the clean-tree validation proof.
Concrete examples from AJENTIC:
- Phase 174/174.1: TypeScript local shell state failed because a newly required installerDistributionContract field was missing in one state construction path.
- Phase 174.0/174.2: checklist/changelog frontmatter regressions caused structure/docs validation failures.
- Phase 176/176.1: formatting/import ordering required a validation closure phase.
- Phase 177/177.1: Rust formatting failure required an out-of-band formatting fix.
- Phase 179/179.2: UI forbidden-label test used substring matching and treated no_signature_created as forbidden signature_created.
- Phase 181: initial implementation failed on projection type-shape mismatch and was rolled back.
- Phase 181.2: TypeScript workflow failed because initialReleaseCandidateEvidenceManifestProjection was referenced but not defined.
- Phase 183/183.1: cargo fmt --check failed on release_candidate_hardening_closure.rs.
- Phase 184/184.1: cargo fmt --check failed on local_operator_shell_state.rs and release_candidate_local_package_rehearsal.rs; after closure, a required scan reported the expected release_candidate_local_package_rehearsal*.rs path did not exist, raising concern that a validated tree may not contain the intended Phase 184 implementation.
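The Phase 179/179.2 false positive is easy to reproduce outside the repository. The sketch below is not the repository's UI test, which is not shown in this report. It only contrasts substring matching with whole-word matching in shell, using the two label values named above. The function names are hypothetical.

```shell
# Illustrative only: the labels come from the Phase 179 example;
# the function names are hypothetical.

# Substring match: also fires on no_signature_created (the reported false positive).
contains_forbidden_substring() {
  printf '%s\n' "$1" | grep -q -F -- "$2"
}

# Whole-word match: grep -w treats letters, digits, and underscores as word
# characters, so signature_created inside no_signature_created does not match.
contains_forbidden_word() {
  printf '%s\n' "$1" | grep -q -w -F -- "$2"
}
```

With these definitions, contains_forbidden_substring "no_signature_created" "signature_created" succeeds, while contains_forbidden_word with the same arguments fails. The whole-word result is the behavior the forbidden-label test needed.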
Reproduction prompt fragment:
Use this instruction in a Codex Cloud task:
“Non-negotiable completion rule:
Do not call make_pr, do not claim completion, and do not write a success summary unless this command passes after commit from a clean working tree:
CARGO_TARGET_DIR=/tmp/ajentic-phase-184-target ./scripts/check.sh
A dirty-tree ./scripts/check.sh failure is never a valid final validation result.
Direct cargo/npm checks do not substitute for ./scripts/check.sh.
If any required gate fails, stop. Do not call make_pr. Do not claim completion. Return the exact failed command and relevant error.”
Observed failure:
Codex still reports partial completion or requires repeated OOB repair/validation phases rather than reliably completing the required sequence.
Why this matters:
This is not a normal coding error. It is a workflow-authority error. In a release-governed repository, a PR or completion state before clean-tree validation creates false release progress and forces manual repair.
Requested fix:
Codex Cloud should enforce completion-state gating internally:
- no PR metadata creation before required validation commands pass,
- no “completed” task state when the prompt-required final validation failed or was not run,
- distinguish dirty-tree pre-commit validation failures from post-commit clean-tree validation success,
- preserve ordered validation requirements,
- expose final validation evidence clearly in the task summary,
- and stop safely when a gate fails rather than continuing unrelated work.
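As a rough illustration of ordered gating with a safe stop, the sketch below runs gate commands in the order given and reports the exact command that failed. It is a hypothetical helper and not a description of how Codex Cloud is implemented. The name run_gates and the output strings are assumptions.

```shell
# Hypothetical helper: runs each gate command in order and stops at the first
# failure, so later steps (including PR creation) are never reached.
run_gates() {
  local gate
  for gate in "$@"; do
    # Gate output goes to stderr so stdout carries only the final verdict.
    if ! sh -c "$gate" >&2; then
      echo "FAILED: $gate"
      return 1
    fi
  done
  echo "ALL GATES PASSED"
}
```

A caller could pass the prompt-required commands as separate arguments, for example run_gates "cargo fmt --check" "./scripts/check.sh", and proceed to PR creation only when the function returns zero.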
Acceptance criteria:
A Codex Cloud task with an explicit clean-tree validation gate should only report success and create PR metadata if:
- post-commit git status --short is clean,
- post-commit ./scripts/check.sh passes,
- final git status --short is clean,
- and the task summary includes those exact results.
Any failure before those steps should leave the task incomplete and must not create PR metadata.
What is the expected behavior?
When the prompt explicitly requires validation before PR creation, Codex should not create PR metadata or report task completion until all required gates pass.