fix(agentic): fast-fail interactive Claude on subscription/usage caps (#1389)#1390
Conversation
Verification & testingFull local run (
New / changed tests
Behaviors verified
Companion: the pdd_cloud OAuth waterfall force-rotates on this 🤖 Generated with Claude Code |
|
Re: the remaining "no
By design the token is dormant until this PR merges. Sequence: merge this → re-pin #1816 to the new pdd On the meta nit ( |
gltanaka
left a comment
There was a problem hiding this comment.
Decision: do not merge as-is.
The core fix looks needed. On current main, the weekly/overall cap classifies as credential-limit, but the common session/usage cap strings still classify as None, and a synthetic interactive row for You've hit your session limit · resets ... leaves _auth_state_after_row(...) as None. This PR fixes that behavior and the focused credential-limit tests pass locally.
Required changes before merge:
-
Reduce the PR to the credential-limit fix and directly related tests/prompt metadata. The appended
tests/test_agentic_common.pycoverage block from roughly lines 9508-9936 is unrelated to #1389 and should be removed or split into a separate PR. It adds duplicate late imports /sys.pathmutation and new pylint issues in that block (reimported/wrong-import-positionaround 9514, 9522, 9722, 9730, 9731, plusunsubscriptable-objecton the optional context assertions around 9614-9618 and 9922). It also makes a small production fix look like a 600-line mixed-change review. -
Remove or tightly justify the unrelated generated churn. In particular, the
architecture.jsonsteering/interface edits and Unicode normalization are not part of the subscription-cap fast-fail, and deleting.pdd/meta/agentic_common_python_run.jsonshould not be bundled unless the final regenerated metadata still proves the narrowed PR is drift-clean. -
Update the PR text so it does not imply end-to-end cloud failover is complete. The companion pdd_cloud PR currently documents a staging blocker where headless interactive mode hangs before producing the MCP reply. This pdd change can still land as a standalone cap-specific fast-fail, but it should not be presented as production-ready cloud rollout until that separate interactive/MCP blocker is fixed and re-smoked.
After those changes, I would re-review the narrowed diff; the actual classifier + _auth_state_after_row behavior is the part worth keeping.
5f65c8c to
9fbf9e0
Compare
|
Narrowed the PR to just the cap fast-fail — all three points addressed. 1. Diff reduced to the fix. Now 4 files: 2. Unrelated generated churn dropped. 3. PR text corrected. The body no longer claims end-to-end cloud failover. The companion pdd_cloud PR has the matching rotation code but an open staging blocker (headless interactive hangs before the MCP reply), so this lands as a standalone cap-specific fast-fail, not a cloud rollout. Two items that are your call:
The classifier + |
gltanaka
left a comment
There was a problem hiding this comment.
Current-head decision: still do not merge as-is.
The important parts are better: the architecture.json / example churn is gone, the PR body now accurately scopes this as a standalone cap fast-fail, and the actual classifier + _auth_state_after_row implementation still looks like the right fix. Local verification on 4ca11f357:
pytest tests/test_agentic_common.py -k "credential_limit or transient_synthetic" -q→ 13 passedpytest tests/test_agentic_common.py -q→ 419 passed
The remaining blocker is that auto-heal re-added unrelated generated tests at the end of tests/test_agentic_common.py (# Additional tests appended, around lines 9508-9816). Those tests cover template substitution, control-token detection, classifier LLM fallback, OpenCode parsing, post_final_comment, and JSON extraction, none of which is part of #1389's credential-limit fast-fail. They also reintroduce the late duplicate imports / sys.path mutation pattern around lines 9513-9532 and 9671-9682, plus a missing final newline.
Required change: remove/split that appended test block again, or pause/adjust auto-heal for this PR so it does not keep reintroducing unrelated test_extend output. If generated tests are desired, they should land in a separate PR with that scope called out. This PR should contain only the cap classifier/hook, the focused credential-limit tests, and the matching prompt/meta updates.
Once the diff is back to that focused shape, I think the core fix is mergeable.
f1bd62e to
cb9e3ef
Compare
|
@gltanaka current head All current checks on this head are green now, including:
Re-review when you have a moment. |
gltanaka
left a comment
There was a problem hiding this comment.
Approved current head cb9e3eff8.
The diff is now focused on the credential-limit fast-fail: pdd/agentic_common.py, the matching prompt sentence, the meta fingerprint, and targeted tests only. The unrelated appended test_extend block, architecture.json churn, example churn, and run-report changes are gone.
Local verification:
pytest tests/test_agentic_common.py -k "credential_limit or transient_synthetic" -q→ 13 passedpytest tests/test_agentic_common.py -q→ 381 passed- Simulated merge of current
origin/maininto this head completed without conflicts; the focused credential-limit subset still passed.
One administrative note: GitHub currently reports this PR as BEHIND, so branch protection may require updating/rebasing onto main and rerunning checks before merge. I do not see a remaining code-review blocker in the current focused diff.
On main the weekly/overall cap classifies as credential-limit, but the
common 5-hour session and usage cap strings classify as None, and a
synthetic interactive row for "You've hit your session limit · resets …"
leaves _auth_state_after_row(...) as None — so a mid-run cap parks the PTY
until the per-step timeout instead of surfacing a rotation signal.
- Broaden the credential-limit classifier to match the session/usage/
weekly/monthly cap envelopes; the proximity + time-token guard is
unchanged, so benign prose ("if you hit your limit, nothing resets
automatically") still does not classify.
- Arm the interactive fast-fail on credential-limit with a deterministic,
credential-safe message that itself classifies as credential-limit, so
the caller treats it as permanent and rotates instead of waiting out the
step timeout. Never echoes the raw reset time / account text; kept
distinct from an API-tier 429 so it never lands on the 60s retry floor.
- Document the broadened envelope in the prompt; refresh meta hashes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ed49d19 to
f1aab77
Compare
#1403) (#1440) * fix(ci): scope-preserving PR auto-heal — never escalate to test_extend (#1403) PR auto-heal was re-bloating narrow fix PRs: for a Python module whose tests pass but coverage is below target, `sync_determine_operation` returns `test_extend`, and `heal_module` routes verify/generate/test/crash through `pdd sync`, which re-derives the same coverage gap internally and appends unrelated generated tests (rewriting `.pdd/meta` command to `test_extend`). This made narrow PRs non-mergeable (e.g. #1390). Add a single env-var signal, `PDD_DISABLE_TEST_EXTEND`, set only in PR auto-heal mode (`not skip_ci`) and enforced at two layers: - Detection (`sync_determine_operation.test_extend_disabled`): the coverage-gap branch returns the existing `all_synced` no-op for all languages when the flag is set. Because this function is called by both the in-process `detect_drift` and the nested `pdd sync`, one branch covers both the detection and execution paths the issue requires. - Execution backstop (`sync_orchestration`): mirror the existing non-Python `test_extend` skip — log `test_extend_skipped`, accept the current state, and write no test file. - `ci_drift_heal.main` sets the flag on `os.environ` only around the in-process `detect_drift` call (restored in `finally`, no leak) and passes it explicitly in the `pdd sync` subprocess env. Push-to-main (`--skip-ci`) is unaffected — whole-module coverage growth still runs. - `detect_drift` now treats `all_synced` as "no drift" (alongside nothing/synced) so the guarded no-op is a clean skip, not an "unknown operation" heal failure. Prompts updated to match (source of truth). Regression tests prove: PR mode suppresses + propagates the flag, push-to-main keeps test_extend, the orchestrator never appends tests / writes a `test_extend` fingerprint when suppressed, and an e2e run proves the parent→child env contract. The flag is default-off, so unset behavior is byte-for-byte unchanged. Supersedes #1416 (prompts-only, code never regenerated) and #1432 (tests + partial code; missed the all_synced filter and env restore). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(ci): address review — all_synced skip after reclassification; rename helper (#1403) Codex review round 1 found a real regression: detect_drift skipped 'all_synced' BEFORE the git-based reclassification, so a PR that changed code without its prompt (and had a low-coverage passing run_report) was silently dropped instead of being promoted to 'update'. This also regressed the pre-existing non-Python all_synced coverage-gap path under --diff-base. - detect_drift: move the 'all_synced' no-drift skip to AFTER git reclassification, so an all_synced module whose code changed without its prompt is still promoted to 'update'; only a still-terminal all_synced is dropped. - Rename helper test_extend_disabled() -> is_test_extend_disabled() so pytest does not collect it as a test when imported into a test module, and so the name reads as a predicate. New regression tests: - detect_drift: all_synced + code-only change -> update (not dropped); terminal all_synced -> clean skip (never an unknown-operation failure). - is_test_extend_disabled truthiness incl. falsey (0/false/off/'') and whitespace; unset -> False. - main(): os.environ flag restored even when detect_drift raises, and a pre-existing value is restored exactly (not clobbered). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Test User <test@test.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Problem
On
main, the weekly/overall Claude subscription cap classifies ascredential-limit, but the common 5-hour session and usage cap strings still classify asNone. A synthetic interactive row forYou've hit your session limit · resets …leaves_auth_state_after_row(...)asNone, so a mid-run cap parks the PTY until the per-step timeout instead of surfacing a rotation signal — a regression vsclaude -p, which rotated caps via the structured 429.Fix (standalone, cap-specific fast-fail)
credential-limitclassifier to match the session/usage/weekly/monthly cap envelopes. The proximity + time-token guard is unchanged, so benign prose ("if you hit your limit, nothing resets automatically") still does not classify.credential-limitwith a deterministic, credential-safe message that itself classifies ascredential-limit, so the caller treats it as permanent and rotates instead of waiting out the step timeout. Never echoes the raw reset time / account text; kept distinct from an API-tier 429 so it never lands on the 60s rate-limit retry floor.Scope
Intentionally narrow: the classifier +
_auth_state_after_rowbehavior, its tests, and the prompt/metadata that describe it (4 files). Noarchitecture.json, example, or unrelated test changes.Cloud consumer / end-to-end status
The pdd_cloud OAuth waterfall has matching
credential-limitrotation logic in a companion PR, but end-to-end failover is not yet proven: that companion has an open staging blocker (headless interactive mode hangs before producing the MCP reply). This change stands on its own as a correct, cap-specific fast-fail; it is not a production-ready cloud rollout until the separate interactive/MCP blocker is fixed and re-smoked.Tests
tests/test_agentic_common.py(credential-limit subset): weekly / 5-hour session / usage envelopes all fast-fail and self-classify ascredential-limit; the benign-prose guard keeps "if you hit your limit, nothing resets automatically" out of the class. Fulltest_agentic_common.pygreen locally.