Skip to content

fix(agentic): fast-fail interactive Claude on subscription/usage caps (#1389)#1390

Merged
gltanaka merged 2 commits into
mainfrom
fix/1389-interactive-credential-limit-fastfail
Jun 4, 2026
Merged

fix(agentic): fast-fail interactive Claude on subscription/usage caps (#1389)#1390
gltanaka merged 2 commits into
mainfrom
fix/1389-interactive-credential-limit-fastfail

Conversation

@Serhan-Asad
Copy link
Copy Markdown
Collaborator

@Serhan-Asad Serhan-Asad commented Jun 3, 2026

Problem

On main, the weekly/overall Claude subscription cap classifies as credential-limit, but the common 5-hour session and usage cap strings still classify as None. A synthetic interactive row for You've hit your session limit · resets … leaves _auth_state_after_row(...) as None, so a mid-run cap parks the PTY until the per-step timeout instead of surfacing a rotation signal — a regression vs claude -p, which rotated caps via the structured 429.

Fix (standalone, cap-specific fast-fail)

  • Broaden the credential-limit classifier to match the session/usage/weekly/monthly cap envelopes. The proximity + time-token guard is unchanged, so benign prose ("if you hit your limit, nothing resets automatically") still does not classify.
  • Arm the interactive fast-fail on credential-limit with a deterministic, credential-safe message that itself classifies as credential-limit, so the caller treats it as permanent and rotates instead of waiting out the step timeout. Never echoes the raw reset time / account text; kept distinct from an API-tier 429 so it never lands on the 60s rate-limit retry floor.
  • Document the broadened envelope in the prompt; refresh the meta fingerprint.

Scope

Intentionally narrow: the classifier + _auth_state_after_row behavior, its tests, and the prompt/metadata that describe it (4 files). No architecture.json, example, or unrelated test changes.

Cloud consumer / end-to-end status

The pdd_cloud OAuth waterfall has matching credential-limit rotation logic in a companion PR, but end-to-end failover is not yet proven: that companion has an open staging blocker (headless interactive mode hangs before producing the MCP reply). This change stands on its own as a correct, cap-specific fast-fail; it is not a production-ready cloud rollout until the separate interactive/MCP blocker is fixed and re-smoked.

Tests

tests/test_agentic_common.py (credential-limit subset): weekly / 5-hour session / usage envelopes all fast-fail and self-classify as credential-limit; the benign-prose guard keeps "if you hit your limit, nothing resets automatically" out of the class. Full test_agentic_common.py green locally.

@Serhan-Asad
Copy link
Copy Markdown
Collaborator Author

Verification & testing

Full local run (tests/test_agentic_common.py on this branch):

  • 381 passed — the whole interactive-fast-fail + classifier suite, no regressions.
  • ✅ checkup-review-loop credential-limit/429 tests green (the other _classify_permanent_error consumer — confirms the classifier broadening doesn't regress it).

New / changed tests

  • test_detect_interactive_credential_limit_fast_fails — the weekly, 5-hour session, and usage cap envelopes all fast-fail, and the returned message self-classifies as credential-limit (so the caller treats it as permanent and rotates — exactly how the auth message self-classifies as oauth/login).
  • test_detect_interactive_credential_limit_ignores_benign_prose — the time-token guard keeps "if you hit your limit, nothing resets automatically" out of the class → no false fast-fail.
  • test_detect_interactive_auth_failure_ignores_transient_synthetic — the prior session limit-is-transient assertion was moved into the credential-limit case; genuinely transient rows (server rate-limit / dropped socket / timeout) are still ignored.

Behaviors verified

  • A mid-run cap now fast-fails (no ~10-min PTY park) and emits a deterministic, credential-safe credential-limit signal — never echoes the raw reset time / account info.
  • The classifier now catches the session cap ("You've hit your session limit"), which the old hit your limit pattern missed — the cap OAuth tokens actually hit under load.

Companion: the pdd_cloud OAuth waterfall force-rotates on this credential-limit signal (promptdriven/pdd_cloud#1816) — completing the end-to-end failover.

🤖 Generated with Claude Code

@Serhan-Asad
Copy link
Copy Markdown
Collaborator Author

Re: the remaining "no credential-limit consumer in pdd_cloud" concern — the cloud side is a deliberate companion change, not missing. It lands in pdd_cloud PR #1816:

  • extensions/github_pdd_app/src/workers/pdd_executor/waterfall_runner.py: _INTERACTIVE_CREDENTIAL_LIMIT_PATTERNS = ("credential-limit",), force-rotated in the structured-429 guard before the exit-code break branches (load-bearing — a capped interactive run follows real work / non-zero markers, which the cli_did_real_work break would otherwise accept), and again in _is_real_auth_failure ahead of the strong-evidence guard.
  • Tests: test_is_real_auth_failure_force_rotates_on_credential_limit_despite_strong_evidence and test_credential_waterfall_rotates_on_interactive_credential_limit_with_strong_work — the latter asserts on the exact message this PR's _interactive_credential_limit_message emits.

By design the token is dormant until this PR merges. Sequence: merge this → re-pin #1816 to the new pdd main SHA → end-to-end rotation works. credential-limit is a stable control token (not bare English), so it satisfies the "don't rotate on echoed text" invariant the structured-429-only matcher was protecting.

On the meta nit (_run.json deleted, "command": "example"): both are artifacts of the auto-heal's example regeneration, not of testing. The drift-relevant fingerprint — prompt/code/example/test + all include_deps hashes — is fully consistent (verified with the repo's own calculate_prompt_hash/calculate_sha256), so the heal/sync drift check is a no-op. The test run-report (with its coverage figure) gets reconsummated by CI's clean-env pdd test/heal pass; I verified the suite directly here (test_agentic_common*: 495 passed, 0 failed) but deliberately did not hand-write a coverage number into the report rather than fabricate one.

Copy link
Copy Markdown
Contributor

@gltanaka gltanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision: do not merge as-is.

The core fix looks needed. On current main, the weekly/overall cap classifies as credential-limit, but the common session/usage cap strings still classify as None, and a synthetic interactive row for You've hit your session limit · resets ... leaves _auth_state_after_row(...) as None. This PR fixes that behavior and the focused credential-limit tests pass locally.

Required changes before merge:

  1. Reduce the PR to the credential-limit fix and directly related tests/prompt metadata. The appended tests/test_agentic_common.py coverage block from roughly lines 9508-9936 is unrelated to #1389 and should be removed or split into a separate PR. It adds duplicate late imports / sys.path mutation and new pylint issues in that block (reimported / wrong-import-position around 9514, 9522, 9722, 9730, 9731, plus unsubscriptable-object on the optional context assertions around 9614-9618 and 9922). It also makes a small production fix look like a 600-line mixed-change review.

  2. Remove or tightly justify the unrelated generated churn. In particular, the architecture.json steering/interface edits and Unicode normalization are not part of the subscription-cap fast-fail, and deleting .pdd/meta/agentic_common_python_run.json should not be bundled unless the final regenerated metadata still proves the narrowed PR is drift-clean.

  3. Update the PR text so it does not imply end-to-end cloud failover is complete. The companion pdd_cloud PR currently documents a staging blocker where headless interactive mode hangs before producing the MCP reply. This pdd change can still land as a standalone cap-specific fast-fail, but it should not be presented as production-ready cloud rollout until that separate interactive/MCP blocker is fixed and re-smoked.

After those changes, I would re-review the narrowed diff; the actual classifier + _auth_state_after_row behavior is the part worth keeping.

@Serhan-Asad Serhan-Asad marked this pull request as draft June 3, 2026 23:27
@Serhan-Asad Serhan-Asad force-pushed the fix/1389-interactive-credential-limit-fastfail branch from 5f65c8c to 9fbf9e0 Compare June 3, 2026 23:27
@Serhan-Asad
Copy link
Copy Markdown
Collaborator Author

Narrowed the PR to just the cap fast-fail — all three points addressed.

1. Diff reduced to the fix. Now 4 files: pdd/agentic_common.py (classifier + _auth_state_after_row arming), tests/test_agentic_common.py (credential-limit tests only), the prompt sentence documenting the broadened envelope, and the meta fingerprint. The appended test_agentic_common.py coverage block (~9508–9936, with the duplicate late imports / sys.path mutation and the pylint hits) is gone — it was auto-heal test_extend churn, not part of this fix.

2. Unrelated generated churn dropped. architecture.json is reverted to main (no steering/interface edits, no Unicode normalization). The SteerEntry/drain_issue_steers interface is pre-existing main drift the auto-heal swept in — it's public in code but absent from main's architecture.json, and this fix never touched it — so it belongs in a separate drift PR, not here. context/agentic_common_example.py is untouched (== main). .pdd/meta/agentic_common_python_run.json is not deleted (kept as main's); it's stale vs the new tests and gets reconsummated by a clean pdd test — I deliberately did not hand-write a coverage figure.

3. PR text corrected. The body no longer claims end-to-end cloud failover. The companion pdd_cloud PR has the matching rotation code but an open staging blocker (headless interactive hangs before the MCP reply), so this lands as a standalone cap-specific fast-fail, not a cloud rollout.

Two items that are your call:

  • Auto-heal will re-bloat this PR. It fires on every synchronize/ready_for_review, re-syncs this module's architecture.json from the pre-existing steering drift, and runs test_extend. I've put the PR in draft to hold the narrow diff; marking it ready (or any push) triggers one more heal that re-adds that churn unless auto-heal is paused for this PR.
  • The steering-drift split (a separate PR for the architecture.json interface backfill) if you want that drift tracked.

The classifier + _auth_state_after_row behavior is unchanged from what you flagged as worth keeping.

@Serhan-Asad Serhan-Asad marked this pull request as ready for review June 3, 2026 23:55
Copy link
Copy Markdown
Contributor

@gltanaka gltanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current-head decision: still do not merge as-is.

The important parts are better: the architecture.json / example churn is gone, the PR body now accurately scopes this as a standalone cap fast-fail, and the actual classifier + _auth_state_after_row implementation still looks like the right fix. Local verification on 4ca11f357:

  • pytest tests/test_agentic_common.py -k "credential_limit or transient_synthetic" -q → 13 passed
  • pytest tests/test_agentic_common.py -q → 419 passed

The remaining blocker is that auto-heal re-added unrelated generated tests at the end of tests/test_agentic_common.py (# Additional tests appended, around lines 9508-9816). Those tests cover template substitution, control-token detection, classifier LLM fallback, OpenCode parsing, post_final_comment, and JSON extraction, none of which is part of #1389's credential-limit fast-fail. They also reintroduce the late duplicate imports / sys.path mutation pattern around lines 9513-9532 and 9671-9682, plus a missing final newline.

Required change: remove/split that appended test block again, or pause/adjust auto-heal for this PR so it does not keep reintroducing unrelated test_extend output. If generated tests are desired, they should land in a separate PR with that scope called out. This PR should contain only the cap classifier/hook, the focused credential-limit tests, and the matching prompt/meta updates.

Once the diff is back to that focused shape, I think the core fix is mergeable.

@Serhan-Asad Serhan-Asad force-pushed the fix/1389-interactive-credential-limit-fastfail branch from f1bd62e to cb9e3ef Compare June 4, 2026 01:02
@Serhan-Asad Serhan-Asad requested a review from gltanaka June 4, 2026 01:08
@Serhan-Asad
Copy link
Copy Markdown
Collaborator Author

@gltanaka current head cb9e3eff8 is the narrowed 4-file diff again, with the unrelated appended test_extend block removed from tests/test_agentic_common.py.

All current checks on this head are green now, including:

  • Run Unit Tests
  • Public CLI Regression
  • Package Preprocess Smoke
  • auto-heal / heal
  • CodeQL jobs

Re-review when you have a moment.

Copy link
Copy Markdown
Contributor

@gltanaka gltanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved current head cb9e3eff8.

The diff is now focused on the credential-limit fast-fail: pdd/agentic_common.py, the matching prompt sentence, the meta fingerprint, and targeted tests only. The unrelated appended test_extend block, architecture.json churn, example churn, and run-report changes are gone.

Local verification:

  • pytest tests/test_agentic_common.py -k "credential_limit or transient_synthetic" -q → 13 passed
  • pytest tests/test_agentic_common.py -q → 381 passed
  • Simulated merge of current origin/main into this head completed without conflicts; the focused credential-limit subset still passed.

One administrative note: GitHub currently reports this PR as BEHIND, so branch protection may require updating/rebasing onto main and rerunning checks before merge. I do not see a remaining code-review blocker in the current focused diff.

Serhan-Asad and others added 2 commits June 3, 2026 22:15
On main the weekly/overall cap classifies as credential-limit, but the
common 5-hour session and usage cap strings classify as None, and a
synthetic interactive row for "You've hit your session limit · resets …"
leaves _auth_state_after_row(...) as None — so a mid-run cap parks the PTY
until the per-step timeout instead of surfacing a rotation signal.

- Broaden the credential-limit classifier to match the session/usage/
  weekly/monthly cap envelopes; the proximity + time-token guard is
  unchanged, so benign prose ("if you hit your limit, nothing resets
  automatically") still does not classify.
- Arm the interactive fast-fail on credential-limit with a deterministic,
  credential-safe message that itself classifies as credential-limit, so
  the caller treats it as permanent and rotates instead of waiting out the
  step timeout. Never echoes the raw reset time / account text; kept
  distinct from an API-tier 429 so it never lands on the 60s retry floor.
- Document the broadened envelope in the prompt; refresh meta hashes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gltanaka gltanaka force-pushed the fix/1389-interactive-credential-limit-fastfail branch from ed49d19 to f1aab77 Compare June 4, 2026 05:16
@gltanaka gltanaka merged commit dcd3368 into main Jun 4, 2026
9 checks passed
gltanaka pushed a commit that referenced this pull request Jun 6, 2026
#1403) (#1440)

* fix(ci): scope-preserving PR auto-heal — never escalate to test_extend (#1403)

PR auto-heal was re-bloating narrow fix PRs: for a Python module whose
tests pass but coverage is below target, `sync_determine_operation`
returns `test_extend`, and `heal_module` routes verify/generate/test/crash
through `pdd sync`, which re-derives the same coverage gap internally and
appends unrelated generated tests (rewriting `.pdd/meta` command to
`test_extend`). This made narrow PRs non-mergeable (e.g. #1390).

Add a single env-var signal, `PDD_DISABLE_TEST_EXTEND`, set only in PR
auto-heal mode (`not skip_ci`) and enforced at two layers:

- Detection (`sync_determine_operation.test_extend_disabled`): the
  coverage-gap branch returns the existing `all_synced` no-op for all
  languages when the flag is set. Because this function is called by both
  the in-process `detect_drift` and the nested `pdd sync`, one branch
  covers both the detection and execution paths the issue requires.
- Execution backstop (`sync_orchestration`): mirror the existing
  non-Python `test_extend` skip — log `test_extend_skipped`, accept the
  current state, and write no test file.
- `ci_drift_heal.main` sets the flag on `os.environ` only around the
  in-process `detect_drift` call (restored in `finally`, no leak) and
  passes it explicitly in the `pdd sync` subprocess env. Push-to-main
  (`--skip-ci`) is unaffected — whole-module coverage growth still runs.
- `detect_drift` now treats `all_synced` as "no drift" (alongside
  nothing/synced) so the guarded no-op is a clean skip, not an
  "unknown operation" heal failure.

Prompts updated to match (source of truth). Regression tests prove: PR
mode suppresses + propagates the flag, push-to-main keeps test_extend,
the orchestrator never appends tests / writes a `test_extend` fingerprint
when suppressed, and an e2e run proves the parent→child env contract.
The flag is default-off, so unset behavior is byte-for-byte unchanged.

Supersedes #1416 (prompts-only, code never regenerated) and #1432
(tests + partial code; missed the all_synced filter and env restore).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(ci): address review — all_synced skip after reclassification; rename helper (#1403)

Codex review round 1 found a real regression: detect_drift skipped
'all_synced' BEFORE the git-based reclassification, so a PR that changed
code without its prompt (and had a low-coverage passing run_report) was
silently dropped instead of being promoted to 'update'. This also
regressed the pre-existing non-Python all_synced coverage-gap path under
--diff-base.

- detect_drift: move the 'all_synced' no-drift skip to AFTER git
  reclassification, so an all_synced module whose code changed without its
  prompt is still promoted to 'update'; only a still-terminal all_synced
  is dropped.
- Rename helper test_extend_disabled() -> is_test_extend_disabled() so
  pytest does not collect it as a test when imported into a test module,
  and so the name reads as a predicate.

New regression tests:
- detect_drift: all_synced + code-only change -> update (not dropped);
  terminal all_synced -> clean skip (never an unknown-operation failure).
- is_test_extend_disabled truthiness incl. falsey (0/false/off/'') and
  whitespace; unset -> False.
- main(): os.environ flag restored even when detect_drift raises, and a
  pre-existing value is restored exactly (not clobbered).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Test User <test@test.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants