Push-rejection loop guard + SSH transport auto-detect#315
Conversation
…nt ones Background: a patch worktree entered a tight retry loop — 54 cycles of Orchestrator.Start → Session_push_failed → Start in 3.5 minutes — because the remote refused every push and onton had no way to tell a transient race apart from a permanent block, so no counter ever fired needs_intervention. This change adds pure decision logic to (1) classify the porcelain '!' rejection from stderr into a small set of named reasons (workflow-scope, branch-protection, push-pattern, lease, hook) and (2) drive a new push_failure_count counter that flips needs_intervention at ≥3, with an immediate short-circuit to session_fallback = Given_up when the rejection is permanent (workflow-scope etc.). The full reject reason also surfaces in the activity log. Adjacent groundwork while the patch_agent / persistence / orchestrator files were already open: a branch_rebased_onto_sha field and a pure Rebase_decision.upstream helper, so a future worktree.ml integration can use the recorded old-base SHA as the rebase --onto anchor and drop stale-dep commits when a transitive dep squash-merges. Plus a tiny detect_main_freshness_drift predicate in Reconciler for the matching poller integration. Both are pure-side only; the effectful wiring lands in follow-up commits. New pure modules under lib_core/ (each with property tests): - push_reject_classify (14 PRC-* properties) - oauth_scopes (10 OSC-* properties — used by P0-E later) - rebase_decision (5 RD-* properties) Patch_agent: push_failure_count (threshold ≥3) and branch_rebased_onto_sha fields, increment_/reset_ helpers, threshold disjunct in needs_intervention, bulk reset in reset_intervention_state, restore parameter + persistence round-trip. Orchestrator: Session_push_failed now carries Push_reject_classify.rejection option (Some for server rejections, None for transport-level Push_error); apply_session_result increments the counter and, on is_permanent reasons, jumps straight to Given_up. Session_driver / runner_fiber_impl: activity-log line uses short_label, with a follow-up event carrying the captured stderr line on permanent rejections. Properties added or extended: PSF-4..8 (counter + permanent escalation), CV-6 (3-iteration convergence without Human), CV-7 (immediate Given_up on permanent reasons), MFD-1..5 (main-freshness predicate totality and boundaries). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Onton's managed clone at ~/.local/share/onton/<project>/repo was always HTTPS so the GIT_ASKPASS shim in lib/git_env.ml could inject the OAuth token. That structurally forced every push through the per-OAuth-scope restrictions GitHub enforces (workflow scope, etc.), even when the user's own clone of the same repo used SSH and would not have hit those restrictions. This commit makes the managed clone inherit the user's transport. Before cloning, scan a small set of conventional locations ($PWD/.., ~/code-src/, ~/src/, ~/code/, ~/dev/, ~/projects/) for a sibling clone of owner/repo. If any sibling's origin is SSH, pick git@github.com:owner/repo.git for the managed clone; otherwise default to https://github.com/owner/repo.git as before. The resolved transport is announced on stderr at first run and persisted to config.json so subsequent invocations are stable. An explicit override (intended for a future --clone-scheme CLI flag, or anyone editing config.json) wins over auto-detect. When a managed clone already exists on disk, its current origin URL is authoritative — we don't flip transports mid-life. Pure changes (lib_core/github_target): - new url_scheme variant (Https | Ssh) with yojson + show + eq - clone_url is now scheme-aware - new scheme_of_url parser and pure resolve_scheme decision - 9 SSH/scheme property tests added (32 total in the suite) Effectful changes (lib/managed_repo): - sibling_clone_candidates / read_remote_urls / discover_sibling_clones (best-effort, total on failure) - resolve_clone_scheme with the one-line stderr announce - ensure_managed_repo now returns (repo_root, resolved_scheme); the caller persists the resolved value to config.json Schema: - Project_store.stored_config gains url_scheme : string option (None on legacy configs; back-filled on next ensure_managed_repo) - save_config gains an optional ~url_scheme parameter Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a scope a patch in the gameplan will need When a patch modifies .github/workflows/*.yml under a token without the workflow OAuth scope, GitHub silently refuses the push with a clear remote-rejected message. With the push-reject classifier in place we now route those failures straight to needs_intervention, but the user finds out at first push attempt — after the agent has already written the change. This commit adds a startup check that hits /user with the configured token, reads X-OAuth-Scopes, and compares against the scopes the gameplan files would require. Missing scopes emit a clear stderr warning with the suggested gh auth refresh fix. The check is gated on the resolved managed-clone transport: SSH doesn't consult OAuth scopes, so the warning is a false positive there and we skip it entirely. Github.fetch_oauth_scopes_for is a thin wrapper around the existing cohttp-eio plumbing that returns the parsed scope list, plus a public re-export from github.mli that mirrors make's argument shape so the preflight handler doesn't need to thread a Github.t. The pure Oauth_scopes module shipped in the previous commit; this is the effectful side. Scope_preflight.run is intended to be called from bin/main.ml after Managed_repo.ensure_managed_repo returns the resolved transport scheme and before any patch sessions start; that wiring is not part of this commit (bin/main.ml's Cmdliner dependency is currently broken in the dev env), but the public API is in place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an Authentication section to AGENTS.md covering the HTTPS+OAuth default, the workflow-scope requirement, the SSH auto-detect heuristic inherited from a sibling user clone, and the gh auth setup-git tip for running git by hand inside a worktree. Mirrors the same content into README.md as a sub-section under the existing GitHub authentication discussion, with the additional context that SSH bypasses per-OAuth-scope restrictions entirely (which is the actual escape hatch for users who hit workflow-scope errors and don't want to gh auth refresh). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the [branch_rebased_onto_sha] field added earlier into the actual rebase machinery. Two halves: (1) [Worktree.rebase_onto] now takes [?prev_base_sha:string option] and threads it into the fallback path. The default branch (find_old_base returns Ok with cherry-pick-isolated commits) is unchanged. When find_old_base returns Error AND the orchestrator recorded a SHA the previous base resolved to at the last rebase / start, the supervisor now runs [git rebase --onto target sha] instead of the legacy plain [git rebase target]. That drops commits whose content was absorbed into a squash-merge on origin/main (the patch-6 case) — only the patch's own commits in [sha..HEAD] are replayed. When [prev_base_sha] is [None] the behavior is bit-for-bit the legacy plain rebase, so agents whose [branch_rebased_onto_sha] was never recorded keep working exactly as before. An optional argument terminator [()] is added because OCaml can't otherwise erase the optional; mechanical update at every call site. (2) After a successful rebase the runner fiber reads [origin/<new_base>]'s resolved SHA via the new [W.read_branch_sha] helper, and stores it on the agent via [Orchestrator.set_branch_rebased_onto_sha]. The next rebase therefore has the right anchor. [Worktree.S] gains [read_branch_sha] and the dependency-injection test stub gains a [None] implementation. The integration tests in [test/test_rebase_onto.ml] all keep passing — the new code path is inactive (prev_base_sha defaults to None there). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Threads the [url_scheme] resolved by [Managed_repo.ensure_managed_repo] through [repo_coords] / [finalize_run] into [Project_store.save_config] so subsequent runs read the same value from [config.json] rather than re-detecting from sibling clones (which can move). The resume path reads the persisted value via [Managed_repo.url_scheme_of_string]. At the top of [Eio_main.run], invokes [Scope_preflight.run] with the network + clock capabilities, the resolved scheme, and the gameplan. The scheme is re-read from [config.json] at the Eio entry rather than plumbed through [Resolved_config.t] — the persistence happened inside [finalize_run] which ran before [Eio_main.run], so the file is already up-to-date by the time the preflight needs it. Gating on [Github_target.Https] inside [Scope_preflight.run] means SSH clones skip the preflight (the warning would be a false positive there). Effect for users: on first push attempt that would have been rejected with [refusing to allow an OAuth App to create or update workflow …], they now see a clear stderr warning at startup with the suggested [gh auth refresh -h github.com -s workflow] fix. The push-time classifier still routes the failure to needs_intervention if they proceed regardless. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pure [Reconciler.detect_main_freshness_drift] predicate shipped
earlier compares local-main-sha to origin-main-sha. This commit wires
it into the poller loop so that when a squash-merge lands on
origin/main, every agent currently based on main gets a [Rebase]
enqueued automatically — without it, [detect_stale_bases] passes
because [base_branch] and [Graph.initial_base] both read the same
stale branch name and never disagree.
Per loop tick (before the per-patch poll fan-out):
1. [W.fetch_origin] on the managed repo root.
2. Read [main] and [origin/main] SHAs via [W.read_branch_sha].
3. If [detect_main_freshness_drift] fires AND we haven't already
announced this particular [origin_sha] (a [last_main_sha] ref
prevents log + enqueue spam when the next loop iteration sees the
same drift), log the transition and enqueue Rebase for every
agent on main that has a PR, isn't merged, and doesn't already
have a Rebase queued.
[W.fetch_origin] races with per-worktree fetches in the runner fiber on
shared ref storage — pre-existing condition I'm not fixing here.
Best-effort: any "cannot lock ref" defers this check by one
[poll_interval], and the rebase plan executor's [Fetch_origin] step
still runs from the worktree side so the rebase target ends up
fresh.
The fast-forward step from the plan (advance local main to match
origin) is intentionally not done here: rebases use
[origin/<main>] directly via [Worktree_plan.for_rebase], so once the
poller wakes the agents, the rebase targets are correct regardless of
what local main resolves to in the managed-clone HEAD.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughAdds push-rejection classification, scheme-aware managed clones (HTTPS/SSH) with persistence and CLI override, per-patch push-failure tracking and rebase-SHA anchoring, a shared fetch mutex for serialized git fetches, main-branch freshness detection, and corresponding rebase/push flow and tests updates. ChangesPush Rejection Classification & Tests
GitHub target & Managed clone scheme
Fetch mutex and environment wiring
Patch agent state and persistence
Worktree rebase & push integration
Main freshness detection & tests
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Review SummaryThe PR is well-structured with thorough property coverage for all new pure modules and clean separation of pure/effectful code. Three findings:
Variant: convergence-v2 Candidates: 3 | Posted: 3 | Suppressed: 0 3 comments posted · Model: |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
lib/github.ml (1)
535-596: 💤 Low valueConsider extracting common HTTP client setup (optional refactor).
The TLS config and client initialization (lines 552-570) duplicate the same logic in the
requestfunction (lines 473-494). While the duplication is localized and the functions have different purposes (body vs header extraction), extracting a sharedmake_clienthelper would reduce maintenance burden if the setup ever needs updating.Not blocking since the duplication is manageable and the use case is specialized.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/github.ml` around lines 535 - 596, The TLS config and Cohttp_eio client creation are duplicated between fetch_oauth_scopes and the request function; extract a shared helper (e.g. make_client) that calls https_config(), builds the https_fun wrapper and returns a Cohttp_eio.Client.t (or a tuple of client and tls_config if needed), then replace the inline TLS/client setup in fetch_oauth_scopes and in request with calls to make_client so both reuse the same initialization path (refer to functions fetch_oauth_scopes, request, https_config, and https_fun to implement and wire the helper).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib_core/oauth_scopes.ml`:
- Around line 27-35: required_for_files currently returns [Workflow] when
workflows are touched and [] otherwise, omitting the mandatory Repo scope;
update the function required_for_files to always include the Repo scope and add
Workflow only when paths touch .github/workflows (i.e., return [Repo; Workflow]
when touches_workflows is true, otherwise return [Repo]), referencing the
Workflow and Repo variants to locate the change.
In `@test/test_github_target_properties.ml`:
- Around line 409-417: Remove the unnecessary try...with wrappers around calls
to pure/total functions in the QCheck properties: specifically edit
prop_scheme_of_url_total (and the similar property at lines ~457-474) to stop
catching exceptions and instead directly call Github_target.scheme_of_url (and
the other pure function) in the property body, returning the boolean result
(e.g., call the function, ignore its result, and return true) so that any
unexpected raises surface as test failures rather than being swallowed.
---
Nitpick comments:
In `@lib/github.ml`:
- Around line 535-596: The TLS config and Cohttp_eio client creation are
duplicated between fetch_oauth_scopes and the request function; extract a shared
helper (e.g. make_client) that calls https_config(), builds the https_fun
wrapper and returns a Cohttp_eio.Client.t (or a tuple of client and tls_config
if needed), then replace the inline TLS/client setup in fetch_oauth_scopes and
in request with calls to make_client so both reuse the same initialization path
(refer to functions fetch_oauth_scopes, request, https_config, and https_fun to
implement and wire the helper).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7d61a156-3352-408d-bdad-8d5f5ac14db1
📒 Files selected for processing (49)
AGENTS.mdREADME.mdbin/main.mllib/event_log.mllib/github.mllib/github.mlilib/managed_repo.mllib/managed_repo.mlilib/orchestrator.mllib/orchestrator.mlilib/persistence.mllib/poller_fiber.mllib/project_store.mllib/project_store.mlilib/runner_fiber_impl.mllib/scope_preflight.mllib/scope_preflight.mlilib/session_driver.mllib/worktree.mllib/worktree.mlilib/worktree_plan_executor.mllib_core/github_target.mllib_core/github_target.mlilib_core/oauth_scopes.mllib_core/oauth_scopes.mlilib_core/patch_agent.mllib_core/patch_agent.mlilib_core/push_reject_classify.mllib_core/push_reject_classify.mlilib_core/rebase_decision.mllib_core/rebase_decision.mlilib_core/reconciler.mllib_core/reconciler.mlilib_core/worktree_parser.mllib_test/test_generators.mltest/dunetest/test_dependency_injection_functors.mltest/test_github_target_properties.mltest/test_interleaving_properties.mltest/test_oauth_scopes_properties.mltest/test_orchestrator_properties.mltest/test_patch_agent.mltest/test_patch_controller.mltest/test_poll_log_properties.mltest/test_properties.mltest/test_push_reject_classify_properties.mltest/test_rebase_decision_properties.mltest/test_rebase_onto.mltest/test_reconciler_properties.ml
The SSH auto-detect heuristic from the previous commit picks a transport from the surrounding filesystem; sometimes the user wants to force one explicitly (e.g. running onton in a context where the sibling clone is misleading, or pinning HTTPS in a script). This commit exposes a CLI flag. Precedence on each run: CLI --clone-scheme > persisted config.json url_scheme > auto-detect Both the fresh / gameplan path and the resume path honor the override. Invalid values (anything other than [https] or [ssh]) exit with a clear error. After [ensure_managed_repo] returns the resolved scheme, [finalize_run] still persists it to config.json so the override only needs to be passed once. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: the runner fiber created its own fetch_mutex (per-fiber lifetime) and the poller's main-freshness fetch added on the previous commit also created its own. Both serialize their own internal calls but the two mutexes don't see each other — two simultaneous [git fetch origin] on the same managed repo (one from a runner action, one from the poller tick) race on the shared [refs/remotes/origin/*] ref store and the loser gets "cannot lock ref ...: is at X but expected Y". Best-effort tolerance was added in the previous commit; this commit eliminates the race entirely. Promotes [fetch_mutex] into the [Run_env.S] module type so every fiber that includes Run_env (Runner, Poller, Tui, the per-fiber sub-envs for Worktree_setup / Session_driver) sees the same mutex. The mutex is created once at the top of [build_capabilities] in [bin/main.ml], stored on [constructed_capabilities], and threaded through [FIBER_ENV] to every sub-fiber env. Runner_fiber_impl and Poller_fiber both drop their local creation and read [Env.fetch_mutex] instead. The dependency-injection functor test stubs gain a matching field (three real mutexes + one [Obj.magic ()] for a stub that pre-existed that pattern). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: the check ran at the top of every poll loop iteration,
blocking the per-patch poll fan-out behind a [git fetch] for
origin/<main>. A slow fetch (e.g. cold connection, large repo, network
hiccup) delayed every patch's GitHub poll by the same amount.
This commit splits the poller into two parallel fibers via
[Eio.Fiber.both]:
- the existing patch-poll loop (PR state, CI, base updates, etc.)
- a new freshness_loop that runs [check_main_freshness] then sleeps
[poll_interval]
The two share state through [Runtime.update_orchestrator] (used inside
[check_main_freshness] to enqueue Rebase on detected drift) and serialize
their [git fetch] calls through [Env.fetch_mutex] (the shared mutex
added in the previous commit), so there's no ref-store race.
Cadence for the freshness loop is the same [poll_interval] as the
patch loop; could be tuned independently later (probably slower, since
main usually moves less often than per-patch state) but the symmetry
keeps the operational model simple for now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review SummaryThis delta push wires together the previously reviewed pure modules (push-reject classification, SSH transport, OAuth scope preflight, rebase-decision, main-freshness drift) into the effectful fiber layer. The code is generally correct and the property-test coverage is comprehensive. One pre-existing note: the previously-filed outstanding finding at The remaining two previously-filed outstanding comments ( No new must-fix or should-fix issues were found in the changed lines of this push.
Variant: convergence-v2 Candidates: 0 | Posted: 0 | Suppressed: 0 0 comments posted · Model: |
a06aa1a to
94fda74
Compare
Review Summary
Variant: convergence-v2 Candidates: 0 | Posted: 0 | Suppressed: 0 0 comments posted · Model: |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/runner_fiber_impl.ml (1)
909-917:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve the recorded upstream SHA on noop rebases.
Worktree.Noopstill means the branch is already aligned withorigin/<base>. Leavingpost_rebase_shaasNonethere keeps the old/emptybranch_rebased_onto_sha, so the nextprev_base_sha-driven rebase can fall back to the wrong--ontoanchor after squash-merges.Suggested fix
let post_rebase_sha = match rebase_result with - | Worktree.Ok -> + | Worktree.Ok | Worktree.Noop -> W.read_branch_sha ~path:wt_path ~ref_name:("origin/" ^ Branch.to_string new_base) - | Worktree.Noop | Worktree.Conflict _ - | Worktree.Error _ -> + | Worktree.Conflict _ | Worktree.Error _ -> None inAlso applies to: 943-947
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/runner_fiber_impl.ml` around lines 909 - 917, The pattern match that sets post_rebase_sha skips recording the branch SHA when rebase_result is Worktree.Noop, which causes branch_rebased_onto_sha to remain empty and later prev_base_sha-driven rebases to use the wrong --onto anchor; update the match for rebase_result in the post-rebase logic so that both Worktree.Ok and Worktree.Noop call W.read_branch_sha ~path:wt_path ~ref_name:("origin/" ^ Branch.to_string new_base) (i.e., preserve the recorded upstream SHA on noop rebases), and apply the same change to the second occurrence of this pattern (the block referenced around lines 943-947) so branch_rebased_onto_sha is consistently set after noop or ok results.bin/main.ml (1)
1259-1265:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTeach PR-op stripping that
--clone-schemeconsumes the next argv token.
extract_pr_ops_from_argvcurrently treats the token after--clone-schemeas a positional argument.onton PROJECT --clone-scheme ssh +123will fail before Cmdliner gets control.Suggested fix
- (* Keep this in sync with Cmdliner [opt] arguments below: - --gameplan, --token, --backend, --model, --repo, --main-branch, - --poll-interval, and --max-concurrency. *) + (* Keep this in sync with Cmdliner [opt] arguments below: + --gameplan, --token, --backend, --model, --repo, --main-branch, + --poll-interval, --max-concurrency, and --clone-scheme. *) match s with | "--gameplan" | "--token" | "--backend" | "--model" | "--repo" - | "--main-branch" | "--poll-interval" | "--max-concurrency" -> + | "--main-branch" | "--poll-interval" | "--max-concurrency" + | "--clone-scheme" -> true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bin/main.ml` around lines 1259 - 1265, The PR-op extraction logic treats the token after "--clone-scheme" as positional; update extract_pr_ops_from_argv to mark "--clone-scheme" as an option that consumes the next argv token (similar to "--gameplan"/"--token" handling) so the following value isn't mis-parsed as a PR op; locate extract_pr_ops_from_argv and add "--clone-scheme" to the set of recognized consuming options (or implement a branch that skips the next element when encountering "--clone-scheme") so "onton PROJECT --clone-scheme ssh +123" correctly keeps "ssh" as the clone scheme and "+123" as the PR op.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bin/main.ml`:
- Around line 597-611: The resolved clone scheme returned by
Managed_repo.ensure_managed_repo is currently ignored so the code keeps
persisting the old stored Project_store.url_scheme; update the resume path to
capture the effective scheme returned by Managed_repo.ensure_managed_repo and
persist that value into repo_coords.url_scheme (and the stored Project_store)
instead of reusing stored_scheme or rebuild from old config; locate usages
around clone_scheme_override, stored_scheme, effective_scheme and ensure the
value produced by Managed_repo.ensure_managed_repo is threaded into subsequent
logic (including the OAuth preflight decision) and written back to persistent
store.
---
Outside diff comments:
In `@bin/main.ml`:
- Around line 1259-1265: The PR-op extraction logic treats the token after
"--clone-scheme" as positional; update extract_pr_ops_from_argv to mark
"--clone-scheme" as an option that consumes the next argv token (similar to
"--gameplan"/"--token" handling) so the following value isn't mis-parsed as a PR
op; locate extract_pr_ops_from_argv and add "--clone-scheme" to the set of
recognized consuming options (or implement a branch that skips the next element
when encountering "--clone-scheme") so "onton PROJECT --clone-scheme ssh +123"
correctly keeps "ssh" as the clone scheme and "+123" as the PR op.
In `@lib/runner_fiber_impl.ml`:
- Around line 909-917: The pattern match that sets post_rebase_sha skips
recording the branch SHA when rebase_result is Worktree.Noop, which causes
branch_rebased_onto_sha to remain empty and later prev_base_sha-driven rebases
to use the wrong --onto anchor; update the match for rebase_result in the
post-rebase logic so that both Worktree.Ok and Worktree.Noop call
W.read_branch_sha ~path:wt_path ~ref_name:("origin/" ^ Branch.to_string
new_base) (i.e., preserve the recorded upstream SHA on noop rebases), and apply
the same change to the second occurrence of this pattern (the block referenced
around lines 943-947) so branch_rebased_onto_sha is consistently set after noop
or ok results.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e2cfd56b-44f5-43fb-8985-7c058cf1c8d3
📒 Files selected for processing (6)
bin/main.mllib/poller_fiber.mllib/run_env.mllib/run_env.mlilib/runner_fiber_impl.mltest/test_dependency_injection_functors.ml
Review SummaryOne new issue in this push:
Variant: convergence-v2 Candidates: 1 | Posted: 1 | Suppressed: 0 1 comment posted · Model: |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/main.ml (1)
1236-1247:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRegister
--clone-schemeas a value-taking option in argv preprocessing.
extract_pr_ops_from_argvonly preserves separate option values for names listed incli_option_takes_value. Because--clone-schemeis missing there,onton PROJECT --clone-scheme sshtreatssshas a stray positional argument instead of the option value.Suggested fix
- (* Keep this in sync with Cmdliner [opt] arguments below: - --gameplan, --token, --backend, --model, --repo, --main-branch, - --poll-interval, and --max-concurrency. *) + (* Keep this in sync with Cmdliner [opt] arguments below: + --gameplan, --token, --backend, --model, --repo, --main-branch, + --clone-scheme, --poll-interval, and --max-concurrency. *) match s with | "--gameplan" | "--token" | "--backend" | "--model" | "--repo" - | "--main-branch" | "--poll-interval" | "--max-concurrency" -> + | "--main-branch" | "--clone-scheme" | "--poll-interval" + | "--max-concurrency" -> trueAlso applies to: 1351-1364
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bin/main.ml` around lines 1236 - 1247, The argv preprocessing function cli_option_takes_value currently omits "--clone-scheme", causing extract_pr_ops_from_argv to treat its value as a positional argument; update cli_option_takes_value to include the "--clone-scheme" option in the match arm that returns true (the same arm that contains "--gameplan", "--token", "--backend", "--model", "--repo", "--main-branch", "--poll-interval", and "--max-concurrency") so that "--clone-scheme <value>" is preserved as a single option/value pair during argv parsing.
♻️ Duplicate comments (1)
bin/main.ml (1)
596-633:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersist the scheme returned by
ensure_managed_repoon resume.Line 612 discards the resolved scheme, and Lines 629-632 rebuild
repo_coords.url_schemefrom the old stored value. Resume can therefore refresh the checkout via SSH or--clone-scheme, then immediately re-save stale config and flip back on the next run.Suggested fix
- match - Managed_repo.ensure_managed_repo - ~clone_scheme:effective_scheme ~project_name:proj - ~token ~owner ~repo () - with - | Ok _ -> () - | Error msg -> - Printf.eprintf - "onton: warning: %s (resuming with local state)\n\ - %!" - msg); + let resolved_scheme = + match + Managed_repo.ensure_managed_repo + ~clone_scheme:effective_scheme ~project_name:proj + ~token ~owner ~repo () + with + | Ok (_repo_root, scheme) -> Some scheme + | Error msg -> + Printf.eprintf + "onton: warning: %s (resuming with local state)\n\ + %!" + msg; + stored_scheme + in @@ - url_scheme = - Managed_repo.url_scheme_of_string - (Option.value stored.Project_store.url_scheme - ~default:""); + url_scheme = resolved_scheme;Based on learnings Onton auto-detects SSH transport when a sibling clone of
owner/repoexists and persists the managed clone transport asurl_schemeinconfig.json.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bin/main.ml` around lines 596 - 633, The call to Managed_repo.ensure_managed_repo currently discards the resolved clone scheme (match ... | Ok _ -> ()), then later repo_coords.url_scheme is rebuilt from the old stored.Project_store.url_scheme, so any auto-detected or --clone-scheme resolution is not persisted; change the ensure_managed_repo match to capture the returned scheme (e.g. | Ok resolved_scheme -> ...), use that resolved_scheme when constructing repo_coords.url_scheme (instead of calling Managed_repo.url_scheme_of_string on stored.Project_store.url_scheme), and update/persist stored.Project_store.url_scheme with the resolved value before continuing so resume saves the effective transport (reference symbols: Managed_repo.ensure_managed_repo, repo_coords.url_scheme, stored.Project_store.url_scheme).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@bin/main.ml`:
- Around line 1236-1247: The argv preprocessing function cli_option_takes_value
currently omits "--clone-scheme", causing extract_pr_ops_from_argv to treat its
value as a positional argument; update cli_option_takes_value to include the
"--clone-scheme" option in the match arm that returns true (the same arm that
contains "--gameplan", "--token", "--backend", "--model", "--repo",
"--main-branch", "--poll-interval", and "--max-concurrency") so that
"--clone-scheme <value>" is preserved as a single option/value pair during argv
parsing.
---
Duplicate comments:
In `@bin/main.ml`:
- Around line 596-633: The call to Managed_repo.ensure_managed_repo currently
discards the resolved clone scheme (match ... | Ok _ -> ()), then later
repo_coords.url_scheme is rebuilt from the old stored.Project_store.url_scheme,
so any auto-detected or --clone-scheme resolution is not persisted; change the
ensure_managed_repo match to capture the returned scheme (e.g. | Ok
resolved_scheme -> ...), use that resolved_scheme when constructing
repo_coords.url_scheme (instead of calling Managed_repo.url_scheme_of_string on
stored.Project_store.url_scheme), and update/persist
stored.Project_store.url_scheme with the resolved value before continuing so
resume saves the effective transport (reference symbols:
Managed_repo.ensure_managed_repo, repo_coords.url_scheme,
stored.Project_store.url_scheme).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 94bc132f-035a-4f7c-90f5-6465d5358464
📒 Files selected for processing (6)
bin/main.mllib/orchestrator.mllib_core/push_reject_classify.mltest/dunetest/test_github_target_properties.mltest/test_properties.ml
💤 Files with no reviewable changes (1)
- test/dune
The sibling scan only worked when the user happened to have another GitHub clone in one of a handful of conventional directories, and could pick SSH even on machines where SSH to GitHub didn't actually work (no key, blocked port 22, passphrase + no agent). Swap it for a non-interactive [git ls-remote -h git@github.com:owner/repo.git] with [BatchMode=yes] and [ConnectTimeout=5] — exit 0 means SSH can authenticate to GitHub and reach this repo, which is the same condition SSH push needs. The resolved scheme is still persisted to config.json so the probe only runs on the first invocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review SummaryOne new
Variant: convergence-v2 Candidates: 1 | Posted: 1 | Suppressed: 0 1 comment posted · Model: |
…fiber Three findings from the review of 116df0a, all confirmed against current code: 1. bin/main.ml resume path discarded the scheme returned by [ensure_managed_repo] (matched [Ok _]) and rebuilt [repo_coords.url_scheme] from [stored.url_scheme], so a CLI override or a probe-determined scheme on a legacy config never reached [finalize_run]'s [save_config] call. Capture the resolved scheme and thread it through. 2. [Managed_repo.ensure_managed_repo] called [resolve_clone_scheme] unconditionally before the [has_git_dir] check. On a resume with a pre-existing clone (the common case), this ran a 5-second SSH reachability probe whose result was then discarded in favor of the on-disk [origin] URL, and printed a misleading "cloning managed repo via X" line even though no clone was performed. Defer the probe into the two clone branches so it only runs when we're actually about to clone. 3. [Poller_fiber.freshness_loop] fired [check_main_freshness] at t=0 before sleeping, duplicating the fetch that [ensure_managed_repo] already ran at startup. Sleep first to match the main poll loop's cadence and avoid the redundant fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| (`git@github.com:owner/repo.git`), the managed clone at | ||
| `~/.local/share/onton/<project>/repo` is created with the same SSH | ||
| transport, and pushes go through the user's ssh-agent / `~/.ssh/config`. | ||
| SSH bypasses the per-OAuth-scope restrictions entirely. The chosen |
There was a problem hiding this comment.
[SHOULD FIX] AGENTS.md:120 — The SSH section in the new ## Authentication block still describes the old sibling-clone heuristic (commit #2, which was superseded by commit #12). It says onton "auto-detects whether the user already maintains a clone of owner/repo under one of $PWD/.., ~/code-src/, …" and references the log message onton: detected SSH sibling clone at … — cloning managed repo via SSH.
The actual implementation introduced in this PR (resolve_clone_scheme / probe_ssh_available in lib/managed_repo.ml) does no sibling-clone scanning. It runs a non-interactive git ls-remote -h git@github.com:owner/repo.git probe and emits:
onton: SSH probe to git@github.com succeeded — cloning managed repo via SSHonton: SSH probe to git@github.com unavailable — cloning managed repo via HTTPS
A user following these docs will look for a sibling clone that doesn't exist, and the log messages won't match. The section needs to describe the probe-based detection and the actual emitted messages.
Reviewed at 2173002
| workflow … without 'workflow' scope` message — and onton now classifies that | ||
| as `workflow_scope_missing` and routes the agent straight to | ||
| `needs_intervention` instead of retrying. | ||
|
|
There was a problem hiding this comment.
[SHOULD FIX] README.md:200 — Same stale-docs issue as AGENTS.md: the #### SSH transport section added in this PR still documents the old sibling-clone heuristic ($PWD/.., ~/code-src/, ~/src/, ~/code/, ~/dev/, ~/projects/, and onton: detected SSH sibling clone at …). The implementation shipped in this PR is probe-based and emits onton: SSH probe to git@github.com succeeded — cloning managed repo via SSH — no sibling-clone directory scanning is involved. The section should be updated to reflect the actual probe mechanism, its fallback behaviour, and the correct log messages.
Reviewed at 2173002
Review SummaryDelta review (round 5). Two
No previously-raised findings are re-raised. The
Variant: convergence-v2 Candidates: 2 | Posted: 2 | Suppressed: 0 2 comments posted · Model: |
--force-with-lease alone only verifies that the remote tracking ref matches actual remote — once a background fetch refreshes that tracking ref, the lease passes even when local is strictly behind, silently wiping remote commits. PR #315 was force-pushed empty and auto-closed this way. --force-if-includes additionally requires that the remote tip is in the local branch's reflog, which refuses pushes that would lose commits the local clone never observed. Requires git ≥ 2.30 (already a prerequisite for the worktree features onton relies on). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Worktree.create] used to take the local branch ref as authoritative when it existed in the user's working clone. The user's [refs/heads/<branch>] could be stale (left at the PR's base, behind the real PR head), so the worktree would start missing every PR commit — exactly the state that wiped PR #315 when the session pushed it. Split the decision into a pure module [Start_point_plan] in lib_core/ that takes observed local/remote-ref SHAs and a precomputed two-way ancestry, returns one of [Reset_and_use_remote_tracking], [Use_local_branch_unchanged], [Create_new_branch_from_base], or a [refusal] for the diverged/local-ahead cases. The effectful side in [Worktree.create] gathers the inputs, calls the planner, executes the chosen action via [git worktree add -B/-b]/[add], and returns [(t, refusal) Result.t]. [Worktree_setup.ensure_worktree] runs a new [fetch_origin_branch] before [create] so the planner observes a fresh remote ref. Property tests (SPP-1..9) cover totality, determinism, the "no-silent-clobber" invariant, remote authority, label bounds, and exhaustive variant reachability. An integration test exercises every arm against real git fixtures, including a direct reproduction of the #315 stale-local-ref state. Refusals are logged through the existing activity-log channel and return [None] from [ensure_worktree], so the next scheduling tick sees the agent idle. Future commits will route refusals through [needs_intervention] explicitly via [Push_reject_classify]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[Worktree.force_push_with_lease] used to gate on [git rev-list --count <base>..HEAD] (worktree HEAD) and then push the named branch — so an agent that [git switch]ed to a different local branch mid-session could pass the gate with commits the push command would never upload. Combined with --force-with-lease's blind spot for local commit loss (the lease passes whenever the local tracking ref matches actual remote, even if local strictly lacks remote commits), this put PR #315 in the position where a push of a stale branch wiped its remote. New pure module [Push_plan] in lib_core/ takes the observed worktree HEAD, branch ref SHA, remote tracking SHA, two-way ancestry, and [base..refs/heads/<branch>] commit count, and returns [Push (Force_push_if_includes | Initial_push)] or [Refuse (No_commits_ahead_of_base | Worktree_missing | Branch_ref_missing | Branch_switched | Local_missing_remote_commits)]. The named branch — never HEAD — is what gets measured and pushed. [Push_reject_classify.rejection] gains a [Local_state_unsafe] variant (permanent), and [Push_plan.to_push_reject_classify_rejection] maps the three "supervisor said no" refusals onto it so the orchestrator's existing [is_permanent] → [needs_intervention] escalation applies without new state. Property tests (PPP-1..11) cover totality, branch-switch refusal, local-missing-remote refusal, zero-commits skip, initial-push, happy-path, worktree-missing pre-emption, determinism, exhaustive variant reachability, refusal-to-rejection permanence, and label bounds. PRC tests extend their existing variant scans for Local_state_unsafe. An integration test drives real-git fixtures for the branch-switched and local-missing-remote refusal arms plus the happy-path force-push, asserting nothing reaches the remote when the planner refuses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#318) * Add --force-if-includes to session-end force-push --force-with-lease alone only verifies that the remote tracking ref matches actual remote — once a background fetch refreshes that tracking ref, the lease passes even when local is strictly behind, silently wiping remote commits. PR #315 was force-pushed empty and auto-closed this way. --force-if-includes additionally requires that the remote tip is in the local branch's reflog, which refuses pushes that would lose commits the local clone never observed. Requires git ≥ 2.30 (already a prerequisite for the worktree features onton relies on). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Plan worktree start point against fresh remote refs [Worktree.create] used to take the local branch ref as authoritative when it existed in the user's working clone. The user's [refs/heads/<branch>] could be stale (left at the PR's base, behind the real PR head), so the worktree would start missing every PR commit — exactly the state that wiped PR #315 when the session pushed it. Split the decision into a pure module [Start_point_plan] in lib_core/ that takes observed local/remote-ref SHAs and a precomputed two-way ancestry, returns one of [Reset_and_use_remote_tracking], [Use_local_branch_unchanged], [Create_new_branch_from_base], or a [refusal] for the diverged/local-ahead cases. The effectful side in [Worktree.create] gathers the inputs, calls the planner, executes the chosen action via [git worktree add -B/-b]/[add], and returns [(t, refusal) Result.t]. [Worktree_setup.ensure_worktree] runs a new [fetch_origin_branch] before [create] so the planner observes a fresh remote ref. Property tests (SPP-1..9) cover totality, determinism, the "no-silent-clobber" invariant, remote authority, label bounds, and exhaustive variant reachability. An integration test exercises every arm against real git fixtures, including a direct reproduction of the #315 stale-local-ref state. Refusals are logged through the existing activity-log channel and return [None] from [ensure_worktree], so the next scheduling tick sees the agent idle. Future commits will route refusals through [needs_intervention] explicitly via [Push_reject_classify]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Plan pushes against worktree state, not just lease [Worktree.force_push_with_lease] used to gate on [git rev-list --count <base>..HEAD] (worktree HEAD) and then push the named branch — so an agent that [git switch]ed to a different local branch mid-session could pass the gate with commits the push command would never upload. Combined with --force-with-lease's blind spot for local commit loss (the lease passes whenever the local tracking ref matches actual remote, even if local strictly lacks remote commits), this put PR #315 in the position where a push of a stale branch wiped its remote. New pure module [Push_plan] in lib_core/ takes the observed worktree HEAD, branch ref SHA, remote tracking SHA, two-way ancestry, and [base..refs/heads/<branch>] commit count, and returns [Push (Force_push_if_includes | Initial_push)] or [Refuse (No_commits_ahead_of_base | Worktree_missing | Branch_ref_missing | Branch_switched | Local_missing_remote_commits)]. The named branch — never HEAD — is what gets measured and pushed. [Push_reject_classify.rejection] gains a [Local_state_unsafe] variant (permanent), and [Push_plan.to_push_reject_classify_rejection] maps the three "supervisor said no" refusals onto it so the orchestrator's existing [is_permanent] → [needs_intervention] escalation applies without new state. Property tests (PPP-1..11) cover totality, branch-switch refusal, local-missing-remote refusal, zero-commits skip, initial-push, happy-path, worktree-missing pre-emption, determinism, exhaustive variant reachability, refusal-to-rejection permanence, and label bounds. PRC tests extend their existing variant scans for Local_state_unsafe. An integration test drives real-git fixtures for the branch-switched and local-missing-remote refusal arms plus the happy-path force-push, asserting nothing reaches the remote when the planner refuses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address PR 318 review comments * Fix local-missing-remote integration fixture * Distinguish refused worktree setup * Route repo-root worktree refusal permanently --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
A patch worktree entered a 54-cycle tight retry loop (
Orchestrator.Start → Session_push_failed → Startevery few seconds) because GitHub refused every push and onton had no way to tell a transient race apart from a permanent block. Root cause: the patch modified.github/workflows/ci.ymlunder an OAuth token without theworkflowscope, and onton's managed clone always used HTTPS+OAuth, structurally forcing the rejection.This PR addresses the incident at two levels:
Push_reject_classifyturns the porcelain!reject into a named reason (workflow scope / branch protection / push pattern / lease / hook). Newpush_failure_countcounter onPatch_agentflipsneeds_interventionat ≥3; a permanent reason short-circuits straight toGiven_up. Activity log now reports the real reason and a follow-up detail line on permanent failures.Managed_repo.probe_ssh_availableruns a non-interactivegit ls-remote -h git@github.com:<owner>/<repo>.gitwithGIT_SSH_COMMAND="ssh -o BatchMode=yes -o ConnectTimeout=5". Exit 0 means SSH can authenticate to GitHub and reach this repo — the same condition SSH push needs. SSH wins on success (bypassing per-OAuth-scope restrictions entirely); HTTPS is the fallback. The choice is announced on stderr and persisted toconfig.json, so the probe only runs on first invocation. A--clone-scheme=https|sshCLI flag forces a transport and skips the probe.Plus three adjacent correctness fixes:
Reconciler.detect_main_freshness_drift+ poller wiring: whenorigin/mainadvances past local main (e.g. a sibling patch squash-merges), every agent on main gets aRebaseenqueued. Without this,detect_stale_basessilently passes because both sides read the same stale branch name. The main-freshness check runs in its own fiber.Patch_agent.branch_rebased_onto_sha+ pureRebase_decision.upstream+ wiring inWorktree.rebase_onto: whenfind_old_basefalls back to plain rebase, use the SHA the previous base resolved to at the last rebase as the--ontoanchor instead. Drops commits whose content was absorbed into a squash-merge on origin.Env.fetch_mutexacross runner + poller fibers, so concurrentgit fetchcalls can't race on the ref lock.Plus authentication docs in
AGENTS.mdandREADME.md.What changed (by commit)
Push-failure loop guard: classify rejections and escalate the permanent onesAuto-detect SSH transport for managed clones from sibling user clones(superseded by CodeRabbit Generated Unit Tests: Add unit tests for changes #12 — see below)OAuth scope pre-flight: warn at startup when the configured token lacks a scope a patch in the gameplan will need(superseded by [onton-port] Patch 15: Terminal UI #11 — see below)Document the authentication model and SSH transport optionUse the recorded prev-base SHA as the rebase --onto anchorWire scope_preflight and persist the resolved transport scheme(scope_preflight wiring superseded by [onton-port] Patch 15: Terminal UI #11; persistence kept)Poller: detect when origin/main advances and enqueue rebasesAdd --clone-scheme=https|ssh flag to override the SSH auto-detectShare fetch_mutex globally across runner + poller fibersRun main-freshness check in its own fiberAddress PR 315 review comments(also drops the startup OAuth-scope preflight — see below)Replace sibling-clone heuristic with an SSH reachability probeNotes for reviewers
/userscope probe at startup. After review, that was dropped (commit [onton-port] Patch 15: Terminal UI #11): the real fix for the workflow-scope failure is having a working SSH transport, and the preflight added complexity without a documented in-the-wild failure mode where SSH wasn't the right answer.Property tests
New pure modules each ship with their own QCheck2 suite:
Push_reject_classify: 14 PRC-* (totality, recognizers per fingerprint, permanence ↔ variant, label / excerpt bounds)Rebase_decision: 5 RD-* (totality, Some-wins, None-falls-back, empty-SHA-falls-back, idempotence on well-formed input)Github_target(SSH/scheme): 32 total — SSHclone_urlround-trip,scheme_of_urltotality + recognizers,resolve_scheme ~override ~ssh_available(override wins, true→Ssh, false→Https, total)Reconciler(main-freshness): 5 MFD-* (totality, equal-SHA silent, distinct-SHA fires, empty-side silent, whitespace stripping)Extensions to existing suites:
test_properties.ml— PSF-4..8 (push_failure_count increment / reset / threshold, permanent-rejection immediate Given_up, transient-rejection non-escalation)test_interleaving_properties.ml— CV-6 (3-iteration convergence without Human-in-queue), CV-7 (1-iteration convergence on permanent rejection)All tests pass:
dune buildclean,dune runtestexit 0,dune fmtclean.Test plan
.github/workflows/*.yml, run onton with an OAuth token lackingworkflowscope, on a machine without SSH access to the repo. Expected: HTTPS clone (probe falls back), push fails with activity-log linerunner: push rejected after session — workflow_scope_missing+ follow-up detail;push_failure_countflips toGiven_upon the first event; agent appears in TUI intervention list.onton: SSH probe to git@github.com succeeded — cloning managed repo via SSH;config.jsonrecordsurl_scheme: "ssh"; the same workflow-touching patch pushes successfully (SSH bypasses OAuth scopes).onton: SSH probe to git@github.com unavailable — cloning managed repo via HTTPS;config.jsonrecordsurl_scheme: "https"; first run takes ~5s longer (probe timeout) but subsequent runs skip the probe.--clone-scheme=httpsand--clone-scheme=sshoverride the probe; persistedconfig.jsonvalue also acts as override on subsequent runs.branch_rebased_onto_shawas recorded at the previous rebase.origin/mainadvance triggers rebases: with two patches based on main, squash-merge one externally, watch the poller's next tick logorigin/main advanced to <sha>and enqueue a Rebase for the other.🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Documentation