Skip to content

Push-rejection loop guard + SSH transport auto-detect#315

Merged
subsetpark merged 13 commits into
mainfrom
push-reject-classify-and-ssh-auto-detect
May 24, 2026
Merged

Push-rejection loop guard + SSH transport auto-detect#315
subsetpark merged 13 commits into
mainfrom
push-reject-classify-and-ssh-auto-detect

Conversation

@subsetpark
Copy link
Copy Markdown
Collaborator

@subsetpark subsetpark commented May 23, 2026

Summary

A patch worktree entered a 54-cycle tight retry loop (Orchestrator.Start → Session_push_failed → Start every few seconds) because GitHub refused every push and onton had no way to tell a transient race apart from a permanent block. Root cause: the patch modified .github/workflows/ci.yml under an OAuth token without the workflow scope, and onton's managed clone always used HTTPS+OAuth, structurally forcing the rejection.

This PR addresses the incident at two levels:

  1. Classify push rejections and stop looping on permanent ones. New pure module Push_reject_classify turns the porcelain ! reject into a named reason (workflow scope / branch protection / push pattern / lease / hook). New push_failure_count counter on Patch_agent flips needs_intervention at ≥3; a permanent reason short-circuits straight to Given_up. Activity log now reports the real reason and a follow-up detail line on permanent failures.
  2. Auto-detect SSH transport via a reachability probe. Managed_repo.probe_ssh_available runs a non-interactive git ls-remote -h git@github.com:<owner>/<repo>.git with GIT_SSH_COMMAND="ssh -o BatchMode=yes -o ConnectTimeout=5". Exit 0 means SSH can authenticate to GitHub and reach this repo — the same condition SSH push needs. SSH wins on success (bypassing per-OAuth-scope restrictions entirely); HTTPS is the fallback. The choice is announced on stderr and persisted to config.json, so the probe only runs on first invocation. A --clone-scheme=https|ssh CLI flag forces a transport and skips the probe.

Plus three adjacent correctness fixes:

  • Reconciler.detect_main_freshness_drift + poller wiring: when origin/main advances past local main (e.g. a sibling patch squash-merges), every agent on main gets a Rebase enqueued. Without this, detect_stale_bases silently passes because both sides read the same stale branch name. The main-freshness check runs in its own fiber.
  • Patch_agent.branch_rebased_onto_sha + pure Rebase_decision.upstream + wiring in Worktree.rebase_onto: when find_old_base falls back to plain rebase, use the SHA the previous base resolved to at the last rebase as the --onto anchor instead. Drops commits whose content was absorbed into a squash-merge on origin.
  • Shared Env.fetch_mutex across runner + poller fibers, so concurrent git fetch calls can't race on the ref lock.

Plus authentication docs in AGENTS.md and README.md.

What changed (by commit)

  1. Push-failure loop guard: classify rejections and escalate the permanent ones
  2. Auto-detect SSH transport for managed clones from sibling user clones (superseded by CodeRabbit Generated Unit Tests: Add unit tests for changes #12 — see below)
  3. OAuth scope pre-flight: warn at startup when the configured token lacks a scope a patch in the gameplan will need (superseded by [onton-port] Patch 15: Terminal UI #11 — see below)
  4. Document the authentication model and SSH transport option
  5. Use the recorded prev-base SHA as the rebase --onto anchor
  6. Wire scope_preflight and persist the resolved transport scheme (scope_preflight wiring superseded by [onton-port] Patch 15: Terminal UI #11; persistence kept)
  7. Poller: detect when origin/main advances and enqueue rebases
  8. Add --clone-scheme=https|ssh flag to override the SSH auto-detect
  9. Share fetch_mutex globally across runner + poller fibers
  10. Run main-freshness check in its own fiber
  11. Address PR 315 review comments (also drops the startup OAuth-scope preflight — see below)
  12. Replace sibling-clone heuristic with an SSH reachability probe

Notes for reviewers

  • OAuth scope preflight removed. An earlier revision of this PR included a /user scope probe at startup. After review, that was dropped (commit [onton-port] Patch 15: Terminal UI #11): the real fix for the workflow-scope failure is having a working SSH transport, and the preflight added complexity without a documented in-the-wild failure mode where SSH wasn't the right answer.
  • Sibling-clone scanner replaced. The first SSH auto-detect implementation ([onton-port] Patch 1: Core types #2) scanned conventional directories for an existing user clone and inherited its transport. That was brittle (only worked when a sibling happened to exist) and unreliable (a sibling using SSH didn't guarantee SSH worked on this machine for this repo). Commit CodeRabbit Generated Unit Tests: Add unit tests for changes #12 replaces it with a real reachability probe.

Property tests

New pure modules each ship with their own QCheck2 suite:

  • Push_reject_classify: 14 PRC-* (totality, recognizers per fingerprint, permanence ↔ variant, label / excerpt bounds)
  • Rebase_decision: 5 RD-* (totality, Some-wins, None-falls-back, empty-SHA-falls-back, idempotence on well-formed input)
  • Github_target (SSH/scheme): 32 total — SSH clone_url round-trip, scheme_of_url totality + recognizers, resolve_scheme ~override ~ssh_available (override wins, true→Ssh, false→Https, total)
  • Reconciler (main-freshness): 5 MFD-* (totality, equal-SHA silent, distinct-SHA fires, empty-side silent, whitespace stripping)

Extensions to existing suites:

  • test_properties.ml — PSF-4..8 (push_failure_count increment / reset / threshold, permanent-rejection immediate Given_up, transient-rejection non-escalation)
  • test_interleaving_properties.ml — CV-6 (3-iteration convergence without Human-in-queue), CV-7 (1-iteration convergence on permanent rejection)

All tests pass: dune build clean, dune runtest exit 0, dune fmt clean.

Test plan

  • Re-trigger the incident: in a fresh project where the gameplan declares a patch that touches .github/workflows/*.yml, run onton with an OAuth token lacking workflow scope, on a machine without SSH access to the repo. Expected: HTTPS clone (probe falls back), push fails with activity-log line runner: push rejected after session — workflow_scope_missing + follow-up detail; push_failure_count flips to Given_up on the first event; agent appears in TUI intervention list.
  • SSH probe — happy path: on a machine with an SSH key authorized for the repo, start onton fresh. Expected: stderr onton: SSH probe to git@github.com succeeded — cloning managed repo via SSH; config.json records url_scheme: "ssh"; the same workflow-touching patch pushes successfully (SSH bypasses OAuth scopes).
  • SSH probe — fallback: on a machine without SSH set up, start onton fresh. Expected: stderr onton: SSH probe to git@github.com unavailable — cloning managed repo via HTTPS; config.json records url_scheme: "https"; first run takes ~5s longer (probe timeout) but subsequent runs skip the probe.
  • --clone-scheme=https and --clone-scheme=ssh override the probe; persisted config.json value also acts as override on subsequent runs.
  • Squash-merge a dep on origin: verify the dependent rebase replays only the dependent's own commits (not the merged dep's originals) when branch_rebased_onto_sha was recorded at the previous rebase.
  • Confirm origin/main advance triggers rebases: with two patches based on main, squash-merge one externally, watch the poller's next tick log origin/main advanced to <sha> and enqueue a Rebase for the other.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added SSH transport support for managed repository clones with automatic detection and persistence across runs.
    • Implemented push rejection classification to better identify and handle server-side failures.
    • Added automatic main branch freshness detection to trigger rebases when upstream advances.
    • Enhanced push failure tracking to escalate intervention after repeated failures.
  • Documentation

    • Added comprehensive authentication guide covering HTTPS+OAuth and SSH transport options.
    • Documented SSH configuration and credential handling for git operations.

subsetpark and others added 7 commits May 23, 2026 15:15
…nt ones

Background: a patch worktree entered a tight retry loop — 54 cycles of
Orchestrator.Start → Session_push_failed → Start in 3.5 minutes — because
the remote refused every push and onton had no way to tell a transient
race apart from a permanent block, so no counter ever fired
needs_intervention.

This change adds pure decision logic to (1) classify the porcelain '!'
rejection from stderr into a small set of named reasons (workflow-scope,
branch-protection, push-pattern, lease, hook) and (2) drive a new
push_failure_count counter that flips needs_intervention at ≥3, with an
immediate short-circuit to session_fallback = Given_up when the
rejection is permanent (workflow-scope etc.). The full reject reason
also surfaces in the activity log.

Adjacent groundwork while the patch_agent / persistence / orchestrator
files were already open: a branch_rebased_onto_sha field and a pure
Rebase_decision.upstream helper, so a future worktree.ml integration can
use the recorded old-base SHA as the rebase --onto anchor and drop
stale-dep commits when a transitive dep squash-merges. Plus a tiny
detect_main_freshness_drift predicate in Reconciler for the matching
poller integration. Both are pure-side only; the effectful wiring lands
in follow-up commits.

New pure modules under lib_core/ (each with property tests):
  - push_reject_classify  (14 PRC-* properties)
  - oauth_scopes          (10 OSC-* properties — used by P0-E later)
  - rebase_decision       (5 RD-* properties)

Patch_agent: push_failure_count (threshold ≥3) and branch_rebased_onto_sha
fields, increment_/reset_ helpers, threshold disjunct in
needs_intervention, bulk reset in reset_intervention_state, restore
parameter + persistence round-trip.

Orchestrator: Session_push_failed now carries
Push_reject_classify.rejection option (Some for server rejections, None
for transport-level Push_error); apply_session_result increments the
counter and, on is_permanent reasons, jumps straight to Given_up.

Session_driver / runner_fiber_impl: activity-log line uses
short_label, with a follow-up event carrying the captured stderr line
on permanent rejections.

Properties added or extended: PSF-4..8 (counter + permanent escalation),
CV-6 (3-iteration convergence without Human), CV-7 (immediate Given_up
on permanent reasons), MFD-1..5 (main-freshness predicate totality and
boundaries).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Onton's managed clone at ~/.local/share/onton/<project>/repo was always
HTTPS so the GIT_ASKPASS shim in lib/git_env.ml could inject the OAuth
token. That structurally forced every push through the per-OAuth-scope
restrictions GitHub enforces (workflow scope, etc.), even when the
user's own clone of the same repo used SSH and would not have hit
those restrictions.

This commit makes the managed clone inherit the user's transport.
Before cloning, scan a small set of conventional locations
($PWD/.., ~/code-src/, ~/src/, ~/code/, ~/dev/, ~/projects/) for a
sibling clone of owner/repo. If any sibling's origin is SSH, pick
git@github.com:owner/repo.git for the managed clone; otherwise default
to https://github.com/owner/repo.git as before. The resolved transport
is announced on stderr at first run and persisted to config.json so
subsequent invocations are stable. An explicit override (intended for a
future --clone-scheme CLI flag, or anyone editing config.json) wins
over auto-detect.

When a managed clone already exists on disk, its current origin URL is
authoritative — we don't flip transports mid-life.

Pure changes (lib_core/github_target):
  - new url_scheme variant (Https | Ssh) with yojson + show + eq
  - clone_url is now scheme-aware
  - new scheme_of_url parser and pure resolve_scheme decision
  - 9 SSH/scheme property tests added (32 total in the suite)

Effectful changes (lib/managed_repo):
  - sibling_clone_candidates / read_remote_urls /
    discover_sibling_clones (best-effort, total on failure)
  - resolve_clone_scheme with the one-line stderr announce
  - ensure_managed_repo now returns (repo_root, resolved_scheme); the
    caller persists the resolved value to config.json

Schema:
  - Project_store.stored_config gains url_scheme : string option
    (None on legacy configs; back-filled on next ensure_managed_repo)
  - save_config gains an optional ~url_scheme parameter

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a scope a patch in the gameplan will need

When a patch modifies .github/workflows/*.yml under a token without the
workflow OAuth scope, GitHub silently refuses the push with a clear
remote-rejected message. With the push-reject classifier in place we
now route those failures straight to needs_intervention, but the user
finds out at first push attempt — after the agent has already written
the change. This commit adds a startup check that hits /user with the
configured token, reads X-OAuth-Scopes, and compares against the
scopes the gameplan files would require. Missing scopes emit a clear
stderr warning with the suggested gh auth refresh fix.

The check is gated on the resolved managed-clone transport: SSH
doesn't consult OAuth scopes, so the warning is a false positive
there and we skip it entirely.

Github.fetch_oauth_scopes_for is a thin wrapper around the existing
cohttp-eio plumbing that returns the parsed scope list, plus a public
re-export from github.mli that mirrors make's argument shape so the
preflight handler doesn't need to thread a Github.t.

The pure Oauth_scopes module shipped in the previous commit; this is
the effectful side. Scope_preflight.run is intended to be called from
bin/main.ml after Managed_repo.ensure_managed_repo returns the
resolved transport scheme and before any patch sessions start; that
wiring is not part of this commit (bin/main.ml's Cmdliner dependency
is currently broken in the dev env), but the public API is in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an Authentication section to AGENTS.md covering the HTTPS+OAuth
default, the workflow-scope requirement, the SSH auto-detect heuristic
inherited from a sibling user clone, and the gh auth setup-git tip
for running git by hand inside a worktree. Mirrors the same content
into README.md as a sub-section under the existing GitHub
authentication discussion, with the additional context that SSH
bypasses per-OAuth-scope restrictions entirely (which is the actual
escape hatch for users who hit workflow-scope errors and don't want
to gh auth refresh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the [branch_rebased_onto_sha] field added earlier into the actual
rebase machinery. Two halves:

(1) [Worktree.rebase_onto] now takes [?prev_base_sha:string option] and
threads it into the fallback path. The default branch (find_old_base
returns Ok with cherry-pick-isolated commits) is unchanged. When
find_old_base returns Error AND the orchestrator recorded a SHA the
previous base resolved to at the last rebase / start, the supervisor
now runs [git rebase --onto target sha] instead of the legacy plain
[git rebase target]. That drops commits whose content was absorbed
into a squash-merge on origin/main (the patch-6 case) — only the
patch's own commits in [sha..HEAD] are replayed.

When [prev_base_sha] is [None] the behavior is bit-for-bit the legacy
plain rebase, so agents whose [branch_rebased_onto_sha] was never
recorded keep working exactly as before. An optional argument terminator
[()] is added because OCaml can't otherwise erase the optional;
mechanical update at every call site.

(2) After a successful rebase the runner fiber reads
[origin/<new_base>]'s resolved SHA via the new [W.read_branch_sha]
helper, and stores it on the agent via
[Orchestrator.set_branch_rebased_onto_sha]. The next rebase therefore
has the right anchor.

[Worktree.S] gains [read_branch_sha] and the dependency-injection test
stub gains a [None] implementation. The integration tests in
[test/test_rebase_onto.ml] all keep passing — the new code path is
inactive (prev_base_sha defaults to None there).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Threads the [url_scheme] resolved by [Managed_repo.ensure_managed_repo]
through [repo_coords] / [finalize_run] into [Project_store.save_config]
so subsequent runs read the same value from [config.json] rather than
re-detecting from sibling clones (which can move). The resume path
reads the persisted value via [Managed_repo.url_scheme_of_string].

At the top of [Eio_main.run], invokes [Scope_preflight.run] with the
network + clock capabilities, the resolved scheme, and the gameplan.
The scheme is re-read from [config.json] at the Eio entry rather than
plumbed through [Resolved_config.t] — the persistence happened inside
[finalize_run] which ran before [Eio_main.run], so the file is
already up-to-date by the time the preflight needs it. Gating on
[Github_target.Https] inside [Scope_preflight.run] means SSH clones
skip the preflight (the warning would be a false positive there).

Effect for users: on first push attempt that would have been rejected
with [refusing to allow an OAuth App to create or update workflow …],
they now see a clear stderr warning at startup with the suggested
[gh auth refresh -h github.com -s workflow] fix. The push-time
classifier still routes the failure to needs_intervention if they
proceed regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pure [Reconciler.detect_main_freshness_drift] predicate shipped
earlier compares local-main-sha to origin-main-sha. This commit wires
it into the poller loop so that when a squash-merge lands on
origin/main, every agent currently based on main gets a [Rebase]
enqueued automatically — without it, [detect_stale_bases] passes
because [base_branch] and [Graph.initial_base] both read the same
stale branch name and never disagree.

Per loop tick (before the per-patch poll fan-out):

  1. [W.fetch_origin] on the managed repo root.
  2. Read [main] and [origin/main] SHAs via [W.read_branch_sha].
  3. If [detect_main_freshness_drift] fires AND we haven't already
     announced this particular [origin_sha] (a [last_main_sha] ref
     prevents log + enqueue spam when the next loop iteration sees the
     same drift), log the transition and enqueue Rebase for every
     agent on main that has a PR, isn't merged, and doesn't already
     have a Rebase queued.

[W.fetch_origin] races with per-worktree fetches in the runner fiber on
shared ref storage — pre-existing condition I'm not fixing here.
Best-effort: any "cannot lock ref" defers this check by one
[poll_interval], and the rebase plan executor's [Fetch_origin] step
still runs from the worktree side so the rebase target ends up
fresh.

The fast-forward step from the plan (advance local main to match
origin) is intentionally not done here: rebases use
[origin/<main>] directly via [Worktree_plan.for_rebase], so once the
poller wakes the agents, the rebase targets are correct regardless of
what local main resolves to in the managed-clone HEAD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
onton Ready Ready Preview, Comment May 24, 2026 3:58pm

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e37c1021-c9b8-4edf-8548-32421c8a8ec1

📥 Commits

Reviewing files that changed from the base of the PR and between 3ce016b and 2173002.

📒 Files selected for processing (7)
  • bin/main.ml
  • lib/managed_repo.ml
  • lib/managed_repo.mli
  • lib/poller_fiber.ml
  • lib_core/github_target.ml
  • lib_core/github_target.mli
  • test/test_github_target_properties.ml

📝 Walkthrough

Walkthrough

Adds push-rejection classification, scheme-aware managed clones (HTTPS/SSH) with persistence and CLI override, per-patch push-failure tracking and rebase-SHA anchoring, a shared fetch mutex for serialized git fetches, main-branch freshness detection, and corresponding rebase/push flow and tests updates.

Changes

Push Rejection Classification & Tests

Layer / File(s) Summary
Push rejection classifier and tests
lib_core/push_reject_classify.{mli,ml}, test/test_push_reject_classify_properties.ml
New rejection variant and helpers; classify maps git stderr to typed rejection; short_label, detail_excerpt, is_permanent; property tests for totality, correctness, and determinism.

GitHub target & Managed clone scheme

Layer / File(s) Summary
GitHub target scheme helpers
lib_core/github_target.{mli,ml}, test/test_github_target_properties.ml
Add url_scheme (Https/Ssh), clone_url ~scheme, scheme_of_url, and resolve_scheme; tests for formatting, round-trip, and override/availability behavior.
Managed repo scheme detection & CLI persistence
lib/managed_repo.{mli,ml}, bin/main.ml, lib/project_store.{ml,mli}
Make managed clones scheme-aware, probe SSH when needed, ensure_managed_repo accepts optional clone_scheme and returns resolved scheme; add --clone-scheme CLI and persist url_scheme in project config.

Fetch mutex and environment wiring

Layer / File(s) Summary
Fetch mutex infrastructure
lib/run_env.{ml,mli}, bin/main.ml, lib/runner_fiber_impl.ml, test/test_dependency_injection_functors.ml
Introduce fetch_mutex : Eio.Mutex.t in run env and FIBER_ENV, create mutex in capabilities, and wire it into runner/poller/tui envs to serialize git fetch across worktrees.

Patch agent state and persistence

Layer / File(s) Summary
Patch agent push-failure & rebase-SHA
lib_core/patch_agent.{ml,mli}, lib/persistence.ml
Adds push_failure_count and branch_rebased_onto_sha fields, init/restore/persist them, add increment/reset helpers, and include in snapshots and tests.

Worktree rebase & push integration

Layer / File(s) Summary
Worktree rebase & parser changes
lib/worktree.{ml,mli}, lib_core/worktree_parser.ml, lib/worktree_plan_executor.ml
Add read_branch_sha; thread optional ?prev_base_sha into rebase_onto and choose upstream via Rebase_decision.upstream; Push_rejected now carries a rejection payload and parser calls classifier.
Orchestrator session escalation & logging
lib/orchestrator.{ml,mli}, lib/session_driver.ml
Session_push_failed now carries Push_reject_classify.rejection option; apply_session_result updates push-failure counters and forces escalation for permanent rejections; combine_session_and_push maps push outcomes into parameterized session failures; session driver logs short_label and excerpts.
Runner post-rebase capture
lib/runner_fiber_impl.ml, lib/worktree_plan_executor.ml
Runner reads origin/<base> after successful rebase to capture post_rebase_sha and calls Orchestrator.set_branch_rebased_onto_sha; logging updated to use classified labels/excerpts.

Main freshness detection & tests

Layer / File(s) Summary
Main freshness detection
lib_core/reconciler.{ml,mli}, lib/poller_fiber.ml
Add detect_main_freshness_drift predicate; freshness_loop fetches origin/<main> under fetch_mutex, detects drift, logs advancement, and enqueues rebases for eligible agents.
Tests, fixtures, and docs
test/*, AGENTS.md, README.md
Add/adjust many property tests and test fixtures for new types/fields and parameterized session failures; update AGENTS.md and README with Authentication / SSH transport docs and guidance.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • flowglad/onton#296: Prior work on fiber env wiring that this change extends by threading fetch_mutex through FIBER_ENV and Make_fibers.
  • flowglad/onton#307: Related runner env changes; both PRs touch fetch-mutex/resolver wiring between runner and environment.

"A rabbit's note on code that fetches and sings:
Schemes choose their path — SSH or HTTPS wings.
Pushes now classified, counters kept tight,
Mutex guards the fetches through day and night.
Tests hop in chorus: all green, take flight!"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch push-reject-classify-and-ssh-auto-detect

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@flowglad-review-service flowglad-review-service Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: 3 comments

Comment thread lib_core/push_reject_classify.ml
Comment thread lib/orchestrator.ml Outdated
Comment thread lib/poller_fiber.ml
@flowglad-review-service
Copy link
Copy Markdown

Review Summary

The PR is well-structured with thorough property coverage for all new pure modules and clean separation of pure/effectful code. Three findings:

  1. must-fix (push_reject_classify.ml line 75): The Push_pattern_block recognizer matches "Repository rule violations found", which also appears verbatim in the canonical workflow_scope_missing stderr fixture (the GH013 header). Correctness depends on Workflow_scope_missing being checked first. A reorder would silently produce the wrong classification and suppress the gh auth refresh hint. Adding an explicit ordering comment or a targeted property that would falsify on reorder would lock in the invariant.

  2. must-fix (orchestrator.ml line 713): For permanent rejections, push_failure_count is incremented to 1 before the Given_up escalation and is not reset during that escalation. needs_intervention is safe because it short-circuits on Given_up, and reset_intervention_state correctly zeros the counter. However, the counter's semantics ("consecutive transient failures") are muddied by incrementing it on a permanent rejection. The simpler fix is to skip the increment in the permanent-rejection arm — the counter's purpose is to break transient retry loops, and permanent rejections break the loop via a different mechanism.

  3. should-fix (poller_fiber.ml line 173): The fetch_lock created locally inside the poller fiber never interacts with the runner fiber's separate fetch mutex. Since check_main_freshness is called serially inside loop (), the local lock serializes nothing. The comment acknowledges the inter-fiber race is best-effort, but the lock implies a guard that isn't actually there. Remove it or wire in the shared mutex.

Severity Count
Must-fix 2
Should-fix 1
Note 0

Variant: convergence-v2

Candidates: 3 | Posted: 3 | Suppressed: 0


3 comments posted · Model: claude-sonnet-4-6 (sonnet) · Tokens: 67719 in / 6295 out · Cache: 321788 created / 1281993 read · Context: 5 items · Review mode: agentic · Turns: 15 · Tool calls: 24 · Forced submit: yes · Tool mix: read_file=21, search_codebase=3

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
lib/github.ml (1)

535-596: 💤 Low value

Consider extracting common HTTP client setup (optional refactor).

The TLS config and client initialization (lines 552-570) duplicate the same logic in the request function (lines 473-494). While the duplication is localized and the functions have different purposes (body vs header extraction), extracting a shared make_client helper would reduce maintenance burden if the setup ever needs updating.

Not blocking since the duplication is manageable and the use case is specialized.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/github.ml` around lines 535 - 596, The TLS config and Cohttp_eio client
creation are duplicated between fetch_oauth_scopes and the request function;
extract a shared helper (e.g. make_client) that calls https_config(), builds the
https_fun wrapper and returns a Cohttp_eio.Client.t (or a tuple of client and
tls_config if needed), then replace the inline TLS/client setup in
fetch_oauth_scopes and in request with calls to make_client so both reuse the
same initialization path (refer to functions fetch_oauth_scopes, request,
https_config, and https_fun to implement and wire the helper).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib_core/oauth_scopes.ml`:
- Around line 27-35: required_for_files currently returns [Workflow] when
workflows are touched and [] otherwise, omitting the mandatory Repo scope;
update the function required_for_files to always include the Repo scope and add
Workflow only when paths touch .github/workflows (i.e., return [Repo; Workflow]
when touches_workflows is true, otherwise return [Repo]), referencing the
Workflow and Repo variants to locate the change.

In `@test/test_github_target_properties.ml`:
- Around line 409-417: Remove the unnecessary try...with wrappers around calls
to pure/total functions in the QCheck properties: specifically edit
prop_scheme_of_url_total (and the similar property at lines ~457-474) to stop
catching exceptions and instead directly call Github_target.scheme_of_url (and
the other pure function) in the property body, returning the boolean result
(e.g., call the function, ignore its result, and return true) so that any
unexpected raises surface as test failures rather than being swallowed.

---

Nitpick comments:
In `@lib/github.ml`:
- Around line 535-596: The TLS config and Cohttp_eio client creation are
duplicated between fetch_oauth_scopes and the request function; extract a shared
helper (e.g. make_client) that calls https_config(), builds the https_fun
wrapper and returns a Cohttp_eio.Client.t (or a tuple of client and tls_config
if needed), then replace the inline TLS/client setup in fetch_oauth_scopes and
in request with calls to make_client so both reuse the same initialization path
(refer to functions fetch_oauth_scopes, request, https_config, and https_fun to
implement and wire the helper).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7d61a156-3352-408d-bdad-8d5f5ac14db1

📥 Commits

Reviewing files that changed from the base of the PR and between 65ddc81 and d01c310.

📒 Files selected for processing (49)
  • AGENTS.md
  • README.md
  • bin/main.ml
  • lib/event_log.ml
  • lib/github.ml
  • lib/github.mli
  • lib/managed_repo.ml
  • lib/managed_repo.mli
  • lib/orchestrator.ml
  • lib/orchestrator.mli
  • lib/persistence.ml
  • lib/poller_fiber.ml
  • lib/project_store.ml
  • lib/project_store.mli
  • lib/runner_fiber_impl.ml
  • lib/scope_preflight.ml
  • lib/scope_preflight.mli
  • lib/session_driver.ml
  • lib/worktree.ml
  • lib/worktree.mli
  • lib/worktree_plan_executor.ml
  • lib_core/github_target.ml
  • lib_core/github_target.mli
  • lib_core/oauth_scopes.ml
  • lib_core/oauth_scopes.mli
  • lib_core/patch_agent.ml
  • lib_core/patch_agent.mli
  • lib_core/push_reject_classify.ml
  • lib_core/push_reject_classify.mli
  • lib_core/rebase_decision.ml
  • lib_core/rebase_decision.mli
  • lib_core/reconciler.ml
  • lib_core/reconciler.mli
  • lib_core/worktree_parser.ml
  • lib_test/test_generators.ml
  • test/dune
  • test/test_dependency_injection_functors.ml
  • test/test_github_target_properties.ml
  • test/test_interleaving_properties.ml
  • test/test_oauth_scopes_properties.ml
  • test/test_orchestrator_properties.ml
  • test/test_patch_agent.ml
  • test/test_patch_controller.ml
  • test/test_poll_log_properties.ml
  • test/test_properties.ml
  • test/test_push_reject_classify_properties.ml
  • test/test_rebase_decision_properties.ml
  • test/test_rebase_onto.ml
  • test/test_reconciler_properties.ml

Comment thread lib_core/oauth_scopes.ml Outdated
Comment thread test/test_github_target_properties.ml Outdated
subsetpark and others added 3 commits May 23, 2026 15:58
The SSH auto-detect heuristic from the previous commit picks a
transport from the surrounding filesystem; sometimes the user wants to
force one explicitly (e.g. running onton in a context where the
sibling clone is misleading, or pinning HTTPS in a script). This
commit exposes a CLI flag.

Precedence on each run:
  CLI --clone-scheme > persisted config.json url_scheme > auto-detect

Both the fresh / gameplan path and the resume path honor the
override. Invalid values (anything other than [https] or [ssh]) exit
with a clear error. After [ensure_managed_repo] returns the resolved
scheme, [finalize_run] still persists it to config.json so the
override only needs to be passed once.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: the runner fiber created its own fetch_mutex (per-fiber lifetime)
and the poller's main-freshness fetch added on the previous commit also
created its own. Both serialize their own internal calls but the two
mutexes don't see each other — two simultaneous [git fetch origin] on
the same managed repo (one from a runner action, one from the poller
tick) race on the shared [refs/remotes/origin/*] ref store and the
loser gets "cannot lock ref ...: is at X but expected Y". Best-effort
tolerance was added in the previous commit; this commit eliminates
the race entirely.

Promotes [fetch_mutex] into the [Run_env.S] module type so every fiber
that includes Run_env (Runner, Poller, Tui, the per-fiber sub-envs for
Worktree_setup / Session_driver) sees the same mutex. The mutex is
created once at the top of [build_capabilities] in [bin/main.ml], stored
on [constructed_capabilities], and threaded through [FIBER_ENV] to
every sub-fiber env. Runner_fiber_impl and Poller_fiber both drop their
local creation and read [Env.fetch_mutex] instead.

The dependency-injection functor test stubs gain a matching field
(three real mutexes + one [Obj.magic ()] for a stub that pre-existed
that pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: the check ran at the top of every poll loop iteration,
blocking the per-patch poll fan-out behind a [git fetch] for
origin/<main>. A slow fetch (e.g. cold connection, large repo, network
hiccup) delayed every patch's GitHub poll by the same amount.

This commit splits the poller into two parallel fibers via
[Eio.Fiber.both]:

  - the existing patch-poll loop (PR state, CI, base updates, etc.)
  - a new freshness_loop that runs [check_main_freshness] then sleeps
    [poll_interval]

The two share state through [Runtime.update_orchestrator] (used inside
[check_main_freshness] to enqueue Rebase on detected drift) and serialize
their [git fetch] calls through [Env.fetch_mutex] (the shared mutex
added in the previous commit), so there's no ref-store race.

Cadence for the freshness loop is the same [poll_interval] as the
patch loop; could be tuned independently later (probably slower, since
main usually moves less often than per-patch state) but the symmetry
keeps the operational model simple for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@flowglad-review-service
Copy link
Copy Markdown

Review Summary

This delta push wires together the previously reviewed pure modules (push-reject classification, SSH transport, OAuth scope preflight, rebase-decision, main-freshness drift) into the effectful fiber layer. The code is generally correct and the property-test coverage is comprehensive.

One pre-existing note: the previously-filed outstanding finding at lib/poller_fiber.ml:173 (the fetch lock) was actually resolved by this push — fetch_mutex is now correctly threaded as a shared mutex via Env.fetch_mutex rather than a per-function local. That is an improvement.

The remaining two previously-filed outstanding comments (lib/orchestrator.ml:713 on the push_failure_count increment before permanent-rejection escalation, and lib_core/push_reject_classify.ml:75 on the recognizer ordering dependency) were not addressed in this push and remain live on the PR thread.

No new must-fix or should-fix issues were found in the changed lines of this push.

Severity Count
Must-fix 0
Should-fix 0
Note 0

Variant: convergence-v2

Candidates: 0 | Posted: 0 | Suppressed: 0


0 comments posted · Model: claude-sonnet-4-6 (sonnet) · Tokens: 72578 in / 3029 out · Cache: 303324 created / 1336503 read · Context: 5 items · Review mode: agentic · Turns: 15 · Tool calls: 20 · Forced submit: yes · Tool mix: read_file=16, search_codebase=4

Comment thread lib/scope_preflight.ml Outdated
@flowglad-review-service
Copy link
Copy Markdown

Review Summary

Severity Count
Must-fix 0
Should-fix 0
Note 0

Variant: convergence-v2

Candidates: 0 | Posted: 0 | Suppressed: 0


0 comments posted · Model: claude-sonnet-4-6 (sonnet) · Tokens: 72581 in / 3418 out · Cache: 321118 created / 1313224 read · Context: 5 items · Review mode: agentic · Turns: 15 · Tool calls: 19 · Forced submit: yes · Tool mix: read_file=14, search_codebase=5

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/runner_fiber_impl.ml (1)

909-917: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the recorded upstream SHA on noop rebases.

Worktree.Noop still means the branch is already aligned with origin/<base>. Leaving post_rebase_sha as None there keeps the old/empty branch_rebased_onto_sha, so the next prev_base_sha-driven rebase can fall back to the wrong --onto anchor after squash-merges.

Suggested fix
                         let post_rebase_sha =
                           match rebase_result with
-                          | Worktree.Ok ->
+                          | Worktree.Ok | Worktree.Noop ->
                               W.read_branch_sha ~path:wt_path
                                 ~ref_name:("origin/" ^ Branch.to_string new_base)
-                          | Worktree.Noop | Worktree.Conflict _
-                          | Worktree.Error _ ->
+                          | Worktree.Conflict _ | Worktree.Error _ ->
                               None
                         in

Also applies to: 943-947

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/runner_fiber_impl.ml` around lines 909 - 917, The pattern match that sets
post_rebase_sha skips recording the branch SHA when rebase_result is
Worktree.Noop, which causes branch_rebased_onto_sha to remain empty and later
prev_base_sha-driven rebases to use the wrong --onto anchor; update the match
for rebase_result in the post-rebase logic so that both Worktree.Ok and
Worktree.Noop call W.read_branch_sha ~path:wt_path ~ref_name:("origin/" ^
Branch.to_string new_base) (i.e., preserve the recorded upstream SHA on noop
rebases), and apply the same change to the second occurrence of this pattern
(the block referenced around lines 943-947) so branch_rebased_onto_sha is
consistently set after noop or ok results.
bin/main.ml (1)

1259-1265: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Teach PR-op stripping that --clone-scheme consumes the next argv token.

extract_pr_ops_from_argv currently treats the token after --clone-scheme as a positional argument. onton PROJECT --clone-scheme ssh +123 will fail before Cmdliner gets control.

Suggested fix
-      (* Keep this in sync with Cmdliner [opt] arguments below:
-         --gameplan, --token, --backend, --model, --repo, --main-branch,
-         --poll-interval, and --max-concurrency. *)
+      (* Keep this in sync with Cmdliner [opt] arguments below:
+         --gameplan, --token, --backend, --model, --repo, --main-branch,
+         --poll-interval, --max-concurrency, and --clone-scheme. *)
       match s with
       | "--gameplan" | "--token" | "--backend" | "--model" | "--repo"
-      | "--main-branch" | "--poll-interval" | "--max-concurrency" ->
+      | "--main-branch" | "--poll-interval" | "--max-concurrency"
+      | "--clone-scheme" ->
           true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/main.ml` around lines 1259 - 1265, The PR-op extraction logic treats the
token after "--clone-scheme" as positional; update extract_pr_ops_from_argv to
mark "--clone-scheme" as an option that consumes the next argv token (similar to
"--gameplan"/"--token" handling) so the following value isn't mis-parsed as a PR
op; locate extract_pr_ops_from_argv and add "--clone-scheme" to the set of
recognized consuming options (or implement a branch that skips the next element
when encountering "--clone-scheme") so "onton PROJECT --clone-scheme ssh +123"
correctly keeps "ssh" as the clone scheme and "+123" as the PR op.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bin/main.ml`:
- Around line 597-611: The resolved clone scheme returned by
Managed_repo.ensure_managed_repo is currently ignored so the code keeps
persisting the old stored Project_store.url_scheme; update the resume path to
capture the effective scheme returned by Managed_repo.ensure_managed_repo and
persist that value into repo_coords.url_scheme (and the stored Project_store)
instead of reusing stored_scheme or rebuild from old config; locate usages
around clone_scheme_override, stored_scheme, effective_scheme and ensure the
value produced by Managed_repo.ensure_managed_repo is threaded into subsequent
logic (including the OAuth preflight decision) and written back to persistent
store.

---

Outside diff comments:
In `@bin/main.ml`:
- Around line 1259-1265: The PR-op extraction logic treats the token after
"--clone-scheme" as positional; update extract_pr_ops_from_argv to mark
"--clone-scheme" as an option that consumes the next argv token (similar to
"--gameplan"/"--token" handling) so the following value isn't mis-parsed as a PR
op; locate extract_pr_ops_from_argv and add "--clone-scheme" to the set of
recognized consuming options (or implement a branch that skips the next element
when encountering "--clone-scheme") so "onton PROJECT --clone-scheme ssh +123"
correctly keeps "ssh" as the clone scheme and "+123" as the PR op.

In `@lib/runner_fiber_impl.ml`:
- Around line 909-917: The pattern match that sets post_rebase_sha skips
recording the branch SHA when rebase_result is Worktree.Noop, which causes
branch_rebased_onto_sha to remain empty and later prev_base_sha-driven rebases
to use the wrong --onto anchor; update the match for rebase_result in the
post-rebase logic so that both Worktree.Ok and Worktree.Noop call
W.read_branch_sha ~path:wt_path ~ref_name:("origin/" ^ Branch.to_string
new_base) (i.e., preserve the recorded upstream SHA on noop rebases), and apply
the same change to the second occurrence of this pattern (the block referenced
around lines 943-947) so branch_rebased_onto_sha is consistently set after noop
or ok results.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e2cfd56b-44f5-43fb-8985-7c058cf1c8d3

📥 Commits

Reviewing files that changed from the base of the PR and between d01c310 and a06aa1a.

📒 Files selected for processing (6)
  • bin/main.ml
  • lib/poller_fiber.ml
  • lib/run_env.ml
  • lib/run_env.mli
  • lib/runner_fiber_impl.ml
  • test/test_dependency_injection_functors.ml

Comment thread bin/main.ml Outdated
Copy link
Copy Markdown

@flowglad-review-service flowglad-review-service Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: 1 comment

Comment thread lib/poller_fiber.ml Outdated
@flowglad-review-service
Copy link
Copy Markdown

Review Summary

One new issue in this push: freshness_loop in lib/poller_fiber.ml calls check_main_freshness() immediately at startup before sleeping, causing a redundant git fetch at t=0 (right after ensure_managed_repo already fetched) and firing a freshness check before the main loop's first reconciler pass. The fetch_mutex prevents a data race but the eager first call is wasteful and the resulting Rebase enqueue is immediately superseded by the main loop. Adding an initial Eio.Time.sleep clock poll_interval before the first check_main_freshness() call aligns behavior with the main loop.\n\nAll three outstanding round-1 comments remain live and are not re-raised here.

Severity Count
Must-fix 0
Should-fix 1
Note 0

Variant: convergence-v2

Candidates: 1 | Posted: 1 | Suppressed: 0


1 comment posted · Model: claude-sonnet-4-6 (sonnet) · Tokens: 64953 in / 3928 out · Cache: 288549 created / 1273467 read · Context: 5 items · Review mode: agentic · Turns: 15 · Tool calls: 17 · Forced submit: yes · Tool mix: read_file=16, search_codebase=1

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/main.ml (1)

1236-1247: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Register --clone-scheme as a value-taking option in argv preprocessing.

extract_pr_ops_from_argv only preserves separate option values for names listed in cli_option_takes_value. Because --clone-scheme is missing there, onton PROJECT --clone-scheme ssh treats ssh as a stray positional argument instead of the option value.

Suggested fix
-      (* Keep this in sync with Cmdliner [opt] arguments below:
-         --gameplan, --token, --backend, --model, --repo, --main-branch,
-         --poll-interval, and --max-concurrency. *)
+      (* Keep this in sync with Cmdliner [opt] arguments below:
+         --gameplan, --token, --backend, --model, --repo, --main-branch,
+         --clone-scheme, --poll-interval, and --max-concurrency. *)
       match s with
       | "--gameplan" | "--token" | "--backend" | "--model" | "--repo"
-      | "--main-branch" | "--poll-interval" | "--max-concurrency" ->
+      | "--main-branch" | "--clone-scheme" | "--poll-interval"
+      | "--max-concurrency" ->
           true

Also applies to: 1351-1364

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/main.ml` around lines 1236 - 1247, The argv preprocessing function
cli_option_takes_value currently omits "--clone-scheme", causing
extract_pr_ops_from_argv to treat its value as a positional argument; update
cli_option_takes_value to include the "--clone-scheme" option in the match arm
that returns true (the same arm that contains "--gameplan", "--token",
"--backend", "--model", "--repo", "--main-branch", "--poll-interval", and
"--max-concurrency") so that "--clone-scheme <value>" is preserved as a single
option/value pair during argv parsing.
♻️ Duplicate comments (1)
bin/main.ml (1)

596-633: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Persist the scheme returned by ensure_managed_repo on resume.

Line 612 discards the resolved scheme, and Lines 629-632 rebuild repo_coords.url_scheme from the old stored value. Resume can therefore refresh the checkout via SSH or --clone-scheme, then immediately re-save stale config and flip back on the next run.

Suggested fix
-                      match
-                        Managed_repo.ensure_managed_repo
-                          ~clone_scheme:effective_scheme ~project_name:proj
-                          ~token ~owner ~repo ()
-                      with
-                      | Ok _ -> ()
-                      | Error msg ->
-                          Printf.eprintf
-                            "onton: warning: %s (resuming with local state)\n\
-                             %!"
-                            msg);
+                      let resolved_scheme =
+                        match
+                          Managed_repo.ensure_managed_repo
+                            ~clone_scheme:effective_scheme ~project_name:proj
+                            ~token ~owner ~repo ()
+                        with
+                        | Ok (_repo_root, scheme) -> Some scheme
+                        | Error msg ->
+                            Printf.eprintf
+                              "onton: warning: %s (resuming with local state)\n\
+                               %!"
+                              msg;
+                            stored_scheme
+                      in
@@
-                      url_scheme =
-                        Managed_repo.url_scheme_of_string
-                          (Option.value stored.Project_store.url_scheme
-                             ~default:"");
+                      url_scheme = resolved_scheme;

Based on learnings Onton auto-detects SSH transport when a sibling clone of owner/repo exists and persists the managed clone transport as url_scheme in config.json.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/main.ml` around lines 596 - 633, The call to
Managed_repo.ensure_managed_repo currently discards the resolved clone scheme
(match ... | Ok _ -> ()), then later repo_coords.url_scheme is rebuilt from the
old stored.Project_store.url_scheme, so any auto-detected or --clone-scheme
resolution is not persisted; change the ensure_managed_repo match to capture the
returned scheme (e.g. | Ok resolved_scheme -> ...), use that resolved_scheme
when constructing repo_coords.url_scheme (instead of calling
Managed_repo.url_scheme_of_string on stored.Project_store.url_scheme), and
update/persist stored.Project_store.url_scheme with the resolved value before
continuing so resume saves the effective transport (reference symbols:
Managed_repo.ensure_managed_repo, repo_coords.url_scheme,
stored.Project_store.url_scheme).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@bin/main.ml`:
- Around line 1236-1247: The argv preprocessing function cli_option_takes_value
currently omits "--clone-scheme", causing extract_pr_ops_from_argv to treat its
value as a positional argument; update cli_option_takes_value to include the
"--clone-scheme" option in the match arm that returns true (the same arm that
contains "--gameplan", "--token", "--backend", "--model", "--repo",
"--main-branch", "--poll-interval", and "--max-concurrency") so that
"--clone-scheme <value>" is preserved as a single option/value pair during argv
parsing.

---

Duplicate comments:
In `@bin/main.ml`:
- Around line 596-633: The call to Managed_repo.ensure_managed_repo currently
discards the resolved clone scheme (match ... | Ok _ -> ()), then later
repo_coords.url_scheme is rebuilt from the old stored.Project_store.url_scheme,
so any auto-detected or --clone-scheme resolution is not persisted; change the
ensure_managed_repo match to capture the returned scheme (e.g. | Ok
resolved_scheme -> ...), use that resolved_scheme when constructing
repo_coords.url_scheme (instead of calling Managed_repo.url_scheme_of_string on
stored.Project_store.url_scheme), and update/persist
stored.Project_store.url_scheme with the resolved value before continuing so
resume saves the effective transport (reference symbols:
Managed_repo.ensure_managed_repo, repo_coords.url_scheme,
stored.Project_store.url_scheme).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94bc132f-035a-4f7c-90f5-6465d5358464

📥 Commits

Reviewing files that changed from the base of the PR and between a06aa1a and 3ce016b.

📒 Files selected for processing (6)
  • bin/main.ml
  • lib/orchestrator.ml
  • lib_core/push_reject_classify.ml
  • test/dune
  • test/test_github_target_properties.ml
  • test/test_properties.ml
💤 Files with no reviewable changes (1)
  • test/dune

The sibling scan only worked when the user happened to have another GitHub
clone in one of a handful of conventional directories, and could pick SSH
even on machines where SSH to GitHub didn't actually work (no key, blocked
port 22, passphrase + no agent). Swap it for a non-interactive
[git ls-remote -h git@github.com:owner/repo.git] with [BatchMode=yes] and
[ConnectTimeout=5] — exit 0 means SSH can authenticate to GitHub and reach
this repo, which is the same condition SSH push needs. The resolved scheme
is still persisted to config.json so the probe only runs on the first
invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@subsetpark subsetpark changed the title Push-rejection loop guard + SSH auto-detect + OAuth scope preflight Push-rejection loop guard + SSH transport auto-detect May 24, 2026
Copy link
Copy Markdown

@flowglad-review-service flowglad-review-service Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: 1 comment

Comment thread lib/managed_repo.ml Outdated
@flowglad-review-service
Copy link
Copy Markdown

Review Summary

One new should-fix found in this push: ensure_managed_repo unconditionally calls resolve_clone_scheme (firing the 5-second SSH probe and printing a misleading "cloning managed repo via SSH/HTTPS" message) before checking whether the managed clone already exists on disk. On every resume with a legacy config.json (url_scheme = null), the probe runs and its result is immediately discarded in favour of the existing remote URL — wasting 5 seconds and showing the user a false "cloning" log line. The three outstanding prior comments (poller fetch_lock no-op, push_failure_count increment in permanent rejection arm, freshness_loop immediate first tick) are addressed or unchanged and are not re-raised here.

Severity Count
Must-fix 0
Should-fix 1
Note 0

Variant: convergence-v2

Candidates: 1 | Posted: 1 | Suppressed: 0


1 comment posted · Model: claude-sonnet-4-6 (sonnet) · Tokens: 64791 in / 4006 out · Cache: 281614 created / 1249026 read · Context: 5 items · Review mode: agentic · Turns: 15 · Tool calls: 17 · Forced submit: yes · Tool mix: read_file=16, search_codebase=1

…fiber

Three findings from the review of 116df0a, all confirmed against current
code:

1. bin/main.ml resume path discarded the scheme returned by
   [ensure_managed_repo] (matched [Ok _]) and rebuilt
   [repo_coords.url_scheme] from [stored.url_scheme], so a CLI override
   or a probe-determined scheme on a legacy config never reached
   [finalize_run]'s [save_config] call. Capture the resolved scheme and
   thread it through.

2. [Managed_repo.ensure_managed_repo] called [resolve_clone_scheme]
   unconditionally before the [has_git_dir] check. On a resume with a
   pre-existing clone (the common case), this ran a 5-second SSH
   reachability probe whose result was then discarded in favor of the
   on-disk [origin] URL, and printed a misleading "cloning managed repo
   via X" line even though no clone was performed. Defer the probe into
   the two clone branches so it only runs when we're actually about to
   clone.

3. [Poller_fiber.freshness_loop] fired [check_main_freshness] at t=0
   before sleeping, duplicating the fetch that [ensure_managed_repo]
   already ran at startup. Sleep first to match the main poll loop's
   cadence and avoid the redundant fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@subsetpark subsetpark merged commit b9f08ce into main May 24, 2026
8 of 9 checks passed
Copy link
Copy Markdown

@flowglad-review-service flowglad-review-service Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: 2 comments

Comment thread AGENTS.md
(`git@github.com:owner/repo.git`), the managed clone at
`~/.local/share/onton/<project>/repo` is created with the same SSH
transport, and pushes go through the user's ssh-agent / `~/.ssh/config`.
SSH bypasses the per-OAuth-scope restrictions entirely. The chosen
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD FIX] AGENTS.md:120 — The SSH section in the new ## Authentication block still describes the old sibling-clone heuristic (commit #2, which was superseded by commit #12). It says onton "auto-detects whether the user already maintains a clone of owner/repo under one of $PWD/.., ~/code-src/, …" and references the log message onton: detected SSH sibling clone at … — cloning managed repo via SSH.

The actual implementation introduced in this PR (resolve_clone_scheme / probe_ssh_available in lib/managed_repo.ml) does no sibling-clone scanning. It runs a non-interactive git ls-remote -h git@github.com:owner/repo.git probe and emits:

  • onton: SSH probe to git@github.com succeeded — cloning managed repo via SSH
  • onton: SSH probe to git@github.com unavailable — cloning managed repo via HTTPS

A user following these docs will look for a sibling clone that doesn't exist, and the log messages won't match. The section needs to describe the probe-based detection and the actual emitted messages.

Reviewed at 2173002

Comment thread README.md
workflow … without 'workflow' scope` message — and onton now classifies that
as `workflow_scope_missing` and routes the agent straight to
`needs_intervention` instead of retrying.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SHOULD FIX] README.md:200 — Same stale-docs issue as AGENTS.md: the #### SSH transport section added in this PR still documents the old sibling-clone heuristic ($PWD/.., ~/code-src/, ~/src/, ~/code/, ~/dev/, ~/projects/, and onton: detected SSH sibling clone at …). The implementation shipped in this PR is probe-based and emits onton: SSH probe to git@github.com succeeded — cloning managed repo via SSH — no sibling-clone directory scanning is involved. The section should be updated to reflect the actual probe mechanism, its fallback behaviour, and the correct log messages.

Reviewed at 2173002

@flowglad-review-service
Copy link
Copy Markdown

Review Summary

Delta review (round 5). Two should-fix findings, both on documentation sections newly added in this push:

AGENTS.md and README.md — The ## Authentication (AGENTS.md) and #### SSH transport (README.md) sections were written to describe commit #2's sibling-clone heuristic, but that commit was superseded by commit #12 (SSH reachability probe). Both sections still reference non-existent sibling-clone directory scanning and the stale log message onton: detected SSH sibling clone at …. The actual implementation emits onton: SSH probe to git@github.com succeeded/unavailable — cloning managed repo via SSH/HTTPS and does no directory scanning at all. Both sections should be updated to match the probe-based behaviour.

No previously-raised findings are re-raised. The lib/orchestrator.ml permanent-rejection counter bug (round 1) and the freshness_loop startup double-fetch (round 3) are both correctly addressed in this push.

Severity Count
Must-fix 0
Should-fix 2
Note 0

Variant: convergence-v2

Candidates: 2 | Posted: 2 | Suppressed: 0


2 comments posted · Model: claude-sonnet-4-6 (sonnet) · Tokens: 64743 in / 3771 out · Cache: 347913 created / 865249 read · Context: 5 items · Review mode: agentic · Turns: 12 · Tool calls: 11 · Tool mix: read_file=9, search_codebase=2

@subsetpark subsetpark deleted the push-reject-classify-and-ssh-auto-detect branch May 24, 2026 16:00
@coderabbitai coderabbitai Bot mentioned this pull request May 24, 2026
8 tasks
subsetpark added a commit that referenced this pull request May 24, 2026
--force-with-lease alone only verifies that the remote tracking ref
matches actual remote — once a background fetch refreshes that tracking
ref, the lease passes even when local is strictly behind, silently
wiping remote commits. PR #315 was force-pushed empty and auto-closed
this way.

--force-if-includes additionally requires that the remote tip is in the
local branch's reflog, which refuses pushes that would lose commits the
local clone never observed. Requires git ≥ 2.30 (already a prerequisite
for the worktree features onton relies on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
subsetpark added a commit that referenced this pull request May 24, 2026
[Worktree.create] used to take the local branch ref as authoritative
when it existed in the user's working clone. The user's
[refs/heads/<branch>] could be stale (left at the PR's base, behind the
real PR head), so the worktree would start missing every PR commit —
exactly the state that wiped PR #315 when the session pushed it.

Split the decision into a pure module [Start_point_plan] in lib_core/
that takes observed local/remote-ref SHAs and a precomputed two-way
ancestry, returns one of [Reset_and_use_remote_tracking],
[Use_local_branch_unchanged], [Create_new_branch_from_base], or a
[refusal] for the diverged/local-ahead cases. The effectful side in
[Worktree.create] gathers the inputs, calls the planner, executes the
chosen action via [git worktree add -B/-b]/[add], and returns
[(t, refusal) Result.t]. [Worktree_setup.ensure_worktree] runs a new
[fetch_origin_branch] before [create] so the planner observes a fresh
remote ref.

Property tests (SPP-1..9) cover totality, determinism, the
"no-silent-clobber" invariant, remote authority, label bounds, and
exhaustive variant reachability. An integration test exercises every
arm against real git fixtures, including a direct reproduction of the
#315 stale-local-ref state.

Refusals are logged through the existing activity-log channel and
return [None] from [ensure_worktree], so the next scheduling tick sees
the agent idle. Future commits will route refusals through
[needs_intervention] explicitly via [Push_reject_classify].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
subsetpark added a commit that referenced this pull request May 24, 2026
[Worktree.force_push_with_lease] used to gate on
[git rev-list --count <base>..HEAD] (worktree HEAD) and then push the
named branch — so an agent that [git switch]ed to a different local
branch mid-session could pass the gate with commits the push command
would never upload. Combined with --force-with-lease's blind spot for
local commit loss (the lease passes whenever the local tracking ref
matches actual remote, even if local strictly lacks remote commits),
this put PR #315 in the position where a push of a stale branch wiped
its remote.

New pure module [Push_plan] in lib_core/ takes the observed worktree
HEAD, branch ref SHA, remote tracking SHA, two-way ancestry, and
[base..refs/heads/<branch>] commit count, and returns
[Push (Force_push_if_includes | Initial_push)] or
[Refuse (No_commits_ahead_of_base | Worktree_missing |
Branch_ref_missing | Branch_switched | Local_missing_remote_commits)].
The named branch — never HEAD — is what gets measured and pushed.

[Push_reject_classify.rejection] gains a [Local_state_unsafe] variant
(permanent), and [Push_plan.to_push_reject_classify_rejection] maps the
three "supervisor said no" refusals onto it so the orchestrator's
existing [is_permanent] → [needs_intervention] escalation applies
without new state.

Property tests (PPP-1..11) cover totality, branch-switch refusal,
local-missing-remote refusal, zero-commits skip, initial-push,
happy-path, worktree-missing pre-emption, determinism, exhaustive
variant reachability, refusal-to-rejection permanence, and label
bounds. PRC tests extend their existing variant scans for
Local_state_unsafe. An integration test drives real-git fixtures for
the branch-switched and local-missing-remote refusal arms plus the
happy-path force-push, asserting nothing reaches the remote when the
planner refuses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
subsetpark added a commit that referenced this pull request May 24, 2026
…#318)

* Add --force-if-includes to session-end force-push

--force-with-lease alone only verifies that the remote tracking ref
matches actual remote — once a background fetch refreshes that tracking
ref, the lease passes even when local is strictly behind, silently
wiping remote commits. PR #315 was force-pushed empty and auto-closed
this way.

--force-if-includes additionally requires that the remote tip is in the
local branch's reflog, which refuses pushes that would lose commits the
local clone never observed. Requires git ≥ 2.30 (already a prerequisite
for the worktree features onton relies on).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Plan worktree start point against fresh remote refs

[Worktree.create] used to take the local branch ref as authoritative
when it existed in the user's working clone. The user's
[refs/heads/<branch>] could be stale (left at the PR's base, behind the
real PR head), so the worktree would start missing every PR commit —
exactly the state that wiped PR #315 when the session pushed it.

Split the decision into a pure module [Start_point_plan] in lib_core/
that takes observed local/remote-ref SHAs and a precomputed two-way
ancestry, returns one of [Reset_and_use_remote_tracking],
[Use_local_branch_unchanged], [Create_new_branch_from_base], or a
[refusal] for the diverged/local-ahead cases. The effectful side in
[Worktree.create] gathers the inputs, calls the planner, executes the
chosen action via [git worktree add -B/-b]/[add], and returns
[(t, refusal) Result.t]. [Worktree_setup.ensure_worktree] runs a new
[fetch_origin_branch] before [create] so the planner observes a fresh
remote ref.

Property tests (SPP-1..9) cover totality, determinism, the
"no-silent-clobber" invariant, remote authority, label bounds, and
exhaustive variant reachability. An integration test exercises every
arm against real git fixtures, including a direct reproduction of the
#315 stale-local-ref state.

Refusals are logged through the existing activity-log channel and
return [None] from [ensure_worktree], so the next scheduling tick sees
the agent idle. Future commits will route refusals through
[needs_intervention] explicitly via [Push_reject_classify].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Plan pushes against worktree state, not just lease

[Worktree.force_push_with_lease] used to gate on
[git rev-list --count <base>..HEAD] (worktree HEAD) and then push the
named branch — so an agent that [git switch]ed to a different local
branch mid-session could pass the gate with commits the push command
would never upload. Combined with --force-with-lease's blind spot for
local commit loss (the lease passes whenever the local tracking ref
matches actual remote, even if local strictly lacks remote commits),
this put PR #315 in the position where a push of a stale branch wiped
its remote.

New pure module [Push_plan] in lib_core/ takes the observed worktree
HEAD, branch ref SHA, remote tracking SHA, two-way ancestry, and
[base..refs/heads/<branch>] commit count, and returns
[Push (Force_push_if_includes | Initial_push)] or
[Refuse (No_commits_ahead_of_base | Worktree_missing |
Branch_ref_missing | Branch_switched | Local_missing_remote_commits)].
The named branch — never HEAD — is what gets measured and pushed.

[Push_reject_classify.rejection] gains a [Local_state_unsafe] variant
(permanent), and [Push_plan.to_push_reject_classify_rejection] maps the
three "supervisor said no" refusals onto it so the orchestrator's
existing [is_permanent] → [needs_intervention] escalation applies
without new state.

Property tests (PPP-1..11) cover totality, branch-switch refusal,
local-missing-remote refusal, zero-commits skip, initial-push,
happy-path, worktree-missing pre-emption, determinism, exhaustive
variant reachability, refusal-to-rejection permanence, and label
bounds. PRC tests extend their existing variant scans for
Local_state_unsafe. An integration test drives real-git fixtures for
the branch-switched and local-missing-remote refusal arms plus the
happy-path force-push, asserting nothing reaches the remote when the
planner refuses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address PR 318 review comments

* Fix local-missing-remote integration fixture

* Distinguish refused worktree setup

* Route repo-root worktree refusal permanently

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant