
feat: parallel chunk PUT and GET #8

Merged
sanity merged 4 commits into main from fix-7 on May 1, 2026

Conversation


@sanity sanity commented May 1, 2026

Problem

ChunkedPack publish and fetch are serial: each chunk operation waits for the previous to complete. Against the Freenet host under load each chunk takes ~60s, so a 200-chunk push (full freenet-core history at 1 MiB chunks) takes ~3-4 hours of wall time. That's the dominant pain point for routine work on big repos and is called out as the last "not yet supported" item in the README.

Approach

Run chunk operations in parallel across a small pool of WebSocket connections to the local Freenet node.

  • Open N connections up front (Vec<Arc<Mutex<WebApi>>>).
  • Round-robin chunks to connections (chunk_index % N).
  • Drive completion with futures::stream::iter(...).map(...).buffer_unordered(N).
  • For fetch, slot results into Vec<Option<Vec<u8>>> indexed by chunk position so reassembly is in manifest order regardless of completion order.
  • Manifest PUT and re-GET stay single-shot on the first connection (they are one-of-one operations).
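
The slot-indexed reassembly is the load-bearing invariant, so here is a minimal std-only sketch of it. The signature and error strings are assumptions; the real `assemble_chunks` lives in `chunked.rs`:

```rust
/// Concatenate fetched chunks in manifest (slot) order, regardless of the
/// order in which they completed. A missing slot or a total-size mismatch
/// is an error rather than a silently truncated pack.
fn assemble_chunks(slots: Vec<Option<Vec<u8>>>, total_size: u64) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(total_size as usize);
    for (i, slot) in slots.into_iter().enumerate() {
        match slot {
            Some(bytes) => out.extend_from_slice(&bytes),
            None => return Err(format!("chunk {i} missing from fetch results")),
        }
    }
    if out.len() as u64 != total_size {
        return Err(format!("assembled {} bytes, manifest says {}", out.len(), total_size));
    }
    Ok(out)
}

fn main() {
    // Results that arrived out of order have already been placed by index;
    // assembly only ever walks the slots in order.
    let mut slots: Vec<Option<Vec<u8>>> = vec![None; 3];
    for (idx, bytes) in [(2usize, vec![5, 6]), (0, vec![1, 2]), (1, vec![3, 4])] {
        slots[idx] = Some(bytes);
    }
    assert_eq!(assemble_chunks(slots, 6).unwrap(), vec![1, 2, 3, 4, 5, 6]);
}
```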

N defaults to 8 (DEFAULT_PARALLEL_OPS) and can be overridden with FREENET_GIT_PARALLEL_OPS=K. Setting it to 1 forces serial behavior. The pool is capped at the chunk count to avoid wasted connections for small payloads.
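
The pool-size policy above can be sketched as a pair of pure functions. `parse_parallelism` is the name used later in this PR; `effective_parallelism` is a hypothetical name for the cap-at-chunk-count step, and the exact fallback behavior is an assumption:

```rust
const DEFAULT_PARALLEL_OPS: usize = 8;

/// Parse FREENET_GIT_PARALLEL_OPS. Unset, 0, or a non-integer fall back
/// to the default; 1 forces serial behavior.
fn parse_parallelism(env_val: Option<&str>) -> usize {
    match env_val.and_then(|s| s.trim().parse::<usize>().ok()) {
        Some(n) if n >= 1 => n,
        _ => DEFAULT_PARALLEL_OPS,
    }
}

/// Cap the pool at the chunk count so small payloads don't open idle connections.
fn effective_parallelism(env_val: Option<&str>, chunk_count: usize) -> usize {
    parse_parallelism(env_val).min(chunk_count.max(1))
}

fn main() {
    assert_eq!(effective_parallelism(None, 200), 8);       // default
    assert_eq!(effective_parallelism(Some("4"), 200), 4);  // override
    assert_eq!(effective_parallelism(Some("0"), 200), 8);  // invalid -> default
    assert_eq!(effective_parallelism(Some("8"), 3), 3);    // capped at chunk count
    assert_eq!(effective_parallelism(Some("1"), 200), 1);  // forces serial
}
```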

API change

publish_chunked_pack and fetch_chunked_pack now take ws_url: &str instead of &mut WebApi because they open their own pool internally. The _with_progress variants gain a parallelism: usize parameter. Progress callbacks fire on completion (with completed-count) instead of before each op, which is the correct semantics for out-of-order parallel work.

This is a breaking lib API change. The lib is internal to the binaries in this same crate; no external consumers known. The callers in git-remote-freenet.rs are updated.

Rejected alternatives

  • Multiplex one connection. WebApi is !Sync and the WebSocket stream isn't designed for interleaved txns from one client; one connection per concurrent op is simpler and the resource cost is small.
  • Spawn N worker tasks pulling from a channel. Equivalent in throughput but more code; the buffer_unordered form is shorter and the per-task index propagation is direct.

Testing

New unit tests in chunked.rs::tests:

  • assemble_chunks_concatenates_in_order: gives the assembler the same chunks in slot order and again in shuffled completion order, asserts both produce the same byte vector. Load-bearing piece for parallel fetch correctness.
  • assemble_chunks_errors_on_missing_slot: a None slot must error rather than silently truncate.
  • assemble_chunks_errors_on_total_size_mismatch: defends the manifest total_size invariant.
  • parallelism_from_env_handles_unset_zero_and_invalid: env-var parsing including 0 and non-integer fallback.

Manual e2e against a real Freenet node is the planned next step before merge; will report results in a follow-up comment.

Why didn't CI catch this?

The serial implementation wasn't a bug; it was a perf limitation. There is no slow-operation regression CI check. The new parallel impl could regress in two ways:

  • Reassembly mis-ordering. Closed by the assemble_chunks unit tests.
  • Hung locks / deadlock from holding a mutex guard across an .await. Caught by running the existing test suite under the tokio runtime; could be made stronger with a mock-WebApi integration test in a follow-up.

Out-of-scope (filed for follow-up if needed)

  • Parallelize rescue_chunked in main.rs (different control flow: GET-then-PUT per chunk).
  • Streaming chunks from git pack-objects directly without staging the full pack in memory. Bigger refactor; do separately if it's needed.

Fixes

Closes #7

[AI-assisted - Claude]

sanity and others added 4 commits May 1, 2026 09:47
ChunkedPack publish and fetch were serial: each chunk operation
waited for the previous to complete. Against the Freenet host under
load each chunk takes ~60s, so a 200-chunk push (full freenet-core
history at 1 MiB chunks) took ~3-4 hours.

This PR runs chunk operations in parallel across a small pool of
WebSocket connections. Pool size defaults to 8, configurable via
FREENET_GIT_PARALLEL_OPS=N. Round-robin chunk-to-conn dispatch with
buffer_unordered so completion order is decoupled from chunk index.

Reassembly is the load-bearing piece: chunks complete out of order
but must be glued back in manifest order. The new assemble_chunks
helper takes Vec<Option<Vec<u8>>> indexed by chunk position and
concatenates in slot order, with unit tests covering the in-order,
shuffled, missing-slot, and total-size-mismatch cases.

Manifest PUT and re-GET stay single-shot on the first connection
because they are one-of-one operations.

Expected speedup for a 200-chunk push at 60s/chunk: ~200 minutes
serial -> ~25 minutes 8-way parallel.

API change: publish_chunked_pack and fetch_chunked_pack now take a
ws_url: &str instead of &mut WebApi, because they open their own
pool of N connections internally. Both add a parallelism: usize
parameter on the _with_progress variants. Progress callbacks fire
on completion (with completed-count) instead of before each op,
which is the correct semantics for out-of-order parallel work.

Out-of-scope (filed for follow-up if needed):
- Parallelize rescue_chunked in main.rs (different control flow,
  GET-then-PUT per chunk).
- Streaming chunks: read directly from pack-objects subprocess
  and feed chunks as they're produced. Bigger refactor.

Closes #7

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Wrap pack_wasm in Arc<Vec<u8>> so phase 2 (verify) and fetch phase
  4 don't deep-clone the WASM bytes per chunk; phase 1 still clones
  because put_pack consumes Vec<u8> for retries.
- Take ownership of chunks via .into_iter() in phase 1 to remove the
  per-chunk clone (chunks isn't used after phase 1).
- Rename mutex guards from `g` to `conn`.
- Replace `Ok::<_, anyhow::Error>(...)` with `anyhow::Ok(...)`.
- Replace `(0..n).map(|_| None).collect()` with `vec![None; n]`.
- Drop a redundant `async {}` wrapper in open_pool.

No behavior change. All 19 unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… parsing

Addresses review findings on PR #8:

Codex P1: open_pool now degrades gracefully. If the local node refuses
extra connections beyond the first, we proceed with however many we
got rather than aborting the whole publish/fetch. Only fails if we
cannot open even one connection. The pre-parallel implementation
managed with one, so this matches that compatibility floor.
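
The degradation policy reduces to a small loop. The real open_pool opens WebSocket connections asynchronously; this std-only sketch models only the shape of the policy, with a hypothetical fallible opener passed in as a closure:

```rust
/// Try to open `want` connections. If the node refuses extras after at
/// least one succeeded, proceed with the smaller pool; only error if we
/// cannot open even one (the pre-parallel compatibility floor).
fn open_pool<T, E>(want: usize, mut open_one: impl FnMut() -> Result<T, E>) -> Result<Vec<T>, E> {
    let mut pool = Vec::with_capacity(want);
    for _ in 0..want {
        match open_one() {
            Ok(conn) => pool.push(conn),
            Err(e) if pool.is_empty() => return Err(e), // cannot open even one: hard error
            Err(_) => break, // node refused an extra connection: degrade gracefully
        }
    }
    Ok(pool)
}

fn main() {
    // Node accepts 3 connections, then refuses: we proceed with a 3-wide pool.
    let mut opened = 0;
    let pool = open_pool(8, || {
        if opened < 3 { opened += 1; Ok::<i32, &str>(opened) } else { Err("refused") }
    })
    .unwrap();
    assert_eq!(pool.len(), 3);
}
```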

Testing P1 / Skeptical M1: parallelism_from_env split into a thin env
reader plus a pure parse_parallelism(Option<&str>) helper. The unit
test exercises the parser directly, so we no longer mutate process
env state in tests. Removes a known race that becomes a hard error in
edition 2024 where set_var/remove_var are unsafe.

Testing P1: extracted collect_chunks_into_slots helper from the fetch
chunked-GET loop. Two new tokio::test cases drive it with a
deliberately scrambled stream of (chunk_index, bytes) results, so the
"completion order does not affect slot placement" invariant is now
genuinely tested. The previous assemble_chunks "shuffled" test was
testing Vec iteration, not the load-bearing reassembly logic.
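
The extracted helper can be sketched std-only. The real collect_chunks_into_slots drains an async buffer_unordered stream; this sketch substitutes a plain iterator, and the signature is an assumption:

```rust
/// Place out-of-order (chunk_index, bytes) results into position-indexed
/// slots. Completion order must not affect slot placement.
fn collect_chunks_into_slots(
    n: usize,
    results: impl IntoIterator<Item = Result<(usize, Vec<u8>), String>>,
) -> Result<Vec<Option<Vec<u8>>>, String> {
    let mut slots: Vec<Option<Vec<u8>>> = vec![None; n];
    for r in results {
        let (idx, bytes) = r?;
        if idx >= n {
            return Err(format!("chunk index {idx} out of range (n = {n})"));
        }
        slots[idx] = Some(bytes);
    }
    Ok(slots)
}

fn main() {
    // Deliberately scrambled completion order; placement is by index.
    let scrambled: Vec<Result<(usize, Vec<u8>), String>> =
        vec![Ok((1, vec![3])), Ok((2, vec![4])), Ok((0, vec![1, 2]))];
    let slots = collect_chunks_into_slots(3, scrambled).unwrap();
    assert_eq!(slots, vec![Some(vec![1, 2]), Some(vec![3]), Some(vec![4])]);
}
```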

Code-first / Skeptical N1: removed redundant drop(stream) calls. The
while-let loops only exit when the stream is empty, so the explicit
drops were no-ops.

Skeptical N3: unified the actual_parallelism clamp form between
publish and fetch (both now use parallelism.min((n as usize).max(1))).

Skeptical N4: tightened the Arc<Vec<u8>> comment to reflect that the
phase 1 win is small (avoids one local move) and the real fix is
plumbing &[u8] through wsclient::put_pack, deferred as a follow-up.

Skeptical N5: added a comment that phase 2's BLAKE3 verify is
belt-and-braces with wsclient::get_pack's own check; kept anyway as
defense-in-depth at the publish-verify layer.

Skeptical N7: renamed shadowed `completed` -> `puts_completed` /
`verifies_completed` so the publish phase 1 / phase 2 counters are
visibly distinct.

Skeptical M2: documented the assumption that per-chunk latency is
roughly uniform. With round-robin and one stuck chunk on conn k, every
chunk where i % N == k queues behind it; for content-addressed PUT/GET
this is fine but worth noting if costs become more variable.

Big-picture: updated the two user-facing progress strings in
git-remote-freenet.rs to disclose the parallelism N, so users don't
multiply ~60s × 200 chunks in their head.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sanity commented May 1, 2026

Review pass complete

Ran four internal review agents (code-first, testing, skeptical, big-picture) and Codex in parallel. Summary of what was applied and what was punted.

Applied to this branch

  • [Codex P1] Graceful pool degradation. `open_pool` no longer aborts the whole publish/fetch if the local node refuses connection N+1; it proceeds with however many it got and warns. Falls back to error only if it cannot open even one (matching the pre-parallel single-connection behavior).
  • [Testing P1 + Skeptical M1] Race-free env parsing. Split `parallelism_from_env` into a thin reader plus a pure `parse_parallelism(Option<&str>)` helper. Tests now exercise the parser directly instead of mutating process env state. Avoids the known anti-pattern that becomes a hard error in edition 2024.
  • [Testing P1] Real out-of-order completion test. Extracted `collect_chunks_into_slots` from the fetch loop and added two `#[tokio::test]` cases that drive it with a deliberately scrambled `stream::iter`. The previous "shuffled" test was testing Vec iteration, not the load-bearing reassembly logic.
  • [Skeptical N1] Removed redundant `drop(stream)` calls that were no-ops.
  • [Skeptical N3] Unified the actual_parallelism clamp form between publish and fetch.
  • [Skeptical N4] Tightened the Arc<Vec<u8>> comment about phase 1 vs phase 2 cloning.
  • [Skeptical N5] Added a comment that phase 2's BLAKE3 verify is belt-and-braces with wsclient::get_pack's own check.
  • [Skeptical N7] Renamed shadowed `completed` to `puts_completed` / `verifies_completed`.
  • [Skeptical M2] Documented the uniform-per-chunk-latency assumption: round-robin + one stuck chunk on conn k means siblings on that conn queue serially.
  • [Big-picture] Updated the two user-facing progress strings to disclose the parallelism N, so users don't multiply ~60s × 200 chunks in their head.
  • [Big-picture] Removed the stale README "parallel chunk uploads not yet shipped" caveat in an earlier commit on this branch.

Filed as follow-ups (out of scope for #7)

Validation gap

No CI in this repo, and there is no mock `WebApi`. The orchestration logic (`buffer_unordered` over a `Vec<Arc<Mutex<WebApi>>>` pool, round-robin chunk dispatch, slot-indexed reassembly) is unit-tested in pieces but not end-to-end. Big-picture review noted this and asked for a real-network smoke test before merge confirming 8-way concurrent ops actually work and the speedup is in the expected ballpark. Pending.

[AI-assisted - Claude]

@sanity sanity merged commit c099ea1 into main May 1, 2026
1 check passed
sanity added a commit that referenced this pull request May 6, 2026
…marize

Addresses all four internal review agents + Codex on PR #17.

Critical:

- Snapshot tree build now uses `git archive HEAD | tar -xf -` instead
  of `tar --exclude=.git`. The previous approach copied the worktree
  and ran `git add .` in the fresh repo, which re-applied .gitignore
  and silently dropped force-added files. `git archive` dumps exactly
  HEAD's tracked tree, and `git add -f .` then commits all of it.
  (Codex P2; Skeptical M1; Code-first nit.)

- Every `${{ inputs.* }}` reference in `run:` blocks now goes through
  `env:` indirection. Reusable callers are in-org and trusted today,
  but the previous direct interpolation made the workflow a footgun
  for any future caller -- a stray `"` in a branch name or version
  string was a shell-injection vector.
  (Skeptical H1, H2; Code-first should-fix #3.)

- Workflow-level `concurrency:` block keyed on `freenet_repo`. The
  repo contract is read-modify-write; two parallel pushes to the same
  prefix race on `update_seq` and the second arriving is silently
  dropped by the CRDT merge. `cancel-in-progress: false` queues new
  runs so a fresh push never loses to an in-flight stale one.
  (Skeptical H3; Code-first should-fix #5; Big-picture future-work.)

- History mode now does `git checkout -B "$BRANCH"` after
  actions/checkout to defend against detached-HEAD checkouts that
  would break the `<branch>:<branch>` refspec.
  (Skeptical M2.)

Should-fix:

- Toolchain pin dropped from `1.93.0` to `dtolnay/rust-toolchain@stable`
  (no `with: toolchain:`). The pin doesn't aid determinism for a CLI
  install, and pinning to a specific version risks both workflows
  breaking simultaneously if a freenet-git release picks up a
  transitive dep with a higher MSRV.
  (Skeptical M3; Code-first nit #8.)

- Bundle-write step now starts with `set -euo pipefail` and validates
  the file with `test -s` rather than running `freenet-git whoami`
  (the latter prints registry contents to CI logs unnecessarily; both
  paths are audited safe per skeptical L4-L5).
  (Code-first should-fix #4.)

- `freenet_repo` input now validated against the `<prefix>/<label>`
  shape in a dedicated step, with a directed error if the caller
  accidentally includes the `freenet:` scheme prefix.
  (Code-first should-fix #6; Skeptical L6.)

- Workflow header gained a "Choosing mode" stanza explaining
  snapshot vs history semantics, and a paragraph noting that
  snapshot mode does not handle submodules or git-LFS.
  (Big-picture #2; Code-first should-fix #1, #2.)

- New `freenet_git_extra_args` passthrough input so caller workflows
  can extend the `git push` invocation (e.g. timeouts) without
  pushing back changes here.
  (Big-picture future-work #1.)

- New `workflow_dispatch` trigger so a maintainer can manually
  exercise the workflow against a throwaway target before relying
  on it from a caller repo.
  (Code-first concern.)

rescue-demos.yml:

- New `summarize` job aggregates matrix results so a partial-green
  "rescue worked for 2/3 repos" registers as a workflow-level
  failure (and triggers GitHub's default failure-notification email).
  (Big-picture #3; Skeptical L3.)

- Header now flags the dependency on the gateway holding the bytes
  -- if the gateway evicts before the next scheduled run, rescue is
  a no-op for that URL. Future `freenet-git rescue --from <git-dir>`
  will let us pass a clone path and remove that dependency.
  (Big-picture #1.)

- Toolchain pin dropped here too; matches mirror-repo.yml.

Pushed back / deferred:

- Code-first #7 (matrix to JSON file): three URLs is fine, would add
  surface area without proportionate benefit.
- Code-first #11 (cron times): subjective, and GH cron is best-effort
  anyway.
- Code-first #12 (README duplication): pre-existing, out of scope.
- Code-first #13 (test-plan checkboxes): cosmetic.
- Big-picture #4 (`latest` vs pin): default kept as `latest` with the
  override input documented; pinning by default would freeze the
  pipeline against the buggy 0.1.12 prior to this PR's release.
- Big-picture #5 (docs/0001-large-repos.md): a design doc for
  ChunkedPack, not a workflow doc. Out of scope.
- Big-picture #7 (agent-memory rot): handled in a separate ~/.claude
  commit after this PR merges, per the global commit-claude-config
  rule.
- Skeptical M4 (`force_push_allowed`): verified safe -- the contract's
  update_seq monotonicity gates writes regardless of the `+` flag.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity added a commit that referenced this pull request May 6, 2026
* feat: add mirror + rescue workflows; rotate demo URLs

Three new artifacts.

`.github/workflows/mirror-repo.yml` (reusable workflow_call). Caller
repos (freenet-core, freenet-stdlib) invoke it via
`uses: freenet/freenet-git/.github/workflows/mirror-repo.yml@<ref>`
to push their content into a Freenet-hosted repo on every commit to
main. Inputs: target prefix/label, mode (`snapshot` for HEAD-only
orphan commits, `history` for full git history), branch, and an
optional `freenet_git_version` pin. Caller passes the identity bundle
and gateway WS URL via secrets.

`.github/workflows/rescue-demos.yml` (scheduled). Twice a day, runs
`freenet-git rescue` for the three demo URLs so they don't fall out
of peers' LRU caches. Rescue is permissionless, so this only needs
`FREENET_GIT_WS_URL`.

README updates: the demo URLs are rotated to fresh prefixes derived
from a no-passphrase identity bundle, replacing the old
chunked-demo-bundle URLs (`AaRxPZVdWrPh/freenet-core` and
`2pyvKxrozxgT/freenet-stdlib`). The "Keeping a published repo alive"
section now points at the rescue workflow as the canonical
keep-alive mechanism for these specific URLs.

New URLs:
- freenet:3GEERif5ihbf/freenet-core (snapshot, mirrored from
  freenet/freenet-core@main)
- freenet:96rknpy1GYhZ/freenet-stdlib (full history, mirrored from
  freenet/freenet-stdlib@main)
- freenet:99TmCayXn6Tm/freenet-git (this repo's own mirror)

Initial state for the first two is already published; the workflow
just keeps it fresh and re-publishes on upstream pushes.

Required secrets in caller repos:
- FREENET_GIT_IDENTITY_BUNDLE_BASE64 -- base64 of the no-passphrase
  bundle that owns the prefixes above
- FREENET_GIT_WS_URL -- WebSocket URL of a reachable Freenet node
  (we reuse RIVER_GATEWAY_URL in this org)

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review fixes: env-var indirection, concurrency, snapshot rebuild, summarize

(Commit body identical to the "review fixes: env-var indirection, concurrency, snapshot rebuild, summarize" message above.)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity added a commit that referenced this pull request May 7, 2026
Addresses all four internal reviewers + Codex on PR #26.

Codex P2 / Code-first #1 / Skeptical M1: object existence check now
distinguishes "missing", "non-commit", and "tooling failure" cases.
Renamed `object_exists` -> `commit_exists` and reimplemented via
`git rev-parse --verify --quiet <sha>^{commit}`. The new check:

- Returns Ok(true) only if <sha> is present and is a commit object.
  cat-file -e succeeded on blobs and trees, which would have left
  the cryptic `fatal: not a commit object` rev-list error reachable
  for corrupted local stores.
- Distinguishes missing from corrupt-tooling via exit code 1 vs 128;
  the latter bubbles up rather than silently widening the pack
  (force) or producing a misleading "run git fetch" message
  (non-force).
- New test `commit_exists_rejects_blob_and_tree` pins this against
  blob and tree SHAs from the same repo.
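
The exit-status triage can be modeled as a pure function. The real check shells out to `git rev-parse --verify --quiet <sha>^{commit}`; this sketch (with the hypothetical name `classify_rev_parse`) only captures how its status is interpreted per the description above:

```rust
/// Interpret the exit status of `git rev-parse --verify --quiet <sha>^{commit}`.
/// 0: <sha> exists and is a commit. 1: missing (or not a commit object).
/// Anything else (e.g. 128, or killed by signal): tooling failure, bubble up.
fn classify_rev_parse(status_code: Option<i32>) -> Result<bool, String> {
    match status_code {
        Some(0) => Ok(true),
        Some(1) => Ok(false),
        other => Err(format!("git rev-parse failed unexpectedly: {other:?}")),
    }
}

fn main() {
    assert_eq!(classify_rev_parse(Some(0)), Ok(true));   // commit exists
    assert_eq!(classify_rev_parse(Some(1)), Ok(false));  // missing locally
    assert!(classify_rev_parse(Some(128)).is_err());     // corrupt store / tooling failure
    assert!(classify_rev_parse(None).is_err());          // killed by signal
}
```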

Skeptical M2: directed error message reordered to lead with --force
since snapshot/orphan-style pushes are the canonical case driving
this PR. `git fetch` is still listed for the regular fast-forward
recovery scenario but no longer the first remediation.

Code-first #5: error message wording improved -- "use `git push +`"
isn't a real invocation; rephrased as "use `git push --force` (or
prefix the refspec with `+`)". The prior bail!(...) string was
preserved but rephrased.

Code-first #7 / Skeptical L3: test setup now sets per-repo
commit.gpgsign=false and tag.gpgsign=false, so a developer with
GPG signing on globally without a working key won't trip a prompt
in tests.

Code-first #8: `build_pack_have_present_succeeds` was a no-op
(let _ = pack). Renamed to `build_pack_have_equal_to_want_emits_
empty_pack` and now asserts the resulting pack is smaller than
the no-have pack -- pinning that rev-list short-circuits when
have == want.

Testing must-add #1: new test
`build_pack_fast_forward_have_is_real_ancestor` exercises the
production happy path -- have is the parent commit, want is the
child. Asserts the incremental pack is non-empty AND smaller than
the full pack, so a future regression that drops `^<have>` from
the rev-list args silently would be caught.

Testing should-add #3: directed-error wording now pinned via
substring assertions on "not present", "local repo", "git fetch",
"--force", and the bogus SHA itself.

Big-picture LOW: docstring "Phase 1.0 caveats" block in the helper
gained a "Push semantics" bullet documenting non-force vs force
expectations.

Pushed back / not addressing in this PR:

- Code-first #2 (silent SinglePack/ChunkedPack flip when force drops
  have for large `want` history): real concern but architectural;
  for snapshot mode the orphan commits are small so not a current
  practical issue. Worth a follow-up if a future caller pushes a
  large-history force.
- Code-first #4 (parse_push_spec allows force=true with empty src):
  caller rejects empty src before reading force; future-proofing
  with a defensive parser check is unnecessary today.
- Skeptical L2 (race between cat-file and rev-list under concurrent
  gc): theoretical, two-Command::status() window. The helper is
  invoked synchronously by git push; concurrent gc on the same
  repo is rare. Not in scope.
- Skeptical L4 (git init -b main needs git >= 2.28): all current
  CI runners have a modern git. Not in scope.
- Skeptical L5 (pack uploaded before contract validation runs):
  pre-existing ordering, not introduced here.
- Testing should-add #2 (parse-spec -> build-pack glue test): the
  existing test surface is sufficient given each function is
  unit-tested. A full handle_push test would require WS-API
  scaffolding that's overkill for this scope.
- Testing should-add #4 (bogus `want`): the existing
  `git_resolve_ref` call path catches that earlier in handle_push.
- Big-picture LOW (file follow-up issue for workflow workaround
  cleanup): tracked in issue body as "follow-up"; will file a
  separate ticket once this lands and the caller pin is bumped.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity added a commit that referenced this pull request May 7, 2026
* fix: build_pack handles force-push when local lacks the remote tip

Closes #21.

## Problem

`build_pack` invoked `git rev-list ^<have> <want>` blindly. When
`<have>` is the remote tip reported by the contract state but the
local repo has never seen it (canonical case: force-pushing an orphan
commit, exactly what snapshot-mode mirroring does), rev-list bailed
with `fatal: bad object <sha>` and the entire push failed.

This was the root cause behind the freenet-core mirror failing 5/5
runs after merge. PR #20 worked around it at the workflow layer with
a `git fetch freenet $BRANCH` before each push; that fix keeps
working but adds bandwidth and only helps callers of the reusable
workflow.

## Fix

Two pieces:

1. `parse_push_spec` now returns `(force, src, dst)` instead of
   silently discarding the leading `+`.
2. `build_pack` takes the new `force: bool` and pre-validates `<have>`
   with `git cat-file -e`. If `<have>` is missing locally and the
   push is a force, drop `<have>` from the rev-list args and send
   everything reachable from `<want>` -- the right semantic for
   "replace whatever was there." If `<have>` is missing and the push
   is non-force, return a directed error pointing the user at
   `git fetch` or `+` rather than the cryptic `fatal: bad object`
   from rev-list.
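
Piece 1 above is a small pure parse. A sketch with an assumed return shape (the real `parse_push_spec` may differ in error handling; note it deliberately does not reject an empty src, which callers handle):

```rust
/// Parse a git push refspec: a leading '+' marks a force push, the rest
/// splits on ':' into (src, dst).
fn parse_push_spec(spec: &str) -> Result<(bool, String, String), String> {
    let (force, rest) = match spec.strip_prefix('+') {
        Some(rest) => (true, rest),
        None => (false, spec),
    };
    match rest.split_once(':') {
        Some((src, dst)) if !dst.is_empty() => Ok((force, src.to_string(), dst.to_string())),
        _ => Err(format!("malformed push spec: {spec}")),
    }
}

fn main() {
    assert_eq!(parse_push_spec("+main:main").unwrap(), (true, "main".into(), "main".into()));
    assert_eq!(parse_push_spec("main:main").unwrap(), (false, "main".into(), "main".into()));
    assert!(parse_push_spec("main").is_err());
}
```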

## Tests

Six new unit tests in `git-remote-freenet.rs` covering:

- `parse_push_spec_force_flag` -- `+` is parsed as force, plain spec
  is non-force.
- `build_pack_no_have_succeeds` -- empty have is the first-push case.
- `build_pack_have_present_succeeds` -- existing fast-forward path.
- `build_pack_missing_have_non_force_fails_with_directed_error` --
  the non-force surface, directed error message.
- `build_pack_missing_have_force_drops_have_and_succeeds` -- the
  exact case that was broken before.
- `object_exists_distinguishes_real_from_bogus` -- pins the
  `cat-file -e` helper.

## End-to-end verification

Reproduced the original failure with the patched binary against the
live demo URL `freenet:3GEERif5ihbf/freenet-core`:

  $ # Fresh orphan repo, force push, no fetch:
  $ git push freenet +main:main
  ==> reading repo state from Freenet
  ==> publishing 226 B pack as a single bundle
  ==> updating repo state on Freenet
  ==> done
   + b2a8836...5d4a733 main -> main (forced update)

  $ # Non-force from fresh orphan -- gets the directed error:
  $ git push freenet main:main
   ! [remote rejected] main -> main (remote tip 5d4a7334... is not
     present in the local repo; run `git fetch` to populate it, or
     use `git push +`/`--force` to overwrite (which will discard the
     remote tip's history))

The freenet-core demo was clobbered briefly during testing and
re-published to its proper snapshot (`b2a8836`) after.

## Follow-up

Once this ships in a release and the caller workflow's pin is
bumped, drop the `git fetch` workaround from
`.github/workflows/mirror-repo.yml`.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review fixes: tighten object check, reorder error, harden tests

Addresses all four internal reviewers + Codex on PR #26.

Codex P2 / Code-first #1 / Skeptical M1: object existence check now
distinguishes "missing", "non-commit", and "tooling failure" cases.
Renamed `object_exists` -> `commit_exists` and reimplemented via
`git rev-parse --verify --quiet <sha>^{commit}`. The new check:

- Returns Ok(true) only if <sha> is present and is a commit object.
  cat-file -e succeeded on blobs and trees, which would have left
  the cryptic `fatal: not a commit object` rev-list error reachable
  for corrupted local stores.
- Distinguishes missing from corrupt-tooling via exit code 1 vs 128;
  the latter bubbles up rather than silently widening the pack
  (force) or producing a misleading "run git fetch" message
  (non-force).
- New test `commit_exists_rejects_blob_and_tree` pins this against
  blob and tree SHAs from the same repo.

Skeptical M2: directed error message reordered to lead with --force
since snapshot/orphan-style pushes are the canonical case driving
this PR. `git fetch` is still listed for the regular fast-forward
recovery scenario but no longer the first remediation.

Code-first #5: error message wording improved -- "use `git push +`"
isn't a real invocation; rephrased as "use `git push --force` (or
prefix the refspec with `+`)". The prior bail!(...) string was
preserved but rephrased.

Code-first #7 / Skeptical L3: test setup now sets per-repo
commit.gpgsign=false and tag.gpgsign=false, so a developer with
GPG signing on globally without a working key won't trip a prompt
in tests.

Code-first #8: `build_pack_have_present_succeeds` was a no-op
(let _ = pack). Renamed to `build_pack_have_equal_to_want_emits_
empty_pack` and now asserts the resulting pack is smaller than
the no-have pack -- pinning that rev-list short-circuits when
have == want.

Testing must-add #1: new test
`build_pack_fast_forward_have_is_real_ancestor` exercises the
production happy path: have is the parent commit, want is the
child. It asserts the incremental pack is non-empty AND smaller
than the full pack, so a regression that silently drops `^<have>`
from the rev-list args would be caught.
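
The property those pack tests pin can be sketched against a scratch
repo (assumed shape, not the crate's test code; file names are
illustrative):

```shell
# A pack built with "want ^have" should be non-empty yet smaller
# than a pack built from want alone, since have-reachable objects
# (here, the big blob) are excluded.
set -u
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=t -c user.email=t@example.com "$@"; }
g init -q
seq 1 5000 > "$repo/big.txt";  g add big.txt;   g commit -qm one
have=$(g rev-parse HEAD)
echo hello > "$repo/small.txt"; g add small.txt; g commit -qm two
want=$(g rev-parse HEAD)

full=$(printf '%s\n' "$want"              | g pack-objects --revs --stdout | wc -c)
incr=$(printf '%s\n^%s\n' "$want" "$have" | g pack-objects --revs --stdout | wc -c)
echo "full=$full incr=$incr"
```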

Testing should-add #3: directed-error wording now pinned via
substring assertions on "not present", "local repo", "git fetch",
"--force", and the bogus SHA itself.

Big-picture LOW: docstring "Phase 1.0 caveats" block in the helper
gained a "Push semantics" bullet documenting non-force vs force
expectations.

Pushed back / not addressing in this PR:

- Code-first #2 (silent SinglePack/ChunkedPack flip when force drops
  have for large `want` history): real concern but architectural;
  for snapshot mode the orphan commits are small so not a current
  practical issue. Worth a follow-up if a future caller pushes a
  large-history force.
- Code-first #4 (parse_push_spec allows force=true with empty src):
  caller rejects empty src before reading force; future-proofing
  with a defensive parser check is unnecessary today.
- Skeptical L2 (race between cat-file and rev-list under concurrent
  gc): theoretical; the window is between two `Command::status()`
  calls. The helper is invoked synchronously by git push, and
  concurrent gc on the same repo is rare. Not in scope.
- Skeptical L4 (git init -b main needs git >= 2.28): all current
  CI runners have a modern git. Not in scope.
- Skeptical L5 (pack uploaded before contract validation runs):
  pre-existing ordering, not introduced here.
- Testing should-add #2 (parse-spec -> build-pack glue test): the
  existing test surface is sufficient given each function is
  unit-tested. A full handle_push test would require WS-API
  scaffolding that's overkill for this scope.
- Testing should-add #4 (bogus `want`): the existing
  `git_resolve_ref` call path catches that earlier in handle_push.
- Big-picture LOW (file follow-up issue for workflow workaround
  cleanup): tracked in issue body as "follow-up"; will file a
  separate ticket once this lands and the caller pin is bumped.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity added a commit that referenced this pull request May 7, 2026
…tifier-fail

Addresses Codex P2 + skeptical 1, 3, 8 + big-picture minor on PR #28.

- `olabiniV2/matrix-message@v0.0.1` -> SHA pin
  `5927bc6d98755e4364377fcfe0074aacbc01d237` (with inline `# v0.0.1`
  comment) so a re-tag of v0.0.1 by an external party can't silently
  swap in a different commit with secret access. The org's existing
  `matrix_pr_merge_notify.yml` precedent uses tag-pinning; that's a
  pre-existing footgun this PR opts NOT to extend.
  (Skeptical #3, big-picture minor.)

- `if: failure()` -> `if: failure() || cancelled()` on the mirror
  notification step. A 30-minute job timeout or runner shutdown
  fires `cancelled()` rather than `failure()` and would otherwise
  silently leave the demo behind upstream with no alert.
  (Skeptical #1.)

- The summarize job's `Fail if any rescue failed` step is now
  `if: always() && needs.rescue.result != 'success'`. Without
  always(), a failure in the Matrix-notify step above would skip
  the summary error and the workflow would fail with the notifier's
  error (e.g. missing secrets) instead of the actual rescue failure.
  (Codex P2.)

- Updated the inline comment to acknowledge the
  optional-secrets-mean-extra-red-step UX (the prior wording said
  "errors quietly" which is over-confident -- skeptical #8).

- Tightened wording of the mirror failure message ("Demo URL may
  be behind upstream" rather than the snapshot-mode-specific
  "going stale").
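
Taken together, the workflow changes above look roughly like this
(a sketch: job, step, and script names are illustrative, action
inputs are omitted; only the action name and SHA come from the text):

```yaml
jobs:
  mirror:
    runs-on: ubuntu-latest
    steps:
      - name: Mirror to Freenet
        run: ./mirror.sh            # illustrative
      - name: Notify Matrix on failure
        # cancelled() catches job timeouts and runner shutdowns,
        # which failure() alone would miss.
        if: failure() || cancelled()
        uses: olabiniV2/matrix-message@5927bc6d98755e4364377fcfe0074aacbc01d237 # v0.0.1

  summarize:
    needs: rescue                   # rescue is the matrix job (not shown)
    runs-on: ubuntu-latest
    steps:
      - name: Notify Matrix
        if: needs.rescue.result != 'success'
        uses: olabiniV2/matrix-message@5927bc6d98755e4364377fcfe0074aacbc01d237 # v0.0.1
      - name: Fail if any rescue failed
        # always() keeps this step reachable even if the notify step
        # fails, so the run reports the rescue failure rather than
        # the notifier's error.
        if: always() && needs.rescue.result != 'success'
        run: exit 1
```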

Pushed back / out of scope:

- Skeptical #2 (caller-side gap: explicit `secrets:` map in
  freenet-core / freenet-stdlib doesn't pass the new optional
  Matrix secrets, so the alert is inert for the two production
  callers until they SHA-bump and update their pass-through):
  acknowledged; addressed by companion PRs to those repos that
  land with this one. The "finish the fix" criterion is met by
  the three PRs together, not by this PR alone.

- Skeptical #4 (`secrets.X` in `${{ }}` `with:` input): the
  matrix-message action's API requires this. Switching to a
  better-maintained action with env-var inputs is a bigger
  change worth filing separately, not in this PR.

- Skeptical #5 (push-storm dedup): real concern but requires a
  cross-run notification deduper. Today's worst case is a
  single Matrix room getting N copies of the same alert during
  a sustained gateway outage, which is annoying but actionable.
  Defer until we've seen it happen.

- Skeptical #6 (untrusted input.freenet_repo from fork PRs):
  not reachable today (callers are trusted org repos); the
  workflow header comment already documents that the reusable
  workflow trusts its inputs.

- Skeptical #7 (style: `failure()` vs `result != 'success'`):
  the two halves correctly use different idioms because the
  rescue side is a `summarize` job depending on a matrix job
  (where `result != 'success'` is the canonical aggregate
  check), while the mirror side is a step inside the same job
  (where `failure() || cancelled()` is the canonical step
  predicate).

- Skeptical #9 (snapshot/history wording): reworded to be
  mode-neutral.

- Skeptical #10 (emoji): consistent with org precedent in
  matrix_pr_merge_notify.yml; not changing.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sanity added a commit that referenced this pull request May 7, 2026
* feat: matrix notification on mirror/rescue failure

Closes #23.

## Problem

When `mirror-to-freenet.yml` (in caller repos) or `rescue-demos.yml`
fails, the only signal is GitHub's default email-on-failure-to-author.
In practice nobody actively watches it -- five consecutive
freenet-core mirror failures and a community report of cold clones
on 2026-05-07 went unnoticed for hours despite the email going out.

## Fix

Add a `Notify Matrix on failure` step to both workflows. Uses the
`olabiniV2/matrix-message@v0.0.1` action that's already in use
elsewhere in the org (see `freenet-stdlib/.github/workflows/
matrix_pr_merge_notify.yml`).

`MATRIX_ACCESS_TOKEN` and `MATRIX_ROOM_ID` are org-level secrets at
freenet (visibility: all), so any repo invoking the reusable
workflow with `secrets: inherit` (or passing them explicitly) gets
alerts for free without per-repo setup. Reusable workflow declares
both as optional `secrets:`; if absent, the action errors quietly
inside an already-failing job -- acceptable noise.
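
A minimal sketch of that optional-secrets declaration in the reusable
workflow (structure assumed from the description above):

```yaml
on:
  workflow_call:
    secrets:
      MATRIX_ACCESS_TOKEN:
        required: false
      MATRIX_ROOM_ID:
        required: false
```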

The rescue-demos `summarize` job already aggregated matrix-cell
failures into a workflow-level failure status; the new step posts
to Matrix before failing the workflow.

## Testing

- Both YAML files parse.
- Cannot easily smoke-test without inducing a real failure; will
  observe on next genuine failure.

## Out of scope / follow-up

The caller workflows (freenet-core, freenet-stdlib) currently pass
only `FREENET_GIT_IDENTITY_BUNDLE_BASE64` and `FREENET_GIT_WS_URL`
explicitly. To opt in to Matrix alerts they need to either switch
to `secrets: inherit` (broader blast radius -- skeptical reviewer
flagged this on PR #71) or add `MATRIX_ACCESS_TOKEN` and
`MATRIX_ROOM_ID` to the explicit pass-through. Will follow up with
SHA-bump PRs to the callers after this lands.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review fixes: SHA-pin matrix-message, fail-on-cancel, summarize-on-notifier-fail


---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Successfully merging this pull request may close these issues:

Parallel chunk PUTs for ChunkedPack