Relay based sync #1

Merged
Soph merged 32 commits into main from soph/add-bootstrap
Apr 11, 2026

Soph commented Apr 9, 2026

Summary

This branch adds a new relay-based sync path for initial and selected incremental remote-to-remote Git syncs, plus the harness and observability needed to validate it.

The main changes are:

  • new bootstrap command for create-only initial seeding into empty targets
  • automatic bootstrap selection from sync when the managed target refs are absent
  • a narrow incremental relay path in sync for relay-safe non-empty target updates
  • optional memory/elapsed-time measurement output
  • stronger local integration coverage using git-http-backend
  • clearer relay diagnostics in JSON and text output

What Changed

bootstrap

  • Added git-sync bootstrap <source> <target> as a separate create-only relay path.
  • It streams the fetched source pack directly into target receive-pack instead of decoding objects into the local object store first.
  • Supports:
    • branches by default
    • --map
    • --tags
    • --stats
    • --json
    • --protocol auto|v1|v2
    • --max-pack-bytes
  • Fails when managed target refs already exist.
  • Explicitly does not support --force, --prune, or dry-run.
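The --max-pack-bytes limit on a streamed pack can be pictured as a reader that fails once the source produces more bytes than allowed. This is a minimal sketch under stated assumptions, not the branch's implementation: `maxBytesReader`, `errPackTooLarge`, and `relayLimited` are hypothetical names, and the sketch assumes a positive limit.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errPackTooLarge is a hypothetical sentinel error for this sketch.
var errPackTooLarge = errors.New("pack exceeds max-pack-bytes")

// maxBytesReader passes through at most `remaining` bytes and returns
// errPackTooLarge if the underlying stream turns out to be longer.
type maxBytesReader struct {
	r         io.Reader
	remaining int64
	exceeded  bool
}

func (m *maxBytesReader) Read(p []byte) (int, error) {
	if m.exceeded {
		return 0, errPackTooLarge
	}
	if len(p) == 0 {
		return 0, nil
	}
	// Ask for at most one byte beyond the limit so that an oversize
	// stream can be told apart from one that ends exactly at the limit.
	if int64(len(p)) > m.remaining+1 {
		p = p[:m.remaining+1]
	}
	n, err := m.r.Read(p)
	if int64(n) <= m.remaining {
		m.remaining -= int64(n)
		return n, err
	}
	// The source produced more than the limit allows.
	n = int(m.remaining)
	m.remaining = 0
	m.exceeded = true
	return n, errPackTooLarge
}

// relayLimited stands in for streaming a pack into receive-pack: it copies
// the data to a discard sink through the limiting reader.
func relayLimited(data string, max int64) (int64, error) {
	src := &maxBytesReader{r: strings.NewReader(data), remaining: max}
	return io.Copy(io.Discard, src)
}

func main() {
	n, err := relayLimited("PACKDATA", 4)
	fmt.Println(n, err)
	n, err = relayLimited("PACKDATA", 8)
	fmt.Println(n, err)
}
```

Because the pack is streamed rather than buffered, a limit of this kind can only trip after some bytes have already been forwarded, which is one reason a create-only target is a comfortable fit for it.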

Automatic empty-target relay

  • sync now auto-selects the bootstrap relay path when all managed target refs are absent and the run matches bootstrap semantics.
  • plan does not execute relay, but it now surfaces when bootstrap is the appropriate path.

Incremental relay

  • Added a narrow incremental relay mode inside sync for relay-safe non-empty target updates.
  • Current supported relay-safe cases:
    • fast-forward branch updates
    • multi-branch fast-forward batches
    • branch-to-branch mappings
    • create-only tags
  • Current fallback-to-normal-sync cases:
    • diverged targets
    • --force
    • --prune
    • tag retargets
    • deletes
    • more complex mixed updates
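The supported and fallback cases above can be read as one classification pass over the planned ref updates. The following is a rough sketch only: `RefUpdate`, `Options`, and `relayEligible` are hypothetical names, and the handling of branch creation on a non-empty target is an assumption, since the list above does not mention it.

```go
package main

import "fmt"

// RefKind, RefUpdate, and Options are hypothetical types for this sketch.
type RefKind int

const (
	Branch RefKind = iota
	Tag
)

type RefUpdate struct {
	Kind        RefKind
	TargetHash  string // empty when the ref is absent on the target
	SourceHash  string // empty when the ref would be deleted
	FastForward bool   // target tip is an ancestor of the source tip
}

type Options struct {
	Force bool
	Prune bool
}

// relayEligible reports whether the whole batch is relay-safe. When it is
// not, the second value names the reason for falling back to normal sync.
func relayEligible(opts Options, updates []RefUpdate) (bool, string) {
	switch {
	case opts.Force:
		return false, "force requested"
	case opts.Prune:
		return false, "prune requested"
	}
	for _, u := range updates {
		switch {
		case u.SourceHash == "":
			return false, "delete"
		case u.TargetHash == u.SourceHash:
			continue // already up to date
		case u.Kind == Tag && u.TargetHash != "":
			return false, "tag retarget"
		case u.Kind == Branch && u.TargetHash == "":
			// Not listed either way above; this sketch falls back conservatively.
			return false, "branch creation on non-empty target"
		case u.Kind == Branch && !u.FastForward:
			return false, "diverged target"
		}
	}
	return true, ""
}

func main() {
	ok, reason := relayEligible(Options{}, []RefUpdate{
		{Kind: Branch, TargetHash: "a1", SourceHash: "b2", FastForward: true},
		{Kind: Tag, SourceHash: "c3"}, // create-only tag
	})
	fmt.Println(ok, reason)
}
```

Rejecting the whole batch on the first unsafe update keeps the decision all-or-nothing, which matches the "narrow" framing of the relay path.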

Measurement and diagnostics

  • Added --measure-memory to sync, plan, bootstrap, probe, and fetch.
  • Output now includes:
    • elapsed_millis
    • peak_alloc_bytes
    • peak_heap_inuse_bytes
    • total_alloc_bytes
    • gc_count
  • plan/sync output now also includes:
    • relay
    • relay_mode
    • relay_reason

Auth and testability

  • Added fallback auth via local git credential fill when explicit flags/env vars are not provided.
  • This makes local testing against writable dummy repos much easier.
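For readers unfamiliar with the credential helper protocol, `git credential fill` reads `key=value` lines on stdin (ended by a blank line) and writes the completed set back in the same format. The sketch below shows one plausible way to call it; `credentialFill` and `parseCredentialOutput` are hypothetical helpers, and the real fallback in this branch may differ, for example in which attributes it sends.

```go
package main

import (
	"bufio"
	"bytes"
	"context"
	"fmt"
	"net/url"
	"os"
	"os/exec"
	"strings"
)

// parseCredentialOutput parses key=value lines in the git credential
// format, stopping at the first blank line.
func parseCredentialOutput(out string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			break
		}
		// Only the first "=" separates key and value; values may contain "=".
		if k, v, ok := strings.Cut(line, "="); ok {
			fields[k] = v
		}
	}
	return fields
}

// credentialFill is a hypothetical helper that asks the local git
// installation for credentials matching rawURL.
func credentialFill(ctx context.Context, rawURL string) (user, pass string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	cmd := exec.CommandContext(ctx, "git", "credential", "fill")
	// Disable interactive prompting so an unattended run fails fast.
	cmd.Env = append(os.Environ(), "GIT_TERMINAL_PROMPT=0")
	cmd.Stdin = strings.NewReader(fmt.Sprintf("protocol=%s\nhost=%s\n\n", u.Scheme, u.Host))
	var stdout bytes.Buffer
	cmd.Stdout = &stdout
	if err := cmd.Run(); err != nil {
		return "", "", fmt.Errorf("git credential fill: %w", err)
	}
	f := parseCredentialOutput(stdout.String())
	return f["username"], f["password"], nil
}

func main() {
	// Parsing is shown on canned output so the example runs without git.
	f := parseCredentialOutput("protocol=https\nhost=example.com\nusername=alice\npassword=s3cret\n")
	fmt.Println(f["username"], f["host"])
}
```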

Integration coverage

  • Extended in-process integration coverage for:
    • bootstrap empty-target success
    • bootstrap failure when target refs already exist
    • bootstrap mappings
    • bootstrap tags
    • bootstrap pack-size threshold failures
  • Added optional git-http-backend end-to-end tests for:
    • bootstrap relay
    • incremental fast-forward relay
    • multi-branch fast-forward relay
    • mapped-branch fast-forward relay
    • tag creation relay
    • diverged target blocking
  • Added an opt-in live read-only smoke test against torvalds/linux for large-source fetch/measurement checks.

Why

The original sync path works well for general incremental reconciliation, but it is a poor fit for large initial syncs because it materializes fetched objects locally before pushing them onward.

This branch introduces relay-based paths to reduce local memory and CPU cost in the cases where that is safest:

  • initial empty-target bootstrap
  • selected fast-forward incremental updates

It also adds the measurement and test harness needed to validate where relay makes sense and where normal decode/repack behavior should remain the default.

Safety / Scope

Relay is intentionally conservative.

  • Empty target: relay is preferred and auto-selected by sync.
  • Non-empty target: relay is only used when the update set is narrow and relay-safe.
  • Anything outside that set falls back to the existing local decode-and-repack path.
  • Diverged targets still block.

How To Try It

Initial seeding:

go run ./cmd/git-sync bootstrap --stats <source-url> <target-url>

Normal sync with automatic empty-target relay:

go run ./cmd/git-sync sync --stats <source-url> <target-url>

Compare memory behavior:

go run ./cmd/git-sync bootstrap --measure-memory --json <source-url> <target-url>
go run ./cmd/git-sync sync --measure-memory --json <source-url> <target-url>

Optional local backend tests:

env GOCACHE=/tmp/go-build go test ./...
env GOCACHE=/tmp/go-build GITSYNC_E2E_GIT_HTTP_BACKEND=1 go test ./internal/syncer -run TestRun_GitHTTPBackendSync -v

Validation

  • env GOCACHE=/tmp/go-build go test ./...
  • Optional git-http-backend tests passed for bootstrap, incremental relay expansions, and diverged-target blocking

Follow-Up

Not included in this branch:

  • tag retarget relay
  • delete/prune relay
  • broader mixed-mode relay planning

Those are possible next steps, but they are meaningfully riskier than the relay-safe scope added here.


Note

Medium Risk
Adds new pack-relay execution paths (including a new bootstrap command and automatic relay selection in sync) that change how objects are fetched/pushed and could surface edge-case protocol/capability issues across remotes. Risk is mitigated by conservative gating (no force/prune, fast-forward/create-only constraints) and expanded integration/e2e coverage.

Overview
Adds a new relay-based sync mode to avoid local decode/repack when it’s safe. Introduces git-sync bootstrap for create-only seeding of empty targets by streaming the source pack directly into target receive-pack, including a --max-pack-bytes safety limit.

Enhances sync/plan behavior and diagnostics: sync auto-selects the bootstrap relay path when all managed target refs are absent, and adds a narrow incremental relay path for relay-safe fast-forward branch updates and tag creation (with fallback to the existing path otherwise). JSON/text output now includes relay, relay_mode, relay_reason, and bootstrap_suggested to explain decisions.

Adds observability and tests: introduces --measure-memory across commands with measurement fields in outputs, adds a live read-only Linux fetch smoke test, and significantly expands integration and git-http-backend end-to-end tests to cover bootstrap, incremental relay scenarios, and diverged-target blocking.

Reviewed by Cursor Bugbot for commit d971ec0.

Soph added 13 commits April 9, 2026 12:52
Entire-Checkpoint: 2dcf2f1a3166
Entire-Checkpoint: 0bdb781a9478
Entire-Checkpoint: f09ed59d22eb
Entire-Checkpoint: 68e3d42cfb81
Entire-Checkpoint: 72f07f367d81
Entire-Checkpoint: 4805c08b89a8
Entire-Checkpoint: 8620d1269809
Entire-Checkpoint: 4a7575f2583f
Entire-Checkpoint: 894a895034bb
Entire-Checkpoint: 03bc9b185ab8
Entire-Checkpoint: 78bcff081d69
Entire-Checkpoint: 90fce803a9ea
Entire-Checkpoint: 896f021c4767

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 5 potential issues.

Comment thread internal/syncer/syncer.go
Protocol: sourceService.protocol,
}, nil
}
return bootstrapWithInputs(ctx, cfg, stats, sourceConn, targetConn, sourceService, targetAdv, desiredRefs, targetRefMap, reason)

Auto-bootstrap path missing measurement, leaking goroutine

High Severity

When the Run function takes the non-dry-run bootstrap relay path, it returns early via bootstrapWithInputs without calling the measurementDone() closure. This prevents the memory measurement goroutine from stopping, causing a leak, and results in a zero-valued Measurement in the returned Result.
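A common way to avoid this class of bug is to stop the measurement in a deferred function that writes into a named result, so every return path is covered. The sketch below illustrates that pattern only; `run`, `Result`, and `Measurement` are simplified stand-ins, not the signatures in syncer.go.

```go
package main

import "fmt"

// Measurement and Result are simplified stand-ins for this sketch.
type Measurement struct{ ElapsedMillis int64 }

type Result struct {
	Relay       bool
	Measurement Measurement
}

// run stops the measurement in a defer. Because result is a named return
// value, the deferred function can fill in Measurement after any return
// statement has set the rest of the result, including the early one.
func run(useBootstrap bool, measurementDone func() Measurement) (result Result, err error) {
	defer func() { result.Measurement = measurementDone() }()

	if useBootstrap {
		return Result{Relay: true}, nil // early return, still measured
	}
	return Result{}, nil
}

func main() {
	calls := 0
	done := func() Measurement {
		calls++
		return Measurement{ElapsedMillis: 42}
	}
	r, _ := run(true, done)
	fmt.Println(r.Relay, r.Measurement.ElapsedMillis, calls)
}
```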

Comment thread internal/syncer/syncer.go
RelayMode: "",
RelayReason: "",
Stats: stats.snapshot(),
Measurement: measurementDone(),

Measurement frozen before push operations complete

Medium Severity

The Run function's memory measurement is finalized prematurely. measurementDone() is called when initializing the Result struct, before any push or incremental relay operations. Since startMeasurement uses sync.Once, this early invocation freezes the measurement, causing it to miss the elapsed time, peak memory, and GC activity from the actual push phase.
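The freezing effect of sync.Once is easy to reproduce in isolation. The sketch below replaces memory statistics with a plain counter; `startSnapshot` is a hypothetical stand-in for startMeasurement that keeps only the Once-guarded snapshot behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// startSnapshot returns a function that reads a value the first time it is
// called and returns that same value on every later call.
func startSnapshot(read func() int) func() int {
	var (
		once sync.Once
		snap int
	)
	return func() int {
		once.Do(func() { snap = read() })
		return snap
	}
}

func main() {
	work := 0
	done := startSnapshot(func() int { return work })

	early := done() // called while building the result, before the push
	work = 100      // the push phase happens here
	late := done()  // returns the frozen early value

	fmt.Println(early, late) // prints "0 0"
}
```

Calling the stop function only after the push phase, or from a defer, lets the snapshot cover the whole run.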

Comment thread README.md

There is a planned `bootstrap` command path for large initial syncs into an empty target. The intent is to relay a fetched source pack directly into target `receive-pack` instead of decoding the full object graph into local memory first.

The design note is in [docs/bootstrap.md](/Users/soph/Work/entire/devenv/git-sync/docs/bootstrap.md).

Local filesystem path accidentally committed in README

Medium Severity

The README.md includes a link to docs/bootstrap.md that uses an absolute local filesystem path. This makes the link broken for other users and exposes a local machine path.

Comment thread internal/syncer/syncer.go
} else {
return reason
}
}

Redundant if/else always returns same value

Low Severity

The relayFallbackReason function's if/else statement always returns the same reason value. This makes the ok check from canIncrementalRelay redundant and suggests the conditional was intended for distinct logic.

Comment thread internal/syncer/syncer.go
return result, fmt.Errorf("fetch source pack: %w", err)
}
defer packReader.Close()
packReader = limitPackReadCloser(packReader, cfg.MaxPackBytes)

Pack reader double-closed after successful push

Medium Severity

In both bootstrapWithInputs and the incremental relay path in Run(), defer packReader.Close() is registered, and then packReader is reassigned to a limitPackReadCloser wrapper. On success, receivePackStream calls pack.Close() on the wrapper, which closes the underlying reader. Then the deferred close fires on the original reader (Go evaluates the receiver of a deferred method call when the defer statement executes, so the defer stays bound to the pre-wrap value), causing a double close of the underlying reader/session. For the V1 path's sessionReadCloser, this can double-close the upload-pack session.
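The double close can be reproduced with a small stand-in. Go evaluates the receiver of a deferred method call when the defer statement runs, so the deferred Close targets the reader as it was before wrapping, while the wrapper forwards its own Close to the same underlying reader. In the sketch below, `countingCloser`, `limit`, `onceCloser`, and `relay` are hypothetical names, and the wrapper forwarding Close is an assumption about limitPackReadCloser. Guarding the underlying reader with sync.Once is one possible remedy.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// countingCloser records how many times Close is called.
type countingCloser struct {
	io.Reader
	closes int
}

func (c *countingCloser) Close() error { c.closes++; return nil }

func newCounting(data string) *countingCloser {
	return &countingCloser{Reader: strings.NewReader(data)}
}

// limitedReadCloser caps reads and forwards Close to the wrapped reader.
type limitedReadCloser struct {
	io.Reader
	inner io.Closer
}

func (l *limitedReadCloser) Close() error { return l.inner.Close() }

func limit(rc io.ReadCloser, max int64) io.ReadCloser {
	return &limitedReadCloser{Reader: io.LimitReader(rc, max), inner: rc}
}

// onceCloser makes Close idempotent: only the first call reaches the
// wrapped reader, and later calls return the same error.
type onceCloser struct {
	io.ReadCloser
	once sync.Once
	err  error
}

func (o *onceCloser) Close() error {
	o.once.Do(func() { o.err = o.ReadCloser.Close() })
	return o.err
}

// relay mimics the pattern from the review comment. With guard set, the
// source is wrapped in onceCloser before the defer is registered.
func relay(src io.ReadCloser, guard bool) {
	pack := src
	if guard {
		pack = &onceCloser{ReadCloser: pack}
	}
	defer pack.Close() // receiver is evaluated here, before the reassignment
	pack = limit(pack, 1<<20)
	io.Copy(io.Discard, pack)
	pack.Close() // stand-in for receivePackStream closing the pack on success
}

func main() {
	unguarded, guarded := newCounting("PACK"), newCounting("PACK")
	relay(unguarded, false)
	relay(guarded, true)
	fmt.Println(unguarded.closes, guarded.closes) // prints "2 1"
}
```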

Soph added 16 commits April 9, 2026 17:04
Entire-Checkpoint: 9425efd8edfc
Entire-Checkpoint: d005afe6724c
Entire-Checkpoint: d50053e4c868
Entire-Checkpoint: a2d74df554b4
Entire-Checkpoint: aebdecbeb417
Entire-Checkpoint: 5cc48c743bfd
Entire-Checkpoint: a859cf299e7d
Entire-Checkpoint: 4adfd6c1e723
Entire-Checkpoint: ffd71d2ccb2a
Entire-Checkpoint: 4733a258b7e6
Entire-Checkpoint: 6ef2787ebf99
Entire-Checkpoint: 21576324299f
Entire-Checkpoint: 6afc8032e5a1
Entire-Checkpoint: 0670ea271363
Entire-Checkpoint: 121669700ffc
Entire-Checkpoint: 2fd4174c6988
Soph added 3 commits April 10, 2026 09:45
Entire-Checkpoint: 67706ea6e332
Entire-Checkpoint: 3ad59c31a883
Entire-Checkpoint: 33dc63f83cff
Soph merged commit 1519f32 into main Apr 11, 2026
Soph deleted the soph/add-bootstrap branch April 11, 2026 07:21
Soph added a commit that referenced this pull request Apr 12, 2026
Break the monolithic syncer.go (3143 lines) into 7 focused packages:

- internal/gitproto: pkt-line, smart HTTP, capability negotiation, v1/v2 fetch/push
- internal/planner: mapping validation, planning, relay eligibility, checkpoints
- internal/auth: credential resolution, Entire DB tokens, git credential helper
- internal/strategy/bootstrap: one-shot + batched bootstrap, GitHub preflight
- internal/strategy/incremental: incremental relay execution
- internal/strategy/materialized: materialized fallback push with size guard
- internal/syncer: slim orchestrator (734 lines), stats, measurement

Addresses all 22 issues from docs/rewrite-issue-list.md:

Correctness: tag ref creation independent of pack (#1), duplicate target
mapping rejection (#2), cross-kind mapping rejection (#3), sideband-64k
preference (#4), pack reader close discipline (#5), include-tag capability
gating (#6), OAuth refresh error propagation (#7).

Concurrency: mutex-protected stats (#8), bounded response reads (#9),
flock-based file token store locking (#10).

Architecture: package decomposition (#11), shared session setup (#12),
explicit Params structs (#13).

Performance: commit-count batch sizing heuristic (#14), materialized
object count guard (#15), bounded ancestry checks with ErrAncestryDepthExceeded
(#16), reusable pkt-line buffer (#17).

Testing: 73 test functions, 7 benchmarks, coverage 41-58% on core packages.
Protocol malformed-input tests (#18-20), behavioral edge cases (#21),
benchmarks for planning/protocol hot paths (#22).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 27633b8ca595
Soph added a commit that referenced this pull request Apr 29, 2026
Closes the credential-leak advisory tracked as CVE-2026-41506
(GHSA-3xc5-wrhm-f963 / Dependabot alert #1). Our smart-HTTP path
already used Go's stdlib http.Client directly, which has stripped the
Authorization header on cross-host redirects since Go 1.8, but
upgrading clears the alert and pulls in the upstream
http.followRedirects controls.

Alpha.2 is a major rewrite of plumbing/transport. Translation:

- *transport.Endpoint (struct) → *url.URL throughout. Field
  accesses (.Scheme, .Host, .Path, .User, .Hostname()) are
  unchanged.
- transport.NewEndpoint → transport.ParseURL.
- transport.AuthMethod (interface) is gone. Defined our own
  auth.Method and gitproto.AuthMethod with a single
  Authorizer(*http.Request) error method, satisfied by
  *transporthttp.BasicAuth and *transporthttp.TokenAuth (whose
  SetAuth methods were renamed to Authorizer).
- transport.Service (typed) → string constants. Function
  parameters take string.
- transporthttp.NewTransport(*TransportOptions) →
  NewTransport(Options) (value, not pointer).
- transport.AdvertiseReferences → transport.AdvertiseRefs.
- transport.UploadPackOptions → transport.UploadPackRequest;
  transport.ReceivePackOptions → transport.ReceivePackRequest.
- transport.Register / transport.Get were removed. The TestMain
  shims in syncer/integration_test.go and cmd/git-sync/main_test.go
  registered a custom HTTP transport for go-git's transport
  registry, but our code never goes through that registry — it
  hits the network through gitproto's own http.Client. Dropped
  both shims as dead code.

Also dropped the now-unused Conn.Transport field; nothing in
git-sync read it.

Updated .golangci.yaml ireturn allowlist to permit the new
auth.Method interface where the previous transport.AuthMethod
allowance lived.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0d39f2320f3
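The single-method auth interface described in this commit message can be pictured as follows. This is a sketch under assumptions: the interface shape is taken from the message, but `basicAuth`, `tokenAuth`, and `newRequest` are hypothetical local types written for illustration, whereas the commit relies on go-git's *transporthttp.BasicAuth and *transporthttp.TokenAuth. The Bearer scheme used for the token is also an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

// Method has the single method named in the commit message.
type Method interface {
	Authorizer(req *http.Request) error
}

// basicAuth is a hypothetical implementation for this sketch.
type basicAuth struct{ Username, Password string }

func (a basicAuth) Authorizer(req *http.Request) error {
	req.SetBasicAuth(a.Username, a.Password)
	return nil
}

// tokenAuth is a hypothetical implementation; the Bearer scheme is assumed.
type tokenAuth struct{ Token string }

func (a tokenAuth) Authorizer(req *http.Request) error {
	req.Header.Set("Authorization", "Bearer "+a.Token)
	return nil
}

// newRequest builds a GET request and lets the auth method, if any,
// decorate it before it is sent.
func newRequest(m Method, url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if m != nil {
		if err := m.Authorizer(req); err != nil {
			return nil, err
		}
	}
	return req, nil
}

func main() {
	req, err := newRequest(
		basicAuth{Username: "alice", Password: "s3cret"},
		"https://example.com/repo.git/info/refs?service=git-upload-pack",
	)
	if err != nil {
		panic(err)
	}
	user, _, ok := req.BasicAuth()
	fmt.Println(user, ok)
}
```

Keeping the interface to one method that mutates the outgoing request means any type with that method satisfies it, which is what lets the library's types and local ones be used interchangeably.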