Feature Description
Creating a new linked worktree (or checking out a branch) currently decompresses every file from the object store, even when most files are already materialized in an existing worktree. This is the same approach used by Git C's git worktree add (which runs git reset --hard internally), but Git C compensates with parallel checkout workers and streaming blob writes — optimizations go-git currently lacks.
For large repositories, this makes worktree creation and clone checkout significantly slower than necessary.
Background
When git worktree add -b feature HEAD is executed, the new branch points to the same commit as the source. As long as the source worktree is clean, every file in the target tree is byte-for-byte identical to its counterpart in the existing worktree. Despite this, both Git C and go-git read and decompress every blob from the object store individually.
Git C mitigates this with:
- Parallel checkout (parallel-checkout.c) — multiple worker processes write files concurrently.
- Streaming blob writes (odb_stream_blob_to_fd) — blobs stream directly to disk without full memory buffering.
- Cache-tree fast path (traverse_by_cache_tree) — skips entire subtrees when the index cache-tree matches the target tree OID.
- CE_UPTODATE / FSMonitor — avoids redundant lstat calls for files known to be clean.
go-git's current checkout path (resetWorktreeToTree → checkoutChange → checkoutFile → copyObjectToWorktree) implements none of these optimizations and adds two per-file overheads of its own:
- r.Config() is called inside copyObjectToWorktree for every file to check AutoCRLF.
- When AutoCRLF is enabled, each blob is decompressed twice (once for binary detection, once for writing).
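Both overheads could plausibly be avoided along these lines. This is a minimal sketch, not go-git's code: writeBlob, copyLFToCRLF, convert and the sniffLen constant are hypothetical names, the autoCRLF flag is assumed to be read from the config once by the caller before the checkout loop, and binary detection is simplified to a NUL-byte check on the buffered prefix. The prefix is peeked rather than consumed, so the blob is decompressed only once.

```go
package main

import (
	"bufio"
	"bytes"
	"io"
)

// sniffLen is an assumed prefix size for binary detection, not a go-git constant.
const sniffLen = 8000

// writeBlob streams src to dst in a single pass. autoCRLF is resolved once by
// the caller instead of being looked up for every file.
func writeBlob(dst io.Writer, src io.Reader, autoCRLF bool) error {
	br := bufio.NewReaderSize(src, sniffLen)
	if autoCRLF {
		// Peek does not consume input, so the same reader serves both the
		// binary check and the write. A short blob yields io.EOF plus the
		// bytes that are available.
		head, err := br.Peek(sniffLen)
		if err != nil && err != io.EOF {
			return err
		}
		if bytes.IndexByte(head, 0) < 0 {
			return copyLFToCRLF(dst, br)
		}
	}
	// Binary content, or conversion disabled: copy bytes unchanged.
	_, err := io.Copy(dst, br)
	return err
}

// copyLFToCRLF writes br to dst, inserting '\r' before every '\n' that is
// not already preceded by one.
func copyLFToCRLF(dst io.Writer, br *bufio.Reader) error {
	bw := bufio.NewWriter(dst)
	var prev byte
	for {
		b, err := br.ReadByte()
		if err == io.EOF {
			return bw.Flush()
		}
		if err != nil {
			return err
		}
		if b == '\n' && prev != '\r' {
			if err := bw.WriteByte('\r'); err != nil {
				return err
			}
		}
		if err := bw.WriteByte(b); err != nil {
			return err
		}
		prev = b
	}
}

// convert is an in-memory convenience wrapper around writeBlob.
func convert(in []byte, autoCRLF bool) ([]byte, error) {
	var out bytes.Buffer
	err := writeBlob(&out, bytes.NewReader(in), autoCRLF)
	return out.Bytes(), err
}
```

In a real checkout, src would be the blob reader and dst the destination file, with the flag computed once per checkout rather than once per file.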
Proposed Improvements
Tier 1 — Low-hanging fruit:
Tier 2 — Parallel file extraction:
Tier 3 — Copy from existing worktree (novel, beyond Git C):
Tier 4 — Index-aware optimizations:
Considerations
- Tier 3 requires a billy.Filesystem extension or type assertion to access platform-specific copy syscalls. A CopyFile(src, dst string) error method on billy.Filesystem could be a clean abstraction, with a fallback to io.Copy.
- Tier 3 introduces a dependency between the source and destination worktree during creation. If the source worktree is modified concurrently, the copy could be inconsistent. A snapshot of the source index at the start of the operation, combined with stat-validation before each copy, mitigates this (same guarantee as Git C's racy-git model).
- Tier 2 and Tier 3 are independent and can be combined — parallel goroutines can handle both copy-from-worktree and decompress-from-ODB workloads.
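The optional CopyFile extension with an io.Copy fallback, mentioned in the first consideration, might look roughly like the following. Everything here is an assumption for illustration: fileSystem is a reduced stand-in for billy.Filesystem (whose real methods return billy.File), fileCopier is the hypothetical extension interface, and memFS and cloneFS are toy in-memory implementations used only to exercise both paths.

```go
package main

import (
	"bytes"
	"io"
	"os"
)

// fileSystem is a minimal stand-in for the parts of billy.Filesystem used here.
type fileSystem interface {
	Open(name string) (io.ReadCloser, error)
	Create(name string) (io.WriteCloser, error)
}

// fileCopier is the hypothetical optional extension a filesystem may implement
// to expose a platform-specific copy (reflink, copy_file_range, and so on).
type fileCopier interface {
	CopyFile(src, dst string) error
}

// copyFile uses the fast path when the filesystem offers one and falls back
// to a plain streamed copy otherwise.
func copyFile(fsys fileSystem, src, dst string) error {
	if c, ok := fsys.(fileCopier); ok {
		return c.CopyFile(src, dst)
	}
	in, err := fsys.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := fsys.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// memFS is a toy in-memory filesystem without a fast path.
type memFS map[string][]byte

func (m memFS) Open(name string) (io.ReadCloser, error) {
	data, ok := m[name]
	if !ok {
		return nil, os.ErrNotExist
	}
	return io.NopCloser(bytes.NewReader(data)), nil
}

func (m memFS) Create(name string) (io.WriteCloser, error) {
	return &memFile{fs: m, name: name}, nil
}

// memFile buffers writes and stores them in the map on Close.
type memFile struct {
	bytes.Buffer
	fs   memFS
	name string
}

func (f *memFile) Close() error {
	f.fs[f.name] = f.Bytes()
	return nil
}

// cloneFS adds the fast path, standing in for a reflink-capable filesystem.
type cloneFS struct {
	memFS
	fastCopies int
}

func (c *cloneFS) CopyFile(src, dst string) error {
	data, ok := c.memFS[src]
	if !ok {
		return os.ErrNotExist
	}
	c.memFS[dst] = data // shares the bytes, as a reflink would share extents
	c.fastCopies++
	return nil
}
```

A type assertion keeps the extension optional: existing billy implementations would keep working through the fallback, and only filesystems that can do better need to implement the extra method.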
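To illustrate how stat-validation and parallel dispatch could fit together, here is a rough sketch under stated assumptions. The types fileStat and job, the functions safeToCopy and checkoutAll, and the copyFn/inflateFn callbacks are all hypothetical names, not go-git APIs. Timestamps are plain nanosecond integers, and an entry whose recorded mtime is not strictly older than the index snapshot is treated as racy and sent to the object store instead of being copied.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// fileStat holds the lstat fields compared against the index snapshot.
type fileStat struct {
	Size    int64
	MTimeNs int64
	Regular bool
}

// job pairs the stat data recorded in the source index snapshot with a fresh
// lstat of the same path in the source worktree.
type job struct {
	Path    string
	Indexed fileStat
	Current fileStat
}

// safeToCopy reports whether the source worktree file can stand in for the
// blob. Entries modified at or after the snapshot time are rejected, since an
// edit within the same timestamp tick would be invisible to the comparison.
func safeToCopy(j job, snapshotNs int64) bool {
	if !j.Current.Regular || j.Current != j.Indexed {
		return false
	}
	return j.Indexed.MTimeNs < snapshotNs
}

// checkoutAll fans jobs out to a fixed number of goroutines. Validated files
// go to copyFn, everything else to inflateFn. It returns how many files took
// each path and the first error seen; remaining jobs still run after an error.
func checkoutAll(jobs []job, snapshotNs int64, workers int,
	copyFn, inflateFn func(path string) error) (copied, inflated int64, err error) {
	if workers < 1 {
		workers = 1
	}
	queue := make(chan job)
	var wg sync.WaitGroup
	var once sync.Once
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				fn, counter := inflateFn, &inflated
				if safeToCopy(j, snapshotNs) {
					fn, counter = copyFn, &copied
				}
				if e := fn(j.Path); e != nil {
					once.Do(func() { err = e })
					continue
				}
				atomic.AddInt64(counter, 1)
			}
		}()
	}
	for _, j := range jobs {
		queue <- j
	}
	close(queue)
	wg.Wait() // all writes to the results happen before this returns
	return copied, inflated, err
}
```

Because each job is routed independently, the same worker pool serves both workloads, and a file that fails validation simply takes the slower decompress path.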