worktree: Optimize checkout performance for worktree creation and branch switching #1956

@cedric-appdirect

Feature Description

Creating a new linked worktree (or checking out a branch) currently decompresses every file from the object store, even when most files are already materialized in an existing worktree. Git C's git worktree add takes the same approach (it runs git reset --hard internally), but it compensates with parallel checkout workers and streaming blob writes, two optimizations that go-git currently lacks.

For large repositories, this makes worktree creation and clone checkout significantly slower than necessary.

Background

When git worktree add -b feature HEAD is executed, the new branch points to the same commit as the source, so every file in the target tree is byte-for-byte identical to its counterpart in the existing worktree (assuming that worktree is clean). Despite this, both Git C and go-git decompress every blob from the object store individually.

Git C mitigates this with:

  • Parallel checkout (parallel-checkout.c) — multiple worker processes write files concurrently.
  • Streaming blob writes (odb_stream_blob_to_fd) — blobs stream directly to disk without full memory buffering.
  • Cache-tree fast path (traverse_by_cache_tree) — skips entire subtrees when the index cache-tree matches the target tree OID.
  • CE_UPTODATE / FSMonitor — avoids redundant lstat calls for files known to be clean.

go-git's current checkout path (resetWorktreeToTree → checkoutChange → checkoutFile → copyObjectToWorktree) has none of these optimizations and adds a few per-file overheads of its own:

  • r.Config() is called inside copyObjectToWorktree for every file to check AutoCRLF.
  • When AutoCRLF is enabled, each blob is decompressed twice (once for binary detection, once for writing).

Proposed Improvements

Tier 1 — Low-hanging fruit:

  • Cache the r.Config() result outside the per-file checkout loop and pass it through as a parameter.
  • Fix AutoCRLF double-decompression: buffer the initial peek for binary detection and reuse it for writing (or use a TeeReader).

Tier 2 — Parallel file extraction:

Tier 3 — Copy from existing worktree (novel, beyond Git C):

  • When creating a worktree from the same commit, copy files directly from the source worktree instead of decompressing from the object store. The source worktree's index provides the expected OID — if stat data matches, the file is known clean and safe to copy.
  • When creating a worktree from a different commit, use diffTree(sourceTree, targetTree) to identify unchanged files (same OID). Copy those from the source worktree; decompress only changed/new files from the object store.
  • Use platform-accelerated copy where available (clonefile on macOS/APFS for CoW, copy_file_range on Linux).
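
The copy-versus-decompress decision above can be sketched as a partition over two flattened trees. This is an illustration only: treeEntries and planCheckout are hypothetical names, OIDs are plain strings, and a real implementation would work from go-git's tree diff and would also compare file modes before treating an entry as unchanged.

```go
package main

import "sort"

// treeEntries maps a worktree-relative path to its blob OID (hex string here).
type treeEntries map[string]string

// planCheckout splits the target tree into paths that can be copied from the
// source worktree (same path, same OID) and paths that must be extracted
// from the object store (changed or new).
func planCheckout(source, target treeEntries) (copyPaths, extractPaths []string) {
	for p, oid := range target {
		if srcOID, ok := source[p]; ok && srcOID == oid {
			copyPaths = append(copyPaths, p)
		} else {
			extractPaths = append(extractPaths, p)
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Strings(copyPaths)
	sort.Strings(extractPaths)
	return copyPaths, extractPaths
}
```

When both worktrees point at the same commit, every path lands in copyPaths, which is the first bullet's case.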

Tier 4 — Index-aware optimizations:

  • Implement a cache-tree fast path: when the source index has a valid cache-tree and a subtree OID matches the corresponding subtree in the target, skip traversal of that subtree entirely.
  • For branch switching with an existing index, build the new index incrementally from the old one plus the tree diff, rather than rebuilding from scratch.
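
The subtree-skipping idea can be illustrated with a small recursive walk. This is a sketch, assuming an in-memory treeNode type that is hypothetical and much simpler than the index cache-tree extension: it stops descending as soon as a cached OID equals the target OID and reports only paths that need work. Deletions (paths present in the cached tree but absent from the target) are not handled here.

```go
package main

import (
	"path"
	"sort"
)

// treeNode is a simplified tree or blob; Children is nil for blobs.
type treeNode struct {
	OID      string
	Children map[string]*treeNode
}

// changedPaths returns the blob paths in target that differ from cached,
// skipping any subtree whose OID matches.
func changedPaths(cached, target *treeNode) []string {
	var out []string
	walk("", cached, target, &out)
	sort.Strings(out)
	return out
}

func walk(prefix string, cached, target *treeNode, out *[]string) {
	if cached != nil && cached.OID == target.OID {
		return // identical blob or entire subtree: nothing to do
	}
	if target.Children == nil {
		*out = append(*out, prefix)
		return
	}
	for name, child := range target.Children {
		var c *treeNode
		if cached != nil {
			c = cached.Children[name] // nil if absent or if cached is a blob
		}
		walk(path.Join(prefix, name), c, child, out)
	}
}
```

The returned paths are also the entries an incrementally built index would need to replace; everything else could be carried over from the old index.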

Considerations

  • Tier 3 requires a billy.Filesystem extension or type assertion to access platform-specific copy syscalls. A CopyFile(src, dst string) error method on billy.Filesystem could be a clean abstraction, with a fallback to io.Copy.
  • Tier 3 introduces a dependency between the source and destination worktree during creation. If the source worktree is modified concurrently, the copy could be inconsistent. A snapshot of the source index at the start of the operation, combined with stat-validation before each copy, mitigates this (same guarantee as Git C's racy-git model).
  • Tier 2 and Tier 3 are independent and can be combined — parallel goroutines can handle both copy-from-worktree and decompress-from-ODB workloads.
