experiment: parallel pre-built head-blobs pack download (Hybrid mode) #2
Merged
… extraction
- Add StreamingBlobPackBuilder with a thread pool of compressors and a single sequential writer so zlib compression parallelizes without producing an invalid pack file.
- Increase the default blob-pack channel depth from 64 to 2048 and add RIPCLONE_BLOB_PACK_CHANNEL_DEPTH / RIPCLONE_BLOB_PACK_THREADS knobs.
- Add RIPCLONE_BLOB_PACK_COMPRESSION_LEVEL (default 3) and switch flate2 to the zlib-ng backend.
- Pass single-fragment blobs to the pack builder as zero-copy slices into the shared decompressed frame via BlobPackInput::FrameSlice; move assembled multi-fragment blobs after writing to avoid an extra clone.
- Add RIPCLONE_SKIP_SHA1_VERIFY opt-out for trusted server output.
- Replace the ripclone-proxy global Mutex<TokenBucket> with a lock-free atomic token bucket so concurrent streams do not serialize on the bandwidth shaper.
- Add scripts/benchmark_matrix.sh, scripts/profile_one.sh, and scripts/verify_full_clone.sh for matrix/profile/correctness testing.
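The StreamingBlobPackBuilder idea in the first bullet (many compressor threads, one sequential writer) can be sketched with std threads and channels. This is a minimal illustration under stated assumptions, not the PR's implementation: `compress_ordered` and `fake_compress` are hypothetical names, and `fake_compress` is a toy placeholder for zlib/flate2 so the example needs only the standard library. Workers may finish in any order; the writer parks early results in a `BTreeMap` keyed by sequence number and writes them strictly in input order, which is what keeps a sequential format such as a pack file valid.

```rust
use std::collections::BTreeMap;
use std::io::Write;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy stand-in for zlib compression (hypothetical): a one-byte length tag
/// followed by the raw bytes, so the output is deterministic and std-only.
fn fake_compress(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len() + 1);
    out.push(data.len() as u8);
    out.extend_from_slice(data);
    out
}

/// Compress `blobs` on `threads` workers and write the results to `out`
/// in the original input order.
fn compress_ordered<W: Write>(
    blobs: Vec<Vec<u8>>,
    threads: usize,
    channel_depth: usize,
    out: &mut W,
) -> std::io::Result<()> {
    // Bounded job queue: the producer blocks once `channel_depth` jobs are waiting.
    let (job_tx, job_rx) = mpsc::sync_channel::<(usize, Vec<u8>)>(channel_depth);
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<(usize, Vec<u8>)>();

    let workers: Vec<_> = (0..threads.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The guard is a temporary, so the lock is released before compressing.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((seq, data)) => {
                        if done_tx.send((seq, fake_compress(&data))).is_err() {
                            break; // writer is gone
                        }
                    }
                    Err(_) => break, // producer finished and the queue is drained
                }
            })
        })
        .collect();
    drop(done_tx); // only the workers hold result senders now

    // Feed jobs from a separate thread so the bounded queue cannot
    // deadlock against the writer loop below.
    let producer = thread::spawn(move || {
        for (seq, blob) in blobs.into_iter().enumerate() {
            if job_tx.send((seq, blob)).is_err() {
                break;
            }
        }
    });

    // Single sequential writer: buffer out-of-order results, emit in order.
    let mut pending = BTreeMap::new();
    let mut next = 0usize;
    for (seq, compressed) in done_rx {
        pending.insert(seq, compressed);
        while let Some(chunk) = pending.remove(&next) {
            out.write_all(&chunk)?;
            next += 1;
        }
    }
    producer.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
    Ok(())
}
```

Here the bounded `sync_channel` plays the role of a channel-depth knob: a deeper queue lets the producer run further ahead of slow compressors at the cost of more buffered memory. This sketch does not bound the reorder buffer itself.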
Resurrects the parallel head-blobs pack download for Hybrid mode while keeping Full as the local blob-pack build path.
- mode.rs: Hybrid now means archive extraction + pre-built head-blobs pack download; Full keeps the streaming local pack builder.
- client.rs: adds install_prebuilt_blob_pack helper and spawns it in parallel with archive downloads/workers for Hybrid mode.
- scripts/benchmark_matrix.sh and scripts/verify_full_clone.sh now accept a MODE environment variable.

Benchmarks show Hybrid is faster on high-bandwidth/low-latency links (~3.4 s at 1 Gbps/50 ms/8 cores vs ~4.7 s for Full) but slower on constrained links because it downloads an extra ~66 MB pack.
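The Hybrid/Full split above can be pictured with a small blocking model. This sketch assumes a thread-based design purely for simplicity (the real client may well be async), and every body is a stand-in: `run_clone`, `CloneReport`, `extract_archive` and `extract_archive_building_pack` are hypothetical, and `install_prebuilt_blob_pack` here only borrows the helper's name. In Hybrid mode the pack download starts first and overlaps with archive extraction; in Full mode the pack is built locally as part of extraction.

```rust
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CloneMode {
    /// Archive extraction with the streaming local blob-pack builder.
    Full,
    /// Archive extraction plus a parallel download of the pre-built pack.
    Hybrid,
}

/// Hypothetical summary so the example is checkable.
#[derive(Debug, PartialEq, Eq)]
struct CloneReport {
    files_extracted: usize,
    pack_source: &'static str,
}

// Stand-ins for the real work (hypothetical bodies).
fn extract_archive() -> usize {
    42
}
fn install_prebuilt_blob_pack() -> Result<(), String> {
    Ok(())
}
fn extract_archive_building_pack() -> Result<usize, String> {
    // Full: the local pack builder is fed during extraction (streaming),
    // modelled here as a single call.
    Ok(42)
}

fn run_clone(mode: CloneMode) -> Result<CloneReport, String> {
    match mode {
        CloneMode::Hybrid => thread::scope(|s| -> Result<CloneReport, String> {
            // Start the pack download first so it overlaps with extraction.
            let pack = s.spawn(install_prebuilt_blob_pack);
            let files = extract_archive();
            pack.join()
                .map_err(|_| "pack install thread panicked".to_string())??;
            Ok(CloneReport { files_extracted: files, pack_source: "prebuilt" })
        }),
        CloneMode::Full => {
            let files = extract_archive_building_pack()?;
            Ok(CloneReport { files_extracted: files, pack_source: "local" })
        }
    }
}
```

The benchmark trade-off follows from transfer size: ~66 MB is about 528 Mbit, roughly 0.53 s at 1 Gbps but about 5.3 s on a hypothetical 100 Mbps link (ignoring latency and protocol overhead). In this model the overlapped download only adds wall-clock time when it outlasts extraction.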
- Rename RIPCLONE_SKIP_SHA1_VERIFY to RIPCLONE_UNSAFE_SKIP_SHA1_VERIFY and document risk.
- Add blob_sha1_to_array helper in extract.rs; replace .expect/.ok() with proper Result propagation.
- Fix blob_pack builder mutex poisoning and remove unnecessary fsync in finalize.
- Stream prebuilt head-blobs pack download with bounded concurrency instead of flattening into Vec.
- Replace split-atomic token bucket with a mutex-protected state to avoid token-accounting races.
- Make zlib-ng an optional default feature for constrained build environments.
- Update README clone-mode descriptions and add build-options / env-var docs.
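Replacing the split-atomic token bucket with mutex-protected state can be illustrated by keeping the token count and the last-refill timestamp under one lock, so "refill, then take" is a single critical section. With two independent atomics, one example of a possible race is two streams both crediting the same elapsed interval before either updates the timestamp. `TokenBucket`, `BucketState` and `try_take` are hypothetical names; time is passed in explicitly to keep the sketch deterministic.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Both fields live under one lock so refill and take cannot interleave.
struct BucketState {
    tokens: f64,
    last_refill: Instant,
}

/// Byte-rate token bucket (a sketch, not the ripclone-proxy code).
struct TokenBucket {
    rate_per_sec: f64,
    capacity: f64,
    state: Mutex<BucketState>,
}

impl TokenBucket {
    fn new(rate_per_sec: f64, capacity: f64, now: Instant) -> Self {
        TokenBucket {
            rate_per_sec,
            capacity,
            state: Mutex::new(BucketState { tokens: capacity, last_refill: now }),
        }
    }

    /// Take `n` tokens at time `now`, or return how long to wait for the deficit.
    fn try_take(&self, n: f64, now: Instant) -> Result<(), Duration> {
        // Recover the guard if a previous holder panicked; plain counters
        // remain usable after a poisoned lock.
        let mut st = self.state.lock().unwrap_or_else(|e| e.into_inner());
        let elapsed = now.saturating_duration_since(st.last_refill).as_secs_f64();
        st.tokens = (st.tokens + elapsed * self.rate_per_sec).min(self.capacity);
        st.last_refill = st.last_refill.max(now);
        if st.tokens >= n {
            st.tokens -= n;
            Ok(())
        } else {
            Err(Duration::from_secs_f64((n - st.tokens) / self.rate_per_sec))
        }
    }
}
```

A caller that gets `Err(wait)` would sleep for roughly `wait` and retry; requests larger than `capacity` can never succeed in this sketch, so a real shaper would split them into smaller chunks.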
- Stream prebuilt head-blobs pack into a NamedTempFile to avoid leaking temp files on error and avoid needing a manual rename.
- Remove unused sha1/File imports introduced in the previous fix.
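The commit relies on tempfile's `NamedTempFile` for cleanup on error. The sketch below reproduces that property with only std, via a small `Drop` guard, so it runs without extra crates. It also assumes the installer names the pack after the stream's trailing 20-byte SHA-1 checksum, the way current Git names pack files; `install_pack_stream`, `TempGuard` and the fixed `tmp-pack-download` file name are hypothetical. A real install also needs the matching `.idx` file, which is omitted here.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::{Path, PathBuf};

/// Removes the temp file on drop unless `keep()` was called: a std-only
/// stand-in for what tempfile::NamedTempFile does automatically.
struct TempGuard {
    path: Option<PathBuf>,
}

impl TempGuard {
    fn keep(mut self) -> PathBuf {
        self.path.take().expect("path present until keep()")
    }
}

impl Drop for TempGuard {
    fn drop(&mut self) {
        if let Some(p) = self.path.take() {
            let _ = fs::remove_file(p); // best effort on the error path
        }
    }
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Stream `src` into `pack_dir`, then install it as `pack-<trailer-hex>.pack`.
fn install_pack_stream<R: Read>(mut src: R, pack_dir: &Path) -> io::Result<PathBuf> {
    let tmp_path = pack_dir.join("tmp-pack-download");
    // Guard first, so the file handle (declared later) is closed before removal.
    let guard = TempGuard { path: Some(tmp_path.clone()) };
    let mut file = File::create(&tmp_path)?;

    let mut buf = [0u8; 64 * 1024];
    let mut tail: Vec<u8> = Vec::with_capacity(20); // last 20 bytes seen so far
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break;
        }
        file.write_all(&buf[..n])?;
        tail.extend_from_slice(&buf[..n]);
        if tail.len() > 20 {
            tail.drain(..tail.len() - 20);
        }
    }
    if tail.len() < 20 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "pack shorter than trailer"));
    }
    file.flush()?;
    drop(file);

    let final_path = pack_dir.join(format!("pack-{}.pack", hex(&tail)));
    fs::rename(&tmp_path, &final_path)?; // on failure the guard deletes the temp file
    guard.keep();
    Ok(final_path)
}
```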
russellromney added a commit that referenced this pull request Jun 20, 2026
- Validate manifest blob_sha1 length early in extract.rs
- Remove unused git_blob_object_data helper/test
- Add malformed blob_sha1 test
- Expand verify_full_clone.sh checks
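Early validation of `blob_sha1` with `Result` propagation (instead of `.expect`) might look like the following. This assumes the manifest stores each `blob_sha1` as raw bytes; if it is hex, it would need decoding first. `ManifestEntry`, `BadBlobSha1` and `validate_manifest` are hypothetical, and only the helper name `blob_sha1_to_array` comes from the commit messages; its real signature may differ.

```rust
use std::fmt;

/// Error for a manifest entry whose blob_sha1 has the wrong length.
#[derive(Debug, PartialEq, Eq)]
struct BadBlobSha1 {
    path: String,
    len: usize,
}

impl fmt::Display for BadBlobSha1 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "manifest entry {:?}: blob_sha1 is {} bytes, expected 20", self.path, self.len)
    }
}

impl std::error::Error for BadBlobSha1 {}

/// Hypothetical manifest entry; assumes blob_sha1 holds raw SHA-1 bytes.
struct ManifestEntry {
    path: String,
    blob_sha1: Vec<u8>,
}

/// Convert to a fixed-size array, returning an error instead of panicking.
fn blob_sha1_to_array(entry: &ManifestEntry) -> Result<[u8; 20], BadBlobSha1> {
    <[u8; 20]>::try_from(entry.blob_sha1.as_slice()).map_err(|_| BadBlobSha1 {
        path: entry.path.clone(),
        len: entry.blob_sha1.len(),
    })
}

/// Check every entry up front so a malformed manifest fails before any
/// files or pack data are written.
fn validate_manifest(entries: &[ManifestEntry]) -> Result<Vec<[u8; 20]>, BadBlobSha1> {
    entries.iter().map(blob_sha1_to_array).collect()
}
```

Collecting into `Result<Vec<_>, _>` stops at the first bad entry, which matches the "validate early" intent: nothing downstream runs on a partially valid manifest.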
russellromney added a commit that referenced this pull request Jun 20, 2026
russellromney added a commit that referenced this pull request Jun 20, 2026
…eline
- Merge main (including optional target_dir refactor) into prototype/rcgit.
- In lazy_clone, run extract_archive_with_chunk_fetcher with no target_dir so the same PR #2 pipeline builds a HEAD-blobs pack from downloaded archive chunks.
- Delegate rcgit show to real git now that the pack contains all blobs.
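Running the extractor with no `target_dir` lets one pipeline produce only the HEAD-blobs pack. A hypothetical model of that switch follows: `extract_blobs` and the `BTreeMap` pack sink are illustrative stand-ins, not the real `extract_archive_with_chunk_fetcher` signature. Every blob is always fed to the pack sink; files are written to disk only when a directory is supplied.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Feed every blob to the pack sink; materialize files only if `target_dir` is Some.
/// `pack_sink` stands in for the blob-pack builder (hypothetical).
fn extract_blobs(
    blobs: &[(&str, &[u8])],
    target_dir: Option<&Path>,
    pack_sink: &mut BTreeMap<String, Vec<u8>>,
) -> io::Result<()> {
    for &(path, data) in blobs {
        if let Some(dir) = target_dir {
            let dest = dir.join(path);
            if let Some(parent) = dest.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::write(&dest, data)?;
        }
        pack_sink.insert(path.to_string(), data.to_vec());
    }
    Ok(())
}
```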
russellromney added a commit that referenced this pull request Jun 20, 2026
…k builder
- lazy_clone now prefers the server pre-built HEAD-blobs pack when available, reusing Client::install_prebuilt_blob_pack (the same streaming, SHA1-named installer used by Full/Hybrid mode on main).
- Falls back to downloading archive chunks and building a local blob pack via extract_archive_with_chunk_fetcher (target_dir=None), the same core pipeline PR #2 introduced.
- Make install_prebuilt_blob_pack pub and dedupe the legacy direct-install pack-writing code in client.rs.
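The prefer-then-fall-back order described above can be sketched with two closures. `ensure_head_blob_pack` and `PackSource` are hypothetical; `try_prebuilt` stands in for `install_prebuilt_blob_pack` and `build_local` for the `extract_archive_with_chunk_fetcher(target_dir=None)` path. Falling back when the install itself fails is an assumption of this sketch; the commit message only says the pre-built pack is preferred when available.

```rust
/// Where the HEAD-blobs pack came from (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum PackSource {
    Prebuilt,
    LocalBuild,
}

fn ensure_head_blob_pack<P, L>(
    prebuilt_available: bool,
    try_prebuilt: P,
    build_local: L,
) -> Result<PackSource, String>
where
    P: FnOnce() -> Result<(), String>,
    L: FnOnce() -> Result<(), String>,
{
    if prebuilt_available {
        match try_prebuilt() {
            Ok(()) => return Ok(PackSource::Prebuilt),
            // Assumption: treat an install error like "not available".
            Err(e) => eprintln!("prebuilt pack install failed, falling back: {e}"),
        }
    }
    build_local()?;
    Ok(PackSource::LocalBuild)
}
```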
russellromney added a commit that referenced this pull request Jun 21, 2026
…2e matrix
#1 Incremental history reuse from Tigris + compaction
- Two-phase phase 2 now builds only the tail since the last sealed level and references prior levels by hash from object storage (never rebuilt); idx are assembled from local-or-Tigris. Steady-state re-sync ~ O(new commits).
- Seal every advancing tail (RIPCLONE_LSM_SEAL_BYTES default 1) so reuse kicks in for normal repos; size-tiered compaction (RIPCLONE_LSM_MAX_LEVELS default 16) bounds the level/pack count. Shared seal_and_compact across single/two-phase.
- LSM is now the default path (RIPCLONE_LSM=0 to disable).
#2 Move the zstd archive off the depth=1 critical path
- Phase 1 builds only the cheap files table (build_files_table: paths/modes/blob-sha1, no frames) that editable depth=1 needs; the full zstd archive is built in phase 2 and attached to the full variant. Removes archive time from time-to-depth=1.
#3 Reachability bitmaps
- write_bitmap (git multi-pack-index write --bitmap), written once before the heavy full enumerations (phase 2 / single-phase, never the depth=1 path). Full rev-list enumerations pass --use-bitmap-index (safe no-op without one).
Robustness
- Lower /sync hold to 25s (RIPCLONE_SYNC_WAIT_SECS) so it returns 202 before edge/proxy request timeouts reset the connection.
Tests
- Comprehensive e2e matrix: a shared lifecycle battery (first sync, re-sync, multi-commit growth x depth=1/depth=0/files, fsck + usability) run across all six server build-path configs (single/two-phase x LSM on/off x async).
- e2e_compaction; two-phase files-mode test; tracing subscriber + richer poll-timeout diagnostics in the harness.
65 tests pass, clippy/fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
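Sealing plus size-tiered compaction can be pictured by tracking only level sizes. The merge policy below (fold the newest level into its predecessor while it is not smaller, and also whenever the level count exceeds the cap) is an assumption chosen for illustration, not the PR's seal_and_compact; `Levels`, `seal` and `compact` are hypothetical names, and in the real system each merge would rewrite packs in object storage rather than add integers. `seal_bytes` mirrors RIPCLONE_LSM_SEAL_BYTES (default 1, so any non-empty tail seals) and `max_levels` mirrors RIPCLONE_LSM_MAX_LEVELS (default 16).

```rust
/// Size in bytes of each sealed level, oldest first (hypothetical model).
#[derive(Debug, Default)]
struct Levels {
    sizes: Vec<u64>,
}

impl Levels {
    /// Seal the new tail as a level if it has at least `seal_bytes` of data.
    fn seal(&mut self, tail_bytes: u64, seal_bytes: u64) -> bool {
        if tail_bytes == 0 || tail_bytes < seal_bytes {
            return false;
        }
        self.sizes.push(tail_bytes);
        true
    }

    /// Merge the newest level into its predecessor while it is at least as
    /// large, or while more than `max_levels` levels remain.
    fn compact(&mut self, max_levels: usize) {
        let max_levels = max_levels.max(1);
        while self.sizes.len() >= 2 {
            let n = self.sizes.len();
            let newest_not_smaller = self.sizes[n - 1] >= self.sizes[n - 2];
            if newest_not_smaller || n > max_levels {
                let newest = self.sizes.pop().unwrap();
                *self.sizes.last_mut().unwrap() += newest;
            } else {
                break;
            }
        }
    }
}
```

With equal-sized seals this policy behaves like a binary counter: after seven seals of 10 bytes the levels are [40, 20, 10], and the eighth collapses them to [80], so the level count grows roughly logarithmically with the number of syncs.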
Resurrects the parallel head-blobs pack download for Hybrid mode while keeping Full as the local blob-pack build path.
- mode.rs: Hybrid now means archive extraction + pre-built head-blobs pack download; Full keeps the streaming local pack builder.
- client.rs: adds install_prebuilt_blob_pack helper and spawns it in parallel with archive downloads/workers for Hybrid mode.
- scripts/benchmark_matrix.sh and scripts/verify_full_clone.sh now accept a MODE environment variable.

Results on oven-sh/bun: