Applying bundle when creating PR fetches full history, failing on large monorepo #35459

@dsyme

Title says it

Chat session about this at https://copilot.microsoft.com/shares/u6XR97qe9ZMMRqTpWcEjW


I have a huge GitHub monorepo and a local bundle for a change to that repo. I need to create a PR for that bundle, but the repo is too big to clone the full history before applying the bundle. Is it possible to create a PR without doing a deep clone locally?



You can automate this without a deep clone, even when you don't know in advance which of the situations below applies.
The key is to design a workflow that is robust to unknown bundle ancestry.

Below is a unified, automation‑friendly algorithm for applying external patches or bundles onto moving branches in a large monorepo.


✅ One algorithm that handles ALL cases
This workflow works whether:

  • the bundle’s base commit is still in the PR branch
  • the PR branch has moved ahead
  • the bundle’s base commit is missing
  • the repo is huge
  • history depth is unknown

It never requires a full clone.


🧩 Algorithm overview

  1. Shallow or partial clone the repo
  2. Fetch the PR branch
  3. Try to fetch the bundle
  4. If the bundle fetch fails due to missing base commits → incrementally deepen history
  5. Once the bundle imports cleanly → rebase or merge
  6. Push the updated branch

This is deterministic and scriptable.


🛠️ Automation‑ready script (pseudocode)

```bash
#!/usr/bin/env bash
set -euo pipefail

BUNDLE=/path/to/bundle.bundle
depth=50

# 1. Clone minimal repo: shallow history, no blobs, no checkout
git clone --depth="$depth" --filter=blob:none --no-checkout https://github.com/ORG/REPO.git
cd REPO

# 2. Fetch the PR branch
git fetch --depth="$depth" origin "pull/$PR_NUMBER/head:pr-branch"

# 3. Try to fetch the bundle (with no refspec, git fetch takes the bundle's HEAD)
# 4. Missing base commit: deepen history gradually and retry
while ! git fetch "$BUNDLE"; do
    depth=$((depth * 2))
    if [ "$depth" -gt 500000 ]; then
        echo "Bundle base commit unreachable even after deepening" >&2
        exit 1
    fi
    git fetch --depth="$depth" origin "pull/$PR_NUMBER/head"
done

# 5. Create a branch from the bundle tip and rebase it onto the PR branch
git checkout -b feature FETCH_HEAD
git rebase pr-branch

# 6. Push
git push origin feature
```

This handles the cases above automatically; only rebase conflicts in step 5 still need manual resolution.


🧠 Why this works
Git only needs:

  • the PR branch commits
  • the bundle commits
  • the base commit(s) connecting them

You never need the rest of the monorepo.

By deepening history only when needed, you avoid downloading gigabytes of irrelevant data.
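To make the "only what Git needs" point concrete, here is a minimal sketch that asks Git whether a repository already holds every prerequisite commit of a bundle. It relies on `git bundle verify`, which exits non-zero when prerequisites are missing. The helper name `bundle_ready` and its argument order are assumptions for illustration, not part of Git.

```bash
#!/usr/bin/env bash
# bundle_ready REPO_DIR BUNDLE_PATH
# Hypothetical helper: succeeds only if REPO_DIR already contains every
# prerequisite (base) commit that the bundle depends on.
# BUNDLE_PATH should be absolute, because git runs inside REPO_DIR.
bundle_ready() {
    local repo=$1 bundle=$2
    # `git bundle verify` exits non-zero when the bundle is invalid or
    # when the repository lacks any of its prerequisite commits.
    git -C "$repo" bundle verify "$bundle" >/dev/null 2>&1
}
```

A script could call `bundle_ready . /path/to/bundle.bundle` before attempting the fetch, and deepen history only when it returns non-zero.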


🧨 Edge cases handled automatically

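✔ PR branch rebased
Your script deepens history until the bundle’s base commit is found. If the rebase removed that commit from the branch, the loop stops at the depth limit instead.

✔ PR branch force‑pushed
Same as above: deepen until the base commit is reachable, or fail at the depth limit if the force‑push dropped it.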

✔ Bundle created from a very old commit
The loop deepens history until the commit appears.

✔ Bundle created from a commit that no longer exists
The loop eventually hits a depth limit and fails cleanly.
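The "deepen until found, otherwise fail cleanly" behaviour can be sketched as a reusable function. This is an illustration under assumptions: the name `deepen_until_bundle_fits`, its argument order, and the starting depth of 1 are all hypothetical, and it expects to run inside a shallow clone.

```bash
#!/usr/bin/env bash
# deepen_until_bundle_fits BUNDLE BUNDLE_REF REMOTE REMOTE_REF MAX_DEPTH
# Hypothetical helper. Run it inside a shallow clone.
# Returns 0 once BUNDLE_REF could be fetched from the bundle (FETCH_HEAD
# then points at the bundle tip), or 1 if MAX_DEPTH is exceeded first.
deepen_until_bundle_fits() {
    local bundle=$1 bundle_ref=$2 remote=$3 remote_ref=$4 max_depth=$5
    local depth=1
    # Fetching from a bundle fails while a prerequisite commit is missing.
    until git fetch --quiet "$bundle" "$bundle_ref" 2>/dev/null; do
        depth=$((depth * 2))
        if [ "$depth" -gt "$max_depth" ]; then
            echo "bundle base not reachable within depth $max_depth" >&2
            return 1
        fi
        # Re-fetch the same remote ref with a larger depth to extend history.
        git fetch --quiet --depth="$depth" "$remote" "$remote_ref"
    done
}
```

Returning a status instead of calling `exit` lets the caller decide what a clean failure means, for example posting a comment on the PR rather than aborting the whole job.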


🧭 Optional improvements

  1. Use partial clone instead of shallow clone
    This skips blobs at clone time; Git fetches them on demand when they are needed:

```bash
git clone --filter=blob:none --no-checkout https://github.com/ORG/REPO.git
```

  2. Use git fetch --negotiation-tip to reduce negotiation
    You can tell Git exactly which local commits to report to the server, so it only negotiates around the history you care about.

  3. Cache the repo between runs
    If your automation runs frequently, caching avoids repeated fetches.
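The caching idea can be sketched as "clone once, fetch afterwards". The helper name `ensure_repo` and the cache directory layout are assumptions for illustration; a real CI system would typically restore and save the directory with its own cache mechanism.

```bash
#!/usr/bin/env bash
# ensure_repo URL DIR
# Hypothetical helper: reuse a cached clone in DIR when one exists,
# otherwise create a blob-less, checkout-less clone there.
ensure_repo() {
    local url=$1 dir=$2
    if [ -d "$dir/.git" ]; then
        # Cached clone found: only download what changed since the last run.
        git -C "$dir" fetch --quiet origin
    else
        # First run: partial clone without blobs and without a working tree.
        git clone --quiet --filter=blob:none --no-checkout "$url" "$dir"
    fi
}
```

For example, `ensure_repo https://github.com/ORG/REPO.git "$CACHE_DIR/REPO"` (with `CACHE_DIR` being whatever directory your runner persists) would replace the clone step in the script.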


🎯 If you want, I can generate a production‑ready script
I can produce:

  • a POSIX‑compliant shell script
  • a Python script using subprocess
  • a GitHub Actions workflow
  • a Jenkins pipeline step
  • a Bazel/Buildkite/TeamCity integration

Just tell me the format you want:

  • shell script
  • Python script
  • GitHub Actions workflow
  • Jenkins pipeline

This will give you a fully automated, robust solution.
