
Should not use filter-branch --tree-filter #435

@arichardson


I have been using git-subrepo in various projects, and so far performance has not really mattered. However, I recently updated a mirror of the libc++ subdirectory of llvm-project using git-subrepo, and it wanted to push about 10,000 new commits (only a fraction of which touch that subdirectory).

git-subrepo spends many hours running:
git filter-branch -f --prune-empty --tree-filter 'rm -f .gitrepo' db47e4cfa09972aa929ca352b8bfdc64a634d855..subrepo/libcxx

Using a tree filter is very slow because git has to check out the complete tree of every commit into a temporary directory, run the filter there, and then read the result back into the index.
Ideally, git-subrepo would detect whether the faster git-filter-repo is installed and use it instead.
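A minimal sketch of that detection, written as a hypothetical shell function `strip_gitrepo` (it is not part of git-subrepo). It assumes git-filter-repo is installed as a `git-filter-repo` executable on `PATH` and that its `--refs` option accepts a revision range such as `<base>..subrepo/<name>`; when the tool is missing, it falls back to the tree-filter command git-subrepo runs today.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-subrepo): remove .gitrepo from every
# commit in the given range, preferring git-filter-repo when it is installed.
strip_gitrepo() {
    range=$1    # e.g. "<base-commit>..subrepo/<name>"
    if command -v git-filter-repo >/dev/null 2>&1; then
        # --path + --invert-paths drops .gitrepo; --refs limits the rewrite
        # to the range (and implies --partial); --force is required because
        # the repository is not a fresh clone.
        git filter-repo --force --invert-paths --path .gitrepo --refs "$range"
    else
        # Fallback: the slow tree-filter invocation git-subrepo uses now.
        # The variable only silences filter-branch's deprecation warning.
        FILTER_BRANCH_SQUELCH_WARNING=1 \
            git filter-branch -f --prune-empty \
            --tree-filter 'rm -f .gitrepo' "$range"
    fi
}
```

git-filter-repo prunes commits that become empty by default, so its branch needs no equivalent of `--prune-empty`.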

But even with filter-branch, it should be possible to get much better performance by using an index filter, which edits only the index and never checks out the tree.

I haven't tried this yet, but based on the Examples section of the filter-branch documentation (https://git-scm.com/docs/git-filter-branch#_examples), the following might work:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch .gitrepo' db47e4cfa09972aa929ca352b8bfdc64a634d855..subrepo/libcxx
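To try the suggestion without touching a real repository, the sketch below builds a throwaway repository (every name, file, and commit in it is made up; the base commit stands in for db47e4cf...), then runs the index-filter variant over a range. It adds `-f` and `--prune-empty` to match git-subrepo's current invocation, and sets `FILTER_BRANCH_SQUELCH_WARNING=1` only to skip the deprecation warning and pause that recent git versions show before filter-branch runs.

```shell
#!/bin/sh
# Throwaway demo: strip .gitrepo from a range of commits with --index-filter.
set -eu

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"
git init -q .
git config user.name demo
git config user.email demo@example.com

# Base commit: plays the role of the range start in the issue.
echo base > README
git add README
git commit -q -m base
base=$(git rev-parse HEAD)

# Commits to rewrite: each one touches .gitrepo and a regular file.
git checkout -q -b subrepo/demo
for i in 1 2 3; do
    echo "commit = $i" > .gitrepo
    echo "$i" > file.txt
    git add .gitrepo file.txt
    git commit -q -m "change $i"
done

export FILTER_BRANCH_SQUELCH_WARNING=1

# --index-filter only edits the index, so no per-commit checkout happens.
git filter-branch -f --prune-empty \
    --index-filter 'git rm --cached --ignore-unmatch -q .gitrepo' \
    "$base"..subrepo/demo

# Show the rewritten commits and the files each one touches.
git log --oneline --name-only "$base"..subrepo/demo
```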
