Considerations for multiple branches #49

Closed
m1nkeh opened this issue Jan 24, 2020 · 6 comments


m1nkeh commented Jan 24, 2020

Given a scenario where there are master and develop branches, as well as a bunch of feature/cool-feature-1 style remote branches, i can see that when running a command such as:

git filter-repo --path folder1/ --path folder2/ --path-rename 'folder1':'src' --path-rename 'folder2':'src'

the filtering is done across all branches, regardless of whether i have actively checked them out.. they all end up on my machine and it works really nicely. You can then push them back to a (new) remote etc.. and carry on with your day.

Is there a performance cost to this? A repo i am working with has absolutely hundreds of remote branches that are in various states of decay, is there a way i can exclude them?

Is there a way to define which branches to consider? Either explicitly.. or maybe 'only the ones that i have checked out'?

Owner

newren commented Jan 24, 2020

The performance cost of updating more branches, even hundreds more, tends to be quite small. filter-repo (via fast-export and fast-import) writes a modified version of each commit and then updates each branch pointer to the newly rewritten commits. The only way handling hundreds of extra branches would add significantly to the overhead is if those branches didn't share common history (or only shared a little) with the branches you care more about, and the "extraneous" branches had lots of unique commits. Typically, all branches share a bunch of common history and you are rewriting all those commits anyway, so the cost of including more branches (some of the repositories I've worked on had thousands or tens of thousands of refs) is pretty small.
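One way to gauge how much extra work the other branches would add is to count the commits that are reachable from any ref but not from the branches you care about. The following is only a sketch: count_extra_commits is a hypothetical helper name, and it relies on git rev-list with --all and --not.

```shell
# Hypothetical helper: count commits reachable from any ref (--all) that are
# NOT reachable from the refs passed as arguments (the branches you care about).
count_extra_commits() {
    git rev-list --count --all --not "$@"
}
```

In a fresh clone you would probably pass remote-tracking names, for example count_extra_commits origin/master origin/develop. A small number suggests the extra branches cost little to include.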

If you really just don't want to see them, though, you can definitely prune them either before or after doing the history rewrite. I tend to just open up the .git/packed-refs file and start deleting lines, but that's because I'm not afraid to muck with git internal storage whose format may change in the future. You may want to instead use git update-ref for this job, perhaps with the --stdin flag so you can just give it a whole bunch of delete refs/remotes/origin/CRUFTY_BRANCH_N lines.

Hope that helps; let me know if anything isn't clear.

newren closed this as completed Jan 24, 2020
Author

m1nkeh commented Feb 3, 2020

Hmm.. didn't get any notification of this response, apologies. Makes sense re: the performance aspect... explanation understood 👍🏼

Could you please elaborate on how to use the update/delete refs command? That's not something i have used previously.. all the branches i need to prune have irregular names, so i am keen on some sort of 'prune everything except these specific branches' approach. Can the command be inverted?

While you reply.. i shall go check the doco! 🙂

Owner

newren commented Feb 3, 2020

git update-ref -d <refname> will let you delete a ref (similar to branch -d or tag -d, but you need to specify fully qualified refnames, such as refs/heads/master instead of just master). You can get help with git update-ref --help, but a brief intro:

update-ref can also batch delete refs if you feed it a bunch of input, e.g.
printf "delete refs/heads/master\ndelete refs/heads/maint\n" | git update-ref --stdin

If you wanted to delete all but a few refs, I'd run
git show-ref | sed -e 's/^[0-9a-f]*/delete/' >commands-to-delete-all-refs.txt
then edit commands-to-delete-all-refs.txt and remove the lines corresponding to refs you want to keep, then run
cat commands-to-delete-all-refs.txt | git update-ref --stdin
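The edit-by-hand step can also be scripted when the list of refs to keep is short. A minimal sketch, assuming you only want to prune local branches under refs/heads/; prune_heads_except is a hypothetical helper name, not part of git or filter-repo.

```shell
# Hypothetical helper: delete every local branch except those named as arguments.
prune_heads_except() {
    keep_file=$(mktemp)
    # Write one fully qualified refname per line for each branch to keep.
    for name in "$@"; do
        printf 'refs/heads/%s\n' "$name"
    done >"$keep_file"

    # List all local branches, drop the kept ones (-x whole line, -F fixed string),
    # prefix the rest with "delete" and apply them as one update-ref batch.
    git for-each-ref --format='%(refname)' refs/heads/ |
        grep -v -x -F -f "$keep_file" |
        sed -e 's/^/delete /' |
        git update-ref --stdin
    rm -f "$keep_file"
}
```

For example, prune_heads_except master develop release would leave only those three branches. It is worth running this in a scratch clone first, since update-ref does not ask for confirmation.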

Author

m1nkeh commented Feb 3, 2020

Nice. Had a play with the update-ref cmd, seems pretty self-explanatory and i think i follow what you mean with --stdin, but will check out your cmd above.

I don't want to mess with my original repo, so i think i'm just gonna do the filter-repo, which brings everything down.. and then get rid of the refs/heads/..

Not sure what to do with the refs/tags/ though.. does filter-repo ignore tags and the commits they point to??

Author

m1nkeh commented Feb 4, 2020

Righto, got this doing what i think i want it to..

  1. Clone to new location
  2. Do my filtering, which works fine, and brings down all refs/remotes/origin/ to end up in refs/heads/
  3. Lose everything in refs/heads/ other than develop, master, release
  4. Push the remaining branches to a new origin 👍🏼
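Step 4 of the list above might look roughly like this. It is only a sketch: push_branches_to is a hypothetical helper name, and the remote name and URL are placeholders supplied by the caller.

```shell
# Hypothetical helper: add a new remote and push every local branch to it.
# $1 = remote name, $2 = remote URL (both chosen by the caller).
push_branches_to() {
    git remote add "$1" "$2" &&
        git push --quiet "$1" 'refs/heads/*:refs/heads/*'
}
```

This refspec covers branches only; tags would need a separate git push with --tags if the project wants to keep them.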

Do i need to explicitly do anything with refs/replace/ at all? I am thinking that the answer is no, as they must have been merged.. but i'm not 100% sure.

Also, a question still remains about the refs/tags/.. but will try to do some more research on that.

Owner

newren commented Feb 4, 2020

The refs/replace/ refs merely provide a mapping from old commit IDs to new commit IDs, allowing you to pass old commit IDs to git commands and have git bring up the new commits. They're there to help facilitate the transition, for teams that need it. If you can do a clean switchover to new IDs, then you can just drop those.

As for the tags, yeah, git-filter-repo will rewrite those too, just like branches. Whether you want or need the tags that are in your project is completely up to your project.
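If the replace refs (or the tags) turn out not to be needed, dropping them can reuse the same update-ref batch approach described earlier in the thread. A minimal sketch; drop_refs_under is a hypothetical helper name.

```shell
# Hypothetical helper: delete every ref under the given prefix,
# e.g. refs/replace/ or refs/tags/.
drop_refs_under() {
    git for-each-ref --format='delete %(refname)' "$1" |
        git update-ref --stdin
}
```

For example, drop_refs_under refs/replace/ removes the old-to-new commit ID mappings, and drop_refs_under refs/tags/ would remove all tags.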
