Rewriting repo history to use Git LFS #12

Closed
dhimmel opened this issue Nov 7, 2018 · 3 comments

dhimmel commented Nov 7, 2018

In 23f6117, we began using Git LFS to store large files. While that commit uses LFS, the earlier history still contains the large files as regular (non-LFS) Git objects. Therefore, we can use the BFG Repo-Cleaner to create a history where all of these files use LFS. We will keep the pre-LFS history available in branches, but not in master.

dhimmel commented Nov 7, 2018

I ran the following commands:

# https://rtyley.github.io/bfg-repo-cleaner/
# https://confluence.atlassian.com/bitbucket/use-bfg-to-migrate-a-repo-to-git-lfs-834233484.html

# Outside of original repository
git clone --mirror git@github.com:hetio/hetionet.git hetionet-bfg.git
java -jar ~/Downloads/bfg-1.13.0.jar \
  --convert-to-git-lfs "*.{bz2,gz,xz,zip}" \
  hetionet-bfg.git
cd hetionet-bfg.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git lfs fetch --all
git lfs push --all origin master

# From within original repository
git checkout -b master-bfg
git remote add bfg /home/dhimmel/Desktop/hetionet-bfg.git
git fetch bfg
git reset --hard bfg/master
git push --set-upstream origin master-bfg
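
Before running BFG, it can help to preview which file names the `*.{bz2,gz,xz,zip}` pattern would pick up. The following is a minimal sketch and not part of the commands that were actually run. The `matches_lfs_pattern` helper is a hypothetical name, and it only approximates BFG's glob by checking file-name suffixes with a shell `case` statement.

```shell
#!/bin/sh
# Hypothetical helper (not part of BFG): succeed if a file name ends in one
# of the extensions passed to --convert-to-git-lfs above.
matches_lfs_pattern() {
  case "$(basename "$1")" in
    *.bz2|*.gz|*.xz|*.zip) return 0 ;;
    *) return 1 ;;
  esac
}

# Classify a few example file names.
for f in hetionet-v1.0.json.bz2 hetionet-v1.0-edges.sif.gz README.md; do
  if matches_lfs_pattern "$f"; then
    echo "convert: $f"
  else
    echo "keep:    $f"
  fi
done
```

Inside a working copy, the same function could be fed the output of `git ls-files` to list the tracked files that the pattern covers.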

The bfg command produced the following output:

Using repo : /home/dhimmel/Desktop/hetionet-bfg.git

Found 44 objects to protect
Found 9 commit-pointing refs : HEAD, refs/heads/master, refs/heads/matrix, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 23f6117c (protected by 'HEAD')

Cleaning
--------

Found 70 commits
Cleaning commits:       100% (70/70)
Cleaning commits completed in 3,231 ms.

Updating 8 Refs
---------------

	Ref                    Before     After   
	------------------------------------------
	refs/heads/master    | 23f6117c | 3cf25f2c
	refs/heads/matrix    | 3a09715e | f7190594
	refs/heads/neo4j-2.3 | 7eec671b | ec10b0a4
	refs/heads/neo4j-3.0 | 7d3d257c | 95818820
	refs/heads/pre-lfs   | 23f6117c | 3cf25f2c
	refs/pull/11/head    | 3a09715e | f7190594
	refs/pull/11/merge   | caaedf26 | 5af06766
	refs/tags/v1.0.0     | 4933ca17 | 3bbc130a

Updating references:    100% (8/8)
...Ref update completed in 33 ms.

Commit Tree-Dirt History
------------------------

	Earliest                                              Latest
	|                                                          |
	DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDmmmm

	D = dirty commits (file tree fixed)
	m = modified commits (commit message or parents changed)
	. = clean commits (no changes to file tree)

	                        Before     After   
	-------------------------------------------
	First modified commit | 0009918a | 38873689
	Last dirty commit     | 9f214ab7 | ff4f2eef

Changed files
-------------

	Filename                          Before & After                          
	--------------------------------------------------------------------------
	hetionet-v1.0-edges.sif.gz      | 81f59db4 ⇒ 90dc2b88                     
	hetionet-v1.0-perm-1.db.tar.bz2 | f55f7c4d ⇒ 3459eedb                     
	hetionet-v1.0-perm-1.json.bz2   | 735791b0 ⇒ 430b6f3e                     
	hetionet-v1.0-perm-2.db.tar.bz2 | a49c15dc ⇒ 76303e92                     
	hetionet-v1.0-perm-2.json.bz2   | d92bd3d6 ⇒ 928c919d                     
	hetionet-v1.0-perm-3.db.tar.bz2 | d27169c3 ⇒ ae4716bb                     
	hetionet-v1.0-perm-3.json.bz2   | 5ecba96e ⇒ 1e420742                     
	hetionet-v1.0-perm-4.db.tar.bz2 | 3bbf56d4 ⇒ a832f8b9                     
	hetionet-v1.0-perm-4.json.bz2   | 2b5ca7b0 ⇒ cbeb6364                     
	hetionet-v1.0-perm-5.db.tar.bz2 | f0df2d09 ⇒ 381b0472                     
	hetionet-v1.0-perm-5.json.bz2   | c51dcd1e ⇒ d18d6af3                     
	hetionet-v1.0.db.tar.bz2        | 152c7796 ⇒ 76141ba3, 36ae082b ⇒ df19a5fc
	hetionet-v1.0.json.bz2          | 54177a6a ⇒ ce8ba918                     


In total, 267 object ids were changed. Full details are logged here:

	/home/dhimmel/Desktop/hetionet-bfg.git.bfg-report/2018-11-07/11-21-52

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

hetionet-bfg.git.bfg-report/2018-11-07/11-21-52 contains the three text files:

When running git push --set-upstream origin master-bfg, I kept getting:

open /home/dhimmel/Documents/serg/rephetio/hetionet/hetnet/permuted/neo4j/hetionet-v1.0-perm-1.db.tar.bz2: no such file or directory
error: failed to push some refs to 'git@github.com:hetio/hetionet.git'

I added --no-verify, which made the upload work. Hopefully, this isn't too dangerous!

So this repo now has a master-bfg branch, which I will switch to master.

dhimmel commented Nov 7, 2018

The master branch prior to the BFG rewrite is available at https://github.com/hetio/hetionet/tree/pre-lfs

dhimmel commented Nov 7, 2018

I have decided to undo this, but keep the BFG rewrite around in a bfg-lfs-rewrite branch. I decided it was too risky for too little benefit, since the repo is not too big at the moment. Specifically, I was having issues migrating #11 to be based on the rewritten master. I tried cherry-picking, checking out individual files, and rebasing, none of which worked. I also got errors during the rebase:

Encountered 5 file(s) that should have been pointers, but weren't:
	hetnet/permuted/neo4j/hetionet-v1.0-perm-1.db.tar.bz2
	hetnet/permuted/neo4j/hetionet-v1.0-perm-2.db.tar.bz2
	hetnet/permuted/neo4j/hetionet-v1.0-perm-3.db.tar.bz2
	hetnet/permuted/neo4j/hetionet-v1.0-perm-4.db.tar.bz2
	hetnet/permuted/neo4j/hetionet-v1.0-perm-5.db.tar.bz2

So I'm going back!
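
For anyone debugging the "should have been pointers, but weren't" message: as far as I understand it, it means a path is tracked by LFS but the committed blob is the full file rather than a small LFS pointer. A pointer is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`. Below is a minimal sketch for testing whether a given file looks like a pointer. The `is_lfs_pointer` name is hypothetical, and the check only looks at the first line, so it is a heuristic and not a full validation of the pointer format.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the Git LFS pointer
# version line. Only the first bytes are read, so large binaries are cheap
# to test. Note: `head -c` is widely available but not strictly POSIX.
is_lfs_pointer() {
  prefix='version https://git-lfs.github.com/spec/v1'
  head -c "${#prefix}" "$1" 2>/dev/null | grep -qxF "$prefix"
}

# Report each path given on the command line.
for path in "$@"; do
  if is_lfs_pointer "$path"; then
    echo "pointer:     $path"
  else
    echo "not pointer: $path"
  fi
done
```

In a normal checkout, Git LFS replaces pointers with the real file content, so this check is most meaningful on a blob as stored in a commit, for example the output of `git cat-file -p <commit>:<path>` saved to a temporary file.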
