lost commit history during move to new repo #48

Closed
paulczar opened this issue Sep 20, 2018 · 14 comments

@paulczar
Contributor

Hi,

I just found the new repo for harbor's helm repo and noticed the commit history has all been squashed out. It would be really nice to have used git filter-branch --prune-empty --subdirectory-filter FOLDER-NAME BRANCH-NAME to split it out while keeping the commit history.

I also believe this makes the DCO for the original commit somewhat contentious, as it does not list all contributors of code to the commit.

There is probably not much that can be done. It's possible we could go back to the original repo, git filter-branch it, and then re-apply the commits in here. I see there's an [incomplete] list of contributors in a contributors file, so that at least helps acknowledge the hard work done by the various contributors.

@scottrigby

I'm a strong advocate of retaining git history when moving or splitting repos. I had started on a tool to do this automatically for Helm's upcoming Distributed Search Hub, but haven't completed that as a full project yet. It likely wouldn't help this project anyway, since it would take some manual work to get it back to the current state but with full history at the start.

It may be slightly tricky to splice history in at the first or second commit, but you should be able to do that, resolve conflicts, then replay the newer commits on top. If you want to create a test branch and want another set of eyes on it let me know.

I'm not sure about the DCO question on original commits, but that may be something worth looking into.

@reasonerjt
Contributor

@paulczar
Thanks for pointing it out.
Sorry we didn't consider that while splitting the repo.

We'll update the contributors list to acknowledge the community's contributions, and investigate how we can restore the commit history later.

@paulczar
Contributor Author

Wicked! thanks for the response @reasonerjt !

@scottrigby

OK I spliced it back in. I can't open a PR because I changed the initial commit (added history from before chart was deleted from https://github.com/goharbor/harbor), then replayed all of this repo's commits on top. But you can - if you want to - take the work I did in my branch (https://github.com/scottrigby/harbor-helm/tree/recover-history), and force push it to this repo.

You can also redo this yourself (the only file that requires a manual resolution is .gitignore).

My steps

  1. Check out new repo:

    WORKSPACE=~/development/github.com/
    DIR=goharbor/harbor-helm
    URL=git@github.com:goharbor/harbor-helm.git
    cd $WORKSPACE && mkdir -p $DIR && git clone -o new  $URL $DIR && cd $DIR
    
  2. Add old repo as remote:

    URL=git@github.com:goharbor/harbor.git
    git remote add old  $URL
    
  3. Find the commit just before the helm chart was deleted:

    git fetch old
    LAST=$(git log -1 --pretty='%P' old/master -- contrib/helm/harbor/Chart.yaml)
    

    In this case, -1 gets the delete commit:

    commit 05739bd4e56ca1d7cb09a3f68889a96bef115605
    Author: Wenkai Yin <yinw@vmware.com>
    Date:   Wed Aug 22 14:08:22 2018 +0800
    
        Remove the Harbor chart directory
        
        As we moving the Harbor chart into a separate repository, this commit removes the chart directory from goharbor/harbor repository.
        
        Signed-off-by: Wenkai Yin <yinw@vmware.com>
    

    Then format '%P' gets the parent of that commit, just before deletion.

  4. Check out at the commit before chart deletion:

    git reset --hard $LAST
    
  5. Filter branch just to helm chart:

    git filter-branch --prune-empty --subdirectory-filter contrib/helm/harbor/ master
    

    (I suppose we could have done this earlier).

  6. Preview the list of commits to play on top of current position (excluding merge commits):

    git log new/master --no-merges --oneline
    
  7. Now replay the commits from bottom to top:

    commits=($(git log new/master --pretty=format:%h --no-merges))
    for i in "${commits[@]}"; do git cherry-pick $i; done
    

    Manually fix as needed, git add, then git cherry-pick --continue
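
For anyone reproducing the replay step, here is a minimal sketch of a loop that applies commits oldest first and stops at the first cherry-pick that fails, so a conflict is not buried under later errors. It runs against a throwaway repo, and the branch names new-master and recovered are hypothetical stand-ins for new/master and the filtered old history. It uses git rev-list --reverse for ordering, which is an alternative to the git log command above, not what was run here.

```shell
set -eu

# Throwaway repo so the sketch is safe to run anywhere.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.email "demo@example.com"
git config user.name "Demo User"
git config commit.gpgsign false

# Hypothetical stand-in for new/master: three ordinary commits.
git symbolic-ref HEAD refs/heads/new-master
for n in 1 2 3; do
  echo "change $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done

# Hypothetical stand-in for the filtered old history: an unrelated branch.
git checkout -q --orphan recovered
git rm -rfq .
echo base > base.txt
git add base.txt
git commit -q -m "old history tip"

# Replay oldest first; stop at the first failure instead of ploughing on.
for sha in $(git rev-list --reverse --no-merges new-master); do
  git cherry-pick "$sha" >/dev/null || {
    echo "stopped at $sha: fix, git add, then git cherry-pick --continue" >&2
    break
  }
done

replayed=$(git log --reverse --pretty=tformat:%s recovered)
echo "$replayed"
```

After a manual fix, the loop would need to be restarted from the commit following the one that stopped it.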

@paulczar
Contributor Author

Woah @scottrigby that's some sweet git magic! thanks so much for doing that.

@reasonerjt
Contributor

Thanks a lot
@scottrigby
We'll try it later in October, as most of us are out of the office for a public holiday.

@reasonerjt reasonerjt assigned reasonerjt and unassigned ywk253100 Oct 23, 2018
@reasonerjt reasonerjt added this to the Sprint 46 milestone Oct 23, 2018
@reasonerjt reasonerjt modified the milestones: Sprint 46, Sprint 47 Nov 6, 2018
@paulczar
Contributor Author

paulczar commented Feb 7, 2019

I feel like the longer this is left as is the harder it will be to rectify. It would be a shame to not acknowledge the work of the original contributors and might prevent this helm chart from achieving CNCF conformance.

@reasonerjt
Contributor

@paulczar actually we tried a couple of times to restore the commit history, but failed with various errors. How about we list the lost commit history in a section in readme.md and acknowledge the great work of the contributors?

@scottrigby

This can be kind of a pain (but only the first time, after that it's back to normal), but I'm happy to help with this again, just let me know when you want to and we can get it done.

@reasonerjt reasonerjt removed this from the Sprint 47 milestone Feb 21, 2019
@scottrigby

Update: we're following up by email

@scottrigby

scottrigby commented Mar 7, 2019

OK, I just updated https://github.com/scottrigby/harbor-helm/tree/recover-history. It should be ready to go.

Before you force-push to master, please follow these steps (paraphrased from my email):

  • Put a temporary merge freeze in place so you don't accidentally miss new commits to master during the process (@reasonerjt & I already arranged for me to update this tonight)
  • Create a backup branch from master during merge-freeze, as a best practice
  • Force-push my recover-history branch to master
  • For sanity, do a recursive file diff between the backup branch files and the new force-pushed files, and do any evaluation needed of the force-pushed changes. So long as the file diff is what you expect, further evaluation should not be necessary, as the goal is just to recover git history, not to change the files in this chart. The only exception is one file that was missing from this repo's chart, I assume by accident. I explain in detail below; you will just want to check that adding it back is what you want
  • Once your evaluation is over – if you decide to move ahead with this, there are a few things you need to do
    • If you decide to keep the new file, you will want to version bump the chart (Helm chart versions should be immutable. Helm itself doesn't care about git history, only that files have not changed between versions)
    • Any open PRs will need to be remade (it's pretty easy, just a cherry-pick of the PR's commits back onto the newly force-pushed master. It will lose commit/comment order, but that's always the price for force-pushing PR branches)
  • If after evaluation you decide to not move ahead with this, just restore the backup branch to master and we can discuss further
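
The backup and force-push steps above can be sketched as follows. This is only an illustration against a throwaway local bare repo standing in for the GitHub remote; the branch name master-backup is a hypothetical choice, and --force-with-lease is a more cautious variant of a plain force push, not something the steps above require.

```shell
set -eu

# Throwaway "remote" and clone; nothing here touches a real repository.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/clone"
cd "$work/clone"
git config user.email "demo@example.com"
git config user.name "Demo User"
git config commit.gpgsign false
git symbolic-ref HEAD refs/heads/master

# Stand-in for the squashed master currently on the remote.
echo "name: harbor" > Chart.yaml
git add Chart.yaml
git commit -q -m "squashed import"
git remote add origin "$work/origin.git"
git push -q origin master

# 1. Back up master before rewriting it (hypothetical branch name).
git branch master-backup master
git push -q origin master-backup

# 2. Stand-in for the recover-history branch: same files, unrelated history.
git checkout -q --orphan recover-history
git commit -q -m "recovered history"

# 3. Replace master on the remote; the lease refuses the push if the remote
#    master moved since we last saw it (for example, a merge during the freeze).
git push -q --force-with-lease origin recover-history:master

git --git-dir="$work/origin.git" log --oneline master
```

If the evaluation goes badly, restoring would be the same push in the other direction, for example git push --force-with-lease origin master-backup:master.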

Here were the exact steps in creating my recover-history branch, if you want to reproduce this locally to compare results:

  • Same as steps 1-5 from my earlier comment on this issue (#48)
  • For step 6 (preview the list of replay commits, excluding merge commits), I didn't document this correctly. They should be in reverse order. I added tac:
    $ git log new/master --no-merges --oneline | tac
    c92b8028 Initial commit
    13e165ef Migrate Harbor chart to the new repository
    9ea6569b Remove the "vmware" from files
    dcab613a Using Deployment rather than StatefulSet for adminserver, registry and chartmuseum
    ...
  • Step 7 needed to change accordingly. However – I may be a little tired or lazy now but – adding tac to this command oddly concatenates the first two SHAs into one line:
    $ git log new/master --pretty=format:%h --no-merges | tac
    c92b802813e165ef
    9ea6569b
    dcab613a
    ...
    Perhaps someone who wants to spend a little time on this can figure out why. For now, it's not necessary because I just worked around that by cherry picking the first two SHAs manually (Note I add theirs merge strategy option because it appears some files were not copied over initially – presumably by accident – that were later added with slight changes in subsequent commits):
    $ git cherry-pick c92b8028  --strategy-option theirs
    $ git cherry-pick 13e165ef --strategy-option theirs
    Then I scripted the rest similar to the above, by popping off the first line containing these two manually cherry-picked commits, then looping (again, adding theirs merge strategy):
    $ commits=($(git log new/master --pretty=format:%h --no-merges | tac | tail -n +2))
    $ for i in "${commits[@]}"; do git cherry-pick $i --strategy-option theirs; done
  • Now let's compare the final files. Because the commit histories now differ entirely, the simplest check is a recursive diff against a clean checkout of master, excluding the .git directories:
    $ cd ../
    $ git clone git@github.com:goharbor/harbor-helm.git harbor-helm-new
    $ diff -r --exclude=".git" harbor-helm/ harbor-helm-new/
    Only in harbor-helm/: requirements.lock
    It appears the old requirements.lock file was never copied over to the new repo (presumably by accident). We now have that restored. If this isn't what you want (if it was intentionally omitted), you should make a new commit on top of my branch to delete it.
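
As a possible cross-check from inside a single clone: git diff between two refs compares their trees, so it should report the same file-level difference even when the two branches share no history. A minimal sketch on a throwaway repo follows; the branch names squashed and recovered and the file contents are hypothetical stand-ins, not taken from the real chart.

```shell
set -eu

work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.email "demo@example.com"
git config user.name "Demo User"
git config commit.gpgsign false

# Stand-in for the squashed master (hypothetical file contents).
git symbolic-ref HEAD refs/heads/squashed
echo "name: harbor" > Chart.yaml
git add Chart.yaml
git commit -q -m "squashed import"

# Stand-in for the recovered branch: unrelated history, one extra file.
git checkout -q --orphan recovered
echo "dependencies: []" > requirements.lock
git add requirements.lock
git commit -q -m "recovered history"

# Tree-to-tree comparison; no common ancestor is needed.
changed=$(git diff --name-status squashed recovered)
echo "$changed"
```

An empty result would mean the two branches contain identical files.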

@ywk253100
Collaborator

@paulczar The lost commits are recovered.
@scottrigby Thanks again for the detailed steps.

@paulczar
Contributor Author

Amazing job folks! thanks so much everyone, especially @scottrigby !

@frecks

frecks commented Apr 26, 2022

@scottrigby Thanks for documenting the steps you followed to recover history. It serves as a useful example.

  • Step 7 needed to change accordingly. However – I may be a little tired or lazy now but – adding tac to this command oddly concatenates the first two SHAs into one line:
    $ git log new/master --pretty=format:%h --no-merges | tac
    c92b802813e165ef
    9ea6569b
    dcab613a
    ...
    

For anyone who may be curious: The command you want is git log new/master --pretty=tformat:%h --no-merges | tac. As described at the very bottom of pretty-formats, tformat works exactly like format except it provides "terminator" semantics instead of "separator" semantics. The tac command handles the input properly when it comes with terminator characters.
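
A small sketch of the difference, with no repo needed: printf simulates the two output shapes, using the SHAs quoted above. It assumes GNU tac from coreutils.

```shell
# Simulated `--pretty=format:%h` output: newest first, newline-separated,
# with no newline after the last (oldest) SHA.
separated=$(printf 'dcab613a\n9ea6569b\n13e165ef\nc92b8028' | tac)

# Simulated `--pretty=tformat:%h` output: every line newline-terminated.
terminated=$(printf 'dcab613a\n9ea6569b\n13e165ef\nc92b8028\n' | tac)

# tac emits the unterminated last record first, so the next SHA is glued to it.
echo "$separated" | head -n 1    # c92b802813e165ef
echo "$terminated" | head -n 1   # c92b8028
```

git log --reverse is another way to get oldest-first output without piping through tac.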
