Rewrite history or start over with a clean repo #887

Closed
koenpunt opened this Issue May 30, 2013 · 23 comments

@koenpunt

It might be worth rewriting the repository's history: ~170 MB for a repo is a lot of data for a working copy of ~20 MB.

I ran a test with git filter-branch, which brought the total repo size down to ~25 MB.

The command I ran:

git filter-branch  --force --index-filter 'git rm --cached --ignore-unmatch \
 media/perfect-timecoded.* \
 media/AirReview-Landmarks-02-ChasingCorporate.mp3 \
 media/echo-hereweare.webm \
 media/echo-hereweare.ogv \
 media/Thumbs.db \
 media/frameaccuracy_logo.jpg \
 media/Parades-PastLives.mp3 \
 media/echo-hereweare.m4v \
 media/grampa-laughing.flv \
 media/jsaddington.jpg \
 media/jsaddington.mp4 \
 media/xbox.wmv' --prune-empty --tag-name-filter cat -- --all

The only problem is that this changes commit IDs, which can lead to unexpected results, so starting with a clean repo is probably a better solution.
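Before choosing which paths to hand to the filter-branch command above, it can help to see which blobs in the history are actually the largest. The following is a minimal sketch, not the method used in this thread: the function name `largest_blobs` and the default of 20 entries are assumptions made for illustration. It prints the size in bytes and the path of each blob reachable from any ref, biggest first.

```sh
# List the N largest blobs reachable from any ref, biggest first.
# Output format: "<size in bytes> <path>".
# largest_blobs is a hypothetical helper name; 20 is an arbitrary default.
largest_blobs() {
    count="${1:-20}"
    # rev-list prints "<sha> <path>" for every tree and blob in the history;
    # cat-file resolves each sha to its type and size and echoes the path back.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -n -r -k1,1 |
        head -n "$count"
}
```

Run from inside the repository, for example `largest_blobs 10`. Files that were deleted long ago still show up, since they remain part of the history.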

@ron666
Collaborator
ron666 commented Aug 4, 2016

@koenpunt Is this still an issue?

@koenpunt
koenpunt commented Aug 4, 2016 edited

Yes, it still is (188.28 MiB):

$ time git clone https://github.com/johndyer/mediaelement/
Cloning into 'mediaelement'...
remote: Counting objects: 12779, done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 12779 (delta 107), reused 0 (delta 0), pack-reused 12528
Receiving objects: 100% (12779/12779), 188.28 MiB | 4.28 MiB/s, done.
Resolving deltas: 100% (7222/7222), done.
Checking connectivity... done.
git clone https://github.com/johndyer/mediaelement/  3.21s user 4.33s system 13% cpu 54.878 total

Cloning with a depth of, for example, 100 is much more acceptable (18.77 MiB):

$ time git clone https://github.com/johndyer/mediaelement/ --depth 100
Cloning into 'mediaelement'...
remote: Counting objects: 2627, done.
remote: Compressing objects: 100% (1249/1249), done.
remote: Total 2627 (delta 1517), reused 2463 (delta 1376), pack-reused 0
Receiving objects: 100% (2627/2627), 18.77 MiB | 1.91 MiB/s, done.
Resolving deltas: 100% (1517/1517), done.
Checking connectivity... done.
git clone https://github.com/johndyer/mediaelement/ --depth 100   0.56s user 0.63s system 8% cpu 13.750 total

But I still think that media files should not be part of the repo.
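As a stopgap while the large files are still in the history, the shallow clone shown above can be wrapped up and later deepened when the full history is needed. This is only a sketch: the helper names `shallow_clone` and `unshallow` are made up for illustration, and the default depth of 100 simply mirrors the example above. Note that `--depth` implies `--single-branch`, so other branches are not fetched.

```sh
# Clone only the most recent commits of the default branch.
# shallow_clone and unshallow are hypothetical helper names.
shallow_clone() {
    url="$1"
    dest="$2"
    depth="${3:-100}"   # same depth as the example above
    git clone --quiet --depth "$depth" "$url" "$dest"
}

# Convert an existing shallow clone into one with full history.
unshallow() {
    git -C "$1" fetch --quiet --unshallow
}
```

For example: `shallow_clone https://github.com/johndyer/mediaelement/ mediaelement`, and later `unshallow mediaelement` if the older commits are needed after all.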

@ron666 ron666 added the Feature label Aug 10, 2016
@ron666
Collaborator
ron666 commented Sep 28, 2016

@johndyer What are your thoughts about this?

@johndyer
Owner

Yes, I do think a rewrite would be helpful. I'd hate to blow up people's
work, but it would be nice to have a much smaller list of files in there.

I'm totally fine with removing all the files I originally used and
replacing them with smaller files.

It'd be great to have an mp4, webm, and mp3 file in there that were all
over 1 minute (so the controls function properly), but under 5mb total to
keep the repo really small. Any ideas on good samples?

@ron666
Collaborator
ron666 commented Sep 29, 2016 edited

Basically, I wouldn't put any media in the /media/ folder; just a README file explaining that you need to download the media from https://github.com/johndyer/mediaelement-files to make the demos and tests work. Any thoughts on this?
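The README could point to a small helper along these lines. This is a sketch under assumptions, not something that exists in the project: the function name `fetch_media` and the destination `media/files` are hypothetical (a separate subdirectory is used because git refuses to clone into a non-empty directory, and /media/ would already hold the README), and the layout of the mediaelement-files repo is not known here.

```sh
# Download (or update) the demo media from the separate media repository.
# fetch_media and the media/files destination are hypothetical choices.
fetch_media() {
    url="${1:-https://github.com/johndyer/mediaelement-files}"
    dest="${2:-media/files}"
    if [ -d "$dest/.git" ]; then
        # Already downloaded: only fast-forward to the latest media.
        git -C "$dest" pull --quiet --ff-only
    else
        mkdir -p "$(dirname "$dest")" &&
            git clone --quiet --depth 1 "$url" "$dest"
    fi
}
```

With this approach the destination directory would also need to be listed in .gitignore so that the media never ends up in the main repository again.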

@koenpunt

You could then make mediaelement-files a submodule, although maybe that's not common practice.

@ron666
Collaborator
ron666 commented Oct 2, 2016

Not common, and problematic in my experience.

@ron666
Collaborator
ron666 commented Oct 4, 2016

@johndyer Are we doing the rewrite for version 3.0, then? If so, I'd recommend not including any media files at all, to keep the repo small.

@ron666 ron666 added the Completed label Nov 29, 2016
@ron666
Collaborator
ron666 commented Dec 4, 2016

@koenpunt The history has been rewritten; please test and let me know if it works for you

@koenpunt
koenpunt commented Dec 4, 2016

Where should I look? 3.x dev branch?

@ron666
Collaborator
ron666 commented Dec 4, 2016

Yes

@ron666
Collaborator
ron666 commented Dec 4, 2016

Please look at the 3.x-dev branch and let me know. I used the method above and it seemed faster to me, but let's confirm.

@ron666
Collaborator
ron666 commented Dec 7, 2016

Can this issue be closed?

@koenpunt
koenpunt commented Dec 9, 2016

Yes, the 3.x branch is far more acceptable:

$ git clone https://github.com/johndyer/mediaelement -b 3.x-dev --single-branch
Cloning into 'mediaelement'...
remote: Counting objects: 16211, done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 16211 (delta 0), reused 18 (delta 0), pack-reused 16189
Receiving objects: 100% (16211/16211), 42.13 MiB | 3.79 MiB/s, done.
Resolving deltas: 100% (10349/10349), done.

But I think this issue should only be closed once 3.x becomes master.

@koenpunt
koenpunt commented Dec 9, 2016 edited

I think the safest path to take is to make 3.x the default branch, rename master to something like pre-3.x or 1.x-and-2.x, so that the commits behind the tags are still accessible, and then rename 3.x to master.
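The local side of that rename could look roughly like the sketch below. The function name `swap_master` is hypothetical; `pre-3.x` is one of the names suggested above, and `3.x-dev` is the branch name used elsewhere in this thread. Publishing the result is left as comments because it depends on the remote setup, and on GitHub the default branch has to be changed in the repository settings.

```sh
# Keep the old history reachable under a new name, then promote the 3.x branch.
# swap_master is a hypothetical helper name.
swap_master() {
    old_name="${1:-pre-3.x}"     # new name for the current master
    new_branch="${2:-3.x-dev}"   # branch that should become master
    git branch -m master "$old_name" &&
        git branch -m "$new_branch" master
}

# Publishing would then be something like (not run here):
#   git push origin pre-3.x
#   git push --force origin master
#   git push origin --delete 3.x-dev
```

Existing tags keep pointing at the same commits, so they stay reachable through the renamed branch.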

@koenpunt
koenpunt commented Dec 9, 2016

And of course add a note in the readme about that.

@ron666
Collaborator
ron666 commented Dec 9, 2016

The thing is that when you perform a rewrite you have to do it across all branches in a repo, meaning that right now the history of master, 3.x-dev, etc., plus the tags, has been rewritten, because all of them had references to the removed files. So I don't know if the last part will be needed. And this is something only people who create PRs need to be aware of.

@koenpunt
koenpunt commented Dec 9, 2016

Doesn't clone only fetch the default branch by default?

@koenpunt
koenpunt commented Dec 9, 2016

Or maybe I'm wrong there, as clone implies full copy.

@koenpunt
koenpunt commented Dec 9, 2016

Is your intent to update (force push) all the tags too?

@ron666
Collaborator
ron666 commented Dec 9, 2016 edited

Clone copies everything. Users who want to contribute won't have any issues if they start from a clean clone; if they had a prior version, they need to do a git rebase first. And yes, the tags had to be updated as well, since they also carried references to those files. I tried not to, but every time I pushed the rewritten history back there were still references, and the size of the repo didn't change. That's why even the tags had to be updated.
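One way to check for such leftover references before pushing is to ask whether any ref still reaches a commit that touches the removed paths. This is a minimal sketch, not necessarily how it was done here; the function name `paths_purged` is an assumption for illustration.

```sh
# Succeeds (exit status 0) when no commit reachable from any ref
# (branches, tags, and also the refs/original/ backups that
# filter-branch leaves behind) touches the given paths.
# paths_purged is a hypothetical helper name.
paths_purged() {
    [ -z "$(git log --all --format=%H -- "$@" | head -n 1)" ]
}
```

For example, `paths_purged media/xbox.wmv media/Thumbs.db || echo "still referenced"`. As long as a single old tag or backup ref keeps pointing at the original commits, the check fails and the large blobs stay in the repository.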

@ron666 ron666 removed the Feature label Dec 13, 2016
@ron666
Collaborator
ron666 commented Jan 16, 2017

Version 3.0 released. Closing issue.

@ron666 ron666 closed this Jan 16, 2017