enormous repo size (~400mb) #3

Closed
metasyn opened this issue Sep 27, 2018 · 6 comments
Labels
enhancement New feature or request

Comments

@metasyn

metasyn commented Sep 27, 2018

Hey, interesting project. I just git cloned it to check out a few things and noticed it was taking a lot longer than expected, especially given the number of files I had already seen in the GitHub UI. So after cloning:

du -h .

gives

 48K	./.git/hooks
8.0K	./.git/info
  0B	./.git/logs/refs/heads
  0B	./.git/logs/refs/remotes/origin
  0B	./.git/logs/refs/remotes
  0B	./.git/logs/refs
  0B	./.git/logs
4.0K	./.git/objects/info
339M	./.git/objects/pack
339M	./.git/objects
  0B	./.git/refs/heads
4.0K	./.git/refs/remotes/origin
4.0K	./.git/refs/remotes
  0B	./.git/refs/tags
4.0K	./.git/refs
340M	./.git
 20K	./.idea
 36K	./animl/viz
 48K	./animl
5.5M	./notebooks
8.0K	./testing/bin
188K	./testing/data
 64M	./testing/samples
 66M	./testing
412M	.
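A minimal sketch, assuming a POSIX shell and git on the PATH: `git count-objects -v -H` reports the packed size of the object store directly, which roughly corresponds to the `./.git/objects/pack` figure above. The throwaway repository, the file name `big.svg` and the `demo_commit` helper are hypothetical demo scaffolding; in a real clone only the last command is needed.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository (hypothetical data): three versions of one file.
demo=$(mktemp -d)
cd "$demo"
git init -q
demo_commit() {
  git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m "$1"
}
for i in 1 2 3; do
  printf 'version %s\n' "$i" > big.svg
  head -c 1000 /dev/zero >> big.svg
  git add big.svg
  demo_commit "v$i"
done
git gc -q   # pack the objects, as they would be in a fresh clone

# size-pack is the disk space used by all packs; -H prints it human-readable.
git count-objects -v -H
```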

A lot of it looks like duplicates of these images in the history:

# gnumfmt is GNU numfmt as installed by Homebrew coreutils on macOS; on Linux use numfmt
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| gnumfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest \
| tail -n 20

(from https://stackoverflow.com/a/42544963)
gives

387fa734f114  4.9MiB testing/samples/fires-TD-4.svg
1eb65f5cc60b  4.9MiB testing/samples/fires-TD-4.svg
4b5db9e74df0  4.9MiB testing/samples/fires-TD-4.svg
d89306ae2202  5.5MiB notebooks/examples.ipynb
a114a526f20e  5.6MiB testing/playground.ipynb
1689888cef83  6.9MiB testing/samples/sweets-TD-2.svg
54a0623f219b  6.9MiB testing/samples/sweets-LR-2-X.svg
72b5602a4d49  7.0MiB testing/samples/sweets-TD-2.svg
4fbf5680d31f  7.0MiB testing/samples/sweets-LR-2-X.svg
32367bcf5ae5  7.1MiB testing/samples/sweets-TD-2.svg
74a48fb74d54  7.2MiB testing/samples/sweets-LR-2-X.svg
eb1fd3844434  9.3MiB testing/samples/sweets-LR-3.svg
12c0992eac36  9.4MiB testing/samples/sweets-TD-3-X.svg
090c453c6736  9.5MiB testing/samples/sweets-TD-3-X.svg
cef390169f37  9.5MiB testing/samples/sweets-LR-3.svg
ba34b922a1c3  9.6MiB testing/samples/sweets-TD-3-X.svg
0aeb5d2e2c35  9.9MiB testing/samples/sweets-LR-3.svg
c0e22ce1ad0b   12MiB testing/samples/sweets-TD-4.svg
92e68cee1463   12MiB testing/samples/sweets-TD-4.svg
5f83e5ce4ca6   12MiB testing/samples/sweets-TD-4.svg
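Since every line above is a separate blob, summing them per path shows what each file's whole history costs. A sketch, assuming a POSIX shell with awk; `blob_totals_by_path` is a hypothetical helper name, and the repository it runs on here is throwaway demo data.

```shell
#!/bin/sh
set -eu

# Sum the uncompressed size of every blob reachable from any ref, grouped by
# path, so repeated versions of one file show up as one large total.
blob_totals_by_path() {
  git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" {
      path = $0
      sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)   # strip type, id and size; keep the path
      total[path] += $3
      versions[path]++
    }
    END { for (p in total) printf "%.0f %d %s\n", total[p], versions[p], p }' \
  | sort -k1,1nr
}

# Throwaway demo repository (hypothetical data): three versions of big.svg.
demo=$(mktemp -d)
cd "$demo"
git init -q
for i in 1 2 3; do
  printf 'version %s\n' "$i" > big.svg
  head -c 1000 /dev/zero >> big.svg
  git add big.svg
  git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m "v$i"
done

blob_totals_by_path   # columns: total bytes, number of versions, path
```

In a real clone, `blob_totals_by_path | head` would list the worst offenders. The sizes are uncompressed; swapping `%(objectsize)` for `%(objectsize:disk)` reports the compressed (possibly delta-encoded) size on disk instead.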

Maybe one of these answers could help you out?
https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository

@parrt
Owner

parrt commented Sep 27, 2018

Yeah, sorry. It's those damn samples :( I wanted them around for people to see a gallery, but geez they are big. (Resisting XML/SVG target practice here...)

@metasyn
Author

metasyn commented Sep 27, 2018

I think having the samples makes sense, but you can see you actually have 3 copies of the same file in the git history:

c0e22ce1ad0b   12MiB testing/samples/sweets-TD-4.svg
92e68cee1463   12MiB testing/samples/sweets-TD-4.svg
5f83e5ce4ca6   12MiB testing/samples/sweets-TD-4.svg

whereas the current testing/samples directory is not that big:

 64M	./testing/samples
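The three `sweets-TD-4.svg` entries have different blob ids, so they are three different versions of the file, each committed once, and git keeps every one of them. A sketch for listing which commit stored which version, assuming a POSIX shell; `versions_of` is a hypothetical helper, and the repository here is throwaway demo data.

```shell
#!/bin/sh
set -eu

# One line per commit (on any ref) that touched the given path:
# short commit id, short id of the blob stored at that path, blob size in bytes.
versions_of() {
  git log --all --format='%H' -- "$1" | while read -r commit; do
    # A commit that deleted the path has no blob for it; skip those.
    blob=$(git rev-parse -q --verify "$commit:$1") || continue
    printf '%.12s %.12s %s\n' "$commit" "$blob" "$(git cat-file -s "$blob")"
  done
}

# Throwaway demo repository (hypothetical data): three versions of big.svg.
demo=$(mktemp -d)
cd "$demo"
git init -q
for i in 1 2 3; do
  printf 'version %s\n' "$i" > big.svg
  head -c 1000 /dev/zero >> big.svg
  git add big.svg
  git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m "v$i"
done

versions_of big.svg
```

Against the animl history this would be run as `versions_of testing/samples/sweets-TD-4.svg`.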

Heh, obviously your call. I think it took me 6 minutes to clone, and I suspect that could hurt people's interest over time; it might be pretty frustrating for people who don't have decent internet connections. Maybe I should've just done:

git clone --depth 1 https://github.com/parrt/animl

which cuts the .git size down to 15M. I don't personally have a habit of doing that on every new repo, but this one makes me think maybe I should, haha.
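A minimal sketch of the shallow-clone workflow, run against a throwaway local repository (demo data, not the animl repo) and assuming a POSIX shell and git 2.15 or later for `--is-shallow-repository`.

```shell
#!/bin/sh
set -eu

# Throwaway "upstream" repository with three commits (hypothetical demo data).
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
for i in 1 2 3; do
  printf 'version %s\n' "$i" > big.svg
  git add big.svg
  git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m "v$i"
done
cd "$work"

# --depth is ignored for plain local paths, so use a file:// URL to get a
# real shallow clone that contains only the most recent commit.
git clone -q --depth 1 "file://$work/upstream" shallow
git -C shallow rev-parse --is-shallow-repository   # prints "true"
git -C shallow rev-list --count HEAD               # prints 1

# If the full history is needed later, it can be fetched after the fact.
git -C shallow fetch -q --unshallow
git -C shallow rev-list --count HEAD               # prints 3
```

Newer git versions also support partial clone, e.g. `git clone --filter=blob:none <url>`, which keeps every commit but downloads file contents only when they are checked out; it only helps when the server supports it.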

@parrt
Owner

parrt commented Sep 27, 2018

Hahah. Yep! I learned to use depth 1 on the IntelliJ repo, which has millions of commits. I'll try to clean this up. You're absolutely right.

@parrt
Owner

parrt commented Sep 28, 2018

Hmmm... I looked at people's suggestions for removing previous versions of large files, and they're pretty terrifying. Maybe the easiest thing to do is simply destroy the repository and copy the files in fresh to simulate a depth of one. I'm not in love with my trail of dead code and commit messages, so it would not be a huge loss. This might depend on what I do in response to this issue.
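For a sense of what those suggestions involve, here is a sketch of one commonly suggested approach, `git filter-branch` with an `--index-filter`, run against a throwaway repository; `big.svg`, `README.md` and `demo_commit` are hypothetical. The git documentation itself warns about filter-branch's pitfalls and points to the separate `git filter-repo` tool instead.

```shell
#!/bin/sh
set -eu

# Throwaway repository (hypothetical data): a README plus three versions of big.svg.
demo=$(mktemp -d)
cd "$demo"
git init -q
demo_commit() {
  git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m "$1"
}
echo 'readme' > README.md
git add README.md
demo_commit "add readme"
for i in 1 2 3; do
  printf 'version %s\n' "$i" > big.svg
  git add big.svg
  demo_commit "v$i"
done

# 1. Rewrite every commit on every ref as if big.svg had never been added,
#    dropping commits that become empty. Every commit id changes.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch big.svg' -- --all >/dev/null 2>&1

# 2. Drop the backup refs and reflog entries that still point at the old commits.
git for-each-ref --format='%(refname)' refs/original/ | while read -r ref; do
  git update-ref -d "$ref"
done
git reflog expire --expire=now --all

# 3. Delete the now-unreachable objects from disk.
git gc -q --prune=now
```

Every rewritten ref would then have to be force-pushed (for example `git push --force --all` and `git push --force --tags`), and anyone with an existing clone would need to re-clone, which is much of what makes this route unappealing.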

@parrt
Owner

parrt commented Sep 29, 2018

I'm going to rename this repo to dtreeviz and squash all commits. :)
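One way to squash a branch down to a single commit, sketched against a throwaway repository. Assumptions: a POSIX shell; the `squashed` branch name and the `demo_commit` helper are hypothetical. This is not necessarily how 8411604 was produced.

```shell
#!/bin/sh
set -eu

# Throwaway repository (hypothetical data) with three commits of big.svg.
demo=$(mktemp -d)
cd "$demo"
git init -q
demo_commit() {
  git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q -m "$1"
}
for i in 1 2 3; do
  printf 'version %s\n' "$i" > big.svg
  git add big.svg
  demo_commit "v$i"
done

# Replace the current branch with one parentless commit holding its latest tree.
branch=$(git symbolic-ref --short HEAD)
git checkout -q --orphan squashed   # new branch with no history; current files stay staged
demo_commit "Initial commit (history squashed)"
git branch -q -D "$branch"          # drop the old branch and its history
git branch -m "$branch"             # give the squashed branch the old name

# Publishing would then be: git push --force origin "$branch"
git log --oneline
```

Local copies of the old objects survive until the reflogs expire and `git gc` runs; the force push is what replaces the history on the remote.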

@parrt
Owner

parrt commented Sep 29, 2018

Closing, as I fixed this with 8411604.

@parrt parrt added the enhancement New feature or request label Sep 29, 2018
@parrt parrt closed this as completed Sep 29, 2018

2 participants