Serious issue: Please do *not* accept commits with extremely large files #605

Closed
KevinNorth opened this issue Apr 10, 2015 · 37 comments

@KevinNorth
Contributor

This is a serious issue. I have an actual concern I'd like to address.

I'm having a lot of fun with this repo! When I cloned it onto my own machine, though, it took about 15 minutes. Looking through the repo, I noticed that about 1.5 GB of content was coming from just two top-level directories. One looked like a 1 GB iOS app that included several 100+ MB images, and the other looked like a clone of a Linux distro.

Under the stated rule of "Don't be a d---," I request that you not accept pull requests with extremely large files. Waiting 15 minutes for the clone really wasn't that bad, and I have more than 1 GB to spare, so I'm not upset about the files mentioned above. However, if many people commit 1 GB+ files, or someone tries to commit something on the order of 10-100 GB, cloning the project could become almost impossible, which would kill all of the fun.
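One way a maintainer could enforce a rule like this is to scan a branch's tree for oversized files before merging. The sketch below is hypothetical: the 50 MB `MAX_BYTES` limit and the helper names are assumptions, not anything this project adopted. It parses the output of `git ls-tree -r -l <rev>`, which lists each entry's mode, type, object ID, size in bytes, and path.

```python
import subprocess
from typing import Iterable, List, Tuple

# Hypothetical limit: the thread never settled on an official number.
MAX_BYTES = 50 * 1024 * 1024


def oversized_blobs(ls_tree_lines: Iterable[str], limit: int = MAX_BYTES) -> List[Tuple[str, int]]:
    """Return (path, size) for every blob above `limit` in `git ls-tree -r -l` output."""
    offenders = []
    for line in ls_tree_lines:
        if not line.strip():
            continue
        # Format: "<mode> <type> <object> <size>\t<path>"
        meta, path = line.split("\t", 1)
        _mode, obj_type, _obj_id, size = meta.split()
        # Submodule entries have type "commit" and size "-", so check the type first.
        if obj_type == "blob" and int(size) > limit:
            offenders.append((path, int(size)))
    return offenders


def check_revision(rev: str = "HEAD", limit: int = MAX_BYTES) -> List[Tuple[str, int]]:
    """Run git in the current directory and report oversized blobs in `rev`'s tree."""
    out = subprocess.run(
        ["git", "ls-tree", "-r", "-l", rev],
        capture_output=True, text=True, check=True,
    ).stdout
    return oversized_blobs(out.splitlines(), limit)
```

For example, `check_revision("some-pr-branch")` (branch name hypothetical) would return the offending paths and sizes. It only inspects the final tree of that revision; a file added and then deleted within the same pull request would still land in history, so a stricter check would walk every commit in the range.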

@KevinNorth
Contributor Author

See #555

@jlu5
Contributor

jlu5 commented Apr 10, 2015

I have to agree.

@hhirsch

hhirsch commented Apr 10, 2015

Serious projects should take a long time to clone. Where else should we put the iOS app?

@MiLk
Member

MiLk commented Apr 10, 2015

@jlu5
Contributor

jlu5 commented Apr 10, 2015

@hhirsch Submodule.

@progval
Contributor

progval commented Apr 10, 2015

Could someone rewrite history to remove those huge files?

@jlu5
Contributor

jlu5 commented Apr 10, 2015

@progval I've tried, but it ends up freezing my CPU for way too long. Way too many commits to go through.

@jlu5
Contributor

jlu5 commented Apr 10, 2015

I am going to rewrite the history very soon, replacing it with an initial commit.

Removed folders: bbb/ linux/ data/random-data

@jlu5
Contributor

jlu5 commented Apr 10, 2015

The repository has been rebased. Unfortunately, I can't seem to update the old, backup copy of master due to file size constraints...

@MiLk
Member

MiLk commented Apr 10, 2015

:/

@jlu5
Contributor

jlu5 commented Apr 10, 2015

@MiLk I don't like deleting content either, but the repo size really needs a limit. I can't even run git pull efficiently sometimes without the CPU spiking super high. A git clone feels like it'd take days to complete.

@KevinNorth
Contributor Author

Thanks for taking care of this issue! 😃

@MiLk
Member

MiLk commented Apr 11, 2015

Next time: https://help.github.com/articles/remove-sensitive-data/
It's possible to remove large files without losing the history.
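A minimal sketch of that kind of history rewrite, assuming `git filter-branch` with an index filter (the tool mentioned later in this thread). The helper names are hypothetical, and the example paths are the folders removed above. Because the filter runs once for every commit on every ref, the rewrite takes longer the longer the history is.

```python
import shlex
import subprocess
from typing import List, Sequence


def filter_branch_cmd(paths: Sequence[str]) -> List[str]:
    """Build argv that drops `paths` from every commit on all refs, keeping the rest of history."""
    # `git rm --cached` only edits the index, so no checkout is needed per commit;
    # `--ignore-unmatch` keeps commits that never contained the paths from failing.
    # filter-branch evaluates the filter in a shell, hence the quoting.
    rm = "git rm -r --cached --ignore-unmatch -- " + " ".join(shlex.quote(p) for p in paths)
    return [
        "git", "filter-branch", "--force",
        "--index-filter", rm,
        "--prune-empty",             # drop commits that become empty after the removal
        "--tag-name-filter", "cat",  # rewrite tags so they point at the new commits
        "--", "--all",
    ]


def purge(paths: Sequence[str]) -> None:
    """Rewrite history in the current repository (run on a fresh clone)."""
    subprocess.run(filter_branch_cmd(paths), check=True)
```

`purge(["bbb/", "linux/", "data/random-data"])` would need to be followed by a force-push, and existing clones would have to re-clone or reset. The old objects also stay in the local repository until the `refs/original/` backup refs are deleted and the repository is garbage-collected.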

@ghost

ghost commented Apr 11, 2015

👍

@jlu5
Contributor

jlu5 commented Apr 11, 2015

@MiLk There were literally 90,000 commits or so for filter-branch to go through, which would have taken forever to complete...

@MiLk
Member

MiLk commented Apr 11, 2015

@jlu5
Contributor

jlu5 commented Apr 11, 2015

👍

How long did it take?

@MiLk
Member

MiLk commented Apr 11, 2015

23 hours

@MiLk
Member

MiLk commented Apr 11, 2015

I will try to cherry-pick every commit made since the end of my rewritten branch.
If I manage to cherry-pick everything, we will be able to swap the master branch.

@MiLk
Member

MiLk commented Apr 11, 2015

@illacceptanything/owners I would like to start a vote to replace the current master by https://github.com/illacceptanything/illacceptanything/commits/rewrite
The history has been kept, we shouldn't have lost anything.

The vote will end in 12 hours. (2am UTC / 11am JST / 7pm PDT)
A majority is required to make the replacement.

Until the end of the vote, please don't merge anything.

@hiiru
Member

hiiru commented Apr 11, 2015

@MiLk I didn't see this issue before, so I merged a minor change a few minutes ago... (only readme was affected) - I won't merge anything else till the end of the vote

👍 I agree with the rewrite, seems fine.

@dbalatero
Contributor

Sounds good to me

@jlu5
Contributor

jlu5 commented Apr 11, 2015

Would a squash merge do it? I don't really care, honestly.

@dbalatero
Contributor

I'll Accept Anything ™️

@jlu5
Contributor

jlu5 commented Apr 11, 2015

👍 Go ahead and do it!

@MapleWorld
Contributor

Lol, totally agree, it took forever to clone. This is my first time contributing to an open source project and I didn't know how it works, so I cloned it about 15 times and went way over my internet bandwidth.

@ghost

ghost commented Apr 11, 2015

I'll check right now which directory/file is the worst offender.

Here are all files greater than 1M:

illacceptanything/code/exec_rb/js/webruby.js
illacceptanything/code/hardware/hackrf-one.brd
illacceptanything/data/bigriff.mp3
illacceptanything/data/worst-commits
illacceptanything/data/NOT-ENOUGH-REACT.jpg
illacceptanything/data/dsp/basilisk.png
illacceptanything/data/words.txt
illacceptanything/data/shakespeare.txt
illacceptanything/data/warandpeace.txt
illacceptanything/data/badger.gif
illacceptanything/data/Old git log/old_git_log.txt.xz
illacceptanything/data/composer.json
illacceptanything/web/??? ??? ??? ??? ??? .html
illacceptanything/web/js-frameworks/ember.debug.js
illacceptanything/web/???????????)??????????????? .html
(.git object files excluded.)
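A list like the one above can be reproduced with a short directory walk. This sketch assumes "1M" means 1 MiB (1,048,576 bytes) and, like the list, skips everything under `.git`; it sorts the results largest first so the worst offender comes out on top. The helper name `large_files` is hypothetical.

```python
import os
from typing import List, Tuple

ONE_MB = 1024 * 1024  # assumption: "1M" read as 1 MiB


def large_files(root: str, threshold: int = ONE_MB) -> List[Tuple[str, int]]:
    """Return (relative path, size) for files under `root` larger than `threshold`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into object storage.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't count (or crash on) symlink targets
            size = os.path.getsize(path)
            if size > threshold:
                found.append((os.path.relpath(path, root), size))
    # Largest first.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Usage: `for path, size in large_files("illacceptanything"): print(f"{size / ONE_MB:6.1f} MiB  {path}")`.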

@ghost
Copy link

ghost commented Apr 11, 2015

Lol just had a funny idea: change this repo to illdenyeverymerge.

@jlu5
Copy link
Contributor

jlu5 commented Apr 11, 2015

@initbar no :(

@ghost

ghost commented Apr 11, 2015

Suggestion: could we organize the data folder by file type and ask people to link to those files, instead of scattering them around (unless scattering is the philosophy of this repo)?

For example, we can have:

data
|-- [.png]
|-- [.py]
     |-- stuff.py
|-- [.js]
     |-- this\ is\ a\ test.js
     `-- random_stuff.js
labyrinth
|-- enter
etc.

Which looks better than:

data
|-- [dir1]
|-- [dir2]
     |-- stuff.py
|-- [dir3]
     |-- this\ is\ a\ test.jpg
     `-- random_stuff.java
labyrinth
|-- enter
etc.

This way, (1) when someone needs random_stuff.js, they can just link to it, and (2) we can easily locate files that do nothing but be a dick.
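A sketch of how that layout could be computed without deleting anything: the hypothetical `plan_moves` helper maps each current path to `data/<extension>/<name>` and refuses to proceed if two files with the same name would collide. It assumes the extension folders are named without a leading dot, and it uses a made-up `no-extension` bucket for files such as `README`.

```python
import os
from typing import Dict, List


def extension_dir(filename: str) -> str:
    """Folder name for a file: its extension without the leading dot, lowercased."""
    # A dot-prefixed folder (".png") would be hidden on Unix-like systems.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext or "no-extension"  # hypothetical bucket for files like README


def plan_moves(paths: List[str], root: str = "data") -> Dict[str, str]:
    """Map each path that would move to its proposed root/<extension>/<name> location."""
    plan: Dict[str, str] = {}
    targets = set()
    for path in paths:
        name = os.path.basename(path)
        target = os.path.join(root, extension_dir(name), name)
        if target in targets:
            raise ValueError(f"more than one file would move to {target}")
        targets.add(target)
        if target != path:  # files already in place are left out of the plan
            plan[path] = target
    return plan
```

Each entry in the returned plan could then be applied with `git mv old new`, which keeps the reorganization reviewable as a single pull request.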

@gnurag
Contributor

gnurag commented Apr 11, 2015

+1

@jlu5
Contributor

jlu5 commented Apr 11, 2015

Most of the stuff in data/ is just random junk people uploaded, lol

@progval
Contributor

progval commented Apr 11, 2015

but you deleted the most random of them :(

@progval progval mentioned this issue Apr 11, 2015
@ghost

ghost commented Apr 11, 2015

I could do a massive organization pretty easily: should I do the cleanup and make a pull request (I promise not to delete anything)?

@MiLk
Member

MiLk commented Apr 12, 2015

Done.

@initbar Please open a PR.

@MiLk MiLk closed this as completed Apr 12, 2015
@jlu5
Contributor

jlu5 commented Apr 12, 2015

I don't really think more sorting is needed atm... Either way, don't prefix the directory names with a . for the extension; that makes them hidden by default, which can be confusing.

@seyfer
Contributor

seyfer commented Sep 5, 2015

It's still sooo big...

10 participants