Prevent big files from being merged to the master with CI/CD #1372

Closed
1 of 2 tasks
omerXfaruq opened this issue May 24, 2022 · 10 comments · Fixed by #3013

@omerXfaruq
Contributor

omerXfaruq commented May 24, 2022

  • I have searched to see if a similar issue already exists.

  • Find or write a CI tool that warns about or reports the size difference introduced by each PR.

omerXfaruq added this to the 3.1 milestone May 24, 2022
omerXfaruq self-assigned this May 24, 2022
abidlabs added the enhancement (New feature or request) label May 24, 2022
@omerXfaruq
Contributor Author

Added the CI for this and am testing it in #1376. However, there are some problems with sizewatcher, so I might use something different if they can't be solved.

Current policy of the check:
10MB+ diff: error
1MB+ diff: warning
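
A minimal sketch of how the two-tier policy above could be expressed in code. This is not the sizewatcher configuration from #1376; the names `classify_size_diff` and `Verdict` are hypothetical, and the sketch assumes "MB" means decimal megabytes and that "10MB+" means "10MB or more", neither of which the thread spells out.

```python
from enum import Enum

# Assumption: decimal megabytes. The policy does not say MB vs MiB.
MB = 1_000_000


class Verdict(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def classify_size_diff(
    diff_bytes: int,
    warn_at: int = 1 * MB,
    error_at: int = 10 * MB,
) -> Verdict:
    """Map the size difference introduced by a PR to a check result.

    A negative diff (the PR shrinks the repo) is always OK.
    """
    if diff_bytes >= error_at:
        return Verdict.ERROR
    if diff_bytes >= warn_at:
        return Verdict.WARNING
    return Verdict.OK
```

Keeping the thresholds as parameters makes it easy to tighten the policy later without touching the comparison logic.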

@omerXfaruq
Contributor Author

omerXfaruq commented May 25, 2022

I think we should also clear the big files from history while we're at it. Clearing the history is easy with BFG, as in #813, but we should back it up with CI or the big files will come back.

I say we remove unnecessary files larger than 3MB from history and ban them going forward. If there is a GIF or image, let it be uploaded to the cloud in the future, as a policy. I don't want to go overboard on this, so let me hear your thoughts as well, @aliabd, @aliabid94, @abidlabs. We might want to clean only recent history so as not to touch most of the history, though touching old history is not a big problem either.

Another alternative would be not caring about the repository size and letting git run wild :D


➜  gradio git:(main) ✗ git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest | tail -n 100
410e6cff9a29  957KiB demo/screenshots/image_mod/1.png
41e4b7abd736  958KiB gradio/frontend/static/bundle.js
6f747d0ec84f  958KiB gradio/frontend/static/bundle.js
ff7abf8a0c05  958KiB gradio/frontend/static/bundle.js
7e45d3e6b424  975KiB gradio/test_data.py
3657cb4601e5  975KiB gradio/test_data.py
f1dc8c5d95e0  975KiB gradio/test_data.py
69dc98958dbd  988KiB gradio/media_data.py
43d3f34dab16  988KiB gradio/media_data.py
1f91c6e92f67  988KiB gradio/media_data.py
97b4b5af5fa9  988KiB gradio/media_data.py
2b38ac091c4a  1.1MiB demo/blocks_flipper/screenshot.gif
c8af75396128  1.1MiB demo/image_mod/screenshot.png
1e6ea1d63285  1.1MiB demo/hello_world_2/screenshot.gif
6df68af06d56  1.1MiB demo/screenshots/image_mod/cheetah2.png
e19b5821bf5d  1.1MiB dist/gradio-0.9.6.tar.gz
63c97394e52b  1.1MiB dist/gradio-0.9.4.tar.gz
fac64578e65b  1.1MiB dist/gradio-0.9.5.tar.gz
cdc296c1b92e  1.1MiB dist/gradio-0.9.7.tar.gz
8129be23f547  1.1MiB dist/gradio-0.9.9.5.tar.gz
ca4b28262bbc  1.2MiB dist/gradio-0.9.0-py3-none-any.whl
3bb5e91c3353  1.2MiB dist/gradio-0.8.1-py3-none-any.whl
a36167412d9b  1.2MiB dist/gradio-0.9.1-py3-none-any.whl
735a09d886e0  1.2MiB dist/gradio-0.9.9-py3-none-any.whl
a6750044bbbc  1.2MiB dist/gradio-0.9.6-py3-none-any.whl
5ef368d9a8d6  1.2MiB dist/gradio-0.9.9.2-py3-none-any.whl
01c93ab4a8c4  1.2MiB dist/gradio-0.9.9.5-py3-none-any.whl
01c89fb36d88  1.2MiB dist/gradio-0.9.4-py3-none-any.whl
be8536659aee  1.2MiB dist/gradio-0.9.3-py3-none-any.whl
d42e616f195e  1.2MiB dist/gradio-0.9.5-py3-none-any.whl
b57c57b3a7bb  1.2MiB dist/gradio-0.9.7-py3-none-any.whl
2290541b0262  1.2MiB demo/screenshots/image_mod/cheetah1.png
3f067e14485a  1.2MiB dist/gradio-0.8.0-py3-none-any.whl
2ccc652ddfc1  1.3MiB website/homepage/src/assets/img/guides/using_flagging/flag_button.gif
dd880afc8980  1.3MiB dist/gradio-0.4.1-py3-none-any.whl
60ac9613bb09  1.3MiB dist/gradio-0.4.0-py3-none-any.whl
007ebda3e9ff  1.3MiB dist/gradio-0.4.2-py3-none-any.whl
b141c82a4006  1.4MiB demo/screenshots/image_mod/lion.png
82259f2449d2  1.4MiB demo/screenshots/image_mod/1.png
23e35e4d515a  1.4MiB dist/gradio-0.4.4-py3-none-any.whl
486c2783edf6  1.4MiB website/homepage/src/assets/img/meta-image-2.png
2bd26b9b9044  1.4MiB frontend/package-lock.json
bdddee90bd71  1.4MiB frontend/package-lock.json
252029b2da66  1.4MiB frontend/package-lock.json
937016821907  1.5MiB frontend/package-lock.json
6f93cbffad23  1.5MiB frontend/package-lock.json
e9debc1d2f5d  1.5MiB frontend/package-lock.json
37c5c23e5e9f  1.5MiB frontend/package-lock.json
596e869ef26e  1.5MiB frontend/package-lock.json
ffb6570446f8  1.5MiB frontend/package-lock.json
550eab33b314  1.5MiB frontend/package-lock.json
6f63e8b38e80  1.5MiB frontend/package-lock.json
d47fa686fc12  1.5MiB frontend/package-lock.json
fa194df72e24  1.5MiB frontend/package-lock.json
de6d16abb05a  1.5MiB demo/blocks_neural_instrument_coding/sax.wav
b11552f9cb69  1.5MiB demo/kitchen_sink/files/world.mp4
de1b020218ed  1.6MiB frontend/package-lock.json
1515e1c65104  1.6MiB frontend/package-lock.json
7c1449ff3750  1.6MiB frontend/package-lock.json
2d0ac6fa6154  1.6MiB frontend/package-lock.json
8d0bb5677e16  1.6MiB frontend/package-lock.json
cb2bb9e2daf6  1.6MiB website/homepage/src/assets/img/meta-image.png
13f4faeae920  1.7MiB dist/gradio-0.8.1-py3.6.egg
a32c6d2d665a  1.7MiB frontend/package-lock.json
9f0b99790ac2  1.7MiB dist/gradio-0.9.0-py3.7.egg
c7cd52296317  1.7MiB dist/gradio-0.9.0-py3.7.egg
c36b0a2c1229  1.7MiB dist/gradio-0.9.0-py3.7.egg
b03802c230e5  1.7MiB dist/gradio-0.9.0-py3.7.egg
538c46fc991d  1.7MiB dist/gradio-0.9.6-py3.7.egg
34e4821c2e47  1.7MiB dist/gradio-0.9.1-py3.6.egg
35d678fc3a31  1.7MiB dist/gradio-0.9.1-py3.7.egg
6aca7741f639  1.7MiB dist/gradio-0.9.3-py3.7.egg
0836a1692570  1.7MiB dist/gradio-0.9.1-py3.7.egg
bc04934cdff0  1.7MiB dist/gradio-0.9.4-py3.7.egg
a30cb41e6bd5  1.7MiB dist/gradio-0.9.4-py3.7.egg
d4e44fad06f6  1.7MiB dist/gradio-0.9.5-py3.7.egg
3d17ea24ac80  1.7MiB dist/gradio-0.9.9.1-py3.7.egg
6a6c24a65436  1.7MiB dist/gradio-0.5.0-py3-none-any.whl
e01955b9e1ff  1.8MiB frontend/package-lock.json
78e2897dbe3d  1.8MiB frontend/package-lock.json
e61383c96d87  1.8MiB frontend/package-lock.json
40c7ac95141a  1.8MiB dist/gradio-0.7.0-py3-none-any.whl
9899ab5ae5b0  1.8MiB dist/gradio-0.7.1-py3-none-any.whl
5f70ec7a179c  1.8MiB dist/gradio-0.7.2-py3-none-any.whl
6019e0bbd667  1.8MiB dist/gradio-0.7.4-py3-none-any.whl
62cb4d471402  1.8MiB dist/gradio-0.7.5-py3-none-any.whl
f27866e02462  1.8MiB dist/gradio-0.7.6-py3-none-any.whl
37ae79b4a37b  1.8MiB dist/gradio-0.7.3-py3-none-any.whl
3a653eb2157a  2.5MiB demo/sepia_filter/screenshot.gif
ff9a6cc22019  3.0MiB dump_data
3a714b0883bc  3.4MiB gradio/frontend/static/js/2.0081c136.chunk.js.map
076bdc2f0211  3.6MiB demo/sepia_filter/screenshot.gif
e56bbca9b6c1  3.6MiB gradio/frontend/static/bundle.js.map
29ea5dade54c  3.6MiB gradio/frontend/static/bundle.js.map
758bbe5bfe94  3.6MiB gradio/frontend/static/bundle.js.map
b94db7a6e197  3.6MiB gradio/frontend/static/bundle.js.map
84ff86c4e649  5.5MiB demo/calculator/screenshot.gif
c13ffbcf391d  5.6MiB demo/sales_projections/screenshot.gif
4f61e1e15af2  8.2MiB website/homepage/src/assets/img/hf_demo.gif
e2b50f155e80   53MiB demo/streaming_stt/deepspeech-0.8.2-models.pbmm
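
The same listing can be filtered against a size limit programmatically. The sketch below parses lines in the `%(objecttype) %(objectname) %(objectsize) %(rest)` format produced by the first two stages of the pipeline above and keeps only blobs over a threshold. The names `Blob` and `large_blobs` are hypothetical, and whether the proposed 3MB limit means MB or MiB is an assumption left to the caller.

```python
from typing import Iterable, List, NamedTuple


class Blob(NamedTuple):
    sha: str
    size: int  # size in bytes, as reported by %(objectsize)
    path: str


def large_blobs(batch_check_lines: Iterable[str], min_bytes: int) -> List[Blob]:
    """Return blobs strictly larger than min_bytes, smallest first.

    Each input line is expected to look like
    "<objecttype> <objectname> <objectsize> <rest>".
    Non-blob objects (commits, trees, tags) are skipped.
    """
    found = []
    for line in batch_check_lines:
        # maxsplit=3 keeps paths that contain spaces in one piece.
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        size = int(parts[2])
        if size > min_bytes:
            path = parts[3] if len(parts) == 4 else ""
            found.append(Blob(sha=parts[1], size=size, path=path))
    return sorted(found, key=lambda blob: blob.size)
```

To use it, feed it the text output of `git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'`, one line per element, for example by iterating over a file or a subprocess's standard output.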

@aliabd
Collaborator

aliabd commented May 25, 2022

We should definitely clean the history one last time. The last 13 files above add up to more than 100MB.

@aliabd
Collaborator

aliabd commented May 25, 2022

I'm fine with adopting a policy for images and GIFs, but we should explore where that would affect performance, e.g. on the website.

@omerXfaruq
Contributor Author

@aliabid94 and @abidlabs do you agree as well?

@abidlabs
Member

Sounds good

@aliabid94
Collaborator

I think that's fine for the Python library, but I also don't want to overcomplicate things if there are only a few GIFs used for the README. Is there anywhere else we use a GIF?

@omerXfaruq
Contributor Author

omerXfaruq commented May 26, 2022

Abubakar has used Unsplash before, and it would work for the README. I'm not sure about the website performance, though; it probably would not be affected.

@omerXfaruq
Contributor Author

Sizewatcher was faulty and did not serve our needs, so it was removed from the repo. see

abidlabs modified the milestones: 3.1, 3.x May 31, 2022
@omerXfaruq
Contributor Author

Even if a big repo size is not a major problem, it is not nice to maintain in the long run, both because git clones and pulls take too long for new users and because the repo takes up too much space.

I could not find any tool that serves our needs here, so I will try to solve it with GitHub Actions when I find the time. Any help is most welcome 😸
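
One possible shape for such a check, sketched as a small script that a CI step could run. It is not the solution that eventually closed this issue; the names `oversized_files` and `report` are hypothetical, and the limit is passed in rather than fixed because the thread never settles on one value. The script only looks at the files it is given, so it guards against new large files but does not inspect history.

```python
import os
from typing import Iterable, List, Tuple


def oversized_files(paths: Iterable[str], limit_bytes: int) -> List[Tuple[str, int]]:
    """Return (path, size) for every existing file larger than limit_bytes."""
    offenders = []
    for path in paths:
        if not os.path.isfile(path):
            # Files deleted or renamed by the PR no longer exist on disk.
            continue
        size = os.path.getsize(path)
        if size > limit_bytes:
            offenders.append((path, size))
    return offenders


def report(paths: Iterable[str], limit_bytes: int) -> int:
    """Print each offender and return a process exit code (0 = pass, 1 = fail)."""
    offenders = oversized_files(paths, limit_bytes)
    for path, size in offenders:
        print(f"{path}: {size} bytes exceeds the limit of {limit_bytes} bytes")
    return 1 if offenders else 0
```

In a workflow, the list of paths could come from something like `git diff --name-only` between the PR's base and head, and the return value of `report` could be passed to `sys.exit` so that the job fails when a large file is introduced. The exact refs to compare depend on how the workflow checks out the repository.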

omerXfaruq added the testing (Related to testing and CI) label Aug 13, 2022
omerXfaruq modified the milestone: 3.x Aug 13, 2022
omerXfaruq removed their assignment Aug 13, 2022
abidlabs self-assigned this Jan 17, 2023
abidlabs removed the enhancement (New feature or request) label Jan 17, 2023