big source tarball #221

Open
rofl0r opened this issue Oct 13, 2018 · 8 comments

@rofl0r
commented Oct 13, 2018

hi, i've just updated re2c from 0.14 to 1.1.1 and noticed the tarball grew from 2.5 to 5.9MB, which is quite big considering the built executable is only 300 KB!
maybe there's some binary documentation inside that bloats the source unnecessarily and could be offered as a secondary download to interested people?
since folks like me who build their stuff from source usually keep the tarballs around, every MB counts... thanks for considering shrinking it a bit.
maybe offering a download as tar.xz could also bring the size down a bit.

@trofi

Contributor

commented Oct 13, 2018

We can check what actually takes up space in the tarball by re-compressing individual directories to get an idea:

re2c-1.1.1 $ ls -lh *.gz --sort=size
3.6M test.tar.gz
889K doc.tar.gz
115K src.tar.gz
59K bootstrap.tar.gz
42K examples.tar.gz

As we can see, these are mostly tests. 6MB does not sound like an unreasonable number.
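For anyone who wants to repeat this kind of measurement on another source tree, here is a minimal Python sketch. It is not the command used above; the function names `compressed_dir_sizes` and `report` are made up for illustration, and it assumes each top-level directory is compressed separately, in memory, with gzip.

```python
import io
import tarfile
from pathlib import Path


def compressed_dir_sizes(root, mode="w:gz"):
    """Return {directory name: compressed size in bytes} for each
    top-level directory under root (hypothetical helper)."""
    sizes = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue  # only directories are measured, loose files are skipped
        buf = io.BytesIO()
        # Pack the directory into an in-memory archive; nothing is written to disk.
        with tarfile.open(fileobj=buf, mode=mode) as tar:
            tar.add(str(entry), arcname=entry.name)
        sizes[entry.name] = len(buf.getvalue())
    return sizes


def report(sizes):
    """Format the sizes largest first, roughly like `ls --sort=size`."""
    ordered = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [f"{size / 1024:.0f}K {name}" for name, size in ordered]
```

Calling `print("\n".join(report(compressed_dir_sizes("re2c-1.1.1"))))` on an unpacked release would give a per-directory listing; the exact numbers will differ somewhat from the ones above depending on compression settings.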

@skvadrik

Owner

commented Oct 13, 2018

My conclusion is the same: tests eat all the space (and they will only grow over time). Without tests the resulting tarball is about 100K. We could potentially package a stripped tarball without tests, but that requires fiddling with automake conditionals to make things like make check pass for the stripped tarball. Is it worth the effort?

@rofl0r: Since you build from source, it is likely that you also want to run the tests, right?

@trofi thanks for measuring!

@skvadrik

Owner

commented Oct 13, 2018

I measured the .tar.xz size, and it is about 2.7M, so maybe using xz is the best option.
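A gzip-versus-xz comparison like this can be scripted. The sketch below is only an illustration with made-up helper names (`archive_size`, `compare`); it packs a tree in memory with Python's `tarfile` module, so the results will not exactly match what `make dist` produces. One plausible reason xz does well on a tree with many similar files is that DEFLATE (gzip) only looks back 32 KiB while xz uses a much larger dictionary, but that is an assumption about this tarball, not something measured here.

```python
import io
import tarfile


def archive_size(path, compression):
    """Size in bytes of `path` packed as an in-memory tar archive.
    `compression` is one of "gz", "bz2" or "xz" (tarfile write modes)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=f"w:{compression}") as tar:
        tar.add(str(path), arcname="tree")
    return len(buf.getvalue())


def compare(path):
    """Return gzip and xz sizes for the same tree, plus their ratio."""
    gz = archive_size(path, "gz")
    xz = archive_size(path, "xz")
    return {"gz": gz, "xz": xz, "xz/gz": xz / gz}
```

For example, `compare("re2c-1.1.1")` returns both sizes in bytes, and an `"xz/gz"` value near 0.5 would correspond to the roughly 2x saving discussed in this thread.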

@leo-yuriev

commented Oct 13, 2018

A git submodule for the tests' code and data?

@rofl0r

Author

commented Oct 13, 2018

> @rofl0r: Since you build from source, it is likely that you also want to run the tests, right?

not really. all our packages do --disable-tests, when possible, to save time, and to allow cross-compilation (tests work only for native compiles anyway).

@rofl0r

Author

commented Oct 14, 2018

another aspect of having a big source tarball is that it suggests a bloated program inside.
for example i was evaluating lexers and when re2c was mentioned i looked into my pkg recipe and was immediately repelled by the 2.5 MB tarball size, to the point of immediately abandoning it as a potential candidate. but since i already had it installed for php, i looked a bit closer and figured the program itself is reasonably small.

trofi added a commit to trofi/re2c that referenced this issue Oct 16, 2018

configure.ac: enable xz tarballs instead of gzip by default
`xz` compresses twice as well as `gzip` on `re2c` sources:

```
$ ls -lh *1.1.1*
4,8M re2c-1.1.1.tar.gz
2,5M re2c-1.1.1.tar.xz
```

Switch `make dist` to `xz` by default. `gzip` is still available
via `make dist-gzip`.

Reported-by: rofl0r
Bug: skvadrik#221
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

@skvadrik

Owner

commented Oct 16, 2018

@trofi pushed a fix that switches to .tar.xz by default and reduces tarball size 2x.

> another aspect of having a big source tarball is that it suggests a bloated program inside.
> for example i was evaluating lexers and when re2c was mentioned i looked into my pkg recipe and was immediately repelled by the 2.5 MB tarball size, to the point of immediately abandoning it as a potential candidate. but since i already had it installed for php, i looked a bit closer and figured the program itself is reasonably small.

Hmm... I might be wrong, but it seems that a moderately lazy distro maintainer would grab the same tarball, or run the same git clone command, regardless of whether the tests are enabled or not, and only later decide what to do with the tests. So git submodules won't prevent you from misjudging the size of the program by the size of the tarball.

What I'm not happy with is fat git history, which is caused by committing large binary blobs in the past (they were deleted shortly after committing when I realized my mistake, but the history has them).

@rofl0r

Author

commented Oct 16, 2018

> @trofi pushed a fix that switches to .tar.xz by default and reduces tarball size 2x.

great, thanks!

> git submodules won't prevent you from mis-judging the size of the program by the size of the tarball.

git submodules weren't my idea, and i don't think they'd help tbh. i'm specifically talking about release tarballs, not github's tarball-from-tag feature.

since the testsuite of re2c seems to be incredibly voluminous, my personal preference would be to create 2 tarballs during make dist: re2c-x.y.z.tar.xz and re2c-testsuite-x.y.z.tar.xz or alternatively re2c-x.y.z-onlysource.tar.xz and re2c-x.y.z-full.tar.xz.
git does a similar thing: git sources are distributed in git-x.y.z.tar.xz, and manpages in git-manpages-x.y.z.tar.xz.
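
To make the two-tarball idea concrete, here is a rough Python sketch of such a split. It is not how `make dist` works and not something re2c ships; `split_dist` is a hypothetical helper, and it assumes the test suite lives in a single top-level directory (here called `test`, as in the listing earlier in this thread).

```python
import tarfile
from pathlib import Path


def split_dist(tree, out_dir, name, version, test_dir="test"):
    """Write <name>-<version>.tar.xz (everything except test_dir) and
    <name>-testsuite-<version>.tar.xz (only test_dir).
    Hypothetical helper; returns the two archive paths."""
    tree, out_dir = Path(tree), Path(out_dir)
    prefix = f"{name}-{version}"
    source_path = out_dir / f"{prefix}.tar.xz"
    tests_path = out_dir / f"{name}-testsuite-{version}.tar.xz"

    def skip_tests(info):
        # Returning None from a tarfile filter drops that member;
        # a dropped directory is not descended into either.
        parts = Path(info.name).parts
        return None if len(parts) > 1 and parts[1] == test_dir else info

    with tarfile.open(source_path, "w:xz") as tar:
        tar.add(str(tree), arcname=prefix, filter=skip_tests)
    with tarfile.open(tests_path, "w:xz") as tar:
        # Same prefix as the source archive, so both unpack into one tree.
        tar.add(str(tree / test_dir), arcname=f"{prefix}/{test_dir}")
    return source_path, tests_path
```

Both archives use the same top-level directory name, so extracting the testsuite tarball on top of the source tarball restores the full tree. Whether `make check` can cope with a missing test directory is a separate question that this sketch does not address.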

> What I'm not happy with is fat git history, which is caused by committing large binary blobs in the past (they were deleted shortly after committing when I realized my mistake, but the history has them).

yeah, this is quite an annoying problem: if you don't notice the mistake within a short time (during which you could force-push to correct it), you're stuck with it, unless you rewrite the history... that's why i always look at my commits with gitk before pushing. also i tell ppl to never use git commit -a ...
