Feature request: build only compressor #77

Closed
annulen opened this issue Nov 20, 2015 · 25 comments

Comments

@annulen

annulen commented Nov 20, 2015

Could you add a #define to optionally disable compilation of the decompression code? Thanks.
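A minimal sketch of what such a switch could look like. The macro `EXAMPLE_NO_DECOMPRESSOR` and the `example_*` functions are hypothetical illustrations, not zstd's API. When the macro is defined, the decompression function is never compiled, so it cannot end up in the binary:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a compile-time switch. EXAMPLE_NO_DECOMPRESSOR and the
 * example_* functions are hypothetical, not part of zstd. Both "codecs"
 * simply copy bytes so the sketch stays runnable. */

static size_t example_compress(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);  /* placeholder for real compression */
    return n;
}

#ifndef EXAMPLE_NO_DECOMPRESSOR
/* Only compiled when EXAMPLE_NO_DECOMPRESSOR is not defined. */
static size_t example_decompress(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);  /* placeholder for real decompression */
    return n;
}
#endif

/* CLI-style dispatch: 'c' compresses, 'd' decompresses.
 * Returns bytes written, or (size_t)-1 if the mode is not built in. */
static size_t example_run(char mode, char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (mode == 'c')
        return example_compress(dst, src, n);
#ifndef EXAMPLE_NO_DECOMPRESSOR
    if (mode == 'd')
        return example_decompress(dst, src, n);
#endif
    return (size_t)-1;
}
```

Compiling with `-DEXAMPLE_NO_DECOMPRESSOR` turns the `'d'` mode into an error return and leaves no decompression function in the object file.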

@Cyan4973
Contributor

Hi Konstantin

That's a pretty good idea.
Actually, I was expecting the other way round, that is, some application being interested in the decompression code without the compression part.
That being said, both objectives imply the same capability.

I feel we should wait for v0.4 to look into this issue.
The reason is, the code structure will be changed in a way that makes this capability easier to create. It will probably not be enough, but it is at least a good step in the right direction, so it will be easier to study what remains to be done.

@Cyan4973
Contributor

Cyan4973 commented Dec 2, 2015

With the release of the 0.4.x series, it becomes possible to target this objective.

Zstd code is now more clearly separated between compression and decompression.
It is also possible to skip generating legacy code (a logically complementary capability).

So, what's missing now is primarily compression / decompression separation within huff0 and fse.

There are mainly 2 ways to achieve this :

  • Modify huff0 and fse, in order to separate compress and decompress functions
    • possible, but will require some time and increase the number of files
  • Make huff0 and fse integration static
    • expectation : unused static functions will simply not be generated (dead code elimination)
    • added benefit : fse / huff0 symbols, currently public, will no longer be present in the ABI. Only zstd public symbols would remain.
    • requires a few tricky source code modifications.
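A minimal sketch of the "static integration" option, with hypothetical names: the entropy-coder helpers are `static` and live in the same translation unit as the single public entry point (as if the fse/huff0 sources were included into it). GCC and Clang normally do not emit a `static` function that nothing calls (GCC flags it with `-Wunused-function`), and `static` functions never become exported symbols:

```c
#include <assert.h>
#include <stddef.h>

/* --- Entropy coder, integrated statically (hypothetical helpers) --- */

/* Toy "encoder": adds 1 to each byte. Internal linkage only. */
static void entropy_encode(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(buf[i] + 1);
}

/* Toy "decoder": inverse of entropy_encode. Nothing below calls it, so
 * in a compressor-only build the compiler can leave it out entirely. */
static void entropy_decode(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(buf[i] - 1);
}

/* --- The only externally visible symbol (hypothetical public API) --- */
size_t example_public_compress(unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    entropy_encode(buf, n);
    return n;
}
```

The trade-off is that every helper must be reachable from within one translation unit, which is where the "tricky source code modifications" come in.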

The second option currently looks the more promising to me.

@annulen
Author

annulen commented Dec 2, 2015

+1 for static

@Cyan4973
Contributor

Cyan4973 commented Dec 2, 2015

There is also a need to define what kind of objective to target.
Presuming it's possible to compile for only compression or only decompression, what should be the result ?

  • A static library ? (for example, zstd_compress.a)
  • A list of files and parameters ?

Speaking of static library : how does it work today ?
Presuming libzstd.a is compiled and generated, what happens when linking a program which only uses compression or decompression code from this library ? Does the linker keep unused code out of the resulting binary, achieving the objective ?

From https://en.wikipedia.org/wiki/Static_library :

With static linking, it is enough to include those parts of the library that are directly and indirectly referenced by the target executable

@annulen
Author

annulen commented Dec 2, 2015

Here are some facts that I know about ELF linking:

  • The ELF linker manipulates sections, not functions.
  • It traverses all input files in one pass from left to right (this behavior can be modified by command line options, at least in the GNU implementation). It picks sections according to the following rules:
    • All code (.text) sections from .o files specified on the command line are linked into the final ELF object (executable or shared library).
    • Code from .a files is included only if it is referenced from a section that was already included (more precisely, the linker pulls in whole archive members, i.e. .o files, that define a symbol still undefined at that point).
  • In GCC and Clang there is the option -ffunction-sections, which forces the compiler to create a separate section for each function. (There is also a similar option, -fdata-sections, for data, e.g. string literals.) If the GNU linker is invoked with --gc-sections, it will throw away all unused functions.
  • The aforementioned options are not present in old versions of GCC and binutils, e.g. people using embedded cross-toolchains with gcc < 4.2 are probably out of luck.

@annulen
Author

annulen commented Dec 2, 2015

So, if you place compression and decompression functions into different source files, build a static library from them, and reference only, e.g., the decompression functions from the executable, the compression functions won't be linked in. If they are not in different files, -ffunction-sections is required when building the static library, and -Wl,--gc-sections when linking the executable.
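A minimal sketch of the single-file case, assuming GCC or Clang with GNU ld; the file, library and function names are hypothetical. The build commands in the comment apply the flags described above, and `-Wl,--print-gc-sections` is an optional GNU ld flag that lists each section it discards:

```c
/* example_codec.c (hypothetical). Possible build:
 *
 *   cc -Os -ffunction-sections -fdata-sections -c example_codec.c
 *   ar rcs libexample.a example_codec.o
 *   cc -Os main.c libexample.a -Wl,--gc-sections -Wl,--print-gc-sections
 *
 * Without -ffunction-sections both functions share one .text section, so
 * referencing either one keeps both. With it, each function gets its own
 * .text.<name> section and --gc-sections can drop the unreferenced one.
 */
#include <assert.h>
#include <stddef.h>

/* Hypothetical compression entry point (placeholder: XOR with a key). */
size_t codec_compress(unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] ^= 0x5A;
    return n;
}

/* Hypothetical decompression entry point. A compressor-only main.c never
 * references it, so its section can be garbage-collected at link time. */
size_t codec_decompress(unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] ^= 0x5A;
    return n;
}
```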

@Cyan4973
Contributor

Cyan4973 commented Dec 2, 2015

So dead code elimination from static library is in theory possible, in practice not obvious, with various complexities and limitations in the way.

OK. So now let's suppose that it's possible to build 2 static libraries dedicated to compression and decompression. Note that, within the compression one, there are multiple possible methods, depending on compression level. That means I expect the compression library to remain "relatively big".

Here, by the way, are the current object file sizes, compiled using -Os (optimized for size) for an x64 target :

ls -ls *.o *.a
 16 -rw-r--r-- 1 yann yann  13480 déc.   2 18:33 fse.o
 24 -rw-r--r-- 1 yann yann  20592 déc.   2 18:33 huff0.o
 52 -rw-r--r-- 1 yann yann  53064 déc.   2 18:33 zstd_compress.o
 12 -rw-r--r-- 1 yann yann  10328 déc.   2 18:33 zstd_decompress.o
100 -rw-r--r-- 1 yann yann 100042 déc.   2 18:33 libzstd.a

I suspect the request to remove decompression code is tied to reducing final code size. According to the above measurements, it will indeed reduce code size, but by no more than 30 % (including parts of fse and huff0).
Is that enough ? Is there a specific size objective ?
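The ~30 % figure can be reproduced from the listing above, using the sum of the four .o files as the total. How fse.o and huff0.o split between compression and decompression is not stated in the thread, so the sketch takes that split as a parameter (an assumption). Counting only zstd_decompress.o gives about 11 %; assuming roughly half of fse.o and huff0.o is decompression code gives about 28 %:

```c
#include <assert.h>

/* Object sizes in bytes from the `ls -ls` listing above (-Os, x64). */
#define FSE_O              13480.0
#define HUFF0_O            20592.0
#define ZSTD_COMPRESS_O    53064.0
#define ZSTD_DECOMPRESS_O  10328.0

/* Fraction of the total object size that is decompression code.
 * ASSUMPTION: `entropy_decomp_fraction` (0..1) of fse.o and huff0.o is
 * decompression code; the thread only says "parts of fse and huff0". */
static double decompression_share(double entropy_decomp_fraction)
{
    assert(entropy_decomp_fraction >= 0.0 && entropy_decomp_fraction <= 1.0);
    double total = FSE_O + HUFF0_O + ZSTD_COMPRESS_O + ZSTD_DECOMPRESS_O;
    double decomp = ZSTD_DECOMPRESS_O
                  + entropy_decomp_fraction * (FSE_O + HUFF0_O);
    return decomp / total;
}
```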

@annulen
Author

annulen commented Dec 2, 2015

Currently I'm interested only in the size of the zstd application. I'm planning to use it for real-time compression of coredumps and related info on an embedded system, to transmit over the network, and this data will never be decompressed on that device. I don't have strict size requirements, so 30% would be fine. I just didn't want to bring unnecessary bloat into the firmware.

@Cyan4973
Contributor

Cyan4973 commented Dec 2, 2015

Currently I'm interested only in size of zstd application

When you say "application", you mean the ./zstd command line utility ?

@annulen
Author

annulen commented Dec 2, 2015

Yep

@Cyan4973
Contributor

Cyan4973 commented Dec 2, 2015

OK. It's a better-defined objective, but also quite a bit more work.

On top of separating library functions, it will be necessary to modify program files.
They weren't written with this objective in mind, so it will take some time.

@annulen
Author

annulen commented Dec 2, 2015

I thought something like annulen@777033b would be enough to exclude decompressor from zstd cli. Am I wrong?

@Cyan4973
Contributor

Cyan4973 commented Dec 2, 2015

I've retargeted the objective to "reduce size of ./zstd utility".

According to current experiments, the proposed change at annulen/zstd@777033b will probably not be enough.

I've experimented with removing non-essential capabilities, starting with the integrated benchmark suite.

Just removing any mention of the benchmark from zstdcli.c doesn't seem to be enough : the final exe size doesn't change much. I suspect that's because the public symbols are still generated, even if not used, as they could be called externally, like a DLL. So, to get some size benefits, it's also necessary to remove bench.c from the compilation chain.

I'm starting to slim down the binary along these principles. I suspect I'll have something to propose by tomorrow.

@Cyan4973
Contributor

Cyan4973 commented Dec 3, 2015

There is a new update in "dev" branch : 28e7cef

It proposes a new build option for ./zstd command line utility : make zstd-frugal

Tested on Linux x64 with gcc 4.8.4, the resulting binary is 107KiB, down from 270KiB (default).
To decrease size, it gives up legacy support and the benchmark functionality.

It still does both compression and decompression, but as stated earlier, separating the two will take quite a bit more time. So I figured this solution could be a good stopgap.

@annulen
Author

annulen commented Dec 3, 2015

Thanks!

@annulen
Author

annulen commented Dec 3, 2015

Here are the file sizes on MIPS (stripped):

  • Default options: 220K
  • ZSTD_LEGACY_SUPPORT=0: 164K
  • ZSTD_LEGACY_SUPPORT=0, zstd-noBench target: 136K
  • zstd-frugal: 108K

@Cyan4973
Contributor

Cyan4973 commented Dec 3, 2015

Looks good to me, in line with x64 experience.

Is the target size good enough for you ?

@annulen
Author

annulen commented Dec 3, 2015

Yep!

More numbers if you are interested:

  • -O3 -DZSTD_LEGACY_SUPPORT=0 -ffunction-sections -fdata-sections -Wl,--gc-sections: 116K
  • same + -DZSTDC_NO_DECOMPRESSOR (my patch): 92K
  • option2 + -flto -fwhole-program: 88K
  • option2 with -O2 instead of -O3: 84K
  • option4 with decompressor enabled: 108K
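The sizes reported in this thread also give a direct measure of what each change saves. A small helper (hypothetical, for illustration) computing the relative reduction: the last two entries above (108K with the decompressor, 84K without) put the decompressor at about 22 % of that build, below the "no more than 30 %" estimate made from object-file sizes; zstd-frugal versus default on MIPS (220K to 108K) is about a 51 % reduction, and on x64 (270KiB to 107KiB) about 60 %:

```c
#include <assert.h>

/* Relative size reduction going from `before` to `after`
 * (both in the same unit, e.g. KiB of a stripped binary). */
static double size_reduction(double before, double after)
{
    assert(before > 0.0 && after >= 0.0);
    return (before - after) / before;
}
```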

@Cyan4973
Contributor

Cyan4973 commented Dec 3, 2015

Sounds good, your patch seems able to reduce size even further, even though the public symbols are still present in the compiled object files. Hey, better grab the gain ....

@Cyan4973
Contributor

The latest development release allows compiling compression and decompression separately.
You can have a look at the "dev" branch, and try make zstd-compress and make zstd-decompress .

@Cyan4973
Contributor

v0.6.1 makes it possible to compile only the compressor or only the decompressor

@bittorf

bittorf commented Oct 5, 2016

how is it supposed to work?

# cd zstd
# cd programs
# make zstd-decompress

cc      -I../lib -I../lib/common -I../lib/dictBuilder -I../lib/legacy -O3 -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow -Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement -Wstrict-prototypes -Wundef   -DZSTD_NOBENCH -DZSTD_NODICT -DZSTD_NOCOMPRESS -DZSTD_LEGACY_SUPPORT=0 ../lib/common/entropy_common.c ../lib/common/fse_decompress.c ../lib/common/xxhash.c ../lib/common/zstd_common.c ../lib/decompress/huf_decompress.c zstdcli.c fileio.c -o zstd-decompress
/tmp/ccgl3PmR.o: In function `FIO_createDResources':
fileio.c:(.text+0x961): undefined reference to `ZSTD_createDStream'
fileio.c:(.text+0x976): undefined reference to `ZSTD_DStreamInSize'
fileio.c:(.text+0x98d): undefined reference to `ZSTD_DStreamOutSize'
fileio.c:(.text+0xa80): undefined reference to `ZSTD_initDStream_usingDict'
/tmp/ccgl3PmR.o: In function `FIO_decompressFrame':
fileio.c:(.text+0x1105): undefined reference to `ZSTD_resetDStream'
fileio.c:(.text+0x11e6): undefined reference to `ZSTD_decompressStream'
/tmp/ccgl3PmR.o: In function `FIO_decompressFilename':
fileio.c:(.text+0x1c08): undefined reference to `ZSTD_freeDStream'
/tmp/ccgl3PmR.o: In function `FIO_decompressMultipleFilenames':
fileio.c:(.text+0x1e23): undefined reference to `ZSTD_freeDStream'
/tmp/cc7gc0T1.o: In function `main':
zstdcli.c:(.text.startup+0x938): undefined reference to `ZSTD_maxCLevel'
collect2: error: ld returned 1 exit status
make: *** [zstd-decompress] Error 1

this is using checkout 83543a7

@Cyan4973 Cyan4973 reopened this Oct 5, 2016
@Cyan4973
Contributor

Cyan4973 commented Oct 5, 2016

To be fixed

@inikep
Contributor

inikep commented Oct 6, 2016

@bittorf It's already fixed at "dev" branch:
https://github.com/facebook/zstd/commits/dev

@Cyan4973
Contributor

Fixed by @inikep in dev branch

Ornias1993 referenced this issue in c0d3z3r0/zfs Dec 9, 2019

4 participants