Windows Command-Line Tools #1

Open
Sanmayce opened this issue Feb 23, 2019 · 16 comments

@Sanmayce

Hi, could you provide binaries for Windows? I want to include MiGz in my benchmark roster using 512KB and bigger blocks. By the way, how big can they be?

Your explanation of the acronym is not clear to me, please explain:
"... also supports multithreaded decompression, which is especially important for large files that are read repeatedly. Hence, MiGz."

What does the little 'i' stand for?

Allow me two more questions:

  • Why not in C?
  • Are you aware that MultIple decompressions are not bottlenecked by CPU in LzTurbo 29? I mean, your test results using 4 cores are even inferior to the single-threaded decompression of LzTurbo 29/39!
      C Size  ratio%     C MB/s     D MB/s   Name                             File
     1447249    12.6       0.50     471.90   brotli 11d29                     ftp.gnu.org_grep-3.3.tar
     1455165    12.6       2.10     137.62   lzma 9d29:fb273:mf=bt4           ftp.gnu.org_grep-3.3.tar
     1489213    12.9       0.24    1063.22   oodle 139 ‘Leviathan’            ftp.gnu.org_grep-3.3.tar
     1496718    13.0       0.18    1322.62   oodle 129 ‘Hydra’                ftp.gnu.org_grep-3.3.tar
     1496718    13.0       0.45    1322.01   oodle 89 ‘Kraken’                ftp.gnu.org_grep-3.3.tar
     1513749    13.1       2.24    1070.13   zstd 22d29                       ftp.gnu.org_grep-3.3.tar
     1517944    13.2       0.16     346.06   lzham 4fb258:x4:d29              ftp.gnu.org_grep-3.3.tar
     1521395    13.2       1.49    1552.77   lzturbo 39                       ftp.gnu.org_grep-3.3.tar
     1756302    15.2      39.10    1542.17   lzturbo 32                       ftp.gnu.org_grep-3.3.tar
     1774686    15.4      21.85    1145.93   zstd 12                          ftp.gnu.org_grep-3.3.tar
     1782164    15.5       1.52    1953.87   lzturbo 29                       ftp.gnu.org_grep-3.3.tar
     1875468    16.3       1.47    1581.11   lizard 49                        ftp.gnu.org_grep-3.3.tar
     2114046    18.4      54.64     886.77   oodle 132 ‘Leviathan’            ftp.gnu.org_grep-3.3.tar
     2163309                       1548      Nakamichi 'Ryuugan-ditto-1TB'    ! Outside TurboBench, Intel-v15.0-64bit-archSSE41 compile !
     2172516    18.9       2.42    3501.52   oodle 118 ‘Selkie’               ftp.gnu.org_grep-3.3.tar
     2172516    18.9       2.42    3495.15   oodle 116 ‘Selkie’               ftp.gnu.org_grep-3.3.tar
     2359093    20.5     306.55    1233.67   lzturbo 30                       ftp.gnu.org_grep-3.3.tar
     2404889    20.9      15.35     333.83   zlib 9                           ftp.gnu.org_grep-3.3.tar
     2406525    20.9      46.42    3756.11   oodle 114 ‘Selkie’               ftp.gnu.org_grep-3.3.tar

As you can see from the above (random) test, LzTurbo 30 decompresses 1233.67/333.83=3.69x and compresses 306.55/15.35=19.97x faster than zlib 9!
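The two ratios quoted above can be recomputed from the table rows. The following is a minimal Python sketch; the dictionary layout and the `speedup` helper are illustrative names, and the figures are copied from the `lzturbo 30` and `zlib 9` rows.

```python
# Throughput figures (MB/s) copied from the lzturbo 30 and zlib 9 rows above.
rows = {
    "lzturbo 30": {"compress": 306.55, "decompress": 1233.67},
    "zlib 9": {"compress": 15.35, "decompress": 333.83},
}


def speedup(fast, slow, phase):
    """Return how many times faster `fast` is than `slow` for one phase."""
    return rows[fast][phase] / rows[slow][phase]


decompress_gain = speedup("lzturbo 30", "zlib 9", "decompress")
compress_gain = speedup("lzturbo 30", "zlib 9", "compress")
print(f"decompression: {decompress_gain:.3f}x, compression: {compress_gain:.3f}x")
```

Both figures come from a single file on a single machine, so they illustrate the arithmetic rather than a general ranking.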

      C Size  ratio%     C MB/s     D MB/s   Name                             File
     6998037    17.4       0.90     849.88   lzturbo 39                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7022279    17.4       0.37     321.86   brotli 11d29                     Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7049563    17.5       1.38     684.89   zstd 22d29                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7071491    17.5       0.29     644.84   oodle 139 ‘Leviathan’            Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7103502    17.6       0.40     724.20   oodle 89 ‘Kraken’                Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7105986    17.6       0.23     723.98   oodle 129 ‘Hydra’                Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7125187    17.7      10.61      25.82   bzip2                            Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     7960854    19.8       0.91    1302.16   lzturbo 29                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     8061825    20.0       1.38     924.79   lizard 49                        Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     9041702                       1077      Nakamichi 'Ryuugan-ditto-1TB'    ! Outside TurboBench, Intel-v15.0-64bit-archSSE41 compile !
     9314676    23.1      37.27    1085.78   lzturbo 32                       Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     9759547    24.2       0.57    1812.03   oodle 116 ‘Selkie’               Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
     9759547    24.2       0.57    1810.97   oodle 118 ‘Selkie’               Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar
    10771358    26.7       4.30     320.47   zlib 9                           Complete_works_of_Fyodor_Dostoyevsky_in_15_volumes_(Russian).tar

For the second random example, LzTurbo 29 is 4x faster than zlib 9 in decompression, with no need for 4 cores.

@Sanmayce
Author

For more benchmarks, 11 in total so far:

TEXTORAMIC_Decompression_Showdown_2019-Feb-21.pdf:
https://drive.google.com/file/d/162cikKQ3QDiXhUaz_uBJG9unXhoJWwTQ/view?usp=sharing

Nakamichi_Ryuugan-ditto-1TB_btree_source.pdf:
https://drive.google.com/file/d/1dFtfvpcE-TUo_D_Ol9C8FSDXlbzyIll9/view?usp=sharing

@jeffpasternack
Contributor

Hi Sanmayce--sorry for the delay in responding; I need to check my Github notification settings, it seems!

  • We don't include self-contained executables for Windows, but as MiGz is pure Java it can run anywhere; you just need to build the JAR with gradle and then run it with the java command-line tool as you would any other Java program. In fact, the executables we publish for Linux and Mac are really just Bash scripts that invoke the java command on the JAR (stored in the same file as the script).
  • MiGz comes from Multithreaded Gzip.
  • The reason that MiGz is written in Java (and not, e.g. C) and [generally] uses zlib (although the exact library depends on the particular JVM implementation, it's usually zlib) is because many companies, including LinkedIn, have a Java-based tech stack and effortless portability relieves us from worrying about native dependencies. We also care about compatibility with the widespread gzip format. There are certainly other compressors/decompressors available, including some that support gzip and could be called in-process via JNI, but this would compromise portability.
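The thread does not describe how MiGz splits the work internally, so the following is only a generic sketch of one way to get multithreaded gzip: compress fixed-size blocks independently and concatenate the resulting gzip members, which a standard gzip decoder still reads as one stream. It uses Python's standard library rather than MiGz's Java code, and `BLOCK_SIZE`, `compress_parallel` and the sample payload are hypothetical names chosen for illustration.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# Hypothetical block size, echoing the 512KB figure from the opening question.
BLOCK_SIZE = 512 * 1024


def compress_parallel(data: bytes, workers: int = 4) -> bytes:
    """Compress fixed-size blocks independently and concatenate the gzip members."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so members stay in block order.
        members = list(pool.map(gzip.compress, blocks))
    return b"".join(members)


payload = b"multithreaded gzip " * 200_000  # about 3.8 MB of sample text
packed = compress_parallel(payload)
# Concatenated members form a valid gzip stream, so the stock decoder restores it.
restored = gzip.decompress(packed)
print(len(payload), len(packed), restored == payload)
```

This sketch only parallelizes compression. Decompressing in parallel additionally requires knowing where each member starts, which this example does not attempt.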

Thanks for your benchmark results, by the way--very interesting!

@Sanmayce
Author

Thank you, Jeff, for the detailed explanation. You see, being a C amateur "clouds" my extremely limited knowledge of other languages and what their purposes are.
My main idea when asking for a binary was to include the executable in my future rosters (all under Windows, alas). Now I realize that MiGz targets different environments. I didn't even know that Java doesn't produce stand-alone "executables", so excuse me for the profane question.

@jeffpasternack
Contributor

Hi Sanmayce, FWIW I'm sure there are tools that can build self-contained executables for Java programs (and probably for compilation to native code rather than bytecode, too), so while we don't provide such executables ourselves you should be able to build them if you're sufficiently motivated :) (IMO it's easy enough to just invoke it using the java command-line tool as normal, though.)

@Sanmayce
Author

Hi Jeff, thanks again.
Please consider something that would gladden the eyes of many decompression benchmarkers: running MiGz on a many-core CPU with enwik9:
http://mattmahoney.net/dc/text.html

As far as I can see, COMPUTEX 2019 (May 27) will set a new trend in CPUs: 8 cores that become affordable even on a tight budget in the long run. I myself intend to get a Matisse with 16 threads. Simply put, MiGz's forte ought to be shown on a modern machine, and enwik9 is a superb test file, de facto THE BENCHMARK.

@jeffpasternack
Contributor

Thanks for the suggestion, Sanmayce. I didn't know about this standardized dataset, but will definitely keep it in mind if/when I have time to run more benchmarks in the future. In practice we're already using MiGz on very high core count servers (e.g. 32+ logical cores) but the cores tend to be individually slower than those you'd find in higher-end desktops.

@Sanmayce
Author

Your German XML dump is superbly on point, but without references (other performers) the results are not very telling. In my view, MiGz will set a Pareto frontier (decompression rate vs. compressed size) with enwik9; it would be interesting to see how threaded decompression fares against well-optimized single-threaded competition.
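To make the Pareto frontier idea concrete, here is a minimal Python sketch that picks out the entries not beaten on both compressed size and decompression rate. The data is copied from the grep-3.3.tar rows earlier in this thread, with names shortened; the Nakamichi row is left out because it was measured outside TurboBench. The `pareto_frontier` helper is an illustrative name, not part of any benchmark tool.

```python
# (name, compressed size in bytes, decompression MB/s) from the grep-3.3.tar rows.
results = [
    ("brotli 11d29", 1447249, 471.90),
    ("lzma 9d29", 1455165, 137.62),
    ("oodle 139", 1489213, 1063.22),
    ("oodle 129", 1496718, 1322.62),
    ("oodle 89", 1496718, 1322.01),
    ("zstd 22d29", 1513749, 1070.13),
    ("lzham 4", 1517944, 346.06),
    ("lzturbo 39", 1521395, 1552.77),
    ("lzturbo 32", 1756302, 1542.17),
    ("zstd 12", 1774686, 1145.93),
    ("lzturbo 29", 1782164, 1953.87),
    ("lizard 49", 1875468, 1581.11),
    ("oodle 132", 2114046, 886.77),
    ("oodle 118", 2172516, 3501.52),
    ("oodle 116", 2172516, 3495.15),
    ("lzturbo 30", 2359093, 1233.67),
    ("zlib 9", 2404889, 333.83),
    ("oodle 114", 2406525, 3756.11),
]


def pareto_frontier(entries):
    """Return entries that are faster than every entry of equal or smaller size."""
    frontier = []
    best_speed = float("-inf")
    # Sort by size ascending; for equal sizes the faster entry comes first.
    for name, size, speed in sorted(entries, key=lambda e: (e[1], -e[2])):
        if speed > best_speed:
            frontier.append((name, size, speed))
            best_speed = speed
    return frontier


frontier_names = [name for name, _, _ in pareto_frontier(results)]
print(frontier_names)
```

A multithreaded decompressor would join this frontier only if, at its compressed size, it decompresses faster than every smaller-output entry.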

In a month or so, my toy Nakamichi will finish enwik9; I expect it to set a Pareto frontier:
http://www.sanmayce.com/Nakamichi/index.html#2019Apr08
Once that is done, I will ask Dr. Mahoney to add it to the 'Large Text Compression Benchmark'.

@cielavenir

cielavenir commented Sep 20, 2019

I have written a suite of DEFLATE compressors whose backends are zlib / 7-zip / zopfli / miniz / libslz / libdeflate / zlib-ng / igzip. I have also added a MiGz format frontend with parallel compression / decompression. Help yourself if interested.
However, parallel compression currently only pays off when the compression level is high.

https://github.com/cielavenir/7bgzf/tree/dev
https://www.dropbox.com/s/cv0wbbhgbzkfavl/7ciso190925.7z?dl=0 # Win32 / Win64 binary

@jeffpasternack
Contributor

Thanks, @cielavenir! Looks very nice. I'll be particularly interested to compare the performance/overhead of C multithreading vs. Java's (assuming the underlying zlib is the same).

At low compression levels, it might be that your machine is unable to stream data from disk quickly enough to benefit from multithreading. I haven't observed this myself, but my test machine has a rather fast SSD.

@Sanmayce
Author

@cielavenir
It smells like a yummy benchmark is cooking. If you want to start a thread where enwik9 is benchmarked with your Win64 binary along with the latest Zstd modes 1 to 22, I am in...

F:\ENWIK9_benchmark_Zstd>timer64 zstd-v1.4.3-win64.exe -b1 -e22 --threads=64 -i33 enwik9
 1#enwik9            :1000000000 -> 357434859 (2.798),1011.9 MB/s , 680.9 MB/s
 2#enwik9            :1000000000 -> 329130073 (3.038), 680.0 MB/s , 552.9 MB/s
 3#enwik9            :1000000000 -> 313570458 (3.189), 280.2 MB/s , 504.7 MB/s
 4#enwik9            :1000000000 -> 307725039 (3.250), 204.0 MB/s , 495.8 MB/s
 5#enwik9            :1000000000 -> 301808803 (3.313), 161.9 MB/s , 470.9 MB/s
 6#enwik9            :1000000000 -> 295292703 (3.386), 119.4 MB/s , 485.4 MB/s
 7#enwik9            :1000000000 -> 285005952 (3.509),  86.7 MB/s , 520.3 MB/s
 8#enwik9            :1000000000 -> 280885195 (3.560),  70.2 MB/s , 541.4 MB/s
 9#enwik9            :1000000000 -> 278440978 (3.591),  52.6 MB/s , 549.3 MB/s
10#enwik9            :1000000000 -> 273739917 (3.653),  42.8 MB/s , 545.9 MB/s
11#enwik9            :1000000000 -> 271346644 (3.685),  36.5 MB/s , 550.1 MB/s
12#enwik9            :1000000000 -> 269278253 (3.714),  23.2 MB/s , 557.0 MB/s
13#enwik9            :1000000000 -> 265978647 (3.760),  24.4 MB/s , 567.5 MB/s
14#enwik9            :1000000000 -> 261516483 (3.824),  19.7 MB/s , 573.8 MB/s
15#enwik9            :1000000000 -> 258702580 (3.865),  15.7 MB/s , 574.6 MB/s
16#enwik9            :1000000000 -> 250158490 (3.997),  14.0 MB/s , 573.9 MB/s
17#enwik9            :1000000000 -> 242890314 (4.117),  9.82 MB/s , 540.3 MB/s
18#enwik9            :1000000000 -> 239733542 (4.171),  8.13 MB/s , 499.4 MB/s
19#enwik9            :1000000000 -> 235599635 (4.244),  6.22 MB/s , 448.9 MB/s
20#enwik9            :1000000000 -> 226011360 (4.425),  5.18 MB/s , 548.0 MB/s
21#enwik9            :1000000000 -> 220256419 (4.540),  3.28 MB/s , 547.9 MB/s
22#enwik9            :1000000000 -> 215061264 (4.650),  1.74 MB/s , 544.7 MB/s

The above results are for an i7-3630QM; the initial package is downloadable at:
https://drive.google.com/file/d/1N8MmC34alEZGeMB6gZw-Vg2BqTZRxkbT/view?usp=sharing

@Sanmayce
Author

If you have written it as a benchmark suite (similar to lzbench and turbobench), that would be great.
Having all the fast DEFLATE implementations in C, multi-threaded, under one roof is exciting!
To test their speed, what better way than pitting them against the awesome Zstd?

@cielavenir

cielavenir commented Sep 22, 2019

Perhaps I should print the processing time, but for now please use:

for meth in cz1 cz2 cz3 cz4 cz5 cz6 cz7 cz8 cz9 cS1 cS2 cS3 cS4 cS5 cS6 cS7 cS8 cS9 cZ1 cZ2 cs1 cl1 cl2 cl3 cl4 cl5 cl6 cl7 cl8 cl9 cl10 cl11 cl12 cn1 cn2 cn3 cn4 cn5 cn6 cn7 cn8 cn9 ci1 ci2 ci3 ci4; do
echo $meth
time 7migz -${meth} -@64 < enwik9 > enwik9.enc;ls -l enwik9.enc
done

@Sanmayce
Author

I'm a Windows user. A suggestion: putting the output of the above script on your homepage (as other authors do) would be informative; right now no one knows how fast your code is.
Does this script only compress, or does it decompress afterwards as well?

@cielavenir

cielavenir commented Sep 25, 2019

  • Now the elapsed time is printed (but if you need user time to measure multithreaded load, you still need the time command).
  • The 7migz tool itself can compress and decompress (in parallel if you want), but the above bash script shows only compression. By the way, decompression is done only by the igzip backend, as there is no point in choosing another backend for decompression.
  • Originally the 7bgzf frontend has a benchmark, but the conference paper is still pending... [edit: it would be open-access, so I'll be able to link immediately]

@Sanmayce
Author

@cielavenir
Did a quick run with the updated enwik9_Zstd package, downloadable at:
https://drive.google.com/file/d/1B83Ktm0GI7ACRvcjvMiQXdTLznD9dxCG/view?usp=sharing

For Windows 10 64bit, laptop with I7-3630QM 4cores/8threads and 16GB DDR3, SSD Samsung 860 PRO 256GB:

Compressor                                               |    Elapsed Time |                  Output
---------------------------------------------------------------------------------------------------- 
1 (zlib) 7migz -cz1 -@64  0<enwik9 1>enwik9.cz1          |    4.140813 sec | 379,731,414 enwik9.cz1
2 (zlib) 7migz -cz2 -@64  0<enwik9 1>enwik9.cz2          |    4.484577 sec | 365,654,349 enwik9.cz2
3 (zlib) 7migz -cz3 -@64  0<enwik9 1>enwik9.cz3          |    5.187740 sec | 355,443,967 enwik9.cz3
4 (zlib) 7migz -cz4 -@64  0<enwik9 1>enwik9.cz4          |    5.719017 sec | 339,905,974 enwik9.cz4
5 (zlib) 7migz -cz5 -@64  0<enwik9 1>enwik9.cz5          |    7.547227 sec | 329,417,750 enwik9.cz5
6 (zlib) 7migz -cz6 -@64  0<enwik9 1>enwik9.cz6          |    9.437935 sec | 326,115,303 enwik9.cz6
7 (zlib) 7migz -cz7 -@64  0<enwik9 1>enwik9.cz7          |   10.297354 sec | 325,489,808 enwik9.cz7
8 (zlib) 7migz -cz8 -@64  0<enwik9 1>enwik9.cz8          |   11.125513 sec | 325,006,523 enwik9.cz8
9 (zlib) 7migz -cz9 -@64  0<enwik9 1>enwik9.cz9          |   11.125514 sec | 325,000,606 enwik9.cz9
---------------------------------------------------------------------------------------------------- 
1 (7zip) 7migz -cS1 -@64  0<enwik9 1>enwik9.cS1          |    7.562848 sec | 339,119,944 enwik9.cS1
2 (7zip) 7migz -cS2 -@64  0<enwik9 1>enwik9.cS2          |    7.515976 sec | 339,119,944 enwik9.cS2
3 (7zip) 7migz -cS3 -@64  0<enwik9 1>enwik9.cS3          |    7.547228 sec | 339,119,944 enwik9.cS3
4 (7zip) 7migz -cS4 -@64  0<enwik9 1>enwik9.cS4          |    7.531599 sec | 339,119,944 enwik9.cS4
5 (7zip) 7migz -cS5 -@64  0<enwik9 1>enwik9.cS5          |   26.157469 sec | 315,436,344 enwik9.cS5
6 (7zip) 7migz -cS6 -@64  0<enwik9 1>enwik9.cS6          |   26.219970 sec | 315,436,344 enwik9.cS6
7 (7zip) 7migz -cS7 -@64  0<enwik9 1>enwik9.cS7          |   72.769007 sec | 313,290,259 enwik9.cS7
8 (7zip) 7migz -cS8 -@64  0<enwik9 1>enwik9.cS8          |   72.784638 sec | 313,290,259 enwik9.cS8
9 (7zip) 7migz -cS9 -@64  0<enwik9 1>enwik9.cS9          |  175.492552 sec | 313,031,626 enwik9.cS9
---------------------------------------------------------------------------------------------------- 
1 (libdeflate) 7migz -cl1 -@64  0<enwik9 1>enwik9.cl1    |    3.359530 sec | 357,276,709 enwik9.cl1
2 (libdeflate) 7migz -cl2 -@64  0<enwik9 1>enwik9.cl2    |    3.547040 sec | 345,865,789 enwik9.cl2
3 (libdeflate) 7migz -cl3 -@64  0<enwik9 1>enwik9.cl3    |    3.781422 sec | 340,919,046 enwik9.cl3
4 (libdeflate) 7migz -cl4 -@64  0<enwik9 1>enwik9.cl4    |    4.015808 sec | 337,870,102 enwik9.cl4
5 (libdeflate) 7migz -cl5 -@64  0<enwik9 1>enwik9.cl5    |    4.453332 sec | 328,713,486 enwik9.cl5
6 (libdeflate) 7migz -cl6 -@64  0<enwik9 1>enwik9.cl6    |    4.843972 sec | 326,477,090 enwik9.cl6
7 (libdeflate) 7migz -cl7 -@64  0<enwik9 1>enwik9.cl7    |    5.250245 sec | 325,606,455 enwik9.cl7
8 (libdeflate) 7migz -cl8 -@64  0<enwik9 1>enwik9.cl8    |   15.031952 sec | 318,781,742 enwik9.cl8
9 (libdeflate) 7migz -cl9 -@64  0<enwik9 1>enwik9.cl9    |   19.297770 sec | 315,242,759 enwik9.cl9
10 (libdeflate) 7migz -cl10 -@64  0<enwik9 1>enwik9.cl10 |   21.360367 sec | 313,801,807 enwik9.cl10
11 (libdeflate) 7migz -cl11 -@64  0<enwik9 1>enwik9.cl11 |   28.079433 sec | 313,156,477 enwik9.cl11
12 (libdeflate) 7migz -cl12 -@64  0<enwik9 1>enwik9.cl12 |   35.236018 sec | 313,033,510 enwik9.cl12
---------------------------------------------------------------------------------------------------- 
1 (zlibng) 7migz -cn1 -@64  0<enwik9 1>enwik9.cn1        |    2.578248 sec | 506,934,955 enwik9.cn1
2 (zlibng) 7migz -cn2 -@64  0<enwik9 1>enwik9.cn2        |    5.172112 sec | 362,046,146 enwik9.cn2
3 (zlibng) 7migz -cn3 -@64  0<enwik9 1>enwik9.cn3        |    5.765888 sec | 347,989,009 enwik9.cn3
4 (zlibng) 7migz -cn4 -@64  0<enwik9 1>enwik9.cn4        |    6.500298 sec | 331,634,506 enwik9.cn4
5 (zlibng) 7migz -cn5 -@64  0<enwik9 1>enwik9.cn5        |    8.062875 sec | 331,400,800 enwik9.cn5
6 (zlibng) 7migz -cn6 -@64  0<enwik9 1>enwik9.cn6        |    9.531695 sec | 329,505,194 enwik9.cn6
7 (zlibng) 7migz -cn7 -@64  0<enwik9 1>enwik9.cn7        |   13.625633 sec | 325,457,108 enwik9.cn7
8 (zlibng) 7migz -cn8 -@64  0<enwik9 1>enwik9.cn8        |   14.578805 sec | 325,006,403 enwik9.cn8
9 (zlibng) 7migz -cn9 -@64  0<enwik9 1>enwik9.cn9        |   14.766304 sec | 325,000,559 enwik9.cn9
---------------------------------------------------------------------------------------------------- 
1 (igzip) 7migz -ci1 -@64  0<enwik9 1>enwik9.ci1         |    2.000093 sec | 391,100,674 enwik9.ci1
2 (igzip) 7migz -ci2 -@64  0<enwik9 1>enwik9.ci2         |    2.203224 sec | 374,181,750 enwik9.ci2
3 (igzip) 7migz -ci3 -@64  0<enwik9 1>enwik9.ci3         |    2.218850 sec | 365,630,779 enwik9.ci3
4 (igzip) 7migz -ci4 -@64  0<enwik9 1>enwik9.ci4         |    5.125235 sec | 359,555,315 enwik9.ci4
---------------------------------------------------------------------------------------------------- 
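To set these elapsed times next to the MB/s figures in the Zstd output earlier in this thread, they can be converted into a compression ratio and a throughput. This is a minimal Python sketch that assumes 1 MB = 10**6 bytes and uses the 1,000,000,000-byte enwik9 size shown in the Zstd output. The `summarize` helper is an illustrative name, and only three rows of the table are copied in.

```python
ENWIK9_BYTES = 1_000_000_000  # input size shown in the Zstd benchmark output

# (label, elapsed seconds, compressed bytes) copied from the table above.
runs = [
    ("zlib -cz1", 4.140813, 379_731_414),
    ("libdeflate -cl1", 3.359530, 357_276_709),
    ("igzip -ci1", 2.000093, 391_100_674),
]


def summarize(elapsed_sec, out_bytes, in_bytes=ENWIK9_BYTES):
    """Return (compression ratio, throughput in MB/s), assuming 1 MB = 10**6 bytes."""
    return in_bytes / out_bytes, in_bytes / 1e6 / elapsed_sec


for label, elapsed, size in runs:
    ratio, mbps = summarize(elapsed, size)
    print(f"{label:16s} ratio {ratio:.3f}  {mbps:7.1f} MB/s")
```

The two sets of numbers are only roughly comparable: the 7migz timings include reading and writing files through redirection, whereas the Zstd figures come from its built-in benchmark mode.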

Note1: Under Windows, filenames are case-insensitive, so the output files for your 's' and 'S' methods, and for 'z' and 'Z', overwrite each other.
Note2: Could you add printing the block sizes for each run, e.g. 7zip mode 9 seems like 7zip mode 5.
Note3: The decompression is awesome, around 0.9 seconds, wow!
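The filename clash from Note1 can be checked mechanically. The sketch below, in Python, groups the method switches from the bash loop earlier in this thread by their lower-cased form, which is how a case-insensitive filesystem compares names. The `safe_suffix` naming scheme is a hypothetical workaround, not something 7migz provides.

```python
from collections import defaultdict

# Method switches from the bash loop earlier in this thread.
methods = (
    [f"cz{i}" for i in range(1, 10)] + [f"cS{i}" for i in range(1, 10)]
    + ["cZ1", "cZ2", "cs1"] + [f"cl{i}" for i in range(1, 13)]
    + [f"cn{i}" for i in range(1, 10)] + [f"ci{i}" for i in range(1, 5)]
)


def colliding_names(names):
    """Group names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


def safe_suffix(method):
    """Hypothetical scheme: tag upper-case backend letters so suffixes stay unique."""
    return f"{method}_upper" if method[1].isupper() else method


clashes = colliding_names(methods)
fixed = colliding_names([safe_suffix(m) for m in methods])
print(clashes)
print(fixed)
```

With output names such as `enwik9.cS1_upper`, each method writes to its own file even on Windows.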

Could you explain why there are so many executables? I wish I had only one to include in future benchmarks...

@cielavenir

cielavenir commented Sep 25, 2019

"Note1: Under Windows filenames are case insensitive"

Yes, but options are case sensitive. ...Perhaps you need to change the output filename, though.

"7zip mode 9 seems like 7zip mode 5"

What do you mean? I can see that -cS1 through -cS4 give identical output, as do -cS5/-cS6 and -cS7/-cS8; I read about this in the 7-zip documentation, cf. https://sevenzip.osdn.jp/chm/cmdline/switches/method.htm#ZipX
However, I see that -cS5 is different from -cS9.

"Could you explain why so many executables"

As I said before, this is a suite of DEFLATE compressors with many frontends and backends. 7migz is the MiGz frontend, the one related to this topic.
(I have also added a GZinga frontend. It is unrelated to this topic, but you can have a look if you are curious.)
