
LARGE_WINDOW 32kb windows #9

Closed
vlm opened this issue Oct 22, 2016 · 17 comments

Comments

@vlm

vlm commented Oct 22, 2016

Hello!

I am configuring the project with autogen.sh && ./configure CFLAGS=-DLARGE_WINDOW to obtain 32kB deflate windows instead of default 8k. I am setting .flush = SYNC_FLUSH in the compression context after calling isal_deflate_init().
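Roughly, the per-block path looks like this (a trimmed sketch of what I described above; compress_block, msg and msg_len are placeholder names, and error handling and output draining are omitted):

#include <stdint.h>
#include "igzip_lib.h"

/* Sketch: compress one ~5 kB block with SYNC_FLUSH, keeping stream state
 * across blocks. The stream was set up once with isal_deflate_init(). */
static void compress_block(struct isal_zstream *stream,
                           uint8_t *msg, uint32_t msg_len,
                           uint8_t *out, uint32_t out_len)
{
        stream->next_in = msg;
        stream->avail_in = msg_len;
        stream->next_out = out;
        stream->avail_out = out_len;
        stream->end_of_stream = 0;    /* more blocks will follow */
        stream->flush = SYNC_FLUSH;   /* emit a sync marker after this block */
        isal_deflate(stream);         /* return code and leftover avail_in checks omitted */
}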

The compression ratio is virtually unchanged between 8k and 32k windows, and the speed doesn't change at all. This prompts me to think that LARGE_WINDOW is either not implemented properly or that the flag is not visible all the way through to the assembly code. However, I modified options.asm manually to switch the LARGE_WINDOW option on and recompiled, to no avail. Here's the table for the large text data file I am using:

                    Plain   ZLib (32kB)   ISA-L (8kB)   ISA-L (32kB)
Compression rate    1x      5.68x         2.37x         2.49x

It is important to note that I am using the .flush = SYNC_FLUSH setting on blocks of about 5kB each at a time. Had it worked like Z_FULL_FLUSH (that is, discarding the history window), that could explain the observed behavior.

Would you please clarify whether this absence of a change in compression ratio is something I should expect after enabling LARGE_WINDOW?

@gbtucker
Contributor

Sync_flush does do what you expect and allows matches across submit boundaries, as does no_flush in stateful compression. Changing the default to large_window may not have a big impact depending on the data set, unless you also change the default hufftables at compile time or use isal_create_hufftables() at runtime.
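At runtime that looks roughly like this (a sketch; the isal_update_histogram()/isal_create_hufftables() calls and struct names are from igzip_lib.h, while use_custom_tables and sample are placeholder names):

#include <stdint.h>
#include <string.h>
#include "igzip_lib.h"

/* Sketch: derive custom Huffman tables from a sample of the data and
 * attach them to the stream for subsequent isal_deflate() calls. */
static void use_custom_tables(struct isal_zstream *stream,
                              struct isal_hufftables *tables,
                              uint8_t *sample, int sample_len)
{
        struct isal_huff_histogram histogram;

        memset(&histogram, 0, sizeof(histogram));
        isal_update_histogram(sample, sample_len, &histogram); /* tally symbol frequencies */
        isal_create_hufftables(tables, &histogram);            /* build tables from the histogram */
        stream->hufftables = tables;
}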

We are working on a number of improvements in compression that we will push soon, which improve both the compression ratio and speed. This includes making the 32K window the default.

@vlm
Author

vlm commented Oct 24, 2016

@gbtucker thanks.

Changing the default Huffman tables to ones tailored to the data did increase the compression ratio for 32kB windows from 2.49x to 3.97x.

But then, the same tables on 8kB windows gave 3.66x.

This leads me to believe that larger windows are not being used properly in this code.

@gbtucker
Contributor

It depends a lot on your data of course. What data set are you using?

@vlm
Author

vlm commented Oct 26, 2016

I created a demo script. It concatenates two random files of size ${BLOCK} several times and then tries to compress them using gzip -1 or ISA-L's igzip_example in 32-K window mode.

[vlm@p1 igzip]$ ./test-compression.sh 8000
igzip_example
Window Size: 32 K
End of igzip_example

igzip_example
Window Size: 32 K
End of igzip_example

-rw-rw-r--. 1 vlm vlm  28936 Oct 26 02:10 outfile-gzip.bin
-rw-rw-r--. 1 vlm vlm  61336 Oct 26 02:10 outfile-isal-NO_FLUSH.bin
-rw-rw-r--. 1 vlm vlm 133756 Oct 26 02:10 outfile-isal-SYNC_FLUSH.bin
[vlm@p1 igzip]$ ./test-compression.sh 16000
igzip_example
Window Size: 32 K
End of igzip_example

igzip_example
Window Size: 32 K
End of igzip_example

-rw-rw-r--. 1 vlm vlm  56637 Oct 26 02:10 outfile-gzip.bin
-rw-rw-r--. 1 vlm vlm 119510 Oct 26 02:10 outfile-isal-NO_FLUSH.bin
-rw-rw-r--. 1 vlm vlm 371824 Oct 26 02:10 outfile-isal-SYNC_FLUSH.bin
[vlm@p1 igzip]$ 

As you can see, ISA-L in 32K-window mode performs much worse than gzip -1 in every case, and even more so in SYNC_FLUSH mode.

test-compression.sh file

#!/usr/bin/env bash

set -e

if [ -z "$1" ]; then
    echo "Usage: $0 <random_block_size>"
    exit
fi
BLOCK="$1"

# Two random blocks
dd if=/dev/urandom count=1 bs=${BLOCK} of=rnd1.bin 2>/dev/null
dd if=/dev/urandom count=1 bs=${BLOCK} of=rnd2.bin 2>/dev/null

# Create a large file out of repetition of two random blocks
rm -f infile.bin
for n in {1..100}; do
    cat rnd1.bin rnd2.bin >> infile.bin
done

# gzip -1 baseline.
cat infile.bin | gzip -1 > outfile-gzip.bin

for mode in NO_FLUSH SYNC_FLUSH; do

    # Rewrite whichever flush token is currently in igzip_example.c to ${mode}
    sed -i "s/NO_FLUSH/${mode}/" igzip_example.c
    sed -i "s/SYNC_FLUSH/${mode}/" igzip_example.c

    cc -I../include -DLARGE_WINDOW -o igzip_example igzip_example.c  -L ../.libs -lisal

    ./igzip_example infile.bin outfile-isal-${mode}.bin
done

ls -al outfile-*.bin

@gbtucker
Contributor

I looked into this case and it's not that large_window or sync_flush isn't working. The data set is highly compressible, with lots of large matches. The fact that the 16000 case compresses at all shows that large_window must be finding matches at distances > 8k even with sync_flush on. After an initial run of literals, the output is mostly matches of the maximum length, 258. At the end of an input segment, when sync_flush is on, the input must be drained, resulting in a partial match shorter than 258. This produces a few more literals before maximum-length matches resume. Since the compression ratio is so high on this data set, a few extra literals make a big difference. The other contributor is that when sync_flush is on, a new header must be added at the start of each input block, which adds over 100 bytes every 8k.

@vlm
Author

vlm commented Oct 27, 2016

What I see is that ISA-L does significantly worse (about 2x) on my streaming data examples. I've prepared an apples-to-apples comparison which runs zlib and ISA-L in the same line-by-line mode (each line is about 1000 bytes long) and compares the output:

https://github.com/vlm/isa-l/commit/f77f252d3bb95659fbe7b97e82794e8eb418ba15

The scripts and the test file are in the above patch. The outcome:

./test-isal-vs-zlib.sh test.json

Observe:

-rw-rw-r--. 1 vlm vlm 121379 Oct 27 23:38 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 278236 Oct 27 23:38 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Oct 27 23:38 outfile-zlib-1.bin
-rw-rw-r--. 1 vlm vlm 592008 Oct 27 23:36 test.json

Note that gzip -1 and zlib's deflate in line-by-line mode yield about 4.1x to 4.8x compression, whereas ISA-L yields about 2.1x compression under the same conditions. Although ISA-L is very fast (thank you!), 2.1x compression is not very useful, and it is indicative of a problem that will pop up for all customers trying to use the library in streaming contexts. Default zlib doesn't exhibit this problem.

@gbtucker
Contributor

It's true, I didn't expect a common usage to be sync_flush this often on such small buffers. Is sync_flush really necessary on every line?

Zlib probably emits a type 1 (static Huffman) block for each line. Without sync_flush on small buffers we do much better.

version 2.16:
$ ./igzip_file_perf test.json
  file test.json - in_size=592008 out_size=152714 iter=844 ratio_default=25.8% ratio_custom=21.8%
  igzip_file: runtime =     878183 usecs, bandwidth 476 MB in 0.8782 sec = 568.96 MB/s

prelim next release:
$ ./igzip_file_perf test.json
  file test.json - in_size=592008 out_size=140757 iter=844 ratio_default=23.8% ratio_custom=19.1%
  igzip_file: runtime =     614582 usecs, bandwidth 476 MB in 0.6146 sec = 813.00 MB/s

Would this make it worth it to delay the sync_flush in your usage?

@vlm
Author

vlm commented Oct 28, 2016

The use case is streaming compression, such as WebSocket's permessage-deflate (RFC 7692). It requires a sync flush and mandates removing the terminating 00 00 ff ff sequence from each transmitted message. My test replicates this behavior with data representative of what we see in the wild (the data is anonymized, but has similar entropy).
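Stripping that tail is trivial, something along these lines (trim_sync_tail is just an illustrative helper name):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per RFC 7692, drop the trailing 00 00 ff ff left by the sync-flush
 * marker before framing the deflated message. */
static size_t trim_sync_tail(const uint8_t *buf, size_t len)
{
        if (len >= 4 && memcmp(buf + len - 4, "\x00\x00\xff\xff", 4) == 0)
                return len - 4;
        return len;
}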

So, as much as we'd like to avoid frequent flushing, the flushing has semantic significance and can't be avoided for this class of applications.

@gbtucker
Contributor

OK, perhaps there are some features we can add to help with the compression ratio in this case.

@vlm
Author

vlm commented Dec 16, 2016

Any news on this, or plans to address it somehow?

@gbtucker
Contributor

Yes, we have pulled in the feature to emit static (btype 01) headers. This is the real issue for compression ratio with very small inputs and sync flush on.

@vlm
Author

vlm commented Jan 11, 2017

Before the latest changes (i.e., copied from #9 (comment)):

./test-isal-vs-zlib.sh test.json

Observe:

-rw-rw-r--. 1 vlm vlm 121379 Oct 27 23:38 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 278236 Oct 27 23:38 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Oct 27 23:38 outfile-zlib-1.bin
-rw-rw-r--. 1 vlm vlm 592008 Oct 27 23:36 test.json

After the latest changes:

./test-isal-vs-zlib.sh test.json

-rw-rw-r--. 1 vlm vlm 121379 Jan 11 06:03 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 213130 Jan 11 06:03 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Jan 11 06:03 outfile-zlib-1.bin

From what I see, the improvement is modest for this type of load (ISA-L went from 278k to 213k, vs. 142k for zlib): ISA-L's output is still about 50% larger than zlib's at Z_BEST_SPEED.

All the sources are in https://github.com/vlm/isa-l/commit/12aa33a1f6cce53a0f7e6ac3b5744056a94cb722, just ./configure --disable-shared CFLAGS=-DLARGE_WINDOW.

@gbtucker
Contributor

When you have such small blocks and are sync flushing each one, set static hufftables:

isal_deflate_set_hufftables(stream, NULL, IGZIP_HUFFTABLE_STATIC);

@vlm
Author

vlm commented Jan 12, 2017

With isal_deflate_set_hufftables:

New:
-rw-rw-r--. 1 vlm vlm 121379 Jan 12 22:17 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 148602 Jan 12 22:17 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Jan 12 22:17 outfile-zlib-1.bin
-rw-rw-r--. 1 vlm vlm 592008 Jan 12 22:09 test.json

So, it is largely solved! Thank you very much!

@vlm
Author

vlm commented Jan 12, 2017

However, there's another anomaly.

I tried generating custom hufftables from a 456k sample file. Here are the results with the DEFAULT hufftables, the STATIC hufftables, and then the pair hufftables_default and hufftables_static generated by the generate_custom_hufftables utility.

I would expect the custom hufftables (in either STATIC or DEFAULT mode) to be markedly more efficient than the defaults shipped with igzip. However, they aren't:

Uncompressed   DEFAULT   STATIC   "_default" CUSTOM   "_static" CUSTOM
456810         81502     81271    67787               81271

As you can see, the custom hufftable in DEFAULT mode works best, whereas in STATIC mode the file outputs are exactly the same (81271 bytes!), irrespective of whether the default (shipped) table or the generated one is used.

With another sample file:

Uncompressed   DEFAULT   STATIC   "_default" CUSTOM   "_static" CUSTOM
283375         102774    72137    80255               72137

And another:

Uncompressed   DEFAULT   STATIC   "_default" CUSTOM   "_static" CUSTOM
2315           1582      1360     1333                1360

In the last examples I don't find any advantage to using the generated custom static tables over the default static ones. Is this expected?

@gbtucker
Contributor

That's great. I hope you see that the point of isal_deflate_set_hufftables() is to give more control to the programmer, especially when you know something about your data that you would rather not have the compressor guess at. Setting IGZIP_HUFFTABLE_STATIC forces the use of the btype 01 hufftables and header as specified by the standard. If you change the default hufftables, this doesn't affect the format of the standard btype 01 static table, so getting the same compression ratio there is expected. The benefit is that the btype 01 header is about 100 bytes shorter.
For your use case I would suggest making the decision based on the size of the input:

if (length < 1024)  /* or some threshold below which the ~100B dynamic header isn't worth it */
        isal_deflate_set_hufftables(stream, NULL, IGZIP_HUFFTABLE_STATIC);
else                /* your generated table, e.g. hufftables_default from generate_custom_hufftables */
        isal_deflate_set_hufftables(stream, &hufftables_default, IGZIP_HUFFTABLE_CUSTOM);

Then you can have the best of both worlds.

@vlm
Author

vlm commented Jan 13, 2017

Thank you!

@vlm vlm closed this as completed Jan 13, 2017