
Chunk Block Size vs Number of Threads

Peter Hyman edited this page Feb 2, 2023 · 4 revisions

Update

lrzip-next now employs the dictionary-size coding that LZMA2 uses. This reduces the set of possible dictionary sizes but also streamlines it. The code snippet below is obsolete. Dictionary sizes are now restricted to the series 2^n and 3*2^(n-1):

2^n,     3*2^(n-1)
2^(n+1), 3*2^n
2^(n+2), 3*2^(n+1)
...
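To make the series concrete, here is a short Python sketch (illustrative only, not the lrzip-next source) that enumerates these sizes and shows why an index into the series fits in a single byte:

```python
# Illustrative sketch (not lrzip-next code): enumerate the LZMA2-style
# dictionary sizes 2^n and 3*2^(n-1). The exponent range is an assumption.

def dict_sizes(min_exp=16, max_exp=31):
    """Return the 2^n / 3*2^(n-1) series in ascending order."""
    sizes = set()
    for n in range(min_exp, max_exp + 1):
        sizes.add(2 ** n)            # 2^n
        sizes.add(3 * 2 ** (n - 1))  # 3*2^(n-1)
    return sorted(sizes)

sizes = dict_sizes()
print(sizes[:4])   # → [65536, 98304, 131072, 196608]
# The whole series has far fewer than 256 entries, so an index into it
# fits comfortably in one byte.
assert len(sizes) <= 256
```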

lrzip-next will attempt to fit the largest possible block size based on compression level. If the memory overhead required for block size * number of threads exceeds usable RAM, the block size is reduced stepwise through the series above until it fits.

Example

If threads * 2^26 (64MB) is too large, threads * 3*2^24 (48MB) will be tried.
If threads * 3*2^24 is too large, threads * 2^25 (32MB) will be tried.
And so on.
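The stepping above can be sketched in Python (helper names and the memory limit in the example are made up, not lrzip-next's own):

```python
# Sketch of the stepwise block-size reduction described above
# (illustrative only; the real logic lives in lrzip-next's C code).

def next_lower(size):
    """Next smaller value in the 2^n / 3*2^(n-1) series."""
    n = size.bit_length() - 1        # size is 2^n or 3*2^(n-1)
    if size == 2 ** n:
        return 3 * 2 ** (n - 2)      # 2^n       -> 3*2^(n-2)
    return 2 ** n                    # 3*2^(n-1) -> 2^n

def fit_block_size(block_size, threads, usable_ram):
    """Reduce block_size stepwise until threads * block_size fits."""
    while threads * block_size > usable_ram:
        block_size = next_lower(block_size)
    return block_size

# 9 threads * 64MB does not fit in (a made-up) 450MB of usable RAM,
# so 48MB is tried next, which fits.
print(fit_block_size(2 ** 26, 9, 450 * 1024 * 1024))  # → 50331648 (48MB)
```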

This allows lrzip-next to store dictionary size in 1 byte.

See Update on LZMA Compression Memory Usage

New Methodology for setting Chunk Sizes

One of the first things I changed in lrzip-next was the way block sizes were computed in the open_stream_out function. The methodology (still employed in lrzip) favored a high number of threads, even if that meant reducing individual block sizes. A small block size defeats the purpose of Long Range Zip, since there is nothing long-range about a small block. Depending on the amount of system RAM, the old method produced some very odd results. Consider...

1,158,307,840 byte Kernel Source, 8GB System RAM

Current lrzip

| Compression Level | # Threads | Chunk Size | # Blocks | Compression Ratio | LZMA Dictionary Size |
|------------------:|----------:|-----------:|---------:|------------------:|---------------------:|
| 5 | 9 | 128,700,872 | 8 | 6.568 | 1,048,576 |
| 6 | 9 | 128,073,425 | 8 | 6.649 | 4,194,304 |
| 7 | 9 | 55,721,681 | 16 | 7.770 | 16,777,216 |
| 8 | 8 | 10,485,760 | 85 | 7.610 | 33,554,432 |

lrzip-next

| Compression Level | # Threads | Block Size | # Blocks | Compression Ratio | LZMA Dictionary Size |
|------------------:|----------:|-----------:|---------:|------------------:|---------------------:|
| 5 | 9 | 128,700,872 | 8 | 7.816 | 16,777,216 |
| 6 | 5 | 231,661,568 | 5 | 7.850 | 22,020,096 |
| 7 | 5 | 231,661,568 | 4 | 7.995 | 22,020,096 |
| 8 | 5 | 231,661,568 | 4 | 8.001 | 22,020,096 |

It was the 85 Blocks!

When I discovered the 85 blocks, it forced a revisit of the methodology. The methodology now works like this: first reduce threads, then reduce the LZMA dictionary size, and test against the memory limit. If it fits, break out; if not, reduce the dictionary by 10% and try again. This method favors larger block sizes.

The end result is a slightly smaller dictionary size than the LZMA spec would choose, and fewer threads than current lrzip uses. The benefit is still a large dictionary and a much larger block size passed to the backend compressor. The goals were to keep the dictionary size >= 16MB and the thread count >= 4.

The same methodology was also applied to ZPAQ and its block sizes, which makes ZPAQ compression much more efficient.

In pseudo-code:

```
MAX_THREADS = current_number_of_threads
MIN_THREADS = 4 or current_number_of_threads, whichever is lower

while Overhead * Num_Threads > limit_usable_ram
do
    Dictionary_Size = next lower 2^n or 3*2^(n-1) value
    Recompute_Overhead()
    for threads = MAX_THREADS; threads > MIN_THREADS; threads--
    do
        if Overhead * threads <= limit_usable_ram; then break
    done
    # Can we go again, or have the minimums been passed?
done
```

If still not resolved, keep reducing threads until 1.
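The pseudo-code above might translate to something like the following Python sketch. All names are hypothetical, and the overhead function is a placeholder model (LZMA compression needs very roughly 10.5x the dictionary size per thread), not lrzip-next's actual accounting:

```python
# Illustrative translation of the pseudo-code above (not lrzip-next code).

def overhead(dict_size):
    # Placeholder model: assume ~10.5x the dictionary size per thread.
    return int(dict_size * 10.5)

def next_lower(size):
    """Next smaller value in the 2^n / 3*2^(n-1) dictionary series."""
    n = size.bit_length() - 1
    return 3 * 2 ** (n - 2) if size == 2 ** n else 2 ** n

def fit(dict_size, max_threads, usable_ram, min_dict=2 ** 20):
    min_threads = min(4, max_threads)       # 4 or current, whichever is lower
    threads = max_threads
    while overhead(dict_size) * threads > usable_ram:
        dict_size = next_lower(dict_size)   # step the dictionary down
        for t in range(max_threads, min_threads - 1, -1):
            if overhead(dict_size) * t <= usable_ram:
                return dict_size, t
        if dict_size <= min_dict:           # minimums passed?
            break
    else:
        return dict_size, threads           # fit without any reduction
    # If still not resolved, keep reducing threads until 1.
    while threads > 1 and overhead(dict_size) * threads > usable_ram:
        threads -= 1
    return dict_size, threads

print(fit(2 ** 26, 9, 2 * 1024 ** 3))  # → (50331648, 4)
```

With 2GB of usable RAM, a 64MB dictionary cannot fit even at 4 threads, so the sketch steps down to 48MB (3*2^24), where 4 threads fit.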