@Cyan4973 released this Oct 19, 2018 · 100 commits to dev since this release

This is a minor fix release building upon v1.3.6.

The main reason we publish this new version is that @indygreg detected an important compression ratio regression for a specific scenario (compressing with a dictionary at levels 9 or 10 for small data, or 11 - 12 for large data). We don't anticipate this scenario to be common : dictionary compression is still rare, and most of its users prefer fast modes (levels <= 3) while a few rare ones use strong modes (levels 15-19), so this "middle compression" range is an extreme rarity.
But just in case some users do rely on it, we publish this release.

A few other minor things were ongoing and are therefore bundled.

Decompression speed might be slightly better with clang, depending on exact target and version. We could observe as much as 7% speed gains in some cases, though in other cases it's rather in the ~2% range.

The integrated backtrace functionality in the cli has been updated : its presence can now be controlled explicitly through the BACKTRACE build macro. The automatic detection is more restrictive, and release mode builds without it by default. We want to be sure the default make invocation compiles without any issue on most platforms.

Finally, the list of man pages has been completed with documentation for zstdless and zstdgrep, by @samrussell.

Detailed list of changes

  • perf: slightly better decompression speed on clang (depending on hardware target)
  • fix : ratio for dictionary compression at levels 9 and 10, reported by @indygreg
  • build: no longer builds backtrace by default in release mode; automatic detection is further restricted
  • build: control backtrace support through build macro BACKTRACE
  • misc: added man pages for zstdless and zstdgrep, by @samrussell

@Cyan4973 released this Oct 5, 2018 · 33 commits to master since this release

The Zstandard v1.3.6 release is focused on intensive dictionary compression for database scenarios.

This is a new environment we are experimenting with. The success of dictionary compression on small data, which databases tend to store in abundance, has led to increased adoption, and we now see scenarios where literally thousands of dictionaries are used simultaneously, with new dictionaries permanently being generated or updated.

To address these new conditions, v1.3.6 brings a few improvements to the table :

  • A brand new, faster dictionary builder, by @JenniferLiu, under guidance from @terrelln. The new builder, named fastcover, is about 10x faster than our previous default generator, cover, while suffering only negligible accuracy losses (< 1%). It's effectively an approximate version of cover, which trades a little accuracy for large gains in speed and memory. The new dictionary builder is so effective that it has become our new default (--train). The slower but higher-quality generator remains accessible using the --train-cover command. (A short API sketch follows this list.)

Here is an example, using the "github user records" public dataset (about 10K records of about 1 KB each) :

builder algorithm                            | generation time | compression ratio
fast cover (v1.3.6 --train)                  | 0.9 s           | x10.29
cover (v1.3.5 --train)                       | 10.1 s          | x10.31
high accuracy fast cover (--train-fastcover) | 6.6 s           | x10.65
high accuracy cover (--train-cover)          | 50.5 s          | x10.66
  • Faster dictionary decompression under memory pressure, when using thousands of dictionaries simultaneously. The new decoder is able to detect cold vs hot dictionary scenarios, and adds clever prefetching decisions to minimize memory latency. It typically improves decoding speed by ~+30% (vs v1.3.5).

  • Faster dictionary compression under memory pressure, when using a lot of contexts simultaneously. The new design, by @felixhandte, considerably reduces memory usage when compressing small data with dictionaries, which is the main scenario found in databases. The sharp memory usage reduction makes it easier for CPU caches to manage multiple contexts in parallel. Speed gains scale with the number of active contexts, as shown in the graph below :
    [Graph : Dictionary compression, speed vs number of active contexts]

    Note that, in real-life environments, the benefits appear even sooner, since CPU caches tend to be shared with many other processes and threads, instead of being monopolized by a single synthetic benchmark.
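
For readers who prefer to see the API side of this scenario, here is a minimal, purely illustrative sketch (not taken from the release) : a dictionary is trained once with ZDICT_trainFromBuffer(), the library-side counterpart of --train, digested once into a ZSTD_CDict, then shared across many small compression jobs, each using its own ZSTD_CCtx. Buffer names, sizes and the compression level are placeholders :

#include <stdio.h>
#include <zstd.h>    /* ZSTD_createCDict, ZSTD_compress_usingCDict */
#include <zdict.h>   /* ZDICT_trainFromBuffer */

/* 1) Train a dictionary from concatenated samples.
 *    `samples` holds all records back to back; sampleSizes[i] is the size of record i. */
static size_t train_dict(void* dictBuffer, size_t dictCapacity,
                         const void* samples, const size_t* sampleSizes, unsigned nbSamples)
{
    size_t const dictSize = ZDICT_trainFromBuffer(dictBuffer, dictCapacity,
                                                  samples, sampleSizes, nbSamples);
    if (ZDICT_isError(dictSize)) {
        fprintf(stderr, "training failed : %s\n", ZDICT_getErrorName(dictSize));
        return 0;
    }
    return dictSize;   /* actual dictionary size written into dictBuffer */
}

/* 2) Compress one small record with a shared, pre-digested dictionary.
 *    The CDict is created once, e.g. ZSTD_createCDict(dictBuffer, dictSize, 3),
 *    and reused by every context; each thread keeps its own ZSTD_CCtx. */
static size_t compress_record(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict,
                              const void* record, size_t recordSize,
                              void* dst, size_t dstCapacity)
{
    size_t const cSize = ZSTD_compress_usingCDict(cctx, dst, dstCapacity,
                                                  record, recordSize, cdict);
    return ZSTD_isError(cSize) ? 0 : cSize;
}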

Other noticeable improvements

A new command, --adapt, makes it possible to pipe gigantic amounts of data between servers (typically for backup scenarios), and lets the compressor automatically adjust the compression level based on perceived network conditions. When the network becomes slower, zstd uses the available time to compress more, and accelerates again when bandwidth permits. It reduces the need to "pre-calibrate" speed and compression level, and is a good simplification for system administrators. It also results in gains on both dimensions (better compression ratio and better speed) compared to the more traditional "fixed" compression level strategy.
These are still early days for this feature, and we are eager to get feedback on its usage. We know it works better in high-bandwidth environments, for example, as the adaptation itself becomes slow when bandwidth is slow. This is something that will need to be improved. Nonetheless, in its current incarnation, --adapt already proves useful for several datacenter scenarios, which is why we are releasing it.

Advanced users will be pleased by the expansion of an existing tool, tests/paramgrill, which has been refined by @GeorgeLu97. This tool explores the space of advanced compression parameters, to find the best possible set of compression parameters for a given scenario. It takes as input a set of samples and a set of constraints, and works its way towards better and better compression parameters respecting the constraints.

Example :

./paramgrill --optimize=cSpeed=50M dirToSamples/*   # requires minimum compression speed of 50 MB/s
optimizing for dirToSamples/* - limit compression speed 50 MB/s

(...)

/*   Level  5   */       { 20, 18, 18,  2,  5,  2,ZSTD_greedy  ,  0 },     /* R:3.147 at  75.7 MB/s - 567.5 MB/s */   # best level satisfying constraint
--zstd=windowLog=20,chainLog=18,hashLog=18,searchLog=2,searchLength=5,targetLength=2,strategy=3,forceAttachDict=0

(...)

/* Custom Level */       { 21, 16, 18,  2,  6,  0,ZSTD_lazy2   ,  0 },     /* R:3.240 at  53.1 MB/s - 661.1 MB/s */  # best custom parameters found
--zstd=windowLog=21,chainLog=16,hashLog=18,searchLog=2,searchLength=6,targetLength=0,strategy=5,forceAttachDict=0   # associated command arguments, can be copy/pasted for `zstd`

Finally, documentation has been updated, to reflect wording adopted by IETF RFC 8478 (Zstandard Compression and the application/zstd Media Type).

Detailed changes list

  • perf: much faster dictionary builder, by @JenniferLiu
  • perf: faster dictionary compression on small data when using multiple contexts, by @felixhandte
  • perf: faster dictionary decompression when using a very large number of dictionaries simultaneously
  • cli : fix : no longer overwrites destination when source does not exist (#1082)
  • cli : new command --adapt, for automatic compression level adaptation
  • api : fix : block api can be streamed with > 4 GB, reported by @catid
  • api : reduced ZSTD_DDict size by 2 KB
  • api : minimum negative compression level is defined, and can be queried using ZSTD_minCLevel() (#1312) ; see the short sketch after this list.
  • build: support Haiku target, by @korli
  • build: Read Legacy support is now limited to v0.5+ by default. Can be changed at compile time with macro ZSTD_LEGACY_SUPPORT.
  • doc : zstd_compression_format.md updated to match wording in IETF RFC 8478
  • misc: tests/paramgrill, a parameter optimizer, by @GeorgeLu97
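
As a quick illustration of the new ZSTD_minCLevel() entry point, here is a minimal sketch; ZSTD_STATIC_LINKING_ONLY is defined defensively, in case the declaration still sits in the experimental section of zstd.h in this version :

#define ZSTD_STATIC_LINKING_ONLY   /* defensive : declaration may be experimental in this version */
#include <stdio.h>
#include <zstd.h>

int main(void)
{
    /* ZSTD_minCLevel() returns the most negative (fastest) level accepted,
       ZSTD_maxCLevel() the strongest one. */
    printf("zstd %s supports levels %d to %d\n",
           ZSTD_versionString(), ZSTD_minCLevel(), ZSTD_maxCLevel());
    return 0;
}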

@Cyan4973 released this Jun 28, 2018 · 512 commits to master since this release

Zstandard v1.3.5 is a maintenance release focused on dictionary compression performance.

Compression is generally associated with the act of willingly requesting the compression of some large source. However, within datacenters, compression brings its best benefits when performed transparently. In such scenarios, it's actually very common to compress a large number of very small blobs (individual messages in a stream or log, or records in a cache or datastore, etc.). Dictionary compression is a great tool for these use cases.

This release makes dictionary compression significantly faster for these situations, when compressing small to very small data (inputs up to ~16 KB).

[Graph : Dictionary compression, speed vs input size]

The above image plots the compression speeds at different input sizes for zstd v1.3.4 (red) and v1.3.5 (green), at levels 1, 3, 9, and 18.
The benchmark data was gathered on an Intel Xeon CPU E5-2680 v4 @ 2.40GHz. The benchmark was compiled with clang-7.0, with the flags -O3 -march=native -mtune=native -DNDEBUG. The file used in the results shown here is the osdb file from the Silesia corpus, cut into small blocks. It was selected because it performed roughly in the middle of the pack among the Silesia files.

The new version saves substantial initialization time, which is increasingly important as the average size to compress becomes smaller. The impact is even more perceptible at higher levels, where initialization costs are higher. For larger inputs, performance remains similar.

Users can expect to measure substantial speed improvements for inputs smaller than 8 KB, and up to 32 KB depending on the context. The expected speed-up ranges from none (large, incompressible blobs) to many times faster (small, highly compressible inputs). Real-world gains of up to 15x have been observed.

Other noticeable improvements

The compression levels have been slightly adjusted, taking into consideration the higher top speed of level 1 since v1.3.4, and making level 19 a substantially stronger compression level while preserving the 8 MB window size limit, hence keeping an acceptable memory budget for decompression.

It's also possible to select the content of libzstd by modifying macro values at compilation time. By default, libzstd contains everything, but its size can be made substantially smaller by removing support for the dictionary builder, or legacy formats, or deprecated functions. It's even possible to build a compression-only or a decompression-only library.

Detailed changes list

  • perf: much faster dictionary compression, by @felixhandte
  • perf: small quality improvement for dictionary generation, by @terrelln
  • perf: improved high compression levels (notably level 19)
  • mem : automatic memory release for long duration contexts
  • cli : fix : overlapLog can be manually set
  • cli : fix : decoding invalid lz4 frames
  • api : fix : performance degradation for dictionary compression when using advanced API, by @terrelln
  • api : change : clarify ZSTD_CCtx_reset() vs ZSTD_CCtx_resetParameters(), by @terrelln
  • build: select custom libzstd scope through control macros, by @GeorgeLu97
  • build: OpenBSD support, by @bket
  • build: make and make all are compatible with -j
  • doc : clarify zstd_compression_format.md, updated for IETF RFC process
  • misc: pzstd compatible with reproducible compilation, by @lamby

Known bug

zstd --list does not work with a non-interactive tty.
This issue is fixed in the dev branch.

@Cyan4973 released this Mar 26, 2018 · 908 commits to master since this release

The v1.3.4 release of Zstandard is focused on performance, and offers a nice speed boost in most scenarios.

Asynchronous compression by default for zstd CLI

The zstd cli now performs compression in parallel with I/O operations by default. This requires multi-threading capability (which is also enabled by default).
It doesn't sound like much, but it effectively improves throughput by 20-30%, depending on compression level and underlying I/O performance.

For example, on a Mac OS-X laptop with an Intel Core i7-5557U CPU @ 3.10GHz, running time zstd enwik9 at default compression level (2) on an SSD gives the following :

Version               | real time
1.3.3                 | 9.2s
1.3.4 --single-thread | 8.8s
1.3.4 (asynchronous)  | 7.5s

This is a nice boost for all scripts using the zstd cli, typically in network or storage tasks. The effect is even more pronounced at faster compression settings, since the CLI overlaps a proportionally higher share of compression with I/O.

The previous default behavior (blocking single thread) is still available, through the --single-thread long command. It's also the only mode available when no multi-threading capability is detected.
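
For library users, a similar overlap can be obtained programmatically : as noted in the changelog below, setting nbWorkers to 1 switches compression to asynchronous mode. The following is an illustrative sketch only, assuming the experimental parameter name ZSTD_p_nbWorkers from this period's advanced API :

#define ZSTD_STATIC_LINKING_ONLY   /* advanced parameters are experimental in the 1.3.x series */
#include <zstd.h>

/* Illustrative : with nbWorkers >= 1, ZSTD_compress_generic() queues work and returns
 * early, letting the caller overlap compression with reading input and writing output. */
static int compress_async(ZSTD_CCtx* cctx, ZSTD_outBuffer* out, ZSTD_inBuffer* in)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_nbWorkers, 1);   /* 1 worker => asynchronous mode */
    for (;;) {
        size_t const remaining = ZSTD_compress_generic(cctx, out, in, ZSTD_e_end);
        if (ZSTD_isError(remaining)) return -1;
        /* write out->dst[0 .. out->pos) to disk or network here, then recycle the buffer */
        out->pos = 0;
        if (remaining == 0) return 0;   /* frame fully flushed */
    }
}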

General speed improvements

Some core routines have been refined to provide more speed on newer CPUs, making better use of their out-of-order execution units. This is more noticeable on the decompression side, and even more so when compiled with gcc.

Example on the same platform, running the in-memory benchmark zstd -b1 silesia.tar :

Version     | C.Speed  | D.Speed
1.3.3 llvm9 | 290 MB/s | 660 MB/s
1.3.4 llvm9 | 304 MB/s | 700 MB/s (+6%)
1.3.3 gcc7  | 280 MB/s | 710 MB/s
1.3.4 gcc7  | 300 MB/s | 890 MB/s (+25%)

Faster compression levels

So far, compression level 1 has been the fastest one available. Starting with v1.3.4, there are additional choices : faster compression levels can be invoked using negative values.
On the command line, they can be triggered with the --fast[=#] command.

Negative compression levels sample data more sparsely, and disable Huffman compression of literals, translating into faster decoding speed.

It's possible to create one's own custom fast compression level by using strategy ZSTD_fast, increasing ZSTD_p_targetLength to the desired value, and turning literals compression on or off with ZSTD_p_compressLiterals.
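
For illustration, here is a minimal sketch of both approaches. It assumes the regular one-shot API accepts negative levels (per the changelog entry below); ZSTD_p_compressionStrategy is an assumed parameter name, while ZSTD_p_targetLength and ZSTD_p_compressLiterals are quoted above :

#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_p_* advanced parameters are experimental in 1.3.x */
#include <zstd.h>

/* Option 1 : pass a negative level directly (roughly equivalent to --fast=5 on the cli). */
static size_t compress_fast5(void* dst, size_t dstCapacity, const void* src, size_t srcSize)
{
    return ZSTD_compress(dst, dstCapacity, src, srcSize, -5);
}

/* Option 2 : craft a custom fast level, as described above. */
static void set_custom_fast_level(ZSTD_CCtx* cctx)
{
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionStrategy, ZSTD_fast);  /* parameter name assumed */
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_targetLength, 16);     /* larger value => sparser sampling */
    ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressLiterals, 0);  /* skip Huffman compression of literals */
}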

Performance is generally on par with or better than other high-speed algorithms. In the benchmark below (compressing silesia.tar on an Intel Core i7-6700K CPU @ 4.00GHz), it ends up being faster and stronger on all metrics compared with quicklz and snappy at --fast=2. It also compares favorably to lzo with --fast=3. lz4 still offers a better speed / compression combo, with zstd --fast=4 coming close.

name                | ratio | compression | decompression
zstd 1.3.4 --fast=5 | 1.996 | 770 MB/s    | 2060 MB/s
lz4 1.8.1           | 2.101 | 750 MB/s    | 3700 MB/s
zstd 1.3.4 --fast=4 | 2.068 | 720 MB/s    | 2000 MB/s
zstd 1.3.4 --fast=3 | 2.153 | 675 MB/s    | 1930 MB/s
lzo1x 2.09 -1       | 2.108 | 640 MB/s    | 810 MB/s
zstd 1.3.4 --fast=2 | 2.265 | 610 MB/s    | 1830 MB/s
quicklz 1.5.0 -1    | 2.238 | 540 MB/s    | 720 MB/s
snappy 1.1.4        | 2.091 | 530 MB/s    | 1820 MB/s
zstd 1.3.4 --fast=1 | 2.431 | 530 MB/s    | 1770 MB/s
zstd 1.3.4 -1       | 2.877 | 470 MB/s    | 1380 MB/s
brotli 1.0.2 -0     | 2.701 | 410 MB/s    | 430 MB/s
lzf 3.6 -1          | 2.077 | 400 MB/s    | 860 MB/s
zlib 1.2.11 -1      | 2.743 | 110 MB/s    | 400 MB/s

Applications which were considering Zstandard but were worried about being CPU-bound are now able to shift the load from CPU to bandwidth on a larger scale, and may even vary their choice temporarily depending on local conditions (to deal with a sudden workload surge, for example).

Long Range Mode with Multi-threading

zstd-1.3.2 introduced the long range mode, capable of deduplicating long-distance redundancies in a large data stream, a situation typical of backup scenarios for example. But its usage in association with multi-threading was discouraged, due to inefficient use of memory.
zstd-1.3.4 solves this issue, by making the long range match finder run in serial mode, like a pre-processor, before passing its result to backend compressors (regular zstd). Memory usage is now bounded by the maximum of the long range window size and the memory that zstdmt would require without long range matching. As the long range mode runs at about 200 MB/s, depending on the number of cores available, it's possible to tune the compression level to match the LRM speed, which becomes the upper limit.

zstd -T0  -5  --long    file # autodetect threads, level 5, 128 MB window
zstd -T16 -10 --long=31 file # 16 threads, level 10, 2 GB window

As an illustration, benchmarks of the two files "Linux 4.7 - 4.12" and "Linux git" from the 1.3.2 release are shown below. All compressors are run with 16 threads, except "zstd single 2 GB". zstd compressors are run with either a 128 MB or a 2 GB window size, and the lrzip compressor is run with lzo, gzip, and xz backends. The benchmarks were run on a 16-core Sandy Bridge @ 2.2 GHz.

[Graph : Linux 4.7 - 4.12, compression ratio vs speed]
[Graph : Linux git, compression ratio vs speed]

The association of Long Range Mode with multi-threading is pretty compelling for large stream scenarios.

Miscellaneous

This release also brings its usual list of small improvements and bug fixes, as detailed below :

  • perf: faster speed (especially decoding speed) on recent cpus (haswell+)
  • perf: much better performance associating --long with multi-threading, by @terrelln
  • perf: better compression at levels 13-15
  • cli : asynchronous compression by default, for faster experience (use --single-thread for former behavior)
  • cli : smoother status report in multi-threading mode
  • cli : added command --fast=#, for faster compression modes
  • cli : fix crash when not overwriting existing files, by Pádraig Brady (@pixelb)
  • api : nbThreads becomes nbWorkers : 1 triggers asynchronous mode
  • api : compression levels can be negative, for even more speed
  • api : ZSTD_getFrameProgression() : get precise progress status of ZSTDMT anytime
  • api : ZSTDMT can accept new compression parameters during compression
  • api : implemented all advanced dictionary decompression prototypes
  • build: improved meson recipe, by Shawn Landden (@shawnl)
  • build: VS2017 scripts, by @HaydnTrigg
  • misc: all /contrib projects fixed
  • misc: added /contrib/docker script by @gyscos

@Cyan4973 released this Dec 21, 2017 · 1269 commits to master since this release

This is a bugfix release, mostly focused on cleaning up several detrimental corner-case scenarios.
It is nonetheless a recommended upgrade.

Changes Summary

  • perf: improved zstd_opt strategy (levels 16-19)
  • fix : bug #944 : multithreading with shared dictionary and large data, reported by @gsliepen
  • cli : change : -o can be combined with multiple inputs, by @terrelln
  • cli : fix : content size written in header by default
  • cli : fix : improved LZ4 format support, by @felixhandte
  • cli : new : hidden command -b -S, to benchmark multiple files and generate one result per file
  • api : change : when setting pledgedSrcSize, use ZSTD_CONTENTSIZE_UNKNOWN macro value to mean "unknown"
  • api : fix : support large skippable frames, by @terrelln
  • api : fix : re-using context could result in suboptimal block size in some corner case scenarios
  • api : fix : streaming interface was adding a useless 3-byte null block to small frames
  • build: fix : compilation under rhel6 and centos6, reported by @pixelb
  • build: added check target
  • build: improved meson support, by @shawnl

@Cyan4973 released this Oct 9, 2017 · 1467 commits to master since this release

Zstandard Long Range Match Finder

Zstandard has a new long range match finder written by Facebook intern Stella Lau (@stellamplau), which specializes in finding long matches in the distant past. It integrates seamlessly with the regular compressor, and its output can be decompressed just like any other Zstandard compressed data.

The long range match finder adds minimal overhead to the compressor, works with any compression level, and maintains Zstandard's blazingly fast decompression speed. However, since the window size is larger, it requires more memory for compression and decompression.

To go along with the long range match finder, we've increased the maximum window size to 2 GB. The decompressor only accepts window sizes up to 128 MB by default, but zstd -d --memory=2GB will decompress window sizes up to 2 GB.

Example usage

# 128 MB window size
zstd -1 --long file
zstd -d file.zst

# 2 GB window size (window log = 31)
zstd -6 --long=31 file
zstd -d --long=31 file.zst
# OR
zstd -d --memory=2GB file.zst
The same can also be set up programmatically, through the advanced API :

ZSTD_CCtx *cctx = ZSTD_createCCtx();
ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, 19);
ZSTD_CCtx_setParameter(cctx, ZSTD_p_enableLongDistanceMatching, 1); // Sets windowLog=27
ZSTD_CCtx_setParameter(cctx, ZSTD_p_windowLog, 30); // Optionally increase the window log
ZSTD_compress_generic(cctx, &out, &in, ZSTD_e_end); // out / in are ZSTD_outBuffer / ZSTD_inBuffer

ZSTD_DCtx *dctx = ZSTD_createDCtx();
ZSTD_DCtx_setMaxWindowSize(dctx, 1 << 30); // accept frames with windows up to 1 GB
ZSTD_decompress_generic(dctx, &out, &in);

Benchmarks

We compared the zstd long range matcher to zstd and lrzip. The benchmarks were run on an AMD Ryzen 1800X (8 cores with 16 threads at 3.6 GHz).

Compressors

  • zstd — The regular Zstandard compressor.
  • zstd 128 MB — The Zstandard compressor with a 128 MB window size.
  • zstd 2 GB — The Zstandard compressor with a 2 GB window size.
  • lrzip xz — The lrzip compressor with default options, which uses the xz backend at level 7 with 16 threads.
  • lrzip xz single — The lrzip compressor with a single-threaded xz backend at level 7.
  • lrzip zstd — The lrzip compressor with no backend compression, its output then being compressed by zstd (not multithreaded).

Files

  • Linux 4.7 - 4.12 — This file consists of the uncompressed tarballs of the six Linux kernel releases from 4.7 to 4.12, concatenated together in order. This file is extremely compressible if the compressor can match against the previous versions well.
  • Linux git — This file is a tarball of the linux repo, created by git clone https://github.com/torvalds/linux && tar -cf linux-git.tar linux/. This file gets a small benefit from long range matching. It shows how the long range matcher performs when there aren't many long matches to find.

Results

Both zstd and zstd 128 MB don't have a large enough window size to compress Linux 4.7 - 4.12 well. zstd 2 GB compresses the fastest, and slightly better than lrzip-zstd. lrzip-xz compresses the best, and at a reasonable speed with multithreading enabled. The place where zstd shines is decompression ease and speed. Since it is just regular Zstandard compressed data, it is decompressed by the highly optimized decompressor.

The Linux git file shows that the long range matcher maintains good compression and decompression speed, even when there are far fewer long range matches. The decompression speed takes a small hit because it has to look further back to reconstruct the matches.

[Graphs : Linux 4.7 - 4.12, compression ratio vs speed ; decompression speed]
[Graphs : Linux git, compression ratio vs speed ; decompression speed]

Implementation details

The long distance match finder was inspired by great work from Con Kolivas' lrzip, which in turn was inspired by Andrew Tridgell's rzip. Also, let's mention Bulat Ziganshin's srep, which we have not been able to test unfortunately (site down), but the discussions on encode.ru proved great sources of inspiration.

Therefore, many similar mechanisms are adopted, such as using a rolling hash, and filling a hash table divided into buckets of entries.

That being said, we also made different choices, with the goal of favoring speed, as can be observed in the benchmarks. The rolling hash formula is selected for computing efficiency. There is a restrictive insertion policy, which only inserts candidates that respect a mask condition. The insertion policy allows us to skip the hash table in the common case that a match isn't present. Confirmation bits are saved, to only check for matches when there is a strong presumption of success. These and a few more details add up to make zstd's long range matcher a speed-oriented implementation.
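
To make the insertion policy more concrete, here is a small, purely illustrative sketch of the general idea. It is not the actual lib/compress/zstd_ldm.c code : the hash formula, table layout and constants are simplified placeholders (the real table uses buckets of several entries) :

#include <stddef.h>
#include <stdint.h>

#define LDM_HASH_LOG   20
#define LDM_TABLE_SIZE (1u << LDM_HASH_LOG)
#define LDM_MASK_BITS  6                     /* insert roughly 1 position out of 64 */

typedef struct { uint32_t offset; uint32_t checksum; } LdmEntry;

/* Toy rolling hash : not zstd's actual formula, just a multiplicative mix. */
static uint64_t toy_roll(uint64_t hash, uint8_t entering, uint8_t leaving)
{
    (void)leaving;   /* a real rolling hash would also remove the leaving byte */
    return (hash * 0x9E3779B185EBCA87ULL) + entering;
}

/* Insert position `pos` only when the low bits of its hash are zero,
 * so most positions never touch the table at all. */
static void maybe_insert(LdmEntry* table, uint64_t hash, uint32_t pos)
{
    if ((hash & ((1u << LDM_MASK_BITS) - 1)) != 0) return;       /* mask condition */
    size_t const bucket = (size_t)(hash >> LDM_MASK_BITS) & (LDM_TABLE_SIZE - 1);
    table[bucket].offset   = pos;                     /* candidate position */
    table[bucket].checksum = (uint32_t)(hash >> 32);  /* confirmation bits, checked before a full match attempt */
}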

The biggest difference though is that the long range matcher is blended into the regular compressor, producing a single valid zstd frame, indistinguishable from normal operation (except obviously for the larger window size). This makes decompression a single-pass process, preserving its speed property.

More details are available directly in source code, at lib/compress/zstd_ldm.c.

Future work

This is a first implementation, and it still has a few limitations that we plan to lift in the future.

The long range matcher doesn't interact well with multithreading. Due to the way zstd multithreading is currently implemented, memory usage will scale with the window size times the number of threads, which is a problem for large window sizes. We plan on supporting multithreaded long range matching with reasonable memory usage in a future version.

Secondly, Zstandard is currently limited to a 2 GB window size because of the indexer's design. While this is a significant update compared to the previous 128 MB limit, we believe this limitation can be lifted altogether, with some structural changes in the indexer. However, it also means that window sizes could become really big, with knock-on consequences for memory usage. So, to reduce this load, we will have to consider memory mapping as a complementary way to reference past content in the uncompressed file.

Detailed list of changes

  • new : long range mode, using --long command, by Stella Lau (@stellamplau)
  • new : ability to generate and decode magicless frames (#591)
  • changed : maximum nb of threads reduced to 200, to avoid address space exhaustion in 32-bit mode
  • fix : multi-threading compression works with custom allocators, by @terrelln
  • fix : a rare compression bug when compression generates very large distances and a bunch of other conditions (only possible at --ultra -22)
  • fix : 32-bit builds can now decode large offsets (levels 21+)
  • cli : added LZ4 frame support by default, by Felix Handte (@felixhandte)
  • cli : improved --list output
  • cli : new : can split input file for dictionary training, using command -B#
  • cli : new : clean operation artefact on Ctrl-C interruption (#854)
  • cli : fix : do not change /dev/null permissions when using command -t with root access, reported by @mike155 (#851)
  • cli : fix : write file size in header in multiple-files mode
  • api : added macro ZSTD_COMPRESSBOUND() for static allocation
  • api : experimental : new advanced decompression API
  • api : fix : sizeof_CCtx() used to over-estimate
  • build: fix : compilation works with -mbmi (#868)
  • build: fix : no-multithread variant compiles without pool.c dependency, reported by Mitchell Blank Jr (@mitchblank) (#819)
  • build: better compatibility with reproducible builds, by Bernhard M. Wiedemann (@bmwiedemann) (#818)
  • example : added streaming_memory_usage
  • license : changed /examples license to BSD + GPLv2
  • license : fix a few header files to reflect new license (#825)

Warning

bug #944 : v1.3.2 is known to produce corrupted data in the following scenario, requiring all these conditions simultaneously :

  • compression using multi-threading
  • with a dictionary
  • on "large enough" files (several MB, exact threshold depends on compression level)

Note that dictionary compression is meant to help with small files (a few KB), while multi-threading is only useful for large files, so it's pretty rare to need both at the same time. Nonetheless, if your application happens to trigger this situation, it's recommended to skip v1.3.2 for a newer version. At the time of this warning, the dev branch is known to work properly in the same scenario.

@Cyan4973 released this Aug 20, 2017 · 1796 commits to master since this release

  • New license : BSD + GPLv2
  • perf: substantially decreased memory usage in Multi-threading mode, thanks to reports by Tino Reichardt (@mcmilk)
  • perf: Multi-threading supports up to 256 threads. Cap at 256 when more are requested (#760)
  • cli : improved and fixed --list command, by @ib (#772)
  • cli : command -vV lists supported formats, by @ib (#771)
  • build : fixed binary variants, reported by @svenha (#788)
  • build : fix Visual compilation for non x86/x64 targets, reported by @GregSlazinski (#718)
  • API exp : breaking change : ZSTD_getFrameHeader() provides more information
  • API exp : breaking change : pinned down values of error codes
  • doc : fixed huffman example, by Ulrich Kunitz (@ulikunitz)
  • new : contrib/adaptive-compression, I/O driven compression level, by Paul Cruz (@paulcruz74)
  • new : contrib/long_distance_matching, statistics tool by Stella Lau (@stellamplau)
  • updated : contrib/linux-kernel, by Nick Terrell (@terrelln)

@Cyan4973 released this Jul 5, 2017 · 2165 commits to master since this release

cli : new : --list command, by @paulcruz74
cli : changed : xz/lzma support enabled by default
cli : changed : -t * continue processing list after a decompression error
API : added : ZSTD_versionString()
API : promoted to stable status : ZSTD_getFrameContentSize(), by @iburinoc
API exp : new advanced API : ZSTD_compress_generic(), ZSTD_CCtx_setParameter()
API exp : new : API for static or external allocation : ZSTD_initStatic?Ctx()
API exp : added : ZSTD_decompressBegin_usingDDict(), requested by @Crazee (#700)
API exp : clarified memory estimation / measurement functions.
API exp : changed : strongest strategy renamed ZSTD_btultra, fastest strategy ZSTD_fast set to 1
Improved : reduced stack memory usage, by @terrelln and @stellamplau
tools : decodecorpus can generate random dictionary-compressed samples, by @paulcruz74
new : contrib/seekable_format, demo and API, by @iburinoc
changed : contrib/linux-kernel, updated version and license, by @terrelln

@Cyan4973 released this May 4, 2017 · 2629 commits to master since this release

Major features :

  • Multithreading is enabled by default in the cli. Use -T# to select the number of threads. To disable multithreading, build target zstd-nomt or compile with HAVE_THREAD=0.
  • New dictionary builder named "cover" with improved quality (produces better compression ratio), by @terrelln. Legacy dictionary builder remains available, using --train-legacy command.

Other changes :
cli : new : command -T0 means "detect and use nb of cores", by @iburinoc
cli : new : zstdmt symlink hardwired to zstd -T0
cli : new : command --threads=# (#671)
cli : new : commands --train-cover and --train-legacy, to select dictionary algorithm and parameters
cli : experimental targets zstd4 and xzstd4, supporting lz4 format, by @iburinoc
cli : fix : does not output compressed data on console
cli : fix : ignore symbolic links unless --force specified
API : breaking change : ZSTD_createCDict_advanced() uses compressionParameters as argument
API : added : prototypes ZSTD_*_usingCDict_advanced(), for direct control over frameParameters.
API : improved: ZSTDMT_compressCCtx() reduced memory usage
API : fix : ZSTDMT_compressCCtx() now provides srcSize in header (#634)
API : fix : src size stored in frame header is controlled at end of frame
API : fix : enforced consistent rules for pledgedSrcSize==0 (#641)
API : fix : error code GENERIC replaced by dstSizeTooSmall when appropriate
build: improved cmake script, by @Majlen
build: enabled Multi-threading support for *BSD, by @bapt
tools: updated paramgrill. Command -O# provides best parameters for sample and speed target.
new : contrib/linux-kernel version, by @terrelln