  • v1.4.8
  • 97a3da1

@Cyan4973 Cyan4973 released this Dec 19, 2020

This is a minor hotfix for v1.4.7,
where an internal buffer misalignment bug was detected by @bmwiedemann.
The issue is of no consequence for x64 and arm64 targets,
but could become a problem for CPUs relying on strict alignment, such as mips or older arm designs.
Additionally, some targets, like 32-bit x86 CPUs, do not care much about alignment, but the code does: it will detect the misalignment and return an error code. Some other less common platforms, such as s390x, also seem to trigger the same issue.

While it's a minor fix, this update is nonetheless recommended.

  • v1.4.7
  • 645a297

@Cyan4973 Cyan4973 released this Dec 17, 2020 · 78 commits to dev since this release

Note: this version features a minor bug, which can be present on systems other than x64 and arm64. Updating to v1.4.8 is recommended for all other platforms.

v1.4.7 unleashes several months of improvements across many axes, from performance to various fixes, to new capabilities, of which a few are highlighted below. It’s a recommended upgrade.

(Note: if you ever wondered what happened to v1.4.6, it’s an internal release number reserved for synchronization with Linux Kernel)

Improved --long mode

--long mode makes it possible to analyze vast quantities of data in reasonable time and memory budget. The --long mode algorithm runs on top of the regular match finder, and both contribute to the final compressed outcome.
However, the fact that these 2 stages were working independently resulted in minor discrepancies at highest compression levels, where the cost of each decision must be carefully monitored. For this reason, in situations where the input is not a good fit for --long mode (no large repetition at long distance), enabling it could reduce compression performance, even if by very little, compared to not enabling it (at high compression levels). This situation made it more difficult to "just always enable" the --long mode by default.
This is fixed in this version. For compression levels 16 and up, usage of --long will now never regress compared to compression without --long. This property made it possible to ramp up --long mode contribution to the compression mix, improving its effectiveness.

The compression ratio improvements are most notable when --long mode is actually useful. In particular, --patch-from (which implicitly relies on --long) shows excellent gains from the improvements. We present some brief results here (tested on a 16" MacBook Pro, i9).


Since --long mode is now always beneficial at high compression levels, it’s now automatically enabled for any window size of 128 MB and up.

Faster decompression of small blocks

This release includes optimizations that significantly speed up decompression of small blocks and small data. The decompression speed gains will vary based on the block size according to the table below:

| Block Size | Decompression Speed Improvement |
| --- | --- |
| 1 KB | ~+30% |
| 2 KB | ~+30% |
| 4 KB | ~+25% |
| 8 KB | ~+15% |
| 16 KB | ~+10% |
| 32 KB | ~+5% |

These optimizations come from improving the process of reading the block header, and building the Huffman and FSE decoding tables. zstd’s default block size is 128 KB, and at this block size the time spent decompressing the data dominates the time spent reading the block header and building the decoding tables. But, as blocks become smaller, the cost of reading the block header and building decoding tables becomes more prominent.

CLI improvements

The CLI received several noticeable upgrades with this version.
To begin with, zstd can accept a new parameter through an environment variable, ZSTD_NBTHREADS. It’s useful when zstd is called behind an application (tar, or a python script for example). Also, users who prefer multithreaded compression by default can now set a desired number of threads in their environment. This setting can still be overridden on demand via the command line.
A new command --output-dir-mirror makes it possible to compress a directory containing subdirectories (typically with the -r command), producing one compressed file per source file and reproducing the directory tree in a selected destination directory.
There are other various improvements, such as more accurate warning and error messages, full equivalence between conventions --long-command=FILE and --long-command FILE, fixed confusion risks between stdin and user prompt, or between console output and status message, as well as a new short execution summary when processing multiple files, cumulatively contributing to a nicer command line experience.
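
As a rough illustration of the two additions above, here is a minimal Python sketch of how a calling script might prepare the environment and the command line. The helper names (zstd_env, mirror_command) and the directory names are hypothetical; only ZSTD_NBTHREADS, -r and --output-dir-mirror come from the notes above, and nothing is executed here.

```python
import os
from typing import Dict, List, Optional


def zstd_env(nb_threads: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with ZSTD_NBTHREADS set (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    env["ZSTD_NBTHREADS"] = str(nb_threads)
    return env


def mirror_command(src_dir: str, dst_dir: str) -> List[str]:
    """Build a recursive compression command mirroring src_dir's tree under dst_dir."""
    # "--output-dir-mirror DIR" and "--output-dir-mirror=DIR" are described as equivalent.
    return ["zstd", "-r", src_dir, "--output-dir-mirror", dst_dir]


# "src" and "out" are placeholder directory names.
env = zstd_env(4, base={"PATH": "/usr/bin"})
cmd = mirror_command("src", "out")
print(env["ZSTD_NBTHREADS"], " ".join(cmd))
```

A caller could then pass both to subprocess.run(cmd, env=env); an explicit thread option on the command line would still take precedence over the environment variable, as stated above.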

New experimental features

Shared Thread Pool

By default, each compression context can be set to use a maximum number of threads.
In complex scenarios, there might be multiple compression contexts, working in parallel, each using some number of threads. In such cases, it might be desirable to control the total number of threads used by all these compression contexts together.

This is now possible, by making all these compression contexts share the same thread pool. This capability is exposed through a new experimental entry point, ZSTD_CCtx_refThreadPool(), contributed by @marxin. See its documentation for more details.
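
The idea can be sketched in Python as an analogy only: the snippet below does not use zstd or ZSTD_CCtx_refThreadPool(). It uses the standard library's ThreadPoolExecutor and zlib as stand-ins, and the Context and PeakCounter classes are hypothetical. Three contexts borrow one pool of two workers, so the total number of jobs running at once never exceeds two, however many contexts exist.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class PeakCounter:
    """Tracks how many jobs are running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1


class Context:
    """Hypothetical compression context that borrows a pool instead of owning threads."""

    def __init__(self, pool, counter):
        self.pool = pool
        self.counter = counter

    def _job(self, chunk):
        with self.counter:
            return zlib.compress(chunk)  # zlib stands in for the real compressor

    def compress(self, chunks):
        return [self.pool.submit(self._job, c) for c in chunks]


counter = PeakCounter()
with ThreadPoolExecutor(max_workers=2) as shared_pool:
    contexts = [Context(shared_pool, counter) for _ in range(3)]
    futures = [f for ctx in contexts for f in ctx.compress([b"x" * 50_000] * 4)]
    results = [f.result() for f in futures]

print(len(results), counter.peak)
```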

Faster Dictionary Compression

This release introduces a new experimental dictionary compression algorithm, applicable to mid-range compression levels, employing strategies such as ZSTD_greedy, ZSTD_lazy, and ZSTD_lazy2. This new algorithm can be triggered by selecting the compression parameter ZSTD_c_enableDedicatedDictSearch during ZSTD_CDict creation (experimental section).

Benchmarks show the new algorithm providing significant compression speed gains :

| Level | Hot Dict | Cold Dict |
| --- | --- | --- |
| 5 | ~+17% | ~+30% |
| 6 | ~+12% | ~+45% |
| 7 | ~+13% | ~+40% |
| 8 | ~+16% | ~+50% |
| 9 | ~+19% | ~+65% |
| 10 | ~+24% | ~+70% |

We hope it will help make mid-level compression more attractive for dictionary scenarios. See the documentation for more details. Feedback is welcome!

New Sequence Ingestion API

We introduce a new entry point, ZSTD_compressSequences(), which makes it possible for users to define their own sequences, by whatever mechanism they prefer, and present them to this new entry point, which will generate a single zstd-compressed frame, based on provided sequences.

So for example, users can now feed to the function an array of externally generated ZSTD_Sequence:
[(offset: 5, matchLength: 4, litLength: 10), (offset: 7, matchLength: 6, litLength: 3), ...] and the function will output a zstd compressed frame based on these sequences.

This experimental API currently has several limitations (and its relevant params exist in the “experimental” section). Notably, this API currently ignores any repeat offsets provided, instead always recalculating them on the fly. Additionally, there is no way to forcibly specify the existence of certain zstd features, such as RLE or raw blocks.
If you are interested in this new entry point, please refer to zstd.h for more detailed usage instructions.
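
To make the meaning of a sequence concrete, here is a small Python sketch of how a list of (offset, matchLength, litLength) triples, together with a buffer of literal bytes, describes a piece of data. It illustrates the general LZ77-style interpretation only. It is not the ZSTD_compressSequences() API, the function name apply_sequences is hypothetical, and the details of the real entry point (block delimiters, validation) are in zstd.h.

```python
from typing import List, Tuple

# (offset, match_length, lit_length), mirroring the field order used in the text above.
Sequence = Tuple[int, int, int]


def apply_sequences(literals: bytes, sequences: List[Sequence]) -> bytes:
    """Rebuild data from literal bytes plus LZ77-style sequences (illustrative only)."""
    out = bytearray()
    lit_pos = 0
    for offset, match_length, lit_length in sequences:
        # 1. Copy the next run of literal bytes verbatim.
        out += literals[lit_pos:lit_pos + lit_length]
        lit_pos += lit_length
        # 2. Copy match_length bytes starting `offset` bytes back in the output.
        if match_length:
            if not 0 < offset <= len(out):
                raise ValueError("offset points outside the data produced so far")
            start = len(out) - offset
            for i in range(match_length):
                # Byte by byte, so overlapping matches (offset < match_length) work.
                out.append(out[start + i])
    # Any literals left over follow the last sequence.
    out += literals[lit_pos:]
    return bytes(out)


data = apply_sequences(b"abcde!", [(5, 5, 5), (1, 3, 1)])
print(data)
```

The first sequence emits five literals and then repeats them; the second emits one literal and then repeats it three times through an overlapping match.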


There are many other features and improvements in this release, and since we can’t highlight them all, they are listed below:

  • perf: stronger --long mode at high compression levels, by @senhuang42
  • perf: stronger --patch-from at high compression levels, thanks to --long improvements
  • perf: faster decompression speed for small blocks, by @terrelln
  • perf: faster dictionary compression at medium compression levels, by @felixhandte
  • perf: small speed & memory usage improvements for ZSTD_compress2(), by @terrelln
  • perf: minor generic decompression speed improvements, by @helloguo
  • perf: improved fast compression speeds with Visual Studio, by @animalize
  • cli : Set nb of threads with environment variable ZSTD_NBTHREADS, by @senhuang42
  • cli : new --output-dir-mirror DIR command, by @xxie24 (#2219)
  • cli : accept decompressing files with *.zstd suffix
  • cli : --patch-from can compress stdin when used with --stream-size, by @bimbashrestha (#2206)
  • cli : provide a condensed summary by default when processing multiple files
  • cli : fix : stdin input can no longer be confused with user prompt
  • cli : fix : console output no longer mixes stdout and status messages
  • cli : improve accuracy of several error messages
  • api : new sequence ingestion API, by @senhuang42
  • api : shared thread pool: control total nb of threads used by multiple compression jobs, by @marxin
  • api : new ZSTD_getDictID_fromCDict(), by @LuAPi
  • api : zlibWrapper only uses public API, and is compatible with dynamic library, by @terrelln
  • api : fix : multithreaded compression has predictable output even in special cases (see #2327) (issue not present on cli)
  • api : fix : dictionary compression correctly respects dictionary compression level (see #2303) (issue not present on cli)
  • api : fix : return dstSize_tooSmall error whenever appropriate
  • api : fix : ZSTD_initCStream_advanced() with static allocation and no dictionary
  • build: fix cmake script when employing path including spaces, by @terrelln
  • build: new ZSTD_NO_INTRINSICS macro to avoid explicit intrinsics
  • build: new STATIC_BMI2 macro for compile time detection of BMI2 on MSVC, by @Niadb (#2258)
  • build: improved compile-time detection of aarch64/neon platforms, by @bsdimp
  • build: Fix building on AIX 5.1, by @likema
  • build: compile paramgrill with cmake on Windows, requested by @mirh
  • build: install pkg-config file with CMake and MinGW, by @tonytheodore (#2183)
  • build: Install DLL with CMake on Windows, by @BioDataAnalysis (#2221)
  • build: fix : cli compilation with uclibc
  • misc: Improve single file library and include dictBuilder, by @cwoffenden
  • misc: Fix single file library compilation with Emscripten, by @yoshihitoh (#2227)
  • misc: Add freestanding translation script in contrib/freestanding_lib, by @terrelln
  • doc : clarify repcode updates in format specification, by @felixhandte
  • v1.4.5
  • b706286

@Cyan4973 Cyan4973 released this May 22, 2020 · 772 commits to master since this release

Zstd v1.4.5 Release Notes

This is a fairly important release which includes performance improvements and new major CLI features. It also fixes a few corner cases, making it a recommended upgrade.

Faster Decompression Speed

Decompression speed has been improved again, thanks to great contributions from @terrelln.
As usual, exact mileage varies depending on files and compilers.
For x64 cpus, expect a speed bump of at least +5%, and up to +10% in favorable cases.
ARM cpus receive more benefit, with speed improvements ranging from around +15% up to +50% for certain SoCs and scenarios (ARM's situation is more complex due to larger differences in SoC designs).

For illustration, some benchmarks run on a modern x64 platform using zstd -b compiled with gcc v9.3.0 :

| | v1.4.4 | v1.4.5 |
| --- | --- | --- |
| silesia.tar | 1568 MB/s | 1653 MB/s |
| enwik8 | 1374 MB/s | 1469 MB/s |
| calgary.tar | 1511 MB/s | 1610 MB/s |

Same platform, using clang v10.0.0 compiler :

| | v1.4.4 | v1.4.5 |
| --- | --- | --- |
| silesia.tar | 1439 MB/s | 1496 MB/s |
| enwik8 | 1232 MB/s | 1335 MB/s |
| calgary.tar | 1361 MB/s | 1457 MB/s |

Simplified integration

Presuming a project needs to integrate libzstd's source code (as opposed to linking a pre-compiled library), the /lib source directory can be copy/pasted into the target project. Then the local build system must set up a few include directories. Some setups are automatically provided in prepared build scripts, such as Makefile, but any other 3rd party build system must do it on its own.
This integration is now simplified, thanks to @felixhandte, by making all dependencies within /lib relative, meaning it’s only necessary to set up include directories for the *.h header files that are directly included into the target project (typically zstd.h). Even that task can be circumvented by copy/pasting the *.h into already established include directories.

Alternatively, if you are a fan of one-file integration strategy, @cwoffenden has extended his one-file decoder script into a full feature one-file compression library. The script will generate a file zstd.c, which contains all selected elements from the library (by default, compression and decompression). It’s then enough to import just zstd.h and the generated zstd.c into target project to access all included capabilities.


Zstandard CLI is introducing a new command line option --patch-from, which leverages existing compressors, dictionaries and long range match finder to deliver a high speed engine for producing and applying patches to files.

--patch-from is based on dictionary compression. It will consider a previous version of a file as a dictionary, to better compress a new version of the same file. This operation preserves fast zstd speeds at lower compression levels. To this end, it also increases the previous maximum limit for dictionaries from 32 MB to 2 GB, and automatically uses the long range match finder when needed (though this can also be manually overridden).
--patch-from can also be combined with multi-threading mode at a very minimal compression ratio loss.

Example usage:

# create the patch
zstd --patch-from=<oldfile> <newfile> -o <patchfile>

# apply the patch
zstd -d --patch-from=<oldfile> <patchfile> -o <newfile>
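
The principle of using the old version as a dictionary can be tried out with Python's standard library alone. The sketch below is a stand-in, not zstd: it uses zlib's preset-dictionary feature (zdict), which is limited to a 32 KB window, so the inputs are kept small. The names make_patch and apply_patch are hypothetical, and the random test data is generated only for the demonstration.

```python
import random
import zlib

rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(20_000))  # "previous version"
new = old[:10_000] + b"PATCHED" + old[10_000:]           # "new version"


def make_patch(old: bytes, new: bytes) -> bytes:
    """Compress `new`, using `old` as a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=old)
    return c.compress(new) + c.flush()


def apply_patch(old: bytes, patch: bytes) -> bytes:
    """Rebuild `new` from the patch; the same `old` must be supplied."""
    d = zlib.decompressobj(zdict=old)
    return d.decompress(patch) + d.flush()


patch = make_patch(old, new)
plain = zlib.compress(new, 9)  # same data, no dictionary

assert apply_patch(old, patch) == new
print(len(new), len(plain), len(patch))
```

Because the new version mostly consists of byte runs already present in the old one, the patch should come out far smaller than the plain compressed output, which is the effect --patch-from exploits at much larger scale.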

We compared zstd to bsdiff, a popular industry-grade diff engine. Our test corpus consisted of tarballs of different versions of source code from popular GitHub repositories. Specifically:

repos = {
    # ~31mb (small file)
    "zstd": {"url": "", "dict-branch": "refs/tags/v1.4.2", "src-branch": "refs/tags/v1.4.3"},
    # ~273mb (medium file)
    "wordpress": {"url": "", "dict-branch": "refs/tags/5.3.1", "src-branch": "refs/tags/5.3.2"},
    # ~1.66gb (large file)
    "llvm": {"url": "", "dict-branch": "refs/tags/llvmorg-9.0.0", "src-branch": "refs/tags/llvmorg-9.0.1"}
}

--patch-from on level 19 (with chainLog=30 and targetLength=4kb) is comparable with bsdiff when comparing patch sizes.

--patch-from greatly outperforms bsdiff in speed even on its slowest setting of level 19 boasting an average speedup of ~7X. --patch-from is >200X faster on level 1 and >100X faster (shown below) on level 3 vs bsdiff while still delivering patch sizes less than 0.5% of the original file size.



And of course, there is no change to the fast zstd decompression speed.

Addendum :

After releasing --patch-from, we were made aware of two other popular diff engines by the community: SmartVersion and Xdelta. We ran some additional benchmarks for them and here are our primary takeaways. All three tools are excellent diff engines with clear advantages (especially in speed) over the popular bsdiff. Patch sizes for both binary and text data produced by all three are pretty comparable with Xdelta underperforming Zstd and SmartVersion only slightly [1]. For patch creation speed, Xdelta is the clear winner for text data and Zstd is the clear winner for binary data [2]. And for Patch Extraction Speed (ie. decompression), Zstd is fastest in all scenarios [3]. See wiki for details.


Finally, --filelist= is a new CLI capability, which makes it possible to pass a list of files to operate upon from a file,
as opposed to listing all target files solely on the command line.
This makes it possible to prepare a list offline, save it into a file, and then provide the prepared list to zstd.
Another advantage is that this method circumvents command line size limitations, which can become a problem when operating on very large directories (such a situation can typically happen with shell expansion).
In contrast, passing a very large list of filenames from within a file is not subject to such a size limitation.
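
A minimal sketch of the "prepare a list offline" workflow in Python follows. It assumes the list file holds one path per line; the helper name write_filelist and the file names are hypothetical, and the command is only built, not run.

```python
import os
import tempfile
from typing import Iterable, List


def write_filelist(paths: Iterable[str], list_path: str) -> List[str]:
    """Write one path per line to list_path and return the matching zstd command."""
    with open(list_path, "w", encoding="utf-8") as f:
        for p in paths:
            f.write(p + "\n")
    return ["zstd", f"--filelist={list_path}"]


with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "files.txt")
    # Placeholder file names; a real list could hold far more entries than a shell allows.
    cmd = write_filelist(["a.log", "b.log", "logs/c.log"], list_path)
    with open(list_path, encoding="utf-8") as f:
        listed = f.read().splitlines()

print(cmd[0], listed)
```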

Full List

  • perf: Improved decompression speed (x64 >+5%, ARM >+15%), by @terrelln
  • perf: Automatically downsizes ZSTD_DCtx when too large for too long (#2069, by @bimbashrestha)
  • perf: Improved fast compression speed on aarch64 (#2040, ~+3%, by @caoyzh)
  • perf: Small level 1 compression speed gains (depending on compiler)
  • fix: Compression ratio regression on huge files (> 3 GB) using high levels (--ultra) and multithreading, by @terrelln
  • api: ZDICT_finalizeDictionary() is promoted to stable (#2111)
  • api: new experimental parameter ZSTD_d_stableOutBuffer (#2094)
  • build: Generate a single-file libzstd library (#2065, by @cwoffenden)
  • build: Relative includes, no longer require -I flags for zstd lib subdirs (#2103, by @felixhandte)
  • build: zstd now compiles cleanly under -pedantic (#2099)
  • build: zstd now compiles with make-4.3
  • build: Support mingw cross-compilation from Linux, by @Ericson2314
  • build: Meson multi-thread build fix on windows
  • build: Some misc icc fixes backed by new ci test on travis
  • cli: New --patch-from command, create and apply patches from files, by @bimbashrestha
  • cli: --filelist= : Provide a list of files to operate upon from a file
  • cli: -b can now benchmark multiple files in decompression mode
  • cli: New --no-content-size command
  • cli: New --show-default-cparams command
  • misc: new diagnosis tool, checked_flipped_bits, in contrib/, by @felixhandte
  • misc: Extend largeNbDicts benchmark to compression
  • misc: experimental edit-distance match finder in contrib/
  • doc: Improved beginner docs
  • doc: New issue templates for zstd
  • v1.4.4
  • 10f0e69

@Cyan4973 Cyan4973 released this Nov 5, 2019 · 1316 commits to master since this release

This release includes some major performance improvements and new CLI features, which make it a recommended upgrade.

Faster Decompression Speed

Decompression speed has been substantially improved, thanks to @terrelln. Exact mileage obviously varies depending on files and scenarios, but the general expectation is a bump of about +10%. The benefit is considered applicable to all scenarios, and will be perceptible for most usages.

Some benchmark figures for illustration:

| | v1.4.3 | v1.4.4 |
| --- | --- | --- |
| silesia.tar | 1440 MB/s | 1600 MB/s |
| enwik8 | 1225 MB/s | 1390 MB/s |
| calgary.tar | 1360 MB/s | 1530 MB/s |

Faster Compression Speed when Re-Using Contexts

In server workloads (characterized by very high compression volume of relatively small inputs), the allocation and initialization of zstd's internal data structures can become a significant part of the cost of compression. For this reason, zstd has long had an optimization (which we recommended for large-scale users, perhaps with something like this): when you provide an already-used ZSTD_CCtx to a compression operation, zstd tries to re-use the existing data structures, if possible, rather than re-allocate and re-initialize them.

Historically, this optimization could avoid re-allocation most of the time, but required an exact match of internal parameters to avoid re-initialization. In this release, @felixhandte removed the dependency on matching parameters, allowing the full context re-use optimization to be applied to effectively all compressions. Practical workloads on small data should expect a ~3% speed-up.

In addition to improving average performance, this change also has some nice side-effects on the extremes of performance.

  • On the fast end, it is now easier to get optimal performance from zstd. In particular, it is no longer necessary to do careful tracking and matching of contexts to compressions based on detailed parameters (as discussed for example in #1796). Instead, straightforwardly reusing contexts is now optimal.
  • Second, this change ameliorates some rare, degenerate scenarios (e.g., high volume streaming compression of small inputs with varying, high compression levels), in which it was possible for the allocation and initialization work to vastly overshadow the actual compression work. These cases are up to 40x faster, and now perform in-line with similar happy cases.

Dictionaries and Large Inputs

In theory, using a dictionary should always be beneficial. However, due to some long-standing implementation limitations, it can actually be detrimental. Case in point: by default, dictionaries are prepared to compress small data (where they are most useful). When this prepared dictionary is used to compress large data, there is a mismatch between the prepared parameters (targeting small data) and the ideal parameters (that would target large data). This can cause dictionaries to counter-intuitively result in a lower compression ratio when compressing large inputs.

Starting with v1.4.4, using a dictionary with a very large input will no longer be detrimental. Thanks to a patch from @senhuang42, whenever the library notices that input is sufficiently large (relative to dictionary size), the dictionary is re-processed, using the optimal parameters for large data, resulting in improved compression ratio.

The capability is also exposed, and can be manually triggered using ZSTD_dictForceLoad.

New commands

zstd CLI extends its capabilities, providing new advanced commands, thanks to great contributions :

  • zstd generated files (compressed or decompressed) can now be automatically stored into a different directory than the source one, using the --output-dir-flat=DIR command, provided by @senhuang42.
  • It’s possible to inform zstd about the size of data coming from stdin. @nmagerko proposed 2 new commands, allowing users to provide the exact stream size (--stream-size=#) or an approximate one (--size-hint=#). Both only make sense when compressing a data stream from a pipe (such as stdin), since for a real file, zstd obtains the exact source size from the file system. Providing a source size allows zstd to better adapt internal compression parameters to the input, resulting in better performance and compression ratio. Additionally, providing the precise size makes it possible to embed this information in the compressed frame header, which also allows decoder optimizations.
  • In situations where the same directory content gets regularly compressed, with the intention to only compress new files not yet compressed, it’s necessary to filter the file list, to exclude already compressed files. This process is simplified with the --exclude-compressed command, provided by @shashank0791. As the name implies, it simply excludes all compressed files from the list to process.

Single-File Decoder with Web Assembly

Let’s complete the picture with an impressive contribution from @cwoffenden. libzstd has long offered the capability to build only the decoder, in order to generate smaller binaries that can be more easily embedded into memory-constrained devices and applications.

@cwoffenden built on this capability and offers a script creating a single-file decoder, as an amalgamated variant of the reference Zstandard decoder. The package is completed with a nice build script, which compiles the one-file decoder into WASM code, for embedding into a web application, and even tests it.

As a capability example, check out the awesome WebGL demo provided by @cwoffenden in /contrib/single_file_decoder/examples directory!

Full List

  • perf: Improved decompression speed, by > 10%, by @terrelln
  • perf: Better compression speed when re-using a context, by @felixhandte
  • perf: Fix compression ratio when compressing large files with small dictionary, by @senhuang42
  • perf: zstd reference encoder can generate RLE blocks, by @bimbashrestha
  • perf: minor generic speed optimization, by @davidbolvansky
  • api: new ability to extract sequences from the parser for analysis, by @bimbashrestha
  • api: fixed decoding of magic-less frames, by @terrelln
  • api: fixed ZSTD_initCStream_advanced() performance with fast modes, reported by @QrczakMK
  • cli: Named pipes support, by @bimbashrestha
  • cli: short tar's extension support, by @stokito
  • cli: command --output-dir-flat=DIR , generates target files into requested directory, by @senhuang42
  • cli: commands --stream-size=# and --size-hint=#, by @nmagerko
  • cli: command --exclude-compressed, by @shashank0791
  • cli: faster -t test mode
  • cli: improved some error messages, by @vangyzen
  • cli: fix command -D dictionary on Windows
  • cli: fix rare deadlock condition within dictionary builder, by @terrelln
  • build: single-file decoder with emscripten compilation script, by @cwoffenden
  • build: fixed zlibWrapper compilation on Visual Studio, reported by @bluenlive
  • build: fixed deprecation warning for certain gcc version, reported by @jasonma163
  • build: fix compilation on old gcc versions, by @cemeyer
  • build: improved installation directories for cmake script, by Dmitri Shubin
  • pack: modified pkgconfig, for better integration into openwrt, requested by @neheb
  • misc: Improved documentation : ZSTD_CLEVEL, DYNAMIC_BMI2, ZSTD_CDict, function deprecation, zstd format
  • misc: fixed educational decoder : accept larger literals section, and removed UNALIGNED() macro
  • v1.4.3
  • a3d655d

@felixhandte felixhandte released this Aug 19, 2019 · 1775 commits to master since this release

Dictionary Compression Regression

We discovered an issue in the v1.4.2 release, which can degrade the effectiveness of dictionary compression. This release fixes that issue.

Detailed Changes

  • v1.4.2
  • ff304e9

@felixhandte felixhandte released this Jul 25, 2019 · 1801 commits to master since this release

Legacy Decompression Fix

This release is a small one, that corrects an issue discovered in the previous release. Zstandard v1.4.1 included a bug in decompressing v0.5 legacy frames, which is fixed in v1.4.2.

Detailed Changes

  • v1.4.1
  • 52181f8

@felixhandte felixhandte released this Jul 19, 2019 · 1829 commits to master since this release


This release is primarily a maintenance release.

It includes a few bug fixes, including a fix for a rare data corruption bug, which could only be triggered in a niche use case, when doing all of the following: using multithreading mode, with an overlap size >= 512 MB, using a strategy >= ZSTD_btlazy, and compressing more than 4 GB. None of the default compression levels meet these requirements (not even --ultra ones).


This release also includes some performance improvements, among which the primary improvement is that Zstd decompression is ~7% faster, thanks to @mgrice.

See this comparison of decompression speeds at different compression levels, measured on the Silesia Corpus, on an Intel i9-9900K with GCC 9.1.0.

| Level | v1.4.0 | v1.4.1 | Delta |
| --- | --- | --- | --- |
| 1 | 1390 MB/s | 1453 MB/s | +4.5% |
| 3 | 1208 MB/s | 1301 MB/s | +7.6% |
| 5 | 1129 MB/s | 1233 MB/s | +9.2% |
| 7 | 1224 MB/s | 1347 MB/s | +10.0% |
| 16 | 1278 MB/s | 1430 MB/s | +11.8% |

Detailed list of changes

  • v1.4.0
  • 83b51e9

@terrelln terrelln released this Apr 16, 2019 · 1984 commits to master since this release

Advanced API

The main focus of the v1.4.0 release is the stabilization of the advanced API.

The advanced API provides a way to set specific parameters during compression and decompression in an API and ABI compatible way. For example, it allows you to compress with multiple threads, enable --long mode, set frame parameters, and load dictionaries. It is compatible with ZSTD_compressStream*() and ZSTD_compress2(). There is also an advanced decompression API that allows you to set parameters like maximum memory usage, and load dictionaries. It is compatible with the existing decompression functions ZSTD_decompressStream() and ZSTD_decompressDCtx().

The old streaming functions are all compatible with the new API, and the documentation provides the equivalent function calls in the new API. For example, see ZSTD_initCStream(). The stable functions will remain supported, but the functions in the experimental sections, like ZSTD_initCStream_usingDict(), will eventually be marked as deprecated and removed in favor of the new advanced API.

The examples have all been updated to use the new advanced API. If you have questions about how to use the new API, please refer to the examples, and if they are unanswered, please open an issue.


Zstd's fastest compression level just got faster! Thanks to ideas from Intel's igzip and @gbtucker, we've made level 1, zstd's fastest strategy, 6-8% faster in most scenarios. For example on the Silesia Corpus with level 1, we see 0.2% better compression compared to zstd-1.3.8, and these performance figures on an Intel i9-9900K:

| Version | C. Speed | D. Speed |
| --- | --- | --- |
| 1.3.8 gcc-8 | 489 MB/s | 1343 MB/s |
| 1.4.0 gcc-8 | 532 MB/s (+8%) | 1346 MB/s |
| 1.3.8 clang-8 | 488 MB/s | 1188 MB/s |
| 1.4.0 clang-8 | 528 MB/s (+8%) | 1216 MB/s |

New Features

A new experimental function ZSTD_decompressBound() has been added by @shakeelrao. It is useful when decompressing zstd data in a single shot that may or may not have the decompressed size written into the frame. The result is exact when the decompressed size is written into the frame, and otherwise a tight upper bound, within 128 KB, as long as ZSTD_e_flush and ZSTD_flushStream() aren't used. When ZSTD_e_flush is used, in the worst case the bound can be very large, but this isn't a common scenario.

The parameter ZSTD_c_literalCompressionMode and the CLI flag --[no-]compress-literals allow users to explicitly enable and disable literal compression. By default literals are compressed with positive compression levels, and left uncompressed for negative compression levels. Disabling literal compression boosts compression and decompression speed, at the cost of compression ratio.

Detailed list of changes

  • perf: Improve level 1 compression speed in most scenarios by 6% by @gbtucker and @terrelln
  • api: Move the advanced API, including all functions in the staging section, to the stable section
  • api: Make ZSTD_e_flush and ZSTD_e_end block for maximum forward progress
  • api: Rename ZSTD_CCtxParam_getParameter to ZSTD_CCtxParams_getParameter
  • api: Rename ZSTD_CCtxParam_setParameter to ZSTD_CCtxParams_setParameter
  • api: Don't export ZSTDMT functions from the shared library by default
  • api: Require ZSTD_MULTITHREAD to be defined to use ZSTDMT
  • api: Add ZSTD_decompressBound() to provide an upper bound on decompressed size by @shakeelrao
  • api: Fix ZSTD_decompressDCtx() corner cases with a dictionary
  • api: Move ZSTD_getDictID_*() functions to the stable section
  • api: Add ZSTD_c_literalCompressionMode flag to enable or disable literal compression by @terrelln
  • api: Allow compression parameters to be set when a dictionary is used
  • api: Allow setting parameters before or after ZSTD_CCtx_loadDictionary() is called
  • api: Fix ZSTD_estimateCStreamSize_usingCCtxParams()
  • api: Setting ZSTD_d_maxWindowLog to 0 means use the default
  • cli: Ensure that a dictionary is not used to compress itself by @shakeelrao
  • cli: Add --[no-]compress-literals flag to enable or disable literal compression
  • doc: Update the examples to use the advanced API
  • doc: Explain how to transition from old streaming functions to the advanced API in the header
  • build: Improve the Windows release packages
  • build: Improve CMake build by @hjmjohnson
  • build: Build fixes for FreeBSD by @lwhsu
  • build: Remove redundant warnings by @thatsafunnyname
  • build: Fix tests on OpenBSD by @bket
  • build: Extend fuzzer build system to work with the new clang engine
  • build: CMake now creates the symlink
  • build: Improve Meson build by @lzutao
  • misc: Fix symbolic link detection on FreeBSD
  • misc: Use physical core count for -T0 on FreeBSD by @cemeyer
  • misc: Fix zstd --list on truncated files by @kostmo
  • misc: Improve logging in debug mode by @felixhandte
  • misc: Add CirrusCI tests by @lwhsu
  • misc: Optimize dictionary memory usage in corner cases
  • misc: Improve the dictionary builder on small or homogeneous data
  • misc: Fix spelling across the repo by @jsoref

  • v1.3.8
  • 470344d

@Cyan4973 Cyan4973 released this Dec 27, 2018 · 2303 commits to dev since this release

Advanced API

The main focus of v1.3.8 is the stabilization of the advanced API.

This API has been in the making for more than a year, and makes it possible to trigger advanced features, such as multithreading, --long mode, or detailed frame parameters, in a straightforward and extensible manner. Some examples are provided in this blog entry.
To make this vision possible, the advanced API relies on sticky parameters, which can be stacked on top of each other in any order. This makes it possible to introduce new features in the future without breaking the API or ABI.
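
To make the idea of sticky parameters concrete, here is a toy model in Python. It is not zstd's API: the class `StickyContext` and its methods are hypothetical names invented for this illustration, the parameter names are only loosely borrowed from zstd's `ZSTD_c_*` identifiers, and the default values are assumptions. It shows the two properties mentioned above: a parameter stays in effect once set, and the order in which parameters are set does not matter.

```python
class StickyContext:
    """Toy model of a context with sticky parameters (hypothetical, not zstd's API)."""

    # Values used when a parameter was never set; illustrative only.
    DEFAULTS = {"compressionLevel": 3, "checksumFlag": 0, "nbWorkers": 0}

    def __init__(self):
        self._params = dict(self.DEFAULTS)

    def set_parameter(self, name, value):
        # Unknown names are rejected so that a typo is not silently ignored.
        if name not in self._params:
            raise KeyError(f"unknown parameter: {name}")
        self._params[name] = value

    def reset_parameters(self):
        # Explicitly return every parameter to its default.
        self._params = dict(self.DEFAULTS)

    def effective_parameters(self):
        # Snapshot of what the next operation would use.
        return dict(self._params)


ctx = StickyContext()
ctx.set_parameter("nbWorkers", 4)
ctx.set_parameter("compressionLevel", 19)
first_use = ctx.effective_parameters()
second_use = ctx.effective_parameters()  # nothing was set again in between

# Same parameters, set in the opposite order.
other = StickyContext()
other.set_parameter("compressionLevel", 19)
other.set_parameter("nbWorkers", 4)
```

In this model, supporting a new feature means adding one key to `DEFAULTS`; no function signature changes, which is the kind of property that lets an interface grow without breaking existing callers.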

This API has provided a good experience in our infrastructure, and we hope it will prove easy to use and efficient in your applications. Nonetheless, before being branded "stable", this proposal must spend a last round in the "staging area", in order to generate comments and feedback from new users. It's planned to be labelled "stable" by v1.4.0, which is expected to be the next release, depending on received feedback.

The experimental section still contains a lot of prototypes which are largely redundant with the new advanced API. Expect them to become deprecated, and then dropped in a future release. Transition towards the newer advanced API is therefore highly recommended.


Decoding speed has been improved again, primarily for some specific scenarios : frames using large window sizes (--ultra or --long), and cold dictionary. Cold dictionary is expected to become more important in the near future, as solutions relying on thousands of dictionaries simultaneously will be deployed.

The higher compression levels get a slight compression ratio boost, mostly visible for small (<256 KB) and large (>32 MB) data streams. This change benefits asymmetric scenarios (compress once, decompress many times), typically targeting level 19.

New features

A noticeable addition, @terrelln introduces the --rsyncable mode to zstd. Similar to gzip --rsyncable, it generates a compressed frame which is friendly to rsync in case of limited changes : a difference in the input data will only impact a small localized amount of compressed data, instead of everything from the position onward due to cascading impacts. This is useful for very large archives regularly updated and synchronized over long distance connections (as an example, compressed mailboxes come to mind).
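
The reason a local change stays local can be sketched with content-defined synchronization points. The Python code below is only an illustration of that general idea, not zstd's implementation: the window size, the divisor, and the use of `zlib.crc32` as the hash are all assumptions chosen for brevity. A position is a synchronization point when a hash of the few bytes just before it meets a fixed condition, so the decision never depends on data far upstream.

```python
import random
import zlib

WINDOW = 16   # bytes of context that decide whether a position is a sync point
DIVISOR = 64  # assumed spacing: roughly one sync point per 64 positions


def sync_points(data):
    """Return the positions at which a content-defined sync point falls."""
    points = []
    for end in range(WINDOW, len(data) + 1):
        # The decision depends only on the WINDOW bytes just before `end`.
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            points.append(end)
    return points


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4000))
edit_at = 100
modified = original[:edit_at] + b"!" + original[edit_at:]  # insert one byte

before = sync_points(original)
after = sync_points(modified)

# Sync points far enough past the edit reappear, shifted by the inserted byte.
expected_tail = [p + 1 for p in before if p >= edit_at + WINDOW]
actual_tail = [p for p in after if p >= edit_at + 1 + WINDOW]
```

Here `actual_tail` equals `expected_tail`: once the window has moved past the inserted byte, the same synchronization points are found again. If a compressor restarts its state at such points, the data between two unchanged points compresses to the same bytes as before, which is what rsync needs in order to skip them.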

The method used by zstd preserves the compression ratio very well, introducing only very tiny losses due to synchronization points, meaning it's no longer a sacrifice to use --rsyncable. Here is an example on silesia.tar, using default compression level :

compressor   normal (bytes)   --rsyncable (bytes)   ratio diff.   time
gzip         68235456         68778265              -0.795%       7.92s
zstd         66829650         66846769              -0.026%       1.17s
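
The "ratio diff." figures appear to be the relative growth in compressed size caused by --rsyncable, written as a negative percentage. That reading is an assumption, but it reproduces the published numbers from the two size columns. The function name below is made up for this check.

```python
def size_change_percent(normal_size, rsyncable_size):
    """Relative size change caused by --rsyncable, as a negated percentage."""
    return -100.0 * (rsyncable_size - normal_size) / normal_size


gzip_diff = size_change_percent(68235456, 68778265)
zstd_diff = size_change_percent(66829650, 66846769)

print(f"gzip: {gzip_diff:.3f}%")
print(f"zstd: {zstd_diff:.3f}%")
```

In absolute terms, gzip's output grows by 542809 bytes with --rsyncable, while zstd's grows by 17119 bytes.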

Speaking of compression level : it's now possible to use the environment variable ZSTD_CLEVEL to influence the default compression level. This can prove useful in situations where it's not possible to provide command line parameters, typically when zstd is invoked "under the hood" by some calling process.
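
As a minimal sketch of the "under the hood" case, a calling process can prepare an environment that carries ZSTD_CLEVEL and hand it to whatever child program ends up invoking zstd. The helper name `env_with_clevel` is hypothetical; only the variable name ZSTD_CLEVEL comes from this release.

```python
import os


def env_with_clevel(level, base_env=None):
    """Return a copy of the environment with ZSTD_CLEVEL set to `level`."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment values are strings; int() rejects non-numeric input early.
    env["ZSTD_CLEVEL"] = str(int(level))
    return env


child_env = env_with_clevel(19, base_env={"PATH": "/usr/bin"})
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run`, so the child process and any zstd it launches inherit the setting without a command line flag.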

Lastly, anyone interested in embedding a small zstd decoder into a space-constrained application will be interested in a new set of build macros introduced by @felixhandte, which makes it possible to selectively turn off decoder features to reduce binary size even further. Final binary size will of course vary depending on target assembler and compiler, but in preliminary testing on x64, it helped reduce the decoder size by a factor of 3 (from ~64KB towards ~20KB).

Detailed list of changes

  • perf: better decompression speed on large files (+7%) and cold dictionaries (+15%)
  • perf: slightly better compression ratio at high compression modes
  • api : finalized advanced API, last stage before "stable" status
  • api : new --rsyncable mode, by @terrelln
  • api : support decompression of empty frames into NULL (used to be an error) (#1385)
  • build: new set of build macros to generate a minimal size decoder, by @felixhandte
  • build: fix compilation on MIPS32, reported by @clbr (#1441)
  • build: fix compilation with multiple -arch flags, by @ryandesign
  • build: highly upgraded meson build, by @lzutao
  • build: improved buck support, by @obelisk
  • build: fix cmake script : can create debug build, by @pitrou
  • build: Makefile : grep works on both colored consoles and systems without color support
  • build: fixed zstd-pgo target, by @bmwiedemann
  • cli : support ZSTD_CLEVEL environment variable, by @yijinfb (#1423)
  • cli : --no-progress flag, preserving final summary (#1371), by @terrelln
  • cli : ensure destination file is not source file (#1422)
  • cli : clearer error messages, notably when input file not present
  • doc : clarified, by @ulikunitz
  • misc: fixed zstdgrep, returns 1 on failure, by @lzutao
  • misc: NEWS renamed as CHANGELOG, in accordance with fb.oss policy

@Cyan4973 Cyan4973 released this Oct 19, 2018 · 2662 commits to dev since this release

This is a minor fix release building upon v1.3.6.

The main reason we publish this new version is that @indygreg detected an important compression ratio regression for a specific scenario (compressing with dictionary at level 9 or 10 for small data, or 11-12 for large data). We don't anticipate this scenario to be common : dictionary compression is still rare, most users prefer fast modes (levels <=3), and a few rare ones use strong modes (levels 15-19), so "middle compression" is an extreme rarity.
But just in case some users do, we publish this release.

A few other minor things were ongoing and are therefore bundled.

Decompression speed might be slightly better with clang, depending on exact target and version. We could observe as much as 7% speed gains in some cases, though in other cases, it's rather in the ~2% range.

The integrated backtrace functionality in the cli is updated : its presence can be more easily controlled through the BACKTRACE build macro. The automatic detector is more restrictive, and release mode builds without it by default. We want to be sure the default make compiles without any issue on most platforms.

Finally, the list of man pages has been completed with documentation for zstdless and zstdgrep, by @samrussell.

Detailed list of changes

  • perf: slightly better decompression speed on clang (depending on hardware target)
  • fix : ratio for dictionary compression at levels 9 and 10, reported by @indygreg
  • build: no longer build backtrace by default in release mode; restrict further automatic mode
  • build: control backtrace support through build macro BACKTRACE
  • misc: added man pages for zstdless and zstdgrep, by @samrussell