Merge v1.4.1 to Master #1691

felixhandte · 2019-07-18T23:13:25Z

List of Changes

bug: Fix data corruption in niche use cases by @terrelln (Fix data corruption in niche use case #1659)
bug: Fuzz legacy modes, fix uncovered bugs by @terrelln ([fuzzer] Run fuzzers in legacy mode and fix legacy code #1593, [legacy] Fix Huffman jump table reads in v01 and v05 #1594, [legacy] Fix ZSTDv0*_decodeSequence() #1595)
bug: Fix out of bounds read by @terrelln ([fuzzer] Fuzz frame info functions #1590)
perf: Improve decode speed by ~7% by @mgrice (perf improvements for zstd decode #1668)
perf: Slightly improved compression ratio of level 3 and 4 (ZSTD_dfast) by @Cyan4973 (updated double_fast complementary insertion #1681)
perf: Slightly faster compression speed when re-using a context by @Cyan4973 (memset() rather than reduceIndex() #1658)
perf: Improve compression ratio for small windowLog by @Cyan4973 (Improves compression ratio for small windowLog #1624)
perf: Faster compression speed in high compression mode for repetitive data by @terrelln ([libzstd] Optimize ZSTD_insertBt1() for repetitive data #1635)
api: Add parameter to generate smaller dictionaries by @Tyler-Tran (Adding shrinking flag for cover and fastcover #1656)
cli: Recognize symlinks when built in C99 mode by @felixhandte (Protect lstat() With Better Macro Guard #1640)
cli: Expose cpu load indicator for each file on -vv mode by @ephiepark (zstdcli : expose cpu load indicator for each file on -vv mode #1631)
cli: Restrict read permissions on destination files by @chungy ([programs] set chmod 600 after opening destination file #1644)
cli: zstdgrep: handle -f flag by @felixhandte (Handle -f Flag in zstgrep #1618)
cli: zstdcat: follow symlinks by @vejnar (Make zstdcat to follow symbolic links #1604)
doc: Remove extra size limit on compressed blocks by @felixhandte ([doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content #1689)
doc: Fix typo by @yk-tanigawa (Update README.md #1633)
doc: Improve documentation on streaming buffer sizes by @Cyan4973 (Added comments on I/O buffer sizes for streaming #1629)
build: CMake: support building with LZ4 by @LeeYoung624 (add cmake lz4 support #1626)
build: CMake: install zstdless and zstdgrep by @LeeYoung624 (CMake bug fix: didn't install zstdless and zstdgrep. #1647)
build: CMake: respect existing uninstall target by @j301scott (CMake: Check for existing custom target 'uninstall' #1619)
build: Make: skip multithread tests when built without support by @michaelforney (Skip --adapt and --rsyncable tests when built without thread support #1620)
build: Make: Fix examples/ test target by @sjnam (fails to "make test" in examples #1603)
build: Meson: rename options out of deprecated namespace by @lzutao (meson: Fix deprecated build warnings on build options #1665)
build: Meson: fix build by @lzutao (meson: Error if fail to extract version number from zstd.h #1602)
build: Visual Studio: don't export symbols in static lib by @scharan (Remove ZSTD_DLL_EXPORT=1 for static lib #1650)
build: Visual Studio: fix linking by @Absotively (Visual Studio issues #1639)
build: Fix MinGW-W64 build by @myzhang1029 (Fix #1591 - Not building on MinGW-W64 #1600)
misc: Expand decodecorpus coverage by @ephiepark (decodecorpus #1664)

* Version <= 0.5 could read beyond the end of `dumps`, which points into the input buffer.
* Check the validity of `dumps` before using it; if it is out of bounds, return garbage values. There is no return code for this function.
* Introduce `MEM_readLE24()` for simplicity, since I don't want to trust that there is an extra byte after `dumps`.
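
The bounds check and the byte-wise 24-bit read described above can be sketched roughly as follows. This is only an illustration, not the code from the legacy decoders: `read_le24` and `read_le24_checked` are hypothetical names standing in for `MEM_readLE24()` and the `dumps` validity check, and returning 0 as the dummy value is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 24-bit little-endian value one byte at a time, so that no byte
 * past p[2] is ever touched. Hypothetical stand-in for MEM_readLE24(). */
static uint32_t read_le24(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Bounds-checked variant (an assumption for illustration): returns 1 and
 * stores the value when 3 bytes are available in [p, end); otherwise
 * returns 0 and stores a dummy value instead of reading out of bounds. */
static int read_le24_checked(const uint8_t *p, const uint8_t *end, uint32_t *out)
{
    assert(p <= end);
    if ((size_t)(end - p) < 3) {
        *out = 0;
        return 0;
    }
    *out = read_le24(p);
    return 1;
}
```

Reading byte by byte avoids the common shortcut of loading 32 bits and masking, which would require a fourth readable byte after the value.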

* tests: Fix shellcheck warnings in playTests.sh
* tests: Do not use ../programs which is relative to tests dirs
  This commit fixes an error when running playTests.sh in Meson. Meson builds out of tree, so ./datagen is not in the `zstd/tests` dir; it lies in <mesonbuilddir>/tests. This makes ../programs invalid.
* tests: Replace relative paths for zstd/tests dir
* playTests: Set shell options explicitly, not in shebang
* playTests: Replace echo -e with printf
* meson: Fix test-zstd
  Use std=gnu99 to build and test just like `make test`.
* meson: Fix legacy test
* meson: Enable testing in CI
  Run build under release mode for faster test time.
* meson: Increase timeout time for test-zstream

When we wrote one byte beyond the end of the buffer for RLE blocks back in 1.3.7, we would then have `op > oend`. That is a problem when we use `oend - op` for the size of the destination buffer, and allows further writes beyond the end of the buffer for the rest of the function. Let's assert that it doesn't happen.
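
A minimal sketch of why the assertion matters, assuming a simplified writer rather than the actual decompression code: `remaining` and `write_rle` are hypothetical helpers. If `op` ever passed `oend`, the unsigned result of `oend - op` would wrap to a huge value and every later capacity check would pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remaining capacity of the destination buffer. The subtraction is only
 * meaningful while op <= oend, so that invariant is asserted first. */
static size_t remaining(const unsigned char *op, const unsigned char *oend)
{
    assert(op <= oend);
    return (size_t)(oend - op);
}

/* Hypothetical RLE writer: emits `len` copies of `byte` at *op and advances
 * *op. Returns the number of bytes written, or 0 if they do not fit
 * (0 is also returned for len == 0). */
static size_t write_rle(unsigned char **op, const unsigned char *oend,
                        unsigned char byte, size_t len)
{
    if (len > remaining(*op, oend)) return 0;
    memset(*op, byte, len);
    *op += len;
    assert(*op <= oend); /* the invariant still holds after the write */
    return len;
}
```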

* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here: https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied (left hand side is the good one, right is with vectorizer on): https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.

silesia benchmark data ("new" is with this diff, "orig" is with the original code; speeds in MB/s):

| level#file | orig size | compressed size | ratio | encode (new) | decode (new) | encode (orig) | decode (orig) | decode change |
|---|---|---|---|---|---|---|---|---|
| 1#dickens | 10192446 | 4268865 | 2.388 | 198.9 | 709.6 | 195.4 | 659.5 | 7.60% |
| 2#dickens | 10192446 | 3876126 | 2.630 | 128.7 | 552.5 | 127 | 516.3 | 7.01% |
| 3#dickens | 10192446 | 3682956 | 2.767 | 104.6 | 537 | 105 | 479.5 | 11.99% |
| 1#mozilla | 51220480 | 20117517 | 2.546 | 285.4 | 734.9 | 283.4 | 697.9 | 5.30% |
| 2#mozilla | 51220480 | 19067018 | 2.686 | 220.8 | 686.3 | 225.9 | 665 | 3.20% |
| 3#mozilla | 51220480 | 18508283 | 2.767 | 152.2 | 669.4 | 154.5 | 640.6 | 4.50% |
| 1#mr | 9970564 | 3840242 | 2.596 | 262.4 | 899.8 | 253.2 | 827.3 | 8.76% |
| 2#mr | 9970564 | 3600976 | 2.769 | 181.2 | 717.9 | 177.4 | 655.4 | 9.54% |
| 3#mr | 9970564 | 3563987 | 2.798 | 116.3 | 620 | 111.2 | 564.2 | 9.89% |
| 1#nci | 33553445 | 2849306 | 11.78 | 575.2 | 1335.8 | 565.4 | 1220.2 | 9.47% |
| 2#nci | 33553445 | 2890166 | 11.61 | 509.3 | 1238.1 | 508.2 | 1128.4 | 9.72% |
| 3#nci | 33553445 | 2857408 | 11.74 | 431 | 1210.7 | 429.1 | 1097.7 | 10.29% |
| 1#ooffice | 6152192 | 3590954 | 1.713 | 231.4 | 662.6 | 224.7 | 624.2 | 6.15% |
| 2#ooffice | 6152192 | 3323931 | 1.851 | 162.8 | 592.6 | 155 | 564.5 | 4.98% |
| 3#ooffice | 6152192 | 3145625 | 1.956 | 99.9 | 549.6 | 101.1 | 521.2 | 5.45% |
| 1#osdb | 10085684 | 3739042 | 2.697 | 271.9 | 876.4 | 257.4 | 793.8 | 10.41% |
| 2#osdb | 10085684 | 3493875 | 2.887 | 208.2 | 857 | 209.7 | 776.1 | 10.42% |
| 3#osdb | 10085684 | 3515831 | 2.869 | 135.3 | 805.4 | 130.6 | 727.7 | 10.68% |
| 1#reymont | 6627202 | 2152771 | 3.078 | 198.9 | 696.2 | 199.6 | 655.2 | 6.26% |
| 2#reymont | 6627202 | 2071140 | 3.200 | 170 | 595.2 | 168.2 | 554.4 | 7.36% |
| 3#reymont | 6627202 | 1953597 | 3.392 | 128.5 | 609.7 | 128.7 | 557.4 | 9.38% |
| 1#samba | 21606400 | 5510994 | 3.921 | 338.1 | 1066 | 330.8 | 974 | 9.45% |
| 2#samba | 21606400 | 5240208 | 4.123 | 258.7 | 992.3 | 257.9 | 919.4 | 7.93% |
| 3#samba | 21606400 | 5003358 | 4.318 | 200.2 | 991.1 | 198.5 | 908.9 | 9.04% |
| 1#sao | 7251944 | 6256401 | 1.159 | 194.6 | 602.2 | 198.7 | 580.7 | 3.70% |
| 2#sao | 7251944 | 5808761 | 1.248 | 128.2 | 532.1 | 129.1 | 502.7 | 5.85% |
| 3#sao | 7251944 | 5556318 | 1.305 | 73 | 509.4 | 74.6 | 493.1 | 3.31% |
| 1#webster | 41458703 | 13692222 | 3.028 | 222.3 | 752 | 219.7 | 697 | 7.89% |
| 2#webster | 41458703 | 12842646 | 3.228 | 157.6 | 532.2 | 153.9 | 495.4 | 7.43% |
| 3#webster | 41458703 | 12191964 | 3.400 | 124 | 468.5 | 124.8 | 444.8 | 5.33% |
| 1#xml | 5345280 | 696652 | 7.673 | 485 | 1333.9 | 473.1 | 1232.4 | 8.24% |
| 2#xml | 5345280 | 681492 | 7.843 | 405.2 | 1237.5 | 398.6 | 1145.9 | 7.99% |
| 3#xml | 5345280 | 639057 | 8.364 | 328.5 | 1281.3 | 327.1 | 1175 | 9.05% |
| 1#x-ray | 8474240 | 6772557 | 1.251 | 521.3 | 762.6 | 502.8 | 736.7 | 3.52% |
| 2#x-ray | 8474240 | 6684531 | 1.268 | 230.5 | 688.5 | 224.4 | 662 | 4.00% |
| 3#x-ray | 8474240 | 6166679 | 1.374 | 68.7 | 478.8 | 67.3 | 437.8 | 9.37% |
| average | | | | | | | | 7.51% |

* makefile changed to only pass -fno-tree-vectorize to gcc
* Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
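
For readers unfamiliar with the "wildcopy" idea mentioned in this commit message, the following is a rough sketch of the general technique, not the code from this PR. The name `wildcopy_sketch`, the 8-byte `CHUNK` size, and the slack requirement are assumptions made for illustration. The copy proceeds in fixed-size chunks and may write past `length`, which lets the common case finish in a single loop trip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 8 /* illustrative chunk size, chosen for this sketch */

/* Copy at least `length` bytes from src to dst in CHUNK-sized steps.
 * The copy may overrun `length` up to the next chunk boundary, so the
 * caller must guarantee CHUNK bytes of slack after both buffers, and that
 * src and dst chunks do not overlap (memcpy is used per chunk). */
static void wildcopy_sketch(unsigned char *dst, const unsigned char *src,
                            size_t length)
{
    unsigned char *const end = dst + length;
    do {
        memcpy(dst, src, CHUNK); /* fixed size: compiles to plain loads/stores */
        dst += CHUNK;
        src += CHUNK;
    } while (dst < end);
}
```

Because the chunk size is a compile-time constant and the loop is a `do`/`while`, short copies pay no per-byte branching, which matches the observation above that the trip count is rarely above 1.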

* [ldm] Fix bug in overflow correction with large job size
* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)
* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so I had to comment it out.

…essed Content

This changes the size limit on compressed blocks to match those of the other block types: they may not be larger than the `Block_Maximum_Decompressed_Size`, which is the smaller of the `Window_Size` and 128 KB, removing the additional restriction that had been placed on `Compressed_Block`s, that they be smaller than the decompressed content they represent.

Several things motivate removing this restriction. On the one hand, this restriction is not useful for decoders: the decoder must nonetheless be prepared to accept compressed blocks that are the full `Block_Maximum_Decompressed_Size`. And on the other, this bound is actually artificially limiting. If block representations were entirely independent, a compressed representation of a block that is larger than the contents of the block would be ipso facto useless, and it would be strictly better to send it as a `Raw_Block`. However, blocks are not entirely independent, and it can make sense to pay the cost of encoding custom entropy tables in a block, even if that pushes that block size over the size of the data it represents, because those tables can be re-used by subsequent blocks.

Finally, as far as I can tell, this restriction in the spec is not currently enforced in any Zstandard implementation, nor has it ever been. This change should therefore be safe to make.
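
The size rule described above can be written out as a small check. This is a sketch under stated assumptions, not decoder code from the library: `block_max_decompressed_size` and `compressed_block_size_ok` are hypothetical helper names, and only the bound itself (the smaller of `Window_Size` and 128 KB) comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE_CAP ((size_t)128 * 1024) /* the 128 KB cap from the text */

/* Block_Maximum_Decompressed_Size: the smaller of Window_Size and 128 KB. */
static size_t block_max_decompressed_size(size_t window_size)
{
    return window_size < BLOCK_SIZE_CAP ? window_size : BLOCK_SIZE_CAP;
}

/* After the spec change, a compressed block only has to fit within the
 * same bound as the other block types; its size is no longer compared
 * against the size of the content it decompresses to. */
static int compressed_block_size_ok(size_t block_size, size_t window_size)
{
    return block_size <= block_max_decompressed_size(window_size);
}
```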

terrelln and others added 30 commits April 17, 2019 11:29

[libzstd] Fix ZSTD_decompressBound() on bad skippable frames

450feb0

The function didn't verify that the skippable frame size is correct.

[fuzzer] Add a fuzzer for frame info functions

09caa4d

Add a fuzzer that fuzzes all helper functions that take compressed input. This fuzzer caught one out of bounds read in `ZSTD_decompressBound()`.

[legacy] Return the right error code

5922f4e

[libzstd] Check the size in readSkippableFrameSize()

ee130a9

[fuzz] Add a seedcorpora target for oss-fuzz

58bcc32

Merge pull request #1590 from terrelln/frame-info-fuzz

af3531e

[fuzzer] Fuzz frame info functions

[fuzzer] Size the decompression output buffer randomly

cc66900

[fuzzer] Compile with legacy support

610a81e

[legacy] Fix a bug in ZSTDv06_findFrameSizeInfoLegacy()

ac098c7

[legacy] Fix bug in ZSTD_decodeSeqHeaders()

579f3d7

[paramgrill] Fix mingw build errors

785331a

Merge pull request #1593 from terrelln/legacy-fix

a8db4bd

[fuzzer] Run fuzzers in legacy mode and fix legacy code

[legacy] Fix Huffman jump table reads in v01 and v05

2536771

Merge pull request #1594 from terrelln/legacy-fix

9ad7ea4

[legacy] Fix Huffman jump table reads in v01 and v05

Merge pull request #1595 from terrelln/legacy-fix

b758250

[legacy] Fix ZSTDv0*_decodeSequence()

[libzstd] Error if all sequence bits aren't consumed

a892e25

[libzstd] Add a ZSTD_STATIC_ASSERT for BIT_DStream_status

5f228f8

Merge pull request #1598 from terrelln/decode-seq

3d673f3

[libzstd] Error if all sequence bits aren't consumed

Fix #1591 - Not building on MinGW-W64

f837326

Add a static function LONG_TELL for the fourth #if branch

Merge pull request #1600 from myzhang1029/long-tell

585b5a1

Fix #1591 - Not building on MinGW-W64

meson: Update default project version

4107b73

* Update to use ninja v1.9.0 on CI

fix test fail

bee9e5f

set followLinks option true to cat, zcat and gzcat programs

c4a40db

Merge pull request #1603 from sjnam/examples-test-fail

9ef732f

fails to "make test" in examples

add test for zstdcat and zcat on symlink

3e1e49d

Merge pull request #1604 from vejnar/dev

69baaee

Make zstdcat to follow symbolic links

meson: Error out if fail to extracted version number

5d900ff

Merge pull request #1602 from lzutao/meson

f8178ec

meson: Update default project version

ephiepark and others added 25 commits July 1, 2019 10:17

reflect code review comments

2830952

Merge pull request #7 from ephiepark/decodecorpus

01f5b5d

reflect code review comments

Merge pull request #1664 from ephiepark/dev

4d611ca

decodecorpus

Merge pull request #1658 from facebook/memset

857e608

memset() rather than reduceIndex()

[fuzz] Add a compression fuzzer with randomly sized output buffer (#1670)

e962f07

Adding targetCBlockSize param

9007701

Factor out the logic to build sequences

f57ac7b

Merge pull request #1671 from ephiepark/dev

096714d

Adding targetCBlockSize param

fix gitignore errors

654cb9d

updated version number (to v1.4.1)

b8ec4b0

also : added doc on context re-use, as suggested by @scherepanov at #1676

Merge pull request #1677 from LeeYoung624/gitignore_fix

8eda16c

fix gitignore errors

updated .gitignore rule

34a1a37

updated .gitignore

2387d57

Merge pull request #1675 from ephiepark/dev

b01c1c6

Factor out the logic to build sequences

updated double_fast complementary insertion

d132773

in a way which is more favorable to compression ratio, though very slightly slower (~-1%). More details in the PR.

double-fast: changed the trade-off for a smaller positive change

e8a7f5d

same number of complementary insertions, just organized differently (long at `ip-2`, short at `ip-1`).

updated the _extDict variant of double fast

eaeb7f0

Merge pull request #1681 from facebook/level3

8fb08b6

updated double_fast complementary insertion

[regression] Update results for ZSTD_double_fast update

4c2943d

Merge pull request #1684 from terrelln/regression

f7d5694

[regression] Update results for ZSTD_double_fast update

[doc] Bump Format Spec Version

a2861d7

facebook-github-bot added the CLA Signed label Jul 18, 2019

Cyan4973 approved these changes Jul 18, 2019

felixhandte and others added 2 commits July 19, 2019 11:18

Update CHANGELOG with v1.4.1 Changes

62a0dc5

Merge pull request #1692 from felixhandte/v1.4.1-changelog

d636cd1

Update CHANGELOG with v1.4.1 Changes

felixhandte merged commit 52181f8 into master Jul 19, 2019

This pull request was closed.
