[ldm] Fix bug in overflow correction with large job size #1678

Merged: 3 commits into facebook:dev on Jul 12, 2019

Conversation

terrelln (Contributor) commented on Jul 9, 2019

src is the wrong pointer here. When the job size is 2 GB, the second job does overflow correction, but src - base == 2 GB for every chunk. So whenever we do overflow correction we compute (current & cycleMask) + maxDist == (2 GB & (2^26 - 1)) + 2 GB == 2 GB, meaning we effectively never correct the window during that job. Then the next job overflows the 32-bit window indices, and things go terribly wrong.

Instead, we need to pass the beginning of the current chunk, which is only 1 MB large and is well under the maximum chunk size of 512 MB.
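
To make the arithmetic concrete, here is a minimal sketch (not the actual zstd code) of the correction computation with the values from this scenario: windowLog 31 so maxDist is 2 GB, a cycle log of 26, and the stale current == src - base == 2 GB:

```cpp
// Minimal sketch of the overflow-correction arithmetic described above.
// This is not the real zstd implementation; the constants just mirror the
// scenario in this PR (windowLog 31, cycleLog 26, src - base == 2 GB).
#include <cstdint>
#include <cstdio>

int main() {
    std::uint32_t const cycleLog   = 26;
    std::uint32_t const cycleMask  = (1U << cycleLog) - 1;
    std::uint32_t const maxDist    = 1U << 31;  // windowLog 31 => 2 GB
    std::uint32_t const current    = 1U << 31;  // src - base == 2 GB, never advances

    std::uint32_t const newCurrent = (current & cycleMask) + maxDist;
    std::uint32_t const correction = current - newCurrent;

    // Prints 0: the window is never corrected, so the next job's indices overflow.
    std::printf("correction = %u\n", (unsigned)correction);
    return 0;
}
```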

There are asserts that fail before this change and don't fail after it, but I accidentally defined the wrong macro during the build for my initial tests.

The test I added to playTests.sh hits assertion failures and produces corrupted data before this change. The test I added to zstreamtest.c doesn't fail before the change, but it exercises the max job size and window log, which is useful.
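
For reference, a rough sketch (not the test added in this PR) of how the advanced API can pin the job size and window log to their upper bounds for such a check; it assumes a multithreaded libzstd build, and note that a maximal window log makes the streaming context allocate a window on the order of 2 GB:

```cpp
// Hedged sketch of a zstreamtest-style check: query the upper bounds of
// ZSTD_c_jobSize and ZSTD_c_windowLog, then run a small streaming compression
// with them. Assumes libzstd was built with multithreading support.
#include <zstd.h>
#include <cassert>
#include <vector>

int main() {
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    assert(cctx != nullptr);

    ZSTD_bounds const jobBounds  = ZSTD_cParam_getBounds(ZSTD_c_jobSize);
    ZSTD_bounds const wlogBounds = ZSTD_cParam_getBounds(ZSTD_c_windowLog);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 2);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_jobSize, jobBounds.upperBound);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, wlogBounds.upperBound);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);

    std::vector<char> const src(1 << 20, 'a');            // 1 MB of compressible data
    std::vector<char> dst(ZSTD_compressBound(src.size()));
    ZSTD_inBuffer  in  = { src.data(), src.size(), 0 };
    ZSTD_outBuffer out = { dst.data(), dst.size(), 0 };

    size_t remaining;
    do {                                                  // flush until the frame is complete
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        assert(!ZSTD_isError(remaining));
    } while (remaining != 0);

    ZSTD_freeCCtx(cctx);
    return 0;
}
```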

Fixes #1672.

@@ -1028,6 +1028,55 @@ static int basicUnitTests(U32 seed, double compressibility)
}
DISPLAYLEVEL(3, "OK \n");

/* Large source and large job size multithreading compression test */
Contributor:
how long does this test last?

Contributor Author (terrelln):
Less than 10 seconds.

Contributor:
It's a very reasonable duration for testing such a condition, but for this section of the test suite (basicUnitTests() in zstreamtest) it might be a bit too long: we ideally want each test in basicUnitTests() to take a fraction of a second, so that the sum of all these tests stays within an "acceptable" range to be run as a preamble.

Any chance this could become an equivalent test under --test-large-data in playTests.sh? That's generally where we put all the long tests with large files, which makes it possible to disable them selectively.

One downside is that playTests.sh uses the CLI rather than the API. At first look, it seems it could nonetheless be compatible with the test below?

Contributor Author (terrelln):
We can't control the job size on the CLI except implicitly. I also have a test there that exposes the bug.

I think we should leave the test there for now, because we don't currently have a better place.

But I think we should invest some time in switching to gtest for our unit tests, since it is the standard for C/C++ testing. There are several benefits:

  • All the tests can start with a clean environment and share setup/teardown code. We can reset or recreate contexts between tests, populate input buffers, etc.
  • gtest has great helpers for printing why a test failed.
  • We can do fancy things like run the same test on multiple types of data (if we want).
  • When one test fails, it still runs all the other tests.
  • It is the standard, so on-boarding engineers and open source contributors will be easier.

The downside is that we have a C++/cmake dependency for testing (gtest can be a submodule). I think that is reasonable since it is only needed for tests.
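
For illustration, a minimal sketch of what a gtest-based zstd test could look like, assuming gtest is pulled in as a submodule; the fixture name and the round-trip check are made up for the example, not an agreed design:

```cpp
// Hypothetical gtest fixture: every test starts from a fresh ZSTD_CCtx
// (shared setup/teardown), and failures print the offending error names.
#include <gtest/gtest.h>
#include <vector>
#include <zstd.h>

class ZstdCCtxTest : public ::testing::Test {
protected:
    void SetUp() override {
        cctx_ = ZSTD_createCCtx();
        ASSERT_NE(cctx_, nullptr);
    }
    void TearDown() override { ZSTD_freeCCtx(cctx_); }
    ZSTD_CCtx* cctx_ = nullptr;
};

TEST_F(ZstdCCtxTest, RoundTripSmallInput) {
    std::vector<char> const src(1024, 'a');
    std::vector<char> dst(ZSTD_compressBound(src.size()));
    size_t const csize = ZSTD_compress2(cctx_, dst.data(), dst.size(),
                                        src.data(), src.size());
    ASSERT_FALSE(ZSTD_isError(csize)) << ZSTD_getErrorName(csize);

    std::vector<char> out(src.size());
    size_t const dsize = ZSTD_decompress(out.data(), out.size(), dst.data(), csize);
    ASSERT_FALSE(ZSTD_isError(dsize)) << ZSTD_getErrorName(dsize);
    EXPECT_EQ(dsize, src.size());
    EXPECT_EQ(out, src);
}
```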

Contributor:
Agreed!

Contributor Author (terrelln):
I ended up removing the test because it fails in 32-bit mode with ASAN.

When we introduce gtest, we'll have to skip the "expensive" tests in 32-bit ASAN mode, like we do with playTests.sh.
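
Something along these lines could express that skip in gtest itself (the test name and detection macros are illustrative, and GTEST_SKIP() needs a reasonably recent googletest):

```cpp
// Hypothetical example: skip a memory-hungry case on 32-bit or ASAN builds,
// mirroring what playTests.sh does for its "expensive" tests today.
#include <gtest/gtest.h>

#if defined(__SANITIZE_ADDRESS__)
#  define ZSTD_TEST_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define ZSTD_TEST_ASAN 1
#  endif
#endif
#ifndef ZSTD_TEST_ASAN
#  define ZSTD_TEST_ASAN 0
#endif

TEST(LargeJobSize, OverflowCorrection) {
    if (sizeof(void*) < 8 || ZSTD_TEST_ASAN) {
        GTEST_SKIP() << "too memory-hungry for 32-bit or ASAN builds";
    }
    // ... the large-source, large-job-size test body would go here ...
}
```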

terrelln force-pushed the ldm-fix branch 2 times, most recently from b7d0d83 to 6f974e6 on July 10, 2019 at 21:29.
terrelln force-pushed the ldm-fix branch 4 times, most recently from e71bde8 to bfbb287 on July 11, 2019 at 22:41.
Sadly the test fails on our CI because it uses too much memory, so I had to comment it out.
Cyan4973 (Contributor) left a comment:
Looks great to me!

terrelln merged commit 75cfe1d into facebook:dev on Jul 12, 2019.

Successfully merging this pull request may close these issues.

Incompatibility between --long=31 and -22 --ultra?
3 participants