[ldm] Fix bug in overflow correction with large job size #1678

Merged: 3 commits into facebook:dev on Jul 12, 2019

Conversation

terrelln (Contributor) commented on Jul 9, 2019

src is the wrong pointer here. When the job size is 2 GB, the second job does overflow correction, but src - base == 2 GB for every chunk. So whenever we do overflow correction we compute (current & cycleMask) + maxDist == (2 GB & (2^26 - 1)) + 2 GB == 2 GB, meaning we effectively never correct the window during that job. Then the next job overflows the 32-bit window indices, and things go terribly wrong.

Instead, we need to pass the beginning of the current chunk, which is only 1 MB large and is well under the maximum chunk size of 512 MB.
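
To make the arithmetic concrete, here is a minimal sketch (not the actual zstd code) of the correction computation with the values from this scenario: windowLog 31 so maxDist is 2 GB, a cycle log of 26, and the stale current == src - base == 2 GB:

```cpp
// Minimal sketch of the overflow-correction arithmetic described above.
// This is not the real zstd implementation; the constants just mirror the
// scenario in this PR (windowLog 31, cycleLog 26, src - base == 2 GB).
#include <cstdint>
#include <cstdio>

int main() {
    std::uint32_t const cycleLog   = 26;
    std::uint32_t const cycleMask  = (1U << cycleLog) - 1;
    std::uint32_t const maxDist    = 1U << 31;  // windowLog 31 => 2 GB
    std::uint32_t const current    = 1U << 31;  // src - base == 2 GB, never advances

    std::uint32_t const newCurrent = (current & cycleMask) + maxDist;
    std::uint32_t const correction = current - newCurrent;

    // Prints 0: the window is never corrected, so the next job's indices overflow.
    std::printf("correction = %u\n", (unsigned)correction);
    return 0;
}
```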

There are asserts that fail before this change and don't fail after it, but I accidentally defined the wrong macro during the build for my initial tests.

The test I added to playTests.sh hits assertion failures and produces corrupted data before this change. The test I added to zstreamtest.c doesn't fail before the change, but it exercises the max job size and window log, which is useful.
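
For reference, a rough sketch (not the test added in this PR) of how the advanced API can pin the job size and window log to their upper bounds for such a check; it assumes a multithreaded libzstd build, and note that a maximal window log makes the streaming context allocate a window on the order of 2 GB:

```cpp
// Hedged sketch of a zstreamtest-style check: query the upper bounds of
// ZSTD_c_jobSize and ZSTD_c_windowLog, then run a small streaming compression
// with them. Assumes libzstd was built with multithreading support.
#include <zstd.h>
#include <cassert>
#include <vector>

int main() {
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    assert(cctx != nullptr);

    ZSTD_bounds const jobBounds  = ZSTD_cParam_getBounds(ZSTD_c_jobSize);
    ZSTD_bounds const wlogBounds = ZSTD_cParam_getBounds(ZSTD_c_windowLog);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 2);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_jobSize, jobBounds.upperBound);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, wlogBounds.upperBound);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);

    std::vector<char> const src(1 << 20, 'a');            // 1 MB of compressible data
    std::vector<char> dst(ZSTD_compressBound(src.size()));
    ZSTD_inBuffer  in  = { src.data(), src.size(), 0 };
    ZSTD_outBuffer out = { dst.data(), dst.size(), 0 };

    size_t remaining;
    do {                                                  // flush until the frame is complete
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        assert(!ZSTD_isError(remaining));
    } while (remaining != 0);

    ZSTD_freeCCtx(cctx);
    return 0;
}
```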

Fixes #1672.

@@ -1028,6 +1028,55 @@ static int basicUnitTests(U32 seed, double compressibility)
}
DISPLAYLEVEL(3, "OK \n");

/* Large source and large job size multithreading compression test */
Contributor:
how long does this test last?

Contributor Author (terrelln):
Less than 10 seconds.

Contributor:
It's a very reasonable duration for testing such a condition, but for this section of the test suite (basicUnitTests() in zstreamtest) it might be a bit too long: we ideally want each test in basicUnitTests() to take a fraction of a second, so that the sum of all these tests stays within an "acceptable" range to be run as a preamble.

Any chance this could become an equivalent test under --test-large-data in playTests.sh? That's generally where we put all the long tests with large files, which makes it possible to disable them selectively.

One downside is that playTests.sh uses the CLI rather than the API. At first look, it seems it could nonetheless be compatible with the test below?

Contributor Author (terrelln):
We can't control the job size on the CLI except implicitly. I also have a test there that exposes the bug.

I think we should leave the test there for now, because we don't currently have a better place.

But I think we should invest some time in switching to gtest for our unit tests, since it is the standard for C/C++ testing. There are several benefits:

  • All the tests can start with a clean environment and share setup/teardown code. We can reset or recreate contexts between tests, populate input buffers, etc.
  • gtest has great helpers for printing why a test failed.
  • We can do fancy things like run the same test on multiple types of data (if we want).
  • When one test fails, it still runs all the other tests.
  • It is the standard, so on-boarding engineers and open source contributors will be easier.

The downside is that we have a C++/cmake dependency for testing (gtest can be a submodule). I think that is reasonable since it is only needed for tests.
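
For illustration, a minimal sketch of what a gtest-based zstd test could look like, assuming gtest is pulled in as a submodule; the fixture name and the round-trip check are made up for the example, not an agreed design:

```cpp
// Hypothetical gtest fixture: every test starts from a fresh ZSTD_CCtx
// (shared setup/teardown), and failures print the offending error names.
#include <gtest/gtest.h>
#include <vector>
#include <zstd.h>

class ZstdCCtxTest : public ::testing::Test {
protected:
    void SetUp() override {
        cctx_ = ZSTD_createCCtx();
        ASSERT_NE(cctx_, nullptr);
    }
    void TearDown() override { ZSTD_freeCCtx(cctx_); }
    ZSTD_CCtx* cctx_ = nullptr;
};

TEST_F(ZstdCCtxTest, RoundTripSmallInput) {
    std::vector<char> const src(1024, 'a');
    std::vector<char> dst(ZSTD_compressBound(src.size()));
    size_t const csize = ZSTD_compress2(cctx_, dst.data(), dst.size(),
                                        src.data(), src.size());
    ASSERT_FALSE(ZSTD_isError(csize)) << ZSTD_getErrorName(csize);

    std::vector<char> out(src.size());
    size_t const dsize = ZSTD_decompress(out.data(), out.size(), dst.data(), csize);
    ASSERT_FALSE(ZSTD_isError(dsize)) << ZSTD_getErrorName(dsize);
    EXPECT_EQ(dsize, src.size());
    EXPECT_EQ(out, src);
}
```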

Contributor:
Agreed!

Contributor Author (terrelln):
I ended up removing the test because it fails in 32-bit mode with ASAN.

When we introduce gtest, we'll have to skip the "expensive" tests in 32-bit ASAN mode, like we do with playTests.sh.
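
Something along these lines could express that skip in gtest itself (the test name and detection macros are illustrative, and GTEST_SKIP() needs a reasonably recent googletest):

```cpp
// Hypothetical example: skip a memory-hungry case on 32-bit or ASAN builds,
// mirroring what playTests.sh does for its "expensive" tests today.
#include <gtest/gtest.h>

#if defined(__SANITIZE_ADDRESS__)
#  define ZSTD_TEST_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define ZSTD_TEST_ASAN 1
#  endif
#endif
#ifndef ZSTD_TEST_ASAN
#  define ZSTD_TEST_ASAN 0
#endif

TEST(LargeJobSize, OverflowCorrection) {
    if (sizeof(void*) < 8 || ZSTD_TEST_ASAN) {
        GTEST_SKIP() << "too memory-hungry for 32-bit or ASAN builds";
    }
    // ... the large-source, large-job-size test body would go here ...
}
```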

terrelln force-pushed the ldm-fix branch 2 times, most recently from b7d0d83 to 6f974e6 on July 10, 2019 at 21:29.
terrelln force-pushed the ldm-fix branch 4 times, most recently from e71bde8 to bfbb287 on July 11, 2019 at 22:41.
Sadly the test fails on our CI because it uses too much memory, so I had to comment it out.
Cyan4973 (Contributor) left a comment:
Looks great to me!

terrelln merged commit 75cfe1d into facebook:dev on Jul 12, 2019.

Successfully merging this pull request may close these issues.

Incompatibility between --long=31 and -22 --ultra?
3 participants