[ldm] Fix bug in overflow correction with large job size #1678
Conversation
tests/zstreamtest.c (outdated)

```diff
@@ -1028,6 +1028,55 @@ static int basicUnitTests(U32 seed, double compressibility)
     }
     DISPLAYLEVEL(3, "OK \n");

+    /* Large source and large job size multithreading compression test */
```
How long does this test last?
Less than 10 seconds
It's a very reasonable duration to test such a condition, but for this section of the test suite (`basicUnitTests()` in `zstreamtest`), it might be a bit too long: we ideally want all tests in `basicUnitTests()` to take a fraction of a second each, so that the sum of all these tests remains in the "acceptable" range to run as a preamble test.
Any chance this could become an equivalent test in the `--test-large-data` section of `playTests.sh`? That's generally where we put all long tests with large files, making it possible to disable them selectively.
One downside is that `playTests.sh` uses the CLI rather than the API. On first look, it seems it could nonetheless be compatible with the test below?
We can't control the job size on the CLI except implicitly. I also have a test there that exposes the bug.
I think we should leave the test here for now, because we don't currently have a better place.
But I think we should invest some time in switching to gtest for our unit tests. It is the standard for C/C++ tests. There are a couple of benefits:
- All the tests can start with a clean environment and share setup/teardown code. We can reset or recreate contexts between tests, populate input buffers, etc.
- gtest has great helpers for printing why a test failed.
- We can do fancy things like running the same test on multiple types of data (if we want).
- When one test fails, all the other tests still run.
- It is the standard, so onboarding engineers and open-source contributors will be easier.

The downside is that we add a C++/cmake dependency for testing (gtest can be a submodule). I think that is reasonable since it is only needed for tests.
Agreed!
I ended up removing the test because it is failing in 32-bit mode with ASAN.
When we introduce gtest we'll have to skip the "expensive" tests in 32-bit ASAN mode like we do with playTests.sh.
Force-pushed from b7d0d83 to 6f974e6.
Force-pushed from e71bde8 to bfbb287.
Sadly the test fails on our CI because it uses too much memory, so I had to comment it out.
Looks great to me!
`src` is the wrong pointer here. When the job size is 2 GB, the second job does overflow correction, but `src - base == 2 GB` for every chunk. When we do overflow correction we get `current & cycleLog + maxDist == 2 GB & 2^26 + 2 GB == 2 GB`, meaning we effectively never do overflow correction for that job. Then the next job overflows, and things go terribly wrong.
Instead, we need to pass the beginning of the current chunk, which is only 1 MB large and well under the maximum chunk size of 512 MB.
There are asserts that failed before this change and don't fail after, but I accidentally defined the wrong macro during the build for my initial tests.
The test I added to `playTests.sh` has assertion failures and produces corrupted data before this change. The test I added to `zstreamtest.c` doesn't fail before the change, but exercises the max job size and window log, which is useful.
Fixes #1672.