Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[linux-kernel] Update patches for v2 #738

Merged
merged 1 commit into from Jun 29, 2017
Merged

[linux-kernel] Update patches for v2 #738

merged 1 commit into from Jun 29, 2017

Conversation

@terrelln
Copy link
Contributor

terrelln commented Jun 27, 2017

  • Use div_u64 for division of u64s (2/4)
  • Reduce stack usage of FSE_buildCTable_wksp(), FSE_compress_wksp, and HUF_readDTableX4 (2/4)
@terrelln
Copy link
Contributor Author

terrelln commented Jun 27, 2017

I'm going to reduce the stack usage of a few more functions.

U32 tableLog = 0;
U32 nbSymbols = 0;
size_t readSize;

if ((HUF_SYMBOLVALUE_MAX + 1) * (sizeof(U32) + sizeof(BYTE)) > workspaceSize)

This comment has been minimized.

@terrelln

terrelln Jun 27, 2017

Author Contributor

This is wrong, fixing it.

U32 count[MaxSeq + 1];
S16 norm[MaxSeq + 1];
U32 *count = (U32 *)zc->tmpCounters;
S16 *norm = (S16 *)(count + MaxSeq + 1);

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 27, 2017

Contributor

I would recommend to re-order space
to have U32 areas first, followed by U16, then BYTE.
It naturally solves alignment risks.

This comment has been minimized.

@terrelln

terrelln Jun 27, 2017

Author Contributor

Since the next usage is inside of FSE_.*() functions, to make that work here I would have to put norm at the end of the workspace. Do you think its worth it?

I'll definitely update the other places though.

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 28, 2017

Contributor

-Wcast-align is supposed to trigger a warning if one upgrade a U16 pointer into a U32 one,
even if you manually take good care of address alignment,
which is what happens below when norm get used as a base for workspace.

The order in memory can be different.
You can keep norm right after count if it's better,
but workspace will have to be based from count, because it is also a U32.

workspace = rankStart0 + HUF_TABLELOG_MAX + 2;
if ((BYTE *)workspace > wkspEnd)
return ERROR(tableLog_tooLarge);
workspaceSize = wkspEnd - (BYTE *)workspace;

This comment has been minimized.

@terrelln

terrelln Jun 27, 2017

Author Contributor

Fixing formatting.

size_t NCountLength;

workspace = counting + FSE_MAX_SYMBOL_VALUE + 1;
if ((BYTE *)workspace > wkspEnd)

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 27, 2017

Contributor

In theory, such comparison is UB, because to be true, workspace must be outside of valid memory object.
Would it be possible to convert this test
into a size comparison ?
Something like if ( ((FSE_MAX_SYMBOL_VALUE+1)*sizeof(short)) > workspaceSize ) ?

U32 count[HUF_TABLELOG_MAX + 1];
S16 norm[HUF_TABLELOG_MAX + 1];
workspace = norm + HUF_TABLELOG_MAX + 1 + ((HUF_TABLELOG_MAX + 1) & 1);
if ((BYTE *)workspace > wkspEnd)

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 27, 2017

Contributor

same comment here, about size comparison

U32 tableLog = 0;
U32 nbSymbols = 0;
size_t iSize;
void *const dtPtr = DTable + 1;
HUF_DEltX2 *const dt = (HUF_DEltX2 *)dtPtr;

workspace = rankVal + HUF_TABLELOG_ABSOLUTEMAX + 1;
if ((BYTE *)workspace > wkspEnd)

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 27, 2017

Contributor

and same here

if ((BYTE *)(rankVal + HUF_TABLELOG_MAX) > wkspEnd)

workspace = rankStart0 + HUF_TABLELOG_MAX + 2;
if ((BYTE *)workspace > wkspEnd)

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 27, 2017

Contributor

well...

CXXFLAGS += -std=c++11
CFLAGS += -g -O0
CXXFLAGS += -std=c++11 -g -O3
CFLAGS += -g -O3 -Wframe-larger-than=400

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 27, 2017

Contributor

why 400 ?
I expected linux kernel robot to test 1024 ?

This comment has been minimized.

@terrelln

terrelln Jun 28, 2017

Author Contributor

Yeah it does, but the reviewer was concerned about stack usage, since the kernel stack is 16 KB at most. There were a few functions between 512 and 1024 bytes, so I fixed them up as well. After that, they were all below 400.

S16 norm[MaxSeq + 1];
U32 *count = (U32 *)zc->tmpCounters;
S16 *norm = (S16 *)(count + MaxSeq + 1);
U32 *const workspace = PTR_ALIGN((U32 *)(norm + MaxSeq + 1), sizeof(U32));

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 28, 2017

Contributor

I know this is supposed to work, since PTR_ALIGN makes a good job at properly aligning the resulting ptr,
but still, from a compiler perspective, promoting a U16* into a U32* is ground for a -Wcast-align warning :
https://godbolt.org/g/vy7fR3

This comment has been minimized.

@terrelln

terrelln Jun 28, 2017

Author Contributor

Yeah, wrote this before I saw your second comment. I've fixed it so that it doesn't use the U16 * to derive the U32 *.

{
BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1];
U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 28, 2017

Contributor

minor : HUF_TABLELOG_ABSOLUTEMAX is quite small, it shouldn't worry stack size

spaceUsed += sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1);
weightList = (BYTE *)workspace + spaceUsed;
spaceUsed += HUF_SYMBOLVALUE_MAX + 1;
spaceUsed = ALIGN(spaceUsed, sizeof(U32));

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 28, 2017

Contributor

👍

{
BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; /* init not required, even though some static analyzer may complain */
U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 28, 2017

Contributor

minor : HUF_TABLELOG_ABSOLUTEMAX is quite small, it shouldn't worry stack size

This comment has been minimized.

@terrelln

terrelln Jun 28, 2017

Author Contributor

Yeah, but since we are already passing the workspace for huffWeight and HUF_readStats_wksp(), we might as well take advantage of it to reduce the stack a bit further.

@terrelln terrelln force-pushed the terrelln:lk-div branch from 5b6b77f to 18683b9 Jun 28, 2017
@terrelln
Copy link
Contributor Author

terrelln commented Jun 28, 2017

The patch is ready for review again. I've fixed the bugs in the stack computations that my fuzzers revealed. The fuzzers are still running.

spaceUsed += sizeof(S16) * (MaxSeq + 1);
spaceUsed = ALIGN(spaceUsed, sizeof(U32));

workspace = (U32 *)((BYTE *)zc->tmpCounters + spaceUsed);

This comment has been minimized.

@Cyan4973

Cyan4973 Jun 28, 2017

Contributor

If a U32* ptr is created from a BYTE* ptr,
even if the initial source was a U32*,
it's still a concern for -Wcast-align.

See "doubleCast" example :
https://godbolt.org/g/qrWbLq

Solution : express spaceUsed in U32 increment.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Jun 28, 2017

There are still some alignment promotion issues in many places in the code.
I've only commented one of them, but the same concern and solution is applicable to the rest of the code.

@terrelln terrelln force-pushed the terrelln:lk-div branch from 18683b9 to ebbda16 Jun 29, 2017
* Reduce stack usage of many zstd functions, none use over 388 B anymore.
* Remove an incorrect `const` in `xxhash`.
@terrelln terrelln force-pushed the terrelln:lk-div branch from ebbda16 to ce91b64 Jun 29, 2017
@Cyan4973 Cyan4973 merged commit 82879d5 into facebook:dev Jun 29, 2017
3 checks passed
3 checks passed
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.