Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign up[linux-kernel] Update patches for v2 #738
Conversation
|
I'm going to reduce the stack usage of a few more functions. |
| U32 tableLog = 0; | ||
| U32 nbSymbols = 0; | ||
| size_t readSize; | ||
|
|
||
| if ((HUF_SYMBOLVALUE_MAX + 1) * (sizeof(U32) + sizeof(BYTE)) > workspaceSize) |
This comment has been minimized.
This comment has been minimized.
| U32 count[MaxSeq + 1]; | ||
| S16 norm[MaxSeq + 1]; | ||
| U32 *count = (U32 *)zc->tmpCounters; | ||
| S16 *norm = (S16 *)(count + MaxSeq + 1); |
This comment has been minimized.
This comment has been minimized.
Cyan4973
Jun 27, 2017
•
Contributor
I would recommend to re-order space
to have U32 areas first, followed by U16, then BYTE.
It naturally solves alignment risks.
This comment has been minimized.
This comment has been minimized.
terrelln
Jun 27, 2017
Author
Contributor
Since the next usage is inside of FSE_.*() functions, to make that work here I would have to put norm at the end of the workspace. Do you think its worth it?
I'll definitely update the other places though.
This comment has been minimized.
This comment has been minimized.
Cyan4973
Jun 28, 2017
Contributor
-Wcast-align is supposed to trigger a warning if one upgrade a U16 pointer into a U32 one,
even if you manually take good care of address alignment,
which is what happens below when norm get used as a base for workspace.
The order in memory can be different.
You can keep norm right after count if it's better,
but workspace will have to be based from count, because it is also a U32.
| workspace = rankStart0 + HUF_TABLELOG_MAX + 2; | ||
| if ((BYTE *)workspace > wkspEnd) | ||
| return ERROR(tableLog_tooLarge); | ||
| workspaceSize = wkspEnd - (BYTE *)workspace; |
This comment has been minimized.
This comment has been minimized.
| size_t NCountLength; | ||
|
|
||
| workspace = counting + FSE_MAX_SYMBOL_VALUE + 1; | ||
| if ((BYTE *)workspace > wkspEnd) |
This comment has been minimized.
This comment has been minimized.
Cyan4973
Jun 27, 2017
Contributor
In theory, such comparison is UB, because to be true, workspace must be outside of valid memory object.
Would it be possible to convert this test
into a size comparison ?
Something like if ( ((FSE_MAX_SYMBOL_VALUE+1)*sizeof(short)) > workspaceSize ) ?
| U32 count[HUF_TABLELOG_MAX + 1]; | ||
| S16 norm[HUF_TABLELOG_MAX + 1]; | ||
| workspace = norm + HUF_TABLELOG_MAX + 1 + ((HUF_TABLELOG_MAX + 1) & 1); | ||
| if ((BYTE *)workspace > wkspEnd) |
This comment has been minimized.
This comment has been minimized.
| U32 tableLog = 0; | ||
| U32 nbSymbols = 0; | ||
| size_t iSize; | ||
| void *const dtPtr = DTable + 1; | ||
| HUF_DEltX2 *const dt = (HUF_DEltX2 *)dtPtr; | ||
|
|
||
| workspace = rankVal + HUF_TABLELOG_ABSOLUTEMAX + 1; | ||
| if ((BYTE *)workspace > wkspEnd) |
This comment has been minimized.
This comment has been minimized.
| if ((BYTE *)(rankVal + HUF_TABLELOG_MAX) > wkspEnd) | ||
|
|
||
| workspace = rankStart0 + HUF_TABLELOG_MAX + 2; | ||
| if ((BYTE *)workspace > wkspEnd) |
This comment has been minimized.
This comment has been minimized.
| CXXFLAGS += -std=c++11 | ||
| CFLAGS += -g -O0 | ||
| CXXFLAGS += -std=c++11 -g -O3 | ||
| CFLAGS += -g -O3 -Wframe-larger-than=400 |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
terrelln
Jun 28, 2017
Author
Contributor
Yeah it does, but the reviewer was concerned about stack usage, since the kernel stack is 16 KB at most. There were a few functions between 512 and 1024 bytes, so I fixed them up as well. After that, they were all below 400.
| S16 norm[MaxSeq + 1]; | ||
| U32 *count = (U32 *)zc->tmpCounters; | ||
| S16 *norm = (S16 *)(count + MaxSeq + 1); | ||
| U32 *const workspace = PTR_ALIGN((U32 *)(norm + MaxSeq + 1), sizeof(U32)); |
This comment has been minimized.
This comment has been minimized.
Cyan4973
Jun 28, 2017
Contributor
I know this is supposed to work, since PTR_ALIGN makes a good job at properly aligning the resulting ptr,
but still, from a compiler perspective, promoting a U16* into a U32* is ground for a -Wcast-align warning :
https://godbolt.org/g/vy7fR3
This comment has been minimized.
This comment has been minimized.
terrelln
Jun 28, 2017
Author
Contributor
Yeah, wrote this before I saw your second comment. I've fixed it so that it doesn't use the U16 * to derive the U32 *.
| { | ||
| BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; | ||
| U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */ |
This comment has been minimized.
This comment has been minimized.
Cyan4973
Jun 28, 2017
Contributor
minor : HUF_TABLELOG_ABSOLUTEMAX is quite small, it shouldn't worry stack size
| spaceUsed += sizeof(sortedSymbol_t) * (HUF_SYMBOLVALUE_MAX + 1); | ||
| weightList = (BYTE *)workspace + spaceUsed; | ||
| spaceUsed += HUF_SYMBOLVALUE_MAX + 1; | ||
| spaceUsed = ALIGN(spaceUsed, sizeof(U32)); |
This comment has been minimized.
This comment has been minimized.
| { | ||
| BYTE huffWeight[HUF_SYMBOLVALUE_MAX + 1]; /* init not required, even though some static analyzer may complain */ | ||
| U32 rankVal[HUF_TABLELOG_ABSOLUTEMAX + 1]; /* large enough for values from 0 to 16 */ |
This comment has been minimized.
This comment has been minimized.
Cyan4973
Jun 28, 2017
Contributor
minor : HUF_TABLELOG_ABSOLUTEMAX is quite small, it shouldn't worry stack size
This comment has been minimized.
This comment has been minimized.
terrelln
Jun 28, 2017
Author
Contributor
Yeah, but since we are already passing the workspace for huffWeight and HUF_readStats_wksp(), we might as well take advantage of it to reduce the stack a bit further.
|
The patch is ready for review again. I've fixed the bugs in the stack computations that my fuzzers revealed. The fuzzers are still running. |
| spaceUsed += sizeof(S16) * (MaxSeq + 1); | ||
| spaceUsed = ALIGN(spaceUsed, sizeof(U32)); | ||
|
|
||
| workspace = (U32 *)((BYTE *)zc->tmpCounters + spaceUsed); |
This comment has been minimized.
This comment has been minimized.
Cyan4973
Jun 28, 2017
Contributor
If a U32* ptr is created from a BYTE* ptr,
even if the initial source was a U32*,
it's still a concern for -Wcast-align.
See "doubleCast" example :
https://godbolt.org/g/qrWbLq
Solution : express spaceUsed in U32 increment.
|
There are still some alignment promotion issues in many places in the code. |
* Reduce stack usage of many zstd functions, none use over 388 B anymore. * Remove an incorrect `const` in `xxhash`.
terrelln commentedJun 27, 2017