bpo-33312: Fix clang ubsan out of bounds warnings in dict. #6537
Fix clang ubsan (undefined behavior sanitizer) warnings in dictobject.c by adjusting how the internal struct _dictkeysobject shared keys structure is declared. This remains ABI compatible. We get rid of the union at the end of the struct being used for convenience to avoid typecasting in favor of a simple appropriate minimum size int64_t[1] as [1] length arrays at the end of a struct are known to clang to be used for variable sized objects. A variable length array (VLA) would be more proper and simplify the dictobject.c code further by not having to subtract the size of the struct member in the three places it does size calculations, but PEP-007 does not allow those in CPython's coding standard today.
If MSVC on appveyor does not like the VLA I'll go back to [1] instead of [].
See https://bugs.python.org/issue33312 for discussion.
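To make the size-calculation point concrete, here is a minimal sketch of the two trailing-array idioms the description compares. The struct and function names (`keys_fam`, `keys_one`, `fam_alloc_size`, `one_alloc_size`, `keys_fam_new`) are hypothetical stand-ins, not CPython's real layout or helpers. What the description calls a VLA at the end of a struct is what C99 names a flexible array member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the keys header; not CPython's real layout. */
typedef struct {
    size_t size;        /* number of one-byte index slots */
    char indices[];     /* flexible array member: adds no storage of its own */
} keys_fam;

/* Older idiom: a one-element trailing array that is over-indexed at run time. */
typedef struct {
    size_t size;
    int64_t indices[1];
} keys_one;

/* With a flexible array member, the allocation is simply header + payload. */
static size_t fam_alloc_size(size_t payload_bytes)
{
    return sizeof(keys_fam) + payload_bytes;
}

/* With the [1] idiom, the placeholder element has to be subtracted back out. */
static size_t one_alloc_size(size_t payload_bytes)
{
    return sizeof(keys_one) - sizeof(((keys_one *)0)->indices) + payload_bytes;
}

/* Allocate a header plus `size` index bytes, all set to 0xff
   (which reads back as -1 when interpreted as int8_t). */
static keys_fam *keys_fam_new(size_t size)
{
    keys_fam *k = malloc(fam_alloc_size(size));
    if (k == NULL) {
        return NULL;
    }
    k->size = size;
    memset(k->indices, 0xff, size);
    return k;
}
```

The subtraction in `one_alloc_size` is the kind of term the description says a flexible array member would make unnecessary.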
#endif
} dk_indices;
Dynamically sized, SIZEOF_VOID_P is minimum. */
char dk_indices[]; /* char is required to avoid strict aliasing. */
Shouldn't it be unsigned char?
the DKIX_EMPTY constant is -1 and all of the types this was replacing are signed (as are the things we cast it to everywhere). so sticking with char made sense.
i'd prefer to say int8_t but given that references I've found only mention char and unsigned char in relation to strict aliasing I'm being conservative and exactly matching that.
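As a rough illustration of why signedness matters for a -1 sentinel, the sketch below stores and loads fixed-width signed slots in a plain `char` buffer. `IX_EMPTY`, `read_index`, and `write_index` are hypothetical names, and the use of `memcpy` is a swapped-in technique chosen here to sidestep aliasing questions entirely; the diff in this PR uses pointer casts instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IX_EMPTY (-1)   /* hypothetical stand-in for DKIX_EMPTY */

/* Read slot i from a byte buffer whose slots are ixsize bytes wide.
   memcpy is used so no object is accessed through an incompatible type.
   Reading into a signed type sign-extends, so -1 stays -1 at every width. */
static int64_t read_index(const char *buf, size_t ixsize, size_t i)
{
    const char *p = buf + i * ixsize;
    if (ixsize == 1) { int8_t v;  memcpy(&v, p, sizeof v); return v; }
    if (ixsize == 2) { int16_t v; memcpy(&v, p, sizeof v); return v; }
    if (ixsize == 4) { int32_t v; memcpy(&v, p, sizeof v); return v; }
    int64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store value into slot i, narrowing it to the slot width first. */
static void write_index(char *buf, size_t ixsize, size_t i, int64_t value)
{
    char *p = buf + i * ixsize;
    if (ixsize == 1)      { int8_t v  = (int8_t)value;  memcpy(p, &v, sizeof v); }
    else if (ixsize == 2) { int16_t v = (int16_t)value; memcpy(p, &v, sizeof v); }
    else if (ixsize == 4) { int32_t v = (int32_t)value; memcpy(p, &v, sizeof v); }
    else                  { memcpy(p, &value, sizeof value); }
}
```

Reading the same byte as `unsigned char` would give 255 rather than -1, which is the mismatch the reply above is avoiding.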
.dk_indices = { .as_1 = {DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY}},
{DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY}, /* dk_indices */
What is the size of the dk_indices field?
this is a static initializer. it's my understanding that static initializing a VLA has the compiler allocate space for however many elements you enter.
example: char foo[] = "hello"
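A small sketch of the `char foo[] = "hello"` point: for a standalone array declared without a length, the compiler takes the size from the initializer. The names `IX_EMPTY`, `empty_slots`, `foo_size`, and `empty_slot_count` are hypothetical. This covers ordinary arrays only; statically initializing a flexible array member inside a struct is, as far as I know, a compiler extension (GCC documents it) rather than standard C, so that part would need checking on each target compiler.

```c
#include <assert.h>
#include <stddef.h>

#define IX_EMPTY (-1)   /* hypothetical stand-in for DKIX_EMPTY */

/* Size is taken from the initializer: 5 characters plus the terminating NUL. */
static char foo[] = "hello";

/* Eight signed slots, all marked empty; the compiler counts the elements. */
static signed char empty_slots[] = {
    IX_EMPTY, IX_EMPTY, IX_EMPTY, IX_EMPTY,
    IX_EMPTY, IX_EMPTY, IX_EMPTY, IX_EMPTY,
};

/* Total bytes occupied by foo, including the NUL terminator. */
static size_t foo_size(void)
{
    return sizeof foo;
}

/* Number of elements the compiler inferred for empty_slots. */
static size_t empty_slot_count(void)
{
    return sizeof empty_slots / sizeof empty_slots[0];
}
```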
@@ -298,7 +298,7 @@ PyDict_Fini(void)
2 : sizeof(int32_t))
#endif
#define DK_ENTRIES(dk) \
((PyDictKeyEntry*)(&(dk)->dk_indices.as_1[DK_SIZE(dk) * DK_IXSIZE(dk)]))
((PyDictKeyEntry*)(&((int8_t*)((dk)->dk_indices))[DK_SIZE(dk) * DK_IXSIZE(dk)]))
Does the following expression look a tiny bit clearer to you?
((PyDictKeyEntry*)((int8_t*)((dk)->dk_indices) + DK_SIZE(dk) * DK_IXSIZE(dk)))
maybe, but i'll still toss another pair of ()s in there for clarity:
((PyDictKeyEntry*)(((int8_t*)((dk)->dk_indices)) + DK_SIZE(dk) * DK_IXSIZE(dk)))
even though i believe those are equivalent (the cast happens before the + ?)
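On the precedence question: in C a cast binds more tightly than binary `+`, so `(int8_t*)p + n` casts `p` first and then advances by `n` bytes, which is the same address as `&((int8_t*)p)[n]`. The sketch below checks that equivalence with a hypothetical `entry` type standing in for `PyDictKeyEntry`; the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry type standing in for PyDictKeyEntry. */
typedef struct {
    void *key;
    void *value;
} entry;

/* Indexing form, as in the diff: address of element n of the byte view. */
static entry *entries_by_index(char *indices, size_t n)
{
    return (entry *)(&((int8_t *)indices)[n]);
}

/* Pointer-arithmetic form: the cast applies to indices first, then + n bytes. */
static entry *entries_by_add(char *indices, size_t n)
{
    return (entry *)((int8_t *)indices + n);
}
```

Both forms assume the byte offset `n` lands on an address suitably aligned for `entry`; the extra parentheses in the second suggestion change readability, not meaning.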
Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
GH-6543 is a backport of this pull request to the 3.7 branch.
…6537)

Fix clang ubsan (undefined behavior sanitizer) warnings in dictobject.c by adjusting how the internal struct _dictkeysobject shared keys structure is declared.

This remains ABI compatible. We get rid of the union at the end of the struct being used for convenience to avoid typecasting in favor of char[] variable length array at the end of a struct. This is known to clang to be used for variable sized objects and will not cause an undefined behavior problem. Similarly, char arrays do not have strict aliasing undefined behavior when cast.

PEP-007 does not currently list variable length arrays (VLAs) as allowed in our subset of C99. If this turns out to be a problem, the fix to this is to change the `char dk_indices[]` into `dk_indices[1]` and restore the three size computation subtractions this change removes: `- Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)`

If this works as is I'll make a separate PR to update PEP-007.

(cherry picked from commit 397f1b2)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
GH-6544 is a backport of this pull request to the 3.6 branch.
Fix clang ubsan (undefined behavior sanitizer) warnings in dictobject.c by
adjusting how the internal struct _dictkeysobject shared keys structure is
declared.
This remains ABI compatible. We get rid of the union at the end of the
struct being used for convenience to avoid typecasting in favor of a
simple appropriate minimum size int64_t[1] as [1] length arrays at the
end of a struct are known to clang to be used for variable sized objects.
A variable length array (VLA) would be more proper and simplify the
dictobject.c code further by not having to subtract the size of the struct
member in the three places it does size calculations, but PEP-007 does not
allow those in CPython's coding standard today.
https://bugs.python.org/issue33312