bpo-33312: Fix clang ubsan out of bounds warnings in dict. #6537

Merged
merged 3 commits into python:master on Apr 20, 2018


gpshead
Member

@gpshead gpshead commented Apr 19, 2018

Fix clang ubsan (undefined behavior sanitizer) warnings in dictobject.c by
adjusting how the internal struct _dictkeysobject shared keys structure is
declared.

This remains ABI compatible. We get rid of the union at the end of the
struct, which was used for convenience to avoid typecasting, in favor of a
simple int64_t[1] of the appropriate minimum size, because clang knows that
[1]-length arrays at the end of a struct are used for variable-sized objects.

A variable length array (VLA) would be more proper and would simplify the
dictobject.c code further by removing the need to subtract the size of the
struct member in the three places it does size calculations, but PEP-007
does not allow those in CPython's coding standard today.
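To make the size-calculation difference concrete, here is a minimal sketch using hypothetical structs (`keys_old`, `keys_new`, `MEMBER_SIZE`, and the helper functions are illustrative names, not CPython's). In C99 terms, the trailing `[]` this PR calls a VLA is a flexible array member. A trailing `[1]` array is already counted by `sizeof`, so its size has to be subtracted, while a flexible array member contributes no storage to `sizeof`, so the subtraction disappears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layouts, not CPython's actual _dictkeysobject. */
struct keys_old {
    size_t size;
    int64_t indices[1];   /* "struct hack": declared with one element */
};

struct keys_new {
    size_t size;
    char indices[];       /* C99 flexible array member: not counted by sizeof */
};

/* Modeled on CPython's Py_MEMBER_SIZE macro. */
#define MEMBER_SIZE(type, member) sizeof(((type *)0)->member)

/* Bytes needed for the header plus nbytes of trailing index data. */
static size_t old_alloc_size(size_t nbytes)
{
    /* The declared [1] element is already inside sizeof, so remove it. */
    return sizeof(struct keys_old) - MEMBER_SIZE(struct keys_old, indices) + nbytes;
}

static size_t new_alloc_size(size_t nbytes)
{
    /* No subtraction: the flexible array member adds nothing to sizeof. */
    return sizeof(struct keys_new) + nbytes;
}

/* Allocate a keys_new object with room for nbytes index bytes. */
static struct keys_new *new_keys(size_t nbytes)
{
    struct keys_new *k = malloc(new_alloc_size(nbytes));
    if (k != NULL) {
        k->size = nbytes;
    }
    return k;
}
```

On common 32- and 64-bit targets each helper comes out to `offsetof(..., indices) + nbytes` for its own struct; the `[]` form simply gets there without the subtraction.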

https://bugs.python.org/issue33312

If MSVC on AppVeyor does not like the VLA, I'll go back to [1] instead of [].
@gpshead
Member Author

gpshead commented Apr 19, 2018

See https://bugs.python.org/issue33312 for discussion.

-#endif
-} dk_indices;
+Dynamically sized, SIZEOF_VOID_P is minimum. */
+char dk_indices[]; /* char is required to avoid strict aliasing. */
Member

Shouldn't it be unsigned char?

Member Author

The DKIX_EMPTY constant is -1, and all of the types this was replacing are signed (as are the things we cast it to everywhere), so sticking with char made sense.

I'd prefer to say int8_t, but given that the references I've found only mention char and unsigned char in relation to strict aliasing, I'm being conservative and matching that exactly.
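A minimal sketch of the signedness question, using hypothetical names (`indices`, `get_index_1byte`). Whether plain `char` is signed is implementation-defined, so a stored -1 may read back as 255 through a plain `char`. On two's-complement targets the byte is 0xFF either way, and reading it through `int8_t *`, as the casts on `dk_indices` do for 1-byte indices, yields -1 regardless. Because `int8_t` is in practice `signed char`, a character type, that access stays within the aliasing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DKIX_EMPTY (-1)   /* same value CPython uses for an empty slot */

/* Hypothetical 8-slot index table standing in for dk_indices.
 * If plain char is unsigned, -1 is converted to 255 on initialization;
 * on two's-complement targets the stored byte is 0xFF either way. */
static char indices[8] = {
    DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
    DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
};

/* Read slot i as a signed 1-byte index by casting the char buffer to
 * int8_t *, so the result does not depend on plain char's signedness. */
static int64_t get_index_1byte(const char *buf, size_t i)
{
    const int8_t *ix = (const int8_t *)buf;
    return ix[i];
}
```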

-.dk_indices = { .as_1 = {DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
-DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY}},
+{DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
+DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY}, /* dk_indices */
Member

What is the size of the dk_indices field?

Member Author

This is a static initializer. It's my understanding that statically initializing a VLA has the compiler allocate space for however many elements you enter.

Example: char foo[] = "hello"
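A short sketch of the sizing question, with hypothetical names. `char foo[] = "hello"` is an ordinary array whose size is completed by its initializer (six bytes, including the terminating NUL). A trailing `[]` in a struct is a flexible array member, which `sizeof` on the struct never counts; supplying elements for one in a static initializer, as in the `dk_indices` initializer above, is accepted by GCC and Clang as a GNU extension rather than as ISO C behavior. The sketch below checks only the standard-defined parts.

```c
#include <assert.h>
#include <stddef.h>

/* Array of unknown size, completed by its initializer: 5 chars + '\0'. */
static const char foo[] = "hello";

/* Hypothetical struct with a trailing flexible array member. sizeof covers
 * only the fixed header; storage for indices must be supplied separately
 * (malloc, or the GNU static-initializer extension). */
struct keys_hdr {
    size_t size;
    char indices[];
};
```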

@@ -298,7 +298,7 @@ PyDict_Fini(void)
 2 : sizeof(int32_t))
 #endif
 #define DK_ENTRIES(dk) \
-((PyDictKeyEntry*)(&(dk)->dk_indices.as_1[DK_SIZE(dk) * DK_IXSIZE(dk)]))
+((PyDictKeyEntry*)(&((int8_t*)((dk)->dk_indices))[DK_SIZE(dk) * DK_IXSIZE(dk)]))
Member

Does the following expression look a tiny bit clearer to you?

((PyDictKeyEntry*)((int8_t*)((dk)->dk_indices) + DK_SIZE(dk) * DK_IXSIZE(dk))

Member Author

Maybe, but I'll still toss another pair of ()s in there for clarity:

((PyDictKeyEntry*)(((int8_t*)((dk)->dk_indices)) + DK_SIZE(dk) * DK_IXSIZE(dk))

even though I believe those are equivalent (the cast happens before the +?).
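The three spellings of `DK_ENTRIES` discussed above can be checked side by side. In C a cast binds more tightly than binary `+`, and `&p[n]` is defined to mean `p + n`, so all three forms yield the same address. The `entry` and `keys` types and the `ENTRIES_*` macros below are hypothetical stand-ins: `size` and `ixsize` play the roles of `DK_SIZE(dk)` and `DK_IXSIZE(dk)`, and `entry` plays `PyDictKeyEntry`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyDictKeyEntry and PyDictKeysObject. */
typedef struct { void *key; void *value; } entry;

typedef struct {
    size_t size;     /* role of DK_SIZE(dk) */
    size_t ixsize;   /* role of DK_IXSIZE(dk): 1, 2, 4 or 8 */
    char indices[];  /* index table, followed by the entries */
} keys;

/* Form from the merged diff: address of element n, written &p[n]. */
#define ENTRIES_A(dk) \
    ((entry *)(&((int8_t *)((dk)->indices))[(dk)->size * (dk)->ixsize]))

/* Reviewer's suggestion: plain pointer arithmetic, p + n. */
#define ENTRIES_B(dk) \
    ((entry *)((int8_t *)((dk)->indices) + (dk)->size * (dk)->ixsize))

/* Author's variant with an extra pair of parentheses around the cast. */
#define ENTRIES_C(dk) \
    ((entry *)(((int8_t *)((dk)->indices)) + (dk)->size * (dk)->ixsize))
```

The extra parentheses in the third form change nothing for the compiler; they only make the cast's scope explicit to a reader.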

@gpshead gpshead merged commit 397f1b2 into python:master Apr 20, 2018
@miss-islington
Contributor

Thanks @gpshead for the PR 🌮🎉. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

@gpshead gpshead deleted the bjp_issue33312 branch April 20, 2018 05:41
@bedevere-bot

GH-6543 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 20, 2018
…6537)

Fix clang ubsan (undefined behavior sanitizer) warnings in dictobject.c by
adjusting how the internal struct _dictkeysobject shared keys structure is
declared.

This remains ABI compatible.  We get rid of the union at the end of the
struct, which was used for convenience to avoid typecasting, in favor of a
char[] variable length array at the end of the struct. Clang knows this
pattern is used for variable-sized objects, so it will not cause an
undefined behavior problem.  Similarly, char arrays do not have strict
aliasing undefined behavior when cast.

PEP-007 does not currently list variable length arrays (VLAs) as allowed
in our subset of C99.  If this turns out to be a problem, the fix is
to change the char `dk_indices[]` into `dk_indices[1]` and restore the
three size computation subtractions this change removes:
  `- Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)`

If this works as is, I'll make a separate PR to update PEP-007.
(cherry picked from commit 397f1b2)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 20, 2018
…6537)

@bedevere-bot

GH-6544 is a backport of this pull request to the 3.6 branch.

gpshead pushed a commit that referenced this pull request Apr 20, 2018
…H-6543)

Labels
type-bug An unexpected behavior, bug, or error

6 participants