-
-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ubsan undefined behavior sanitizer flags struct _dictkeysobject (PyDictKeysObj) #77493
Comments
Build CPython (master in this case - though I originally noticed the problem when building a 3.6 tree) as follows with clang installed: build$ LD=clang-5.0 LDFLAGS=-fsanitize=undefined CC=clang-5.0 CXX=clang-5.0 CFLAGS=-fsanitize=undefined CXXFLAGS=-fsanitize=undefined ../gpshead/configure ... notice many of the warnings scroll by during the build itself as it executes the interpreter then execute it yourself at the end and you'll get a bunch of these: ../gpshead/Objects/dictobject.c:547:12: runtime error: index 16 out of bounds for type 'int8_t [8]' At issue is the hash table here: https://github.com/python/cpython/blob/3.7/Objects/dict-common.h which is intentionally meant to be indexed "out of bounds" off the end of the struct. I'm not a strict C language definition so I don't know if that is _supposed_ to be defined behavior as we all tend to assume it is in C or not. If it is supposed to be okay, we should be able to annotate it as such to avoid the warning under ubsan builds. If it is not, we need to change the way this is written. |
Yeah, I've run into this before. The "correct" thing to do is use C99 VLAs. Unfortunately, that doesn't work for PyDictKeysObject because it really wants a union of VLAs but that isn't supported. The best I could do is making a struct for every possible member of the union. See the attached patch. I didn't land it because it's gross. OTOH, undefined behavior is bad, so I'm okay landing this if you don't find it too revolting. |
I think it is worth getting such a change in. its arguable that the compiler should be able to see through the union for this use case but I assume such a fix would only land in a recent clang version. i'm attaching a variant on your patch that doesn't cast entire structs but just casts the VLA. thoughts? |
notably, C99 variable length arrays syntax is not mentioned as allowed in https://www.python.org/dev/peps/pep-0007/. If we want to use VLAs, that should be clarified. But both our solutions should work with [1] instead of [] which is allowed by the ubsan checker. it means keeping the annoying "- Py_MEMBER_SIZE(...)" of the array on all size calculations (but because of that the patch is smaller). |
Your PR is basically what we did prior to 186122e. The problem is that may run afoul of different UB, namely strict aliasing. (Though, I suppose we could probably also avoid that by making dk_indices char[].) If VLAs work with whatever MSVC we're using, it seems fine to add it to PEP-7. At worst, we can #ifdef sanitizer it. |
By my reading on how strict aliasing works, I think just changing the int64_t[1] or int32_t[1] in my PR to char[1] will work as char is always assumed to alias? the clang ubsan i was testing my PR against wasn't warning me about strict aliasing. :/ |
I'm going to see what appveyor says with the VLA code on the PR. I've updated it to use char[]. |
My PR is almost a revert of 186122e so I'm curious what your motivation behind that change to use the union was? |
I was fixing strict aliasing problems in the code and I thought the union was cleaner than a bunch of casts. |
Why does the code even need the flexible struct member? If you use the surrounding backing store directly, the aliasing issue disappears if the backing store is untyped memory (if not, you have the aliasing problem with the rest of the struct anyway). |
If you think this should be written differently, please propose it in a PR so we can see what you are suggesting. An unbounded member at the end of a struct is quite a common practice in C. |
Greg, this change has broken some buildbots: |
ned: yes, it supposedly has. test_gdb. in a manner i cannot reproduce or debug without error output that indicates anything about the problem. test_gdb passes on my systems. |
I see one line in Tools/gdb/libpython.py that may be related. i'll try changing that. the only way i have to _test_ it is to merge it into master. |
New changeset 53f67d4 by Gregory P. Smith in branch 'master': |
New changeset 11659d0 by Gregory P. Smith (Miss Islington (bot)) in branch '3.7': |
Thanks, Greg, looking better! |
Sorry for the breakage. I'm glad the fix was that greppable and easy given I had no idea where to start otherwise. :) This is in for 3.7. The PR for 3.6 is pending, i want to let it out in the next 3.7 builds for a while before i merge it into 3.6 even though I've got pretty high confidence at this point. bpo-33321 filed to track getting an ubsan buildbot setup. Also, thank you Benjamin for your patch and pointing out the strict aliasing issue. so many undefined behaviors. it's a wonder the language even works. :) |
automated backport of the gdb/libpython change fails on 3.6, if this is needed there (I haven't looked into the code on that branch), it's a tiny change to apply. i'll leave that to the 3.6 RM to decide if at all. |
This issue does not seem to me to be a security issue so would not meet the criteria for backporting to 3.6. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: