Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-40120: Fix unbounded struct char[] undefined behavior. #19232

Closed
wants to merge 5 commits into from

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Mar 30, 2020

It is undefined behaviour if index is beyond array size. The workaround
for this is the standard C99 feature known as "Flexible array member".

https://en.wikipedia.org/wiki/Flexible_array_member

https://bugs.python.org/issue40120

It is undefined behaviour if index is beyond array size. The workaround
for this is standard C99 feature known as "Flexible array member".

https://en.wikipedia.org/wiki/Flexible_array_member
@gpshead
Copy link
Member Author

gpshead commented Mar 30, 2020

(yes a BPO issue is needed, i'm starting with the PR to see how the CI bots fare)

@gpshead gpshead requested a review from vstinner March 30, 2020 20:16
@gpshead
Copy link
Member Author

gpshead commented Mar 30, 2020

It looks like we thankfully got rid of any use of sizeof(PyBytesObject) a long time ago (which never really made much sense anyways) as part of https://bugs.python.org/issue4445 so I believe this should have no negative consequences.

@gpshead
Copy link
Member Author

gpshead commented Mar 30, 2020

There are other instances of this pattern in our codebase. Objects/unicodeobject.c has encoding_map who's level32 member appears to use this? Grepping I don't immediately see others.

This one was found by a coworker working to enable clang analyzer.

@vstinner
Copy link
Member

It is undefined behaviour if index is beyond array size.

Well, there was no other choice in ISO C89 than using char ob_sval[1];, no?

Is char ob_sval[]; supported by the C compiler supported by CPython? Like Visual Studio, GCC, clang and xlc (AIX)? (I don't think that we officially support xlc, but it's more "best effort" support.)

You can use the new buildbot label to test you change on more platforms.

@gpshead gpshead changed the title Fix undefined behavior in PyBytesObject. bpo-40120: Fix undefined behavior in PyBytesObject. Mar 30, 2020
@gpshead gpshead changed the title bpo-40120: Fix undefined behavior in PyBytesObject. bpo-40120: Fix unbounded struct char[] undefined behavior. Mar 30, 2020
@gpshead
Copy link
Member Author

gpshead commented Mar 30, 2020

Moving discussion to the bug. (TL;DR I will do the buildbot tests and add this feature to PEP 7 unless we find an important platform that balks about it). Not being explicit in PEP7 yet suggests this should not be backported.

@gpshead gpshead self-assigned this Mar 30, 2020
This is really nothing more than a change test, it needs to match the C
struct when no level23 mappings will be allocated.  An empty char[] does
not consume any C sizeof() space whereas the previous `char[1]` consumed
one alignment size worth, even when unused.
@gpshead gpshead added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 30, 2020
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @gpshead for commit cd483f6 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@@ -28,6 +28,9 @@ _Py_IDENTIFIER(__bytes__);

Using PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves
3 bytes per string allocation on a typical system.

The + 1 accounts for the trailing \0 byte that we include as a safety
measure for code that treats the underlying char * as a C string.
*/
#define PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, you made PyBytesObject 1 byte smaller, and you didn't have to substract 1 somewhere? Does it mean that Python overallocates 1 byte since forever?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankfully not. Because this code used offsetof the macro did not need to change. The offset never changes. The existing + 1 in the macro is easy to misinterpret as having been there to account for the [1] but that isn't how it was used - PyBytesObject is always allocated with that one extra byte that is filled with a NUL for C safety for PyBytes_AsString users.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [1] thing is used in more places:

  • PyObject *f_localsplus[1]; in PyFrameObject
  • PyObject *ob_item[1]; in PyTupleObject
  • void *elements[1]; in asdl_seq and int elements[1];` in asdl_int_seq
  • digit ob_digit[1]; in PyLongObject
  • Py_ssize_t ob_array[1]; in PyMemoryViewObject
  • _tracemalloc.c: frame_t frames[1]; in traceback_t

Do we need to update all of them? Do you want to update them all?

@vstinner
Copy link
Member

Refleaks buildbots are likely caused by https://bugs.python.org/issue40115 (known leak)

@kadler
Copy link
Contributor

kadler commented Nov 14, 2020

Is char ob_sval[]; supported by the C compiler supported by CPython? Like Visual Studio, GCC, clang and xlc (AIX)? (I don't think that we officially support xlc, but it's more "best effort" support.)

Yep, this is supported by xlc.

@@ -8208,14 +8208,14 @@ struct encoding_map {
PyObject_HEAD
unsigned char level1[32];
int count2, count3;
unsigned char level23[1];
unsigned char level23[];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gpshead: Can you please write a PR which only contains this change? Since it doesn't touch the public C API, C++ compatibility is not an issue (I don't think that Python can be built with a C++ compiler). We can start with this change and see how it goes? See also the discussion at python/peps#1349

@@ -5,7 +5,7 @@
typedef struct {
PyObject_VAR_HEAD
Py_hash_t ob_shash;
char ob_sval[1];
char ob_sval[];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to:

  • Leave the structure as it is
  • Use a code search to check which C extension access directly PyBytesObject structure members (well, first check Cython and pybind11)
  • Prepare affected C extensions
  • Once the number of affected C extensions is low enough: make the structure opaque, enforce the usage of public C functions.

@gpshead
Copy link
Member Author

gpshead commented Jan 21, 2022

see the bug, a new PR can be made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants