bytearray.extend lacks overflow check when increasing buffer #71694
As the title says, bytearray.extend simply uses
Ohh, sorry for the disturbance. I made a mistake. This is not needed, so I am closing it now.
Sorry, after checking again, this is still needed, but maybe
It is possible to make an infinite iterable, e.g. iter(int, 1), so it is definitely worth checking for overflow. The patch looks okay to me. An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first, but I am okay with either way.
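For context, here is a rough sketch of the growth path being discussed, reconstructed from the comments in this thread rather than copied from the patch: the buffer grows by about half its current length, and when that would overflow, the request is clamped to PY_SSIZE_T_MAX so that the allocation itself fails with a MemoryError. Variable names and the exact formula are assumptions.

    Py_ssize_t addition, buf_size;

    if (len == PY_SSIZE_T_MAX)               /* iterable is already at the limit */
        return PyErr_NoMemory();

    addition = len >> 1;                     /* grow by roughly 50% */
    if (addition > PY_SSIZE_T_MAX - len - 1)
        buf_size = PY_SSIZE_T_MAX;           /* clamp and let the allocation fail */
    else
        buf_size = len + addition + 1;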
Nice to see a real example. I didn't think of that.
Totally agreed with Martin.
So would you like to merge it like this, or in Martin's alternative way? Honestly speaking, I don't get Martin's point about "without trying to allocate Py_SSIZE_T_MAX first". Where is that place in the code?
if (len == PY_SSIZE_T_MAX) is necessary for the case where the iterable already has PY_SSIZE_T_MAX items. However, it could be moved inside the *other* if, because if (len == PY_SSIZE_T_MAX) should also fail the overflow check. That said, I believe this is theoretical at most: with the stuff that Python supports, it would hardly even be possible to allocate an array of PY_SSIZE_T_MAX pointers. The reason is that the maximum size of an object can be only that of

In any case, this would be another place where my proposed macro "SUM_OVERFLOWS_PY_SSIZE_T" or something like it would be in order, so that the check reads in terms of that macro (see the sketch below), which would make it easier to spot mistakes in the sign preceding the 1.
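A minimal sketch of what such a helper might look like; SUM_OVERFLOWS_PY_SSIZE_T is only the name proposed above, and this definition and usage are assumptions, not code from any CPython patch:

    /* Hypothetical helper: true if a + b would exceed PY_SSIZE_T_MAX.
     * Both operands are assumed to be non-negative and no larger than
     * PY_SSIZE_T_MAX, so the subtraction itself cannot overflow. */
    #define SUM_OVERFLOWS_PY_SSIZE_T(a, b)  ((a) > PY_SSIZE_T_MAX - (b))

    /* The growth check could then be spelled roughly as: */
    if (SUM_OVERFLOWS_PY_SSIZE_T(len, addition + 1)) {
        return PyErr_NoMemory();
    }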
Thanks for the analysis, Antti. I don't think if (len == PY_SSIZE_T_MAX) can be moved inside the *other* if; their handling is different. When if (len == PY_SSIZE_T_MAX) is true, it should exit immediately, but in the *other* if you can still have a try at allocating PY_SSIZE_T_MAX memory.

As for your overflow macro, I think it's not very useful. First, it is not only Py_ssize_t that can overflow; all signed types can, so a single SUM_OVERFLOWS_PY_SSIZE_T is not enough. Second, the current overflow check pattern, a > PY_SSIZE_T_MAX - b, is very obvious in my opinion.
Not particularly related, but the special fast case in Objects/listobject.c:811, listextend(), also seems to lack an overflow check.

"An alternative would be to raise the error without trying to allocate Py_SSIZE_T_MAX first": what I meant was removing the special case that allocates PY_SSIZE_T_MAX. As soon as it attempts to overallocate 2+ GiB of memory, it fails. Something more like:

    addition = len >> 1;
    if (addition > PY_SSIZE_T_MAX - len - 1) {
        /* . . . */
        return PyErr_NoMemory();
    }
    buf_size = len + addition;

Antti: in this case we are allocating an array of _bytes_, not pointers, so maybe it is possible to reach the limit with a 32-bit address space.
I can't totally agree with that point. The code means that every time we increase the buffer by half its current length, so once the length reaches 2/3 * PY_SSIZE_T_MAX it is going to overflow. That leaves a gap of 1/3 * PY_SSIZE_T_MAX below the theoretical upper limit. Leaving such a gap and exiting seems somewhat too rude, so in that case I make it have a try.
Yeah, I see your point. Anyway, I think the current patch is fine.
Only one way to find out: test it till it breaks. :P
Ah, indeed: this is a bytearray, so it is theoretically possible to allocate PY_SSIZE_T_MAX bytes on an architecture that uses segmented memory.

As for if (addition > PY_SSIZE_T_MAX - len - 1) {: it is very clear to *us*, but to someone who does not know about undefined behaviour in C (hint: next to no one does, judging from the amount of complaints the GCC "bug" received) it is not quite self-documenting why it is written this way instead of, say, a check spelled with an INT_ADD_OVERFLOW-style helper, where the INT_ADD_OVERFLOW would have a comment above it explaining why it has to be done that way. There is more discussion about this at https://bugs.python.org/issue1621
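To illustrate the kind of self-documentation being asked for, here is a sketch of the raw check with an explanatory comment attached; the wording is mine, not taken from any actual patch:

    /* We want to reject the case len + addition + 1 > PY_SSIZE_T_MAX.
     * Computing that sum directly could overflow a signed Py_ssize_t,
     * which is undefined behaviour in C and which compilers (notably
     * GCC) may optimise on the assumption that it never happens.
     * Rearranging the comparison keeps every intermediate value in
     * range. */
    if (addition > PY_SSIZE_T_MAX - len - 1) {
        return PyErr_NoMemory();
    }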
New changeset 54cc0480904c by Martin Panter in branch '3.5':
New changeset 6e166b66aa44 by Martin Panter in branch '2.7':
New changeset 646ad4894c32 by Martin Panter in branch 'default':
Committed without any macro for the time being.
Thanks for your work. |