various issues due to misuse of PySlice_GetIndicesEx #72054
Here I will describe 6 issues with various core objects (bytearray, list) and the array module. Common to them all is that they arise due to a misuse of the function PySlice_GetIndicesEx. This type of issue results in out-of-bounds array indexing, which leads to memory corruption or disclosure. For each issue I've attached a proof-of-concept script which either prints leaked memory or crashes the interpreter.

Issue 1: out-of-bounds indexing when taking a bytearray's subscript

While taking the subscript of a bytearray, the function bytearray_subscript in Objects/bytearrayobject.c uses PySlice_GetIndicesEx to evaluate the slice indices. Some of these indices might be objects with an __index__ method, and thus evaluating them can execute arbitrary Python code. If the evaluation of the indices modifies the bytearray, the indices might no longer be within the bounds of the resized buffer.

Here is a PoC which lets us read out 64 bytes of uninitialized memory from the heap:

---
class X:
    def __index__(self):
        b[:] = []
        return 1

b = bytearray(b"A"*0x1000)
print(b[0:64:X()])
---

Here's the result on my system:

$ ./python poc17.py
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\xb0\xce\x86\x9ff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

Issue 2: memory corruption in bytearray_ass_subscript

This issue is similar to the one above. The problem exists when assigning to a bytearray through its subscript, in bytearray_ass_subscript. Here's a PoC which leads to memory corruption of the heap:

---
class X:
    def __index__(self):
        del b[0:0x10000]
        return 1

b = bytearray(b"A"*0x10000)
b[0:0x8000:X()] = bytearray(b"B"*0x8000)
---

Here's the result of running it:

(gdb) r poc20.py

Issue 3: use-after-free when taking the subscript of a list

This issue is similar to the one above, but it occurs when taking the subscript of a list, in list_subscript in Objects/listobject.c.

---
class X:
    def __index__(self):
        b[:] = [1, 2, 3]
        return 2

b = [123]*0x1000
print(b[0:64:X()])
---

It results in a segfault here because of a use-after-free:

(gdb) run ./poc18.py

Issue 4: use-after-free when assigning to a list via subscripting

The same type of issue exists in list_ass_subscript, where we assign to the list through an extended slice.

---
class X:
    def __index__(self):
        b[:] = [1, 2, 3]
        return 2

b = [123]*0x1000
b[0:64:X()] = [0]*32
---

(gdb) r poc19.py

Issue 5: out-of-bounds indexing in array_subscr

Same type of issue. The problem is in the function array_subscr in Modules/arraymodule.c. Here's a PoC which leaks and prints uninitialized memory from the heap:

---
import array

class X:
    def __index__(self):
        del a[:]
        a.append(2)
        return 1

a = array.array("b")
for _ in range(0x10):
    a.append(1)
print(a[0:0x10:X()])
---

And the result:

$ ./python poc22.py
array('b', [2, -53, -53, -53, -5, -5, -5, -5, -5, -5, -5, -5, 0, 0, 0, 0])

Issue 6: out-of-bounds indexing in array_ass_subscr

Same type of issue, also in the array module. Here's a PoC which segfaults here:

---
import array

class X:
    def __index__(self):
        del a[:]
        return 1

a = array.array("b")
a.frombytes(b"A"*0x100)
del a[::X()]
---

How should these be fixed? I would suggest that in each instance we could add a check after calling PySlice_GetIndicesEx, verifying that the returned indices are still within the bounds of the (possibly resized) sequence.

(By the way: these issues might also exist in 2.7, I did not check.)
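To make that concrete, here is a minimal sketch of what such a check could look like (the helper name and the exact exception are placeholders, not a worked-out patch):

---
/* Placeholder helper: call immediately after PySlice_GetIndicesEx(),
   which may have run __index__ and mutated the sequence. */
static int
check_indices_still_valid(PyObject *seq, Py_ssize_t start, Py_ssize_t stop)
{
    if (start > Py_SIZE(seq) || stop > Py_SIZE(seq)) {
        PyErr_SetString(PyExc_IndexError,
                        "sequence changed size while computing slice indices");
        return -1;
    }
    return 0;
}
---

Each of the affected call sites (bytearray_subscript, list_subscript, array_subscr, and the corresponding assignment functions) would call something like this before touching the underlying buffer. |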
I presume you are suggesting to raise if the length changes. This is similar to raising when a dict is mutated while iterating. Note that we do not do this with mutable sequences. (If the iteration is stopped with an out-of-memory error, so be it.) An alternate approach would be to first fully evaluate start, stop, step, *and then length*, to ints, in that order, before using any of them. In particular, have everything stable before comparing and adjusting start and stop to length. This way, slices would continue to always work, barring other exceptions in __index__ or __len__. |
Even list suffers from this bug if the slicing step is not 1.

class X:
    def __index__(self):
        del a[:]
        return 1

a = [0]
a[:X():2]
|
There is really one immediate issue: PySlice_GetIndicesEx in Objects/sliceobject.c takes as inputs a slice object and a sequence length, but not the sequence itself. It starts with "/* this is harder to get right than you might think */". Yep. It assumes both inputs are constants. However, it thrice calls _PyEval_SliceIndex(intlike) (Python/ceval.c). This call in turn may call PyNumber_AsSsize_t(intlike), which calls intlike.__index__. At least in toy examples, this last can change an underlying mutable sequence and hence its length. Since the calculation of the returned start, stop, step, and slicelength depends on the length, the result is that PySlice_GetIndicesEx can indirectly invalidate its own results.

Side effects in __index__ also affect indexing:

class X:
    def __index__(self):
        b[:] = []
        return 1

b = [1,2,3,4]
b[X()]

Traceback (most recent call last):
  File "F:\Python\mypy\tem.py", line 7, in <module>
    print(b[X()])
IndexError: list index out of range

Similarly, "b[X()] = 4" results in "list assignment index ...". Crashes seem not possible, as the conversion to a real int seems to always be done locally in the sequence_subscript method. See for instance list_subscript and list_ass_subscript in Objects/listobject.c. If the subscript is an index, it is *first* converted to int with PyNumber_AsSsize_t *before* being possibly incremented by Py_SIZE(self) and compared to the same.
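For reference, the integer-index branch of list_subscript looks roughly like this (abridged from Objects/listobject.c; the point is only the ordering):

/* Inside list_subscript(), abridged: */
if (PyIndex_Check(item)) {
    Py_ssize_t i;
    /* __index__ (and any side effects it has) runs to completion here. */
    i = PyNumber_AsSsize_t(item, PyExc_IndexError);
    if (i == -1 && PyErr_Occurred())
        return NULL;
    /* Only now is the result adjusted against the list's current size;
       list_item() performs its own bounds check as well. */
    if (i < 0)
        i += PyList_GET_SIZE(self);
    return list_item(self, i);
}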
Side effects also affect index methods:

class X:
    def __index__(self):
        b[:] = []
        return 1

b = [1]
b.index(1, X())

Traceback (most recent call last):
  File "F:\Python\mypy\tem.py", line 7, in <module>
    print(b.index(1, X()))
ValueError: 1 is not in list

For tuple/list/deque.index(val, start, stop), start and stop are converted to int with _PyEval_SliceIndex, as with slices. However, they make this call as part of an initial PyArg_ParseTuple call, before adjusting them to the length of the sequence. So again, no crash is possible. I suppose there are other places where PyNumber_AsSsize_t is called and where a crash *might* be possible; these should be checked. But back to slicing. Action on this issue requires a policy decision among three options.
Counter arguments:
A. Our usual policy is that pure Python code only using the stdlib (minus ctypes) should not crash.
B. The writer and user of 'intlike' might be different people.
Question: does 'no crash' apply to arbitrary side-effects in special methods?
I will ask on pydev which, if any, of the 4 possible patches would be best. |
This is a toy example that exposes the problem, but the problem itself is not a toy problem. The key point is that calculating slice indices can execute Python code and release the GIL. In a multithreaded program, a sequence can be changed not in a toy __index__ method but in another thread, in legitimate code. Such a bug is very hard to reproduce.

Variants B are not efficient. To determine the size of a sequence we would have to call its __len__() method, which is less efficient than using the macros Py_SIZE() or PyUnicode_GET_LENGTH(). And it is not always possible to pass a sequence: in a multidimensional array there is no such sequence (see for example _testbuffer.ndarray). |
FWIW. Py_SIZE is used all over listobject.c. Are you saying that this could be improved? |
I'm saying that PySlice_GetIndicesEx2 can't just use Py_SIZE. |
Actually, making slicing always work is easier than I expected. Maybe it is even easier than raising an error. PySlice_GetIndicesEx() is split into two functions: first convert the slice attributes to Py_ssize_t, then adjust them to the appropriate range depending on the length. Here is a sample patch. |
I like this. Very nice. What I understand is that callers that access PySlice_GetIndicesEx via the header file (included with Python.h) will see the function as a macro. When the macro is expanded, the length expression will be evaluated after any __index__ calls. This approach requires that the length expression calculate the length from the sequence, rather than being a length computed before the call. I checked, and all of our users in /Objects pass some form of seq.get_size(). This approach also requires that the function be accessed via the .h file rather than directly as the function in the .c file. If we go this way, should the PySlice_GetIndicesEx doc say something?

I reviewed the two new functions and am satisfied a) that they correctly separate converting None and non-ints to ints from adjusting start and stop as ints according to length, and b) that the effect of the change in logic for the latter is to stop making unnecessary checks that must fail. |
Nice! The one thing I would suggest double checking with this change is whether or not we have test cases covering ranges with lengths that don't fit into ssize_t. It's been years since I looked at that code, so I don't remember exactly how it currently works, but it does work (except for __len__, due to the signature of the C level length slot):

>>> bigrange = range(int(-10e30), int(10e30))
>>> len(bigrange)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
>>> bigrange[:]
range(-9999999999999999635896294965248, 9999999999999999635896294965248)
>>> bigrange[0:-1]
range(-9999999999999999635896294965248, 9999999999999999635896294965247)
>>> bigrange[::2]
range(-9999999999999999635896294965248, 9999999999999999635896294965248, 2)
>>> bigrange[0:-1:2]
range(-9999999999999999635896294965248, 9999999999999999635896294965247, 2) |
Yet another possible solution is to make the slice constructor convert its arguments to exact ints. This allows user code to be left unchanged. But this is a 3.6-only solution, of course. I would like to know Mark's thoughts on this. |
As in, for arguments that have __index__() methods, do the conversion to a true Python integer eagerly when the slice is built rather than lazily when slice.indices() (or the C-level equivalent) is called? That actually seems like a potentially plausible future approach to me, but isn't a change I'd want to make hastily - those values are visible as the start, stop and step attributes on the slice, and https://docs.python.org/3/reference/datamodel.html#types currently describes those as "These attributes can have any type." Given that folks do a lot of arcane things with the subscript notation, I wouldn't want to break working code if we have less intrusive alternatives. |
Then there is a design question. I believe that after all we should expose these two new functions publicly. And the question is about the function names and the order of arguments. Currently the signatures are:

int _PySlice_Unpack(PyObject *r, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t *step);
int _PySlice_EvalIndices(Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t step, Py_ssize_t length, Py_ssize_t *slicelength);

Are there suggestions for names? Perhaps the second function should not have the PySlice_ prefix, since it doesn't work with a slice object. |
I think those names (with the leading underscore removed) would be fine as a public API - the fact that PySlice_EvalIndices doesn't take a reference to the slice object seems similar to a static method, where the prefix is there for namespacing reasons, rather than because it actually operates on a slice instance. |
Renamed _PySlice_EvalIndices() to _PySlice_AdjustIndices() and changed its signature. Updated the documentation and python3.def. Fixed one more bug: implementation-defined behavior with division by a negative step. Note that since the new functions are used in a public macro, they become part of the stable API. Shouldn't the leading underscores be removed from the names? An attempt at discussing names and signatures on Python-Dev: https://mail.python.org/pipermail/python-dev/2016-December/147048.html. |
We can't just add API functions in maintained releases, because it would break the stable ABI. We can use them only when the API version is explicitly defined. The proposed patch for 3.6 and 3.7 adds the public API functions PySlice_Unpack() and PySlice_AdjustIndices(), and makes PySlice_GetIndicesEx() a macro if Py_LIMITED_API is set to a version that supports the new API. Otherwise PySlice_GetIndicesEx() becomes deprecated. This doesn't break extensions compiled with older Python versions. Extensions compiled with new Python versions without the limited API, or with a high API version, are not compatible with older Python versions, as expected, but have the original issue fixed. Compiling extensions with new Python versions with a low Py_LIMITED_API value set will produce a deprecation warning. Pay attention to the names and signatures of the new API; it would be hard to change them once added. I think this is the safest way. In 2.7 we should replace PySlice_GetIndicesEx() with a macro for internal use only, if we want to fix the issue for builtins and preserve binary compatibility.
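Schematically, the header arrangement would look something like this (a sketch, not the literal contents of Include/sliceobject.h; SLICE_NEW_API_AVAILABLE stands in for the real Py_LIMITED_API version check):

#if SLICE_NEW_API_AVAILABLE
/* The length argument is only evaluated inside PySlice_AdjustIndices(),
   i.e. after PySlice_Unpack() has finished running any __index__ methods. */
#define PySlice_GetIndicesEx(slice, length, start, stop, step, slicelen)  \
    (PySlice_Unpack(slice, start, stop, step) < 0 ?                       \
     -1 :                                                                 \
     ((*(slicelen) = PySlice_AdjustIndices((length), (start), (stop),     \
                                           *(step))), 0))
#endif

A caller that passes something like Py_SIZE(self) as the length then gets the post-__index__ size for free, which is what makes the existing call sites safe without source changes. |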
New changeset d5590f357d74 by Serhiy Storchaka in branch '2.7':
New changeset 96f5327f7253 by Serhiy Storchaka in branch '3.5':
New changeset b4457fe7fdb8 by Serhiy Storchaka in branch '3.6':
New changeset 6093ce8eed6c by Serhiy Storchaka in branch 'default': |
Not a big deal, but the change produces compiler warnings with GCC 6.1.1:

/home/proj/python/cpython/Objects/bytesobject.c: In function ‘bytes_subscript’:
/home/proj/python/cpython/Objects/bytesobject.c:1701:13: warning: ‘slicelength’ may be used uninitialized in this function [-Wmaybe-uninitialized]
for (cur = start, i = 0; i < slicelength;
^~~
/home/proj/python/cpython/Objects/listobject.c: In function ‘list_ass_subscript’:
/home/proj/python/cpython/Objects/listobject.c:2602:13: warning: ‘slicelength’ may be used uninitialized in this function [-Wmaybe-uninitialized]
for (i = 0; i < slicelength; i++) {
^~~
/home/proj/python/cpython/Objects/unicodeobject.c: In function ‘unicode_subscript’:
/home/proj/python/cpython/Objects/unicodeobject.c:14013:16: warning: ‘slicelength’ may be used uninitialized in this function [-Wmaybe-uninitialized]
result = PyUnicode_New(slicelength, max_char);
~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/media/disk/home/proj/python/cpython/Modules/_elementtree.c: In function ‘element_ass_subscr’:
/media/disk/home/proj/python/cpython/Modules/_elementtree.c:1896:50: warning: ‘slicelen’ may be used uninitialized in this function [-Wmaybe-uninitialized]
self->extra->children[i + newlen - slicelen] = self->extra->children[i];
~~~~~~~~~~~^~~~~~~~~~
/media/disk/home/proj/python/cpython/Modules/_ctypes/_ctypes.c: In function ‘Array_subscript’:
/media/disk/home/proj/python/cpython/Modules/_ctypes/_ctypes.c:4327:16: warning: ‘slicelen’ may be used uninitialized in this function [-Wmaybe-uninitialized]
np = PyUnicode_FromWideChar(dest, slicelen);
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

My build used to be free of warnings. This warning is enabled via -Wall. The reason is probably that the new macro skips the slicelength assignment if PySlice_Unpack() fails. Workarounds could be to assign or initialize slicelength to zero (at the call sites or inside the macro), or to compile with -Wno-maybe-uninitialized. |
Good point, Martin. I missed this because the warnings are not emitted in a non-debug build and were emitted only once in an incremental debug build. Your idea about initializing slicelength in the macro LGTM.
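Concretely, the only change needed is for the error branch of the macro to store something into *slicelen as well, along these lines (a sketch of the idea, not the committed wording):

#define PySlice_GetIndicesEx(slice, length, start, stop, step, slicelen)  \
    (PySlice_Unpack(slice, start, stop, step) < 0 ?                       \
     ((*(slicelen) = 0), -1) :                                            \
     ((*(slicelen) = PySlice_AdjustIndices((length), (start), (stop),     \
                                           *(step))), 0))

With *slicelen always written, -Wmaybe-uninitialized has nothing left to flag at the call sites. |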
New changeset d7b637af5a7e by Serhiy Storchaka in branch '3.5':
New changeset 17d0cfc64a32 by Serhiy Storchaka in branch '2.7':
New changeset b8fc4de84b9a by Serhiy Storchaka in branch '3.6':
New changeset af8315720e67 by Serhiy Storchaka in branch 'default': |
New changeset 110ec861e5ea by Serhiy Storchaka in branch '2.7': |
New changeset 745dda46d2e3e27206bb33188c770e1f6c73766e by Serhiy Storchaka in branch '2.7':
New changeset e9d77e9fce477b5589c7eb5e1b4179b1d8e1fecc by Serhiy Storchaka in branch '2.7': |
New changeset faa1891d4d1237d6df0af4622ff520ccd6768e04 by Serhiy Storchaka in branch 'master':
New changeset 8bd58e9c725a15854a99d19daf935fb08df77a05 by Serhiy Storchaka in branch 'master':
New changeset 65febbec9d09101f76a04efeef6b3dc7f9b06ee8 by Serhiy Storchaka in branch 'master': |
New changeset faa1891d4d1237d6df0af4622ff520ccd6768e04 by Serhiy Storchaka in branch '3.6':
New changeset 8bd58e9c725a15854a99d19daf935fb08df77a05 by Serhiy Storchaka in branch '3.6': |
This issue is left open because a porting guide still needs to be added to What's New. See also the problem with breaking the ABI in bpo-29943. |
If we don't make PySlice_GetIndicesEx a macro when Py_LIMITED_API is not defined, its call sites should be expanded to PySlice_Unpack and PySlice_AdjustIndices. PR 1023 does this for the master branch. The patch is generated by a Coccinelle semantic patch.
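For illustration, the expansion at a typical call site looks roughly like this (generic variable names, not lifted from the PR):

/* Before: */
if (PySlice_GetIndicesEx(item, Py_SIZE(self),
                         &start, &stop, &step, &slicelength) < 0) {
    return NULL;
}

/* After: unpack first (this is where __index__ may run), then adjust
   against the length as it is *now*. */
if (PySlice_Unpack(item, &start, &stop, &step) < 0) {
    return NULL;
}
slicelength = PySlice_AdjustIndices(Py_SIZE(self), &start, &stop, step);

PySlice_AdjustIndices() returns the slice length directly and always succeeds, so no extra error check is needed after it. |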
New changeset b879fe8 by Serhiy Storchaka in branch 'master':
New changeset c26b19d by Serhiy Storchaka in branch '3.6':
New changeset fa25f16 by Serhiy Storchaka in branch '3.5': |
PR 1973 adds a porting guide. This should be the last commit for this issue. Please review it and suggest better wording. |
Could anyone please review the documentation? |
@serhiy.storchaka: review done. |