-
-
Notifications
You must be signed in to change notification settings - Fork 7.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
py/objarray: Fix use-after-free if extending a slice from itself. #14029
py/objarray: Fix use-after-free if extending a slice from itself. #14029
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #14029 +/- ##
=======================================
Coverage 98.39% 98.39%
=======================================
Files 161 161
Lines 21200 21204 +4
=======================================
+ Hits 20860 20864 +4
Misses 340 340 ☔ View full report in Codecov by Sentry. |
Code size report:
|
7eb2a0c
to
18406c9
Compare
# Growth of bytearray via slice extension from itself | ||
b = bytearray(b"1234567") | ||
for i in range(3): | ||
b[-1:] = b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there needs to be an additional test that extends a bytearray by a memoryview of itself, where that memoryview is offset by non-zero. That will test the case where src_items
was realloc'd by the GC and src_offs
is non-zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I'd missed there's no existing coverage for this.
Interestingly, this turns up a twist: CPython doesn't support resizing any buffer object which has an active memoryview into it. This code:
# Growth of bytearray via slice extension of a memoryview to itself
b = bytearray(b"1234567")
m = memoryview(b)[2:5]
for _ in range(3):
b[-1:] = m
print(len(b), b)
Triggers:
Traceback (most recent call last):
File "/home/gus/ry/george/micropython/tests/basics/bytearray_slice_assign.py", line 80, in <module>
b[-1:] = m
~^^^^^
BufferError: Existing exports of data: object cannot be re-sized
You're no doubt more familiar with this than me, but AFAIK the related memory safety issue also exists in MicroPython - if there's a memoryview into a buffer object and an unrelated resize moves the buffer then the memoryview will point to invalid memory. My understanding is this a known (but undocumented?) limitation of memoryviews in Micropython.
In this particular case (buffer extended by memoryview to itself) this patch makes it memory safe, so perhaps I can make a separate test file for this with its own .exp? Plus add some documentation around the difference to CPython. What do you recommend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's a memoryview into a buffer object and an unrelated resize moves the buffer then the memoryview will point to invalid memory.
The memoryview will still point to "valid" memory, and importantly memory that won't be GC'd because the memoryview object retains a pointer to the head of the buffer.
In this particular case (buffer extended by memoryview to itself) this patch makes it memory safe, so perhaps I can make a separate test file for this with its own .exp? Plus add some documentation around the difference to CPython. What do you recommend?
Yes, that all sounds good: separate test with .exp, and cpydiff documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memoryview will still point to "valid" memory, and importantly memory that won't be GC'd because the memoryview object retains a pointer to the head of the buffer.
That's a pretty elegant solution, but I don't think this is how it works at the moment. As per the bug this PR is fixing, resizing calls m_renew
which calls through to gc_realloc
. If resizing in place fails, gc_realloc
allocates a new buffer, copies into it, and then frees the old one with gc_free
. So any remaining pointers into that buffer from other memoryviews will become pointers to freed blocks.
Maybe it should work in the way you describe, where the old buffer is left for the GC to pick up once it's no longer used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If resizing in place fails,
gc_realloc
allocates a new buffer, copies into it, and then frees the old one withgc_free
. So any remaining pointers into that buffer from other memoryviews will become pointers to freed blocks.
Ahh, indeed! That's a problem. But that's kind of separate to this PR, so maybe we can think about that and fix that separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you. If we were to change resizing to not explicitly free the old buffer then this PR wouldn't be necessary (no use after free would exist here, as the old buffer hadn't been freed yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to change resizing to not explicitly free the old buffer then this PR wouldn't be necessary
Yes, but I feel like that's quite a big change that doesn't benefit many cases. In most cases the code won't be resizing a memoryview and it's definitely beneficial (??) to explicitly free memory doing a realloc.
Also, are there other places in the code (probably confined to py/objarray.c
) where we'd also need to be careful not to free memory that may be pointed to by a memoryview?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the PR to address the cases with bytearrays. The cases with memoryviews are much more complex, as we discussed offline. Have added a commit here to document the requirement of not resizing a bytearray that has a memoryview pointing to it, at least until we have a better fix.
180783f
to
b0c00cc
Compare
Two cases, one assigning to a slice. Closes micropython#13283 Second is extending a slice from itself, similar logic. In both cases the problem occurs when m_renew causes realloc to move the buffer, leaving a dangling pointer behind. There are more complex and hard to fix cases when either argument is a memoryview into the buffer, currently resizing to a new address breaks memoryviews into that object. Reproducing this bug and confirming the fix was done by running the unix port under valgrind with GC-aware extensions. Note in default configurations with GIL this bug exists but has no impact (the free buffer won't be reused while the function is still executing, and is no longer referenced after it returns). Signed-off-by: Angus Gratton <angus@redyak.com.au>
This a stop-gap until there is a proper fix for this. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
b0c00cc
to
6877987
Compare
Closes #13283 which reported this bug when assigning to a slice, also fixes a related bug when a bytearray is extended by itself (h/t @dpgeorge for spotting.) Unit tests added for both.
There is a related much more nefarious set of bugs when a bytearray or similar is targeted by a memoryview and then resized. For now these are not fixed, but have pushed a commit here to document this case as a difference from CPython.
Reproducing this bug and confirming the fix was done by running the unix port under valgrind with GC-aware extensions (PR at #14196).
Note in default configurations with GIL this bug exists but has no impact (the free buffer won't have new data written to it while the function is still executing, and is no longer referenced after the function returns).