Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py/objarray: Fix use-after-free if extending a slice from itself. #14029

Merged
merged 2 commits into from
Apr 22, 2024

Conversation

projectgus
Copy link
Contributor

@projectgus projectgus commented Mar 5, 2024

Closes #13283 which reported this bug when assigning to a slice, also fixes a related bug when a bytearray is extended by itself (h/t @dpgeorge for spotting.) Unit tests added for both.

There is a related much more nefarious set of bugs when a bytearray or similar is targeted by a memoryview and then resized. For now these are not fixed, but have pushed a commit here to document this case as a difference from CPython.

Reproducing this bug and confirming the fix was done by running the unix port under valgrind with GC-aware extensions (PR at #14196).

Note in default configurations with GIL this bug exists but has no impact (the free buffer won't have new data written to it while the function is still executing, and is no longer referenced after the function returns).

@projectgus projectgus added the bug label Mar 5, 2024
Copy link

codecov bot commented Mar 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.39%. Comparing base (ce491ab) to head (6877987).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #14029   +/-   ##
=======================================
  Coverage   98.39%   98.39%           
=======================================
  Files         161      161           
  Lines       21200    21204    +4     
=======================================
+ Hits        20860    20864    +4     
  Misses        340      340           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Mar 5, 2024

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:   +80 +0.010% standard
      stm32:   +40 +0.010% PYBV10
     mimxrt:   +40 +0.011% TEENSY40
        rp2:   +64 +0.019% RPI_PICO
       samd:   +40 +0.015% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@projectgus projectgus changed the title py/objarray: Fix use-after-free if extending a slice. py/objarray: Fix use-after-free if extending a slice from itself. Mar 5, 2024
@projectgus projectgus marked this pull request as ready for review March 13, 2024 03:53
py/objarray.c Outdated Show resolved Hide resolved
# Growth of bytearray via slice extension from itself
b = bytearray(b"1234567")
for i in range(3):
b[-1:] = b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there needs to be an additional test that extends a bytearray by a memoryview of itself, where that memoryview is offset by non-zero. That will test the case where src_items was realloc'd by the GC and src_offs is non-zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'd missed there's no existing coverage for this.

Interestingly, this turns up a twist: CPython doesn't support resizing any buffer object which has an active memoryview into it. This code:

# Growth of bytearray via slice extension of a memoryview to itself
b = bytearray(b"1234567")
m = memoryview(b)[2:5]
for _ in range(3):
    b[-1:] = m
print(len(b), b)

Triggers:

Traceback (most recent call last):
  File "/home/gus/ry/george/micropython/tests/basics/bytearray_slice_assign.py", line 80, in <module>
    b[-1:] = m
    ~^^^^^
BufferError: Existing exports of data: object cannot be re-sized

Some explanation here.

You're no doubt more familiar with this than me, but AFAIK the related memory safety issue also exists in MicroPython - if there's a memoryview into a buffer object and an unrelated resize moves the buffer then the memoryview will point to invalid memory. My understanding is this a known (but undocumented?) limitation of memoryviews in Micropython.

In this particular case (buffer extended by memoryview to itself) this patch makes it memory safe, so perhaps I can make a separate test file for this with its own .exp? Plus add some documentation around the difference to CPython. What do you recommend?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there's a memoryview into a buffer object and an unrelated resize moves the buffer then the memoryview will point to invalid memory.

The memoryview will still point to "valid" memory, and importantly memory that won't be GC'd because the memoryview object retains a pointer to the head of the buffer.

In this particular case (buffer extended by memoryview to itself) this patch makes it memory safe, so perhaps I can make a separate test file for this with its own .exp? Plus add some documentation around the difference to CPython. What do you recommend?

Yes, that all sounds good: separate test with .exp, and cpydiff documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memoryview will still point to "valid" memory, and importantly memory that won't be GC'd because the memoryview object retains a pointer to the head of the buffer.

That's a pretty elegant solution, but I don't think this is how it works at the moment. As per the bug this PR is fixing, resizing calls m_renew which calls through to gc_realloc. If resizing in place fails, gc_realloc allocates a new buffer, copies into it, and then frees the old one with gc_free. So any remaining pointers into that buffer from other memoryviews will become pointers to freed blocks.

Maybe it should work in the way you describe, where the old buffer is left for the GC to pick up once it's no longer used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If resizing in place fails, gc_realloc allocates a new buffer, copies into it, and then frees the old one with gc_free. So any remaining pointers into that buffer from other memoryviews will become pointers to freed blocks.

Ahh, indeed! That's a problem. But that's kind of separate to this PR, so maybe we can think about that and fix that separately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you. If we were to change resizing to not explicitly free the old buffer then this PR wouldn't be necessary (no use after free would exist here, as the old buffer hadn't been freed yet).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to change resizing to not explicitly free the old buffer then this PR wouldn't be necessary

Yes, but I feel like that's quite a big change that doesn't benefit many cases. In most cases the code won't be resizing a memoryview and it's definitely beneficial (??) to explicitly free memory doing a realloc.

Also, are there other places in the code (probably confined to py/objarray.c) where we'd also need to be careful not to free memory that may be pointed to by a memoryview?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to address the cases with bytearrays. The cases with memoryviews are much more complex, as we discussed offline. Have added a commit here to document the requirement of not resizing a bytearray that has a memoryview pointing to it, at least until we have a better fix.

@dpgeorge dpgeorge added this to the release-1.23.0 milestone Mar 25, 2024
@projectgus projectgus force-pushed the bugfix/realloc_array_subscr branch 3 times, most recently from 180783f to b0c00cc Compare March 27, 2024 06:27
Two cases, one assigning to a slice.
Closes micropython#13283

Second is extending a slice from itself, similar logic.

In both cases the problem occurs when m_renew causes realloc to move the
buffer, leaving a dangling pointer behind.

There are more complex and hard to fix cases when either argument is a
memoryview into the buffer, currently resizing to a new address breaks
memoryviews into that object.

Reproducing this bug and confirming the fix was done by running the unix
port under valgrind with GC-aware extensions.

Note in default configurations with GIL this bug exists but has no impact
(the free buffer won't be reused while the function is still executing, and
is no longer referenced after it returns).

Signed-off-by: Angus Gratton <angus@redyak.com.au>
This a stop-gap until there is a proper fix for this.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <angus@redyak.com.au>
@dpgeorge dpgeorge force-pushed the bugfix/realloc_array_subscr branch from b0c00cc to 6877987 Compare April 22, 2024 01:52
@dpgeorge dpgeorge merged commit 6877987 into micropython:master Apr 22, 2024
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

use after free found at micropython/py/objarray.c:509 [micropython@a5bdd39127]
2 participants