Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Off-by-one memory error in a string fastsearch since 3.11 #105235

Closed
bacher09 opened this issue Jun 2, 2023 · 3 comments
Closed

Off-by-one memory error in a string fastsearch since 3.11 #105235

bacher09 opened this issue Jun 2, 2023 · 3 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@bacher09
Copy link

bacher09 commented Jun 2, 2023

Bug report

This bug happens in Objects/stringlib/fastsearch.h:589 during matching the last symbol. In some cases, it causes crashes, but it's a bit hard to reproduce since in order this to happen, the last symbol should be the last in this particular memory page and the next page should not be read accessible or have a different non-contiguous address with the previous one.

The simplest script that reproduces the bug for me is:

import mmap

def bug():
    with open("file.tmp", "wb") as f:
        # this is the smallest size that triggers bug for me
        f.write(bytes(8388608))

    with open("file.tmp", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as fm:
            with open("/proc/self/maps", "rt") as f:
                print(f.read())

            # this triggers bug
            res = fm.find(b"fo")

if __name__ == "__main__":
    bug()

But since the result of this script depends on a file system, kernel, and perhaps even a moon phase 😄 , here's a much more reliable way to reproduce it:

import mmap

def read_maps():
    with open("/proc/self/maps", "rt") as f:
        return f.read()

def bug():
    prev_map = frozenset(read_maps().split('\n'))
    new_map = None
    for i in range(0, 2049):
        # guard mmap
        with mmap.mmap(0, 4096 * (i + 1), flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, prot=0) as guard:
            with mmap.mmap(0, 8388608 + 4096 * i, flags=mmap.MAP_ANONYMOUS | mmap.MAP_PRIVATE, prot=mmap.PROT_READ) as fm:
                new_map = frozenset(read_maps().split('\n'))
                for diff in new_map.difference(prev_map):
                    print(diff)

                prev_map = new_map
                # this crashes
                fm.find(b"fo")
                print("---")

if __name__ == "__main__":
    bug()

This causes the bug across all Linux environments that I've tried. It uses a trick with inaccessible memory region to increase the chances of this bug happening and no files, to speed it up.
Here's some extra info from GDB:

Program received signal SIGSEGV, Segmentation fault.
0x000055555570ba81 in stringlib_default_find (s=0x7ffff6a00000 "", n=8388608, p=0x7ffff745a3e0 "fo", m=2, maxcount=-1, mode=1)
    at Objects/stringlib/fastsearch.h:589
589                 if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
(gdb) pipe info proc mappings | grep -A 1 -B 1 file.tmp
      0x555555cb4000     0x555555d66000    0xb2000        0x0  rw-p   [heap]
      0x7ffff6a00000     0x7ffff7200000   0x800000        0x0  r--s   /home/slava/src/cpython/python_bug/file.tmp
      0x7ffff7400000     0x7ffff7600000   0x200000        0x0  rw-p   
(gdb) p &ss[i]
$1 = 0x7ffff71fffff ""
(gdb) p &ss[i + 1]
$2 = 0x7ffff7200000 <error: Cannot access memory at address 0x7ffff7200000>
(gdb) p i
$3 = 8388606
(gdb) p ss
$4 = 0x7ffff6a00001 ""
(gdb) p s
$5 = 0x7ffff6a00000 ""

Your environment

  • CPython 3.11.3
  • OS: Linux 6.1 (but it should be OS independent)

I've also tried a bit modified version of a script on OS X, and it crashes there as well.

cc @sweeneyde (since you are the author of d01dceb and 6ddb09f).

Linked PRs

@bacher09 bacher09 added the type-bug An unexpected behavior, bug, or error label Jun 2, 2023
@AlexWaygood AlexWaygood changed the title Of-by-one memory error in a string fastsearch since 3.11 Off-by-one memory error in a string fastsearch since 3.11 Jun 2, 2023
@bacher09
Copy link
Author

bacher09 commented Jun 2, 2023

Also wanted to add that this bug happens not only with mmap, it's very easy to see if you add a few extra assert statements into fastsearch.h.

@arhadthedev arhadthedev added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jun 2, 2023
@bacher09
Copy link
Author

bacher09 commented Jun 2, 2023

The simplest fix would be just doing bounds checks, but this could eat away some performance improvements from a fastsearch and perhaps someone who is more familiar with the intricacies of this search algorithm would be able to come up with something better.

diff --git a/Objects/stringlib/fastsearch.h b/Objects/stringlib/fastsearch.h
index 7403d8a3f7..785407fbf6 100644
--- a/Objects/stringlib/fastsearch.h
+++ b/Objects/stringlib/fastsearch.h
@@ -577,7 +577,8 @@ STRINGLIB(default_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
                 continue;
             }
             /* miss: check if next character is part of pattern */
-            if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
+            if ((i != w) && !STRINGLIB_BLOOM(mask, ss[i+1])) {
                 i = i + m;
             }
             else {
@@ -586,7 +587,8 @@ STRINGLIB(default_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
         }
         else {
             /* skip: check if next character is part of pattern */
-            if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
+            if ((i != w) && !STRINGLIB_BLOOM(mask, ss[i+1])) {
                 i = i + m;
             }
         }
@@ -650,7 +652,8 @@ STRINGLIB(adaptive_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
                 }
             }
             /* miss: check if next character is part of pattern */
-            if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
+            if ((i != w) && !STRINGLIB_BLOOM(mask, ss[i+1])) {
                 i = i + m;
             }
             else {
@@ -659,7 +662,8 @@ STRINGLIB(adaptive_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
         }
         else {
             /* skip: check if next character is part of pattern */
-            if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
+            if ((i != w) && !STRINGLIB_BLOOM(mask, ss[i+1])) {
                 i = i + m;
             }
         }

@sweeneyde
Copy link
Member

Thanks for the report. We might be able to keep fastsearch.h the way it is (reading one past the end of the haystack, which is safe for null-terminated strings including python bytes objects), and instead alter the call for the mmap module to add a special-case to check for haystack[-len(needle):] == needle with one memcmp. I can look into this today.

@sweeneyde sweeneyde self-assigned this Jun 2, 2023
@AlexWaygood AlexWaygood added 3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes labels Jun 3, 2023
sweeneyde added a commit that referenced this issue Jul 13, 2023
* Add a special case for s[-m:] == p in _PyBytes_Find

* Add tests for _PyBytes_Find

* Make sure that start <= end in mmap.find
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 13, 2023
…ythonGH-105252)

* Add a special case for s[-m:] == p in _PyBytes_Find

* Add tests for _PyBytes_Find

* Make sure that start <= end in mmap.find
(cherry picked from commit ab86426)

Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
sweeneyde added a commit that referenced this issue Jul 15, 2023
…H-105252) (#106708)

gh-105235: Prevent reading outside buffer during mmap.find() (GH-105252)

* Add a special case for s[-m:] == p in _PyBytes_Find

* Add tests for _PyBytes_Find

* Make sure that start <= end in mmap.find
(cherry picked from commit ab86426)

Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
sweeneyde added a commit that referenced this issue Jul 15, 2023
#106710)

[3.11] gh-105235: Prevent reading outside buffer during mmap.find() (GH-105252)

* Add a special case for s[-m:] == p in _PyBytes_Find

* Add tests for _PyBytes_Find

* Make sure that start <= end in mmap.find.
(cherry picked from commit ab86426)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants