re: _validate_inner() function small bug #140979

@wjssz

Description

Bug description:

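This line in the MARK case of _validate_inner() in Modules/_sre/sre.c:

if (arg > 2 * (size_t)groups + 1) {

should be:

if (arg >= 2 * (size_t)groups) {

Each capturing group uses two MARK slots, one for its start and one for its end, so a pattern with groups capturing groups has valid MARK arguments 0 through 2*groups-1. The current check also accepts 2*groups and 2*groups+1, which are the slots of a nonexistent extra group.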
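A minimal Python model of the bound, for illustration only: the helper names below are hypothetical and do not exist in CPython. It assumes the layout used by re._compiler, where capturing group g (1-based) emits MARK (g-1)*2 when it opens and MARK (g-1)*2+1 when it closes.

```python
# Hypothetical model of the MARK bounds check in _validate_inner();
# none of these helpers exist in CPython.

def mark_indices(group):
    """MARK arguments emitted for capturing group `group` (1-based)."""
    return (group - 1) * 2, (group - 1) * 2 + 1

def valid_marks(groups):
    """All MARK arguments a pattern with `groups` capturing groups can emit."""
    return [i for g in range(1, groups + 1) for i in mark_indices(g)]

def old_check_rejects(arg, groups):
    # Condition before the patch: arg > 2 * groups + 1
    return arg > 2 * groups + 1

def new_check_rejects(arg, groups):
    # Condition after the patch: arg >= 2 * groups
    return arg >= 2 * groups

def accepted(check, groups, max_arg=8):
    """MARK arguments in range(max_arg) that `check` lets through."""
    return [arg for arg in range(max_arg) if not check(arg, groups)]

for groups in range(3):
    print(groups, valid_marks(groups),
          accepted(old_check_rejects, groups),
          accepted(new_check_rejects, groups))
```

For groups=1 the old condition accepts [0, 1, 2, 3] while only [0, 1] are valid; the new condition accepts exactly the valid slots, and for groups=0 it accepts none.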

Before the change, the code below raises no exception.
After the change, it raises RuntimeError: invalid SRE code, as expected.

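import re
from re._compiler import _sre

old_compile = _sre.compile

# Pass groups=0 (no capturing groups), although the compiled code for
# r'(a)' contains MARK 0 and MARK 1 for group 1.
def new_compile(pattern, flags, code, groups, groupindex, indexgroup):
    return old_compile(pattern, flags, code, 0, groupindex, indexgroup)

_sre.compile = new_compile

re.purge()  # make sure r'(a)' is really compiled, not taken from re's cache
re.compile(r'(a)')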

Newcomers are welcome to submit a PR.
Since this is an internal function that is not visible to users, no NEWS entry is needed.

Patch:

 Lib/test/test_re.py | 19 +++++++++++++++++++
 Modules/_sre/sre.c  |  2 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Lib/test/test_re.py b/Lib/test/test_re.py
index 5fc95087f2b..260e776e244 100644
--- a/Lib/test/test_re.py
+++ b/Lib/test/test_re.py
@@ -2666,6 +2666,25 @@ def test_regression_gh94675(self):
                 p.terminate()
                 p.join()
 
+    def test_bug_gh140979(self):
+        # _validate_inner() function incorrectly validated MARK range
+        from unittest.mock import patch
+        from re._compiler import _sre
+        original_compile = _sre.compile
+
+        def wrapper(*args):
+            lst = list(args)
+            lst[3] -= 1  # (groups-1) triggers the bug
+            return original_compile(*tuple(lst))
+
+        msg = re.compile("invalid SRE code")
+        with patch("_sre.compile") as mock_compile:
+            mock_compile.side_effect = wrapper
+            with self.assertRaisesRegex(RuntimeError, msg):
+                re.compile("(a)")
+            with self.assertRaisesRegex(RuntimeError, msg):
+                re.compile("(a)(b)")
+
     def test_fail(self):
         self.assertEqual(re.search(r'12(?!)|3', '123')[0], '3')
 
diff --git a/Modules/_sre/sre.c b/Modules/_sre/sre.c
index fdf00e6499c..4e97101b699 100644
--- a/Modules/_sre/sre.c
+++ b/Modules/_sre/sre.c
@@ -1946,7 +1946,7 @@ _validate_inner(SRE_CODE *code, SRE_CODE *end, Py_ssize_t groups)
                sre_match() code is robust even if they don't, and the worst
                you can get is nonsensical match results. */
             GET_ARG;
-            if (arg > 2 * (size_t)groups + 1) {
+            if (arg >= 2 * (size_t)groups) {
                 VTRACE(("arg=%d, groups=%d\n", (int)arg, (int)groups));
                 FAIL;
             }

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Metadata

Assignees: No one assigned
Labels: type-bug (An unexpected behavior, bug, or error)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests