gh-150717: Avoid mark-array allocation for groupless regex patterns #150719
state_init() always did PyMem_New(state->mark, groups*2), which for a pattern with no capturing groups is PyMem_Malloc(0) -- a real allocation (plus matching free) on every match/search/fullmatch call, for an array that is never read: groupless patterns emit no MARK opcodes and group 0's span is taken from state->start/ptr. Guard the allocation with `if (pattern->groups)`. state->mark stays NULL (set by the preceding memset), and both the error path and state_fini already PyMem_Free(NULL) safely.
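Seen from Python, the point that group 0 does not depend on any capture-group storage can be checked with a small script. This is only a sketch of observable behavior, not of the C change itself; the sample string is borrowed from the benchmark's `\d+` case, and the variable names are illustrative.

```python
import re

# A pattern with no capturing groups: Pattern.groups is 0.
groupless = re.compile(r"\d+")
m = groupless.search("retry after 30 seconds")

# Group 0 (the whole match) is still available even though
# the pattern defines no capturing groups.
whole_match = m.group(0)
whole_span = m.span()
captured = m.groups()  # empty tuple: nothing was captured

print(groupless.groups, whole_match, whole_span, captured)
```

Behavior like this should be the same on base and patched builds, since the change only affects whether the unused array is allocated.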
Benchmark (pyperf, base build versus patched build):
import pyperf
runner = pyperf.Runner()
BENCHES = [
(r"sub \s+ (collapse ws) [66 repos]", r"import re; p=re.compile(r'\s+'); s='the quick brown\tfox jumps'", "p.sub(' ', s)"),
(r"sub [-_.]+ (PEP 503 norm) [21 repos]", r"import re; p=re.compile(r'[-_.]+'); s='Foo_._Bar--Baz'", "p.sub('-', s)"),
(r"match ^\d+\.\d+\.\d+$ (version) [11]", r"import re; p=re.compile(r'^\d+\.\d+\.\d+$'); s='12.4.301'", "p.match(s)"),
(r"match ^[A-Za-z0-9_-]+$ (slug) [6]", r"import re; p=re.compile(r'^[a-zA-Z0-9_-]+$'); s='my-package_v2'", "p.match(s)"),
(r"search \d+ (number) [20]", r"import re; p=re.compile(r'\d+'); s='retry after 30 seconds'", "p.search(s)"),
(r"search \s (has ws) [23]", r"import re; p=re.compile(r'\s'); s='no_spaces_here_value'", "p.search(s)"),
(r"sub (?<!^)(?=[A-Z]) (camel split) [11]", r"import re; p=re.compile(r'(?<!^)(?=[A-Z])'); s='CamelCaseClassName'", "p.sub('_', s)"),
(r"match (\d+)\.(\d+)\.(\d+) (CONTROL, groups)", r"import re; p=re.compile(r'(\d+)\.(\d+)\.(\d+)'); s='12.4.301'", "p.match(s)"),
]
for name, setup, stmt in BENCHES:
    runner.timeit(name=name, stmt=stmt, setup=setup)

Results: group-less patterns run 1.03-1.17x faster (geometric mean 1.09x); the short match/search cases gain most, since the per-call allocation is a larger share of their cost. The capturing-group control was hidden as not significant, i.e. unchanged.
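To confirm which of the benchmarked patterns would take the no-allocation path, one can inspect `Pattern.groups` for each. The sketch below copies the pattern strings from the benchmark list; `group_counts` is a hypothetical helper, not part of the PR. Note that the lookaround pattern counts as group-less, because lookarounds are not capturing groups.

```python
import re

# Pattern strings copied from the benchmark list; the last is the capturing control.
PATTERNS = [
    r"\s+",
    r"[-_.]+",
    r"^\d+\.\d+\.\d+$",
    r"^[a-zA-Z0-9_-]+$",
    r"\d+",
    r"\s",
    r"(?<!^)(?=[A-Z])",
    r"(\d+)\.(\d+)\.(\d+)",
]


def group_counts(patterns):
    """Return the number of capturing groups for each pattern string."""
    return [re.compile(p).groups for p in patterns]


counts = group_counts(PATTERNS)
for pattern, count in zip(PATTERNS, counts):
    kind = "group-less" if count == 0 else f"{count} capturing groups"
    print(f"{pattern!r}: {kind}")
```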
Every `match`, `search`, or `fullmatch` on a pattern with no capturing groups allocates capture-group bookkeeping, then frees it without ever reading it. Group-less patterns are common in validation and scanning code.

This skips the allocation for patterns with no capturing groups. Patterns that capture stay untouched, and results are identical.
It helps validation and scanning in tight loops: checking record formats during an import with `re.match(r"\d{4}-\d{2}-\d{2}", value)`, scanning log lines with `re.search(r"ERROR|WARN", line)`, or testing many small patterns per request.

A `pyperf` comparison of base versus patched builds (script and full table in a comment below) uses the most widely used group-less patterns mined from the top-1000 PyPI packages. They run 1.03 to 1.17x faster, geometric mean 1.09x, with the short `match`/`search` cases gaining most; a capturing-group control is unchanged.

Resolves #150717.
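The validation and scanning use cases described above might look like the following in application code. This is a minimal sketch: the function names and sample data are hypothetical, and only the two regular expressions come from the description. Both patterns are group-less, so every call in these loops is the kind this change targets.

```python
import re

# Both patterns have no capturing groups (Pattern.groups == 0).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
LEVEL_RE = re.compile(r"ERROR|WARN")


def valid_dates(values):
    """Keep values that start with a YYYY-MM-DD shaped prefix."""
    return [v for v in values if DATE_RE.match(v)]


def flagged_lines(lines):
    """Keep log lines that mention ERROR or WARN anywhere."""
    return [line for line in lines if LEVEL_RE.search(line)]


dates = valid_dates(["2024-01-31", "31/01/2024", "2024-1-3"])
flagged = flagged_lines(["INFO ok", "ERROR disk full", "WARN slow"])
print(dates, flagged)
```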