Matching fancy Unicode regex against an ASCII string leaks memory #17140
From choroba@matfyz.cz
If a regex contains a fancy Unicode character and the string being matched is plain ASCII (not upgraded to UTF-8), every match attempt leaks memory:
    "a" =~ /\N{U+2129}/ while 1;  # Don't forget to kill the script before it eats all the memory!
Using an upgraded string doesn't leak at all:
    utf8::upgrade(my $x = 'a');
See https://www.perlmonks.org/?node_id=11105281 for the original report (with Ch.
Perl Info
From @khwilliamson
On Fri, 30 Aug 2019 04:52:16 -0700, choroba@matfyz.cz wrote:
What is happening here is that in re_intuit_start() at line 922 in regexec.c, it determines there is no possible match, because the target string would need to be in UTF-8 to match the character in the pattern. But something is not returning memory when re_intuit_start() returns failure. There are other instances of this failure return in re_intuit_start(), and I suspect they leak as well. I'm thinking someone who knows about the regex memory allocation can answer this without much effort, so I'm deferring to someone like that to step forward.
The RT System itself - Status changed from 'new' to 'open'
From @demerphq
I can easily imagine that SVs constructed during compilation aren't
Yves
On Fri, 30 Aug 2019 at 17:35, Karl Williamson via RT
From @tonycoz
On Fri, 30 Aug 2019 08:35:45 -0700, khw wrote:
It was fairly simple; I ran:
    valgrind --leak-check=full --show-leak-kinds=all ./perl -Ilib -e '"a" =~ /\N{U+2129}/ for 1 .. 1000' 2>&1 | less
The leak with 1000 entries:
    ==25945== 10,000 bytes in 1,000 blocks are still reachable in loss record 227 of 230
Fix attached.
Tony
From @tonycoz
0001-perl-134390-don-t-leak-the-SV-we-just-created-on-an-.patch
From 05a03c0da6f3694904885fa1629a6e35e75d2875 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Mon, 2 Sep 2019 15:35:36 +1000
Subject: (perl #134390) don't leak the SV we just created on an early return
---
regexec.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/regexec.c b/regexec.c
index c390bff72e..97ea458a20 100644
--- a/regexec.c
+++ b/regexec.c
@@ -10405,6 +10405,7 @@ S_to_byte_substr(pTHX_ regexp *prog)
&& !prog->substrs->data[i].substr) {
SV* sv = newSVsv(prog->substrs->data[i].utf8_substr);
if (! sv_utf8_downgrade(sv, TRUE)) {
+ SvREFCNT_dec_NN(sv);
return FALSE;
}
if (SvVALID(prog->substrs->data[i].utf8_substr)) {
--
2.11.0
From @tonycoz
On Sun, 01 Sep 2019 22:38:31 -0700, tonyc wrote:
Applied as 05a03c0.
Tony
@tonycoz - Status changed from 'open' to 'pending release'
Migrated from rt.perl.org#134390 (status was 'pending release')
Searchable as RT134390$