Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plug a regex memory leak. #156

Closed
wants to merge 1 commit into from
Closed

Conversation

millert
Copy link
Contributor

@millert millert commented Aug 26, 2022

The string returned by reg_expr was never freed after conversion to a dfa.

The string returned by reg_expr was never freed after conversion to a dfa.
@millert millert changed the base branch from master to staging August 26, 2022 20:27
@mpinjr
Copy link
Contributor

mpinjr commented Aug 26, 2022

Hi, millert:

What do you think of this alternative?

diff --git a/lex.c b/lex.c
index d7b2d03..c162a70 100644
--- a/lex.c
+++ b/lex.c
@@ -545,7 +545,7 @@ int regexpr(void)
        *bp = 0;
        if (c == 0)
                SYNTAX("non-terminated regular expression %.10s...", buf);
-       yylval.s = tostring(buf);
+       yylval.s = buf;
        unput('/');
        RET(REGEXPR);
 }

This tostring is the source of the leaked memory and it's unnecessary (makedfa->mkdfa will make a copy).

The last bit of your commit seems to regard an unrelated grammatical matter which my proposal does not address.

Take care,
Miguel

@millert
Copy link
Contributor Author

millert commented Aug 27, 2022

Yes, that looks better indeed. I didn't intend for that last hunk to be included, too many tree :-)

@millert
Copy link
Contributor Author

millert commented Aug 27, 2022

Closing in favor of the fix from @mpinjr

@millert millert closed this Aug 27, 2022
@millert millert mentioned this pull request Aug 27, 2022
mpinjr added a commit to mpinjr/awk that referenced this pull request Feb 7, 2023
It is a mistake for the lexer not to make a copy of regexp literals,
because the parser may need to shift more than one of them onto its
stack as it searches for a matching rule.

For the following demonstration, the correct output is "1b":

        $ echo ab | ./master 'sub(/a/, "b" ~ /b/)'
        a1

The debugging output confirms that /b/ clobbered /a/, incorrectly
modifying sub's first argument:

        $ echo ab | ./master -d 'sub(/a/, "b" ~ /b/)' | grep reparse
        reparse <b>
        reparse <b>

One of those should have been "reparse <a>".

I introduced this bug in a fix [1] for memory leaks discovered by
Todd C. Miller [2]. This commit reverts my fix and applies a slightly
modified version of Todd's recommendation (I omit the INDEX case
because reg_expr is stored in a node).

[1] 577a67c
[2] onetrueawk#156

Testing with `valgrind --leak-check=full $awk "$prog" </dev/null`, the
leaks produce something like this:

==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128
==473055==    at 0x484586F: malloc (vg_replace_malloc.c:381)
==473055==    by 0x49E54FD: strdup (strdup.c:42)
==473055==    by 0x409288: tostring (tran.c:522)
==473055==    by 0x411DF5: regexpr (lex.c:557)
==473055==    by 0x402E99: yyparse (awkgram.tab.c:2251)
==473055==    by 0x402877: main (main.c:211)

Where $prog is any one of these one-liners:

'/abc/; /cde/'
'"abc" ~ /cde/; /fgh/'
'match(/abc/, /cde/)'
'split(/abc/, a, /cde/)'
'sub(/abc/, /cde/)'
'gsub(/abc/, /cde/)'
'sub(/abc/, /cde/, a)'
'gsub(/abc/, /cde/, a)'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants