Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regular expression clobbering in the lexer #170

Merged
merged 1 commit into from Sep 10, 2023

Conversation

mpinjr
Copy link
Contributor

@mpinjr mpinjr commented Feb 7, 2023

It is a mistake for the lexer not to make a copy of regexp literals, because the parser may need to shift more than one of them onto its stack as it searches for a matching rule.

For the following demonstration, the correct output is "1b":

    $ echo ab | ./master 'sub(/a/, "b" ~ /b/)'
    a1

The debugging output confirms that /b/ clobbered /a/, incorrectly modifying sub's first argument:

    $ echo ab | ./master -d 'sub(/a/, "b" ~ /b/)' | grep reparse
    reparse <b>
    reparse <b>

One of those should have been "reparse <a>".

I introduced this bug in a fix [1] for memory leaks discovered by Todd C. Miller [2]. This commit reverts my fix and applies a slightly modified version of Todd's recommendation (I omit the INDEX case because reg_expr is stored in a node).

[1] 577a67c
[2] #156

Testing with valgrind --leak-check=full $awk "$prog" </dev/null, the leaks produce something like this:

==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128
==473055== at 0x484586F: malloc (vg_replace_malloc.c:381)
==473055== by 0x49E54FD: strdup (strdup.c:42)
==473055== by 0x409288: tostring (tran.c:522)
==473055== by 0x411DF5: regexpr (lex.c:557)
==473055== by 0x402E99: yyparse (awkgram.tab.c:2251)
==473055== by 0x402877: main (main.c:211)

Where $prog is any one of these one-liners:

'/abc/; /cde/'
'"abc" ~ /cde/; /fgh/'
'match(/abc/, /cde/)'
'split(/abc/, a, /cde/)'
'sub(/abc/, /cde/)'
'gsub(/abc/, /cde/)'
'sub(/abc/, /cde/, a)'
'gsub(/abc/, /cde/, a)'

It is a mistake for the lexer not to make a copy of regexp literals,
because the parser may need to shift more than one of them onto its
stack as it searches for a matching rule.

For the following demonstration, the correct output is "1b":

        $ echo ab | ./master 'sub(/a/, "b" ~ /b/)'
        a1

The debugging output confirms that /b/ clobbered /a/, incorrectly
modifying sub's first argument:

        $ echo ab | ./master -d 'sub(/a/, "b" ~ /b/)' | grep reparse
        reparse <b>
        reparse <b>

One of those should have been "reparse <a>".

I introduced this bug in a fix [1] for memory leaks discovered by
Todd C. Miller [2]. This commit reverts my fix and applies a slightly
modified version of Todd's recommendation (I omit the INDEX case
because reg_expr is stored in a node).

[1] 577a67c
[2] onetrueawk#156

Testing with `valgrind --leak-check=full $awk "$prog" </dev/null`, the
leaks produce something like this:

==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128
==473055==    at 0x484586F: malloc (vg_replace_malloc.c:381)
==473055==    by 0x49E54FD: strdup (strdup.c:42)
==473055==    by 0x409288: tostring (tran.c:522)
==473055==    by 0x411DF5: regexpr (lex.c:557)
==473055==    by 0x402E99: yyparse (awkgram.tab.c:2251)
==473055==    by 0x402877: main (main.c:211)

Where $prog is any one of these one-liners:

'/abc/; /cde/'
'"abc" ~ /cde/; /fgh/'
'match(/abc/, /cde/)'
'split(/abc/, a, /cde/)'
'sub(/abc/, /cde/)'
'gsub(/abc/, /cde/)'
'sub(/abc/, /cde/, a)'
'gsub(/abc/, /cde/, a)'
@mpinjr
Copy link
Contributor Author

mpinjr commented Feb 7, 2023

There are at least two more of these leaks to plug in the bsd branch:

$ git diff master..bsd-features awkgram.y | grep -A1 reg_expr
+	| GENSUB '(' reg_expr comma pattern comma pattern ')'
+		{ $$ = op5(GENSUB, NIL, (Node*)makedfa($3, 1), $5, $7, rectonode()); }
--
+	| GENSUB '(' reg_expr comma pattern comma pattern comma pattern ')'
+		{ $$ = op5(GENSUB, NIL, (Node*)makedfa($3, 1), $5, $7, $9); }

Hope everyone is well,
Miguel

@plan9
Copy link
Collaborator

plan9 commented Feb 8, 2023

thanks miguel, appreciated.

@plan9 plan9 merged commit ddfd88a into onetrueawk:staging Sep 10, 2023
@mpinjr mpinjr deleted the fix-re-clobbering branch January 12, 2024 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants