Fix regular expression clobbering in the lexer #170
Merged
It is a mistake for the lexer not to make a copy of regexp literals, because the parser may need to shift more than one of them onto its stack as it searches for a matching rule.
For the following demonstration, the correct output is "1b":
The debugging output confirms that /b/ clobbered /a/, incorrectly modifying sub's first argument:
One of those should have been "reparse <a>".
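The clobbering can be sketched in isolation. This is a minimal illustration, not the awk source: `shared_buf`, `lex_shared`, and `lex_copy` are hypothetical names standing in for a lexer that hands the parser a pointer into one reused buffer versus one that hands over a private copy of each literal.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lexer-owned buffer that is reused per token. */
static char shared_buf[64];

/* Buggy pattern: every token value points at the same storage, so a
 * second regexp literal overwrites the first while it is still on the
 * parser's stack. */
static char *lex_shared(const char *text)
{
    snprintf(shared_buf, sizeof shared_buf, "%s", text);
    return shared_buf;
}

/* Safe pattern: each token gets its own heap copy. The caller (here,
 * whatever consumes the token) becomes responsible for freeing it. */
static char *lex_copy(const char *text)
{
    size_t n = strlen(text) + 1;
    char *s = malloc(n);
    assert(s != NULL);
    memcpy(s, text, n);
    return s;
}
```

With `lex_shared`, lexing "a" and then "b" leaves both returned pointers reading "b", which matches the symptom above where /b/ replaced /a/. With `lex_copy`, the two values stay distinct, at the cost of having to free each copy once the parser has reduced the rule.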
I introduced this bug in a fix [1] for memory leaks discovered by Todd C. Miller [2]. This commit reverts my fix and applies a slightly modified version of Todd's recommendation (I omit the INDEX case because reg_expr is stored in a node).
[1] 577a67c
[2] #156
Testing with
valgrind --leak-check=full $awk "$prog" </dev/null
the leaks produce something like this:
==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128
==473055== at 0x484586F: malloc (vg_replace_malloc.c:381)
==473055== by 0x49E54FD: strdup (strdup.c:42)
==473055== by 0x409288: tostring (tran.c:522)
==473055== by 0x411DF5: regexpr (lex.c:557)
==473055== by 0x402E99: yyparse (awkgram.tab.c:2251)
==473055== by 0x402877: main (main.c:211)
Where $prog is any one of these one-liners:
'/abc/; /cde/'
'"abc" ~ /cde/; /fgh/'
'match(/abc/, /cde/)'
'split(/abc/, a, /cde/)'
'sub(/abc/, /cde/)'
'gsub(/abc/, /cde/)'
'sub(/abc/, /cde/, a)'
'gsub(/abc/, /cde/, a)'