Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regular expression clobbering in the lexer #170

Merged
merged 1 commit into from
Sep 10, 2023

Commits on Feb 6, 2023

  1. Fix regular expression clobbering in the lexer

    It is a mistake for the lexer not to make a copy of regexp literals,
    because the parser may need to shift more than one of them onto its
    stack as it searches for a matching rule.
    
    For the following demonstration, the correct output is "1b":
    
            $ echo ab | ./master 'sub(/a/, "b" ~ /b/)'
            a1
    
    The debugging output confirms that /b/ clobbered /a/, incorrectly
    modifying sub's first argument:
    
            $ echo ab | ./master -d 'sub(/a/, "b" ~ /b/)' | grep reparse
            reparse <b>
            reparse <b>
    
    One of those should have been "reparse <a>".
    
    I introduced this bug in a fix [1] for memory leaks discovered by
    Todd C. Miller [2]. This commit reverts my fix and applies a slightly
    modified version of Todd's recommendation (I omit the INDEX case
    because reg_expr is stored in a node).
    
    [1] 577a67c
    [2] onetrueawk#156
    
    Testing with `valgrind --leak-check=full $awk "$prog" </dev/null`, the
    leaks produce something like this:
    
    ==473055== 4 bytes in 1 blocks are definitely lost in loss record 36 of 128
    ==473055==    at 0x484586F: malloc (vg_replace_malloc.c:381)
    ==473055==    by 0x49E54FD: strdup (strdup.c:42)
    ==473055==    by 0x409288: tostring (tran.c:522)
    ==473055==    by 0x411DF5: regexpr (lex.c:557)
    ==473055==    by 0x402E99: yyparse (awkgram.tab.c:2251)
    ==473055==    by 0x402877: main (main.c:211)
    
    Where $prog is any one of these one-liners:
    
    '/abc/; /cde/'
    '"abc" ~ /cde/; /fgh/'
    'match(/abc/, /cde/)'
    'split(/abc/, a, /cde/)'
    'sub(/abc/, /cde/)'
    'gsub(/abc/, /cde/)'
    'sub(/abc/, /cde/, a)'
    'gsub(/abc/, /cde/, a)'
    mpinjr committed Feb 6, 2023
    Configuration menu
    Copy the full SHA
    4236e5a View commit details
    Browse the repository at this point in the history