Hello! I'm trying to use re2c to improve the lexer in unifdef. I
really like the way re2c works, but I have run into a problem, and I'm
not sure if I have made a mistake or misunderstood something or if it
is a bug in re2c.
When I run the program below, the lexer generated by re2c calls
YYRESTORE() without previously calling YYBACKUP(). I thought that
this should not happen, because all the examples in the documentation
start the lexer with YYMARKER uninitialized. I cautiously initialize
YYMARKER to NULL, which means that after a mismatched YYRESTORE()
the lexer tries to dereference a NULL pointer.
The same thing happens with both re2c 1.2 and 1.3.
I have tried this program with and without --input custom and it
seems to behave the same in both cases. However, I'm not confident
that it is compiled correctly when it is built without --input custom,
because in that case I can't protect against undefined behaviour from
dereferencing NULL.
I also tried initializing YYMARKER to the same value as YYCURSOR
instead of NULL, but that causes the complete version of my lexer to
go into a loop.
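One way to make a mismatched restore observable without relying on a NULL dereference is to route the custom-input primitives through small helpers that record whether a backup has happened. The following is only a minimal sketch, not code from unifdef or re2c: `struct guarded_input` and the `gi_*` functions are hypothetical names, and the macros at the end assume a local variable called `in` in the lexer function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the lexer's input pointers. */
struct guarded_input {
    const char *cursor;   /* plays the role of YYCURSOR */
    const char *limit;    /* plays the role of YYLIMIT */
    const char *marker;   /* plays the role of YYMARKER */
    int have_marker;      /* nonzero once gi_backup() has run */
    int bad_restores;     /* restores seen with no prior backup */
};

void gi_init(struct guarded_input *in, const char *buf, size_t len) {
    assert(buf != NULL);
    in->cursor = buf;
    in->limit = buf + len;
    in->marker = NULL;
    in->have_marker = 0;
    in->bad_restores = 0;
}

/* Returns 0 at or past the limit, matching the sentinel value chosen
 * by re2c:eof = 0, so the buffer is never read out of bounds. */
char gi_peek(const struct guarded_input *in) {
    return in->cursor < in->limit ? *in->cursor : 0;
}

void gi_skip(struct guarded_input *in) {
    ++in->cursor;
}

void gi_backup(struct guarded_input *in) {
    in->marker = in->cursor;
    in->have_marker = 1;
}

/* A restore without a prior backup is counted and leaves the cursor
 * where it is, instead of copying an unset marker into it. */
void gi_restore(struct guarded_input *in) {
    if (in->have_marker)
        in->cursor = in->marker;
    else
        ++in->bad_restores;
}

/* Assumed wiring for --input custom; `in` is a hypothetical local. */
#define YYPEEK()    gi_peek(&in)
#define YYSKIP()    gi_skip(&in)
#define YYBACKUP()  gi_backup(&in)
#define YYRESTORE() gi_restore(&in)
```

With this wiring, a nonzero `bad_restores` after a lexer run flags the mismatch directly, whatever YYMARKER was initialized to.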
I found this problem by testing my rather large lexer with LLVM
libFuzzer. I have tried to find something close to a minimal example
that reproduces the failure mode.
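For anyone who wants to run a similar test, a libFuzzer harness for a lexer can be quite small. The sketch below is not the harness used for unifdef: `lex_one_token` is a hypothetical hand-written stand-in for the re2c-generated function, and only the `LLVMFuzzerTestOneInput` entry point is libFuzzer's own interface. The harness aborts if the lexer stops making progress or leaves the buffer, which correspond to the loop and the stray cursor described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a generated lexer: consumes one token that
 * starts at `cur` and returns the position just after it. It recognizes
 * a complete C-style comment, and otherwise a single byte.
 */
const char *lex_one_token(const char *cur, const char *lim) {
    assert(cur < lim);
    if (lim - cur >= 2 && cur[0] == '/' && cur[1] == '*') {
        const char *p;
        for (p = cur + 2; lim - p >= 2; ++p) {
            if (p[0] == '*' && p[1] == '/')
                return p + 2;   /* complete comment */
        }
    }
    return cur + 1;             /* fallback: any single byte */
}

/* libFuzzer entry point; build with: clang -fsanitize=fuzzer,address */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    const char *cur, *lim;

    if (size == 0)
        return 0;
    cur = (const char *)data;
    lim = cur + size;

    while (cur < lim) {
        const char *next = lex_one_token(cur, lim);
        /* No progress would mean an infinite loop; a position outside
         * the buffer would mean the cursor has gone astray. */
        if (next <= cur || next > lim)
            abort();
        cur = next;
    }
    return 0;
}
```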
Here is the source code (called 2020-05-30.re):
#include <stdio.h>

#define YYCTYPE char
#define YYPEEK() (YYCURSOR == NULL ? 0 : *YYCURSOR)
#define YYSKIP() ++YYCURSOR
#define YYLESSTHAN(n) (YYDEBUG(9999,0), YYLIMIT - YYCURSOR < n)
#define YYBACKUP() (YYDEBUG(8888,0), YYMARKER = YYCURSOR)
#define YYRESTORE() (YYDEBUG(7777,0), YYCURSOR = YYMARKER)
#define myreturn() return(YYDEBUG(6666,0), YYCURSOR == 0)
#define V const void *
#define YYDEBUG(st, ch) \
    printf("re2c base %p cursor %p limit %p " \
           "accept %d state %d char %c \n", \
           (V)ptr, (V)YYCURSOR, (V)YYLIMIT, \
           yyaccept, st, ch)
#define yyaccept 0

int main(void) {
    const char *ptr = "/*";
    const char *YYCURSOR = ptr;
    const char *YYLIMIT = ptr + 2;
    const char *YYMARKER = NULL;
    /*!re2c re2c:flags:input = custom;
        re2c:yyfill:enable = 0;
        re2c:eof = 0;
        $ { myreturn(); }
        * { myreturn(); }
        "%:%:" { myreturn(); }
        "/*"([^*]*[*]+[^/*])*[^*]*[*]+[/] { myreturn(); }
    */
}
And here is the script I use to build and run it:
#!/bin/sh -x
re2c -W -Werror --debug-output -o 2020-05-30.c 2020-05-30.re &&
cc -Wall -Wextra -Werror -O2 -g -o 2020-05-30 2020-05-30.c &&
./2020-05-30 ||
: FAILED
The output from the debug printf()s does not contain 8888 (YYBACKUP) before 7777 (YYRESTORE):
re2c base 0x10bc48f72 cursor 0x10bc48f72 limit 0x10bc48f74 accept 0 state 0 char /
re2c base 0x10bc48f72 cursor 0x10bc48f72 limit 0x10bc48f74 accept 0 state 5 char /
re2c base 0x10bc48f72 cursor 0x10bc48f73 limit 0x10bc48f74 accept 0 state 8 char *
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 9 char
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 9999 char
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 7 char
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 7777 char
re2c base 0x10bc48f72 cursor 0x0 limit 0x10bc48f74 accept 0 state 3 char
re2c base 0x10bc48f72 cursor 0x0 limit 0x10bc48f74 accept 0 state 6666 char
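The same check can be done mechanically on the state column of the trace. The sketch below is a hypothetical helper, not part of re2c: it assumes the 8888 and 7777 tags used by the YYBACKUP() and YYRESTORE() macros in the program above, and reports the first restore that has no backup anywhere before it.

```c
#include <assert.h>
#include <stddef.h>

/* Tags taken from the YYDEBUG() calls in the macros above. */
enum { TAG_BACKUP = 8888, TAG_RESTORE = 7777 };

/*
 * Returns the index of the first TAG_RESTORE that has no TAG_BACKUP
 * anywhere before it, or -1 if every restore is preceded by a backup.
 */
int first_unmatched_restore(const int *states, size_t n) {
    int seen_backup = 0;
    size_t i;

    assert(states != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        if (states[i] == TAG_BACKUP)
            seen_backup = 1;
        else if (states[i] == TAG_RESTORE && !seen_backup)
            return (int)i;
    }
    return -1;
}
```

For the trace above, the state column reads 0, 5, 8, 9, 9999, 7, 7777, 3, 6666, so the helper returns 6, the position of the 7777 entry.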