
mismatched YYBACKUP and YYRESTORE #284

@fanf2

Hello! I'm trying to use re2c to improve the lexer in unifdef. I
really like the way re2c works, but I have run into a problem, and I'm
not sure if I have made a mistake or misunderstood something or if it
is a bug in re2c.

When I run the program below, the lexer generated by re2c calls
YYRESTORE() without previously calling YYBACKUP(). I thought that
this should not happen, because all the examples in the documentation
start the lexer with YYMARKER uninitialized. I cautiously initialize
YYMARKER to NULL, which means that after a mismatched YYRESTORE()
the lexer tries to dereference a NULL pointer.
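To illustrate the failure mode, here is a minimal sketch of a guarded backup/restore pair that records a restore with no prior backup instead of overwriting the cursor with NULL. The `lex_state` struct and the `lex_init`, `lex_backup`, and `lex_restore` functions are hypothetical names invented for this example; they are not part of re2c.

```c
#include <stddef.h>

/* Hypothetical helper state (not part of re2c): the marker stays NULL
 * until a backup happens, so a stray restore can be detected. */
struct lex_state {
    const char *cursor;
    const char *marker;    /* NULL until lex_backup() is called */
    int bad_restores;      /* number of restores with no prior backup */
};

void lex_init(struct lex_state *s, const char *input) {
    s->cursor = input;
    s->marker = NULL;
    s->bad_restores = 0;
}

/* Save the current position, as YYBACKUP() does in the program below. */
void lex_backup(struct lex_state *s) {
    s->marker = s->cursor;
}

/* Restore the saved position, but leave the cursor alone and count the
 * event if no position was ever saved. */
void lex_restore(struct lex_state *s) {
    if (s->marker == NULL) {
        s->bad_restores++;
        return;
    }
    s->cursor = s->marker;
}
```

With `--input custom`, the macros could presumably be mapped onto these helpers, for example `#define YYBACKUP() lex_backup(&st)` and `#define YYRESTORE() lex_restore(&st)`, where `st` is an assumed local variable. This only makes the mismatch visible; it does not explain why the generated lexer reaches YYRESTORE() in the first place.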

The same thing happens with both re2c 1.2 and 1.3.

I have tried this program with and without --input custom and it
seems to behave the same in both cases. However, I'm not confident
that it is compiled correctly when it is built without --input custom,
because in that case I can't protect against undefined behaviour from
dereferencing NULL.

I also tried initializing YYMARKER to the same value as YYCURSOR instead
of NULL, and that causes the complete version of my lexer to go into a loop.

I found this problem by testing my rather large lexer with LLVM
libFuzzer. I have tried to find something close to a minimal example
that reproduces the failure mode.

Here is the source code (called 2020-05-30.re):

#include <stdio.h>

#define YYCTYPE		char
#define YYPEEK()	(YYCURSOR == NULL ? 0 : *YYCURSOR)
#define YYSKIP()	++YYCURSOR
#define YYLESSTHAN(n)	(YYDEBUG(9999,0), YYLIMIT - YYCURSOR < n)
#define YYBACKUP()	(YYDEBUG(8888,0), YYMARKER = YYCURSOR)
#define YYRESTORE()	(YYDEBUG(7777,0), YYCURSOR = YYMARKER)

#define myreturn()	return(YYDEBUG(6666,0), YYCURSOR == 0)

#define V const void *
#define YYDEBUG(st, ch)							\
	printf("re2c base %p cursor %p limit %p "			\
	       "accept %d state %d char %c \n",				\
	       (V)ptr, (V)YYCURSOR, (V)YYLIMIT,				\
	       yyaccept, st, ch)

#define yyaccept 0

int main(void) {
	const char *ptr = "/*";
	const char *YYCURSOR = ptr;
	const char *YYLIMIT  = ptr + 2;
	const char *YYMARKER = NULL;

/*!re2c	re2c:flags:input = custom;
        re2c:yyfill:enable = 0;
        re2c:eof = 0;

$	{ myreturn(); }

*	{ myreturn(); }

"%:%:"	{ myreturn(); }

"/*"([^*]*[*]+[^/*])*[^*]*[*]+[/]	{ myreturn(); }

*/
}

And here is the script I use to build and run it:

#!/bin/sh -x

re2c -W -Werror --debug-output -o 2020-05-30.c 2020-05-30.re &&
cc -Wall -Wextra -Werror -O2 -g -o 2020-05-30 2020-05-30.c &&
./2020-05-30 ||
: FAILED

The output from the debug printf()s does not contain 8888 (YYBACKUP) before 7777 (YYRESTORE):

re2c base 0x10bc48f72 cursor 0x10bc48f72 limit 0x10bc48f74 accept 0 state 0 char / 
re2c base 0x10bc48f72 cursor 0x10bc48f72 limit 0x10bc48f74 accept 0 state 5 char / 
re2c base 0x10bc48f72 cursor 0x10bc48f73 limit 0x10bc48f74 accept 0 state 8 char * 
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 9 char  
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 9999 char  
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 7 char  
re2c base 0x10bc48f72 cursor 0x10bc48f74 limit 0x10bc48f74 accept 0 state 7777 char  
re2c base 0x10bc48f72 cursor 0x0 limit 0x10bc48f74 accept 0 state 3 char  
re2c base 0x10bc48f72 cursor 0x0 limit 0x10bc48f74 accept 0 state 6666 char  
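The same check can be done mechanically on the sequence of state codes from the trace. The sketch below assumes the codes have already been extracted into an array; `first_unmatched_restore` is a hypothetical helper written for this example, and 8888 and 7777 are the marker values used by the YYBACKUP() and YYRESTORE() macros above.

```c
#include <stddef.h>

/* Marker codes passed to YYDEBUG by the YYBACKUP/YYRESTORE macros. */
enum { TRACE_BACKUP = 8888, TRACE_RESTORE = 7777 };

/* Return the index of the first restore that has no earlier backup in
 * the trace, or -1 if every restore is preceded by a backup. */
int first_unmatched_restore(const int *states, size_t n) {
    int seen_backup = 0;
    for (size_t i = 0; i < n; i++) {
        if (states[i] == TRACE_BACKUP) {
            seen_backup = 1;
        } else if (states[i] == TRACE_RESTORE && !seen_backup) {
            return (int)i;
        }
    }
    return -1;
}
```

For the trace above, the state codes are 0, 5, 8, 9, 9999, 7, 7777, 3, 6666, and the function returns 6, the position of the 7777 entry.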
