Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review right recursion rules in our parser #2884

Open
ojwb opened this issue Apr 18, 2024 · 4 comments
Open

Review right recursion rules in our parser #2884

ojwb opened this issue Apr 18, 2024 · 4 comments
Labels

Comments

@ojwb
Copy link
Member

ojwb commented Apr 18, 2024

#2876 made me look to see if there were other cases of right recursion and it is more widespread (and this is just the cases where it happens in the first rule):

$ grep --perl-regexp '^(\w+)\b.*\b\1\b\s*{' parser.y
tm_tail        : COMMA typemap_parm tm_tail {
templateparameterstail : COMMA templateparameter templateparameterstail {
cpp_members  : cpp_member cpp_members {
valptail       : COMMA valparm valptail {
callptail      : COMMA valexpr callptail {
pointer    : STAR type_qualifier pointer { 
idcolontail    : DCOLON idtemplatetemplate idcolontail {
idcolontailnt   : DCOLON identifier idcolontailnt {

This is something you're meant to avoid doing because "Right recursion uses up space on the Bison stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once."

Probably in some of the cases it's OK in practice because there are never going to be more than a handful of items which have to be stored on the stack before they can be reduced, e.g. it's unlikely we will have enough type qualifiers to cause a problem with stack depth:

type_qualifier [...] | type_qualifier_raw type_qualifier { [...]

We should review all of these instances and eliminate the ones where there could be deep recursion in real world cases, and ideally eliminate as many others as we easily can to. Unhelpfully I can't see a trivial way to find them all - bison doesn't seem to provide a way to warn about them, and they're not trivial to grep for since (like the above case) the rule can be split over multiple lines. At least some can be grepped for though, and if need be a scan by eye through the file should identify the rest fairly quickly.

@ojwb ojwb added the parser label Apr 18, 2024
@ojwb
Copy link
Member Author

ojwb commented Apr 19, 2024

I had a quick poke at the first example above, and it's pretty easy to avoid the right recursion there at least. It seems to be used to create a linked list of the parameters, and we can achieve that with a left recursive rule which tracks both the list so far and the current last entry, e.g. like so:

diff --git a/Source/CParse/parser.y b/Source/CParse/parser.y
index 81967125..34cc8b37 100644
--- a/Source/CParse/parser.y
+++ b/Source/CParse/parser.y
@@ -1652,6 +1652,10 @@ static String *add_qualifier_to_declarator(SwigType *type, SwigType *qualifier)
   ParmList     *pl;
   int           intvalue;
   Node         *node;
+  struct {
+    Parm       *parms;
+    Parm       *last;
+  } pbuilder;
 };
 
 /* Define special token END for end of input. */
@@ -1743,7 +1747,8 @@ static String *add_qualifier_to_declarator(SwigType *type, SwigType *qualifier)
 %type <pl>       parms rawparms varargs_parms ;
 %type <pl>       templateparameterstail;
 %type <p>        parm_no_dox parm valparm rawvalparms valparms valptail ;
-%type <p>        typemap_parm tm_list tm_tail ;
+%type <p>        typemap_parm tm_list ;
+%type <pbuilder> tm_list_builder ;
 %type <p>        templateparameter ;
 %type <type>     templcpptype cpptype classkey classkeyopt;
 %type <id>       access_specifier;
@@ -2781,17 +2786,19 @@ typemap_type   : kwargs {
                 }
                ;
 
-tm_list        : typemap_parm tm_tail {
-                 $$ = $typemap_parm;
-		 set_nextSibling($$,$tm_tail);
-		}
+tm_list        : tm_list_builder {
+	         $$ = $tm_list_builder.parms;
+	       }
                ;
 
-tm_tail        : COMMA typemap_parm tm_tail[in] {
-                 $$ = $typemap_parm;
-		 set_nextSibling($$,$in);
-                }
-               | %empty { $$ = 0;}
+tm_list_builder: typemap_parm {
+                 $$.parms = $$.last = $typemap_parm;
+	       }
+	       | tm_list_builder[in] COMMA typemap_parm {
+                 $$ = $in;
+		 set_nextSibling($$.last, $typemap_parm);
+		 $$.last = $typemap_parm;
+	       }
                ;
 
 typemap_parm   : type plain_declarator {

@ojwb
Copy link
Member Author

ojwb commented Apr 21, 2024

Testcase:

$ perl -e 'print "%clear t0";for (1...9999) { print ", t$_"; } print ";\n"' > test.i
$ ../preinst-swig -python -module x test.i

Current SWIG exits with code 0 but doesn't generate any output files. With the patch above it works.

Nobody is realistically going to work on 10,000 typemaps at once like this, and with e.g. 1,000 it seems to work, but I'm inclined towards applying fixes like the above anyway as they don't really seem to add extra complications and eliminate a use of a bison anti-pattern.

For this case at least, we could also perhaps build the list in reverse of the current order - I suspect nothing actually cares about the order, but it's hard to easily see all the places that consume the resulting linked list.

@ojwb
Copy link
Member Author

ojwb commented Apr 23, 2024

Another artificial testcase. this one for the type_qualifier right recursion:

$ perl -e 'print "const " x 10000; print "int i = 42;\n"' > test-type-qualifier.i
$ ../preinst-swig -python test-type-qualifier.i

SWIG quietly exits with status 0 and nothing generated.

With "only" 9990 const qualifiers SWIG parses OK (and then complains no module name is specified so it's getting past parsing).

@ojwb
Copy link
Member Author

ojwb commented Apr 23, 2024

Incidentally, for this case both GCC and clang warn (clang by default, GCC with e.g. -Wall):

warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]

I've also worked out why SWIG fails to report an error - we ought to check if yyparse() returns 2. I'll push a fix for that, which will at least improve the reporting of such situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant