Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when setting precedence on a rule with empty rhs #22

Closed
mingodad opened this issue Dec 1, 2021 · 8 comments
Closed

Segfault when setting precedence on a rule with empty rhs #22

mingodad opened this issue Dec 1, 2021 · 8 comments
Labels

Comments

@mingodad
Copy link
Contributor

mingodad commented Dec 1, 2021

Making comparisons of 'LALR(1)' grammars between parsers with a modified version of byacc and lemon to convert the grammars (see https://github.com/mingodad/lalr-parser-test) I converted the sqlite3 grammar to unicc with byacc -C sqlite3-parser.y and when running the resulting converted grammar we get segfault because there is a rule with an empty rhs and a precedence (postgresql-13-3 also have several of then), to facilitate see the problem I'm posting a dummy small grammar that also segfault due to the empty rhs and precedence, also with the converted postgresql-13.3 unicc takes a long time to process it (unicc -> 100s, byacc -> 1.2s)

//%start list


@UMINUS "-" ;
@error "error" ;

@LETTER         'A-Za-z_' 'A-Za-z0-9_'*;
@number            '0-9'+;

whitespace          :       ' \r\n\t'+
                    |       "//" !'\n'* '\n'
                    ;

#left '|' ;
#left '&' ;
#left '+' '-' ;
#left '*' '/' '%' ;
#left @UMINUS  ; /* supplies precedence for unary minus */

//%% /* beginning of rules section */

list$  :  /* empty */
      |  list stat '\n'
      |  list @error '\n'
      ;

stat  :  expr
      |  @LETTER '=' expr
      ;

expr  :  '(' expr ')'
      |  expr '+' expr
      |  expr '-' expr
      |  expr '*' expr
      |  expr '/' expr
      |  expr '%' expr
      |  expr '&' expr
      |  expr '|' expr
      |  '-' expr #precedence @UMINUS
      |  @LETTER
      |  @number
      | /*empty*/ #precedence '&'
      ;
@mingodad
Copy link
Contributor Author

mingodad commented Dec 1, 2021

Here I¡m attaching the postgresql-13.3 and sqlite3 converted grammars with the rules with empty rhs and precedence where the precedence were commented out to be able to process with unicc,

Just converted:

on_opt : @ON /*13N*/ expr #precedence @ON /*13N*/ ;
on_opt : /*empty*/ #precedence @OR /*1L*/ ;

Commented to allow process with unicc:

on_opt : @ON /*13N*/ expr #precedence @ON /*13N*/ ;
on_opt : /*empty*/ ; //#precedence @OR /*1L*/ ;

grammars.zip

@mingodad mingodad changed the title Segfault when setting precedence on a rule with empty lhs Segfault when setting precedence on a rule with empty rhs Dec 1, 2021
@mingodad
Copy link
Contributor Author

mingodad commented Dec 1, 2021

On my previous messages where I wrote "empty lhs" in reality it's "empty rhs".

@mingodad
Copy link
Contributor Author

mingodad commented Dec 2, 2021

After reading almost all manual I've changed the conversion from yacc to unicc grammar to output quoted strings for terminals instead of @regex and also add #!mode insensitive; but even then the time spent on postgresql converted grammar is still high (100s).

@mingodad
Copy link
Contributor Author

mingodad commented Dec 2, 2021

It segfaults here:

prod_directives<int>: 	'#' "precedence" terminal

						[*	current_prod->prec = @terminal->prec; *]

					;

Where current_prod is NULL, also it doesn't seem to be a good choice here:

production<PROD*>	:	line_number rhs_opt:rhs code_opt_dup:act
							ast_node
							prod_directives* //<<< Wouldn't prod_directives? be better

Do we accept multiple prod_directives ?

@mingodad
Copy link
Contributor Author

mingodad commented Dec 2, 2021

This seems to fix the segfault problem:

@@ -712,7 +712,7 @@ production<PROD*>   :       line_number rhs_opt:rhs code_opt_dup:act
 
 rhs_opt<PROD*>         :       rhs
 
-                                       |       [*      @@ = create_production( parser,
+                                       |       [*      @@ = current_prod = create_production( parser,
                                                                (SYMBOL*)NULL );
                                                *]
                                        ;

@mingodad
Copy link
Contributor Author

mingodad commented Dec 2, 2021

With the previous fix now we can process the converted grammar for postgresql-13.3 and cql (https://github.com/facebookincubator/CG-SQL/blob/main/sources/cql.y) without any warning (see atthached).
sql-grammars.zip

@phorward
Copy link
Owner

phorward commented Dec 4, 2021

Hello @mingodad, thank you very much for your diving into this issue and finding a fix for the problem.

Indeed, the production-directives could be parsed by changing the grammar to accept just ? one of them, instead of an optional list *, but this won't work anymore in case a unicc grammar should accept more than one directive for productions.

Unfortunately, I don't spend much time on unicc and its abandoned successor unicc2 currently, as my focus moved to other projects. Anyway, I think your pull request is helpful and makes unicc a little bit more stable. I still like the idea of having one grammar and turning it into several target languages, optionally without change. Maybe this feature could be part of another, more actively developed program, for example Lark.

@phorward
Copy link
Owner

Segfault fixed by #24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants