feature request: make optional rules mandatory #21

SKalt · 2021-08-28T13:32:29Z

I've got a grammar where some rules match the empty string. I'd like to transform those rules such that they no longer match the empty string and have all references to the rules made optional. Example:

// before
my_rule: foo opt_bar baz;
opt_bar: bar | ;

// after
my_rule: foo opt_bar? baz;
opt_bar: bar;

More difficult example:

// before
foo: bar | opt_baz | quux; // <- foo matches the empty string too
opt_baz: baz | ;

// after
foo: bar | opt_baz? | quux;
opt_baz: baz;

If you point me to where this feature should go, I'd be happy to take a stab at it.

SKalt · 2021-09-26T17:52:11Z

Update: I'm able to identify the rules which include | ; using

setup

cat <<EOF > ./temp.g4
grammar temp;
foo: bar | opt_baz | quux;
opt_baz: baz | ; // <-- I'd like to delete `| ;`
bar: 'bar';
baz: 'baz';
quux: 'quux';
EOF

#!/bin/bash
xpath='//parserRuleSpec//alternative[not(element)]'
trparse ./temp.g4 | trxgrep "$xpath" | trxml
# Result size 1
# <alternative>
# </alternative>

and the names of rules which have an alternate branch | ; by

xpath='//parserRuleSpec//alternative[not(element)]/ancestor::parserRuleSpec/RULE_REF'
trparse ./temp.g4 | trxgrep "$xpath" | trxml
# Result size 1
# opt_baz

EDIT: I'm working with the following bash function:

# an xpath query to seek and destroy `| ;` tokens that weren't grouped into
# populated `<alternative>` nodes
xpath_del="//parserRuleSpec[RULE_REF/text() = '<RULE_NAME>']//alternative[not(element)]/../../*[text() = '|']"
# an XPath query to find all relevant references to a rule 
xpath_rename="//parserRuleSpec//ruleBlock//RULE_REF[text()='<RULE_NAME>']"

make_mandatory() {
    local rule_name="${1:?rule name required}"
    # read IR json from stdin
    trdelete "${xpath_del//<RULE_NAME>/$rule_name}" |                     # delete the offending `| ;`
        trreplace "${xpath_rename//<RULE_NAME>/$rule_name}" "$rule_name?" # rename references to `rule_name` to `rule_name?`
}

# which works as follows:
trparse ./temp.g4 | make_mandatory "opt_baz" | trprint
# grammar temp;
# foo: bar | opt_baz? | quux;
# opt_baz: baz ;
# bar: 'bar';
# baz: 'baz';
# quux: 'quux';

kaby76 · 2021-09-26T18:37:21Z

I think one way to do this would be to (1) find a LHS symbol of a rule that contains an empty top-level alt. (2) Construct an xpath expression that finds all uses of any of the symbol names found in step 1. Then, use trinsert to add a "?" after the symbol name. I don't think I have an "insert after" option for trinsert, so this would need to be added. (3) After you made the inserts, go back to find all the rules with empty top-level alts, and use trdelete to nuke those empty alts.

So, there would be a little bit of work fixing trinsert. I already added some function to the "TrashBase" to have InsertBefore() and InsertAfter(). I'm not sure what the best xpath expression would be for step (1).

kaby76 · 2021-09-27T23:49:52Z

Hi @SKalt ,

I finally figured out how to do this, and wrote this Bash script to do it.

names=`trparse temp.g4 | trxgrep ' //parserRuleSpec[*//alternative[count(*)=0]]/RULE_REF' | trtext`
cond=`echo "$names" | tr '\n' ' ' | tr -d '\r' | sed "s:\([a-zA-Z_][a-zA-Z_]*\):or text()='\1':g" | sed "s:^or ::"`
trparse temp.g4 | trinsert -a " //ruleBlock//RULE_REF[$cond]" '?' | trsponge -o out -c

In order to follow along at what this does, you have to really look at the parse tree (trparse temp.g4 | trtree).

( grammarSpec
  ( grammarDecl
    ( grammarType
      ( GRAMMAR i=0 txt=grammar tt=19 DEFAULT_TOKEN_CHANNEL
    ) ) 
    ( identifier
      ( RULE_REF i=8 txt=temp tt=2 DEFAULT_TOKEN_CHANNEL
    ) ) 
    ( SEMI i=12 txt=; tt=32 DEFAULT_TOKEN_CHANNEL
  ) ) 
  ( rules
    ( ruleSpec
      ( parserRuleSpec
	( RULE_REF i=15 txt=my_rule tt=2 DEFAULT_TOKEN_CHANNEL
	) 
	( COLON i=22 txt=: tt=29 DEFAULT_TOKEN_CHANNEL
	) 
	( ruleBlock
	  ( ruleAltList
	    ( labeledAlt
	      ( alternative
		( element
		  ( atom
		    ( ruleref
		      ( RULE_REF i=24 txt=foo tt=2 DEFAULT_TOKEN_CHANNEL
		) ) ) ) 
		( element
		  ( atom
		    ( ruleref
		      ( RULE_REF i=28 txt=opt_bar tt=2 DEFAULT_TOKEN_CHANNEL
		) ) ) ) 
		( element
		  ( atom
		    ( ruleref
		      ( RULE_REF i=36 txt=baz tt=2 DEFAULT_TOKEN_CHANNEL
	) ) ) ) ) ) ) ) 
	( SEMI i=39 txt=; tt=32 DEFAULT_TOKEN_CHANNEL
	) 
	( exceptionGroup
    ) ) ) 
    ( ruleSpec
      ( parserRuleSpec
	( RULE_REF i=42 txt=opt_bar tt=2 DEFAULT_TOKEN_CHANNEL
	) 
	( COLON i=49 txt=: tt=29 DEFAULT_TOKEN_CHANNEL
	) 
	( ruleBlock
	  ( ruleAltList
	    ( labeledAlt
	      ( alternative
		( element
		  ( atom
		    ( ruleref
		      ( RULE_REF i=51 txt=bar tt=2 DEFAULT_TOKEN_CHANNEL
	    ) ) ) ) ) ) 
	    ( OR i=55 txt=| tt=45 DEFAULT_TOKEN_CHANNEL
	    ) 
	    ( labeledAlt
	      ( alternative
	) ) ) ) 
	( SEMI i=57 txt=; tt=32 DEFAULT_TOKEN_CHANNEL
	) 
	( exceptionGroup
  ) ) ) ) 
  ( EOF i=62 txt= tt=-1 DEFAULT_TOKEN_CHANNEL
) )

The first line of this script parses the grammar then finds the names of rules that have an alternative that is empty. One of the problems was that I forgot what // means. Unless it is immediately preceeded with a name as in parserRuleSpec//alternative, or a symbol as in *//alternative, the step always starts back at the root of the tree. So, //parserRuleSpec[//alternative] would find all parser rules irrelevant of whether it contained a descendant of node type alternative, because the test [//alternative] would always be true--there are plent of rules in the entire grammar with alternative. But, //parserRuleSpec[*//alternative] would find all parser rules with an alternative descendant. We want to actually limit this further with the test [count(*)=0], which filters the alternative nodes that have no children.

The second line is fairly grungy because I don't know of a great way in Bash to take a string of rule names separated by spaces, and turn it into an xpath test. E.g., in the grammar cond would become text()='opt_bar'.

The last line finds all occurrences of the symbol opt_bar on the right hand side of a rule (because it is descended from a ruleBlock). and inserts the string ? after the symbol in the rule. The parse tree is serialized into a string and reparsed by trinsert itself. The output is piped to trsponge, which then outputs the file to the directory out/.

To get this to work, I had to modify trinsert. I noticed that some of the tools were crashing because dependent assemblies where not loading, so I checked and fixed all the tools, and have a new release tomorrow (v0.11.3).

This is a wonderful example of what the tool chain can do. Thanks so much for bringing it up.

SKalt · 2021-09-28T01:27:08Z

Nice use of tr '\n' $replacement and tr -d; I didn't know think of that.

The second line is fairly grungy

Do you think it's worth writing this transform out in C# rather than bash?

In the meantime I found a new, interesting variation of this problem: how to

take a rule that has two alternatives, one of which is the empty string, and
replace all references to the rule with an optional reference to the populated arm.

So

grammar temp
foo: bar opt_baz quux;
opt_baz: bar | ;
bar: 'bar';
quux: 'quux';

would be transformed like

+foo: bar baz? quux;
-foo: bar opt_baz quux;
-opt_baz: bar | ;

I solved that with

# an xpath to get the text of the non-optional branch of a trivially optional rule
populated="//parserRuleSpec[RULE_REF/text()='<RULE_NAME>']" # find a specific rule
populated="${populated}/ruleBlock/ruleAltList"              # with a top-level list of alternatives
populated="${populated}[count(labeledAlt) = 2]"             # check the alt list has only 2 branches
populated="${populated}/labeledAlt[alternative[element]]"   # select the populated branch
# ^ this is largely paranoia; in my pipeline I've already used a variation of this
# query to find the names of the relevant rules. 

trivial_ref="//parserRuleSpec//ruleref[RULE_REF/text()='<RULE_NAME>']"
trivial_def="//parserRuleSpec[RULE_REF/text()='<RULE_NAME>']"
dissolve_trivially_opt_rule() {
    local rule_name="${1:?rule name required}"
    local buffer
    local text

    buffer="$(cat -)"
    text="$(
        echo "$buffer" |
            trxgrep "${populated//<RULE_NAME>/$rule_name}" |
            trtext
    )"
    echo "$buffer" |
        trreplace "${trivial_ref//<RULE_NAME>/$rule_name}" "($text)?" |
        trdelete "${trivial_def//<RULE_NAME>/$rule_name}"
    echo "done: $rule_name" >&2
}

kaby76 mentioned this issue Sep 27, 2021

xpath engine doesn't seem to produce right result #29

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature request: make optional rules mandatory #21

feature request: make optional rules mandatory #21

SKalt commented Aug 28, 2021

SKalt commented Sep 26, 2021 •

edited

kaby76 commented Sep 26, 2021 •

edited

kaby76 commented Sep 27, 2021

SKalt commented Sep 28, 2021

feature request: make optional rules mandatory #21

feature request: make optional rules mandatory #21

Comments

SKalt commented Aug 28, 2021

SKalt commented Sep 26, 2021 • edited

kaby76 commented Sep 26, 2021 • edited

kaby76 commented Sep 27, 2021

SKalt commented Sep 28, 2021

SKalt commented Sep 26, 2021 •

edited

kaby76 commented Sep 26, 2021 •

edited