gvpr script to find max required lookahead for lx #116

jameysharp · 2019-02-22T01:07:44Z

This script analyzes the output of `lx -l dot` to compute how much input a lexer must be able to buffer after reaching an accepting state before it can tell whether there is a longer match it should return instead. I wrote it to investigate issue #111, but I think it's a useful analysis in its own right.
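For readers who want the idea without reading gvpr, here is a minimal Python sketch of the same kind of analysis. It is not a port of `lookahead.gvpr`. It assumes the DFA is already available as a plain dictionary rather than parsed from dot output, and the names `max_lookahead`, `transitions`, and `accepting` are hypothetical. It also assumes "lookahead" means the longest run of characters from one accepting state to the next accepting state reached, with a cycle through non-accepting states that can still reach an accept counting as unbounded.

```python
import math

def max_lookahead(transitions, accepting):
    """Map each accepting state to the largest number of characters that can
    be read after it before the next accepting state is reached.
    The value is math.inf when no bound exists.

    transitions: {state: {symbol: next_state}}  (assumed input shape)
    accepting:   set of accepting states
    """
    # Build reverse edges so we can find every state that can reach an accept.
    preds = {}
    for src, edges in transitions.items():
        for dst in edges.values():
            preds.setdefault(dst, set()).add(src)

    live = set(accepting)
    stack = list(accepting)
    while stack:
        state = stack.pop()
        for p in preds.get(state, ()):
            if p not in live:
                live.add(p)
                stack.append(p)

    IN_PROGRESS = object()  # marks states on the current DFS path
    memo = {}

    def steps_to_accept(state):
        # Longest path from `state` to the first accepting state after it.
        if memo.get(state) is IN_PROGRESS:
            return math.inf  # cycle among live non-accepting states
        if state in memo:
            return memo[state]
        memo[state] = IN_PROGRESS
        best = 0
        for dst in transitions.get(state, {}).values():
            if dst in accepting:
                best = max(best, 1)  # stop at the first accept
            elif dst in live:
                best = max(best, 1 + steps_to_accept(dst))
            # Dead states can never lead to a longer match, so skip them.
        memo[state] = best
        return best

    return {a: steps_to_accept(a) for a in accepting}


# Hand-built toy DFA (an assumption, not generated by lx): "." is one token
# and "..." is another, so state 1 accepts and state 3 accepts.
dots = {0: {".": 1}, 1: {".": 2}, 2: {".": 3}}
print(max_lookahead(dots, {1, 3}))

# A non-accepting self-loop between two accepts gives an unbounded result.
loop = {"A": {"x": "B"}, "B": {"x": "B", "y": "C"}}
print(max_lookahead(loop, {"A", "C"}))
```

On the toy `dots` automaton, the accepting state after a single `.` needs two more characters to reach the next accept, which has the same shape as the `$dot .. $ellipsis` line in the sample output for `longest.lx`.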

Sample output from the lx examples:

$ for f in examples/lx/*.lx; do echo "$f:"; build/bin/lx -l dot < "$f" | gvpr -f share/bin/lookahead.gvpr; echo; done
examples/lx/act.lx:
after accepting, a longest pattern leading to another accept is:
- $act_assign [A-Z_a-z] $act_assign
- $act_ident [A-Z_a-z] $act_ident
- $act_label [A-Z_a-z] $act_label
- $act_ref [A-Z_a-z] $act_ref
- $act_str [^@] $act_str
- $act_token [A-Z_a-z] $act_token
- $ident [-A-Z_a-z] $ident
- discard [\t\n ] discard
requires 1 characters of lookahead

examples/lx/a.lx:
after accepting, a longest pattern leading to another accept is:
- $a bc $token
- $int [0-9] $int
- $str_char 0[0-9] $str_octal
- $str_octal [0-9] $str_octal
- discard / discard
unbounded lookahead required after these tokens:
- $string

examples/lx/c11-pp.lx:
after accepting, a longest pattern leading to another accept is:
- $chr / $block_comment_end
- $escape_hex [0-9A-Fa-f] $escape_hex
- $escape_octal [0-7] $escape_octal
- $identifier \\U[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f] $identifier
- $newline \n $newline
- $other U[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f] $identifier
- $pp_number \\U[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f] $pp_number
- $punctuator %: $punctuator
- $whitespace [\t\v\f ] $whitespace
requires 10 characters of lookahead

examples/lx/c90-pp.lx:
ambiguous mappings to $q_char_sequence, $h_char_sequence; for example on input '<a'
ambiguous mappings to $h_char_sequence, $q_char_sequence; for example on input '<a>'
after accepting, a longest pattern leading to another accept is:
- $h_char_sequence [!"#%-=?A-Z\-_a-z|}~] $h_char_sequence
- $q_char_sequence [!#%-?A-Z\-_a-z|}~] $q_char_sequence
- discard [!#%-?A-Z\-_a-z|}~] $q_char_sequence
requires 1 characters of lookahead

examples/lx/literals.lx:
after accepting, a longest pattern leading to another accept is:
- $float [Ee][+-][0-9] $float
- $int [0-9] $int
- $str_lit [^"\] $str_lit
- discard [\t\n\r ] discard
requires 3 characters of lookahead

examples/lx/longest.lx:
after accepting, a longest pattern leading to another accept is:
- $dot .. $ellipsis
requires 2 characters of lookahead

examples/lx/scheme.lx:
after accepting, a longest pattern leading to another accept is:
- $ident /./ $ident
- discard [\t\n ] discard
requires 1 characters of lookahead

examples/lx/trie.lx:
after accepting, a longest pattern leading to another accept is:
- $bye e $bye
requires 1 characters of lookahead

katef · 2019-02-22T01:57:58Z

I love it! Thank you!

gvpr script to find max required lookahead for lx

0db1526

katef merged commit 9ec9b14 into katef:master Feb 22, 2019

jameysharp deleted the lookahead-analysis branch February 22, 2019 03:19

jameysharp mentioned this pull request Feb 22, 2019

lx doesn't always return the longest match #111

Open
