lx lookahead isn't always necessary #112

jameysharp · 2019-02-20T05:37:20Z

lx -l c doesn't generate exactly the code I'd prefer for accepting states which have no further out-transitions.

For example, my sample language specification in #111 generates four states. In state S2 we've already seen `..`, so if we see a third `.` then we've matched an $ellipsis token. Currently, in that case, lx generates a transition to a state S3 and continues with a new iteration of the loop. That iteration calls lx_getc and then unconditionally passes the result to lx_ungetc before returning TOK_ELLIPSIS.

Instead of transitioning to a new state and then reading and unreading another character, S2 could return TOK_ELLIPSIS immediately when it matches a third `.`.

This applies to any accepting state that has no out-transitions. From such a state, reading any additional character will always trigger an error transition. At that point the state machine must roll back to the most recent accepting state and return its token, and we know statically which state that is: the one we are already in.
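To make the difference concrete, here is a minimal hand-written sketch of the two shapes. It is not the code lx generates. The names `struct input`, `in_getc`, `in_ungetc`, `lex_with_lookahead` and `lex_direct` are hypothetical stand-ins (the real generated code uses lx_getc and lx_ungetc), and the sketch assumes a language whose only token is `...`, which is simpler than the specification in #111.

```c
#include <stddef.h>

/* Hypothetical token set and input buffer, for illustration only. */
enum tok { TOK_ERROR, TOK_ELLIPSIS };

struct input {
    const char *s;   /* NUL-terminated text being lexed */
    size_t pos;      /* index of the next unread character */
    unsigned reads;  /* number of in_getc calls, counted for comparison */
};

/* Stand-in for lx_getc: returns the next character, or -1 at end of input. */
static int in_getc(struct input *in) {
    in->reads++;
    if (in->s[in->pos] == '\0') return -1;
    return (unsigned char) in->s[in->pos++];
}

/* Stand-in for lx_ungetc: pushes back one character (no-op at end of input). */
static void in_ungetc(struct input *in, int c) {
    if (c != -1) in->pos--;
}

/* Shape described above: S2 moves to S3, and S3 reads one more
 * character only to push it straight back before accepting. */
static enum tok lex_with_lookahead(struct input *in) {
    enum { S0, S1, S2, S3 } state = S0;
    for (;;) {
        int c = in_getc(in);
        switch (state) {
        case S0: if (c == '.') { state = S1; continue; } return TOK_ERROR;
        case S1: if (c == '.') { state = S2; continue; } return TOK_ERROR;
        case S2: if (c == '.') { state = S3; continue; } return TOK_ERROR;
        case S3: in_ungetc(in, c); return TOK_ELLIPSIS;
        }
    }
}

/* Proposed shape: S3 is accepting and has no out-transitions,
 * so S2 returns the token as soon as it sees the third '.'. */
static enum tok lex_direct(struct input *in) {
    enum { S0, S1, S2 } state = S0;
    for (;;) {
        int c = in_getc(in);
        switch (state) {
        case S0: if (c == '.') { state = S1; continue; } return TOK_ERROR;
        case S1: if (c == '.') { state = S2; continue; } return TOK_ERROR;
        case S2: if (c == '.') return TOK_ELLIPSIS; return TOK_ERROR;
        }
    }
}
```

In this sketch both functions leave the input positioned just after the third `.`, but the first one makes an extra in_getc call and carries an extra state. Error paths are simplified here: they do not push back the offending character.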

This is a minor optimization, but I mostly care because the generated code would be slightly easier to understand if these unnecessary extra states were removed.

katef added the enhancement label Mar 9, 2019
