Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support {read, include}-related diagnostics. #33

Closed
modulovalue opened this issue Jul 31, 2023 · 14 comments
Closed

Support {read, include}-related diagnostics. #33

modulovalue opened this issue Jul 31, 2023 · 14 comments

Comments

@modulovalue
Copy link

modulovalue commented Jul 31, 2023

There are two theorems that can be used for detecting whether a grammar can't be LR(k) for any k.

One uses the includes relation in the DeRemer and Pennello approach: #26 (comment)

The other one uses the reads relation in the DeRemer and Pennello approach: #26 (comment)

Despite everybody stating the opposite, the DeRemer and Pennello algorithm is very simple. one needs to just collect a bunch of facts while traversing the grammar, construct a graph from those facts, find the SCCs and a topological sorting of that graph (the Tarjan SCC finding algorithm does both at the same time), and propagate lookaheads through the graph until fix points have been reached.

Intuitively, it looks to me like, those theorems can be restated by saying that if multiple paths in the grammar flow graph share the same nodes, then the grammar is known not to be LR(k) because multiple nodes can end up on the same path.

In any case, I think it would be useful if grammophone could one day support visualizing these diagnostics.

One challenge with doing that would be that graphviz doesn't seem to support drawing edges from an edge to another edge. The DeRemer and Pennello approach works on transitions of the DFA, not on states of the DFA.

Here's how they visualize the includes relation in their paper:

Bildschirm­foto 2023-07-31 um 17 09 05

I guess one could simulate that with graphviz by turning each edge into 2 edges connected by an invisible node, but the practicality of that approach would have to be investigated.

@mdaines
Copy link
Owner

mdaines commented Jul 31, 2023

I suppose you could fake the diagram like this:

digraph {

  start -> A [dir=none]
  A -> x1

  start -> b [dir=none]
  b -> x2
  x2 -> B [dir=none]
  B -> x3

  B -> A [style=dashed]

  A [shape=point, xlabel="A"]
  b [shape=point, xlabel="b"]
  B [shape=point, xlabel="B"]

  x1 [shape=circle, label=""]
  x2 [shape=circle, label=""]
  x3 [shape=circle, label=""]

  {
    rank=same
    start
    A
    x1
  }

  {
    rank=same

    b
    x2
    B
    x3
  }
  
}

@mdaines
Copy link
Owner

mdaines commented Jul 31, 2023

I meant to add an SVG rendering of that:

Unknown

@modulovalue

This comment was marked as resolved.

@mdaines
Copy link
Owner

mdaines commented Jul 31, 2023

In general, I like the idea of adding more "diagnostics". I was experimenting with something simpler recently -- adding a diagram that shows the relation used to find unreachable nonterminals. I have some ideas to improve that sort of thing, because it's sort of a pain right now.

What I think I'd like to do is structure diagnostics as plugins, with a default set, and provide a way to load more within Grammophone, kind of like the Babel REPL does.

@modulovalue

This comment was marked as resolved.

@modulovalue

This comment was marked as resolved.

@modulovalue

This comment was marked as resolved.

@modulovalue

This comment was marked as resolved.

@modulovalue

This comment was marked as resolved.

@modulovalue

This comment was marked as resolved.

@modulovalue

This comment was marked as resolved.

@modulovalue

This comment was marked as resolved.

@modulovalue
Copy link
Author

I'm working on a new algorithm for upgrading LALR(1) automata to LR(1) and the DeRemer read and include relations appear to be pretty fundamental to the act of building LR(1) automata efficiently.

This issue focused on augmenting existing automata visualizations with read and include sets. This is hard to visualize and while it would be useful, it probably is still not worth the effort.

However, I think that visualizing all LALR relations (while ignoring DFA-related structural details) and providing metrics related to read and include sets is very valuable. They capture precisely why lookaheads exist, which is important for debugging purposes (to answer the question: why does X have lookahead Y?). I'm reopening this issue for that reason.

@modulovalue modulovalue reopened this Nov 13, 2023
@modulovalue
Copy link
Author

I no longer agree with the statement that pursuing this would be worthwhile. While this would be nice to have, it would be much more useful to actually show ambiguities that LALR(k) can't resolve and that approach would supersede this while requiring a similar amount of effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants