Expose more than one rule? #191

Open
utkarshkukreti opened this issue May 30, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@utkarshkukreti
Contributor

Question / Feature Request: Is there any way to parse a specific rule as the starting parser? For example, if I have:

%start Expr
%%
Expr -> ...;

Int -> ...;
%%

I also want to be able to parse a string as Int, not just Expr.

(I'm trying to port my parser from LALRPOP, which exposes a parser for any rule prefixed with the keyword pub, to lrpar, mainly because of lrpar's operator precedence feature.)

@ltratt
Member

ltratt commented May 30, 2020

I must admit, this isn't a feature that I'd thought about. At first glance, it seems an awkward fit with LR parsing, in the sense that, at least conceptually, you need 1 statetable per start rule. However, in practice I think you might be able to use a single statetable and "emulate" the accept state for an arbitrary start rule, though I might be wrong about that. LALRPOP uses the lane table algorithm, which might or might not make this stuff easier -- I haven't familiarised myself with the algorithm. @nikomatsakis might have a thought or two on this.
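
To make the "1 statetable per start rule" point concrete, the sketch below computes the initial LR(0) state for a chosen start rule R by augmenting the grammar with a synthetic rule `S' -> R` and taking the closure of the item `S' -> . R`. The toy grammar (`Expr -> Expr '+' Int | Int; Int -> 'INT';`) and the helpers `toy_grammar`, `closure` and `initial_items` are illustrative assumptions, not grmtools code. Since every other state is reached from the initial one, and acceptance happens on `S' -> R .`, changing R changes the automaton.

```rust
use std::collections::BTreeSet;

/// A production `lhs -> rhs`. A symbol is a nonterminal iff it is some production's lhs.
pub struct Prod {
    pub lhs: &'static str,
    pub rhs: Vec<&'static str>,
}

/// An LR(0) item: (index of production, position of the dot in its rhs).
type Item = (usize, usize);

/// Hypothetical toy grammar: Expr -> Expr '+' Int | Int;  Int -> 'INT'.
pub fn toy_grammar() -> Vec<Prod> {
    vec![
        Prod { lhs: "Expr", rhs: vec!["Expr", "+", "Int"] },
        Prod { lhs: "Expr", rhs: vec!["Int"] },
        Prod { lhs: "Int", rhs: vec!["INT"] },
    ]
}

fn is_nonterm(g: &[Prod], sym: &str) -> bool {
    g.iter().any(|p| p.lhs == sym)
}

/// Standard LR(0) closure: for every item `A -> a . B b` where B is a nonterminal,
/// add `B -> . c` for each production of B, until nothing new is added.
fn closure(g: &[Prod], mut set: BTreeSet<Item>) -> BTreeSet<Item> {
    loop {
        let mut added = Vec::new();
        for &(pi, dot) in &set {
            if let Some(&sym) = g[pi].rhs.get(dot) {
                if is_nonterm(g, sym) {
                    for (qi, q) in g.iter().enumerate() {
                        if q.lhs == sym && !set.contains(&(qi, 0)) {
                            added.push((qi, 0));
                        }
                    }
                }
            }
        }
        if added.is_empty() {
            return set;
        }
        set.extend(added);
    }
}

/// Augments `base` with `S' -> start` and returns the items of the initial state,
/// rendered as strings such as "Int -> . INT".
pub fn initial_items(base: Vec<Prod>, start: &'static str) -> Vec<String> {
    let mut g = vec![Prod { lhs: "S'", rhs: vec![start] }];
    g.extend(base);
    closure(&g, BTreeSet::from([(0, 0)]))
        .into_iter()
        .map(|(pi, dot)| {
            let mut syms: Vec<&str> = g[pi].rhs.clone();
            syms.insert(dot, ".");
            format!("{} -> {}", g[pi].lhs, syms.join(" "))
        })
        .collect()
}
```

With this toy grammar, `initial_items(toy_grammar(), "Int")` yields just `S' -> . Int` and `Int -> . INT`, while `"Expr"` yields four items. Here the non-augmented item of the `Int` start state also appears in the `Expr` one, which is the kind of overlap a single shared statetable would have to exploit.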

So, at the moment, unfortunately, your only option is to duplicate the Yacc grammar for each start rule. However, what I can fairly easily do is remove some of the assumptions the grmtools libraries have about start states. That won't get us all of the way to the feature you're asking for, but it will make it easier for someone else to implement it -- and I hope they do, because this is the sort of feature that I think grmtools should be flexible enough to accommodate!
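
Until something better exists, the duplication can at least be automated, for example from a build script. The sketch below is a hypothetical helper (`with_start_rule` is not a grmtools function) that returns a copy of a Yacc grammar whose `%start` directive names a different rule. It assumes `%start` sits at the beginning of its own line in the declarations section; each resulting copy would then go through the usual build-time parser generation as a separate grammar.

```rust
/// Returns a copy of the Yacc grammar `grammar_src` whose `%start` directive
/// names `rule`. Hypothetical helper: only the first line beginning with
/// `%start` (ignoring leading whitespace) is rewritten; if there is none,
/// a `%start` line is prepended to the declarations section.
pub fn with_start_rule(grammar_src: &str, rule: &str) -> String {
    let mut found = false;
    let mut out: Vec<String> = Vec::new();
    for line in grammar_src.lines() {
        if !found && line.trim_start().starts_with("%start") {
            out.push(format!("%start {}", rule));
            found = true;
        } else {
            out.push(line.to_string());
        }
    }
    if !found {
        out.insert(0, format!("%start {}", rule));
    }
    out.join("\n") + "\n"
}
```

For the grammar in the question, `with_start_rule(src, "Int")` produces a second grammar that is identical except that its first line is `%start Int`.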

bors bot added a commit that referenced this issue Jun 1, 2020
194: Remove the pervasive assumption that the start state is 0. r=ptersilie a=ltratt

There's no good reason for us to expose an assumption about which state in the stategraph/statetable is the start state, and doing so may reduce our flexibility later. This commit abstracts this assumption away such that the start state is specified concretely just once (in pager::pager_stategraph), thus giving us a single place to consider if we ever do want to change this assumption.

This addresses one small part of #191.

Co-authored-by: Laurence Tratt <laurie@tratt.net>
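
The idea behind this commit can be illustrated with a simplified sketch. The `StateGraph` and `StIdx` types below are hypothetical stand-ins, not grmtools' actual definitions: the start state is chosen once, in the constructor, and clients such as the hypothetical `walk` ask for it via `start_state()` rather than hard-coding state 0.

```rust
/// Index of a state in a stategraph (hypothetical stand-in type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StIdx(pub usize);

/// A toy stategraph: `edges[i]` lists the (symbol, target) transitions out of state i.
pub struct StateGraph {
    edges: Vec<Vec<(&'static str, StIdx)>>,
    start_state: StIdx,
}

impl StateGraph {
    /// The only place where the start state is chosen.
    pub fn new(edges: Vec<Vec<(&'static str, StIdx)>>, start_state: StIdx) -> Self {
        assert!(start_state.0 < edges.len(), "start state out of range");
        StateGraph { edges, start_state }
    }

    pub fn start_state(&self) -> StIdx {
        self.start_state
    }

    /// Follows the transition labelled `sym` out of `st`, if there is one.
    pub fn edge(&self, st: StIdx, sym: &str) -> Option<StIdx> {
        self.edges[st.0]
            .iter()
            .find(|(s, _)| *s == sym)
            .map(|&(_, t)| t)
    }
}

/// Walks `syms` from the start state, returning None on a missing transition.
/// Note that it never mentions `StIdx(0)`.
pub fn walk(sg: &StateGraph, syms: &[&str]) -> Option<StIdx> {
    let mut st = sg.start_state();
    for sym in syms {
        st = sg.edge(st, sym)?;
    }
    Some(st)
}
```

Because `walk` only ever asks the graph for its start state, a graph whose start state is, say, `StIdx(2)` works without any change to client code.
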
@nikomatsakis

I don't think Lane Table helps in particular, but I've not given it a lot of thought. I think LALRPOP permits you to tag multiple rules as pub, but IIRC it just generates separate parsers for each one; there is no shared code or state. (I could be misremembering.)

@ltratt
Member

ltratt commented Mar 29, 2021

Sorry @nikomatsakis for not saying thanks for your comment (I'm only nearly a year late)!

Coming back to this one with the benefit of hindsight, I think that I'd be fine with generating 1 parser per start rule at first. It's probably suboptimal, but I'm fine with getting the functionality in and working out how to make it more efficient in the future.

AFAICS neither Yacc nor Bison supports this, which gives us both the freedom to do what we want and the difficulty of sifting amongst the design choices. My first thought is that it would be reasonable to allow the %start directive to specify more than one start rule (e.g. %start R1 R2).

The slightly tricky thing to think about is what the resulting parsers should be called. At the moment, if you have a lexer g.l and a grammar g.y, you end up with modules named g_l and g_y with functions lexerdef and parser respectively. I think that if a user specifies a single start rule, we should maintain that behaviour. I can then see at least two possibilities:

  1. We generate two parser modules g_r1_y and g_r2_y both with parser functions.
  2. We generate one parser module with r1_parser and r2_parser functions.

I slightly prefer the second option, but could be persuaded otherwise.
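
The two naming schemes, together with the single-start-rule behaviour that should be preserved, can be written down as a small function. This is only a sketch of the proposal above: `generated_names` and `Naming` are hypothetical, and lowercasing rule names (so `R1` becomes `r1`) is an assumption chosen to match the g_r1_y example rather than anything grmtools currently does.

```rust
/// The two naming schemes floated above (hypothetical enum).
#[derive(Clone, Copy, Debug)]
pub enum Naming {
    /// Option 1: one module per start rule, e.g. `g_r1_y::parser`.
    ModulePerRule,
    /// Option 2: one module with one function per start rule, e.g. `g_y::r1_parser`.
    FunctionPerRule,
}

/// Returns the (module, function) pair generated for each start rule of the
/// grammar file `<stem>.y`. A single start rule keeps today's `<stem>_y::parser`.
pub fn generated_names(stem: &str, starts: &[&str], naming: Naming) -> Vec<(String, String)> {
    if starts.len() == 1 {
        return vec![(format!("{}_y", stem), "parser".to_string())];
    }
    starts
        .iter()
        .map(|rule| {
            // Assumption: rule names are lowercased, so `R1` becomes `r1`.
            let r = rule.to_lowercase();
            match naming {
                Naming::ModulePerRule => (format!("{}_{}_y", stem, r), "parser".to_string()),
                Naming::FunctionPerRule => (format!("{}_y", stem), format!("{}_parser", r)),
            }
        })
        .collect()
}
```

For `%start R1 R2` in `g.y`, option 2 yields `g_y::r1_parser` and `g_y::r2_parser`, while a grammar with only `%start Expr` still yields `g_y::parser`. Writing it out also shows that lowercasing can make two distinct rule names (e.g. `Expr` and `EXPR`) collide, so a real implementation would need to detect or avoid that.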

@ltratt ltratt added the enhancement New feature or request label Mar 29, 2021