Initial attempt at adding RecoveryKind to yacc %grmtools section by ratmice · Pull Request #547 · softdevteam/grmtools

ratmice · 2025-04-01T21:35:44Z

Here is an initial attempt at adding recoverer to the grmtools section for yacc.

Things I don't like:

It feels to me like quite a bit of boiler plate for adding a single flag (checking all the kinds of value it could be but shouldn't be to throw errors...).
Not checking for unused flags %grmtools{unknown: Foo}
Parsing the section multiple times

I have some thoughts on these last two, for grmtools but lex is perhaps more complicated.

In the same way that ctparserbuilder calls into self.yacckind = ast_validation.yacc_kind(); we could use a similar path to communicate the key/values that went unused by YaccParser.

On the first one, I'll try and think of helper functions, but we have things like YaccKind which can be either a Unitary or a Constructor, so we need to be able to check that it is oneOf those.

I'll try and fiddle with it more, but this is at least a naive draft implementation.

Edit: I have some general thoughts on things to attempt,
largely replacing the HashMap with something that tracks key consumption,
and a small mechanism for querying values based on a repr enum.

#[repr(u8)]
enum ValueRepr {
   Flag = 1 << 0,
   Setting = 1 << 1
}

#[repr(u8)]
enum SettingRepr {
    Num = 1 << 2,
    Unitary = 1 << 3,
    Constructor = 1 << 4,
}

Anyhow the feeling I get is that a key query API that checks the value in the hashmap against one of those in bitmask form. That along with, marking keys as "used", doesn't seem to fall into any complexity traps.

I'll elaborate tomorrow in code form...

ltratt · 2025-04-02T13:36:01Z

I'll elaborate tomorrow in code form...

👍

This patch implements (but does not yet use) a wrapper around hashmap. * Adds an API for marking keys as `used`. * Provides a mechanism for getting an iterator over `unused` keys. * A query mechanism that checks whether value for a given key matches the specified variant of the of the `Value` type. Otherwise producing an error.

ratmice · 2025-04-02T15:09:37Z

I added basic implementation of what I was thinking in 96d721d It doesn't get used yet anywhere, so is largely untested.

Anyhow that is the basic idea of what I had in mind.

ratmice · 2025-04-02T17:10:36Z

+    pub fn query_key(
+        &self,
+        key: &str,
+        query_mask: u16,


Of course there is some borrow checker issues with the combination of query_key/mark_key_unused,
So this interface seems to require some changes. I haven't gotten the entry API to work (since we need to mutate the header not the HashMap. So might have to resort to contains_key -> mutate -> query.

Should be fixed in 06458b1

I'm not sure it's the best fix, the "mark as used" even on error behavior
is intended to avoid the following situation:

specify %grmtools{yacckind: Foo}

Get an "invalid yacc kind" for Foo error.

also get unused yacckind

ratmice · 2025-04-02T18:04:11Z

That at least seems like a first try at fixing all 3 of those issues, and I haven't thought of any more either.

ratmice · 2025-04-02T19:25:45Z

+        }
+    }
+    let recoverykind = recoverykind.unwrap_or(RecoveryKind::CPCTPlus);
+    ast_validation.header_mut().mark_key_used("recoverer");


It's worth pointing out a difference here between nimbleparse and CTParserBuilder.

In nimbleparse we mark_key_used("recoverer") even when the value is ignored because the value was set on the command line.

But in CTParserBuilder it will complain if you set recoverykind(CPCTPlus) but also have the key in the %grmtools section. The thinking for allowing it to error in CTParserBuilder was it might be confusing if one of the build.rs or %grmtools section went unused.

So how it is, if you specify it in both places you should get an unused in %grmtools section error.

Edit: There is kind of an interaction here with the forcing behavior of lrlex, we should probably consider how to handle it generally? Anyhow I'm not entirely certain what to think about it yet.

There is kind of an interaction here with the forcing behavior of lrlex, we should probably consider how to handle it generally? Anyhow I'm not entirely certain what to think about it yet.

In the sense that it's not clear which setting should be forcing which?

There is kind of an interaction here with the forcing behavior of lrlex, we should probably consider how to handle it generally? Anyhow I'm not entirely certain what to think about it yet.

In the sense that it's not clear which setting should be forcing which?

I think a good way to explain it is that currently, we have Force, Default, NoDefault, but we lack XOR.

Force overrides input text.

Default is overridden by input text.

NoDefault forces input text to specify.

So we don't have a way to specify "one of the builder, or the input text should specify, but not both"
That is actually the behavior which CTParserBuilder kind of implements for RecoveryKind because RecoveryKind only gets marked used if it is None the CTParserBuilder, so if it is specified in both,
we complain that RecoveryKind is unused in the input file.

This wasn't really an intended thing, I only noticed it when I added my comment, but the behavior as implemented does have a correlation to an XOR forcing mechanism

I guess the other thing is we originally added YaccKindResolver, there was only one value yacckind for cfgrammar::YaccParser, but when we got to lrlex there were multiple flags. Now that we integrate CTParserBuilder the forcing mechanism specific to cfgrammar, and it isn't clear how to expand it across these multiple crates. Neither of the forcing mechanisms the way we did it in cfgrammar nor the way we did it in lrlex works when crossing crate boundaries.

I guess that is where my head is most fogged up.

Part of me wonders if I shouldn't be looking towards the mark_key_used mechanism, and adding a flag to it
to make it less specific to used/unused, and then reusing the structure as an input parameter, and doing the whole merge thing we did with lexflags but with all these string arguments.

so:

enum Mark { Used: 1 << 0, Default: 1 << 1, Force: 1 << 3, XOR: 1 << 4 } /// You *may* mark a key that may or may not be in the header itself. fn mark_key(self: Header, key, mark: Mark)

Edits: added XOR, removed NoDefault (I think it would be just an absent key from a header arg)
I feel like this is probably at least worth experimenting with...

Fwiw, here is the start of alternative to hashmap I was playing with...
Under the assumption that this has so few keys it can afford to be less than optimal.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=9e586aca49a935069c4fa57fcb3c4c45

Edit: By no means is this well tested though, e.g. get_mark has a bug in that link

FWIW, here is a better tested version that implements the merge, unused checking and adds a required mark.

That is to say, it is approaching the merging behavior or LexFlags except in a stringly typed fashion, and on a per-key basis unlike lex_flags where we had different collections for defaults and forcing.

https://gist.github.com/ratmice/4081fec2d6118682ff7b433cb413943b

Anyhow I still can't tell if it is a terrible idea or not...

There is probably quite a ways to go before it actually feels like the HashMap API (if we even want that).
But essentially it tries to mimic HashMap api which includes these merge, missing, and unused functions.
Which specifies on a per-key basis how to resolve conflicts, or whether they are required but missing, or present but unused.

Edit: Still some issues with it's logic, but it is getting there...

I went ahead and integrated the thing in cdccd47 just to see how far away from getting it working in place of the HashMap we were.

I had to add the Borrow thing to a few places, and add contains_key, and change some method signatures for the used function.

But overall it did drop into place pretty well, this doesn't actually utilize the merge behavior, but I guess that would be next.

So the next step making CTParserBuilder use these marks and merge behaviors instead of things YaccKindResolver, is sort of being a painful task, and showing it's warts.

I added things like impl Into<header::Value> for YaccKind and then insert them into a separate header than the one read from the grammar source.

One of the issues is that Header assumes it's going to be read from a file, so It has things like Span.
while the one constructed from the CTParserBuilder doesn't have span data.

Ideally I guess we'd have headers with/without spans and be able to merge them into one with optional spans
Anyhow, I'll press on and maybe this just my distaste for stringly typed nature of it, but I certainly don't have high hopes for this particular experiment.

So c16876c is the first functioning attempt which replaces YaccKindResolver with this whole merge_from behavior, there is quite a few warts in there.

But it does appear to function.

ratmice · 2025-04-03T19:23:12Z

+    pub fn new() -> Self {
+        Header {
+            contents: HashMap::new(),
+            used: vec![],


Given the thoughts on extending the mark behavior to include more than used, but also forcing flags.
I've been thinking about ways to get rid of the used value entirely,

For instance whether we can use hashbrown::HashTable to store the mark flags along side the key in a way that allows them to be mutated separately without affecting the Hash.

I don't yet know whether this HashTable API though allows us to add to keys in a way that doesn't also require us to add an element. At the very least relatively popular crates like SlotMap unfortunately don't fall under such a use case as slots are created during insertion.

The other option is just using a sorted vec instead contents: Vec<(String, Mark, Option<(Span, Value)>>
which honestly seems like the simplest option to me.

Removing used was done in cdccd47

It contains a lot of hacks, like `unwrap()` and made up `Span::new()`. This is because this transitions the `Header` structure from being something that was always parsed, into something that gets constructed by `CTParserBuilder` and then merged with what is parsed. This is largely just an experiment with this approach to find out if it is actually viable.

ratmice · 2025-04-04T22:45:45Z

+            .contents_mut()
+            .set_merge_behavior(&"yacckind".to_string(), MergeBehavior::Ours);
+        header.contents_mut().mark_required(&"yacckind".to_string());
+        // FIXME seems like this this should be Eco!


Searching through this patch for eco, it appears that there are
a couple more instances besides this FIXME where the yacc kind seems like it should be Eco
but at some point most likely was accidentally changed to GenericParseTree.

Both this and the copy in firsts.rs both seem to have been using Original since they were added

ratmice · 2025-04-05T01:15:04Z


    #[test]
    fn test_first_no_subsequent_rules() {
+        let mut header = Header::new();


It would be really nice to simplify a lot of this testsuite stuff with a macro.
I think I'll try to make a test_support.rs module, with a macro header_for_yacckind! or something to that effect.

I did that in a7aee25.

ratmice · 2025-04-05T01:17:38Z

    /// then unused to construct a `YaccGrammar`, which will either produce an
    /// `Ok(YaccGrammar)` or an `Err` which includes these errors.
-    pub fn new(yacc_kind_resolver: YaccKindResolver, s: &str) -> Self {
+    pub fn new(header: Header, s: &str) -> Self {


So, I wonder here if we can take a header: &mut Header here,
and then avoid returning it in YaccParser::build(), it seems possible that would simplify things.

But I'm not sure whether the compiler will be able to tell that it isn't capture by the GrammarAST return value.

It appears that we can, this was done in 4c59bb9.

It required adding a lifetime to (now) YaccParser<'a>, but I think that should be a private type.
Seemed like a small win to me.

ratmice · 2025-04-05T04:23:16Z

-                self.header = header;
-                result
+            Ok((parsed_header, i)) => {
+                self.header.merge_from(parsed_header).unwrap();


This is unwrap() is just a temporary hack that was done just to get things working,
These errors should somehow be converted into a grammar error.

I fixed that by trying to map this error to one of the Duplicate errors DuplicateGrmtoolsSectionEntry.

06390fa
This subsequently required me to add a new spans kind for none-or-more spans in 6a7f5c5

I'm not certain that reusing DuplicateGrmtoolsSectionEntry is the right thing, e.g. usually we see duplicate because of something silly like:

%grmtools{yacckind: Grmtools, yacckind: Grmtools}

however this is like

cp_builder.yacckind(YaccKind::Grmtools)

Parsing a source file with:

%grmtools{yacckind: Grmtools}

ratmice · 2025-04-05T04:40:01Z

So, I think this is at a good place to decide if we should put any more effort into it, or cut our losses here.
There is quite a bit of small nuisances, and code organization that can be improved, which I have been ignoring, because I didn't want to fine tune it all and then decide I didn't even like it the basic idea...

Edit: I did start grinding down some of the sharper corners at least. I think most of what is left is just refining the tree API

But from a big picture perspective of having this header kind of tree, it's basic purpose of merging things CTParserBuilder or nimbleparse and the input sources. How it gets passed around and reused across crates because it just holds settings as a bunch of header::Values aka Strings...

I at least find the tree structure moderately interesting, and I can't remember seeing anything generalized of it's sort (Although it isn't really generalized, and not having seen anything like it might be a reason enough to avoid it!)...
Edit: My guess is that there probably is similar trees used for implementing row polymorphism, it would seem very easy to repurpose this tree into a basis for that at least.

Perhaps though it would help if I spent some time trying to get the commit history in a better state first?

Since this series now uses `From` impls to build headers from the command line, and builder values, header's may no longer have a notion of source location. This makes these spans `Option`, and adds a `SpansKind` that may be empty.

…tionEntry`

ratmice · 2025-04-05T17:12:55Z

                out.push('\n');
                let s = match e.spanskind() {
-                    SpansKind::DuplicationError => {
+                    SpansKind::DuplicationError | SpansKind::OptionalDuplicationSpan => {


This is a bit awkward, we can only count the duplicate occurances by their spans, so for OptionalDuplicationSpan, we can't accurately know how many duplicates there are, just how many we see.

This is kind of akin to the old C compiler problem where you can say get C preprocessor variables
cc -DFOO where their source location is on <command-line>.

Except here because we're dealing with Option ↝ Vec, these None values end up dropped on the floor.

Anyhow it seems likely to me that these should be something else besides Option, perhaps something to the effect of:

enum Location { Span(Span), CommandLine, Other(String), }

Allowing Other("From CTParserBuilder".to_string()) or some such.
Edit: I guess the main issue with this idea is we can't actually put one of those in a spans: Vec.

So, I'd suppose the right thing would be to keep HeaderError and friends and YaccGrammarError separate, making parse return an enum

enum ParseError { HeaderError(...), GrammarError(YaccGrammarError), }

then change HeaderError to use Location instead of Optional.
That keeps Location isolated to only those places where command line things can be present
which seems workable.

Attempted a fix in 714d9da anyways

When comparing command-line derived, or values from `CTParserBuilder`, to elements which are parsed from an input source we may or may not have span information. This includes an enum `Location` which contains a span, but can specify `CommandLine`, or `Other` location. The parts of `YaccParser::parse` which compare elements that are lacking spans, return errors that contain `Location`. Other parts of `YaccParser` *only* deal with elements that derive from input sources. which still return `YaccGrammarError`. So the top-level enum changed to one which can contain *either* `YaccGrammarError` *or* these other error kinds.

Previously we stored an `Option<u16>` where a key/value was inserted but it's mark could be `None`. This seems like a bad idea. we often did mark.map when we intended `mark.unwrap_or(0)`, So remove the `Option` and default to `0`.

…shmap.

ratmice · 2025-04-06T05:58:19Z

So, For now I feel like I should probably just close this, while it is finally in a state where it is both pretty much complete and I can't think of any glaring holes, It isn't exactly in a state where I feel it could reasonably be reviewed.

So I'll spend some time reordering the commits, so that free standing things can be committed separately
and squashing out any interim changes... hopefully I'll manage to get something more review friendly...

ltratt · 2025-04-07T06:59:22Z

Thanks!

Initial attempt at adding RecoveryKind to yacc %grmtools section

e3f5894

ratmice added 2 commits April 2, 2025 08:38

Fix flipped condition on unused

a94d084

Route Header value through ASTWithValidityInfo via build()

1577858

ratmice commented Apr 2, 2025

View reviewed changes

ratmice added 5 commits April 2, 2025 10:18

Wire YaccParser up to the Header type.

06458b1

clippy

8e33c10

Wire up CTParserBuilder to Header

1a66ae7

Reduntant if statement

648c9d4

Check for unused header settings.

47132c3

Wire up nimbleparse to Header

e59d852

ratmice commented Apr 2, 2025

View reviewed changes

ratmice commented Apr 3, 2025

View reviewed changes

ratmice added 3 commits April 4, 2025 07:43

Try integrating markmap

cdccd47

Remove unused ToTokens impl

af92420

ratmice commented Apr 4, 2025

View reviewed changes

ratmice commented Apr 5, 2025

View reviewed changes

ratmice added 3 commits April 4, 2025 19:19

Take header by mutable reference

4c59bb9

Use testsuite macros to build header from yacckind

a7aee25

manual whitespace fixes where rustfmt has given up.

ef8789a

ratmice commented Apr 5, 2025

View reviewed changes

ratmice added 3 commits April 5, 2025 08:31

Make header spans optional

7e418bf

Since this series now uses `From` impls to build headers from the command line, and builder values, header's may no longer have a notion of source location. This makes these spans `Option`, and adds a `SpansKind` that may be empty.

Translate merge exclusivity error into an error `DuplicateGrmtoolsSec…

06390fa

…tionEntry`

Rename WrongValueVariant to QueryTypeMismatch

6b055eb

ratmice added 2 commits April 5, 2025 09:50

Add an OptionalDuplicateSpan SpansKind

6a7f5c5

rustfmt

004f07f

ratmice commented Apr 5, 2025

View reviewed changes

ratmice added 3 commits April 5, 2025 14:03

Add insert_entry to VacantEntry, and fix the insert to match ha…

ad62406

…shmap.

ratmice closed this Apr 6, 2025

ratmice mentioned this pull request Apr 7, 2025

Lrpar section parser ctbuilder take2 #549

Merged

ratmice deleted the lrpar_section_parser_ctbuilder branch April 12, 2025 09:42

Conversation

ratmice commented Apr 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ltratt commented Apr 2, 2025

Uh oh!

ratmice commented Apr 2, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice commented Apr 2, 2025

Uh oh!

ratmice Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice commented Apr 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ratmice Apr 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

ratmice commented Apr 1, 2025 •

edited

Loading

ratmice Apr 2, 2025 •

edited

Loading

ratmice Apr 3, 2025 •

edited

Loading

ratmice Apr 3, 2025 •

edited

Loading

ratmice Apr 3, 2025 •

edited

Loading

ratmice Apr 3, 2025 •

edited

Loading

ratmice Apr 4, 2025 •

edited

Loading

ratmice Apr 5, 2025 •

edited

Loading

ratmice Apr 4, 2025 •

edited

Loading

ratmice Apr 5, 2025 •

edited

Loading

ratmice commented Apr 5, 2025 •

edited

Loading

ratmice Apr 5, 2025 •

edited

Loading