Skip to content

Initial attempt at adding RecoveryKind to yacc %grmtools section#547

Closed
ratmice wants to merge 24 commits into
softdevteam:masterfrom
ratmice:lrpar_section_parser_ctbuilder
Closed

Initial attempt at adding RecoveryKind to yacc %grmtools section#547
ratmice wants to merge 24 commits into
softdevteam:masterfrom
ratmice:lrpar_section_parser_ctbuilder

Conversation

@ratmice
Copy link
Copy Markdown
Collaborator

@ratmice ratmice commented Apr 1, 2025

Here is an initial attempt at adding recoverer to the grmtools section for yacc.

Things I don't like:

  1. It feels to me like quite a bit of boiler plate for adding a single flag (checking all the kinds of value it could be but shouldn't be to throw errors...).
  2. Not checking for unused flags %grmtools{unknown: Foo}
  3. Parsing the section multiple times

I have some thoughts on these last two, for grmtools but lex is perhaps more complicated.

In the same way that ctparserbuilder calls into self.yacckind = ast_validation.yacc_kind(); we could use a similar path to communicate the key/values that went unused by YaccParser.

On the first one, I'll try and think of helper functions, but we have things like YaccKind which can be either a Unitary or a Constructor, so we need to be able to check that it is oneOf those.

I'll try and fiddle with it more, but this is at least a naive draft implementation.

Edit: I have some general thoughts on things to attempt,
largely replacing the HashMap with something that tracks key consumption,
and a small mechanism for querying values based on a repr enum.

#[repr(u8)]
enum ValueRepr {
   Flag = 1 << 0,
   Setting = 1 << 1
}
#[repr(u8)]
enum SettingRepr {
    Num = 1 << 2,
    Unitary = 1 << 3,
    Constructor = 1 << 4,
}

Anyhow the feeling I get is that a key query API that checks the value in the hashmap against one of those in bitmask form. That along with, marking keys as "used", doesn't seem to fall into any complexity traps.

I'll elaborate tomorrow in code form...

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 2, 2025

I'll elaborate tomorrow in code form...

👍

This patch implements (but does not yet use) a wrapper around hashmap.

* Adds an API for marking keys as `used`.
* Provides a mechanism for getting an iterator over `unused` keys.
* A query mechanism that checks whether value for a given key
  matches the specified variant of the of the `Value` type.
  Otherwise producing an error.
@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 2, 2025

I added basic implementation of what I was thinking in 96d721d It doesn't get used yet anywhere, so is largely untested.

Anyhow that is the basic idea of what I had in mind.

pub fn query_key(
&self,
key: &str,
query_mask: u16,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course there is some borrow checker issues with the combination of query_key/mark_key_unused,
So this interface seems to require some changes. I haven't gotten the entry API to work (since we need to mutate the header not the HashMap. So might have to resort to contains_key -> mutate -> query.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed in 06458b1

I'm not sure it's the best fix, the "mark as used" even on error behavior
is intended to avoid the following situation:

  1. specify %grmtools{yacckind: Foo}
  2. Get an "invalid yacc kind" for Foo error.
  3. also get unused yacckind

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 2, 2025

That at least seems like a first try at fixing all 3 of those issues, and I haven't thought of any more either.

Comment thread nimbleparse/src/main.rs Outdated
}
}
let recoverykind = recoverykind.unwrap_or(RecoveryKind::CPCTPlus);
ast_validation.header_mut().mark_key_used("recoverer");
Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth pointing out a difference here between nimbleparse and CTParserBuilder.

In nimbleparse we mark_key_used("recoverer") even when the value is ignored because the value was set on the command line.

But in CTParserBuilder it will complain if you set recoverykind(CPCTPlus) but also have the key in the %grmtools section. The thinking for allowing it to error in CTParserBuilder was it might be confusing if one of the build.rs or %grmtools section went unused.

So how it is, if you specify it in both places you should get an unused in %grmtools section error.

Edit: There is kind of an interaction here with the forcing behavior of lrlex, we should probably consider how to handle it generally? Anyhow I'm not entirely certain what to think about it yet.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is kind of an interaction here with the forcing behavior of lrlex, we should probably consider how to handle it generally? Anyhow I'm not entirely certain what to think about it yet.

In the sense that it's not clear which setting should be forcing which?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is kind of an interaction here with the forcing behavior of lrlex, we should probably consider how to handle it generally? Anyhow I'm not entirely certain what to think about it yet.

In the sense that it's not clear which setting should be forcing which?

I think a good way to explain it is that currently, we have Force, Default, NoDefault, but we lack XOR.

  • Force overrides input text.
  • Default is overridden by input text.
  • NoDefault forces input text to specify.

So we don't have a way to specify "one of the builder, or the input text should specify, but not both"
That is actually the behavior which CTParserBuilder kind of implements for RecoveryKind because RecoveryKind only gets marked used if it is None the CTParserBuilder, so if it is specified in both,
we complain that RecoveryKind is unused in the input file.

This wasn't really an intended thing, I only noticed it when I added my comment, but the behavior as implemented does have a correlation to an XOR forcing mechanism

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the other thing is we originally added YaccKindResolver, there was only one value yacckind for cfgrammar::YaccParser, but when we got to lrlex there were multiple flags. Now that we integrate CTParserBuilder the forcing mechanism specific to cfgrammar, and it isn't clear how to expand it across these multiple crates. Neither of the forcing mechanisms the way we did it in cfgrammar nor the way we did it in lrlex works when crossing crate boundaries.

I guess that is where my head is most fogged up.

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me wonders if I shouldn't be looking towards the mark_key_used mechanism, and adding a flag to it
to make it less specific to used/unused, and then reusing the structure as an input parameter, and doing the whole merge thing we did with lexflags but with all these string arguments.

so:

enum Mark {
  Used: 1 << 0,
  Default: 1 << 1,
  Force: 1 << 3,
  XOR: 1 << 4
}
/// You *may* mark a key that may or may not be in the header itself.
fn mark_key(self: Header, key, mark: Mark)

Edits: added XOR, removed NoDefault (I think it would be just an absent key from a header arg)
I feel like this is probably at least worth experimenting with...

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, here is the start of alternative to hashmap I was playing with...
Under the assumption that this has so few keys it can afford to be less than optimal.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=9e586aca49a935069c4fa57fcb3c4c45

Edit: By no means is this well tested though, e.g. get_mark has a bug in that link

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, here is a better tested version that implements the merge, unused checking and adds a required mark.

That is to say, it is approaching the merging behavior or LexFlags except in a stringly typed fashion, and on a per-key basis unlike lex_flags where we had different collections for defaults and forcing.

https://gist.github.com/ratmice/4081fec2d6118682ff7b433cb413943b

Anyhow I still can't tell if it is a terrible idea or not...

There is probably quite a ways to go before it actually feels like the HashMap API (if we even want that).
But essentially it tries to mimic HashMap api which includes these merge, missing, and unused functions.
Which specifies on a per-key basis how to resolve conflicts, or whether they are required but missing, or present but unused.

Edit: Still some issues with it's logic, but it is getting there...

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and integrated the thing in cdccd47 just to see how far away from getting it working in place of the HashMap we were.

I had to add the Borrow thing to a few places, and add contains_key, and change some method signatures for the used function.

But overall it did drop into place pretty well, this doesn't actually utilize the merge behavior, but I guess that would be next.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the next step making CTParserBuilder use these marks and merge behaviors instead of things YaccKindResolver, is sort of being a painful task, and showing it's warts.

I added things like impl Into<header::Value> for YaccKind and then insert them into a separate header than the one read from the grammar source.

One of the issues is that Header assumes it's going to be read from a file, so It has things like Span.
while the one constructed from the CTParserBuilder doesn't have span data.

Ideally I guess we'd have headers with/without spans and be able to merge them into one with optional spans
Anyhow, I'll press on and maybe this just my distaste for stringly typed nature of it, but I certainly don't have high hopes for this particular experiment.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So c16876c is the first functioning attempt which replaces YaccKindResolver with this whole merge_from behavior, there is quite a few warts in there.

But it does appear to function.

Comment thread cfgrammar/src/lib/header.rs Outdated
pub fn new() -> Self {
Header {
contents: HashMap::new(),
used: vec![],
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the thoughts on extending the mark behavior to include more than used, but also forcing flags.
I've been thinking about ways to get rid of the used value entirely,

For instance whether we can use hashbrown::HashTable to store the mark flags along side the key in a way that allows them to be mutated separately without affecting the Hash.

I don't yet know whether this HashTable API though allows us to add to keys in a way that doesn't also require us to add an element. At the very least relatively popular crates like SlotMap unfortunately don't fall under such a use case as slots are created during insertion.

The other option is just using a sorted vec instead contents: Vec<(String, Mark, Option<(Span, Value)>>
which honestly seems like the simplest option to me.

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing used was done in cdccd47

ratmice added 3 commits April 4, 2025 07:43
It contains a lot of hacks, like `unwrap()` and made up `Span::new()`.
This is because this transitions the `Header` structure from being
something that was always parsed, into something that gets constructed
by `CTParserBuilder` and then merged with what is parsed.

This is largely just an experiment with this approach to find out
if it is actually viable.
Comment thread lrtable/src/lib/itemset.rs Outdated
.contents_mut()
.set_merge_behavior(&"yacckind".to_string(), MergeBehavior::Ours);
header.contents_mut().mark_required(&"yacckind".to_string());
// FIXME seems like this this should be Eco!
Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching through this patch for eco, it appears that there are
a couple more instances besides this FIXME where the yacc kind seems like it should be Eco
but at some point most likely was accidentally changed to GenericParseTree.

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both this and the copy in firsts.rs both seem to have been using Original since they were added

Comment thread cfgrammar/src/lib/yacc/firsts.rs Outdated

#[test]
fn test_first_no_subsequent_rules() {
let mut header = Header::new();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be really nice to simplify a lot of this testsuite stuff with a macro.
I think I'll try to make a test_support.rs module, with a macro header_for_yacckind! or something to that effect.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that in a7aee25.

Comment thread cfgrammar/src/lib/yacc/ast.rs Outdated
/// then unused to construct a `YaccGrammar`, which will either produce an
/// `Ok(YaccGrammar)` or an `Err` which includes these errors.
pub fn new(yacc_kind_resolver: YaccKindResolver, s: &str) -> Self {
pub fn new(header: Header, s: &str) -> Self {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I wonder here if we can take a header: &mut Header here,
and then avoid returning it in YaccParser::build(), it seems possible that would simplify things.

But I'm not sure whether the compiler will be able to tell that it isn't capture by the GrammarAST return value.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that we can, this was done in 4c59bb9.

It required adding a lifetime to (now) YaccParser<'a>, but I think that should be a private type.
Seemed like a small win to me.

Comment thread cfgrammar/src/lib/yacc/parser.rs Outdated
self.header = header;
result
Ok((parsed_header, i)) => {
self.header.merge_from(parsed_header).unwrap();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unwrap() is just a temporary hack that was done just to get things working,
These errors should somehow be converted into a grammar error.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed that by trying to map this error to one of the Duplicate errors DuplicateGrmtoolsSectionEntry.

06390fa
This subsequently required me to add a new spans kind for none-or-more spans in 6a7f5c5

I'm not certain that reusing DuplicateGrmtoolsSectionEntry is the right thing, e.g. usually we see duplicate because of something silly like:

%grmtools{yacckind: Grmtools, yacckind: Grmtools}

however this is like

cp_builder.yacckind(YaccKind::Grmtools)

Parsing a source file with:

%grmtools{yacckind: Grmtools}

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 5, 2025

So, I think this is at a good place to decide if we should put any more effort into it, or cut our losses here.
There is quite a bit of small nuisances, and code organization that can be improved, which I have been ignoring, because I didn't want to fine tune it all and then decide I didn't even like it the basic idea...

Edit: I did start grinding down some of the sharper corners at least. I think most of what is left is just refining the tree API

But from a big picture perspective of having this header kind of tree, it's basic purpose of merging things CTParserBuilder or nimbleparse and the input sources. How it gets passed around and reused across crates because it just holds settings as a bunch of header::Values aka Strings...

I at least find the tree structure moderately interesting, and I can't remember seeing anything generalized of it's sort (Although it isn't really generalized, and not having seen anything like it might be a reason enough to avoid it!)...
Edit: My guess is that there probably is similar trees used for implementing row polymorphism, it would seem very easy to repurpose this tree into a basis for that at least.

Perhaps though it would help if I spent some time trying to get the commit history in a better state first?

ratmice added 3 commits April 5, 2025 08:31
Since this series now uses `From` impls to build headers from the command line,
and builder values, header's may no longer have a notion of source location.

This makes these spans `Option<Span>`, and adds a `SpansKind` that may be empty.
out.push('\n');
let s = match e.spanskind() {
SpansKind::DuplicationError => {
SpansKind::DuplicationError | SpansKind::OptionalDuplicationSpan => {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit awkward, we can only count the duplicate occurances by their spans, so for OptionalDuplicationSpan, we can't accurately know how many duplicates there are, just how many we see.

This is kind of akin to the old C compiler problem where you can say get C preprocessor variables
cc -DFOO where their source location is on <command-line>.

Except here because we're dealing with Option<Span>Vec<Span>, these None values end up dropped on the floor.

Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyhow it seems likely to me that these should be something else besides Option<Span>, perhaps something to the effect of:

enum Location {
   Span(Span),
   CommandLine,
   Other(String),
}

Allowing Other("From CTParserBuilder".to_string()) or some such.
Edit: I guess the main issue with this idea is we can't actually put one of those in a spans: Vec<Span>.

So, I'd suppose the right thing would be to keep HeaderError and friends and YaccGrammarError separate, making parse return an enum

enum ParseError {
    HeaderError(...),
    GrammarError(YaccGrammarError),
}

then change HeaderError to use Location instead of Optional<Span>.
That keeps Location isolated to only those places where command line things can be present
which seems workable.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempted a fix in 714d9da anyways

ratmice added 3 commits April 5, 2025 14:03
When comparing command-line derived, or values from `CTParserBuilder`, to elements
which are parsed from an input source we may or may not have span information.

This includes an enum `Location` which contains a span, but can specify
`CommandLine`, or `Other` location. The parts of `YaccParser::parse` which
compare elements that are lacking spans, return errors that contain `Location`.

Other parts of `YaccParser` *only* deal with elements that derive from input sources.
which still return `YaccGrammarError`.  So the top-level enum changed to one which
can contain *either* `YaccGrammarError` *or* these other error kinds.
Previously we stored an `Option<u16>` where a key/value was
inserted but it's mark could be `None`. This seems like a bad idea.
we often did mark.map when we intended `mark.unwrap_or(0)`,
So remove the `Option` and default to `0`.
@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 6, 2025

So, For now I feel like I should probably just close this, while it is finally in a state where it is both pretty much complete and I can't think of any glaring holes, It isn't exactly in a state where I feel it could reasonably be reviewed.

So I'll spend some time reordering the commits, so that free standing things can be committed separately
and squashing out any interim changes... hopefully I'll manage to get something more review friendly...

@ratmice ratmice closed this Apr 6, 2025
@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 7, 2025

Thanks!

@ratmice ratmice deleted the lrpar_section_parser_ctbuilder branch April 12, 2025 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants