Skip to content

Experiment with RTParserBuilder usage in CTParserBuilder#559

Merged
ltratt merged 4 commits into
softdevteam:masterfrom
ratmice:ct_rt
May 3, 2025
Merged

Experiment with RTParserBuilder usage in CTParserBuilder#559
ltratt merged 4 commits into
softdevteam:masterfrom
ratmice:ct_rt

Conversation

@ratmice
Copy link
Copy Markdown
Collaborator

@ratmice ratmice commented Apr 25, 2025

This is my attempt to experiment with #555, my original thoughts here were a bit more involved than what this patch implements. I.e. making a callback CTLexerBuilder too which would give callers a LexerDef and an RTParserBuilder , and their respective &mut Header objects, so callers could add their own keys into Header.

None of that panned out because I wasn't able to make the ownership work, it would require a nested callback such that the LRLex one would call the user provided one, from within the CTParserBuilder.inspect_rt callback.
There was just no way to do that without moving the Header values, such that we could pass &mut Header.

I ended up trying to see, if we could still use RTParserBuilder within CTLexerBuilder, but just not exposing this to the user -- This seemed as much as I could achieve while avoiding the need for nested callbacks.

This passes the testsuite but only because none of the %grmtools sections have a test_files key, it currently
doesn't have any appropriate value such that we can build the test file reading/globbing. So it just passes an empty string, which not all parsers are capable of parsing -- but none of them have a test_files key, so it doesn't matter.

As such, this is a very much an incomplete draft, but maybe there is enough here to know if we want to continue working on it, or abandon the effort.

The next steps would be:

  • Implement a Value::String or to put a glob into, and "test_files" reading. Done in ca96adb
  • Implement test_files support/mode for nimbleparse, so you could just pass it lex/yacc files and noi input, and have it use the embedded globs to try and find input.

But it definitely does not seem like implement nimbleparse in a way that uses a unified ct/rt parser builder is really in the cards though.

Edit:
I guess it is also worth noting that this is essentially doing something very much like lang_tester, but not quite as smart given there is no mechanism to specify expected output or anything.

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 25, 2025

One thing to note is that the test_files for CTParserBuilder isn't very good, in the sense that cargo doesn't rebuild the build.rs if input.txt changes and I'm not sure a good way to force it to, we could somehow include the timestamp of the files matched by the glob in the cache information.

There is kind of an issue where I mostly want this for nimbleparse, and nimbleparse_lsp, but the way the settings are in the grammar file, feels like it should also take effect on CTParserBuilder.

Anyhow it does work in the sense that if you do a cargo clean and enter invalid text into input.txt, cargo build tries to parse input.txt and fails to build. But we don't currently have a way to test this, because cttests/src/ctfails doesn't have a way to write arbitrary input.txt files, that may not be hard to do I'm not good knowledgeable enough about yaml to think know what might be the best syntax for that though.

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 27, 2025

One thing to note is that this is admittedly pretty limited in it's functionality, So it seems reasonable to question if that simplicity inhibits this from becoming a more complete feature like lang tester, and even the insta crate.

  1. How to specify tests that should fail, and even the expected output
  2. What extension points do we have available should we take it as-is

I think even as simple as it is there is a couple of extension points, and ways we could specify the expected output

Either inline in comments like lang-tester, likely stripping the comments out, and specifying the comment character in a header (which likely must be stripped before passing to the grammar). Otherwise a separate input_filename.output specifying the expected output.

Similarly we could specify something other than raw input by having a known file extension which includes some metadata perhaps even in a %grmtools header, using that to specify the comment character and subsequently providing some structure to the format.

There is also the "just add a test_fails, or other additional keys" which I would tend only do as a last ditch effort.
I guess my thoughts are that we should probably have a plan towards how we want to approach these problems.
But I do think that there is a little bit of wiggle room even as stupid/simple as it is?

Comment thread lrpar/examples/start_states/src/comment.y Outdated
Comment thread lrlex/src/lib/ctbuilder.rs Outdated
#[doc(hidden)]
pub fn inspect_rt(
mut self,
cb: Box<
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can avoid the Box here with monomorphisation presumably with a trait bound? So we'd have something like:

fn inspect_rts<F>(mut self, cb: F) where F: FnOnce(...) -> Result<(), Box<dyn Error>> { ... }

?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I haven't had any luck in trying to do so. This callback function is extremely finicky if we focus on the caller, we can see that it gets passed a mutable reference to a header, which then gets used only a few lines later.
The only way we can do that is because the strange higher order trait bounds ensure the callback can't capture the reference.

            inspector_rt(&mut header, rtpb, &rule_ids, grmp)?
        }

        let unused_keys = header.unused();

A the same time within the closure it needs to have Ownership of or at least a mutable reference to a LexerDef so it can call set_rule_ids. In addition, we would also need to lift F into a generic where clause on RTParserBuilder since it is stored as a parameter.

I feel like I just look at this callback wrong, and I get into a weird ownership mess :)
I think it would be possible, but I think it pretty much requires moving F out of this function signature all the way to RTParserBuilder, and I don't know it doesn't seem right to have this #[doc(hidden)] callback adding generic parameters to such a prominent type like RTParserBuilder. So the box really is there to hide it in the self.inspect_rt field's signature, and keep it from escaping as a generic param on RTParserBuilder.

So I don't really know how we can do this and maintain that property.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken. Since we intend this to be an internal API (I think) (hence #[doc(hidden)]), perfection isn't as important. And we might have a bright idea soon!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The big reason it was hidden is that lrlex calls it, so it would be a bit awkward if we made it documented and then lrlex overwrote what the user set and perhaps vice versa and the feature stopped working.

It just seemed like a complexity that wasn't worth undertaking while still trying to figure out how the mechanism should work, so like I wasn't opposed to making it pub, but there are some complexities to doing so that I don't currently have any ideas towards resolving and I'm not sure it is actually needed as an extension point.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer not to -- at least yet -- make this a part of the API that we commit to.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree on that.

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 27, 2025

One additional thing we could do, e.g. to avoid feeling like we need to parse this from build.rs, just because we also parse the header within build.rs is we could do something like relax the key naming conventions for instance allowing a dot then having some kind of per-tool keys.

Then with some per-tool setting we wouldn't feel any pressure to support running RTParserBuilder from within CTParserBuilder, and if people want that they could exec("grmtools_test", ....) or something of the sort.

%grmtools {
      nimbleparse.test_files: "input.txt",
      RTParserBuilder.foo: "something",
      grmtools_tester.test_files: "maybe-even-a-new-tool.txt?",
}

we could even add a new tool like grmtools_tester specifically for this.

Anyhow, that per-tool key thought is kind of growing on me, that it may be a better to specify this which makes it more obvious that it only applies to some specific binary rather than to all things that happen to read the header like RTParserBuilder, because I really don't see us being able to reasonably fix how this is impacted by cargo, without fixing long standing cargo issues.

Plus this (at least as written) is a non-breaking change so I don't think we really need to rush into making some decision here. Anyhow at the very least I'm inclined to think perhaps this hasn't cooked enough.

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 28, 2025

Plus this (at least as written) is a non-breaking change so I don't think we really need to rush into making some decision here.

I'm inclined to get this in, and then use that as a forcing function to think about how we can improve it. At worst we can revert it! But, probably, we'll work out further ways of simplifying things. Sound like a plan?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 28, 2025

Plus this (at least as written) is a non-breaking change so I don't think we really need to rush into making some decision here.

I'm inclined to get this in, and then use that as a forcing function to think about how we can improve it. At worst we can revert it! But, probably, we'll work out further ways of simplifying things. Sound like a plan?

Alright, also at the worst we can add a flag that enables the setting, and leave it off by default and or limit it's usage in the examples. Would you prefer I finish up nimbleparse support in this branch or as a follow up?

@ltratt
Copy link
Copy Markdown
Member

ltratt commented Apr 29, 2025

Would you prefer I finish up nimbleparse support in this branch or as a follow up?

I think it might be useful to do it in this PR, just in case it shows up something interesting that causes us to reflect on the API.

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 29, 2025

Would you prefer I finish up nimbleparse support in this branch or as a follow up?

I think it might be useful to do it in this PR, just in case it shows up something interesting that causes us to reflect on the API.

I'll get on it,

I don't think it will/can impact the API, since nimbleparse still doesn't use CTParserBuilder, and this patch isn't enough to allow it to do so (At the very least it would still try to generate sources), so nimbleparse is just going to read the key directly out of the header and do. The logic will likely be pretty independent and much simpler than CTParserBuilder.

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented Apr 29, 2025

So I added it in 8c18126, I don't want to say that that patch isn't pretty. I feel that would be understating things. Because the combination of glob + OSString + HeaderValue introduces a ridiculous number of error conditions that need to be handled. But at least there were no unexpected difficulties.

Anyhow I did kind of get lost in all those branches, I'm hoping there's some way to clean it up, but without ? or try 😢

Comment thread nimbleparse/src/main.rs Outdated
member: (arg_memb, _),
},
}) => {
eprintln!("Expected 'test_files: \"glob\"' found constuctor '{:?}::{:?}' in '%grmtools' from: {:?}", ctor_memb, arg_memb, ctor_memb_loc);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the only thoughts I've had for cleaning this up is TryFrom<Value<T>> for String,
The only thing being that doesn't allow us to pass the test_files key, nor mention globs.

I suppose we could implement it for a tuple TryFrom<(String, Value<T>)> passing the key -- or even a partial error string

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 621a0df I didn't try and abuse TryFrom which was lacking context, or doing TryFrom for a tuple, which just seemed like a hack...

Instead I added a kind of ad-hoc method expect_string_with_context, It seems like it improved things only a little. I don't have much in the way of ideas for cleaning it up further yet

@ltratt
Copy link
Copy Markdown
Member

ltratt commented May 1, 2025

I wonder if nimbleparse has gotten to the point where it needs to be broken up into more functions? That might help make it clearer what's going on?

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented May 1, 2025

I wonder if nimbleparse has gotten to the point where it needs to be broken up into more functions? That might help make it clearer what's going on?

I don't know, perhaps but I think one of the big problems is that the glob taking &str and then having the whole OSstring conversion issue when dealing with paths which induces a lot of error cases. But on top of that, having that happen inside a fn main() precluding us from using ? and having to handle each error via branch/print/exit.

Perhaps we can push some of that code down into functions that return a result type making the multitude of error cases less impactful, but for all the source location printing code we're still requiring static dispatch on being able to pull spans out of errors, so just it isn't really clear to me where the right place to separate function boundaries would be, when the that static dispatch precludes things like Box<dyn Error> (because the source context is outside the error type).

Anyhow, I'll have a look through it while keeping splitting things up in mind.

@ltratt
Copy link
Copy Markdown
Member

ltratt commented May 2, 2025

Perhaps we can push some of that code down into functions that return a result type making the multitude of error cases less impactful

It feels like it's worth a try to me.

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented May 2, 2025

Perhaps we can push some of that code down into functions that return a result type making the multitude of error cases less impactful

It feels like it's worth a try to me.

I tried it in 6442321

I don't know if it's really an improvement, it seems like:

  • alot of the errors are from handling None values, and need custom error text.
  • Most of the printing machinery we aren't benefitting from the custom error handling in order to avoid putting lifetimes on the error type we end up converting to string before all the formatting functions and just dumping out strings.

I'm not entirely sure it really helped at all, maybe the parse_many function is a little easier to follow, but it did end up being twice as many lines of total patch?

Comment thread lrpar/src/lib/dijkstra.rs
///
/// * `neighbours` takes a node `n` and returns an iterator consisting of all `n`'s neighbouring
/// nodes.
/// nodes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently this is the only clippy update, unrelated to this patch.

Seemed excessive to make a PR just for this one change.

@ltratt
Copy link
Copy Markdown
Member

ltratt commented May 2, 2025

I think it might be worth pulling that stuff in top-level methods rather than functions-in-functions? Still, yes, it's more code, but it is easier to follow the control flow without the process:exit(...)s!

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented May 2, 2025

Yeah, I hadn't considered that, using a Self type also seems worth a shot.

Comment thread nimbleparse/src/main.rs Outdated
for input_path in input_paths {
let input = read_file(&input_path);
let lexer = lexerdef.lexer(&input);
let pb = RTParserBuilder::new(&grm, &stable).recoverer(recoverykind);
Copy link
Copy Markdown
Collaborator Author

@ratmice ratmice May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should try to pull the contruction of RTParserBuilder here up out of the loop.
We really shouldn't need to rebuild it for every input file, seems like something I overlooked.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed in bd406bb, also there was another issue with errors being ascribed to the wrong source file (the grammar.y rather than the input.txt).

@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented May 2, 2025

So in bd406bb I both moved these out of main, and reduced the number of args by moving them into their own type.

@ltratt
Copy link
Copy Markdown
Member

ltratt commented May 2, 2025

I think we're ready to squash + merge. Please squash!

ratmice added 4 commits May 2, 2025 15:22
This will be used in the next commit to implement testing
of by passing input source text to an RTParserBuilder,
during the CTParserBuilder grammar build phase.
… `CTParserBuilder.

This allows quoted string values to be specified within the %grmtools section,
A `test_files` value using it and implements `CTParserBuilder` for that key.
@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented May 2, 2025

Squashed, it isn't up to date with master, but it should merge cleanly afaict.

@ratmice ratmice marked this pull request as ready for review May 2, 2025 22:28
@ratmice ratmice linked an issue May 2, 2025 that may be closed by this pull request
@ratmice
Copy link
Copy Markdown
Collaborator Author

ratmice commented May 2, 2025

So I marked this as closing #555 mostly because I think it is as close to that as I can figure out for now.

I think this is probably everything I set out to do for my recent batch of commits, and then some.
So I'm likely to slow down on the PRs/patches and recharge my energy reserves for a little while,
unless anything comes up that needs my attention.

@ltratt ltratt added this pull request to the merge queue May 3, 2025
@ltratt
Copy link
Copy Markdown
Member

ltratt commented May 3, 2025

Thanks for all this -- it's much appreciated! As and when you find the next things to do, it'll be good to keep things moving :)

Merged via the queue into softdevteam:master with commit 8425187 May 3, 2025
2 checks passed
@ratmice ratmice deleted the ct_rt branch May 3, 2025 07:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unifying RTParserBuilder and CTParserBuilder

2 participants