Experiment with RTParserBuilder usage in CTParserBuilder #559
|
One thing to note is that the There is kind of an issue where I mostly want this for Anyhow it does work in the sense that if you do a |
|
One thing to note is that this is admittedly pretty limited in its functionality, so it seems reasonable to question whether that simplicity inhibits this from becoming a more complete feature like lang_tester, or even the insta crate.
I think even as simple as it is, there are a couple of extension points and ways we could specify the expected output. Either inline in comments like lang_tester, likely stripping the comments out, and specifying the comment character in a header (which likely must be stripped before passing to the grammar). Otherwise a separate Similarly we could specify something other than raw input by having a known file extension which includes some metadata perhaps even in a There is also the "just add a test_fails, or other additional keys" which I would tend to only do as a last-ditch effort. |
| #[doc(hidden)] | ||
| pub fn inspect_rt( | ||
| mut self, | ||
| cb: Box< |
There was a problem hiding this comment.
I wonder if we can avoid the Box here with monomorphisation presumably with a trait bound? So we'd have something like:
fn inspect_rts<F>(mut self, cb: F) where F: FnOnce(...) -> Result<(), Box<dyn Error>> { ... }?
There was a problem hiding this comment.
I don't know, I haven't had any luck in trying to do so. This callback function is extremely finicky. If we focus on the caller, we can see that it gets passed a mutable reference to a header, which then gets used again only a few lines later.
The only way we can do that is because the strange higher-ranked trait bounds ensure the callback can't capture the reference.
inspector_rt(&mut header, rtpb, &rule_ids, grmp)?
}
let unused_keys = header.unused();
At the same time, within the closure it needs to have ownership of, or at least a mutable reference to, a LexerDef so it can call set_rule_ids. In addition, we would also need to lift F into a generic where clause on RTParserBuilder, since it is stored as a parameter.
I feel like I just look at this callback wrong, and I get into a weird ownership mess :)
I think it would be possible, but I think it pretty much requires moving F out of this function signature all the way to RTParserBuilder, and it doesn't seem right to have this #[doc(hidden)] callback adding generic parameters to such a prominent type as RTParserBuilder. So the Box really is there to hide it in the self.inspect_rt field's signature, and keep it from escaping as a generic param on RTParserBuilder.
So I don't really know how we can do this and maintain that property.
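To make the trade-off in this thread concrete, here is a minimal sketch, not the actual grmtools code. `Header`, `InspectCb`, `Builder`, and `GenericBuilder` are hypothetical stand-ins invented for illustration, and the callback takes only a header to keep things short. It assumes the callback is stored on a builder: the boxed form keeps the builder's type free of extra parameters, while the generic form gets static dispatch at the cost of `F` appearing in the builder's type. The `for<'h>` bound is what lets the caller use the header again after the callback returns.

```rust
use std::collections::HashMap;
use std::error::Error;

// Hypothetical stand-in for a header that tracks which keys were read.
#[derive(Default)]
pub struct Header {
    entries: HashMap<String, String>,
    used: Vec<String>,
}

impl Header {
    pub fn insert(&mut self, k: &str, v: &str) {
        self.entries.insert(k.to_string(), v.to_string());
    }
    // Reading a key marks it as used.
    pub fn get(&mut self, k: &str) -> Option<&str> {
        if self.entries.contains_key(k) && !self.used.iter().any(|u| u == k) {
            self.used.push(k.to_string());
        }
        self.entries.get(k).map(|s| s.as_str())
    }
    // Keys that were set but never read, in sorted order.
    pub fn unused(&self) -> Vec<String> {
        let mut ks: Vec<String> = self
            .entries
            .keys()
            .filter(|k| !self.used.contains(*k))
            .cloned()
            .collect();
        ks.sort();
        ks
    }
}

// The callback must accept a header borrow of any lifetime, so it cannot keep
// the reference and the caller regains access once the call returns.
pub type InspectCb = Box<dyn for<'h> FnOnce(&'h mut Header) -> Result<(), Box<dyn Error>>>;

// Boxed variant: no generic parameter on the builder type.
#[derive(Default)]
pub struct Builder {
    inspect: Option<InspectCb>,
}

impl Builder {
    pub fn inspect(mut self, cb: InspectCb) -> Self {
        self.inspect = Some(cb);
        self
    }
    pub fn build(self, header: &mut Header) -> Result<Vec<String>, Box<dyn Error>> {
        if let Some(cb) = self.inspect {
            cb(&mut *header)?;
        }
        // The borrow handed to the callback has ended, so the header is usable again.
        Ok(header.unused())
    }
}

// Generic variant: static dispatch, but `F` becomes part of the builder's type.
pub struct GenericBuilder<F> {
    inspect: F,
}

impl<F> GenericBuilder<F>
where
    F: for<'h> FnOnce(&'h mut Header) -> Result<(), Box<dyn Error>>,
{
    pub fn new(inspect: F) -> Self {
        GenericBuilder { inspect }
    }
    pub fn build(self, header: &mut Header) -> Result<Vec<String>, Box<dyn Error>> {
        (self.inspect)(&mut *header)?;
        Ok(header.unused())
    }
}
```

In this sketch, every user of `GenericBuilder` has to name or infer `F`, which is the kind of leakage the boxed field avoids.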
There was a problem hiding this comment.
Point taken. Since we intend this to be an internal API (I think) (hence #[doc(hidden)]), perfection isn't as important. And we might have a bright idea soon!
There was a problem hiding this comment.
The big reason it was hidden is that lrlex calls it, so it would be a bit awkward if we made it documented and then lrlex overwrote what the user set and perhaps vice versa and the feature stopped working.
It just seemed like a complexity that wasn't worth undertaking while still trying to figure out how the mechanism should work. I wasn't opposed to making it pub, but there are some complexities to doing so that I don't currently have any ideas towards resolving, and I'm not sure it is actually needed as an extension point.
There was a problem hiding this comment.
I think I would prefer not to -- at least yet -- make this a part of the API that we commit to.
There was a problem hiding this comment.
Definitely agree on that.
|
One additional thing we could do, e.g. to avoid feeling like we need to parse this from Then with some per-tool setting we wouldn't feel any pressure to support running we could even add a new tool like Anyhow, that per-tool key thought is kind of growing on me, that it may be better to specify this which makes it more obvious that it only applies to some specific binary rather than to all things that happen to read the header like Plus this (at least as written) is a non-breaking change so I don't think we really need to rush into making some decision here. Anyhow at the very least I'm inclined to think perhaps this hasn't cooked enough. |
I'm inclined to get this in, and then use that as a forcing function to think about how we can improve it. At worst we can revert it! But, probably, we'll work out further ways of simplifying things. Sound like a plan? |
Alright, also at the worst we can add a flag that enables the setting, and leave it off by default and/or limit its usage in the examples. Would you prefer I finish up nimbleparse support in this branch or as a follow-up? |
I think it might be useful to do it in this PR, just in case it shows up something interesting that causes us to reflect on the API. |
I'll get on it, I don't think it will/can impact the API, since nimbleparse still doesn't use |
|
So I added it in 8c18126, I don't want to say that that patch isn't pretty. I feel that would be understating things. Because the combination of Anyhow I did kind of get lost in all those branches, I'm hoping there's some way to clean it up, but without |
| member: (arg_memb, _), | ||
| }, | ||
| }) => { | ||
| eprintln!("Expected 'test_files: \"glob\"' found constructor '{:?}::{:?}' in '%grmtools' from: {:?}", ctor_memb, arg_memb, ctor_memb_loc); |
There was a problem hiding this comment.
So, the only thought I've had for cleaning this up is TryFrom<Value<T>> for String.
The only problem being that it doesn't allow us to pass the test_files key, nor mention globs.
I suppose we could implement it for a tuple, TryFrom<(String, Value<T>)>, passing the key -- or even a partial error string.
There was a problem hiding this comment.
In 621a0df I didn't try to abuse TryFrom, which was lacking context, or do TryFrom for a tuple, which just seemed like a hack...
Instead I added a kind of ad-hoc method, expect_string_with_context. It seems like it improved things only a little. I don't have much in the way of ideas for cleaning it up further yet.
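As a rough illustration of the two options discussed here, the sketch below contrasts a context-free `TryFrom` conversion with a method that takes the caller's context. It is not the code from 621a0df: the `Value` enum is a hypothetical, non-generic simplification without source locations, and the signature and messages of `expect_string_with_context` are assumptions.

```rust
use std::convert::TryFrom;

// Hypothetical, simplified stand-in for a header value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Flag(bool),
    Str(String),
    Constructor { ctor: String, arg: String },
}

// Option 1: a plain conversion. The error cannot say which key was being read
// or that a glob was expected, because the conversion has no context.
impl TryFrom<Value> for String {
    type Error = String;

    fn try_from(v: Value) -> Result<String, String> {
        match v {
            Value::Str(s) => Ok(s),
            other => Err(format!("expected a string, found {:?}", other)),
        }
    }
}

impl Value {
    // Option 2: an ad-hoc method that threads the caller's context into the error.
    pub fn expect_string_with_context(self, ctx: &str) -> Result<String, String> {
        match self {
            Value::Str(s) => Ok(s),
            Value::Flag(b) => Err(format!("Expected {} found flag '{}'", ctx, b)),
            Value::Constructor { ctor, arg } => Err(format!(
                "Expected {} found constructor '{}::{}'",
                ctx, ctor, arg
            )),
        }
    }
}
```

With the second form, the caller supplies a description such as `'test_files: "glob"'` once, and each non-string case can mention it without a separate match at the call site.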
|
I wonder if nimbleparse has gotten to the point where it needs to be broken up into more functions? That might help make it clearer what's going on? |
I don't know, perhaps but I think one of the big problems is that the glob taking Perhaps we can push some of that code down into functions that return a result type making the multitude of error cases less impactful, but for all the source location printing code we're still requiring static dispatch on being able to pull spans out of errors, so it just isn't really clear to me where the right place to separate function boundaries would be, when that static dispatch precludes things like Anyhow, I'll have a look through it while keeping splitting things up in mind. |
It feels like it's worth a try to me. |
I tried it in 6442321. I don't know if it's really an improvement; it seems like:
I'm not entirely sure it really helped at all, maybe the |
| /// | ||
| /// * `neighbours` takes a node `n` and returns an iterator consisting of all `n`'s neighbouring | ||
| /// nodes. | ||
| /// nodes. |
There was a problem hiding this comment.
Apparently this is the only clippy update, unrelated to this patch.
Seemed excessive to make a PR just for this one change.
|
I think it might be worth pulling that stuff into top-level methods rather than functions-in-functions? Still, yes, it's more code, but it is easier to follow the control flow without the |
|
Yeah, I hadn't considered that, using a |
| for input_path in input_paths { | ||
| let input = read_file(&input_path); | ||
| let lexer = lexerdef.lexer(&input); | ||
| let pb = RTParserBuilder::new(&grm, &stable).recoverer(recoverykind); |
There was a problem hiding this comment.
Should try to pull the construction of RTParserBuilder here up out of the loop.
We really shouldn't need to rebuild it for every input file; seems like something I overlooked.
There was a problem hiding this comment.
Should be fixed in bd406bb, also there was another issue with errors being ascribed to the wrong source file (the grammar.y rather than the input.txt).
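The shape of that change can be sketched without the real types. Everything below is hypothetical: `ParserBuilder` is a toy stand-in rather than lrpar's `RTParserBuilder`, and the `Cell` counter exists only to show how often construction happens. The builder is created once before the loop, and each error is labelled with the input file it came from rather than the grammar file.

```rust
use std::cell::Cell;

// Hypothetical stand-in for a parser configuration that is costly to build.
pub struct ParserBuilder<'a> {
    grammar: &'a str,
}

impl<'a> ParserBuilder<'a> {
    pub fn new(grammar: &'a str, built: &Cell<usize>) -> Self {
        built.set(built.get() + 1); // count constructions, for illustration only
        ParserBuilder { grammar }
    }

    // Toy "parse": succeeds when the input starts with the grammar's keyword.
    pub fn parse(&self, input: &str) -> Result<(), String> {
        if input.starts_with(self.grammar) {
            Ok(())
        } else {
            Err(format!("input does not start with '{}'", self.grammar))
        }
    }
}

// `inputs` pairs each input path with its contents.
pub fn parse_all(grammar: &str, inputs: &[(&str, &str)], built: &Cell<usize>) -> Vec<String> {
    // Built once, outside the loop, and shared by every input.
    let pb = ParserBuilder::new(grammar, built);
    let mut errors = Vec::new();
    for (path, input) in inputs {
        if let Err(e) = pb.parse(input) {
            // Attribute the error to the input file, not the grammar file.
            errors.push(format!("{}: {}", path, e));
        }
    }
    errors
}
```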
|
So in bd406bb I both moved these out of main, and reduced the number of args by moving them into their own type. |
|
I think we're ready to squash + merge. Please squash! |
This will be used in the next commit to implement testing by passing input source text to an RTParserBuilder during the CTParserBuilder grammar build phase.
… `CTParserBuilder. This allows quoted string values to be specified within the %grmtools section, A `test_files` value using it and implements `CTParserBuilder` for that key.
|
Squashed, it isn't up to date with master, but it should merge cleanly afaict. |
|
So I marked this as closing #555 mostly because I think it is as close to that as I can figure out for now. I think this is probably everything I set out to do for my recent batch of commits, and then some. |
|
Thanks for all this -- it's much appreciated! As and when you find the next things to do, it'll be good to keep things moving :) |
This is my attempt to experiment with #555. My original thoughts here were a bit more involved than what this patch implements, i.e. making a callback for CTLexerBuilder too, which would give callers a LexerDef and an RTParserBuilder, and their respective &mut Header objects, so callers could add their own keys into Header.
None of that panned out because I wasn't able to make the ownership work: it would require a nested callback such that the LRLex one would call the user-provided one from within the CTParserBuilder.inspect_rt callback. There was just no way to do that without moving the Header values, such that we could pass &mut Header.
I ended up trying to see if we could still use RTParserBuilder within CTLexerBuilder, but just not exposing this to the user -- this seemed as much as I could achieve while avoiding the need for nested callbacks.
This passes the testsuite, but only because none of the %grmtools sections have a test_files key. It currently doesn't have any appropriate value such that we can build the test file reading/globbing, so it just passes an empty string, which not all parsers are capable of parsing -- but none of them have a test_files key, so it doesn't matter.
As such, this is very much an incomplete draft, but maybe there is enough here to know if we want to continue working on it, or abandon the effort.
The next steps would be:
- Value::String or to put a glob into, and "test_files" reading. Done in ca96adb
- test_files support/mode for nimbleparse, so you could just pass it lex/yacc files and no input, and have it use the embedded globs to try and find input.
But it definitely does not seem like "implement nimbleparse in a way that uses a unified ct/rt parser builder" is really in the cards though.
Edit:
I guess it is also worth noting that this is essentially doing something very much like lang_tester, but not quite as smart given there is no mechanism to specify expected output or anything.