Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CTBuilder options for setting behavior of regex. #400

Closed
wants to merge 1 commit into from

Conversation

ratmice
Copy link
Collaborator

@ratmice ratmice commented Mar 12, 2023

This is a patch that I discussed in #399
It is still missing some odds and ends, e.g. it only currently works with CTLexerBuilder and is missing support within the runtime builders. But it seemed a worthwhile place to stop and ponder if this is a viable approach. I tried to do it using a mechanism that allows us to avoid adding a new builder function for each regex setting, It seemed at least worth trying however i'm not sure how I feel about how the actual calls to RegexBuilder now look.

This isn't something that I really need, or seems to be blocking anything to my knowledge, so that alone should maybe induce some skepticism about the idea.

  • works with CTLexerBuilder
  • runtime builders?
  • Should it use the quote::ToTokens trait instead of Debug?

@ltratt
Copy link
Member

ltratt commented Mar 12, 2023

Like you, I think, I'm not 100% sure what the right approach here is. One thing that occurs to me looking at the code is that the enum variants are one more thing for people to remember, because they have to map those over to the regex crate functions (https://docs.rs/regex/1.7.1/regex/struct.RegexBuilder.html). Which makes me wonder: should we just expose function names that are identical to those in RegexBuilder (e.g. pub fn case_insensitive(&mut self, yes: bool) -> ...) ? Warning: this might be a terrible idea!

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 12, 2023

Yeah, I guess the main benefit (which I entirely forgot to implement) of the approach that I took is that we can implement Serialize/Deserialize for the enum which can be helpful if I would expose these options in nimbleparse_lsp, so I think that is the underlying reason I tried that approach first. I can have a look at the normal builder approach tomorrow though.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 13, 2023

While i've implemented some cleaned up code as per above. I ran into some issues that I hadn't taken into consideration.
In particular RegexBuilder being called twice, first during parsing of the .l file, then again within the generated code.
While it is easy to modify the generated code to use the builder options. It is much much difficult to implement the parsing aspects since they stem from the trait item LexDef::from_str a trait fn.

At the moment don't really see a way to implement the parsing aspects without changing the trait to accept a associated type for options along with the lexer sources. Given the current lack of a motivating use case I'm not sure it's worth breaking compatibility over?

The one thing I guess I note that seems noteworthy is that the call here: https://github.com/softdevteam/grmtools/blob/master/lrlex/src/lib/ctbuilder.rs#L329 is using a concrete type instance rather than e.g. a trait object, so perhaps we could provide a instance like LRNonStreamingLexerDef::from_str_with_options().

I'll have to have a think on those aspects a bit more, but it seems difficult to pull off without breaking semver compatibility.

Edit: I probably didn't do a good job of explaining the issue, e.g. octal setting changes the regex format so we need these options to match at both parse time and code gen time, while options like dot_matches_new_line doesn't change the format but only the interpretation.

@ltratt
Copy link
Member

ltratt commented Mar 13, 2023

Hmm, I hadn't thought about that either!

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 13, 2023

FWIW I put the attempt at using normal builder style here: https://github.com/ratmice/grmtools/tree/regex_options_take_2

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 14, 2023

Think i'll just close this one for now.

@ratmice ratmice closed this Mar 14, 2023
@ltratt
Copy link
Member

ltratt commented Mar 14, 2023

FWIW I put the attempt at using normal builder style here: https://github.com/ratmice/grmtools/tree/regex_options_take_2

Do you think this route can work? I'm not quite sure if it suffers from the problem you mentioned above or not.

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 14, 2023

Yeah, it still does suffer from that problem. The main missing piece of it is that there is no way to call the LexParser::new_with_regex_options function I added passing in options from CTLexerBuilder with regex_options being routed through LexerDef::from_str or LRNonStreamingLexerDef::from_str. So it's mostly just public API decisions. We can probably keep from_str letting it use the default options. But i'm mostly unsure what to do with from_str being a trait function.

@ltratt
Copy link
Member

ltratt commented Mar 14, 2023

Hmm, yes, I don't have an immediately good answer to that :/

@ratmice
Copy link
Collaborator Author

ratmice commented Mar 14, 2023

Yeah, i'm pretty hesitant to make such API changes without a real motivating use case, so I don't think there is any immediate need. But I do plan on tinkering with a few potential things to see if I can come up with anything that seems worthwhile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants