Add support for validating Regexp in Ripper #4902
This represents a stab at implementing the remaining Ripper compatibility as mentioned in #4898.
This uses the fact that Regexp tokenization is handled by a single StringTerm, and thus all tSTRING_CONTENT fragments are easily collectable until the tREGEXP_END comes with the options that we need for validation.
The validation itself is a copied/simplified version of what is performed by the main parser, as large parts of the validation depended on the AST structure, which we do not have here.
Technically, this doesn't perform the validation at the same point in time as the main parser, as it performs the validation when encountering the tREGEXP_END token rather than when processing the regexp rule.
I speculate that the difference doesn't really matter given that the only thing we could do with the tREGEXP_END token is to apply the regexp rule.
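To illustrate the idea (this is not the Java code in this PR), here is a minimal Ruby sketch built on MRI's `Ripper.lex`. It treats the `on_tstring_content` and `on_regexp_end` scanner events as stand-ins for tSTRING_CONTENT and tREGEXP_END, collects the literal fragments, and validates them once the closing token supplies the options. The names `regexp_error` and `check_regexps` are hypothetical, `Regexp.new` stands in for the real validation, and skipping validation for interpolated literals is a simplifying assumption.

```ruby
require 'ripper'

# Option letters mapped to Regexp flags. Only i, m and x are handled;
# o and the encoding options (n, e, s, u) are ignored in this sketch.
FLAGS = {
  'i' => Regexp::IGNORECASE,
  'm' => Regexp::MULTILINE,
  'x' => Regexp::EXTENDED
}.freeze

# Hypothetical helper: returns nil when the joined fragments compile,
# otherwise the RegexpError message.
def regexp_error(fragments, options)
  flags = options.each_char.reduce(0) { |acc, c| acc | FLAGS.fetch(c, 0) }
  Regexp.new(fragments.join, flags)
  nil
rescue RegexpError => e
  e.message
end

# Hypothetical helper: walks the token stream, collecting string-content
# fragments of each regexp literal until its closing token arrives.
def check_regexps(source)
  results = []
  fragments = nil # nil while outside a regexp literal
  dynamic = false # true once the literal contained interpolation
  depth = 0       # nesting depth of #{...} inside the literal

  Ripper.lex(source).each do |_pos, type, text|
    if fragments.nil?
      fragments, dynamic, depth = [], false, 0 if type == :on_regexp_beg
      next
    end

    case type
    when :on_embexpr_beg
      depth += 1
      dynamic = true
    when :on_embexpr_end
      depth -= 1
    when :on_embvar
      dynamic = true
    when :on_tstring_content
      fragments << text if depth.zero?
    when :on_regexp_end
      next unless depth.zero?
      options = text[1..-1] # drop the closing delimiter, keep the option letters
      # Assumption: interpolated literals are not validated here.
      error = dynamic ? nil : regexp_error(fragments, options)
      results << { fragments: fragments, options: options, dynamic: dynamic, error: error }
      fragments = nil
    end
  end
  results
end

check_regexps('x = /ab+c/i').each { |r| p r }
```

As in the approach described above, the check runs when the closing token is seen rather than when a grammar rule is reduced, so no AST is needed.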
In order to reduce copy-paste between the main parser and Ripper, I opted to shift some Regexp-related code into the Lexer. In theory, the code doesn't belong in the lexer, but putting it in the Lexer has some benefits. First, it is a component that is shared in a reasonable way between the two parsers. Second, it is essentially required by the proposed implementation, as the new validation takes place effectively inside the Lexer.
Unsurprisingly, it turns out that the coverage for Ripper parsing of Regexp isn't very extensive, and I haven't had time to put the code through any additional tests.
Forgot to mention: this does nothing to address local variables as mentioned in #4898. However, some quick testing seemed to indicate that MRI doesn't handle this either (the output below has a vcall rather than a var_ref towards the end).
Really, I just want a comment on that null-fragment `if`, since it is not obvious in what scenario that happens.