Improve compatibility with MRI's Ripper #4898
After having discovered #4882 and then discovering that (block-)local variable references didn't work, it seemed like the whole process would be quicker if I looked for the solution myself, rather than wait for someone else to solve one problem before I could discover the next one.
And once I had figured out roughly how the code worked, I spent a little time to try to get all the MRI specs to pass. Apart from the tests that rely on the MRI directory layout, the only remaining issue is that JRuby's Ripper doesn't validate regular expressions during parsing. I looked into it briefly, but given that the Ripper parser effectively doesn't have access to a parse tree (as the
In addition to the MRI tests, this adds
Given that this is essentially a collection of a whole bunch of small fixes, I refer to the individual commit messages for additional information, instead of trying to summarize the changes here.
First, this ignores the entire file that relies on the MRI file structure, as it seems unlikely that any of those tests will ever be applicable. Second, it adds the test_sexp.rb file which was absent from the index for some unknown reason. It also introduces a couple of excludes for failing tests in that file. Third, it clears the excludes for a couple of tests that are currently passing, and updates the description for one of the excludes.
The previous implementation called assignable on a bunch of things that were unlikely to be strings, while only supporting string arguments. This changes the logic so that it relies on getting the identifier value from the lexer instead. In addition, this inlines the logic from assignable in all cases where the implementation is known, and splits the cases for identifiers and constants, as they have very little to do with each other. Given that the parser has inlined the definition of MRI's user_varaible and keyword variable, it seems to make sense to also inline the assignable actions. While inlining the assignable implementation, it seamed to make sense to also inline is_id_var (and remove its unused argument).
Instead of ending the heredoc before emitting the final string content part, this instead just emits the final string content part and positions the cursor right before the end marker, such that the next call to the heredoc term should end the heredoc using the early return. The motivation for doing this was that with ripper the final string content of a heredoc wouldn't be dispatched to any token handler, but instead just propagated. The old behavior was clearly wrong for this corner-case, but this approach might have other problems.
As an added bonus, the parsing of magic comment key values no longer stops at the first unrecognized key, and thus writing # -*- warn_past_scope: true; coding: ISO-8859-1 -*- will now apply the encoding even if JRuby doesn't support the magic comment attribute warn_past_scope.
@grddev This is fantastic.
If you have interest in 9.1.x supporting these changes then you might maybe be up for the challenge of opening up another PR for jruby-9.1 branch so our current 9.1.x can get these changes. Note though that a few of these commits will not cherry-pick over (notably RipperParser.y) since ruby 2.4 added new productions 2.3 does not have like arg_rhs (ah also master changed Tokens to use keyword_XXX).
If you don't want to do that work I can try cp'ing as much as is reasonable and then fix the rest.
Another test case I frequently use is Yard although we pass their specs so as you can see ripper coverage is not fantastic.
For regexp checking did you see ParserSupport.regexpFragmentCheck? I have not spent any time looking at this but that method will check syntax but only require a ByteList. For named captures as lvars perhaps that is more involved than that?
I did see that, but the main part that the fragment validation does is ensure that the encoding is valid in each fragment, which isn't really possible to do before seeing the flags for the regexp, which in turn would require the Parser/Lexer to keep track of the regular expression fragments until getting to the options. Once the fragments are recorded, it would also be relatively simple to perform full validation/compilation of the regexp if the expression is static.
I didn't even think about variable captures, and haven't looked into how MRI Ripper deals with that, so that might require some investigation as well.
Its definitely solvable, but I believe it would be complicated enough to warrant its own PR (and I'm also not sure I have the time to look into it right now).