Fix kDO_LAMBDA token incompatibility for Prism::Translation::Parser::Lexer

koic · koic · commit 2ee480654c8b · 2024-09-17T17:02:16.000+09:00
## Summary

This PR fixes a `kDO_LAMBDA` token incompatibility between the Parser gem and `Prism::Translation::Parser` for lambda `do` blocks.

### Parser gem (Expected)

Returns `kDO_LAMBDA` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```

### `Prism::Translation::Parser` (Actual)

Previously, the parser returned a `kDO` token for the same input:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```

After this change, the parser returns a `kDO_LAMBDA` token for the same input:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.4.0dev (2024-09-01T11:00:13Z master eb144ef91e) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 3...5>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 6...9>]]]
```

## Additional Information

Unfortunately, the following edge case, a lambda whose parameter list contains another lambda, still doesn't work as expected: `kDO` is returned instead of `kDO_LAMBDA`.
However, since `kDO` was already being returned for this input before this PR, there is no change in behavior.

### Parser gem

Returns `kDO_LAMBDA` token:

```console
$ bundle exec ruby -Ilib -rparser/ruby33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Parser::Ruby33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO_LAMBDA, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```

### `Prism::Translation::Parser`

Returns `kDO` token:

```console
$ bundle exec ruby -Ilib -rprism -rprism/translation/parser33 -ve \
'buf = Parser::Source::Buffer.new("example.rb"); buf.source = "-> (foo = -> (bar) {}) do end"; p Prism::Translation::Parser33.new.tokenize(buf)[2]'
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-darwin23]
[[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 0...2>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 3...4>]],
[:tIDENTIFIER, ["foo", #<Parser::Source::Range example.rb 4...7>]], [:tEQL, ["=", #<Parser::Source::Range example.rb 8...9>]],
[:tLAMBDA, ["->", #<Parser::Source::Range example.rb 10...12>]], [:tLPAREN2, ["(", #<Parser::Source::Range example.rb 13...14>]],
[:tIDENTIFIER, ["bar", #<Parser::Source::Range example.rb 14...17>]], [:tRPAREN, [")", #<Parser::Source::Range example.rb 17...18>]],
[:tLAMBEG, ["{", #<Parser::Source::Range example.rb 19...20>]], [:tRCURLY, ["}", #<Parser::Source::Range example.rb 20...21>]],
[:tRPAREN, [")", #<Parser::Source::Range example.rb 21...22>]], [:kDO, ["do", #<Parser::Source::Range example.rb 23...25>]],
[:kEND, ["end", #<Parser::Source::Range example.rb 26...29>]]]
```

Since this PR does not aim to address such special cases, a comment has been left in the lexer noting that this case still returns `kDO`.
In other words, after this PR `kDO_LAMBDA` is returned for lambda `do` blocks except in edge cases like the one above.
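
The edge case follows from how the lexer change in the diff below picks the token type: it looks back for the nearest lambda-related token and relabels `do` only when that token is `tLAMBDA`. The following is a minimal standalone sketch of that lookup, not the actual lexer code. The `relabel_do` helper and the plain arrays of token types are assumptions made for illustration, and the third input is only a guess at why `kDO_LAMBDA` itself is part of the list.

```ruby
# Token types that take part in the lookup (mirrors LAMBDA_TOKEN_TYPES in the diff).
LAMBDA_TOKEN_TYPES = [:kDO_LAMBDA, :tLAMBDA, :tLAMBEG].freeze

# Hypothetical helper: given the token types emitted so far, decide how a
# `do` keyword that arrives as :kDO should be labeled.
def relabel_do(previous_types)
  nearest = previous_types.reverse.find { |t| LAMBDA_TOKEN_TYPES.include?(t) }
  nearest == :tLAMBDA ? :kDO_LAMBDA : :kDO
end

# `-> do end`: the nearest lambda-related token is :tLAMBDA.
simple = [:tLAMBDA]

# `-> (foo = -> (bar) {}) do end`: the inner lambda's `{` (:tLAMBEG) is
# nearer to `do` than the outer :tLAMBDA, so the lookup stops there.
nested = [
  :tLAMBDA, :tLPAREN2, :tIDENTIFIER, :tEQL, :tLAMBDA, :tLPAREN2,
  :tIDENTIFIER, :tRPAREN, :tLAMBEG, :tRCURLY, :tRPAREN
]

# Assumed input `-> do end; foo do end`: the earlier :kDO_LAMBDA is found
# first, so the second `do` is not attributed to the lambda.
after_lambda = [:tLAMBDA, :kDO_LAMBDA, :kEND, :tSEMI, :tIDENTIFIER]

p relabel_do(simple)       # => :kDO_LAMBDA
p relabel_do(nested)       # => :kDO
p relabel_do(after_lambda) # => :kDO
```

The `nested` result matches the `Prism::Translation::Parser` output shown above: the lookup is purely positional and does not track which lambda a body-opening token belongs to.
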
diff --git a/lib/prism/translation/parser/lexer.rb b/lib/prism/translation/parser/lexer.rb
@@ -187,14 +187,20 @@ class Lexer
         EXPR_BEG = 0x1 # :nodoc:
         EXPR_LABEL = 0x400 # :nodoc:
 
+        # It is used to determine whether `do` is of the token type `kDO` or `kDO_LAMBDA`.
+        #
+        # NOTE: In edge cases like `-> (foo = -> (bar) {}) do end`, please note that `kDO` is still returned
+        # instead of `kDO_LAMBDA`, which is expected: https://github.com/ruby/prism/pull/3046
+        LAMBDA_TOKEN_TYPES = [:kDO_LAMBDA, :tLAMBDA, :tLAMBEG]
+
         # The `PARENTHESIS_LEFT` token in Prism is classified as either `tLPAREN` or `tLPAREN2` in the Parser gem.
         # The following token types are listed as those classified as `tLPAREN`.
         LPAREN_CONVERSION_TOKEN_TYPES = [
           :kBREAK, :kCASE, :tDIVIDE, :kFOR, :kIF, :kNEXT, :kRETURN, :kUNTIL, :kWHILE, :tAMPER, :tANDOP, :tBANG, :tCOMMA, :tDOT2, :tDOT3,
           :tEQL, :tLPAREN, :tLPAREN2, :tLSHFT, :tNL, :tOP_ASGN, :tOROP, :tPIPE, :tSEMI, :tSTRING_DBEG, :tUMINUS, :tUPLUS
         ]
 
-        private_constant :TYPES, :EXPR_BEG, :EXPR_LABEL, :LPAREN_CONVERSION_TOKEN_TYPES
+        private_constant :TYPES, :EXPR_BEG, :EXPR_LABEL, :LAMBDA_TOKEN_TYPES, :LPAREN_CONVERSION_TOKEN_TYPES
 
         # The Parser::Source::Buffer that the tokens were lexed from.
         attr_reader :source_buffer
@@ -236,6 +242,13 @@ def to_a
             location = Range.new(source_buffer, offset_cache[token.location.start_offset], offset_cache[token.location.end_offset])
 
             case type
+            when :kDO
+              types = tokens.map(&:first)
+              nearest_lambda_token_type = types.reverse.find { |type| LAMBDA_TOKEN_TYPES.include?(type) }
+
+              if nearest_lambda_token_type == :tLAMBDA
+                type = :kDO_LAMBDA
+              end
             when :tCHARACTER
               value.delete_prefix!("?")
             when :tCOMMENT
diff --git a/test/prism/ruby/parser_test.rb b/test/prism/ruby/parser_test.rb
@@ -268,7 +268,7 @@ def assert_equal_tokens(expected_tokens, actual_tokens)
           # There are a lot of tokens that have very specific meaning according
           # to the context of the parser. We don't expose that information in
           # prism, so we need to normalize these tokens a bit.
-          if actual_token[0] == :kDO && %i[kDO_BLOCK kDO_LAMBDA].include?(expected_token[0])
+          if expected_token[0] == :kDO_BLOCK && actual_token[0] == :kDO
             actual_token[0] = expected_token[0]
           end
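
The test change above narrows the token normalization in `assert_equal_tokens`: a `kDO` from the translation layer is now only rewritten when the Parser gem expects `kDO_BLOCK`. A minimal sketch of that condition in isolation, assuming a hypothetical `normalize_do` helper that works on bare token types rather than full token arrays:

```ruby
# Hypothetical helper mirroring the updated condition in assert_equal_tokens.
# Returns the token type that would be compared against the expected type.
def normalize_do(expected_type, actual_type)
  if expected_type == :kDO_BLOCK && actual_type == :kDO
    expected_type
  else
    actual_type
  end
end

p normalize_do(:kDO_BLOCK, :kDO)  # => :kDO_BLOCK (still normalized)
p normalize_do(:kDO_LAMBDA, :kDO) # => :kDO (no longer masked by the test)
```

With the old condition, both calls would have returned the expected type, so a `kDO`/`kDO_LAMBDA` mismatch could not be detected by this assertion.
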