Commit 8cdec80

Add parser translation
1 parent d69b9da commit 8cdec80

14 files changed: +2564 -11 lines changed

Gemfile

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ group :memcheck do
   gem "ruby_memcheck", platform: %i[mri mswin mingw x64_mingw]
 end
 gem "rbs", platform: %i[mri mswin mingw x64_mingw]
+gem "parser"

Gemfile.lock

Lines changed: 12 additions & 6 deletions
@@ -7,19 +7,24 @@ GEM
   remote: https://rubygems.org/
   specs:
     abbrev (0.1.2)
-    ffi (1.15.5)
+    ast (2.4.2)
+    ffi (1.16.3)
     mini_portile2 (2.8.5)
-    nokogiri (1.15.4)
+    nokogiri (1.16.0)
       mini_portile2 (~> 2.8.2)
       racc (~> 1.4)
+    parser (3.3.0.5)
+      ast (~> 2.4.1)
+      racc
     power_assert (2.0.3)
     racc (1.7.3)
-    rake (13.0.6)
-    rake-compiler (1.2.5)
+    racc (1.7.3-java)
+    rake (13.1.0)
+    rake-compiler (1.2.6)
       rake
-    rbs (3.4.2)
+    rbs (3.4.3)
       abbrev
-    ruby_memcheck (2.2.1)
+    ruby_memcheck (2.3.0)
       nokogiri
     test-unit (3.6.1)
       power_assert
@@ -31,6 +36,7 @@ PLATFORMS
 
 DEPENDENCIES
   ffi
+  parser
   prism!
   rake
   rake-compiler

README.md

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ See the [CONTRIBUTING.md](CONTRIBUTING.md) file for more information. We additio
 * [JavaScript](docs/javascript.md)
 * [Local variable depth](docs/local_variable_depth.md)
 * [Mapping](docs/mapping.md)
+* [Parser translation](docs/parser_translation.md)
 * [Parsing rules](docs/parsing_rules.md)
 * [Releasing](docs/releasing.md)
 * [Ripper](docs/ripper.md)

docs/parser_translation.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# parser translation
+
+Prism ships with the ability to translate its syntax tree into the syntax tree used by the [whitequark/parser](https://github.com/whitequark/parser) gem. This allows you to use tools built on top of the `parser` gem with the `prism` parser.
+
+## Usage
+
+The `parser` gem provides multiple parsers to support different versions of the Ruby grammar. This includes all of the Ruby versions going back to 1.8, as well as third-party parsers like MacRuby and RubyMotion. The `prism` gem provides another parser that uses the `prism` parser to build the syntax tree.
+
+You can use the `prism` parser like you would any other. After requiring the parser, you should be able to call any of the regular `Parser::Base` APIs that you would normally use.
+
+```ruby
+require "prism/translation/parser"
+
+Prism::Translation::Parser.parse_file("path/to/file.rb")
+```
+
+### RuboCop
+
+To run RuboCop using the `prism` gem as the parser, you will need to require the `prism/translation/parser/rubocop` file. This file injects `prism` into the known options for both `rubocop` and `rubocop-ast`, such that you can specify it in your `.rubocop.yml` file. Unfortunately `rubocop` doesn't support any direct way to do this, so we have to get a bit hacky.
+
+First, set the `TargetRubyVersion` in your RuboCop configuration file to `80_82_73_83_77.33`. This is the version of Ruby that `prism` reports itself as. (The leading numbers are the ASCII values for `PRISM`.)
+
+```yaml
+AllCops:
+  TargetRubyVersion: 80_82_73_83_77.33
+```
+
+Now when you run `rubocop` you will need to require the `prism/translation/parser/rubocop` file before executing so that it can inject the `prism` parser into the known options.
+
+```
+bundle exec ruby -rprism/translation/parser/rubocop $(bundle exec which rubocop)
+```
+
+This should run RuboCop using the `prism` parser.
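
The `TargetRubyVersion` value in this doc is described as the ASCII values for `PRISM` followed by `.33`. As a quick check of that description, here is a minimal sketch that rebuilds the string from the letters. The variable names are illustrative and not part of the commit. The `.33` suffix appears to correspond to the `33` returned by `Prism::Translation::Parser#version` in `lib/prism/translation/parser.rb`, though the commit does not state that link explicitly.

```ruby
# Rebuild the digits before the decimal point from the ASCII codes of "PRISM".
# `prefix` and `version_literal` are hypothetical names used for illustration.
prefix = "PRISM".bytes.join("_")

# Append the ".33" suffix used in the documented TargetRubyVersion value.
version_literal = "#{prefix}.33"

puts version_literal
```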

lib/prism.rb

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ module Prism
   autoload :Pack, "prism/pack"
   autoload :Pattern, "prism/pattern"
   autoload :Serialize, "prism/serialize"
+  autoload :Translation, "prism/translation"
   autoload :Visitor, "prism/visitor"
 
   # Some of these constants are not meant to be exposed, so marking them as

lib/prism/node_ext.rb

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ def value
   class RationalNode < Node
     # Returns the value of the node as a Ruby Rational.
     def value
-      Rational(numeric.is_a?(IntegerNode) && !numeric.decimal? ? numeric.value : slice.chomp("r"))
+      Rational(numeric.is_a?(IntegerNode) ? numeric.value : slice.chomp("r"))
     end
   end

lib/prism/translation.rb

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# frozen_string_literal: true
+
+module Prism
+  # This module is responsible for converting the prism syntax tree into other
+  # syntax trees. At the moment it only supports converting to the
+  # whitequark/parser gem's syntax tree, but support is planned for the
+  # seattlerb/ruby_parser gem's syntax tree as well.
+  module Translation
+    autoload :Parser, "prism/translation/parser"
+  end
+end

lib/prism/translation/parser.rb

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
+# frozen_string_literal: true
+
+require "parser"
+
+module Prism
+  module Translation
+    # This class is the entry-point for converting a prism syntax tree into the
+    # whitequark/parser gem's syntax tree. It inherits from the base parser for
+    # the parser gem, and overrides the parse* methods to parse with prism and
+    # then translate.
+    class Parser < ::Parser::Base
+      Racc_debug_parser = false # :nodoc:
+
+      def version # :nodoc:
+        33
+      end
+
+      # The default encoding for Ruby files is UTF-8.
+      def default_encoding
+        Encoding::UTF_8
+      end
+
+      def yyerror # :nodoc:
+      end
+
+      # Parses a source buffer and returns the AST.
+      def parse(source_buffer)
+        @source_buffer = source_buffer
+        source = source_buffer.source
+
+        build_ast(
+          Prism.parse(source, filepath: source_buffer.name).value,
+          build_offset_cache(source)
+        )
+      ensure
+        @source_buffer = nil
+      end
+
+      # Parses a source buffer and returns the AST and the source code comments.
+      def parse_with_comments(source_buffer)
+        @source_buffer = source_buffer
+        source = source_buffer.source
+
+        offset_cache = build_offset_cache(source)
+        result = Prism.parse(source, filepath: source_buffer.name)
+
+        [
+          build_ast(result.value, offset_cache),
+          build_comments(result.comments, offset_cache)
+        ]
+      ensure
+        @source_buffer = nil
+      end
+
+      # Parses a source buffer and returns the AST, the source code comments,
+      # and the tokens emitted by the lexer.
+      def tokenize(source_buffer, _recover = false)
+        @source_buffer = source_buffer
+        source = source_buffer.source
+
+        offset_cache = build_offset_cache(source)
+        result = Prism.parse_lex(source, filepath: source_buffer.name)
+        program, tokens = result.value
+
+        [
+          build_ast(program, offset_cache),
+          build_comments(result.comments, offset_cache),
+          build_tokens(tokens, offset_cache)
+        ]
+      ensure
+        @source_buffer = nil
+      end
+
+      # Since prism resolves num params for us, we don't need to support this
+      # kind of logic here.
+      def try_declare_numparam(node)
+        node.children[0].match?(/\A_[1-9]\z/)
+      end
+
+      private
+
+      # Prism deals with offsets in bytes, while the parser gem deals with
+      # offsets in characters. We need to handle this conversion in order to
+      # build the parser gem AST.
+      #
+      # If the bytesize of the source is the same as the length, then we can
+      # just use the offset directly. Otherwise, we build a hash that functions
+      # as a cache for the conversion.
+      #
+      # This is a good opportunity for some optimizations. If the source file
+      # has any multi-byte characters, this can tank the performance of the
+      # translator. We could make this significantly faster by using a
+      # different data structure for the cache.
+      def build_offset_cache(source)
+        if source.bytesize == source.length
+          -> (offset) { offset }
+        else
+          Hash.new do |hash, offset|
+            hash[offset] = source.byteslice(0, offset).length
+          end
+        end
+      end
+
+      # Build the parser gem AST from the prism AST.
+      def build_ast(program, offset_cache)
+        program.accept(Compiler.new(self, offset_cache))
+      end
+
+      # Build the parser gem comments from the prism comments.
+      def build_comments(comments, offset_cache)
+        comments.map do |comment|
+          location = comment.location
+
+          ::Parser::Source::Comment.new(
+            ::Parser::Source::Range.new(
+              source_buffer,
+              offset_cache[location.start_offset],
+              offset_cache[location.end_offset]
+            )
+          )
+        end
+      end
+
+      # Build the parser gem tokens from the prism tokens.
+      def build_tokens(tokens, offset_cache)
+        Lexer.new(source_buffer, tokens.map(&:first), offset_cache).to_a
+      end
+
+      require_relative "parser/compiler"
+      require_relative "parser/lexer"
+
+      private_constant :Compiler
+      private_constant :Lexer
+    end
+  end
+end
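
The `build_offset_cache` comment in this file describes converting prism's byte offsets into the character offsets the `parser` gem expects. To see the two branches in isolation, here is a minimal standalone sketch. It copies the method body into a free-standing method (in the commit it is a private instance method), and the sample strings are made up for illustration. Both return values respond to `[]`, which is why callers can write `offset_cache[offset]` in either case.

```ruby
# Standalone copy of the conversion logic, lifted out of the class for
# illustration only. In the commit this is a private method on
# Prism::Translation::Parser.
def build_offset_cache(source)
  if source.bytesize == source.length
    # Every character is one byte, so byte offsets equal character offsets.
    ->(offset) { offset }
  else
    # Count the characters in the first `offset` bytes, memoizing each answer.
    Hash.new do |hash, offset|
      hash[offset] = source.byteslice(0, offset).length
    end
  end
end

# Hypothetical inputs: one ASCII-only source, one with a two-byte character.
ascii_cache = build_offset_cache("a = 1")
multibyte_cache = build_offset_cache("\u00e9 = 1")

puts ascii_cache[3]     # byte offset 3 is also character offset 3
puts multibyte_cache[2] # the first 2 bytes hold 1 character
puts multibyte_cache[6] # all 6 bytes hold 5 characters
```

Each uncached lookup in the multi-byte branch re-slices the source from the start, which is the cost the in-code comment flags as an optimization opportunity.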
