Commit e50754f

vinistock authored and kddnewton committed
[ruby/prism] Avoid breaking code units offset on binary encoding
ruby/prism@25a4cf6794

Co-authored-by: Kevin Newton <kddnewton@users.noreply.github.com>
1 parent 615a087 commit e50754f

2 files changed: 20 additions, 1 deletion

lib/prism/parse_result.rb

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ def character_column(byte_offset)
     # concept of code units that differs from the number of characters in other
     # encodings, it is not captured here.
     def code_units_offset(byte_offset, encoding)
-      byteslice = (source.byteslice(0, byte_offset) or raise).encode(encoding)
+      byteslice = (source.byteslice(0, byte_offset) or raise).encode(encoding, invalid: :replace, undef: :replace)
 
       if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
         byteslice.bytesize / 2
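
To illustrate why the added `invalid: :replace, undef: :replace` options matter, here is a minimal standalone sketch that does not use Prism itself. It assumes that a source with a `# -*- encoding: binary -*-` magic comment behaves like a string tagged ASCII-8BIT, which `String#b` approximates here.

```ruby
# The four UTF-8 bytes of U+1F600, re-tagged as binary (ASCII-8BIT).
# Assumption: this approximates a slice of a source declared as binary.
bytes = "\u{1F600}".b

# Without the options, bytes >= 0x80 have no defined conversion from
# ASCII-8BIT, so encode raises.
raised =
  begin
    bytes.encode(Encoding::UTF_16LE)
    false
  rescue Encoding::UndefinedConversionError
    true
  end

# With the options, each unconvertible byte is replaced instead of raising.
replaced = bytes.encode(Encoding::UTF_16LE, invalid: :replace, undef: :replace)

puts raised                   # true
puts replaced.bytesize / 2    # 4 (one UTF-16 code unit per original byte)
```

Each of the four bytes becomes its own replacement character, so the conversion no longer fails, but the resulting count reflects bytes rather than the original character.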

test/prism/ruby/location_test.rb

Lines changed: 19 additions & 0 deletions
@@ -140,6 +140,25 @@ def test_code_units
       assert_equal 7, location.end_code_units_column(Encoding::UTF_32LE)
     end
 
+    def test_code_units_handles_binary_encoding_with_multibyte_characters
+      # If the encoding is set to binary and the source contains multibyte
+      # characters, we avoid breaking the code unit offsets, but they will
+      # still be incorrect.
+
+      program = Prism.parse(<<~RUBY).value
+        # -*- encoding: binary -*-
+
+        😀 + 😀
+      RUBY
+
+      # first 😀
+      location = program.statements.body.first.receiver.location
+
+      assert_equal 4, location.end_code_units_column(Encoding::UTF_8)
+      assert_equal 4, location.end_code_units_column(Encoding::UTF_16LE)
+      assert_equal 4, location.end_code_units_column(Encoding::UTF_32LE)
+    end
+
     def test_chop
       location = Prism.parse("foo").value.location
 
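
The test comment notes that the offsets "will still be incorrect". The sketch below shows what that means by comparing counts for the same emoji tagged as UTF-8 and as binary. `code_units` is a hypothetical helper, not part of Prism; it follows the counting rule visible in the `parse_result.rb` hunk, and the `else` branch (character length for non-UTF-16 encodings) is an assumption, since that part of the method is not shown in the diff.

```ruby
# Hypothetical helper, not Prism's API: counts code units of `string`
# in `encoding`, replacing bytes that cannot be converted.
def code_units(string, encoding)
  encoded = string.encode(encoding, invalid: :replace, undef: :replace)

  if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
    encoded.bytesize / 2
  else
    # Assumption: other encodings count one unit per character.
    encoded.length
  end
end

emoji  = "\u{1F600}" # tagged UTF-8
binary = emoji.b     # same bytes, tagged ASCII-8BIT

encodings = [Encoding::UTF_8, Encoding::UTF_16LE, Encoding::UTF_32LE]

utf8_counts   = encodings.map { |enc| code_units(emoji, enc) }
binary_counts = encodings.map { |enc| code_units(binary, enc) }

p utf8_counts   # [1, 2, 1]
p binary_counts # [4, 4, 4]
```

The binary counts match the value of 4 asserted in the test for all three encodings, whereas a correctly tagged source would report 1, 2, and 1.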
