Fix a problem that parse exception message can't be generated for invalid encoding XML (#123)

## Why?

If an XML tag contains non-ASCII Unicode characters and an error occurs
for that tag, an incompatible encoding error is raised while the message
is being built. This happens because our parse exception message combines
a UTF-8 part (which includes the target tag information) with an
ASCII-8BIT part (which includes the error context input).
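
The failure mode described above can be reproduced without REXML. The following is a minimal sketch in plain Ruby; `append_error` is a hypothetical helper introduced only for illustration, and the strings merely imitate the two message parts.

```ruby
# Hypothetical helper (not part of REXML): append `tail` to a copy of `head`
# and report which exception class, if any, the append raises.
def append_error(head, tail)
  head.dup << tail
  nil
rescue EncodingError => e
  e.class
end

utf8_part   = "Invalid attribute name: <:\u200B>" # UTF-8, contains U+200B
binary_part = ":\xE2\x80\x8B>".b                  # same bytes, tagged ASCII-8BIT

# Both sides contain non-ASCII bytes but carry different encodings.
p append_error(utf8_part, binary_part)   # Encoding::CompatibilityError

# An ASCII-only receiver is compatible with any ASCII-compatible encoding.
p append_error("Line: 1\n", binary_part) # nil
```

This would also explain why the problem only appears when the tag itself contains non-ASCII characters: as long as the UTF-8 part is pure ASCII, Ruby accepts the append.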

Fix GH-29

Reported by DuKewu. Thanks!!!
naitoh committed May 3, 2024
1 parent 06be5cf commit d78118d
Showing 2 changed files with 14 additions and 0 deletions.
1 change: 1 addition & 0 deletions lib/rexml/parseexception.rb
@@ -29,6 +29,7 @@ def to_s
         err << "\nLine: #{line}\n"
         err << "Position: #{position}\n"
         err << "Last 80 unconsumed characters:\n"
+        err.force_encoding("ASCII-8BIT")
         err << @source.buffer[0..80].force_encoding("ASCII-8BIT").gsub(/\n/, ' ')
       end

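
The effect of the added `force_encoding` call can be sketched in isolation. The method below is a hypothetical stand-in for the tail of `ParseException#to_s`, not the library's code: `summary` stands in for the message accumulated so far and `buffer` for `@source.buffer`.

```ruby
# Hypothetical stand-in for the end of ParseException#to_s.
def build_message(summary, buffer)
  err = summary.dup
  err << "\nLast 80 unconsumed characters:\n"
  # Retag the accumulated message as raw bytes; the bytes are unchanged.
  err.force_encoding("ASCII-8BIT")
  # Both sides are now ASCII-8BIT, so the append no longer raises.
  err << buffer[0..80].force_encoding("ASCII-8BIT").gsub(/\n/, ' ')
end

message = build_message("Invalid attribute name: <:\u200B>", ":\u200B>")
p message.encoding == Encoding::ASCII_8BIT # true
```

`force_encoding` only changes the encoding tag and does not transcode, so the resulting message is a byte string rather than a UTF-8 string.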
13 changes: 13 additions & 0 deletions test/parse/test_element.rb
@@ -47,6 +47,19 @@ def test_empty_namespace_attribute_name
       DETAIL
     end

+      def test_empty_namespace_attribute_name_with_utf8_character
+        exception = assert_raise(REXML::ParseException) do
+          parse("<x :\xE2\x80\x8B>") # U+200B ZERO WIDTH SPACE
+        end
+        assert_equal(<<-DETAIL.chomp.force_encoding("ASCII-8BIT"), exception.to_s)
+Invalid attribute name: <:\xE2\x80\x8B>
+Line: 1
+Position: 8
+Last 80 unconsumed characters:
+:\xE2\x80\x8B>
+        DETAIL
+      end
+
       def test_garbage_less_than_before_root_element_at_line_start
         exception = assert_raise(REXML::ParseException) do
           parse("<\n<x/>")
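
The new test retags its expected heredoc with `force_encoding("ASCII-8BIT")` before comparing. A likely reason is that Ruby's `String#==` returns false for strings with identical bytes when both contain non-ASCII content and their encodings differ. A small sketch, using a hypothetical helper name:

```ruby
# Hypothetical helper: compare after retagging the expectation as bytes,
# mirroring what the test does with force_encoding on the heredoc.
def equal_as_bytes?(expected, actual)
  expected.dup.force_encoding("ASCII-8BIT") == actual
end

expected_utf8 = "Invalid attribute name: <:\u200B>"
actual_binary = expected_utf8.b # same bytes, tagged ASCII-8BIT

p expected_utf8 == actual_binary                # false
p equal_as_bytes?(expected_utf8, actual_binary) # true
```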
