Parse unicode characters above \uFFFF
The regular expression matching identifier characters stopped at U+FFFF, so
Unicode characters outside the Basic Multilingual Plane were rejected. Now 𝖒
can be parsed in an identifier.

Ruby Bug #7524
drbrain committed Feb 24, 2013
1 parent 5544853 commit 78ef23e
Showing 2 changed files with 11 additions and 1 deletion.
2 changes: 1 addition & 1 deletion lib/rdoc/ruby_lex.rb
@@ -857,7 +857,7 @@ def identify_gvar
   end
 
   IDENT_RE = if defined? Encoding then
-    /[\w\u0080-\uFFFF]/u
+    eval '/[\w\u{0080}-\u{FFFFF}]/u' # 1.8 can't parse \u{}
   else
     /[\w\x80-\xFF]/
   end
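A quick way to see what the widened character class changes is to match a few sample characters against both versions. This is a minimal sketch, assuming Ruby 2.4 or later (for `Regexp#match?`); `OLD_IDENT_RE` and `NEW_IDENT_RE` are illustrative names rather than RDoc constants, and U+1F600 is just an arbitrary code point outside the BMP. The sketch uses the `\u{}` literal directly, because the `eval` wrapper in the commit exists only so that Ruby 1.8 never has to parse it.

```ruby
# coding: UTF-8
# Character classes before and after this commit (illustrative names, not RDoc's).
OLD_IDENT_RE = /[\w\u0080-\uFFFF]/u
NEW_IDENT_RE = /[\w\u{0080}-\u{FFFFF}]/u

emoji   = "\u{1F600}"  # arbitrary sample code point outside the BMP
e_acute = "\u00E9"     # inside the BMP, so both versions accept it
plane16 = "\u{100000}" # first code point of plane 16, above U+FFFFF

p OLD_IDENT_RE.match?(emoji)   # => false
p NEW_IDENT_RE.match?(emoji)   # => true
p OLD_IDENT_RE.match?(e_acute) # => true
p NEW_IDENT_RE.match?(e_acute) # => true
p NEW_IDENT_RE.match?(plane16) # => false
```

The new range ends at U+FFFFF rather than U+10FFFF, the last Unicode code point, so characters in plane 16 (Supplementary Private Use Area-B) are still not treated as identifier characters, as the final check shows.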
10 changes: 10 additions & 0 deletions test/test_rdoc_ruby_lex.rb
@@ -1,3 +1,5 @@
+# coding: UTF-8
+
 require 'rdoc/test_case'
 
 class TestRDocRubyLex < RDoc::TestCase
@@ -133,6 +135,14 @@ def test_class_tokenize_heredoc_percent_N
     assert_equal expected, tokens
   end
 
+  def test_class_tokenize_identifier_high_unicode
+    tokens = RDoc::RubyLex.tokenize '𝖒', nil
+
+    expected = @TK::TkIDENTIFIER.new(0, 1, 0, '𝖒')
+
+    assert_equal expected, tokens.first
+  end
+
   def test_class_tokenize_percent_1
     tokens = RDoc::RubyLex.tokenize 'v%10==10', nil
 
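The new test feeds a single non-BMP character to `RDoc::RubyLex.tokenize` and expects one `TkIDENTIFIER` back. The sketch below is a hypothetical, much simplified scanner (`scan_identifiers` and `IDENT_RUN_RE` are made-up names, and this is not how `RDoc::RubyLex` is implemented) showing how a run of characters matching the widened class becomes one identifier token, even though U+1D592 is one character that occupies four bytes in UTF-8.

```ruby
# coding: UTF-8
require 'strscan'

# Hypothetical, simplified identifier scanner; it is NOT RDoc::RubyLex.
# One or more characters from the widened identifier class.
IDENT_RUN_RE = /[\w\u{0080}-\u{FFFFF}]+/u

# Returns [kind, text] pairs. Identifiers are maximal runs of identifier
# characters; whitespace is skipped; anything else becomes one :other
# token per character.
def scan_identifiers(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)

    if (ident = scanner.scan(IDENT_RUN_RE))
      tokens << [:identifier, ident]
    else
      tokens << [:other, scanner.getch] # getch is multibyte-aware
    end
  end
  tokens
end

m = "\u{1D592}" # MATHEMATICAL BOLD FRAKTUR SMALL M
p m.length   # => 1
p m.bytesize # => 4
p scan_identifiers("#{m} = x#{m}")
```

For the input above the scanner yields an identifier for `m`, an `:other` token for `=`, and a second identifier for `x` followed by `m`.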