Parse unicode characters above \uFFFF

The regular expression matching identifiers was incomplete for Unicode
characters: its range stopped at \uFFFF. Now 𝖒 can be parsed in an identifier.

Ruby Bug #7524
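
As a rough illustration of the problem described above, the sketch below compares the two character classes from this commit against a single character above \uFFFF. The constant names OLD_IDENT_RE, NEW_IDENT_RE and HIGH are made up for this example, and the new pattern is written as a plain literal (no eval) on the assumption that it runs on Ruby 1.9 or later.

```ruby
# coding: UTF-8

# Character class as it was before the change: the range ends at U+FFFF.
# (OLD_IDENT_RE is a hypothetical name used only in this sketch.)
OLD_IDENT_RE = /[\w\u0080-\uFFFF]/u

# Character class after the change: the range extends to U+FFFFF.
# Written as a literal here, assuming Ruby 1.9+ where \u{} parses.
NEW_IDENT_RE = /[\w\u{0080}-\u{FFFFF}]/u

# A character outside the Basic Multilingual Plane (code point U+1D592).
HIGH = "\u{1D592}"

p HIGH.ord > 0xFFFF        # the code point lies above U+FFFF
p HIGH =~ OLD_IDENT_RE     # index of the match, or nil when nothing matches
p HIGH =~ NEW_IDENT_RE
```

On a Ruby where \w is ASCII-only (the default behavior without the (?u) option), the old pattern should return nil for this character and the new one should match at index 0.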
commit 27370844e3b93d5683ed505f3365cbcd0646ab70 1 parent b4d48f6
@drbrain drbrain authored
Showing with 11 additions and 1 deletion.
  1. +1 −1  lib/rdoc/ruby_lex.rb
  2. +10 −0 test/test_rdoc_ruby_lex.rb
2  lib/rdoc/ruby_lex.rb
@@ -857,7 +857,7 @@ def identify_gvar
   end
   IDENT_RE = if defined? Encoding then
-               /[\w\u0080-\uFFFF]/u
+               eval '/[\w\u{0080}-\u{FFFFF}]/u' # 1.8 can't parse \u{}
              else
                /[\w\x80-\xFF]/
              end
10 test/test_rdoc_ruby_lex.rb
@@ -1,3 +1,5 @@
+# coding: UTF-8
+
require 'rdoc/test_case'
class TestRDocRubyLex < RDoc::TestCase
@@ -133,6 +135,14 @@ def test_class_tokenize_heredoc_percent_N
     assert_equal expected, tokens
   end
+  def test_class_tokenize_identifier_high_unicode
+    tokens = RDoc::RubyLex.tokenize '𝖒', nil
+
+    expected = @TK::TkIDENTIFIER.new(0, 1, 0, '𝖒')
+
+    assert_equal expected, tokens.first
+  end
+
   def test_class_tokenize_percent_1
     tokens = RDoc::RubyLex.tokenize 'v%10==10', nil