Update Unicode data to 13.0.0.
Change-Id: Iff57514e09d6e4e141384dd0cf138314eb1435f1
Reviewed-on: https://code-review.googlesource.com/c/re2/+/53330
Reviewed-by: Paul Wankadia <junyer@google.com>
junyer committed Mar 15, 2020
1 parent 2695ecf commit ca93436
Showing 5 changed files with 251 additions and 132 deletions.
4 changes: 4 additions & 0 deletions doc/syntax.html
@@ -264,13 +264,15 @@ <h1>RE2 regular expression syntax reference</h1>
<tr><td colspan=2>Chakma</td></tr>
<tr><td colspan=2>Cham</td></tr>
<tr><td colspan=2>Cherokee</td></tr>
+<tr><td colspan=2>Chorasmian</td></tr>
<tr><td colspan=2>Common</td></tr>
<tr><td colspan=2>Coptic</td></tr>
<tr><td colspan=2>Cuneiform</td></tr>
<tr><td colspan=2>Cypriot</td></tr>
<tr><td colspan=2>Cyrillic</td></tr>
<tr><td colspan=2>Deseret</td></tr>
<tr><td colspan=2>Devanagari</td></tr>
+<tr><td colspan=2>Dives_Akuru</td></tr>
<tr><td colspan=2>Dogra</td></tr>
<tr><td colspan=2>Duployan</td></tr>
<tr><td colspan=2>Egyptian_Hieroglyphs</td></tr>
@@ -302,6 +304,7 @@ <h1>RE2 regular expression syntax reference</h1>
<tr><td colspan=2>Katakana</td></tr>
<tr><td colspan=2>Kayah_Li</td></tr>
<tr><td colspan=2>Kharoshthi</td></tr>
+<tr><td colspan=2>Khitan_Small_Script</td></tr>
<tr><td colspan=2>Khmer</td></tr>
<tr><td colspan=2>Khojki</td></tr>
<tr><td colspan=2>Khudawadi</td></tr>
@@ -391,6 +394,7 @@ <h1>RE2 regular expression syntax reference</h1>
<tr><td colspan=2>Vai</td></tr>
<tr><td colspan=2>Wancho</td></tr>
<tr><td colspan=2>Warang_Citi</td></tr>
+<tr><td colspan=2>Yezidi</td></tr>
<tr><td colspan=2>Yi</td></tr>
<tr><td colspan=2>Zanabazar_Square</td></tr>
<tr><td></td></tr>
4 changes: 4 additions & 0 deletions doc/syntax.txt
@@ -253,13 +253,15 @@ Caucasian_Albanian
Chakma
Cham
Cherokee
+Chorasmian
Common
Coptic
Cuneiform
Cypriot
Cyrillic
Deseret
Devanagari
+Dives_Akuru
Dogra
Duployan
Egyptian_Hieroglyphs
@@ -291,6 +293,7 @@ Kannada
Katakana
Kayah_Li
Kharoshthi
+Khitan_Small_Script
Khmer
Khojki
Khudawadi
@@ -380,6 +383,7 @@ Ugaritic
Vai
Wancho
Warang_Citi
+Yezidi
Yi
Zanabazar_Square

2 changes: 1 addition & 1 deletion re2/unicode.py
@@ -13,7 +13,7 @@
from six.moves import urllib

# Directory or URL where Unicode tables reside.
-_UNICODE_DIR = "https://www.unicode.org/Public/12.1.0/ucd"
+_UNICODE_DIR = "https://www.unicode.org/Public/13.0.0/ucd"

# Largest valid Unicode code value.
_RUNE_MAX = 0x10FFFF
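
The only change in re2/unicode.py is the version segment of the base URL. As a rough sketch of how such a base URL might be joined with a UCD file name, and how one data line could be split into a code point range plus fields, consider the following. The helpers `ucd_url` and `parse_ucd_line` are hypothetical names made up for this illustration and are not taken from unicode.py; the sketch also assumes the usual UCD layout of semicolon-separated fields with `#` comments.

```python
# Constants as they appear in the diff above.
_UNICODE_DIR = "https://www.unicode.org/Public/13.0.0/ucd"
_RUNE_MAX = 0x10FFFF


def ucd_url(filename, unicode_dir=_UNICODE_DIR):
    """Hypothetical helper: join the base directory/URL with a UCD file name."""
    return unicode_dir.rstrip("/") + "/" + filename


def parse_ucd_line(line):
    """Hypothetical helper: parse 'LO..HI ; field ; ... # comment'.

    Returns (lo, hi, fields), or None for blank and comment-only lines.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    lo_text, _, hi_text = fields[0].partition("..")
    lo = int(lo_text, 16)
    hi = int(hi_text, 16) if hi_text else lo  # a single code point has no '..'
    if not (0 <= lo <= hi <= _RUNE_MAX):
        raise ValueError("code point range out of bounds: %r" % fields[0])
    return lo, hi, fields[1:]


if __name__ == "__main__":
    print(ucd_url("Scripts.txt"))
    print(parse_ucd_line("0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z"))
```

Keeping the version in a single constant means a Unicode upgrade like this one only touches one line of the generator; the regenerated tables and documentation follow from rerunning it.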
12 changes: 8 additions & 4 deletions re2/unicode_casefold.cc
@@ -7,7 +7,7 @@
namespace re2 {


-// 1381 groups, 2792 pairs, 356 ranges
+// 1384 groups, 2798 pairs, 358 ranges
const CaseFold unicode_casefold[] = {
{ 65, 90, 32 },
{ 97, 106, -32 },
@@ -349,6 +349,8 @@ const CaseFold unicode_casefold[] = {
{ 42948, 42948, -48 },
{ 42949, 42949, -42307 },
{ 42950, 42950, -35384 },
+{ 42951, 42954, OddEven },
+{ 42997, 42998, OddEven },
{ 43859, 43859, -928 },
{ 43888, 43967, -38864 },
{ 65313, 65338, 32 },
@@ -366,9 +368,9 @@ const CaseFold unicode_casefold[] = {
{ 125184, 125217, 34 },
{ 125218, 125251, -34 },
};
-const int num_unicode_casefold = 356;
+const int num_unicode_casefold = 358;

-// 1381 groups, 1411 pairs, 198 ranges
+// 1384 groups, 1414 pairs, 200 ranges
const CaseFold unicode_tolower[] = {
{ 65, 90, 32 },
{ 181, 181, 775 },
@@ -560,6 +562,8 @@ const CaseFold unicode_tolower[] = {
{ 42948, 42948, -48 },
{ 42949, 42949, -42307 },
{ 42950, 42950, -35384 },
+{ 42951, 42953, OddEvenSkip },
+{ 42997, 42997, OddEven },
{ 43888, 43967, -38864 },
{ 65313, 65338, 32 },
{ 66560, 66599, 40 },
@@ -569,7 +573,7 @@ const CaseFold unicode_tolower[] = {
{ 93760, 93791, 32 },
{ 125184, 125217, 34 },
};
-const int num_unicode_tolower = 198;
+const int num_unicode_tolower = 200;
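
For readers unfamiliar with the `{ lo, hi, delta }` entries, the following is a minimal sketch of how the new rows could be interpreted. It assumes the usual reading of the special deltas: `OddEven` pairs each odd code point with the even one after it, and `OddEvenSkip` does the same but only for every other code point counted from `lo`. The sentinel values, the `CaseFold` tuple, and the `lookup`/`apply_fold` helpers are stand-ins written for this illustration, not RE2's actual definitions, and the linear search is a simplification.

```python
from typing import NamedTuple, Optional, Sequence, Union

# Stand-in sentinels for the special deltas (placeholders for this sketch only).
EVEN_ODD = "EvenOdd"
ODD_EVEN = "OddEven"
EVEN_ODD_SKIP = "EvenOddSkip"
ODD_EVEN_SKIP = "OddEvenSkip"


class CaseFold(NamedTuple):
    lo: int
    hi: int
    delta: Union[int, str]


def lookup(table: Sequence[CaseFold], r: int) -> Optional[CaseFold]:
    """Return the entry whose [lo, hi] range contains r, if any (linear scan)."""
    for f in table:
        if f.lo <= r <= f.hi:
            return f
    return None


def apply_fold(f: CaseFold, r: int) -> int:
    """Map r through one entry under the assumed delta semantics."""
    d = f.delta
    # The *Skip variants leave every other code point (counted from lo) alone.
    if d in (EVEN_ODD_SKIP, ODD_EVEN_SKIP) and (r - f.lo) % 2:
        return r
    if d in (EVEN_ODD, EVEN_ODD_SKIP):
        return r + 1 if r % 2 == 0 else r - 1
    if d in (ODD_EVEN, ODD_EVEN_SKIP):
        return r + 1 if r % 2 == 1 else r - 1
    return r + d  # plain numeric offset


# The rows added by this commit.
new_casefold = [CaseFold(42951, 42954, ODD_EVEN), CaseFold(42997, 42998, ODD_EVEN)]
new_tolower = [CaseFold(42951, 42953, ODD_EVEN_SKIP), CaseFold(42997, 42997, ODD_EVEN)]

if __name__ == "__main__":
    for r in (42951, 42952, 42953, 42954, 42997, 42998):
        f = lookup(new_casefold, r)
        print(hex(r), "->", hex(apply_fold(f, r)))
```

Under these assumptions the casefold rows map in both directions (42951 and 42952 fold to each other), while the tolower rows only move the odd, uppercase code points, which is why the tolower range stops at 42953 and uses the `Skip` variant.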


