Commit 900ece7

prepare Unicode normalization for Unicode 16.0.0
1 parent ab22f39 commit 900ece7

2 files changed: 12 additions, 0 deletions

lib/unicode_normalize/normalize.rb

Lines changed: 1 addition & 0 deletions
@@ -114,6 +114,7 @@ def self.nfc_one(string)
         last_class = accent_class
       end
     end
+    accents = nfc_one(accents) if accents.length>1 # TODO: change from recursion to loop
     hangul_comp_one(start+accents)
   end
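
The added line recomposes the leftover accent run on its own before appending it to the starter. A minimal sketch of that idea follows. It is not the library's code: `TOY_COMPOSITION` and `toy_compose` are hypothetical names, plain letters stand in for code points, and combining classes and Hangul handling are deliberately left out.

```ruby
# Hypothetical composition table: the pair "a" + "b" composes to "X".
# Neither "S" + "a" nor "S" + "b" composes with the starter "S".
TOY_COMPOSITION = { "ab" => "X" }.freeze

# Toy stand-in for the shape of nfc_one (no combining classes, no Hangul).
def toy_compose(string)
  start = string[0]
  accents = +""
  string[1..-1].each_char do |accent|
    if (composite = TOY_COMPOSITION[start + accent])
      start = composite          # accent merged into the starter
    else
      accents << accent          # accent left over for later
    end
  end
  # Mirrors the added line: leftover accents may still compose with each other.
  # The recursion terminates because accents is always shorter than string.
  accents = toy_compose(accents) if accents.length > 1
  start + accents
end

puts toy_compose("Sab")
```

In this toy setup, dropping the recursive call would leave `"Sab"` unchanged, because the loop only ever tries to compose each accent with the starter.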

template/unicode_norm_gen.tmpl

Lines changed: 11 additions & 0 deletions
@@ -112,6 +112,17 @@ accent_array = combining_class.keys + composition_table.keys.collect {|key| key.
 
 composition_starters = composition_table.keys.collect {|key| key.first}
 
+# Special treatment for Unicode 16.0.0
+# Add characters that can be decomposed (even indirectly) so that
+# the first character in the decomposition is a an accent to accents.
+# We do this here up to two levels deep.
+# In the future, there may be even deeper levels.
+starter_accents = composition_starters & accent_array
+decomposition_table.each do |k, v|
+  accent_array << k if starter_accents.include? v.first
+  accent_array << k if starter_accents.include? decomposition_table[v.first]&.first
+end
+
 hangul_no_trailing = []
 0xAC00.step(0xD7A3, 28) {|c| hangul_no_trailing << c}
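
To make the two-level lookup concrete, here is a small sketch that runs the same loop over toy data. Everything in it is assumed for illustration: `expand_accents` is a hypothetical wrapper, the symbols stand in for code points, and the tables are invented rather than taken from the Unicode data files.

```ruby
# Hypothetical wrapper around the loop added in the template.
def expand_accents(accent_array, composition_starters, decomposition_table)
  accents = accent_array.dup
  # Characters that are both an accent and the first half of a composable pair.
  starter_accents = composition_starters & accent_array
  decomposition_table.each do |k, v|
    # Level 1: the decomposition starts directly with a starter accent.
    accents << k if starter_accents.include?(v.first)
    # Level 2: the first character decomposes again into one that does.
    accents << k if starter_accents.include?(decomposition_table[v.first]&.first)
  end
  accents
end

# Toy tables (not real Unicode data).
toy_decompositions = {
  c1: [:a1, :a2],    # level 1: starts with starter accent :a1
  c2: [:c1, :a2],    # level 2: starts with :c1, which starts with :a1
  c3: [:base, :a2],  # starts with a non-accent, so it is not added
  c4: [:c2, :a2]     # level 3: beyond what the loop checks
}

result = expand_accents([:a1, :a2], [:base, :a1], toy_decompositions)
p result
```

The `:c4` entry shows the limitation the commit comment mentions: a chain three levels deep is not picked up by this loop.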
