From b342f94579201966fb196f208e609baea3c37eb3 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 18 Nov 2024 09:11:21 +0100 Subject: [PATCH 01/27] Improve README Emoji mode table --- CHANGELOG.md | 1 + README.md | 35 +++++++++++----------- lib/unicode/display_width/emoji_support.rb | 2 +- 3 files changed, 19 insertions(+), 19 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6f19408..6bb6433 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,6 +16,7 @@ (Emoji themselves might display differently) - Set default Emoji mode for unknown/unsupported terminals to `:none` (instead of `:basic`) +- Rename `:basic` mode to `:vs16` ## 3.0.1 diff --git a/README.md b/README.md index 0d0ad51..2dc2a5f 100644 --- a/README.md +++ b/README.md @@ -100,35 +100,34 @@ Emoji Type | Width / Comment ------------|---------------- Basic/Single Emoji character without Variation Selector | No special handling Basic/Single Emoji character with VS15 (Text) | No special handling -Basic/Single Emoji character with VS16 (Emoji) | 2 (except with `emoji: :none` or `emoji: :all_no_vs16`) -Emoji Sequence | 2 if Emoji belongs to configured Emoji set +Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below) +Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below) + +#### Emoji Modes The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). 
The following Emoji settings can be used: -Option | Description | Example Terminals --------|-------------|------------------ -`emoji: true` or `emoji: :auto` | Automatically use recommended Emoji setting for your terminal | - -`emoji: false` or `emoji: :none` | No Emoji adjustments, Emoji characters with VS16 not handled | Gnome Terminal, many older terminals -`emoji: :basic` | Full-width VS16-Emoji, but no width adjustments for Emoji sequences: All partial Emoji treated separately with a width of 2 | ? -`emoji: :rgi` | Full-width VS16-Emoji, all RGI Emoji sequences are considered to have a width of 2 | Apple Terminal -`emoji: :possible`| Full-width VS16-Emoji, all possible/well-formed Emoji sequences are considered to have a width of 2 | ? -`emoji: :all` | Full-width VS16-Emoji, all ZWJ/modifier/keycap sequences have a width of 2, even if they are not well-formed Emoji sequences | foot, Contour -`emoji: :all_no_vs16` | VS16-Emoji not handled, all ZWJ/modifier/keycap sequences to have a width of 2, even if they are not well-formed Emoji sequences | WezTerm +`emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals +----------------|------------------|---------------------------------|------------------ +`true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | - +`:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot, Contour +`:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm +`:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ? +`:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ? 
+`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have width 1 | Apple Terminal +`:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty +`false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals - *RGI Emoji:* Emoji Recommended for General Interchange - *ZWJ:* Zero-width Joiner: Codepoint `U+200D`,used in many Emoji sequences #### Emoji Support in Terminals -Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi` on "Apple_Terminal" or `:none` on Gnome's terminal widget). - -Note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project], which is a great resource that compares various terminal's Unicode/Emoji capabilities. - ---- +Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `:none` on Gnome's terminal widget). 
-To terminal implementors reading this: Although handling Emoji/ZWJ sequences as always having a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (see table above) and just give those unknown Emoji the space they need? It is painful to implement, I know, but it kind of underlines the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought… +Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. ---- +**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. 
Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought… ### Usage with String Extension diff --git a/lib/unicode/display_width/emoji_support.rb b/lib/unicode/display_width/emoji_support.rb index 704204e..643c14b 100644 --- a/lib/unicode/display_width/emoji_support.rb +++ b/lib/unicode/display_width/emoji_support.rb @@ -14,7 +14,7 @@ module EmojiSupport # maybe CSI queries can help? def self.recommended if ENV["CI"] - return :rqi_uqe + return :rqi end case ENV["TERM_PROGRAM"] From 5088d983f430f7220c7bd4d04d0052e1d996dc27 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 18 Nov 2024 10:55:24 +0100 Subject: [PATCH 02/27] Rename :basic mode to :vs16 --- lib/unicode/display_width.rb | 2 +- lib/unicode/display_width/emoji_support.rb | 4 ++-- spec/display_width_spec.rb | 26 ++++++++++++++++++---- 3 files changed, 25 insertions(+), 7 deletions(-) diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index a9ead54..e4c4f45 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -171,7 +171,7 @@ def self.emoji_width(string, mode = :all) emoji_width_via_possible(string, Unicode::Emoji.const_get(emoji_set_regex)) elsif mode == :all_no_vs16 emoji_width_all(string) - elsif mode == :basic + elsif mode == :vs16 emoji_width_basic(string) elsif mode == :all res_all, string = emoji_width_all(string) diff --git a/lib/unicode/display_width/emoji_support.rb b/lib/unicode/display_width/emoji_support.rb index 643c14b..4ec7f14 100644 --- a/lib/unicode/display_width/emoji_support.rb +++ b/lib/unicode/display_width/emoji_support.rb @@ -31,11 +31,11 @@ def self.recommended # konsole: all, how to detect? 
return :all when /kitty/ - return :basic + return :vs16 end if ENV["WT_SESSION"] # Windows Terminal - return :basic + return :vs16 end # As of last time checked: gnome-terminal, vscode, alacritty diff --git a/spec/display_width_spec.rb b/spec/display_width_spec.rb index 6c31e60..391edf0 100644 --- a/spec/display_width_spec.rb +++ b/spec/display_width_spec.rb @@ -245,11 +245,15 @@ end end - describe ':basic' do + describe ':vs16' do it 'will ignore shorter width of all Emoji sequences' do # Please note that this is different from emoji: false / emoji: :none # -> Basic Emoji with VS16 still get normalized - expect( "🤾🏽‍♀️".display_width(1, emoji: :basic) ).to eq 6 + expect( "🤾🏽‍♀️".display_width(1, emoji: :vs16) ).to eq 6 + end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :vs16) ).to eq 2 end end @@ -261,6 +265,10 @@ expect( "🤠‍🤢".display_width(1, emoji: :rgi) ).to eq 4 # Non-RGI/well-formed expect( "🚄🏾‍▶️".display_width(1, emoji: :rgi) ).to eq 6 # Invalid/non-Emoji sequence end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :rgi) ).to eq 2 + end end describe ':possible' do @@ -271,6 +279,10 @@ expect( "🤠‍🤢".display_width(1, emoji: :possible) ).to eq 2 # Non-RGI/well-formed expect( "🚄🏾‍▶️".display_width(1, emoji: :possible) ).to eq 6 # Invalid/non-Emoji sequence end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :possible) ).to eq 2 + end end describe ':all' do @@ -280,7 +292,10 @@ expect( "❤‍🩹".display_width(1, emoji: :all) ).to eq 2 # UQE expect( "🤠‍🤢".display_width(1, emoji: :all) ).to eq 2 # Non-RGI/well-formed expect( "🚄🏾‍▶️".display_width(1, emoji: :all) ).to eq 2 # Invalid/non-Emoji sequence - expect( "❣️".display_width(emoji: :all) ).to eq 2 # 
VS16 + end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :all) ).to eq 2 end end @@ -291,7 +306,10 @@ expect( "❤‍🩹".display_width(1, emoji: :all_no_vs16) ).to eq 2 # UQE expect( "🤠‍🤢".display_width(1, emoji: :all_no_vs16) ).to eq 2 # Non-RGI/well-formed expect( "🚄🏾‍▶️".display_width(1, emoji: :all_no_vs16) ).to eq 2 # Invalid/non-Emoji sequence - expect( "❣️".display_width(emoji: :all_no_vs16) ).to eq 1 # No VS16 + end + + it 'uses EAW for default-text presentation Emoji with Emoji Presentation (VS16)' do + expect( "❣️".display_width(emoji: :all_no_vs16) ).to eq 1 end end end From 7ba1c3af6b67f4925d0bfb66ca2e8d8a2d32a2d2 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 18 Nov 2024 13:36:30 +0100 Subject: [PATCH 03/27] Implement :rgi_at mode for Apple_Terminal --- CHANGELOG.md | 15 +++-- README.md | 6 +- lib/unicode/display_width.rb | 46 +++++++++------ lib/unicode/display_width/emoji_support.rb | 4 +- spec/display_width_spec.rb | 65 ++++++++++++++-------- 5 files changed, 83 insertions(+), 53 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6bb6433..0d50997 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,23 +2,22 @@ ## 3.1.0 (unreleased) -**Further Emoji improvements:** +**Improve Emoji support:** - Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any ZWJ/modifier sequence (`:all`). The latter is more common and more efficient to implement. -- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false` -- Unify `rgi_*` options to just `rgi` to keep things simpler (corresponds to +- Unify `rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to the former `:rgi_uqe` option). Most terminals that want to support the RGI set will probably want to catch Emoji sequences with missing VS16s. 
-- Add new `:all_no_vs16` mode -- Only consider terminal cells needed when recommending Emoji support level +- Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals + that needs these quirks +- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false` +- `:auto` mode: Only consider terminal cells when recommending Emoji support level (Emoji themselves might display differently) -- Set default Emoji mode for unknown/unsupported terminals to `:none` - (instead of `:basic`) +- `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none` - Rename `:basic` mode to `:vs16` - ## 3.0.1 - Add WezTerm and foot as good Emoji terminals diff --git a/README.md b/README.md index 2dc2a5f..dd15d14 100644 --- a/README.md +++ b/README.md @@ -110,12 +110,12 @@ The `emoji:` option can be used to configure which type of Emoji should be consi `emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals ----------------|------------------|---------------------------------|------------------ `true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | - -`:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot, Contour +`:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot `:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm `:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ? `:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ? 
`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have width 1 | Apple Terminal -`:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty +`:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty? `false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals - *RGI Emoji:* Emoji Recommended for General Interchange @@ -123,7 +123,7 @@ The `emoji:` option can be used to configure which type of Emoji should be consi #### Emoji Support in Terminals -Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `:none` on Gnome's terminal widget). +Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget). Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. 
Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index e4c4f45..6d3efc5 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -8,6 +8,7 @@ module Unicode class DisplayWidth + DEFAULT_AMBIGUOUS = 1 INITIAL_DEPTH = 0x10000 ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/ ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F" @@ -26,6 +27,7 @@ class DisplayWidth } EMOJI_SEQUENCES_REGEX_MAPPING = { rgi: :REGEX_INCLUDE_MQE_UQE, + rgi_at: :REGEX_INCLUDE_MQE_UQE, possible: :REGEX_WELL_FORMED, } REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP) @@ -40,7 +42,7 @@ def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **option end options[:ambiguous] = ambiguous if ambiguous - options[:ambiguous] ||= 1 + options[:ambiguous] ||= DEFAULT_AMBIGUOUS if options[:ambiguous] != 1 && options[:ambiguous] != 2 raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2" @@ -92,6 +94,7 @@ def self.width_frame(string, options) res, string = emoji_width( string, options[:emoji], + options[:ambiguous], ) end @@ -162,13 +165,18 @@ def self.width_all_features(string, index_full, index_low, first_ambiguous, over end - def self.emoji_width(string, mode = :all) + def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS) res = 0 string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8" if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode] - emoji_width_via_possible(string, Unicode::Emoji.const_get(emoji_set_regex)) + emoji_width_via_possible( + string, + Unicode::Emoji.const_get(emoji_set_regex), + mode == :rgi_at, + ambiguous, + ) elsif mode == :all_no_vs16 emoji_width_all(string) elsif mode == :vs16 @@ -211,7 +219,7 @@ def 
self.emoji_width_all(string) end # Match possible Emoji first, then refine - def self.emoji_width_via_possible(string, emoji_set_regex) + def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, ambiguous = DEFAULT_AMBIGUOUS) res = 0 # For each string possibly an emoji @@ -220,22 +228,28 @@ def self.emoji_width_via_possible(string, emoji_set_regex) if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate) emoji_candidate - # Check if we have a combined Emoji with width 2 + # Check if we have a combined Emoji with width 2 (or EAW an Apple Terminal) elsif emoji_candidate == emoji_candidate[emoji_set_regex] - res += 2 + if strict_eaw + res += self.of(emoji_candidate[0], ambiguous, emoji: false) + else + res += 2 + end "" # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set else - # Ensure all explicit VS16 sequences have width 2 - emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji| - if basic_emoji.size == 2 # VS16 present - res += 2 - "" - else - basic_emoji - end - } + if !strict_eaw + # Ensure all explicit VS16 sequences have width 2 + emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji| + if basic_emoji.size == 2 # VS16 present + res += 2 + "" + else + basic_emoji + end + } + end emoji_candidate end @@ -244,7 +258,7 @@ def self.emoji_width_via_possible(string, emoji_set_regex) [res, no_emoji_string] end - def initialize(ambiguous: 1, overwrite: {}, emoji: true) + def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true) @ambiguous = ambiguous @overwrite = overwrite @emoji = emoji diff --git a/lib/unicode/display_width/emoji_support.rb b/lib/unicode/display_width/emoji_support.rb index 4ec7f14..5106342 100644 --- a/lib/unicode/display_width/emoji_support.rb +++ b/lib/unicode/display_width/emoji_support.rb @@ -20,8 +20,8 @@ def self.recommended case ENV["TERM_PROGRAM"] when "iTerm.app" return :all - when "Apple_Terminal" # Also: If first Emoji part 
is EAW 1, gives whole ZWJ seqs width 1 - return :rgi + return :rgi_at when "WezTerm" return :all_no_vs16 end diff --git a/spec/display_width_spec.rb b/spec/display_width_spec.rb index 391edf0..302da11 100644 --- a/spec/display_width_spec.rb +++ b/spec/display_width_spec.rb @@ -175,7 +175,7 @@ end it 'can be passed as :overwrite option' do - expect( "\t".display_width(1, overwrite: { 0x09 => 12 }) ).to eq 12 + expect( "\t".display_width(overwrite: { 0x09 => 12 }) ).to eq 12 end end @@ -240,7 +240,7 @@ describe '(modes)' do describe 'false / :none' do it 'does no Emoji adjustments when emoji suport is disabled' do - expect( "🤾🏽‍♀️".display_width(1, emoji: false) ).to eq 5 + expect( "🤾🏽‍♀️".display_width(emoji: false) ).to eq 5 expect( "❣️".display_width(emoji: :none) ).to eq 1 end end @@ -249,7 +249,7 @@ it 'will ignore shorter width of all Emoji sequences' do # Please note that this is different from emoji: false / emoji: :none # -> Basic Emoji with VS16 still get normalized - expect( "🤾🏽‍♀️".display_width(1, emoji: :vs16) ).to eq 6 + expect( "🤾🏽‍♀️".display_width(emoji: :vs16) ).to eq 6 end it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do @@ -258,12 +258,12 @@ end describe ':rgi' do - it 'will ignore shorter width of non-RQI sequences' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :rgi) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :rgi) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: :rgi) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :rgi) ).to eq 4 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :rgi) ).to eq 6 # Invalid/non-Emoji sequence + it 'will ignore shorter width of non-RGI sequences' do + expect( "🤾🏽‍♀️".display_width(emoji: :rgi) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :rgi) ).to eq 2 # MQE + expect( 
"❤‍🩹".display_width(emoji: :rgi) ).to eq 2 # UQE + expect( "🤠‍🤢".display_width(emoji: :rgi) ).to eq 4 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :rgi) ).to eq 6 # Invalid/non-Emoji sequence end it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do @@ -271,13 +271,30 @@ end end + describe ':rgi_at' do + it 'will assign width based on EAW of first partial Emoji to whole sequence' do + expect( "🤾🏽‍♀️".display_width(emoji: :rgi_at) ).to eq 2 + expect( "⛹️‍♀️".display_width(emoji: :rgi_at) ).to eq 1 + expect( "❤‍🩹".display_width(emoji: :rgi_at) ).to eq 1 + end + + it 'will count partial emoji for non-RGI sequences' do + expect( "🤠‍🤢".display_width(emoji: :rgi_at) ).to eq 4 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :rgi_at) ).to eq 5 # Invalid/non-Emoji sequence + end + + it 'uses EAW for default-text presentation Emoji with Emoji Presentation (VS16)' do + expect( "❣️".display_width(emoji: :rgi_at) ).to eq 1 + end + end + describe ':possible' do it 'will treat possible/well-formed Emoji sequence as width 2' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :possible) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :possible) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: :possible) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :possible) ).to eq 2 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :possible) ).to eq 6 # Invalid/non-Emoji sequence + expect( "🤾🏽‍♀️".display_width(emoji: :possible) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :possible) ).to eq 2 # MQE + expect( "❤‍🩹".display_width(emoji: :possible) ).to eq 2 # UQE + expect( "🤠‍🤢".display_width(emoji: :possible) ).to eq 2 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :possible) ).to eq 6 # Invalid/non-Emoji sequence end it 
'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do @@ -287,11 +304,11 @@ describe ':all' do it 'will treat any ZWJ/modifier/keycap sequences sequence as width 2' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :all) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :all) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: :all) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :all) ).to eq 2 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :all) ).to eq 2 # Invalid/non-Emoji sequence + expect( "🤾🏽‍♀️".display_width(emoji: :all) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :all) ).to eq 2 # MQE + expect( "❤‍🩹".display_width(emoji: :all) ).to eq 2 # UQE + expect( "🤠‍🤢".display_width(emoji: :all) ).to eq 2 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :all) ).to eq 2 # Invalid/non-Emoji sequence end it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do @@ -301,11 +318,11 @@ describe ':all_no_vs16' do it 'will treat any ZWJ/modifier/keycap sequences sequence as width 2' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :all_no_vs16) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :all_no_vs16) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: :all_no_vs16) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :all_no_vs16) ).to eq 2 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :all_no_vs16) ).to eq 2 # Invalid/non-Emoji sequence + expect( "🤾🏽‍♀️".display_width(emoji: :all_no_vs16) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :all_no_vs16) ).to eq 2 # MQE + expect( "❤‍🩹".display_width(emoji: :all_no_vs16) ).to eq 2 # UQE + expect( "🤠‍🤢".display_width(emoji: :all_no_vs16) ).to eq 2 # Non-RGI/well-formed + expect( 
"🚄🏾‍▶️".display_width(emoji: :all_no_vs16) ).to eq 2 # Invalid/non-Emoji sequence end it 'uses EAW for default-text presentation Emoji with Emoji Presentation (VS16)' do From c2953c6350a3cbe9ed1e3c0d4ff8cc3b67c391e8 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 18 Nov 2024 13:55:13 +0100 Subject: [PATCH 04/27] Don't mix raw and escapd Unicode chars in regex --- lib/unicode/display_width.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index 6d3efc5..d763659 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -31,7 +31,7 @@ class DisplayWidth possible: :REGEX_WELL_FORMED, } REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP) - REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[🏻-🏿\u{FE0F}]?(\u{200D}.[🏻-🏿\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP) + REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP) REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/ # Returns monospace display width of string From 82b17bdb99ddfe1d1030002e2756e62d421bcc99 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 18 Nov 2024 13:58:23 +0100 Subject: [PATCH 05/27] Release v3.1.0 --- CHANGELOG.md | 2 +- lib/unicode/display_width/constants.rb | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 0d50997..c67d671 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,6 @@ # CHANGELOG -## 3.1.0 (unreleased) +## 3.1.0 **Improve Emoji support:** diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index aa04c39..02bf20f 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,7 +2,7 @@ module Unicode class DisplayWidth - VERSION = "3.0.1" + VERSION = "3.1.0" UNICODE_VERSION = "16.0.0" 
DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" From b79047cc4c2507fb85c00f9a9b035f1c9ad98ee8 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Tue, 19 Nov 2024 09:52:02 +0100 Subject: [PATCH 06/27] README: rgi_at uses EAW --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index dd15d14..5c26f85 100644 --- a/README.md +++ b/README.md @@ -114,10 +114,11 @@ The `emoji:` option can be used to configure which type of Emoji should be consi `:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm `:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ? `:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ? -`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have width 1 | Apple Terminal +`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have EAW | Apple Terminal `:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty? 
`false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals +- *EAW:* East Asian Width - *RGI Emoji:* Emoji Recommended for General Interchange - *ZWJ:* Zero-width Joiner: Codepoint `U+200D`,used in many Emoji sequences From 5d13cb189b8cd9494b5e745bea0866b4f91b40a7 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Tue, 19 Nov 2024 10:24:21 +0100 Subject: [PATCH 07/27] Improve VS16 matching --- CHANGELOG.md | 4 +++ lib/unicode/display_width.rb | 65 ++++++++++++----------------------- spec/display_width_spec.rb | 4 +++ unicode-display_width.gemspec | 2 +- 4 files changed, 31 insertions(+), 44 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index c67d671..6f1b36d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,9 @@ # CHANGELOG +## 3.1.1 + +- Performance improvements + ## 3.1.0 **Improve Emoji support:** diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index d763659..5914ace 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -30,9 +30,17 @@ class DisplayWidth rgi_at: :REGEX_INCLUDE_MQE_UQE, possible: :REGEX_WELL_FORMED, } - REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP) - REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP) REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/ + REGEX_EMOJI_VS16 = Regexp.union( + Regexp.compile( + Unicode::Emoji::REGEX_TEXT_PRESENTATION.source + + "(?= 2 # VS16 present - res += 2 - "" - else - basic_emoji - end - } - - [res, no_emoji_string] - end - - # Use simplistic ZWJ/modifier/kecap sequence matching - def self.emoji_width_all(string) - res = 0 - - no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){ - res += 2 - "" - } - - [res, no_emoji_string] + end end # Match possible Emoji first, then refine @@ -241,14 +227,7 @@ def self.emoji_width_via_possible(string, emoji_set_regex, 
strict_eaw = false, a else if !strict_eaw # Ensure all explicit VS16 sequences have width 2 - emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji| - if basic_emoji.size == 2 # VS16 present - res += 2 - "" - else - basic_emoji - end - } + emoji_candidate.gsub!(REGEX_EMOJI_VS16){ res += 2; "" } end emoji_candidate diff --git a/spec/display_width_spec.rb b/spec/display_width_spec.rb index 302da11..5c33e75 100644 --- a/spec/display_width_spec.rb +++ b/spec/display_width_spec.rb @@ -255,6 +255,10 @@ it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do expect( "❣️".display_width(emoji: :vs16) ).to eq 2 end + + it 'works with keycaps: width 2' do + expect( "1️⃣".display_width(emoji: :vs16) ).to eq 2 + end end describe ':rgi' do diff --git a/unicode-display_width.gemspec b/unicode-display_width.gemspec index a5b4244..1cd2caf 100644 --- a/unicode-display_width.gemspec +++ b/unicode-display_width.gemspec @@ -13,7 +13,7 @@ Gem::Specification.new do |s| s.extra_rdoc_files = ["README.md", "MIT-LICENSE.txt", "CHANGELOG.md"] s.license = 'MIT' s.required_ruby_version = '>= 2.5.0' - s.add_dependency 'unicode-emoji', '~> 4.0' + s.add_dependency 'unicode-emoji', '~> 4.0', '>= 4.0.4' s.add_development_dependency 'rspec', '~> 3.4' s.add_development_dependency 'rake', '~> 13.0' From 3f93a24a45b9361b8f56cad7a161d302724b5e02 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Tue, 19 Nov 2024 10:55:35 +0100 Subject: [PATCH 08/27] Release v3.1.1 --- lib/unicode/display_width/constants.rb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index 02bf20f..def1937 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,7 +2,7 @@ module Unicode class DisplayWidth - VERSION = "3.1.0" + VERSION = "3.1.1" UNICODE_VERSION = "16.0.0" DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") 
INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" From d67859ed4291b20c04b64c962ab3db8afdf6c5ff Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Wed, 20 Nov 2024 13:32:41 +0100 Subject: [PATCH 09/27] Refactor: Improve code quality, handle overwrite option differently --- lib/unicode/display_width.rb | 160 +++++++++++++++-------------------- 1 file changed, 70 insertions(+), 90 deletions(-) diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index 5914ace..6da4c58 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -10,8 +10,8 @@ module Unicode class DisplayWidth DEFAULT_AMBIGUOUS = 1 INITIAL_DEPTH = 0x10000 - ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/ - ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F" + ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/ + ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F" ASCII_BACKSPACE = "\b" AMBIGUOUS_MAP = { 1 => :WIDTH_ONE, @@ -30,6 +30,7 @@ class DisplayWidth rgi_at: :REGEX_INCLUDE_MQE_UQE, possible: :REGEX_WELL_FORMED, } + # REGEX_NEEDS_EMOJI_HANDLING: ZWJ, VS16, MODIFIER REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/ REGEX_EMOJI_VS16 = Regexp.union( Regexp.compile( @@ -44,93 +45,44 @@ class DisplayWidth # Returns monospace display width of string def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options) - unless old_options.empty? - warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}" - options.merge! old_options - end - - options[:ambiguous] = ambiguous if ambiguous - options[:ambiguous] ||= DEFAULT_AMBIGUOUS - - if options[:ambiguous] != 1 && options[:ambiguous] != 2 - raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2" - end - - if overwrite && !overwrite.empty? 
- warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}" - options[:overwrite] = overwrite - end - options[:overwrite] ||= {} - - if [nil, true, :auto].include?(options[:emoji]) - options[:emoji] = EmojiSupport.recommended - end + string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8 + options = normalize_options(string, ambiguous, overwrite, old_options, **options) - # # # + width = 0 - if !options[:overwrite].empty? - return width_frame(string, options) do |string, index_full, index_low, first_ambiguous| - width_all_features(string, index_full, index_low, first_ambiguous, options[:overwrite]) - end - end - - if !string.ascii_only? - return width_frame(string, options) do |string, index_full, index_low, first_ambiguous| - width_no_overwrite(string, index_full, index_low, first_ambiguous) - end + unless options[:overwrite].empty? + width, string = width_custom(string, options[:overwrite]) end - width_ascii(string) - end - - def self.width_ascii(string) - # Optimization for ASCII-only strings without certain control symbols - if string.match?(ASCII_NON_ZERO_REGEX) - res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE) - return res < 0 ? 0 : res + if string.ascii_only? + return width + width_ascii(string) end - # Pure ASCII - string.size - end - - def self.width_frame(string, options) # Retrieve Emoji width - if options[:emoji] == false || options[:emoji] == :none - res = 0 - else - res, string = emoji_width( + # TODO add quick emoji check + if options[:emoji] != :none + e_width, string = emoji_width( string, options[:emoji], options[:ambiguous], ) + width += e_width end - # Prepare indexes ambiguous_index_name = AMBIGUOUS_MAP[options[:ambiguous]] - - # Get general width - res += yield(string, INDEX[ambiguous_index_name], FIRST_4096[ambiguous_index_name], FIRST_AMBIGUOUS[ambiguous_index_name]) - - # Return result + prevent negative lengths - res < 0 ? 
0 : res - end - - def self.width_no_overwrite(string, index_full, index_low, first_ambiguous, _ = {}) - res = 0 - - # Make sure we have UTF-8 - string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8" + index_full = INDEX[ambiguous_index_name] + index_low = FIRST_4096[ambiguous_index_name] + first_ambiguous = FIRST_AMBIGUOUS[ambiguous_index_name] string.scan(/.{,80}/m){ |batch| if batch.ascii_only? - res += batch.size + width += width_ascii(batch) else batch.each_codepoint{ |codepoint| if codepoint > 15 && codepoint < first_ambiguous - res += 1 + width += 1 elsif codepoint < 0x1001 - res += index_low[codepoint] || 1 + width += index_low[codepoint] || 1 else d = INITIAL_DEPTH w = index_full[codepoint / d] @@ -138,46 +90,46 @@ def self.width_no_overwrite(string, index_full, index_low, first_ambiguous, _ = w = w[(codepoint %= d) / (d /= 16)] end - res += w || 1 + width += w || 1 end } end } - res + # Return result + prevent negative lengths + width < 0 ? 0 : width end - # Same as .width_no_overwrite - but with applying overwrites for each char - def self.width_all_features(string, index_full, index_low, first_ambiguous, overwrite) - res = 0 + # Returns width of custom overwrites and remaining string + def self.width_custom(string, overwrite) + width = 0 - string.each_codepoint{ |codepoint| + string = string.each_codepoint.select{ |codepoint| if overwrite[codepoint] - res += overwrite[codepoint] - elsif codepoint > 15 && codepoint < first_ambiguous - res += 1 - elsif codepoint < 0x1001 - res += index_low[codepoint] || 1 + width += overwrite[codepoint] + nil else - d = INITIAL_DEPTH - w = index_full[codepoint / d] - while w.instance_of? Array - w = w[(codepoint %= d) / (d /= 16)] - end - - res += w || 1 + codepoint end - } + }.pack("U*") - res + [width, string] end + # Returns width for ASCII-only strings. Will consider zero-width control symbols. 
+ def self.width_ascii(string) + if string.match?(ASCII_NON_ZERO_REGEX) + res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE) + return res < 0 ? 0 : res + end + + string.size + end + # Returns width of all considered Emoji and remaining string def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS) res = 0 - string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8" - if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode] emoji_width_via_possible( string, @@ -237,6 +189,34 @@ def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, a [res, no_emoji_string] end + def self.normalize_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options) + unless old_options.empty? + warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}" + options.merge! old_options + end + + options[:ambiguous] = ambiguous if ambiguous + options[:ambiguous] ||= DEFAULT_AMBIGUOUS + + if options[:ambiguous] != 1 && options[:ambiguous] != 2 + raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2" + end + + if overwrite && !overwrite.empty? 
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}" + options[:overwrite] = overwrite + end + options[:overwrite] ||= {} + + if [nil, true, :auto].include?(options[:emoji]) + options[:emoji] = EmojiSupport.recommended + elsif options[:emoji] == false + options[:emoji] = :none + end + + options + end + def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true) @ambiguous = ambiguous @overwrite = overwrite From 16d299aa5245493034816f484358ec9087b4d952 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Wed, 20 Nov 2024 16:55:17 +0100 Subject: [PATCH 10/27] Performance: Speed up if string is only common narrow characters --- data/display_width.marshal.gz | Bin 2055 -> 2055 bytes lib/unicode/display_width.rb | 47 +++++++++++++++++++--------------- 2 files changed, 27 insertions(+), 20 deletions(-) diff --git a/data/display_width.marshal.gz b/data/display_width.marshal.gz index c7cda7b6c2f9ce08a768b5a0fee47c09461187a8..258d84d7bb8ba32eaf8fb468725bb7074f881ec0 100644 GIT binary patch delta 15 WcmZn{Xcu6Y@8)25$!xcfodW 15 && codepoint < first_ambiguous + width += 1 + elsif codepoint < 0x1001 + width += index_low[codepoint] || 1 else - batch.each_codepoint{ |codepoint| - if codepoint > 15 && codepoint < first_ambiguous - width += 1 - elsif codepoint < 0x1001 - width += index_low[codepoint] || 1 - else - d = INITIAL_DEPTH - w = index_full[codepoint / d] - while w.instance_of? Array - w = w[(codepoint %= d) / (d /= 16)] - end - - width += w || 1 - end - } + d = INITIAL_DEPTH + w = index_full[codepoint / d] + while w.instance_of? 
Array + w = w[(codepoint %= d) / (d /= 16)] + end + + width += w || 1 end } From fc78784c208a75c592e3ee350e1ed42e91b977d1 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Wed, 20 Nov 2024 16:57:50 +0100 Subject: [PATCH 11/27] Performance: Use bytesize for an extra boost when string is only ASCII --- CHANGELOG.md | 4 ++++ lib/unicode/display_width.rb | 9 ++++----- 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6f1b36d..28b1249 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,9 @@ # CHANGELOG +## 3.1.2 (unreleased) + +- Performance improvements + ## 3.1.1 - Performance improvements diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index 1bec267..0d4f219 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -34,7 +34,7 @@ class DisplayWidth rgi_at: :REGEX_INCLUDE_MQE_UQE, possible: :REGEX_WELL_FORMED, } - # REGEX_NEEDS_EMOJI_HANDLING: ZWJ, VS16, MODIFIER, keycaps + # REGEX_NEEDS_EMOJI_HANDLING: ZWJ, VS16, MODIFIER, KEYCAP REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/ REGEX_EMOJI_VS16 = Regexp.union( Regexp.compile( @@ -68,8 +68,7 @@ def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **option return width + string.size end - # Retrieve Emoji width - # TODO add quick emoji check + # Retrieve Emoji width, maybe: add quick check using REGEX_NEEDS_EMOJI_HANDLING if options[:emoji] != :none e_width, string = emoji_width( string, @@ -126,11 +125,11 @@ def self.width_custom(string, overwrite) # Returns width for ASCII-only strings. Will consider zero-width control symbols. def self.width_ascii(string) if string.match?(ASCII_NON_ZERO_REGEX) - res = string.delete(ASCII_NON_ZERO_STRING).size - string.count(ASCII_BACKSPACE) + res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE) return res < 0 ? 
0 : res end - string.size + string.bytesize end # Returns width of all considered Emoji and remaining string From d5109424e097f2040777535db4f47ece1b8014f0 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Wed, 20 Nov 2024 22:02:38 +0100 Subject: [PATCH 12/27] Use :all regex for Emoji pre-selection --- lib/unicode/display_width.rb | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index 0d4f219..490a111 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -34,8 +34,6 @@ class DisplayWidth rgi_at: :REGEX_INCLUDE_MQE_UQE, possible: :REGEX_WELL_FORMED, } - # REGEX_NEEDS_EMOJI_HANDLING: ZWJ, VS16, MODIFIER, KEYCAP - REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/ REGEX_EMOJI_VS16 = Regexp.union( Regexp.compile( Unicode::Emoji::REGEX_TEXT_PRESENTATION.source + @@ -68,7 +66,7 @@ def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **option return width + string.size end - # Retrieve Emoji width, maybe: add quick check using REGEX_NEEDS_EMOJI_HANDLING + # Retrieve Emoji width if options[:emoji] != :none e_width, string = emoji_width( string, @@ -167,13 +165,9 @@ def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, a res = 0 # For each string possibly an emoji - no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate| - # Skip notorious false positives - if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate) - emoji_candidate - + no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ |emoji_candidate| # Check if we have a combined Emoji with width 2 (or EAW an Apple Terminal) - elsif emoji_candidate == emoji_candidate[emoji_set_regex] + if emoji_candidate == emoji_candidate[emoji_set_regex] if strict_eaw res += self.of(emoji_candidate[0], ambiguous, emoji: false) else From 49f2b742ec6ca4d383b92a2e597efb2ec9943e22 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Wed, 20 Nov 2024 22:08:05 +0100 
Subject: [PATCH 13/27] Release v3.1.2 --- CHANGELOG.md | 4 ++-- lib/unicode/display_width/constants.rb | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 28b1249..9fd8967 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,6 @@ # CHANGELOG -## 3.1.2 (unreleased) +## 3.1.2 - Performance improvements @@ -15,7 +15,7 @@ - Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any ZWJ/modifier sequence (`:all`). The latter is more common and more efficient to implement. -- Unify `rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to +- Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to the former `:rgi_uqe` option). Most terminals that want to support the RGI set will probably want to catch Emoji sequences with missing VS16s. - Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index def1937..3c95e28 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,7 +2,7 @@ module Unicode class DisplayWidth - VERSION = "3.1.1" + VERSION = "3.1.2" UNICODE_VERSION = "16.0.0" DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" From b00c5bf8bbb9e4039e0e970e7e85ef8e54673665 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Thu, 21 Nov 2024 10:09:49 +0100 Subject: [PATCH 14/27] Add link to terminal-emoji-width.rb --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5c26f85..dd9fa2d 100644 --- a/README.md +++ b/README.md @@ -126,7 +126,7 @@ The `emoji:` option can be used to configure which type of Emoji should be consi Unfortunately, the level of Emoji support varies a lot between terminals. 
While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget). -Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. +Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. You can checkout how your terminals renders different kind of Emoji types with this [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb). 
**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought… From bc47d28fa29e8b48792958515fc00f22b20fe9d4 Mon Sep 17 00:00:00 2001 From: Earlopain <14981592+Earlopain@users.noreply.github.com> Date: Wed, 25 Dec 2024 18:22:00 +0100 Subject: [PATCH 15/27] Handle invalid encoded strings When trying to calculate the width of such strings, it would previously crash with either `Encoding::InvalidByteSequenceError` or `Encoding::UndefinedConversionError`. Totally invalid characters are now simply replaced with a replacement character when converting to UTF8. Especially binary encoded strings (i.e. 
no encoding) don't make much sense but at least it doesn't crash now and tries to return a sensible default (assume the string is actually valid UTF8 --- lib/unicode/display_width.rb | 10 ++++++++-- spec/display_width_spec.rb | 11 +++++++++++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index 490a111..4e7e89a 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -47,7 +47,14 @@ class DisplayWidth # Returns monospace display width of string def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options) - string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8 + # Binary strings don't make much sense when calculating display width. + # Assume it's valid UTF-8 + if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding? + # Didn't work out, go back to binary + string.force_encoding(Encoding::BINARY) + end + + string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8 options = normalize_options(string, ambiguous, overwrite, old_options, **options) width = 0 @@ -236,4 +243,3 @@ def of(string, **kwargs) end end end - diff --git a/spec/display_width_spec.rb b/spec/display_width_spec.rb index 5c33e75..37ee149 100644 --- a/spec/display_width_spec.rb +++ b/spec/display_width_spec.rb @@ -183,6 +183,17 @@ it 'works with non-utf8 Unicode encodings' do expect( 'Γ€'.encode("UTF-16LE").display_width ).to eq 1 end + + it 'works with a string that is invalid in its encoding' do + s = "\x81\x39".dup.force_encoding(Encoding::SHIFT_JIS) + + # Would print as οΏ½9 on the terminal + expect( s.display_width ).to eq 2 + end + + it 'works with a binary encoded string that is valid in UTF-8' do + expect( '€'.b.display_width ).to eq 1 + end end describe '[emoji]' do From 620454c3cb6c8cd8e2fbe00045befb6be4431515 Mon Sep 17 00:00:00 2001 From: Jan 
Lelis Date: Thu, 26 Dec 2024 18:07:33 +0100 Subject: [PATCH 16/27] Add Encoding note to README and CHANGELOG --- CHANGELOG.md | 7 +++++++ README.md | 5 +++++ 2 files changed, 12 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9fd8967..ff76fb6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,12 @@ # CHANGELOG +## 3.1.3 (unreleased) + +Better handling of non-UTF-8 strings: + +- Data with *BINARY* encoding is interpreted as UTF-8, if possible +- Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8 + ## 3.1.2 - Performance improvements diff --git a/README.md b/README.md index dd9fa2d..9b8f593 100644 --- a/README.md +++ b/README.md @@ -71,6 +71,11 @@ Unicode::DisplayWidth.of("·", 1) # => 1 Unicode::DisplayWidth.of("·", 2) # => 2 ``` +### Encoding Notes + +- Data with *BINARY* encoding is interpreted as UTF-8, if possible +- Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}`) options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options) + ### Custom Overwrites You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter: From 5ed64f9f8ca53200797d1ae79caef0d72c5da59a Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Thu, 26 Dec 2024 18:09:16 +0100 Subject: [PATCH 17/27] CI: Add Ruby 3.4 --- .github/workflows/test.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 425af10..9942ff4 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -9,6 +9,7 @@ jobs: strategy: matrix: ruby: + - '3.4' - '3.3' - '3.2' - '3.1' @@ -36,6 +37,7 @@ jobs: strategy: matrix: ruby: + #- '3.4' - '3.3' - '3.2' - '3.1' From 2fbc7a70e2c84d9fa10c23855deb34c1867388b0 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Thu, 26 Dec 2024 18:18:15 +0100 Subject: [PATCH 18/27] CI: Deactivate jruby till jar-dependencies issue is sorted out --- 
.github/workflows/test.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 9942ff4..f974877 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -15,7 +15,7 @@ jobs: - '3.1' - '3.0' - '2.7' - - jruby + #- jruby - truffleruby os: - ubuntu-latest @@ -43,7 +43,7 @@ jobs: - '3.1' - '3.0' - '2.7' - - jruby + #- jruby runs-on: windows-latest steps: - uses: actions/checkout@v4 From 893f9a9424528db8ba662186af5a3a4a5f27a356 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Thu, 26 Dec 2024 18:21:38 +0100 Subject: [PATCH 19/27] Release v3.1.3 --- CHANGELOG.md | 5 +++-- lib/unicode/display_width/constants.rb | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ff76fb6..9b6040a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,8 +1,8 @@ # CHANGELOG -## 3.1.3 (unreleased) +## 3.1.3 -Better handling of non-UTF-8 strings: +Better handling of non-UTF-8 strings, patch by @Earlopain: - Data with *BINARY* encoding is interpreted as UTF-8, if possible - Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8 @@ -35,6 +35,7 @@ Better handling of non-UTF-8 strings: ## 3.0.1 + - Add WezTerm and foot as good Emoji terminals ## 3.0.0 diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index 3c95e28..45be16f 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,7 +2,7 @@ module Unicode class DisplayWidth - VERSION = "3.1.2" + VERSION = "3.1.3" UNICODE_VERSION = "16.0.0" DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" From dc64170c3b89cc95b0254af97aaeec9c92dd52a8 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 13 Jan 2025 12:56:51 +0100 Subject: [PATCH 20/27] Fix and improve handling of Skin Tone Modifiers: - Fix that modifiers were 
ignored when not part of a larger sequence #29 - Only check for valid base characters when Emoji level is RGI - Improve docs and specs --- CHANGELOG.md | 6 ++++++ README.md | 12 ++++++++---- lib/unicode/display_width.rb | 4 +++- misc/terminal-emoji-width.rb | 8 ++++++++ spec/display_width_spec.rb | 20 +++++++++++++++----- 5 files changed, 40 insertions(+), 10 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9b6040a..dddf07a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # CHANGELOG +## 3.1.4 + +- Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence + context (= single emoji char + modifier) #29 +- Add more docs and specs about modifier handling + ## 3.1.3 Better handling of non-UTF-8 strings, patch by @Earlopain: diff --git a/README.md b/README.md index 9b8f593..ca80ab7 100644 --- a/README.md +++ b/README.md @@ -101,12 +101,16 @@ There are many Emoji which get constructed by combining other Emoji in a sequenc Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FEOF*). +Finally, it varies if Skin Tone Modifiers can be applied to all characters or just to those with the "Emoji Base" property. 
+ Emoji Type | Width / Comment ------------|---------------- -Basic/Single Emoji character without Variation Selector | No special handling -Basic/Single Emoji character with VS15 (Text) | No special handling -Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below) -Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below) +Basic/Single Emoji character without Variation Selector | No special handling +Basic/Single Emoji character with VS15 (Text) | No special handling +Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below) +Single Emoji character with Skin Tone Modifier | 2 +Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to RGI +Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below) #### Emoji Modes diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index 4e7e89a..a95a77e 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -42,7 +42,9 @@ class DisplayWidth ), Unicode::Emoji::REGEX_EMOJI_KEYCAP ) - REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP) + + # ebase = Unicode::Emoji::REGEX_PROP_MODIFIER_BASE.source + REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?(\u{200D}.[\u{1F3FB}-\u{1F3FF}\u{FE0F}]?)+|.[\u{1F3FB}-\u{1F3FF}]/, Unicode::Emoji::REGEX_EMOJI_KEYCAP) REGEX_EMOJI_ALL_SEQUENCES_AND_VS16 = Regexp.union(REGEX_EMOJI_ALL_SEQUENCES, REGEX_EMOJI_VS16) # Returns monospace display width of string diff --git a/misc/terminal-emoji-width.rb b/misc/terminal-emoji-width.rb index 75fe894..09fd2cb 100755 --- a/misc/terminal-emoji-width.rb +++ b/misc/terminal-emoji-width.rb @@ -11,6 +11,14 @@ puts puts RULER + "⛹️" + ABC +puts "1C) BASE EMOJI CHARACTER + MODIFIER" +puts +puts RULER + "πŸƒπŸ½" + ABC + +puts "1D) MODIFIER IN ISOLATION" +puts +puts 
RULER + "Z🏽" + ABC + puts "2) RGI EMOJI SEQ" puts puts RULER + "πŸƒπŸΌβ€β™€β€βž‘" + ABC diff --git a/spec/display_width_spec.rb b/spec/display_width_spec.rb index 37ee149..067dd97 100644 --- a/spec/display_width_spec.rb +++ b/spec/display_width_spec.rb @@ -221,10 +221,6 @@ end describe '(special emoji / emoji sequences)' do - it 'works with singleton skin tone modifiers: width 2' do - expect( "🏿".display_width(emoji: :all) ).to eq 2 - end - it 'works with flags: width 2' do expect( "πŸ‡΅πŸ‡Ή".display_width(emoji: :all) ).to eq 2 end @@ -239,8 +235,12 @@ end describe '(modifiers and zwj sequences)' do + it 'applies simple skin tone modifiers' do + expect( "πŸ‘πŸ½".display_width(emoji: :rgi) ).to eq 2 + end + it 'counts RGI Emoji ZWJ sequence as width 2' do - expect( "πŸ€ΎπŸ½β€β™€οΈ".display_width(1, emoji: :rgi) ).to eq 2 + expect( "πŸ€ΎπŸ½β€β™€οΈ".display_width(emoji: :rgi) ).to eq 2 end it 'works for emoji involving characters which are east asian ambiguous' do @@ -253,6 +253,7 @@ it 'does no Emoji adjustments when emoji suport is disabled' do expect( "πŸ€ΎπŸ½β€β™€οΈ".display_width(emoji: false) ).to eq 5 expect( "❣️".display_width(emoji: :none) ).to eq 1 + expect( "πŸ‘πŸ½".display_width(emoji: :none) ).to eq 4 end end @@ -277,6 +278,8 @@ expect( "πŸ€ΎπŸ½β€β™€οΈ".display_width(emoji: :rgi) ).to eq 2 # FQE expect( "πŸ€ΎπŸ½β€β™€".display_width(emoji: :rgi) ).to eq 2 # MQE expect( "β€β€πŸ©Ή".display_width(emoji: :rgi) ).to eq 2 # UQE + expect( "πŸ‘πŸ½".display_width(emoji: :rgi) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :rgi) ).to eq 3 # Modifier with invalid base expect( "πŸ€ β€πŸ€’".display_width(emoji: :rgi) ).to eq 4 # Non-RGI/well-formed expect( "πŸš„πŸΎβ€β–ΆοΈ".display_width(emoji: :rgi) ).to eq 6 # Invalid/non-Emoji sequence end @@ -308,6 +311,8 @@ expect( "πŸ€ΎπŸ½β€β™€οΈ".display_width(emoji: :possible) ).to eq 2 # FQE expect( "πŸ€ΎπŸ½β€β™€".display_width(emoji: :possible) ).to eq 2 # MQE expect( 
"β€β€πŸ©Ή".display_width(emoji: :possible) ).to eq 2 # UQE + expect( "πŸ‘πŸ½".display_width(emoji: :possible) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :possible) ).to eq 3 # Modifier with invalid base expect( "πŸ€ β€πŸ€’".display_width(emoji: :possible) ).to eq 2 # Non-RGI/well-formed expect( "πŸš„πŸΎβ€β–ΆοΈ".display_width(emoji: :possible) ).to eq 6 # Invalid/non-Emoji sequence end @@ -322,6 +327,9 @@ expect( "πŸ€ΎπŸ½β€β™€οΈ".display_width(emoji: :all) ).to eq 2 # FQE expect( "πŸ€ΎπŸ½β€β™€".display_width(emoji: :all) ).to eq 2 # MQE expect( "β€β€πŸ©Ή".display_width(emoji: :all) ).to eq 2 # UQE + expect( "πŸ‘πŸ½".display_width(emoji: :all) ).to eq 2 # Modifier + expect( "πŸ‘πŸ½".display_width(emoji: :all) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :all) ).to eq 2 # Modifier with invalid base expect( "πŸ€ β€πŸ€’".display_width(emoji: :all) ).to eq 2 # Non-RGI/well-formed expect( "πŸš„πŸΎβ€β–ΆοΈ".display_width(emoji: :all) ).to eq 2 # Invalid/non-Emoji sequence end @@ -336,6 +344,8 @@ expect( "πŸ€ΎπŸ½β€β™€οΈ".display_width(emoji: :all_no_vs16) ).to eq 2 # FQE expect( "πŸ€ΎπŸ½β€β™€".display_width(emoji: :all_no_vs16) ).to eq 2 # MQE expect( "β€β€πŸ©Ή".display_width(emoji: :all_no_vs16) ).to eq 2 # UQE + expect( "πŸ‘πŸ½".display_width(emoji: :all_no_vs16) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :all_no_vs16) ).to eq 2 # Modifier with wrong base expect( "πŸ€ β€πŸ€’".display_width(emoji: :all_no_vs16) ).to eq 2 # Non-RGI/well-formed expect( "πŸš„πŸΎβ€β–ΆοΈ".display_width(emoji: :all_no_vs16) ).to eq 2 # Invalid/non-Emoji sequence end From 4bcbf6ae58b83fbde4a86bd99d8663c5a0d3b8ba Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 13 Jan 2025 13:02:39 +0100 Subject: [PATCH 21/27] CI: Add jruby and Ruby 3.4 on Windows --- .github/workflows/test.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 
f974877..da4f178 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -15,7 +15,7 @@ jobs: - '3.1' - '3.0' - '2.7' - #- jruby + - jruby - truffleruby os: - ubuntu-latest @@ -37,13 +37,13 @@ jobs: strategy: matrix: ruby: - #- '3.4' + - '3.4' - '3.3' - '3.2' - '3.1' - '3.0' - '2.7' - #- jruby + - jruby runs-on: windows-latest steps: - uses: actions/checkout@v4 From a515fa2c6898a2a886702c012cad6541fa2386e1 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 13 Jan 2025 13:07:31 +0100 Subject: [PATCH 22/27] Release v3.1.4 --- README.md | 4 ++-- lib/unicode/display_width/constants.rb | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index ca80ab7..9dfa176 100644 --- a/README.md +++ b/README.md @@ -109,7 +109,7 @@ Basic/Single Emoji character without Variation Selector | No special handling Basic/Single Emoji character with VS15 (Text) | No special handling Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below) Single Emoji character with Skin Tone Modifier | 2 -Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to RGI +Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to `:rgi` / `:rgi_at` Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below) #### Emoji Modes @@ -135,7 +135,7 @@ The `emoji:` option can be used to configure which type of Emoji should be consi Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget). -Please note that Emoji display and number of terminal columns used might differs a lot. 
For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. You can checkout how your terminals renders different kind of Emoji types with this [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb). +Please note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminal's Unicode/Emoji capabilities. You can visually check how your terminals renders different kind of Emoji types with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb). **To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. 
Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought… diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index 45be16f..dae60e3 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,7 +2,7 @@ module Unicode class DisplayWidth - VERSION = "3.1.3" + VERSION = "3.1.4" UNICODE_VERSION = "16.0.0" DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" From 85692f4f83d914fde6ca160134c0913eb4604984 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Mon, 13 Jan 2025 13:16:51 +0100 Subject: [PATCH 23/27] Improve README --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 9dfa176..e577ebe 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest) -Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specfication](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. 
It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals. +Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specfication](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals. Unicode version: **16.0.0** (September 2024) @@ -108,8 +108,8 @@ Emoji Type | Width / Comment Basic/Single Emoji character without Variation Selector | No special handling Basic/Single Emoji character with VS15 (Text) | No special handling Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below) -Single Emoji character with Skin Tone Modifier | 2 -Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is configured to `:rgi` / `:rgi_at` +Single Emoji character with Skin Tone Modifier | 2 unless Emoji mode is `:none` or `vs16` +Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is `:rgi` / `:rgi_at` Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below) #### Emoji Modes From 6632fe0235ace6f5aec8d8294b9b196fe6ca8358 Mon Sep 17 00:00:00 2001 From: Earlopain <14981592+Earlopain@users.noreply.github.com> Date: Mon, 10 Mar 2025 14:19:37 +0100 Subject: [PATCH 24/27] Memoize `EmojiSupport.recommended` It's a simple function and I believe the result is not supposed to change at runtime. Strings were not frozen, so it made a bunch of useless allocations too. I found this method while benchmarking, which was unexpected. 
Some basic numbers: ``` # frozen_string_literal: true require 'unicode/display_width' require 'benchmark/ips' Benchmark.ips do |x| x.report do Unicode::DisplayWidth.of("foo") end x.compare! end ``` Old gives `207.779k` iterations per second, while new one is `427.344k`. So more than 2x faster for basic cases --- lib/unicode/display_width/emoji_support.rb | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/lib/unicode/display_width/emoji_support.rb b/lib/unicode/display_width/emoji_support.rb index 5106342..46f927a 100644 --- a/lib/unicode/display_width/emoji_support.rb +++ b/lib/unicode/display_width/emoji_support.rb @@ -1,5 +1,4 @@ -# require "rbconfig" -# RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows +# frozen_string_literal: true module Unicode class DisplayWidth @@ -13,6 +12,10 @@ module EmojiSupport # Please note: Many terminals do not set any ENV vars, # maybe CSI queries can help? def self.recommended + @recommended ||= _recommended + end + + def self._recommended if ENV["CI"] return :rqi end From 1352b288ca6b4e474c84411ae385a4680d337af0 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Fri, 15 Aug 2025 13:59:46 +0200 Subject: [PATCH 25/27] Release v3.1.5 --- CHANGELOG.md | 4 ++++ lib/unicode/display_width/constants.rb | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index dddf07a..8402b7c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,9 @@ # CHANGELOG +## 3.1.5 + +- Cache Emoji support level for performance reasons #30, patch by @Earlopain: + ## 3.1.4 - Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index dae60e3..17a7b77 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,7 +2,7 @@ module Unicode class DisplayWidth - VERSION = "3.1.4" + VERSION = "3.1.5" UNICODE_VERSION = "16.0.0" DATA_DIRECTORY = 
File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" From 8965d625448f157dc2ab3e023d8aa6739b758921 Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Sun, 7 Sep 2025 21:35:43 +0200 Subject: [PATCH 26/27] Unicode 17 --- CHANGELOG.md | 4 ++++ README.md | 4 ++-- data/display_width.marshal.gz | Bin 2055 -> 2068 bytes lib/unicode/display_width/constants.rb | 2 +- 4 files changed, 7 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 8402b7c..9ccdd63 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,9 @@ # CHANGELOG +## 3.2.0 (unreleased) + +- Unicode 17 + ## 3.1.5 - Cache Emoji support level for performance reasons #30, patch by @Earlopain: diff --git a/README.md b/README.md index e577ebe..0bbff43 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specfication](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals. 
-Unicode version: **16.0.0** (September 2024) +Unicode version: **17.0.0** (September 2025) ## Gem Version 3 — Improved Emoji Support @@ -188,7 +188,7 @@ See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related ## Copyright & Info -- Copyright (c) 2011, 2015-2024 Jan Lelis, https://janlelis.com, released under the MIT +- Copyright (c) 2011, 2015-2025 Jan Lelis, https://janlelis.com, released under the MIT license - Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run - Unicode data: https://www.unicode.org/copyright.html#Exhibit1 diff --git a/data/display_width.marshal.gz b/data/display_width.marshal.gz index 258d84d7bb8ba32eaf8fb468725bb7074f881ec0..6aa6969e060c4249594156a6f413656f8d9e9b12 100644 Binary files a/data/display_width.marshal.gz and b/data/display_width.marshal.gz differ
diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index 17a7b77..5618b26 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -3,7 +3,7 @@ module Unicode class DisplayWidth VERSION = "3.1.5" - UNICODE_VERSION = "16.0.0" + UNICODE_VERSION = "17.0.0" DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" end From 215328593f2e510923147880ac029b9e8cdc499c Mon Sep 17 00:00:00 2001 From: Jan Lelis Date: Tue, 9 Sep 2025 17:12:00 +0200 Subject: [PATCH 27/27] Release v3.2.0 --- CHANGELOG.md | 4 ++-- lib/unicode/display_width/constants.rb | 2 +- unicode-display_width.gemspec | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9ccdd63..70a8fe9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,8 +1,8 @@ # CHANGELOG -## 3.2.0 (unreleased) +## 3.2.0 -- Unicode 17 +- Unicode 17.0 ## 3.1.5 diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index 5618b26..d14edfe 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,7 +2,7 @@ module Unicode class DisplayWidth - VERSION = "3.1.5" + VERSION = "3.2.0" UNICODE_VERSION = "17.0.0" DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" diff --git a/unicode-display_width.gemspec b/unicode-display_width.gemspec index 1cd2caf..e42c207 100644 --- a/unicode-display_width.gemspec +++ b/unicode-display_width.gemspec @@ -13,7 +13,7 @@ Gem::Specification.new do |s| s.extra_rdoc_files = ["README.md", "MIT-LICENSE.txt", "CHANGELOG.md"] s.license = 'MIT' s.required_ruby_version = '>= 2.5.0' - s.add_dependency 'unicode-emoji', '~> 4.0',
'>= 4.0.4' + s.add_dependency 'unicode-emoji', '~> 4.1' s.add_development_dependency 'rspec', '~> 3.4' s.add_development_dependency 'rake', '~> 13.0'
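Note on patch 24 (Memoize `EmojiSupport.recommended`): the caching it introduces can be sketched in isolation as follows. This is a minimal illustration, not the gem's code. The module name `EmojiSupportSketch`, the `lookups` counter and the reduced `TERM_PROGRAM` check are assumptions made for the example; only the `@recommended ||= _recommended` shape is taken from the patch.

```ruby
# frozen_string_literal: true

# Hypothetical stand-in for Unicode::DisplayWidth::EmojiSupport. Only the
# memoization shape (`@recommended ||= _recommended`) comes from the patch.
module EmojiSupportSketch
  @lookups = 0

  class << self
    # Number of times the uncached detection has run (illustration only).
    attr_reader :lookups
  end

  # Memoized entry point: the detection below runs on the first call only.
  def self.recommended
    @recommended ||= _recommended
  end

  # Simplified, assumed detection logic; the real method checks more cases.
  def self._recommended
    @lookups += 1
    case ENV["TERM_PROGRAM"]
    when "Apple_Terminal" then :rgi_at
    when "WezTerm" then :all_no_vs16
    else :none
    end
  end
end

first = EmojiSupportSketch.recommended
second = EmojiSupportSketch.recommended
puts "mode=#{first.inspect} detections=#{EmojiSupportSketch.lookups}"
```

Two properties of this pattern are worth keeping in mind: `||=` only caches truthy results (fine here, since a Symbol is always returned), and changes to `ENV` after the first call are no longer picked up.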