diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index 425af10..da4f178 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -9,6 +9,7 @@ jobs: strategy: matrix: ruby: + - '3.4' - '3.3' - '3.2' - '3.1' @@ -36,6 +37,7 @@ jobs: strategy: matrix: ruby: + - '3.4' - '3.3' - '3.2' - '3.1' diff --git a/CHANGELOG.md b/CHANGELOG.md index 6f19408..70a8fe9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,25 +1,55 @@ # CHANGELOG -## 3.1.0 (unreleased) +## 3.2.0 -**Further Emoji improvements:** +- Unicode 17.0 + +## 3.1.5 + +- Cache Emoji support level for performance reasons #30, patch by @Earlopain: + +## 3.1.4 + +- Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence + context (= single emoji char + modifier) #29 +- Add more docs and specs about modifier handling + +## 3.1.3 + +Better handling of non-UTF-8 strings, patch by @Earlopain: + +- Data with *BINARY* encoding is interpreted as UTF-8, if possible +- Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8 + +## 3.1.2 + +- Performance improvements + +## 3.1.1 + +- Performance improvements + +## 3.1.0 + +**Improve Emoji support:** - Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any ZWJ/modifier sequence (`:all`). The latter is more common and more efficient to implement. -- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false` -- Unify `rgi_*` options to just `rgi` to keep things simpler (corresponds to +- Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to the former `:rgi_uqe` option). Most terminals that want to support the RGI set will probably want to catch Emoji sequences with missing VS16s. 
-- Add new `:all_no_vs16` mode -- Only consider terminal cells needed when recommending Emoji support level +- Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals + that need these quirks +- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false` +- `:auto` mode: Only consider terminal cells when recommending Emoji support level (Emoji themselves might display differently) -- Set default Emoji mode for unknown/unsupported terminals to `:none` - (instead of `:basic`) - +- `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none` +- Rename `:basic` mode to `:vs16` ## 3.0.1 + - Add WezTerm and foot as good Emoji terminals ## 3.0.0 diff --git a/README.md b/README.md index 0d0ad51..0bbff43 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ # Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest) -Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specfication](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals. +Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specification](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. 
It does not rely on the OS vendor ([wcwidth](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals. -Unicode version: **16.0.0** (September 2024) +Unicode version: **17.0.0** (September 2025) ## Gem Version 3 — Improved Emoji Support @@ -71,6 +71,11 @@ Unicode::DisplayWidth.of("·", 1) # => 1 Unicode::DisplayWidth.of("·", 2) # => 2 ``` +### Encoding Notes + +- Data with *BINARY* encoding is interpreted as UTF-8, if possible +- Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}` options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options) + ### Custom Overwrites You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter: @@ -96,39 +101,43 @@ There are many Emoji which get constructed by combining other Emoji in a sequenc Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FEOF*). +Finally, terminals differ in whether Skin Tone Modifiers can be applied to all characters or just to those with the "Emoji Base" property. 
+ Emoji Type | Width / Comment ------------|---------------- -Basic/Single Emoji character without Variation Selector | No special handling -Basic/Single Emoji character with VS15 (Text) | No special handling -Basic/Single Emoji character with VS16 (Emoji) | 2 (except with `emoji: :none` or `emoji: :all_no_vs16`) -Emoji Sequence | 2 if Emoji belongs to configured Emoji set +Basic/Single Emoji character without Variation Selector | No special handling +Basic/Single Emoji character with VS15 (Text) | No special handling +Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below) +Single Emoji character with Skin Tone Modifier | 2 unless Emoji mode is `:none` or `:vs16` +Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is `:rgi` / `:rgi_at` +Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below) -The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used: +#### Emoji Modes -Option | Description | Example Terminals --------|-------------|------------------ -`emoji: true` or `emoji: :auto` | Automatically use recommended Emoji setting for your terminal | - -`emoji: false` or `emoji: :none` | No Emoji adjustments, Emoji characters with VS16 not handled | Gnome Terminal, many older terminals -`emoji: :basic` | Full-width VS16-Emoji, but no width adjustments for Emoji sequences: All partial Emoji treated separately with a width of 2 | ? -`emoji: :rgi` | Full-width VS16-Emoji, all RGI Emoji sequences are considered to have a width of 2 | Apple Terminal -`emoji: :possible`| Full-width VS16-Emoji, all possible/well-formed Emoji sequences are considered to have a width of 2 | ? 
-`emoji: :all` | Full-width VS16-Emoji, all ZWJ/modifier/keycap sequences have a width of 2, even if they are not well-formed Emoji sequences | foot, Contour -`emoji: :all_no_vs16` | VS16-Emoji not handled, all ZWJ/modifier/keycap sequences to have a width of 2, even if they are not well-formed Emoji sequences | WezTerm +The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used: +`emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals +----------------|------------------|---------------------------------|------------------ +`true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | - +`:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot +`:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm +`:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ? +`:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ? +`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have EAW | Apple Terminal +`:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty? +`false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals + +- *EAW:* East Asian Width - *RGI Emoji:* Emoji Recommended for General Interchange - *ZWJ:* Zero-width Joiner: Codepoint `U+200D`,used in many Emoji sequences #### Emoji Support in Terminals -Unfortunately, the level of Emoji support varies a lot between terminals. 
While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi` on "Apple_Terminal" or `:none` on Gnome's terminal widget). - -Note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project], which is a great resource that compares various terminal's Unicode/Emoji capabilities. - ---- +Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget). -To terminal implementors reading this: Although handling Emoji/ZWJ sequences as always having a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (see table above) and just give those unknown Emoji the space they need? 
 It is painful to implement, I know, but it kind of underlines the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought… +Please note that Emoji display and number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper number of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities. You can visually check how your terminal renders different kinds of Emoji with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb). ---- +**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). 
Just a thought… ### Usage with String Extension @@ -179,7 +188,7 @@ See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related ## Copyright & Info -- Copyright (c) 2011, 2015-2024 Jan Lelis, https://janlelis.com, released under the MIT +- Copyright (c) 2011, 2015-2025 Jan Lelis, https://janlelis.com, released under the MIT license - Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run - Unicode data: https://www.unicode.org/copyright.html#Exhibit1 diff --git a/data/display_width.marshal.gz b/data/display_width.marshal.gz index c7cda7b..6aa6969 100644 Binary files a/data/display_width.marshal.gz and b/data/display_width.marshal.gz differ diff --git a/lib/unicode/display_width.rb b/lib/unicode/display_width.rb index a9ead54..a95a77e 100644 --- a/lib/unicode/display_width.rb +++ b/lib/unicode/display_width.rb @@ -8,9 +8,10 @@ module Unicode class DisplayWidth + DEFAULT_AMBIGUOUS = 1 INITIAL_DEPTH = 0x10000 - ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n\v\f\r\x0E\x0F]/ - ASCII_NON_ZERO_STRING = "\0\x05\a\b\n\v\f\r\x0E\x0F" + ASCII_NON_ZERO_REGEX = /[\0\x05\a\b\n-\x0F]/ + ASCII_NON_ZERO_STRING = "\0\x05\a\b\n-\x0F" ASCII_BACKSPACE = "\b" AMBIGUOUS_MAP = { 1 => :WIDTH_ONE, @@ -20,133 +21,83 @@ class DisplayWidth WIDTH_ONE: 768, WIDTH_TWO: 161, } + NOT_COMMON_NARROW_REGEX = { + WIDTH_ONE: /[^\u{10}-\u{2FF}]/m, + WIDTH_TWO: /[^\u{10}-\u{A1}]/m, + } FIRST_4096 = { WIDTH_ONE: decompress_index(INDEX[:WIDTH_ONE][0][0], 1), WIDTH_TWO: decompress_index(INDEX[:WIDTH_TWO][0][0], 1), } EMOJI_SEQUENCES_REGEX_MAPPING = { rgi: :REGEX_INCLUDE_MQE_UQE, + rgi_at: :REGEX_INCLUDE_MQE_UQE, possible: :REGEX_WELL_FORMED, } - REGEX_EMOJI_BASIC_OR_KEYCAP = Regexp.union(Unicode::Emoji::REGEX_BASIC, Unicode::Emoji::REGEX_EMOJI_KEYCAP) - REGEX_EMOJI_ALL_SEQUENCES = Regexp.union(/.[🏻-🏿\u{FE0F}]?(\u{200D}.[🏻-🏿\u{FE0F}]?)+/, Unicode::Emoji::REGEX_EMOJI_KEYCAP) - REGEX_EMOJI_NOT_POSSIBLE = /\A[#*0-9]\z/ + REGEX_EMOJI_VS16 = Regexp.union( + 
Regexp.compile( + Unicode::Emoji::REGEX_TEXT_PRESENTATION.source + + "(? 15 && codepoint < first_ambiguous - res += 1 - elsif codepoint < 0x1001 - res += index_low[codepoint] || 1 - else - d = INITIAL_DEPTH - w = index_full[codepoint / d] - while w.instance_of? Array - w = w[(codepoint %= d) / (d /= 16)] - end - - res += w || 1 - end - } + unless string.match?(NOT_COMMON_NARROW_REGEX[ambiguous_index_name]) + return width + string.size end - } - - res - end + end - # Same as .width_no_overwrite - but with applying overwrites for each char - def self.width_all_features(string, index_full, index_low, first_ambiguous, overwrite) - res = 0 + index_full = INDEX[ambiguous_index_name] + index_low = FIRST_4096[ambiguous_index_name] + first_ambiguous = FIRST_AMBIGUOUS[ambiguous_index_name] string.each_codepoint{ |codepoint| - if overwrite[codepoint] - res += overwrite[codepoint] - elsif codepoint > 15 && codepoint < first_ambiguous - res += 1 + if codepoint > 15 && codepoint < first_ambiguous + width += 1 elsif codepoint < 0x1001 - res += index_low[codepoint] || 1 + width += index_low[codepoint] || 1 else d = INITIAL_DEPTH w = index_full[codepoint / d] @@ -154,88 +105,91 @@ def self.width_all_features(string, index_full, index_low, first_ambiguous, over w = w[(codepoint %= d) / (d /= 16)] end - res += w || 1 + width += w || 1 end } - res + # Return result + prevent negative lengths + width < 0 ? 
0 : width end + # Returns width of custom overwrites and remaining string + def self.width_custom(string, overwrite) + width = 0 - def self.emoji_width(string, mode = :all) - res = 0 + string = string.each_codepoint.select{ |codepoint| + if overwrite[codepoint] + width += overwrite[codepoint] + nil + else + codepoint + end + }.pack("U*") - string = string.encode(Encoding::UTF_8) unless string.encoding.name == "utf-8" + [width, string] + end - if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode] - emoji_width_via_possible(string, Unicode::Emoji.const_get(emoji_set_regex)) - elsif mode == :all_no_vs16 - emoji_width_all(string) - elsif mode == :basic - emoji_width_basic(string) - elsif mode == :all - res_all, string = emoji_width_all(string) - res_basic, string = emoji_width_basic(string) - [res_all + res_basic, string] - else - [0, string] + # Returns width for ASCII-only strings. Will consider zero-width control symbols. + def self.width_ascii(string) + if string.match?(ASCII_NON_ZERO_REGEX) + res = string.delete(ASCII_NON_ZERO_STRING).bytesize - string.count(ASCII_BACKSPACE) + return res < 0 ? 
 0 : res end + + string.bytesize end - # Ensure all explicit VS16 sequences have width 2 - def self.emoji_width_basic(string) + # Returns width of all considered Emoji and remaining string + def self.emoji_width(string, mode = :all, ambiguous = DEFAULT_AMBIGUOUS) res = 0 - no_emoji_string = string.gsub(REGEX_EMOJI_BASIC_OR_KEYCAP){ |basic_emoji| - if basic_emoji.size >= 2 # VS16 present - res += 2 - "" - else - basic_emoji - end - } + if emoji_set_regex = EMOJI_SEQUENCES_REGEX_MAPPING[mode] + emoji_width_via_possible( + string, + Unicode::Emoji.const_get(emoji_set_regex), + mode == :rgi_at, + ambiguous, + ) - [res, no_emoji_string] - end + elsif mode == :all_no_vs16 + no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){ res += 2; "" } + [res, no_emoji_string] - # Use simplistic ZWJ/modifier/kecap sequence matching - def self.emoji_width_all(string) - res = 0 + elsif mode == :vs16 + no_emoji_string = string.gsub(REGEX_EMOJI_VS16){ res += 2; "" } + [res, no_emoji_string] - no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES){ - res += 2 - "" - } + elsif mode == :all + no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ res += 2; "" } + [res, no_emoji_string] - [res, no_emoji_string] + else + [0, string] + + end end # Match possible Emoji first, then refine - def self.emoji_width_via_possible(string, emoji_set_regex) + def self.emoji_width_via_possible(string, emoji_set_regex, strict_eaw = false, ambiguous = DEFAULT_AMBIGUOUS) res = 0 # For each string possibly an emoji - no_emoji_string = string.gsub(Unicode::Emoji::REGEX_POSSIBLE){ |emoji_candidate| - # Skip notorious false positives - if REGEX_EMOJI_NOT_POSSIBLE.match?(emoji_candidate) - emoji_candidate - - # Check if we have a combined Emoji with width 2 - elsif emoji_candidate == emoji_candidate[emoji_set_regex] - res += 2 + no_emoji_string = string.gsub(REGEX_EMOJI_ALL_SEQUENCES_AND_VS16){ |emoji_candidate| + # Check if we have a combined Emoji with width 2 (or EAW on Apple Terminal) + if 
emoji_candidate == emoji_candidate[emoji_set_regex] + if strict_eaw + res += self.of(emoji_candidate[0], ambiguous, emoji: false) + else + res += 2 + end "" # We are dealing with a default text presentation emoji or a well-formed sequence not matching the above Emoji set else - # Ensure all explicit VS16 sequences have width 2 - emoji_candidate.gsub!(Unicode::Emoji::REGEX_BASIC){ |basic_emoji| - if basic_emoji.size == 2 # VS16 present - res += 2 - "" - else - basic_emoji - end - } + if !strict_eaw + # Ensure all explicit VS16 sequences have width 2 + emoji_candidate.gsub!(REGEX_EMOJI_VS16){ res += 2; "" } + end emoji_candidate end @@ -244,7 +198,35 @@ def self.emoji_width_via_possible(string, emoji_set_regex) [res, no_emoji_string] end - def initialize(ambiguous: 1, overwrite: {}, emoji: true) + def self.normalize_options(string, ambiguous = nil, overwrite = nil, old_options = {}, **options) + unless old_options.empty? + warn "Unicode::DisplayWidth: Please migrate to keyword arguments - #{old_options.inspect}" + options.merge! old_options + end + + options[:ambiguous] = ambiguous if ambiguous + options[:ambiguous] ||= DEFAULT_AMBIGUOUS + + if options[:ambiguous] != 1 && options[:ambiguous] != 2 + raise ArgumentError, "Unicode::DisplayWidth: Ambiguous width must be 1 or 2" + end + + if overwrite && !overwrite.empty? 
+ warn "Unicode::DisplayWidth: Please migrate to keyword arguments - overwrite: #{overwrite.inspect}" + options[:overwrite] = overwrite + end + options[:overwrite] ||= {} + + if [nil, true, :auto].include?(options[:emoji]) + options[:emoji] = EmojiSupport.recommended + elsif options[:emoji] == false + options[:emoji] = :none + end + + options + end + + def initialize(ambiguous: DEFAULT_AMBIGUOUS, overwrite: {}, emoji: true) @ambiguous = ambiguous @overwrite = overwrite @emoji = emoji @@ -263,4 +245,3 @@ def of(string, **kwargs) end end end - diff --git a/lib/unicode/display_width/constants.rb b/lib/unicode/display_width/constants.rb index aa04c39..d14edfe 100644 --- a/lib/unicode/display_width/constants.rb +++ b/lib/unicode/display_width/constants.rb @@ -2,8 +2,8 @@ module Unicode class DisplayWidth - VERSION = "3.0.1" - UNICODE_VERSION = "16.0.0" + VERSION = "3.2.0" + UNICODE_VERSION = "17.0.0" DATA_DIRECTORY = File.expand_path(File.dirname(__FILE__) + "/../../../data/") INDEX_FILENAME = DATA_DIRECTORY + "/display_width.marshal.gz" end diff --git a/lib/unicode/display_width/emoji_support.rb b/lib/unicode/display_width/emoji_support.rb index 704204e..46f927a 100644 --- a/lib/unicode/display_width/emoji_support.rb +++ b/lib/unicode/display_width/emoji_support.rb @@ -1,5 +1,4 @@ -# require "rbconfig" -# RbConfig::CONFIG["host_os"] =~ /mswin|mingw/ # windows +# frozen_string_literal: true module Unicode class DisplayWidth @@ -13,15 +12,19 @@ module EmojiSupport # Please note: Many terminals do not set any ENV vars, # maybe CSI queries can help? 
 def self.recommended + @recommended ||= _recommended + end + + def self._recommended if ENV["CI"] - return :rqi_uqe + return :rgi end case ENV["TERM_PROGRAM"] when "iTerm.app" return :all - when "Apple_Terminal" # Also: If first Emoji part is EAW 1, gives whole ZWJ seqs width 1 - return :rgi + when "Apple_Terminal" + return :rgi_at when "WezTerm" return :all_no_vs16 end @@ -31,11 +34,11 @@ def self.recommended # konsole: all, how to detect? return :all when /kitty/ - return :basic + return :vs16 end if ENV["WT_SESSION"] # Windows Terminal - return :basic + return :vs16 end # As of last time checked: gnome-terminal, vscode, alacritty diff --git a/misc/terminal-emoji-width.rb b/misc/terminal-emoji-width.rb index 75fe894..09fd2cb 100755 --- a/misc/terminal-emoji-width.rb +++ b/misc/terminal-emoji-width.rb @@ -11,6 +11,14 @@ puts puts RULER + "⛹️" + ABC +puts "1C) BASE EMOJI CHARACTER + MODIFIER" +puts +puts RULER + "🏃🏽" + ABC + +puts "1D) MODIFIER IN ISOLATION" +puts +puts RULER + "Z🏽" + ABC + puts "2) RGI EMOJI SEQ" puts puts RULER + "🏃🏼‍♀‍➡" + ABC diff --git a/spec/display_width_spec.rb b/spec/display_width_spec.rb index 6c31e60..067dd97 100644 --- a/spec/display_width_spec.rb +++ b/spec/display_width_spec.rb @@ -175,7 +175,7 @@ end it 'can be passed as :overwrite option' do - expect( "\t".display_width(1, overwrite: { 0x09 => 12 }) ).to eq 12 + expect( "\t".display_width(overwrite: { 0x09 => 12 }) ).to eq 12 end end @@ -183,6 +183,17 @@ it 'works with non-utf8 Unicode encodings' do expect( 'À'.encode("UTF-16LE").display_width ).to eq 1 end + + it 'works with a string that is invalid in its encoding' do + s = "\x81\x39".dup.force_encoding(Encoding::SHIFT_JIS) + + # Would print as �9 on the terminal + expect( s.display_width ).to eq 2 + end + + it 'works with a binary encoded string that is valid in UTF-8' do + expect( '€'.b.display_width ).to eq 1 + end end describe '[emoji]' do @@ -210,10 +221,6 @@ end describe '(special emoji / emoji sequences)' do - it 'works 
with singleton skin tone modifiers: width 2' do - expect( "🏿".display_width(emoji: :all) ).to eq 2 - end - it 'works with flags: width 2' do expect( "🇵🇹".display_width(emoji: :all) ).to eq 2 end @@ -228,8 +235,12 @@ end describe '(modifiers and zwj sequences)' do + it 'applies simple skin tone modifiers' do + expect( "👏🏽".display_width(emoji: :rgi) ).to eq 2 + end + it 'counts RGI Emoji ZWJ sequence as width 2' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :rgi) ).to eq 2 + expect( "🤾🏽‍♀️".display_width(emoji: :rgi) ).to eq 2 end it 'works for emoji involving characters which are east asian ambiguous' do @@ -240,58 +251,107 @@ describe '(modes)' do describe 'false / :none' do it 'does no Emoji adjustments when emoji suport is disabled' do - expect( "🤾🏽‍♀️".display_width(1, emoji: false) ).to eq 5 + expect( "🤾🏽‍♀️".display_width(emoji: false) ).to eq 5 expect( "❣️".display_width(emoji: :none) ).to eq 1 + expect( "👏🏽".display_width(emoji: :none) ).to eq 4 end end - describe ':basic' do + describe ':vs16' do it 'will ignore shorter width of all Emoji sequences' do # Please note that this is different from emoji: false / emoji: :none # -> Basic Emoji with VS16 still get normalized - expect( "🤾🏽‍♀️".display_width(1, emoji: :basic) ).to eq 6 + expect( "🤾🏽‍♀️".display_width(emoji: :vs16) ).to eq 6 + end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :vs16) ).to eq 2 + end + + it 'works with keycaps: width 2' do + expect( "1️⃣".display_width(emoji: :vs16) ).to eq 2 end end describe ':rgi' do - it 'will ignore shorter width of non-RQI sequences' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :rgi) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :rgi) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: :rgi) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :rgi) ).to eq 4 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :rgi) ).to eq 6 # Invalid/non-Emoji sequence + it 
'will ignore shorter width of non-RGI sequences' do + expect( "🤾🏽‍♀️".display_width(emoji: :rgi) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :rgi) ).to eq 2 # MQE + expect( "❤‍🩹".display_width(emoji: :rgi) ).to eq 2 # UQE + expect( "👏🏽".display_width(emoji: :rgi) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :rgi) ).to eq 3 # Modifier with invalid base + expect( "🤠‍🤢".display_width(emoji: :rgi) ).to eq 4 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :rgi) ).to eq 6 # Invalid/non-Emoji sequence + end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :rgi) ).to eq 2 + end + end + + describe ':rgi_at' do + it 'will assign width based on EAW of first partial Emoji to whole sequence' do + expect( "🤾🏽‍♀️".display_width(emoji: :rgi_at) ).to eq 2 + expect( "⛹️‍♀️".display_width(emoji: :rgi_at) ).to eq 1 + expect( "❤‍🩹".display_width(emoji: :rgi_at) ).to eq 1 + end + + it 'will count partial emoji for non-RGI sequences' do + expect( "🤠‍🤢".display_width(emoji: :rgi_at) ).to eq 4 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :rgi_at) ).to eq 5 # Invalid/non-Emoji sequence + end + + it 'uses EAW for default-text presentation Emoji with Emoji Presentation (VS16)' do + expect( "❣️".display_width(emoji: :rgi_at) ).to eq 1 end end describe ':possible' do it 'will treat possible/well-formed Emoji sequence as width 2' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :possible) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :possible) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: :possible) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :possible) ).to eq 2 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :possible) ).to eq 6 # Invalid/non-Emoji sequence + expect( "🤾🏽‍♀️".display_width(emoji: :possible) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :possible) ).to eq 2 # MQE + expect( "❤‍🩹".display_width(emoji: 
:possible) ).to eq 2 # UQE + expect( "👏🏽".display_width(emoji: :possible) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :possible) ).to eq 3 # Modifier with invalid base + expect( "🤠‍🤢".display_width(emoji: :possible) ).to eq 2 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :possible) ).to eq 6 # Invalid/non-Emoji sequence + end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :possible) ).to eq 2 end end describe ':all' do it 'will treat any ZWJ/modifier/keycap sequences sequence as width 2' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :all) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :all) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: :all) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :all) ).to eq 2 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :all) ).to eq 2 # Invalid/non-Emoji sequence - expect( "❣️".display_width(emoji: :all) ).to eq 2 # VS16 + expect( "🤾🏽‍♀️".display_width(emoji: :all) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :all) ).to eq 2 # MQE + expect( "❤‍🩹".display_width(emoji: :all) ).to eq 2 # UQE + expect( "👏🏽".display_width(emoji: :all) ).to eq 2 # Modifier + expect( "👏🏽".display_width(emoji: :all) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :all) ).to eq 2 # Modifier with invalid base + expect( "🤠‍🤢".display_width(emoji: :all) ).to eq 2 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :all) ).to eq 2 # Invalid/non-Emoji sequence + end + + it 'counts default-text presentation Emoji with Emoji Presentation (VS16) as 2' do + expect( "❣️".display_width(emoji: :all) ).to eq 2 end end describe ':all_no_vs16' do it 'will treat any ZWJ/modifier/keycap sequences sequence as width 2' do - expect( "🤾🏽‍♀️".display_width(1, emoji: :all_no_vs16) ).to eq 2 # FQE - expect( "🤾🏽‍♀".display_width(1, emoji: :all_no_vs16) ).to eq 2 # MQE - expect( "❤‍🩹".display_width(1, emoji: 
:all_no_vs16) ).to eq 2 # UQE - expect( "🤠‍🤢".display_width(1, emoji: :all_no_vs16) ).to eq 2 # Non-RGI/well-formed - expect( "🚄🏾‍▶️".display_width(1, emoji: :all_no_vs16) ).to eq 2 # Invalid/non-Emoji sequence - expect( "❣️".display_width(emoji: :all_no_vs16) ).to eq 1 # No VS16 + expect( "🤾🏽‍♀️".display_width(emoji: :all_no_vs16) ).to eq 2 # FQE + expect( "🤾🏽‍♀".display_width(emoji: :all_no_vs16) ).to eq 2 # MQE + expect( "❤‍🩹".display_width(emoji: :all_no_vs16) ).to eq 2 # UQE + expect( "👏🏽".display_width(emoji: :all_no_vs16) ).to eq 2 # Modifier + expect( "J🏽".display_width(emoji: :all_no_vs16) ).to eq 2 # Modifier with wrong base + expect( "🤠‍🤢".display_width(emoji: :all_no_vs16) ).to eq 2 # Non-RGI/well-formed + expect( "🚄🏾‍▶️".display_width(emoji: :all_no_vs16) ).to eq 2 # Invalid/non-Emoji sequence + end + + it 'uses EAW for default-text presentation Emoji with Emoji Presentation (VS16)' do + expect( "❣️".display_width(emoji: :all_no_vs16) ).to eq 1 end end end diff --git a/unicode-display_width.gemspec b/unicode-display_width.gemspec index a5b4244..e42c207 100644 --- a/unicode-display_width.gemspec +++ b/unicode-display_width.gemspec @@ -13,7 +13,7 @@ Gem::Specification.new do |s| s.extra_rdoc_files = ["README.md", "MIT-LICENSE.txt", "CHANGELOG.md"] s.license = 'MIT' s.required_ruby_version = '>= 2.5.0' - s.add_dependency 'unicode-emoji', '~> 4.0' + s.add_dependency 'unicode-emoji', '~> 4.1' s.add_development_dependency 'rspec', '~> 3.4' s.add_development_dependency 'rake', '~> 13.0'