29 commits
b342f94 Improve README Emoji mode table (janlelis, Nov 18, 2024)
5088d98 Rename :basic mode to :vs16 (janlelis, Nov 18, 2024)
7ba1c3a Implement :rgi_at mode for Apple_Terminal (janlelis, Nov 18, 2024)
c2953c6 Don't mix raw and escapd Unicode chars in regex (janlelis, Nov 18, 2024)
82b17bd Release v3.1.0 (janlelis, Nov 18, 2024)
b79047c README: rgi_at uses EAW (janlelis, Nov 19, 2024)
5d13cb1 Improve VS16 matching (janlelis, Nov 19, 2024)
3f93a24 Release v3.1.1 (janlelis, Nov 19, 2024)
d67859e Refactor: Improve code quality, handle overwrite option differently (janlelis, Nov 20, 2024)
16d299a Performance: Speed up if string is only common narrow characters (janlelis, Nov 20, 2024)
fc78784 Performance: Use bytesize for an extra boost when string is only ASCII (janlelis, Nov 20, 2024)
d510942 Use :all regex for Emoji pre-selection (janlelis, Nov 20, 2024)
49f2b74 Release v3.1.2 (janlelis, Nov 20, 2024)
b00c5bf Add link to terminal-emoji-width.rb (janlelis, Nov 21, 2024)
bc47d28 Handle invalid encoded strings (Earlopain, Dec 25, 2024)
a23a070 Merge pull request #28 from Earlopain/invalid-encoding-stuff (janlelis, Dec 26, 2024)
620454c Add Encoding note to README and CHANGELOG (janlelis, Dec 26, 2024)
5ed64f9 CI: Add Ruby 3.4 (janlelis, Dec 26, 2024)
2fbc7a7 CI: Deactivate jruby till jar-dependencies issue is sorted out (janlelis, Dec 26, 2024)
893f9a9 Release v3.1.3 (janlelis, Dec 26, 2024)
dc64170 Fix and improve handling of Skin Tone Modifiers: (janlelis, Jan 13, 2025)
4bcbf6a CI: Add jruby and Ruby 3.4 on Windows (janlelis, Jan 13, 2025)
a515fa2 Release v3.1.4 (janlelis, Jan 13, 2025)
85692f4 Improve README (janlelis, Jan 13, 2025)
6632fe0 Memoize `EmojiSupport.recommended` (Earlopain, Mar 10, 2025)
0aa90fe Merge pull request #30 from Earlopain/memoize-recommend (janlelis, Mar 10, 2025)
1352b28 Release v3.1.5 (janlelis, Aug 15, 2025)
8965d62 Unicode 17 (janlelis, Sep 7, 2025)
2153285 Release v3.2.0 (janlelis, Sep 9, 2025)
2 changes: 2 additions & 0 deletions .github/workflows/test.yml
@@ -9,6 +9,7 @@ jobs:
strategy:
matrix:
ruby:
- '3.4'
- '3.3'
- '3.2'
- '3.1'
@@ -36,6 +37,7 @@ jobs:
strategy:
matrix:
ruby:
- '3.4'
- '3.3'
- '3.2'
- '3.1'
48 changes: 39 additions & 9 deletions CHANGELOG.md
@@ -1,25 +1,55 @@
# CHANGELOG

## 3.1.0 (unreleased)
## 3.2.0

**Further Emoji improvements:**
- Unicode 17.0

## 3.1.5

- Cache Emoji support level for performance reasons #30, patch by @Earlopain

## 3.1.4

- Fix that skin tone modifiers were ignored when used in a non-ZWJ sequence
context (= single emoji char + modifier) #29
- Add more docs and specs about modifier handling

## 3.1.3

Better handling of non-UTF-8 strings, patch by @Earlopain:

- Data with *BINARY* encoding is interpreted as UTF-8, if possible
- Use `invalid: :replace` and `undef: :replace` options when converting to UTF-8

## 3.1.2

- Performance improvements

## 3.1.1

- Performance improvements

## 3.1.0

**Improve Emoji support:**

- Emoji modes: Differentiate between well-formed Emoji (`:possible`) and any
ZWJ/modifier sequence (`:all`). The latter is more common and more efficient
to implement.
- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
- Unify `rgi_*` options to just `rgi` to keep things simpler (corresponds to
- Unify `:rgi_{fqe,mqe,uqe}` options to just `:rgi` to keep things simpler (corresponds to
the former `:rgi_uqe` option). Most terminals that want to support the RGI set
will probably want to catch Emoji sequences with missing VS16s.
- Add new `:all_no_vs16` mode
- Only consider terminal cells needed when recommending Emoji support level
- Add new `:all_no_vs16` and `:rgi_at` modes to be able to support some terminals
that need these quirks
- Add alias `emoji: :auto` for `emoji: true` and `emoji: :none` for `emoji: false`
- `:auto` mode: Only consider terminal cells when recommending Emoji support level
(Emoji themselves might display differently)
- Set default Emoji mode for unknown/unsupported terminals to `:none`
(instead of `:basic`)

- `:auto` mode: Set default Emoji mode for unknown/unsupported terminals to `:none`
- Rename `:basic` mode to `:vs16`

## 3.0.1


- Add WezTerm and foot as good Emoji terminals

## 3.0.0
57 changes: 33 additions & 24 deletions README.md
@@ -1,8 +1,8 @@
# Unicode::DisplayWidth [![[version]](https://badge.fury.io/rb/unicode-display_width.svg)](https://badge.fury.io/rb/unicode-display_width) [<img src="https://github.com/janlelis/unicode-display_width/workflows/Test/badge.svg" />](https://github.com/janlelis/unicode-display_width/actions?query=workflow%3ATest)

Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specfication](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth()](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.
Determines the monospace display width of a string in Ruby, which is useful for all kinds of terminal-based applications. The implementation is based on [EastAsianWidth.txt](https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt), the [Emoji specification](https://www.unicode.org/reports/tr51/) and other data, 100% in Ruby. It does not rely on the OS vendor ([wcwidth](https://github.com/janlelis/wcswidth-ruby)) to provide an up-to-date method for measuring string width in terminals.

Unicode version: **16.0.0** (September 2024)
Unicode version: **17.0.0** (September 2025)

## Gem Version 3 — Improved Emoji Support

@@ -71,6 +71,11 @@ Unicode::DisplayWidth.of("·", 1) # => 1
Unicode::DisplayWidth.of("·", 2) # => 2
```

### Encoding Notes

- Data with *BINARY* encoding is interpreted as UTF-8, if possible
- Non-UTF-8 strings are converted to UTF-8 before measuring, using the [`{invalid: :replace, undef: :replace}` options](https://ruby-doc.org/3.3.5/encodings_rdoc.html#label-Encoding+Options)
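
The two notes above could be approximated in plain Ruby as follows. This is a minimal sketch and not the gem's actual implementation: `to_utf8_sketch` is a hypothetical helper name, and it assumes that "if possible" means "the bytes form valid UTF-8". Only core `String` methods are used.

```ruby
# Hypothetical helper (an assumption for illustration, not part of the gem):
# returns a UTF-8 copy of the given string, following the Encoding Notes.
def to_utf8_sketch(string)
  string = string.dup # never mutate the caller's string

  if string.encoding == Encoding::BINARY
    # Try to read the raw bytes as UTF-8 ...
    string.force_encoding(Encoding::UTF_8)
    # ... and go back to BINARY if they are not valid UTF-8
    string.force_encoding(Encoding::BINARY) unless string.valid_encoding?
  end

  return string if string.encoding == Encoding::UTF_8

  # Everything else is transcoded; unconvertible bytes become U+FFFD
  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

binary_utf8 = "\xC3\xA9".b                                    # bytes of "é", tagged BINARY
latin1      = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1) # "café" in ISO-8859-1
binary_junk = "\xFF".b                                         # not valid UTF-8

p to_utf8_sketch(binary_utf8) # => "é"
p to_utf8_sketch(latin1)      # => "café"
p to_utf8_sketch(binary_junk).codepoints
```

The resulting string is always tagged as UTF-8, so any width calculation that follows only has to deal with one encoding.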

### Custom Overwrites

You can overwrite how to handle specific code points by passing a hash (or even a proc) as `overwrite:` parameter:
@@ -96,39 +101,43 @@ There are many Emoji which get constructed by combining other Emoji in a sequenc

Another aspect where terminals disagree is whether Emoji characters which have a text presentation by default (width 1) should be turned into full-width (width 2) when combined with Variation Selector 16 (*U+FE0F*).

Finally, it varies if Skin Tone Modifiers can be applied to all characters or just to those with the "Emoji Base" property.

Emoji Type | Width / Comment
------------|----------------
Basic/Single Emoji character without Variation Selector | No special handling
Basic/Single Emoji character with VS15 (Text) | No special handling
Basic/Single Emoji character with VS16 (Emoji) | 2 (except with `emoji: :none` or `emoji: :all_no_vs16`)
Emoji Sequence | 2 if Emoji belongs to configured Emoji set
Basic/Single Emoji character without Variation Selector | No special handling
Basic/Single Emoji character with VS15 (Text) | No special handling
Basic/Single Emoji character with VS16 (Emoji) | 2 or East Asian Width (see table below)
Single Emoji character with Skin Tone Modifier | 2 unless Emoji mode is `:none` or `:vs16`
Skin Tone Modifier used in isolation or with invalid base | 2 if Emoji mode is `:rgi` / `:rgi_at`
Emoji Sequence | 2 if Emoji belongs to configured Emoji set (see table below)

The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:
#### Emoji Modes

Option | Description | Example Terminals
-------|-------------|------------------
`emoji: true` or `emoji: :auto` | Automatically use recommended Emoji setting for your terminal | -
`emoji: false` or `emoji: :none` | No Emoji adjustments, Emoji characters with VS16 not handled | Gnome Terminal, many older terminals
`emoji: :basic` | Full-width VS16-Emoji, but no width adjustments for Emoji sequences: All partial Emoji treated separately with a width of 2 | ?
`emoji: :rgi` | Full-width VS16-Emoji, all RGI Emoji sequences are considered to have a width of 2 | Apple Terminal
`emoji: :possible`| Full-width VS16-Emoji, all possible/well-formed Emoji sequences are considered to have a width of 2 | ?
`emoji: :all` | Full-width VS16-Emoji, all ZWJ/modifier/keycap sequences have a width of 2, even if they are not well-formed Emoji sequences | foot, Contour
`emoji: :all_no_vs16` | VS16-Emoji not handled, all ZWJ/modifier/keycap sequences to have a width of 2, even if they are not well-formed Emoji sequences | WezTerm
The `emoji:` option can be used to configure which type of Emoji should be considered to have a width of 2 and if VS16-Emoji should be widened. Other sequences are treated as non-combined Emoji, so the widths of all partial Emoji add up (e.g. width of one basic Emoji + one skin tone modifier + another basic Emoji). The following Emoji settings can be used:

`emoji:` Option | VS16-Emoji Width | Emoji Sequences Width / Comment | Example Terminals
----------------|------------------|---------------------------------|------------------
`true` or `:auto` | - | Automatically use recommended Emoji setting for your terminal | -
`:all` | 2 | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | iTerm, foot
`:all_no_vs16` | EAW (1 or 2) | 2 for all ZWJ/modifier/keycap sequences, even if they are not well-formed Emoji sequences | WezTerm
`:possible`| 2 | 2 for all possible/well-formed Emoji sequences | ?
`:rgi` | 2 | 2 for all [RGI Emoji](https://www.unicode.org/reports/tr51/#def_rgi_set) sequences | ?
`:rgi_at` | EAW (1 or 2) | 1 or 2: Like `:rgi`, but Emoji sequences starting with a default-text Emoji have EAW | Apple Terminal
`:vs16` | 2 | 2 * number of partial Emoji (sequences never considered to represent a combined Emoji) | kitty?
`false` or `:none` | EAW (1 or 2) | No Emoji adjustments | gnome-terminal, many older terminals

- *EAW:* East Asian Width
- *RGI Emoji:* Emoji Recommended for General Interchange
- *ZWJ:* Zero-width Joiner: Codepoint `U+200D`, used in many Emoji sequences
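
To make the glossary more concrete, the following sketch uses Ruby's core `String#codepoints` to show where ZWJ (`U+200D`) and VS16 (`U+FE0F`) sit inside Emoji strings. The two sample strings are illustrative choices, and nothing here calls the gem or says anything about the resulting display width.

```ruby
# Family Emoji sequence: man + ZWJ + woman + ZWJ + girl
family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"
zwj_count = family.codepoints.count(0x200D)

# Heavy black heart (text presentation by default) followed by VS16
heart = "\u2764\uFE0F"
has_vs16 = heart.codepoints.include?(0xFE0F)

# List the code points of the family sequence in U+XXXX notation
puts family.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
# prints: U+1F468 U+200D U+1F469 U+200D U+1F467

puts "ZWJ count: #{zwj_count}, heart has VS16: #{has_vs16}"
# prints: ZWJ count: 2, heart has VS16: true
```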

#### Emoji Support in Terminals

Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi` on "Apple_Terminal" or `:none` on Gnome's terminal widget).

Note that Emoji display and number of terminal columns used might differs a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project], which is a great resource that compares various terminal's Unicode/Emoji capabilities.

---
Unfortunately, the level of Emoji support varies a lot between terminals. While some of them are able to display (almost) all Emoji sequences correctly, others fall back to displaying sequences of basic Emoji. When `emoji: true` or `emoji: :auto` is used, the gem will attempt to set the best fitting Emoji setting for you (e.g. `:rgi_at` on "Apple_Terminal" or `false` on Gnome's terminal widget).

To terminal implementors reading this: Although handling Emoji/ZWJ sequences as always having a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwile to implement the `:rgi` option (see table above) and just give those unknown Emoji the space they need? It is painful to implement, I know, but it kind of underlines the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…
Please note that Emoji display and the number of terminal columns used might differ a lot. For example, it might be the case that a terminal does not understand which Emoji to display, but still manages to calculate the proper amount of terminal cells. The automatic Emoji support level per terminal only considers the latter (cursor position), not the actual Emoji image(s) displayed. Please [open an issue](https://github.com/janlelis/unicode-display_width/issues/new) if you notice your terminal application could use a better default value. Also see the [ucs-detect project](https://ucs-detect.readthedocs.io/results.html), which is a great resource that compares various terminals' Unicode/Emoji capabilities. You can visually check how your terminal renders different kinds of Emoji with the [terminal-emoji-width.rb script](https://github.com/janlelis/unicode-display_width/blob/main/misc/terminal-emoji-width.rb).

---
**To terminal implementors reading this:** Although the practice of giving all Emoji/ZWJ sequences a width of 2 (`:all` mode described above) has some advantages, it does not lead to a particularly good developer experience. Since there is always the possibility of well-formed Emoji that are currently not supported (non-RGI / future Unicode) appearing, those sequences will take more cells. Instead of overflowing, cutting off sequences or displaying placeholder-Emoji, could it be worthwhile to implement the `:rgi` option (only known Emoji get width 2) and give those unknown Emoji the space they need? This would support the idea that the meaning of an unknown Emoji sequence can still be conveyed (without messing up the terminal at the same time). Just a thought…

### Usage with String Extension

@@ -179,7 +188,7 @@ See [unicode-x](https://github.com/janlelis/unicode-x) for more Unicode related

## Copyright & Info

- Copyright (c) 2011, 2015-2024 Jan Lelis, https://janlelis.com, released under the MIT
- Copyright (c) 2011, 2015-2025 Jan Lelis, https://janlelis.com, released under the MIT
license
- Early versions based on runpaint's unicode-data interface: Copyright (c) 2009 Run Paint Run Run
- Unicode data: https://www.unicode.org/copyright.html#Exhibit1
Binary file modified data/display_width.marshal.gz
Binary file not shown.