Monospace Unicode character width in Ruby
Latest commit 1f17fe7 Jun 5, 2018

Unicode::DisplayWidth

Determines the monospace display width of a string in Ruby. The implementation is based on EastAsianWidth.txt and other data and is written 100% in Ruby. Unlike wcwidth(), which serves a similar purpose, it does not rely on the OS vendor to provide an up-to-date method for measuring string width.

Unicode version: 11.0.0

Supported Rubies: 2.5, 2.4, 2.3

Old Rubies that might still work: 2.2, 2.1, 2.0, 1.9

Introduction to Character Widths

Guessing how much space a character will consume on terminals is not easy. There is no single standard. Most implementations combine data from East Asian Width, some General Categories, and hand-picked adjustments.

How this Library Handles Widths

Rules closer to the top of the table take precedence over those below. Please expect changes to this algorithm with every MINOR version update (the X in 1.X.0)!

| Width | Characters | Comment |
|-------|------------|---------|
| X | (user defined) | Overwrites any other values |
| -1 | "\b" | Backspace (total width never below 0) |
| 0 | "\0", "\x05", "\a", "\n", "\v", "\f", "\r", "\x0E", "\x0F" | C0 control codes that do not change horizontal width |
| 1 | "\u{00AD}" | SOFT HYPHEN |
| 2 | "\u{2E3A}" | TWO-EM DASH |
| 3 | "\u{2E3B}" | THREE-EM DASH |
| 0 | General Categories: Mn, Me, Cf (non-arabic) | Excludes ARABIC format characters |
| 0 | "\u{1160}".."\u{11FF}" | HANGUL JUNGSEONG |
| 0 | "\u{2060}".."\u{206F}", "\u{FFF0}".."\u{FFF8}", "\u{E0000}".."\u{E0FFF}" | Ignorable ranges |
| 2 | East Asian Width: F, W | Full-width characters |
| 2 | "\u{3400}".."\u{4DBF}", "\u{4E00}".."\u{9FFF}", "\u{F900}".."\u{FAFF}", "\u{20000}".."\u{2FFFD}", "\u{30000}".."\u{3FFFD}" | Full-width ranges |
| 1 or 2 | East Asian Width: A | Ambiguous characters, user defined, default: 1 |
| 1 | All other codepoints | - |
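
The precedence order in the table can be illustrated with a small, self-contained sketch. This is not the gem's implementation: the method names `sketch_char_width` and `sketch_display_width` are hypothetical, and the sketch covers only the rules that can be expressed without the Unicode data files (it omits the East Asian Width F, W, and A lookups, the Cf category, and the ignorable and HANGUL JUNGSEONG ranges).

```ruby
# Simplified sketch of a precedence-ordered width lookup (NOT the gem's code).
# Rules are checked top to bottom, mirroring the order of the table.

# C0 control codes that do not change horizontal width
ZERO_WIDTH_CONTROLS = [0x00, 0x05, 0x07, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F].freeze

# SOFT HYPHEN, TWO-EM DASH, THREE-EM DASH
SPECIAL_WIDTHS = { 0x00AD => 1, 0x2E3A => 2, 0x2E3B => 3 }.freeze

# The explicit full-width ranges listed in the table
WIDE_RANGES = [
  0x3400..0x4DBF, 0x4E00..0x9FFF, 0xF900..0xFAFF,
  0x20000..0x2FFFD, 0x30000..0x3FFFD
].freeze

# Hypothetical helper: width of a single character.
def sketch_char_width(char, overwrite = {})
  codepoint = char.ord
  return overwrite[codepoint] if overwrite.key?(codepoint) # user-defined values win
  return -1 if codepoint == 0x08                           # backspace
  return 0 if ZERO_WIDTH_CONTROLS.include?(codepoint)
  return SPECIAL_WIDTHS[codepoint] if SPECIAL_WIDTHS.key?(codepoint)
  return 0 if char.match?(/\p{Mn}|\p{Me}/)                 # combining marks
  return 2 if WIDE_RANGES.any? { |range| range.cover?(codepoint) }
  1                                                        # all other codepoints
end

# Hypothetical helper: width of a whole string, never below 0.
def sketch_display_width(string, overwrite = {})
  total = string.each_char.sum { |char| sketch_char_width(char, overwrite) }
  [total, 0].max
end
```

Because each rule returns as soon as it matches, a user-defined value for a codepoint shadows every rule below it, which is the behavior the first table row describes.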

Install

Install the gem with:

$ gem install unicode-display_width

Or add to your Gemfile:

gem 'unicode-display_width'

Usage

require 'unicode/display_width'

Unicode::DisplayWidth.of("⚀") # => 1
Unicode::DisplayWidth.of("一") # => 2
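
A common reason to measure display width is to align text in terminal columns. The following is a minimal sketch under stated assumptions: `pad_to_columns` is a hypothetical helper (not part of the gem), and it takes the measuring function as a parameter so the example runs without the gem installed. The stand-in lambda below only knows about one wide character and exists purely for demonstration.

```ruby
# Hypothetical helper: pads `text` on the right with spaces until it
# occupies `target` terminal columns. `measure` is any callable that
# returns the display width of a string.
def pad_to_columns(text, target, measure)
  missing = target - measure.call(text)
  missing > 0 ? text + (" " * missing) : text
end

# Stand-in measurer for demonstration only: counts U+4E00 as 2 columns
# and every other character as 1.
demo_measure = ->(string) { string.each_char.sum { |c| c == "\u{4E00}" ? 2 : 1 } }

puts "[" + pad_to_columns("\u{4E00}", 4, demo_measure) + "]"
puts "[" + pad_to_columns("ab", 4, demo_measure) + "]"
```

With the gem loaded, `Unicode::DisplayWidth.method(:of)` could be passed as `measure` in place of the stand-in lambda.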

Ambiguous Characters

The second parameter sets the width returned for characters classified as ambiguous:

Unicode::DisplayWidth.of("·", 1) # => 1
Unicode::DisplayWidth.of("·", 2) # => 2

Custom Overwrites

You can overwrite how specific code points are handled by passing a hash (or even a proc) as the third parameter:

Unicode::DisplayWidth.of("a\tb", 1, 0x09 => 10) # => 12

Emoji Support

Experimental emoji support is included. It will adjust the string's size for modifier and zero-width joiner sequences. You will need to add the unicode-emoji gem to your Gemfile:

gem 'unicode-display_width'
gem 'unicode-emoji'

You can then activate the emoji string width adjustments by passing emoji: true as the fourth parameter:

Unicode::DisplayWidth.of "🤾🏽‍♀️" # => 5
Unicode::DisplayWidth.of "🤾🏽‍♀️", 1, {}, emoji: true # => 2

Usage with String Extension

The string extension is activated by default. It will be deactivated in version 2.0:

require 'unicode/display_width/string_ext'

"⚀".display_width #=> 1
'一'.display_width #=> 2

You can explicitly opt out of the string extension with: require 'unicode/display_width/no_string_ext'

Usage From the CLI

Use this one-liner to print the display width of a string from the command line:

$ gem install unicode-display_width
$ ruby -r unicode/display_width -e 'puts Unicode::DisplayWidth.of $*[0]' -- "一"

Replace "一" with the actual string to measure.

Other Implementations & Discussion

See unicode-x for more Unicode related micro libraries.

Copyright & Info