
uniscribe | Describe the Unicode

Describes Unicode characters by name and shows their compositions.

  • Helps you understand how glyphs and codepoints are structured within the data
  • Gives you the names of glyphs and codepoints, which can be used for further research
  • Highlights invalid/special/blank codepoints

Uses color coding similar to that of its lower-level companion tool, unibits.

Setup

Make sure Ruby is installed and that installing gems works properly. Then run:

$ gem install uniscribe

Usage

Pass the string you want to debug to uniscribe:

From CLI

$ uniscribe "test strı̈ng"

From Ruby

require "uniscribe/kernel_method"
uniscribe "test strı̈ng"

Output


0074 ├─ t		├─ LATIN SMALL LETTER T
0065 ├─ e		├─ LATIN SMALL LETTER E
0073 ├─ s		├─ LATIN SMALL LETTER S
0074 ├─ t		├─ LATIN SMALL LETTER T
0020 ├─ ] [		├─ SPACE
0073 ├─ s		├─ LATIN SMALL LETTER S
0074 ├─ t		├─ LATIN SMALL LETTER T
0072 ├─ r		├─ LATIN SMALL LETTER R
---- ├┬ ı̈		├┬ Composition
0131 │├─ ı		│├─ LATIN SMALL LETTER DOTLESS I
0308 │└─ ◌̈		│└─ COMBINING DIAERESIS
006E ├─ n		├─ LATIN SMALL LETTER N
0067 ├─ g		├─ LATIN SMALL LETTER G
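The codepoint/branch skeleton of this tree can be approximated in plain Ruby. The following is a minimal sketch, not uniscribe's implementation: it assumes Ruby's `/\X/` regex (extended grapheme clusters) as the grouping rule, and `codepoint_tree` is a hypothetical helper name. Character names and the special rendering of blanks are left out, because Ruby's standard library cannot look up character names.

```ruby
# Sketch only (hypothetical helper, not uniscribe's code): rebuild the
# codepoint/branch skeleton of uniscribe's output using Ruby's /\X/
# grapheme matching. Names are omitted; the stdlib has no name lookup.
def codepoint_tree(string)
  lines = []
  string.scan(/\X/).each do |grapheme|
    codepoints = grapheme.codepoints
    if codepoints.size == 1
      lines << format("%04X ├─ %s", codepoints.first, grapheme)
      next
    end
    # Graphemes made of several codepoints get a "composition" branch.
    lines << "---- ├┬ #{grapheme}"
    codepoints.each_with_index do |cp, index|
      char = cp.chr(Encoding::UTF_8)
      char = "◌#{char}" if char.match?(/\p{M}/) # show marks on a dotted circle
      branch = index == codepoints.size - 1 ? "└─" : "├─"
      lines << format("%04X │%s %s", cp, branch, char)
    end
  end
  lines
end

puts codepoint_tree("test str\u0131\u0308ng")
```

Splitting by grapheme first and only then by codepoint is what lets the dotless i and its combining diaeresis appear as one branch with two leaves.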

Examples

Tamil

>> uniscribe "நகரத்தில்"

Screenshot Tamil

Thai

>> uniscribe "ม้าลายหกตัว"

Screenshot Thai

Ideographic Variations

>> uniscribe "辻󠄀㚑󠄁"

Screenshot Ideographic Variations

(The variation is not visible in the screenshot because my system does not render it correctly.)
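An ideographic variation sequence is a Han character followed by one of the variation selectors VS17 to VS256 (U+E0100 to U+E01EF). Below is a minimal sketch for spotting such sequences. It assumes Ruby's `\p{Han}` script property, and `ideographic_variations` is a hypothetical helper, not part of uniscribe.

```ruby
# Hypothetical pattern: a Han character followed by a supplementary
# variation selector (VS17..VS256 = U+E0100..U+E01EF).
IVS = /(\p{Han})([\u{E0100}-\u{E01EF}])/

def ideographic_variations(string)
  string.scan(IVS).map do |base, selector|
    vs_number = selector.ord - 0xE0100 + 17 # U+E0100 is VS17
    format("U+%04X with VS%d", base.ord, vs_number)
  end
end

p ideographic_variations("\u8FBB\u{E0100}\u3691\u{E0101}")
# => ["U+8FBB with VS17", "U+3691 with VS18"]
```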

Emoji Sequences

>> uniscribe "3️⃣🤸‍♀"

Screenshot Emoji
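Emoji sequences like these are built from ordinary codepoints joined by invisible helpers, such as U+FE0F VARIATION SELECTOR-16, U+20E3 COMBINING ENCLOSING KEYCAP and U+200D ZERO WIDTH JOINER. The following small sketch lists the parts of a sequence. The `GLUE` table and `emoji_parts` are hypothetical and cover only those three codepoints, whereas uniscribe uses full Unicode names.

```ruby
# Hypothetical lookup table: only the "glue" codepoints commonly found
# in keycap and ZWJ emoji sequences.
GLUE = {
  0x200D => "ZERO WIDTH JOINER",
  0xFE0F => "VARIATION SELECTOR-16",
  0x20E3 => "COMBINING ENCLOSING KEYCAP",
}.freeze

# Lists each codepoint, naming the glue codepoints and printing the rest as-is.
def emoji_parts(string)
  string.each_char.map do |char|
    format("%04X %s", char.ord, GLUE.fetch(char.ord, char))
  end
end

puts emoji_parts("3\uFE0F\u20E3")         # keycap sequence
puts emoji_parts("\u{1F938}\u200D\u2640") # ZWJ sequence
```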

Lots of Combining Marks

>> uniscribe "̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍"

Screenshot Marks
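Any number of combining marks can stack onto a single base character, and all of them remain inside one grapheme cluster. Below is a sketch that counts the marks in each cluster. It assumes `/\X/` for clusters and `\p{M}` (any combining mark) for marks, and `mark_counts` is a hypothetical name. Marks at the very start of a string, with no base character, end up in a cluster of their own.

```ruby
# Sketch: for each grapheme cluster, report its first codepoint (hex)
# and how many combining marks (\p{M}) it contains.
def mark_counts(string)
  string.scan(/\X/).map do |grapheme|
    [format("%04X", grapheme.ord), grapheme.scan(/\p{M}/).size]
  end
end

p mark_counts("C\u0337\u0319O\u036E")
# => [["0043", 2], ["004F", 1]]
```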

Random Sequences of Special Unicode Codepoints

>> uniscribe "\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}🏴\u{E0061}\u{E007F}\u{10FFFF}"

Screenshot Strange
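Most codepoints in this example are invisible or non-printing. Ruby's general category properties give a rough classification, sketched below. The `CATEGORIES` table and `classify` helper are hypothetical, and uniscribe's own highlighting may draw finer distinctions. For example, noncharacters such as U+10FFFF and genuinely unassigned codepoints both fall under `Cn` here.

```ruby
# Hypothetical category table, checked in order; the first match wins.
CATEGORIES = [
  ["Cc control", /\p{Cc}/],
  ["Cf format", /\p{Cf}/],
  ["Mn nonspacing mark", /\p{Mn}/],
  ["Cn unassigned or noncharacter", /\p{Cn}/],
  ["Co private use", /\p{Co}/],
].freeze

# Labels every codepoint (not grapheme) with its rough category.
def classify(string)
  string.each_char.map do |char|
    label, _pattern = CATEGORIES.find { |_, pattern| char.match?(pattern) }
    format("%04X %s", char.ord, label || "other")
  end
end

puts classify("\0A\u{E01D7}\x7F\r\n\u{D0000}\u{81}\u{FFF9}B\u{FFFB}")
```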

Some Blanks

>> uniscribe "­ᅠ 𝅸"

Screenshot Blanks

Notes

Correct detection of compositions (graphemes / combined characters) depends on the Unicode version supported by your Ruby:

Ruby version    Unicode version
2.5             10.0.0
2.4             9.0.0
2.3             8.0.0
2.2             7.0.0
2.1             6.1.0
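To check which Unicode version your Ruby was built with, and how it splits a string into grapheme clusters, you can use something like the sketch below. It assumes `RbConfig::CONFIG` exposes a `"UNICODE_VERSION"` key. Older Rubies may not provide this key, so the code falls back to "unknown". The helper names are hypothetical.

```ruby
require "rbconfig"

# Unicode version this Ruby was built with; nil if the key is absent.
def ruby_unicode_version
  RbConfig::CONFIG["UNICODE_VERSION"]
end

# Number of grapheme clusters (user-perceived characters) per /\X/.
def grapheme_count(string)
  string.scan(/\X/).size
end

puts "Ruby #{RUBY_VERSION}, Unicode #{ruby_unicode_version || 'unknown'}"
puts grapheme_count("str\u0131\u0308ng") # => 6
```

On Ruby 2.5 and later, `String#grapheme_clusters` returns the same clusters without needing a regex.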

Also see

Copyright (C) 2017-2018 Jan Lelis http://janlelis.com. Released under the MIT license.