Unicode-aware substring for JavaScript

unicode-substring

Unicode-aware substring for JavaScript. Surrogate pairs are counted as a single character.

What?

JavaScript strings expose their contents as 16-bit code units, an encoding historically known as UCS-2. This is usually good enough, but since Unicode defines more than 2^16 code points, 16 bits are not enough to represent every character. To overcome this limitation, characters with a code point above 0xFFFF (up to the Unicode maximum of 0x10FFFF) are encoded as surrogate pairs, that is, as two 16-bit code units each. This encoding is known as UTF-16.
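As a small illustration of the point above, the following uses only built-in string methods to show how one character outside the 16-bit range appears as two code units. The variable names are chosen for this example only.

```javascript
// U+1F4A5 (💥) lies above 0xFFFF, so UTF-16 stores it as a surrogate pair.
const boom = "💥";

const lengthInCodeUnits = boom.length;               // 2
const highSurrogate = boom.charCodeAt(0);            // 0xD83D
const lowSurrogate = boom.charCodeAt(1);             // 0xDCA5
const codePoint = boom.codePointAt(0);               // 0x1F4A5
const lengthInCodePoints = Array.from(boom).length;  // 1

// The built-in substring counts code units, so it can cut a pair in half,
// leaving a lone high surrogate that does not display as a character.
const broken = "💥Emoji".substring(0, 1);            // "\uD83D"
```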

The purpose of this library is to treat each surrogate pair as one character when extracting substrings from a string. This is preferable when indices come from a Unicode-aware environment that counts code points rather than UTF-16 code units.
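One way to picture the idea is the minimal sketch below. It is not this library's actual implementation: `codePointSubstring` is a hypothetical helper written for illustration, and it assumes `0 <= start <= end` without handling the other edge cases of `String.prototype.substring`.

```javascript
// Hypothetical helper for illustration; not the code in index.js.
function codePointSubstring(string, start, end) {
  // The string iterator used by Array.from yields whole code points,
  // so each surrogate pair becomes a single array element.
  const codePoints = Array.from(string);
  return codePoints.slice(start, end).join("");
}

const byCodePoint = codePointSubstring("💥Emoji Rule💥", 0, 6); // "💥Emoji"
const byCodeUnit = "💥Emoji Rule💥".substring(0, 6);            // "💥Emoj"
```

The built-in call returns one visible character fewer because the leading emoji occupies two of the six code units.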

Usage

var unicodeSubstring = require('unicode-substring')
// unicodeSubstring(string, start, end)
unicodeSubstring("💥Emoji Rule💥", 0, 6)
// => "💥Emoji"

The start and end parameters behave like those of String.prototype.substring.
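For reference, these are the argument rules of the built-in `String.prototype.substring`, shown on an ASCII string. Whether `unicodeSubstring` matches each of these edge cases exactly has not been verified here, so treat this as a reminder of the built-in's behavior rather than a specification of the library.

```javascript
const text = "abcdef";

const swapped = text.substring(4, 1);    // "bcd": arguments are swapped when start > end
const negative = text.substring(-2, 3);  // "abc": negative values are treated as 0
const openEnded = text.substring(2);     // "cdef": an omitted end means "to the end"
const clamped = text.substring(2, 100);  // "cdef": values past the length are clamped
```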
