For example, emoji characters like "🌷" are each represented by two JavaScript characters (a surrogate pair of UTF-16 code units):
"🌷".length == 2
Combined emoji are even longer:
"🏳️‍🌈".length == 6 // U+1F3F3 U+FE0F U+200D U+1F308
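The counts above are UTF-16 code units, which is what String.prototype.length reports. As a rough illustration, the sketch below compares code units with code points (Array.from iterates a string by code point). The helper names are hypothetical and not part of grapheme-splitter, and escape sequences are used so the example does not depend on how the emoji are rendered or copied.

```javascript
// Hypothetical helpers for illustration; not part of grapheme-splitter.
function countCodeUnits(str) {
  return str.length; // UTF-16 code units
}

function countCodePoints(str) {
  return Array.from(str).length; // the string iterator walks code points
}

var tulip = "\uD83C\uDF37"; // U+1F337 TULIP as a surrogate pair
// WHITE FLAG + VARIATION SELECTOR-16 + ZERO WIDTH JOINER + RAINBOW
var rainbowFlag = "\uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08";

console.log(countCodeUnits(tulip));        // 2
console.log(countCodePoints(tulip));       // 1
console.log(countCodeUnits(rainbowFlag));  // 6
console.log(countCodePoints(rainbowFlag)); // 4
```

Even counting code points leaves the flag at 4 rather than the single symbol a user sees, which is the gap that grapheme clusters fill.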
What's more, many languages use combining marks: characters that modify the letter before them. Common examples are the German letter ü and the Spanish letter ñ. Such letters can often be represented either as a single precomposed character or as a base letter followed by a combining mark, and both forms are equally valid:
var two = "\u006E\u0303"; // unnormalized two-char form: n + combining tilde
var one = "\u00F1";       // normalized single-char form: precomposed ñ
console.log(one != two);  // prints 'true'
Unicode normalization, as performed by the popular punycode.js library or ECMAScript 6's String.prototype.normalize, can sometimes fix those differences and turn two-character sequences into single characters. But it is not enough in all cases. Some languages, such as Hindi, make extensive use of combining marks on their letters, and these combinations have no dedicated single-code-point representation in Unicode because of the sheer number of possible combinations. For example, the Hindi word "अनुच्छेद" is composed of 5 letters and 3 combining marks (8 code points in total):
अ + न + ु + च + ् + छ + े + द
which is in fact just 5 user-perceived letters:
अ + नु + च् + छे + द
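To make the limits of normalization concrete, here is a minimal sketch using the standard String.prototype.normalize with the NFC form. It assumes an ES6-capable runtime, and the variable names are illustrative only. The Hindi word is written with escapes so that each code point is explicit.

```javascript
// NFC composes n + combining tilde into the single precomposed code point.
var decomposed = "\u006E\u0303";
var composed = decomposed.normalize("NFC");
console.log(composed === "\u00F1"); // true
console.log(composed.length);       // 1

// The Hindi word from the text, one escape per code point.
var word = "\u0905\u0928\u0941\u091A\u094D\u091B\u0947\u0926";
var normalizedWord = word.normalize("NFC");
console.log(word.length);           // 8
console.log(normalizedWord.length); // 8: these marks have no precomposed forms
```

Normalization leaves the word at 8 code points, so counting its user-perceived letters still requires grapheme cluster segmentation.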
You can use the index.js file directly as-is, or you can install grapheme-splitter into your project using the npm command below:
$ npm install --save grapheme-splitter
To run the tests on grapheme-splitter, use the command below:
$ npm test
Just initialize and use:
var GraphemeSplitter = require('grapheme-splitter'); // when installed via npm
var splitter = new GraphemeSplitter();

// split the string into an array of grapheme clusters (one string each)
var graphemes = splitter.splitGraphemes(string);

// iterate the string as an iterable iterator of grapheme clusters (one string each)
var graphemeIterator = splitter.iterateGraphemes(string);

// or do this if you just need their number
var graphemeCount = splitter.countGraphemes(string);
Grapheme splitter includes TypeScript declarations.
import GraphemeSplitter = require('grapheme-splitter')

const splitter = new GraphemeSplitter()

// splitGraphemes returns an array of grapheme clusters, hence string[]
const split: string[] = splitter.splitGraphemes('Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞')