forked from mandel59/unifill

Haxe library for Unicode string support

Unifill

Shim your code to support Unicode across all platforms.

Targets: PHP, Python, Java, C#, JS/Node, Interp, Neko, HashLink, Lua, C++

Usage

The declaration using unifill.Unifill; adds methods whose names start with u to String instances. Replace the String methods in your code with their Unifill counterparts, and your code will handle Unicode strings consistently across all platforms.

using unifill.Unifill;
import unifill.CodePoint;

class Main {
  public static function main() : Void {
    trace("日本語".uLength()); // ==> 3
    trace("русский".uCharAt(5)); // ==> и
    trace("🍺".uCodePointAt(0).toInt()); // ==> 127866
    trace(CodePoint.fromInt(0x1F37B)); // ==> 🍻
    for (c in "♠♡♢♣".uIterator()) {
      trace(c);
      trace(c + 4);
    }
  }
}

Iteration

You might write for loops like this:

function f(s : String) : Void {
  for (i in 0...s.uLength()) {
    trace(s.uCharAt(i));
  }
}

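But this can be inefficient: on targets with a variable-length encoding, each uCharAt(i) call may have to scan the string from the start, so f(s) takes time proportional to the square of the length of s.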

Instead, you can use uIterator to make the function linear time:

function f(s : String) : Void {
  for (c in s.uIterator()) {
    trace(c.toString());
  }
}

uIterator iterates over each code point in the string.

InternalEncoding

For advanced usage, you can use InternalEncoding, which provides methods for working with the target's variable-length encoding without having to know which encoding form is actually used.

These methods index by code units, not code points. As a result, the value of InternalEncoding.charAt("эюя", 2) varies depending on the target environment: the Neko target gives "ю", while the other targets give "я".

InternalEncoding.codePointWidthAt returns the number of code units that make up the code point at the given index, so the following expression gives "ю" on every platform:

InternalEncoding.charAt("эюя", InternalEncoding.codePointWidthAt("эюя", 0))
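The same idea extends to walking a whole string: advance a code-unit offset by the width of each code point. The following is a minimal sketch that uses only the methods shown above; the class name Walk and the function chars are hypothetical names chosen for illustration, and it assumes codePointWidthAt accepts any offset that lies at the start of a code point.

```haxe
using unifill.Unifill;
import unifill.InternalEncoding;

class Walk {
  // Collects each character of s by stepping through code-unit offsets.
  // (Walk and chars are illustrative names, not part of Unifill.)
  public static function chars(s : String) : Array<String> {
    var result : Array<String> = [];
    var offset = 0; // position in code units, not in code points
    for (i in 0...s.uLength()) {
      // Read the character that starts at the current code-unit offset.
      result.push(InternalEncoding.charAt(s, offset));
      // Skip as many code units as that code point occupies on this target.
      offset += InternalEncoding.codePointWidthAt(s, offset);
    }
    return result;
  }

  public static function main() : Void {
    trace(Walk.chars("эюя"));
  }
}
```

Because the offset always advances by the width reported for the current target, the function should return the same three characters whether the target's code units are bytes or 16-bit units. For plain iteration, uIterator remains the simpler choice.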

Target Notes

  • Some targets break, in some cases silently, when handling the null character.
  • The Lua target has not been tested.
