ICU Unicode Transliteration and Collation in Ruby.
Unicode sorting is complicated, and Ruby's built-in string comparison doesn't do it correctly. There is, however, a widely used implementation of the Unicode Collation Algorithm in the ICU (International Components for Unicode) libraries. Ruby also has no built-in way to do transliteration. This gem is a simple C wrapper around ucol_getSortKey from the ICU Collation API and utrans_transUChars from the ICU Transliteration API. Both are exposed as simple methods on String.


["cafe", "cafes", "caf\303\251"].sort
=> ["cafe", "cafes", "caf\303\251"]

require 'icunicode'

["cafe", "cafes", "caf\303\251"].sort_by {|s| s.unicode_sort_key}
=> ["cafe", "caf\303\251", "cafes"]

=> "burueberrui"

=> "blyeberry"
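To see why the plain sort in the first example puts the accented word last, it helps to look at the bytes. The following is a minimal sketch in plain Ruby (no gem required), assuming UTF-8 strings; the variable names are illustrative only. String comparison in Ruby works on the underlying bytes, and the two-byte UTF-8 encoding of U+00E9 starts with 0xC3, which is larger than any ASCII letter.

```ruby
# Plain Ruby only; this does not use icunicode.
words = ["cafe", "cafes", "caf\u00e9"]

# Show the UTF-8 bytes of each word. U+00E9 is encoded as 0xC3 0xA9
# (195, 169), while ASCII "e" is a single byte, 0x65 (101).
words.each { |w| puts "#{w}: #{w.bytes.inspect}" }

# String#<=> compares byte by byte, so at index 3 "cafes" (101)
# sorts before the accented word (195).
byte_sorted = words.sort
puts byte_sorted.inspect
```

A collation sort key avoids this by mapping each string to a byte sequence whose binary order matches the expected linguistic order, which is why the second example sorts by unicode_sort_key rather than by the strings themselves.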


You must install ICU first. You can build it from source, or on a Mac you can install it with MacPorts:

sudo port install icu

Then install the gem:

sudo gem install icunicode
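After installing, a quick check along these lines can confirm that the gem loads. This is only a sketch: it assumes the gem is loaded with require 'icunicode' and adds String#unicode_sort_key, as in the sorting example earlier in this README. The icu_loaded variable is a name made up for illustration.

```ruby
# Post-install sanity check (a sketch, not part of the gem).
begin
  require 'icunicode'
  icu_loaded = true
rescue LoadError => e
  # Raised when the gem, or the ICU library it links against, is missing.
  icu_loaded = false
  warn "icunicode could not be loaded: #{e.message}"
end

if icu_loaded && "".respond_to?(:unicode_sort_key)
  puts "icunicode is installed and String#unicode_sort_key is available"
else
  puts "icunicode is not available; check that ICU and the gem are installed"
end
```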

To do:

Add support for locales other than en-US.
Increase buffer size or make it grow dynamically.


Copyright © 2009 Justin Balthrop. Published under the MIT License; see LICENSE.

