Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Unicode normalization library. (Mirror of Yoshida-san's code base to maintain the RubyGem.)
C Ruby
branch: master

This branch is 17 commits behind blackwinter:master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
tools
.gitignore
MANIFEST
README
extconf.rb
test.rb
unicode.c
unicode.gemspec
unidata.map
ustring.c
ustring.h
wstring.c
wstring.h

README

		   Unicode Library for Ruby
			Version 0.3.0

		       Yoshida Masato


- Introduction

  Unicode string manipulation library for Ruby.
  This library is based on UTR #15 Unicode Normalization Forms(*1).

    *1 <URL:http://www.unicode.org/unicode/reports/tr15/>


- Install

  This can work with ruby-1.8 or later. I recommend you to
  use ruby-1.8.1 or later.

  Make and install usually.
  For example, when Ruby supports dynamic linking on your OS,

    ruby extconf.rb
    make
    make install


- Usage

  If you do not link this module with Ruby statically, 

    require "unicode"

  before using.


- Module Functions

  All parameters of functions must be UTF-8 strings.

  Unicode::strcmp(str1, str2)
  Unicode::strcmp_compat(str1, str2)
    Compare Unicode strings with a normalization.
    strcmp uses the Normalization Form D, strcmp_compat uses
    Normalization Form KD.

  Unicode::decompose(str)
  Unicode::decompose_compat(str)
    Decompose Unicode string. Then the trailing characters
    are sorted in canonical order.
    decompose uses the canonical decomposition,
    decompose_compat uses the compatibility decomposition.
    The decomposition is based on the character decomposition
    mapping in UnicodeData.txt and the Hangul decomposition
    algorithm.

  Unicode::compose(str)
    Compose Unicode string. Before composing, the trailing
    characters are sorted in canonical order.
    The parameter must be decomposed.
    The composition is based on the reverse of the
    character decomposition mapping in UnicodeData.txt,
    CompositionExclusions.txt and the Hangul composition
    algorithm.

  Unicode::normalize_D(str)
  Unicode::normalize_KD(str)
    Normalize Unicode string in form D or form KD.
    These are aliases of decompose/decompose_compat.

  Unicode::normalize_C(str)
  Unicode::normalize_KC(str)
    Normalize Unicode string in form C or form KC.
      normalize_C  = decompose + compose
      normalize_KC = decompose_compat + compose

  Unicode::upcase(str)
  Unicode::downcase(str)
  Unicode::capitalize(str)
    Case conversion functions.
    The mappings that are used by these functions are not normative
    in UnicodeData.txt.

- Bugs

  UTR #15 suggests that the look up for Normalization Form C
  should not be implemented with a hash of string for better
  performance.

  Case conversion functions should reflecte UTR #21.


- Copying

  This extension module is copyrighted free software by
  Yoshida Masato.

  You can redistribute it and/or modify it under the same
  term as Ruby.


- Author

  Yoshida Masato <yoshidam@yoshidam.net>


- History

  Feb 26, 2010 version 0.3.0 fix a capitalize bug and support SpecialCasing
  Dec 29, 2009 version 0.2.0 update for Ruby 1.9.1 and Unicode 5.2
  Sep 10, 2005 version 0.1.2 update unidata.map for Unicode 4.1.0
  Aug 26, 2004 version 0.1.1 update unidata.map for Unicode 4.0.1
  Nov 23, 1999 version 0.1
Something went wrong with that request. Please try again.