Latest commit eb7ec18, Mar 31, 2011, @rboulton: Olly points out that xapian/unicode.h shouldn't be included by external files. Include via xapian.h instead.



This module is a word tokenizer for CJK (Chinese, Japanese, Korean) text that supports n-gram tokenization. It is designed to be used with Xapian and uses Xapian's Unicode routines.

Currently, there is no documentation other than the source code.



  • N-gram tokenization on CJK texts.
  • Conversion from Traditional Chinese to Simplified Chinese, and vice versa.
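As a rough illustration of the n-gram feature listed above, the following is a minimal sketch and not this module's actual code. It assumes well-formed UTF-8 input, uses a small hand-written decoder as a stand-in for Xapian's Unicode routines, and treats only a few common Unicode blocks as CJK. The names `CodePoint`, `decode_utf8`, `is_cjk` and `cjk_ngrams` are hypothetical and do not come from the source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One decoded character plus its position in the original UTF-8 string.
struct CodePoint {
    char32_t value;      // Unicode code point
    std::size_t offset;  // byte offset of the first byte
    std::size_t length;  // number of bytes in the encoding
};

// Hypothetical stand-in for a real UTF-8 iterator. Assumes well-formed
// input and performs no validation beyond clipping at the end of the string.
std::vector<CodePoint> decode_utf8(const std::string& s) {
    std::vector<CodePoint> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)      { cp = c;        len = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
        else               { cp = c & 0x07; len = 4; }
        if (i + len > s.size()) len = s.size() - i;  // truncated tail
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back({cp, i, len});
        i += len;
    }
    return out;
}

// Assumption: only these blocks count as CJK in this sketch.
bool is_cjk(char32_t cp) {
    return (cp >= 0x3040 && cp <= 0x30FF)    // Hiragana and Katakana
        || (cp >= 0x3400 && cp <= 0x4DBF)    // CJK Unified Ideographs Ext. A
        || (cp >= 0x4E00 && cp <= 0x9FFF)    // CJK Unified Ideographs
        || (cp >= 0xAC00 && cp <= 0xD7AF);   // Hangul Syllables
}

// Split each run of CJK characters into overlapping n-grams.
// Non-CJK characters are skipped; a run shorter than n is emitted whole.
std::vector<std::string> cjk_ngrams(const std::string& text, std::size_t n) {
    std::vector<std::string> tokens;
    if (n == 0) return tokens;
    const std::vector<CodePoint> cps = decode_utf8(text);
    std::size_t i = 0;
    while (i < cps.size()) {
        if (!is_cjk(cps[i].value)) { ++i; continue; }
        std::size_t j = i;  // find the end of the CJK run [i, j)
        while (j < cps.size() && is_cjk(cps[j].value)) ++j;
        const std::size_t width = (j - i < n) ? (j - i) : n;
        for (std::size_t k = i; k + width <= j; ++k) {
            const CodePoint& last = cps[k + width - 1];
            tokens.push_back(text.substr(
                cps[k].offset, last.offset + last.length - cps[k].offset));
        }
        i = j;
    }
    return tokens;
}
```

With n = 2, a run of four CJK characters yields three overlapping bigrams (characters 1-2, 2-3 and 3-4). Overlapping n-grams are a common choice for CJK text because words are not separated by spaces, so indexing every adjacent pair lets a search match a word wherever it starts.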


This project was taken from , but then modified to use Xapian's internal unicode routines.
