A MySQL compatible normalizer plugin for Groonga
Latest commit 07ccfbe Nov 28, 2016 @kou kou Use Ruby 2.3.1

README

Name

groonga-normalizer-mysql

Description

groonga-normalizer-mysql is a Groonga plugin. It provides MySQL compatible normalizers and custom normalizers for Groonga.

Here are MySQL compatible normalizers:

  • NormalizerMySQLGeneralCI for utf8mb4_general_ci
  • NormalizerMySQLUnicodeCI for utf8mb4_unicode_ci
  • NormalizerMySQLUnicode520CI for utf8mb4_unicode_520_ci

Here are custom normalizers:

  • NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark
    • It's based on NormalizerMySQLUnicodeCI
  • NormalizerMySQLUnicode520CIExceptKanaCIKanaWithVoicedSoundMark
    • It's based on NormalizerMySQLUnicode520CI

Their names are self-descriptive but long. They are variants of NormalizerMySQLUnicodeCI and NormalizerMySQLUnicode520CI that behave differently in the ways listed below. The differences are described in terms of NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark, but they also hold for NormalizerMySQLUnicode520CIExceptKanaCIKanaWithVoicedSoundMark.

  • NormalizerMySQLUnicodeCI normalizes all small Hiragana such as ぁ, っ to Hiragana such as あ, つ. NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark doesn't normalize ぁ to あ nor っ to つ. ぁ and あ are different characters. っ and つ are also different characters. This behavior is described by ExceptKanaCI in the long name. The following behaviors are described by the KanaWithVoicedSoundMark part of the long name, which the leading Except also covers.
  • NormalizerMySQLUnicodeCI normalizes all Hiragana with voiced sound mark such as が to Hiragana without voiced sound mark such as か. NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark doesn't normalize が to か. が and か are different characters.
  • NormalizerMySQLUnicodeCI normalizes all Hiragana with semi-voiced sound mark such as ぱ to Hiragana without semi-voiced sound mark such as は. NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark doesn't normalize ぱ to は. ぱ and は are different characters.
  • NormalizerMySQLUnicodeCI normalizes all Katakana with voiced sound mark such as ガ to Katakana without voiced sound mark such as カ. NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark doesn't normalize ガ to カ. ガ and カ are different characters.
  • NormalizerMySQLUnicodeCI normalizes all Katakana with semi-voiced sound mark such as パ to Katakana without semi-voiced sound mark such as ハ. NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark doesn't normalize パ to ハ. パ and ハ are different characters.
  • NormalizerMySQLUnicodeCI normalizes all halfwidth Katakana with voiced sound mark such as ｶﾞ to halfwidth Katakana without voiced sound mark such as ｶ. NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark normalizes all halfwidth Katakana with voiced sound mark such as ｶﾞ to fullwidth Katakana with voiced sound mark such as ガ.

NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark and NormalizerMySQLUnicode520CIExceptKanaCIKanaWithVoicedSoundMark are not MySQL compatible, but they are useful for Japanese text. For example, ふらつく and ブラック have different meanings. NormalizerMySQLUnicodeCI treats ふらつく and ブラック as identical, but NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark distinguishes them.
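
The effect on ふらつく and ブラック can be illustrated with a small Python sketch. It is a toy model, not the plugin's implementation: it does not use the MySQL collation tables, and the function name `fold_kana` and its flags are hypothetical. It also assumes that Katakana is folded to Hiragana in both variants, which this README does not state for the custom normalizers.

```python
import unicodedata

# Small Hiragana and their regular-size counterparts. Katakana is folded to
# Hiragana before this table is applied, so Hiragana entries are enough.
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")

# Combining voiced (U+3099) and semi-voiced (U+309A) sound marks.
SOUND_MARKS = {"\u3099", "\u309a"}


def fold_kana(text, fold_small_kana=True, fold_sound_marks=True):
    """Toy approximation of kana folding; NOT the real collation table."""
    # NFKC turns halfwidth Katakana into fullwidth Katakana.
    text = unicodedata.normalize("NFKC", text)
    # Assumption of this sketch: Katakana (U+30A1..U+30F6) folds to Hiragana.
    text = "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch for ch in text
    )
    if fold_sound_marks:
        # Decompose, drop the sound marks, and recompose.
        decomposed = unicodedata.normalize("NFD", text)
        stripped = "".join(ch for ch in decomposed if ch not in SOUND_MARKS)
        text = unicodedata.normalize("NFC", stripped)
    if fold_small_kana:
        text = text.translate(SMALL_TO_LARGE)
    return text


# Folding everything, as described for NormalizerMySQLUnicodeCI,
# makes the two words indistinguishable.
print(fold_kana("ふらつく") == fold_kana("ブラック"))  # True

# Keeping small kana and sound marks keeps the words apart.
keep = dict(fold_small_kana=False, fold_sound_marks=False)
print(fold_kana("ふらつく", **keep) == fold_kana("ブラック", **keep))  # False
```

With both flags off, ブラック becomes ぶらっく in this sketch, which still differs from ふらつく in the voiced sound mark and the small っ.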

Install

Debian GNU/Linux

Add the APT line for the Groonga deb package repository, then install the groonga-normalizer-mysql package:

% sudo apt-get -y install groonga-normalizer-mysql

Ubuntu

Add the APT line for the Groonga deb package repository, then install the groonga-normalizer-mysql package:

% sudo apt-get -y install groonga-normalizer-mysql

CentOS

Install the groonga-repository package:

% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache

Then install the groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

Fedora

Install the groonga-repository package:

% sudo rpm -ivh http://packages.groonga.org/fedora/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache

Then install the groonga-normalizer-mysql package:

% sudo yum install -y groonga-normalizer-mysql

OS X - Homebrew

Install the groonga-normalizer-mysql package:

% brew install groonga-normalizer-mysql

Windows

You need to build from source. The build instructions follow.

Build system

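Install the following build tools:

  • Microsoft Visual Studio 2013 (the version that provides the "Visual Studio 12 Win64" CMake generator used below)
  • CMake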

Build Groonga

Download the latest Groonga source from packages.groonga.org. The source file is named groonga-X.Y.Z.zip.

Extract the source and move to the source folder:

> cd ...\groonga-X.Y.Z
groonga-X.Y.Z>

Run CMake. Here is a command line that configures Groonga to be installed into the C:\groonga folder:

groonga-X.Y.Z> cmake . -G "Visual Studio 12 Win64" -DCMAKE_INSTALL_PREFIX=C:\groonga

Build:

groonga-X.Y.Z> cmake --build . --config Release

Install:

groonga-X.Y.Z> cmake --build . --config Release --target Install

Build groonga-normalizer-mysql

Download the latest groonga-normalizer-mysql source from packages.groonga.org. The source file is named groonga-normalizer-mysql-X.Y.Z.zip.

Extract the source and move to the source folder:

> cd ...\groonga-normalizer-mysql-X.Y.Z
groonga-normalizer-mysql-X.Y.Z>

IMPORTANT: Set the PKG_CONFIG_PATH environment variable to the pkgconfig folder of the Groonga you installed:

groonga-normalizer-mysql-X.Y.Z> set PKG_CONFIG_PATH=C:\groonga\lib\pkgconfig

Run CMake. Here is a command line that configures groonga-normalizer-mysql to be installed into the C:\groonga folder:

groonga-normalizer-mysql-X.Y.Z> cmake . -G "Visual Studio 12 Win64" -DCMAKE_INSTALL_PREFIX=C:\groonga

Build:

groonga-normalizer-mysql-X.Y.Z> cmake --build . --config Release

Install:

groonga-normalizer-mysql-X.Y.Z> cmake --build . --config Release --target Install
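
The steps for groonga-normalizer-mysql can be collected into a short script. The following Python sketch only builds the environment and the command lines; it does not run anything. The helper name `build_steps` is hypothetical, and the sketch assumes that Groonga's pkg-config files are located under lib\pkgconfig in the Groonga install folder.

```python
from pathlib import PureWindowsPath


def build_steps(prefix, generator="Visual Studio 12 Win64"):
    """Return (environment, commands) mirroring the manual steps above."""
    prefix = PureWindowsPath(prefix)
    # Assumption: Groonga installed its .pc file under <prefix>\lib\pkgconfig.
    env = {"PKG_CONFIG_PATH": str(prefix / "lib" / "pkgconfig")}
    commands = [
        # Configure
        ["cmake", ".", "-G", generator, f"-DCMAKE_INSTALL_PREFIX={prefix}"],
        # Build
        ["cmake", "--build", ".", "--config", "Release"],
        # Install
        ["cmake", "--build", ".", "--config", "Release", "--target", "Install"],
    ]
    return env, commands


env, commands = build_steps(r"C:\groonga")
print(env["PKG_CONFIG_PATH"])
for command in commands:
    print(" ".join(command))
```

To actually run the steps, each command could be passed to `subprocess.run(command, env={**os.environ, **env}, check=True)` from the source folder.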

Usage

First, you need to register the normalizers/mysql plugin:

groonga> register normalizers/mysql

Then, you can use NormalizerMySQLGeneralCI and NormalizerMySQLUnicodeCI as normalizers:

groonga> table_create Lexicon TABLE_PAT_KEY --default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI
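
When the lexicon has to match an existing MySQL column, the normalizer can be derived from the column's collation. The Python sketch below is only an illustration: the helper names `normalizer_for` and `table_create_command` are hypothetical, the mapping contains just the three collations listed in the Description, and the generated text has the same form as the table_create command shown above.

```python
# Collation-to-normalizer mapping taken from the Description section.
NORMALIZERS = {
    "utf8mb4_general_ci": "NormalizerMySQLGeneralCI",
    "utf8mb4_unicode_ci": "NormalizerMySQLUnicodeCI",
    "utf8mb4_unicode_520_ci": "NormalizerMySQLUnicode520CI",
}


def normalizer_for(collation):
    """Return the normalizer name for a MySQL collation (hypothetical helper)."""
    try:
        return NORMALIZERS[collation]
    except KeyError:
        raise ValueError(f"no normalizer known for collation {collation!r}") from None


def table_create_command(table, collation):
    """Build a table_create command line like the one in this section."""
    return (
        f"table_create {table} TABLE_PAT_KEY "
        f"--default_tokenizer TokenBigram "
        f"--normalizer {normalizer_for(collation)}"
    )


print("register normalizers/mysql")
print(table_create_command("Lexicon", "utf8mb4_unicode_ci"))
```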

Dependencies

  • Groonga >= 3.0.3
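
A version requirement like this is easy to check incorrectly with plain string comparison, because "3.0.10" sorts before "3.0.3" as text. Here is a minimal Python sketch of a numeric check, assuming version strings consist of dot-separated numbers as in the X.Y.Z form used in this README; the function name `meets_requirement` is hypothetical.

```python
REQUIRED = (3, 0, 3)


def meets_requirement(version, required=REQUIRED):
    """Compare a dotted version string numerically against the requirement."""
    parts = tuple(int(part) for part in version.split("."))
    return parts >= required


print(meets_requirement("3.0.10"))  # True
print(meets_requirement("3.0.2"))   # False
```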

Mailing list

Thanks

  • Alexander Barkov <bar@udm.net>: The author of MYSQL_SOURCE/strings/ctype-utf8.c.
  • ...

Authors

License

LGPLv2 only. See doc/text/lgpl-2.0.txt for details.

This program uses the normalization table defined in the MySQL source code, so it is a derived work of MYSQL_SOURCE/strings/ctype-utf8.c. It is therefore under the same license as MYSQL_SOURCE/strings/ctype-utf8.c, which is LGPLv2 only.