fukuchi/Omeka-S-module-ngram-search


N-gram search (module for Omeka S)

N-gram search is a module for Omeka S that enables CJK-ready full-text search using MySQL's n-gram tokenizer.

The default full-text search in Omeka S is not CJK (Chinese, Japanese, Korean) ready because it does not use an appropriate tokenizer. This module activates MySQL's n-gram tokenizer by modifying the table definition that Omeka S uses internally for full-text search.
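To illustrate why n-gram tokenization helps with CJK text, here is a minimal Python sketch. It is not this module's code and not MySQL's implementation; the function names `ngram_tokens` and `matches` are hypothetical, and the token size of 2 mirrors MySQL's default `ngram_token_size`. Splitting on whitespace is a simplification of how a real parser treats spaces.

```python
def ngram_tokens(text: str, n: int = 2) -> list[str]:
    """Split each whitespace-separated run into overlapping n-character tokens."""
    tokens = []
    for run in text.split():
        # A run shorter than n characters yields no tokens.
        tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens


def matches(query: str, document: str, n: int = 2) -> bool:
    """Return True if every n-gram of the query also occurs in the document."""
    doc_tokens = set(ngram_tokens(document, n))
    query_tokens = ngram_tokens(query, n)
    return bool(query_tokens) and all(t in doc_tokens for t in query_tokens)


# Japanese text has no spaces between words, so whitespace splitting
# sees the whole phrase as a single token.
phrase = "全文検索エンジン"
print(phrase.split())
print(ngram_tokens(phrase))
print(matches("検索", phrase))
```

A tokenizer that splits only on whitespace indexes the whole phrase as one word, so a query for 検索 finds nothing. With bigrams, 検索 is itself one of the indexed tokens, so the query matches.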

Installation

Preparation

First of all, back up the database. This module modifies the table schema, which may cause an unrecoverable failure.

This module requires MySQL 5.6 or later. MariaDB currently does not provide an n-gram tokenizer. If you want to enable CJK-ready search with MariaDB, try Mroonga search instead.
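The requirement above can be checked against the string returned by `SELECT VERSION()`. The following is a rough Python sketch, not part of the module: the function name `looks_ngram_capable` is hypothetical, the minimum version is taken from the requirement stated above, and the check assumes that MariaDB servers include "MariaDB" in their version string.

```python
import re


def looks_ngram_capable(version: str, minimum: tuple[int, int] = (5, 6)) -> bool:
    """Guess from a server version string whether the n-gram tokenizer is available."""
    # Assumption: MariaDB reports strings such as "10.4.13-MariaDB".
    if "mariadb" in version.lower():
        return False
    found = re.match(r"(\d+)\.(\d+)", version)
    if found is None:
        return False  # unrecognized format: be conservative
    return (int(found.group(1)), int(found.group(2))) >= minimum


for reported in ("8.0.33", "5.7.30-log", "5.5.62", "10.4.13-MariaDB"):
    print(reported, looks_ngram_capable(reported))
```

This is only a pre-flight heuristic. It does not replace backing up the database before enabling the module.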

From ZIP

Download the latest NgramSearch.zip from the release page. Unzip it in the modules directory of Omeka S, then enable the module from the admin dashboard. See the Omeka S user manual for further information.

From GitHub

After placing the repository in the modules directory, remember to rename its directory from Omeka-S-ngram-search to NgramSearch.

Notes

This module depends heavily on the database structure of Omeka S 2.x. If you are upgrading Omeka S from 2.x to 3.x or later, we strongly recommend uninstalling this module before upgrading.

We have not yet tested MySQL's n-gram tokenizer extensively with large data sets. For advanced full-text search, we recommend the Solr module.

Licensing information

Copyright (c) 2020 Kentaro Fukuchi

This module is released under the MIT License. See the LICENSE file for details.
