Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
A small statistical segmenter for any language.
Ruby
tag: v0.1.0

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
lib
spec
.gitignore
.rspec
Gemfile
README.md
Rakefile
maxixe.gemspec

README.md

Maxixe

A simple statistical segmenter for any language

About

Maxixe is an implementation of the Tango algorithm describe in the paper "Mostly-unsupervised statistical segmentation of Japanese kanji sequences" by Ando and Lee. While the paper deals with Japanese characters, it should work on any unsegmented text given enough corpus data and a tuning of the algorithm paramenters.

How to use

First, you need a hash that contains the count of all n-grams in a given corpus.

Something went wrong with that request. Please try again.