Kurumi

Uses MIRA to train on a large number of features.

Segments Chinese sentences (both Traditional and Simplified) into words quickly and accurately.

Note that the gem name (cseg) is different from the repository name (kurumi).

Installation

Add this line to your application's Gemfile:

gem 'cseg'

And then execute:

$ bundle

Or install it yourself as:

$ gem install cseg

You need to install CRF++ first and set the environment variables.

The dictionary file was removed from the GitHub repository because it is quite large, but the gem published on RubyGems includes all files.

Recall and Precision

Tested on the SIGHAN Bakeoff PKU test set:

Precision: 94.43%

Recall: 92.86%
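For readers unfamiliar with how word segmentation is usually scored, the sketch below computes precision, recall, and F1 by comparing character spans of predicted words against a reference segmentation. It is an illustration only and not the scoring script used for the figures above. The helper names `word_spans` and `segmentation_scores` and the `gold` segmentation are assumptions made for this example.

```ruby
require "set"

# Turn a list of words into a set of [start, end) character spans.
def word_spans(words)
  offset = 0
  words.map do |word|
    span = [offset, offset + word.length]
    offset += word.length
    span
  end.to_set
end

# Precision, recall, and F1 of a predicted segmentation against a gold one.
# A predicted word counts as correct only if both of its boundaries match.
def segmentation_scores(predicted, gold)
  correct = (word_spans(predicted) & word_spans(gold)).size
  precision = predicted.empty? ? 0.0 : correct.to_f / predicted.size
  recall = gold.empty? ? 0.0 : correct.to_f / gold.size
  total = precision + recall
  f1 = total.zero? ? 0.0 : 2 * precision * recall / total
  { precision: precision, recall: recall, f1: f1 }
end

predicted = ["屌丝", "是", "一", "种", "自我", "讽刺", "。"]
gold = ["屌丝", "是", "一种", "自我", "讽刺", "。"] # hypothetical reference segmentation
scores = segmentation_scores(predicted, gold)
puts scores
```

Here five of the seven predicted words match the six reference words, so precision is 5/7 and recall is 5/6. Applying the same F1 formula to the reported precision and recall gives roughly 93.64%.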

Usage

# The default is Simplified Chinese
require "cseg"
Kurumi.segment("屌丝是一种自我讽刺。")
#=> ["屌丝", "是", "一", "种", "自我", "讽刺", "。"]
# Pass "tr" as the second argument for Traditional Chinese
Kurumi.segment("台妹真是正點。", "tr")
#=> ["台妹", "真", "是", "正點", "。"]
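Since `Kurumi.segment` returns an array of words for one sentence, a small wrapper can be handy for processing several lines at once. The following is a minimal sketch under that assumption. `segment_lines` is a hypothetical helper, not part of the cseg gem, and it takes the segmenter as a callable so it can be tried without CRF++ installed.

```ruby
# Hypothetical helper, not part of the cseg gem.
# `segmenter` is any callable that maps one sentence to an array of words,
# for example ->(s) { Kurumi.segment(s) } once cseg and CRF++ are installed.
def segment_lines(text, segmenter)
  text.each_line
      .map(&:strip)
      .reject(&:empty?)                              # skip blank lines
      .map { |line| segmenter.call(line).join(" ") } # space-separated words
end

# Stand-in segmenter that splits into single characters, for illustration only.
char_segmenter = ->(sentence) { sentence.chars }
puts segment_lines("你好\n\n世界\n", char_segmenter)
```

With the gem installed, the stand-in could be replaced by `->(s) { Kurumi.segment(s) }`, or by `->(s) { Kurumi.segment(s, "tr") }` for Traditional Chinese.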
