Skip to content

lulalala/lulalala_address_tokenizer

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
bin
 
 
lib
 
 
 
 
 
 
 
 
 
 

LulalalaAddressTokenizer

Postal addresses tokenizer using Wapiti model.

Intended for addresses in CJK (Chinese, Japanese Korean) characters. After wapiti model labels each token(character), this gem combines adjacent word of the same label together. This is important for CJK languages because its phrases (combination of words) are not separated by spaces.

台灣地址分詞用

Installation

Add this line to your application's Gemfile:

gem 'lulalala_address_tokenizer'

And then execute:

$ bundle

Or install it yourself as:

$ gem install lulalala_address_tokenizer

Usage

tokenizer = LulalalaAddressTokenizer.new('address.mod')
tokenizer.parse("AA縣BB鎮CC路D號")
# {"city"=>"AA縣", "district"=>"BB鎮", "street"=>"CC路", "housenumber"=>"D號"}

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/lulalala_address_tokenizer.

About

Postal addresses tokenizer using Wapiti model

Resources

Stars

Watchers

Forks

Packages

No packages published