Vietnamese Morphological Analyzer with CRF

Vietnamese morphological analysis using CRF



Download the model file and pass its location to CRF_PP (shown below as /path/to/model_file).

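from crfpp import CRF_PP

# Load the trained CRF++ model from the path where you saved it
viet_morph_analyzer = CRF_PP('/path/to/model_file')

# Analyze a raw Vietnamese sentence
sentence = 'Số điện thoại của trường'
result = viet_morph_analyzer.analyze(sentence)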
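
The return format of analyze is not documented here. As a rough sketch, assuming the analyzer yields (syllable, tag) pairs in the IOB2 scheme the model is trained on, the syllables could be merged back into tagged words as follows. The function name merge_iob2 and the pair format are hypothetical and are not part of crfpp.

```python
def merge_iob2(pairs):
    """Merge (syllable, IOB2 tag) pairs into (word, POS) tuples.

    Assumption: tags look like 'B-JJ' or 'I-JJ'. A 'B' tag starts a new
    word; an 'I' tag continues the previous word when the POS matches.
    """
    words = []  # list of ([syllables], pos)
    for syllable, tag in pairs:
        prefix, _, pos = tag.partition("-")
        if prefix == "I" and words and words[-1][1] == pos:
            # Continuation of the current word
            words[-1][0].append(syllable)
        else:
            # 'B' tag (or a stray 'I' tag) starts a new word
            words.append(([syllable], pos))
    # Join syllables with '_' as in the vnPOS corpus notation
    return [("_".join(syllables), pos) for syllables, pos in words]


if __name__ == "__main__":
    sample = [("Tấp", "B-JJ"), ("nập", "I-JJ"), ("sắm", "B-VB")]
    for word, pos in merge_iob2(sample):
        print(word, pos)
```

If analyze returns something else, for example a pre-merged list of words, this step is unnecessary.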

How to build the model file

Get a tagged corpus

Convert the corpus from vnPOS format to IOB2 tag format

The corpus is provided in the following format:

Tấp_nập//JJ sắm//VB đtdđ//NN đầu//NN năm//NC

Convert it to IOB2 tag format (only the B and I tags are used).

% cat vnPOS.txt | python ./utils/ > vnPOS.iob2
# The output looks like this:
Tấp		B-JJ
nập		I-JJ
sắm		B-VB
đtdđ	B-NN
đầu		B-NN
năm		B-NC
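
The conversion above can be sketched in a few lines of Python. This is a minimal illustration and not the repository's own script: it assumes each token has the form word//TAG, that syllables within a word are joined by an underscore, and that output columns are tab-separated. The function names are hypothetical.

```python
def vnpos_to_iob2(line):
    """Convert one vnPOS-formatted sentence into (syllable, IOB2 tag) rows."""
    rows = []
    for token in line.split():
        # Split on the last '//' so the tag is everything after it
        word, sep, tag = token.rpartition("//")
        if not sep:
            continue  # skip tokens that carry no tag
        syllables = [s for s in word.split("_") if s]
        for i, syllable in enumerate(syllables):
            # First syllable of a word gets B, the rest get I
            prefix = "B" if i == 0 else "I"
            rows.append((syllable, f"{prefix}-{tag}"))
    return rows


def format_rows(rows):
    """Render rows as tab-separated lines, one syllable per line."""
    return "\n".join(f"{syllable}\t{tag}" for syllable, tag in rows)


if __name__ == "__main__":
    sentence = "Tấp_nập//JJ sắm//VB đtdđ//NN đầu//NN năm//NC"
    print(format_rows(vnpos_to_iob2(sentence)))
```

When writing a full training file, CRF++ expects an empty line between sentences, so a blank line should follow each sentence's rows.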


Training with CRF++

% crf_learn ./crf_template ./vnPOS.iob2 ./vnPOS.crfpp.model
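
The template passed as the first argument to crf_learn lists the features CRF++ extracts from each position. As a rough sketch of what such a file can contain, the snippet below generates a simple template that uses the syllables in a window around the current position plus a tag bigram. It is an illustration only and is not the template shipped with this repository; build_template and the window size are assumptions.

```python
def build_template(window=2):
    """Build a simple CRF++ feature template as a string.

    %x[row,col] refers to column `col` of the line `row` positions away
    from the current one; column 0 is the syllable in the IOB2 file.
    """
    lines = ["# Unigram features: the syllable at each offset in the window"]
    for i, offset in enumerate(range(-window, window + 1)):
        lines.append(f"U{i:02d}:%x[{offset},0]")
    lines.append("")
    lines.append("# Bigram feature: combines the previous and current output tags")
    lines.append("B")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_template(window=2), end="")
```

A wider window gives the model more context at the cost of more features and longer training.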

"crf_template" is the feature template file. Edit it to change the features used for training.