A Khmer word segmentation tool built for the NIPTICT (now CADT) Khmer word segmentation CRF model.
Important: the `km-5tag-seg-model` file is required for this library to work. This library doesn't provide the model file, so it must be obtained separately.
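Because the model file is not bundled, it can help to fail early with a clear message when it is missing, rather than letting the CRF backend report an error. The sketch below assumes only the `-m <path>` option-string format used in the usage example; the helper name `model_option` is hypothetical, and the whitespace check is a precaution based on the assumption that the option string is split on spaces.

```python
from pathlib import Path


def model_option(model_path: str) -> str:
    """Hypothetical helper: build the "-m <path>" string passed to Segmenter.

    Raises early if the model file is missing. The whitespace check assumes
    (not confirmed by this README) that the option string is split on spaces.
    """
    path = Path(model_path)
    if any(ch.isspace() for ch in str(path)):
        raise ValueError(f"model path must not contain whitespace: {path!r}")
    if not path.is_file():
        # The model is not shipped with khmersegment; point the user at the cause.
        raise FileNotFoundError(
            f"CRF model file not found: {path}. "
            "km-5tag-seg-model is not bundled and must be obtained separately."
        )
    return f"-m {path}"
```

With this helper, construction would read `Segmenter(model_option("km-5tag-seg-model"))`.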
```shell
pip install khmersegment
```
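```python
from khmersegment import Segmenter

# The argument is a CRF++-style option string; "-m" gives the path to the model file.
segmenter = Segmenter("-m km-5tag-seg-model")
# deep=False keeps longer units such as អ្នកណា as a single token
print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=False))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នកណា', 'ទេ', '?']
# deep=True splits them further (អ្នកណា → អ្នក + ណា)
print(segmenter("Hello មិនដឹងប្រាប់អ្នកណាទេ?", deep=True))
# => ['Hello', ' ', 'មិន', 'ដឹង', 'ប្រាប់', 'អ្នក', 'ណា', 'ទេ', '?']
```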
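To see exactly what `deep=True` changes for a given input, the two outputs can be aligned token by token. This is a minimal sketch, assuming (as in the example above) that the deep output is a refinement of the shallow one, i.e. every shallow token is the concatenation of consecutive deep tokens; the function name `deep_splits` is hypothetical and not part of khmersegment.

```python
from typing import List, Tuple


def deep_splits(shallow: List[str], deep: List[str]) -> List[Tuple[str, List[str]]]:
    """Return (shallow_token, deep_pieces) for each shallow token that deep mode split.

    Assumes each shallow token equals the concatenation of consecutive deep
    tokens; raises ValueError if the two token lists do not align that way.
    """
    splits = []
    i = 0  # current position in the deep token list
    for token in shallow:
        pieces = []
        joined = ""
        # Consume deep tokens until they spell out the current shallow token.
        while joined != token:
            if i >= len(deep):
                raise ValueError("deep output ended before covering all shallow tokens")
            joined += deep[i]
            pieces.append(deep[i])
            i += 1
            if not token.startswith(joined):
                raise ValueError(f"deep tokens do not align with shallow token {token!r}")
        if len(pieces) > 1:
            splits.append((token, pieces))
    if i != len(deep):
        raise ValueError("deep output has tokens not covered by the shallow output")
    return splits
```

Applied to the two outputs shown above, this reports that only `អ្នកណា` was split, into `អ្នក` and `ណា`.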
License: Apache-2.0
- pycrfpp: Python binding for CRF++