
Word segmentation using Conditional Random Fields (CRF) for Khmer documents


phylypo/segmentation-crf-khmer


khmer-crf-segmentation


See the detailed article here:

https://medium.com/@phylypo/segmentation-of-khmer-text-using-conditional-random-fields-3a2d4d73956a

This project includes a Python notebook with the complete code to run the CRF. The notebook includes code to download and extract the data and to train the model.
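A common way to prepare data for a CRF segmenter is to treat segmentation as sequence labeling: every input unit gets a label saying whether a new word starts there. The sketch below assumes character-level units and a two-label scheme (`1` = word start, `0` = continuation). The notebook may use a different unit (for example, Khmer character clusters) and richer features, and the helper names `to_labeled_sequence`, `char_features`, and `labels_to_words` are hypothetical.

```python
from typing import Dict, List, Tuple


def to_labeled_sequence(words: List[str]) -> Tuple[List[str], List[int]]:
    """Flatten pre-segmented words into characters with word-start labels."""
    chars: List[str] = []
    labels: List[int] = []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(1 if i == 0 else 0)  # 1 marks the first character of a word
    return chars, labels


def char_features(chars: List[str], i: int) -> Dict[str, object]:
    """Hypothetical features for position i: the character plus a one-character window."""
    feats: Dict[str, object] = {"bias": 1.0, "char": chars[i]}
    if i > 0:
        feats["prev_char"] = chars[i - 1]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(chars) - 1:
        feats["next_char"] = chars[i + 1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def labels_to_words(chars: List[str], labels: List[int]) -> List[str]:
    """Rebuild words from characters and (predicted) word-start labels."""
    words: List[str] = []
    for ch, label in zip(chars, labels):
        if label == 1 or not words:
            words.append(ch)  # start a new word
        else:
            words[-1] += ch  # continue the current word
    return words


if __name__ == "__main__":
    sentence = ["ខ្ញុំ", "ស្រលាញ់", "ភាសា", "ខ្មែរ"]
    chars, labels = to_labeled_sequence(sentence)
    features = [char_features(chars, i) for i in range(len(chars))]
    print(labels_to_words(chars, labels) == sentence)
```

To train with a library such as sklearn-crfsuite, each sentence becomes one sequence of feature dictionaries, and its labels are usually passed as strings (for example `[str(l) for l in labels]`). At prediction time, `labels_to_words` turns the predicted labels back into words.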

  • CRF-Khmer-Segmentation.ipynb: Implementation using CRF
  • HMM_Khmer_Segmentaion.ipynb: Using Hidden Markov Model (HMM)
  • sklearn_Khmer_segmentation.ipynb: Naive Bayes and other sklearn algorithms (Random Forest and Linear Regression reach 93%; Naive Bayes is around 89%)
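One way an HMM can be applied to segmentation is to use hidden states for word boundaries and decode the most likely state sequence with the Viterbi algorithm. The sketch below assumes a two-state model (`B` = word start, `I` = inside a word) with toy, made-up probabilities; the state names, the probabilities, and the `viterbi` function are hypothetical and are not taken from the HMM notebook.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unk_p: float = 1e-6,
) -> List[str]:
    """Return the most likely hidden-state sequence for obs under an HMM."""
    if not obs:
        return []

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best: List[Dict[str, float]] = [
        {s: log(start_p[s]) + log(emit_p[s].get(obs[0], unk_p)) for s in states}
    ]
    back: List[Dict[str, str]] = [{}]  # back[t][s]: best previous state
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            # Unseen observations fall back to a small emission probability.
            best[t][s] = score + log(emit_p[s].get(obs[t], unk_p))
            back[t][s] = prev
    # Trace back from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = ["B", "I"]
    start = {"B": 1.0, "I": 0.0}  # a sequence always starts a word
    trans = {"B": {"B": 0.3, "I": 0.7}, "I": {"B": 0.5, "I": 0.5}}
    emit = {"B": {"a": 0.8, "b": 0.2}, "I": {"a": 0.1, "b": 0.9}}
    print(viterbi(["a", "b", "a", "b"], states, start, trans, emit))
```

With these toy numbers the decoder returns `["B", "I", "B", "I"]`, i.e. two words of two units each; a new word begins at every `B`. Working in log space avoids numeric underflow on long sentences.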

If you open these notebooks in Google Colab, you can run them right away without any further setup.

See instructions here:

https://medium.com/@phylypo/open-python-notebook-from-github-9177ab819b53
