kanaung/wordlists

Kanaung-Wordlists

Wordlists dictionary for Burmese (Myanmar)

Under construction

We have built Burmese wordlists from Myanmar Letter Ka (U+1000) "က" to Myanmar Letter A (U+1021) "အ". Some words are currently out of order, and some duplicates occur. We will fix these errors after completing "Burmese Sorting". Contributions are welcome.
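Until the sorting work is done, duplicates can be dropped without touching the existing order. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and it assumes the lists are read as one word per entry and that the Ka-to-A range refers to the first letter of each word. It deliberately does not sort, because plain code point order is not the same as proper Burmese ordering.

```python
def first_letter_in_range(word: str) -> bool:
    """Return True if the word starts with a letter from
    Myanmar Letter Ka (U+1000) to Myanmar Letter A (U+1021)."""
    return bool(word) and 0x1000 <= ord(word[0]) <= 0x1021


def dedupe_keep_order(words):
    """Drop blank and repeated entries, keeping the first occurrence
    of each word in its original position."""
    seen = set()
    result = []
    for raw in words:
        word = raw.strip()
        if word and word not in seen:
            seen.add(word)
            result.append(word)
    return result
```

Entries for which `first_letter_in_range` returns False are worth a manual look, since they may be extraction leftovers rather than real headwords.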

Sources

Currently, all of these words are taken from the "Burmese Spelling Book", officially published in 2003 by Myanmar Department of Education Ministry: "မြန်မာစာလုံးပေါင်းသတ်ပုံကျမ်း(ဒုတိယနှိပ်ခြင်း ၊၂၀၀၃-ခုနှစ်၊ ဇွန်လ)" (second printing, June 2003). We obtained a PDF file and found that it was encoded in standard Unicode (5.1 or later).

Modifications

  1. As usual, PDF extraction cannot reliably detect text alignment, so some words are out of order and some word-final signs, such as Asat (U+103A) "်" and Vowel Sign U (U+102F) "ု", were missing. We had to add these signs manually.

  2. We want the final lists to be clean and simple for programming and research use, so we removed all annotations explaining the correct usage of the words.

Purposes

  1. For dictionary writers, these wordlists will be a useful source.

  2. For NLP (Natural Language Processing) researchers, they may be essential in NLP work that uses a dictionary-lookup approach, such as POS tagging, building N-grams, Myanmar-English bilingual corpora, and Myanmar OCR applications.
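As an illustration of the dictionary-lookup approach mentioned above, here is a minimal sketch of greedy longest-match word segmentation. It is not part of this repository: the function name and parameters are hypothetical, and it assumes the wordlist has already been loaded into memory as Unicode strings.

```python
def segment_longest_match(text, wordlist, max_len=None):
    """Split text into tokens by always taking the longest wordlist
    entry that matches at the current position (greedy matching)."""
    words = set(wordlist)
    if max_len is None:
        # Longest entry, measured in code points.
        max_len = max((len(w) for w in words), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No entry matches here: emit one code point and move on.
            # This may split a consonant from its combining signs.
            tokens.append(text[i])
            i += 1
    return tokens
```

Greedy matching is simple but can pick the wrong split when entries overlap, so it is best treated as a baseline for the tasks listed above rather than a finished segmenter.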

Future

  1. We'll update the lists with new words.

  2. Burmese sorting and related tools will be developed for several platforms.
