
seanghay/automatic-phonemic-and-phonetic-transcription


Introduction

This repository stores the source code used in the research project "Phonological Principles and Automatic Phonemic and Phonetic Transcription of Khmer Words", which was presented in partial fulfillment of the requirements for the degree of Master of Arts in Linguistics at the International College of Payap University, Thailand, in 2016.

The code comes in two parts:

  1. Ruby scripts for the data preparation process
  2. Thrax grammars for the conversion process

Data Preparation (Dataset 02):
  • data --containing 18,948 entries from Khmer-Khmer Dictionary (1967)
  • cleanup.rb --removing stray characters, prefixes and duplicate entries
  • filter1.rb --removing Pali/Sanskrit loanwords using etymological tags
  • filter2.rb --removing P/S loanwords using diacritics and independent vowels
  • filter3.rb --removing P/S loanwords using pronunciation field
  • syl_group.rb --grouping native Khmer words into their respective syllable groups
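Each filtering step above is described in a single line. As a rough illustration of the kind of check filter2.rb describes, the sketch below flags a word if it contains a Khmer independent vowel (U+17A3 to U+17B3) or one of a few signs chosen here as an assumption (ROBAT, AHSDA, SAMYOK SANNYA, VIRIAM). The exact character set the original script uses is not stated, and the helper names `likely_ps_loanword?` and `keep_native` are hypothetical. It also folds in a cleanup.rb-style strip-and-deduplicate step.

```ruby
# Hypothetical sketch of a filter2.rb-style heuristic; not the original script.

# Khmer independent vowels occupy U+17A3..U+17B3.
INDEPENDENT_VOWEL = /[\u17A3-\u17B3]/

# Assumed (illustrative) set of signs associated with Pali/Sanskrit spellings:
# U+17CC ROBAT, U+17CF AHSDA, U+17D0 SAMYOK SANNYA, U+17D1 VIRIAM.
PS_DIACRITIC = /[\u17CC\u17CF\u17D0\u17D1]/

# True if the word contains any character the heuristic treats as a P/S marker.
def likely_ps_loanword?(word)
  word.match?(INDEPENDENT_VOWEL) || word.match?(PS_DIACRITIC)
end

# cleanup.rb-style normalisation (strip whitespace, drop blank entries and
# duplicates), followed by the loanword filter.
def keep_native(words)
  words.map(&:strip)
       .reject(&:empty?)
       .uniq
       .reject { |w| likely_ps_loanword?(w) }
end
```

Assuming one headword per line in the data file, a call might look like `keep_native(File.readlines("data", chomp: true))`. A character-based test like this is blunt: it can also discard native words that happen to use these characters, which is presumably why the pipeline combines it with the etymological-tag and pronunciation-field filters.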

The conversion grammars:
  • automator_phonemic.grm --taking orthographic words one at a time and converting each into a phonemic transcription.
  • automator_phonetic.grm --taking phonemic transcriptions one at a time and converting each into a phonetic transcription.
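The grammars themselves are not reproduced in this README. As a hedged illustration of the general idea (a symbol mapping followed by ordered context rules), the Ruby toy below converts a handful of Khmer characters. The symbol table, both rules, and the names `to_phonemic` and `to_phonetic` are assumptions made for the example and do not come from the .grm files. Thrax itself compiles grammars into finite-state transducers rather than applying regular expressions.

```ruby
# Toy illustration of ordered rewriting; not the rules in the .grm files.

# Hypothetical orthographic -> phonemic symbol table. Vowel values here assume
# a first-series consonant; a real grammar must condition them on the series.
G2P_TABLE = {
  "\u1780" => "k",  # KA
  "\u178F" => "t",  # TA
  "\u179A" => "r",  # RO
  "\u17B6" => "aa"  # vowel sign AA after a first-series consonant
}.freeze

# Step 1: map each character (Hash#fetch raises KeyError on unknown input).
# Step 2: a context rule applied afterwards, deleting word-final r
# (modelled loosely on the silent final r of Standard Khmer).
def to_phonemic(word)
  word.each_char.map { |ch| G2P_TABLE.fetch(ch) }.join.sub(/r\z/, "")
end

# Phonemic -> phonetic: mark a word-final stop as unreleased (U+031A).
def to_phonetic(phonemic)
  phonemic.sub(/([ptk])\z/, "\\1\u031A")
end
```

Chaining the two mirrors the README's split: `to_phonetic(to_phonemic(word))` runs the phonemic step first and feeds its output to the phonetic step, just as automator_phonetic.grm takes phonemic transcriptions as its input.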

Try them out

You can try these out by first running the Ruby scripts in the order listed above (make sure you have Ruby installed). Once all native words are in their respective syllable groups, you can run the Thrax grammars on them one file at a time: the g2p grammar (automator_phonemic.grm) on the orthographic words, and the p2p grammar (automator_phonetic.grm) on the resulting phonemic transcriptions (make sure you have Thrax installed).
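A small driver can enforce the script order. This sketch assumes each script is run from the repository root with no command-line arguments and reads and writes its own files, which the README does not specify. `PIPELINE` and `run_pipeline` are hypothetical names.

```ruby
# Hypothetical driver for the data preparation scripts, in the README's order.
PIPELINE = %w[cleanup.rb filter1.rb filter2.rb filter3.rb syl_group.rb].freeze

# Runs each script with the current Ruby interpreter and stops at the first
# failure. With dry_run: true it only returns the commands it would run.
def run_pipeline(dry_run: false)
  PIPELINE.map do |script|
    cmd = ["ruby", script]
    unless dry_run
      # system returns true on exit status 0, false on non-zero, nil if it cannot run
      abort "#{script} failed" unless system(*cmd)
    end
    cmd.join(" ")
  end
end
```

Calling `run_pipeline(dry_run: true)` lists the commands without executing anything, and `run_pipeline` runs them in sequence. The Thrax step is left out because the rule names exported by the two grammars are not given here.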

Inquiry

All inquiries should be sent to makara_sok@hotmail.com.

Copyright © Makara Sok

Payap University 2016