
Khmer Character Specification/Usages

Author: Makara Sok
Contributors: Marc Durdin,
Todd Bequette,
Becky Bequette,
Seth Wilson, and
Martin Hosken
Last Update: May 3, 2019

Table of Contents

  1. Introduction
  2. Khmer Language and Khmer Script
  3. The Linguistic Situation in Cambodia
  4. Khmer Phoneme Inventory
  5. Khmer Orthography
  6. Ligatures
  7. Unicode Encoding
  8. Text Processing
  9. Application of Khmer Script to Other Languages
  10. Stone Inscriptions (Pre-Angkor, Angkor, Post-Angkor Era)
  11. Sample Texts for Orthography Check
  12. Wordlist for Orthography Check
  13. Summary

References

Appendixes

1. Introduction

The Khmer language is the national language of the Kingdom of Cambodia, and the Khmer script is the official script, used to write not only the Khmer language but also ethnic minority languages. It is also used to write religious documents (i.e. Dharma).

This document seeks to describe how the Khmer script is used in various languages nowadays, namely:

  • Khmer language,
  • ethnic minority languages,
  • Pali and
  • Sanskrit.

Before going into greater detail, let’s look at the evolution of the Khmer script.

2. Khmer Language and Khmer Script

According to the Unicode Consortium (2017:631), the Khmer script is the official script of Cambodia and it is descended from the Brahmi script. It has been in use for more than 1400 years. During this period of time, the script has evolved substantially.

According to a study on Khmer letters (Doek 2000:5), Khmer letters evolved from the Brahmi script. Deok (2000:13) and Maspero (1915:48) claim, as noted by Kul (2008:179-181), that Khmer letters (Khmer Script) have undergone ten distinct evolutions since the 6th century. The list below gives the date of each stage and the reign to which the corresponding inscriptions belong.

  • 6th century, inscription found in the reign of Pavavarman
  • early 7th century, in the reign of Mahendravarman Jetrasen
  • 667 AD, in the reign of Jayavarman I
  • 970 AD, in the reign of Jayavarman V
  • 1002 AD, in the reign of Suryavarman I
  • 1066 AD, in the reign of Utyadityavarman II
  • 12th century, in the reign of Suryavarman II
  • 13th century, in the reign of Jayavarman VII
  • 1702 AD, in the middle of the Udong era
  • Nowadays

During the French protectorate era, an attempt was made to romanize the Khmer language, but it was not welcomed because it was seen as an attack on traditional learning and Khmer society (Scheuren 2010:19). The romanization effort therefore failed, and it gave rise to Khmerization.

3. The Linguistic Situation in Cambodia

3.1. Languages of Cambodia

There are 27 languages spoken in Cambodia, according to the Ethnologue (Eberhard et al. 2019).

Status # Languages with Ethnologue Codes
national 1 Khmer [khm]
wider communication 3 Min Nan Chinese [nan], English [eng], French [fra]
dispersed 2 Thai [tha], Vietnamese [vie]
developing 3 Brao [brb], Jarai [jra], Central Mnong[1] [cmo]
vigorous 5 Western Cham [cja], Kaco’ [xkk], Hakka Chinese [hak], Kraol [rka], Tampuan [tpu]
threatened 6 Kavet [krv], Krung [krr], Lao [lao], Lao Phuon [phu], Mel-Khaonh [hkn], Stieng Bulo [sti]
shifting 1 Kuay[2] [kdt]
moribund 3 Pear [pcb], Somray [smu], Su’ung [syo]
nearly extinct 1 Chung [scq]
dormant 2 Chong [cog], Samre [sxm]
Total 27

3.2. Languages Using Khmer Script

Other than the Khmer language, Khmer script is also used to write ethnic minority languages, such as: Kuay, Bunong, Tampuan, Jarai, Krung, Brao and Kavet.

Language Name Population (in Cambodia) Location (Provinces in Cambodia) Language Status (EGIDS)
Kuay 37,700 Kampong Thom and Preah Vihear 7 (Shifting)
Bunong 37,500 Kratie, Mondolkiri 5 (Developing)
Tampuan 31,000 Ratanakiri 6a (Vigorous)
Jarai 20,800 Ratanakiri 5 (Developing)
Krung 20,700 Ratanakiri and Stung Treng 6b (Threatened)
Brao 9,030 Ratanakiri 5 (Developing)
Kavet 6,220 Ratanakiri and Stung Treng 6b (Threatened)

More information on each language can be found on the Ethnologue website at https://www.ethnologue.com/country/kh/languages.

4. Khmer Phoneme Inventory

The Khmer phoneme inventory varies from one scholar to another. The inventory below is adapted from Huffman (1970:6-11) and Ehrman (1972:4-9), as stated in Sok (2016:11-13) and Headley (2014:x).

4.1 Consonants

4.1.1 Initial Consonants

There are 21 consonant phonemes, shown in the table below. Eight of them can occur only in the initial position; they are marked with a hyphen.

Bilabial Alveolar Palatal Velar Glottal
Plosives /p/ /t/ /c/ /k/ /ʔ/
Asp. Plosives /pʰ-/ /tʰ-/ /cʰ-/ /kʰ-/
Implosives /ɓ-/ /ɗ-/
Fricatives /s-/ /h/
Nasals /m/ /n/ /ɲ/ /ŋ/
Semi-vowels /w/ /j/
Lateral /l/
Flap /r-/

There is still discussion on whether the aspirated plosives are each a combination of two phonemes (unaspirated plosive + glottal fricative), i.e. (a) /ph, th, ch, kh/ versus (b) /pʰ, tʰ, cʰ, kʰ/. One might argue that (a) is more reasonable because the aspiration can be separated from the plosive when there is an infix, e.g. ខុស /khoh/ ‘to be wrong’ + -ម- /m/ > កំហុស /kɑm.hoh/ ‘mistake’.

Some foreign loanwords use additional phonemes, which are not included in the table above because it is reserved for native phonemes. These are the labiodental fricative /f/, the voiced alveolar fricative /z/, the palatal fricative /ʃ/ and the velar implosive /ɠ/. They are usually found at the beginning of a syllable.

4.1.2. Initial Consonant Clusters

Initial consonant clusters may be composed of two or three consonants (i.e. C1C2 or C1C2C3).

C1 = { p t c k s ʔ m l }

C2 = { p t c k ʔ ɓ ɗ m n ɲ ŋ w j l r s h }

C3 = { r }

4.1.3. Final Consonants

Aspirated plosives (pʰ, tʰ, cʰ, kʰ), implosives (ɓ, ɗ, ɠ), fricatives (f, s, z, ʃ; except the glottal fricative /h/) and the flap (r) can never occur in the syllable-final position. Each consonant in the syllable-final position is pronounced without any audible release.

F = { p t c k ʔ h m n ɲ ŋ l w j }

Some foreign loanwords (e.g. from English or French) may break this rule. For instance, ហ្វេសប៊ុក ‘Facebook’ may be pronounced as /fees.ɓuk/.

Also note that the orthographic representation may be different from the actual pronunciation when it comes to Pali/Sanskrit loanwords. See Final Consonants in Khmer Orthography.

4.2. Vowels

Khmer vowels are divided into two main groups: monophthongs and diphthongs.

4.2.1. Monophthongs

There are 18 monophthongs: 10 long monophthongs and 8 short monophthongs.

Front Central Back
High i ii ɨ[3] ɨɨ u uu
Mid e ee[4] ə əə[5] o oo[6]
Mid-Low ɛɛ ɔɔ
Low a aa ɑ ɑɑ

4.2.2. Diphthongs

There are 9 diphthongs: /iə, ie, ae, ao, aə, ɨə, uə, ea, oa/. Vowels whose second member is a semi-vowel (i.e. /j/ or /w/) are not considered diphthongs (Ehrman 1972:9). A complete list of them is given below:

  • /ɨw/, /əw/, /aw/
  • /ɨj/, /əj/ and /aj/.

V = { i ii ɨ ɨɨ u uu e ee ə əə o oo ɛɛ ɔɔ a aa ɑ ɑɑ iə ie ae ao aə ɨə uə ea oa }

4.3. Stress

Stress falls on the last syllable and it is not phonemic (Schiller 1994:312). Therefore it is not included in the phonemic transcription.

4.4. Syllable Structures

Khmer syllable structures are divided into three types: monosyllables, disyllables and polysyllables (Huffman 1970:11-12).

Syllable Structures Description
Monosyllables C1C2C3VF The onset can have up to three consonants in a row. The final consonant is optional.
Disyllables minor syllable + major syllable
where:
- minor syllable is C(r)V(N)
  • C, any consonant
  • r, the flap /r/
  • V, any vowel
  • N, any nasal consonant

- major syllable is C1C2V(F)
  • C1, any C1 consonant
  • C2, any C2 consonant
  • V, any vowel
  • F, any final consonant
A disyllabic word is made up of a minor syllable and a major syllable. The minor syllable is susceptible to extreme syllable reduction (Huffman 1972:59-60). Here are some examples:
  • /cɑm.kaa/ ‘farm’ > [cə.kaa]
  • /kɑn.ɗaal/ ‘middle’ > [kə.ɗaal]
  • /kaɲ.craeŋ/ ‘bamboo basket’ > [kə.craeŋ]
  • /ɓɑŋ.riən/ ‘to teach’ > [pə.riən]
  • /rɔ.ɓɑɑŋ/ ‘fence’ > [lə.ɓɑɑŋ]
  • /prɑ.hael/ ‘perhaps’ > [pə.hael]
Polysyllable multiple monosyllables This syllable type is common in foreign loanwords, e.g. from Pali/Sanskrit, English or French.
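
The monosyllable template above can be checked mechanically. The following Python sketch tokenizes a phonemic transcription using the consonant, vowel and final inventories listed in this section and tests whether it fits the shape C(C)(C)V(F). The function names are illustrative assumptions, and the check only counts onset consonants; it does not verify which C1/C2/C3 combinations are actually attested.

```python
# Inventories copied from sections 4.1 and 4.2 of this document.
CONSONANTS = "p t c k ʔ pʰ tʰ cʰ kʰ ɓ ɗ s h m n ɲ ŋ w j l r".split()
VOWELS = ("i ii ɨ ɨɨ u uu e ee ə əə o oo ɛɛ ɔɔ a aa ɑ ɑɑ "
          "iə ie ae ao aə ɨə uə ea oa").split()
FINALS = set("p t c k ʔ h m n ɲ ŋ l w j".split())

# Longest symbols first, so that "kʰ" wins over "k" and "aa" over "a".
SYMBOLS = sorted(CONSONANTS + VOWELS, key=len, reverse=True)


def tokenize(ipa):
    """Split a transcription (no slashes or dots) into phoneme symbols."""
    tokens, i = [], 0
    while i < len(ipa):
        for symbol in SYMBOLS:
            if ipa.startswith(symbol, i):
                tokens.append(symbol)
                i += len(symbol)
                break
        else:
            raise ValueError(f"unknown symbol at position {i}: {ipa[i]!r}")
    return tokens


def parse_monosyllable(ipa):
    """Return (onset, vowel, final) if ipa fits C(C)(C)V(F), else None."""
    tokens = tokenize(ipa)
    vowel_positions = [i for i, t in enumerate(tokens) if t in VOWELS]
    if len(vowel_positions) != 1:
        return None
    v = vowel_positions[0]
    onset, coda = tokens[:v], tokens[v + 1:]
    if not 1 <= len(onset) <= 3:
        return None
    if len(coda) > 1 or (coda and coda[0] not in FINALS):
        return None
    return onset, tokens[v], (coda[0] if coda else None)
```

For a disyllable, the transcription can be split on the dot first and each part parsed separately; minor-syllable reduction is not modelled here.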

4.5. Series Assimilation

Sok (2016:24) wrote:

Series assimilation, សម្របសូរ or ការលំឱនសូ "lit. sound adaptation" as proposed by Prom (2013:24) and ទំនាញសូរ "lit. sound inclination" by Chan (2010:100), is the core of Khmer pronunciation and it is regularly correlated with the writing system. The series system and the sonority hierarchy drive the series assimilation phenomenon. Huffman (1970:43) refers to this phenomenon as Vowel Governance, for the pronunciation of a vowel symbol, including the invisible inherent vowel, is determined or governed by the series of the initial consonant symbol it is attached to (see Table 6 above). Thus, it is simple to pronounce a Khmer word which begins with a single consonant and a consonant cluster whose members are from the same series. For instance,

  • ក ‘neck’ and កា (ក+ា) 'cup' are pronounced as /kɑɑ/ and /kaa/ respectively because these words begin with first series consonant symbol (ក), and the first series of the inherent vowel is /ɑɑ/ and the first series of vowel symbol ា is /aa/.
  • គ ‘to be mute’ and គា (គ+ា) are pronounced as /kɔɔ/ and /kie/ respectively because they begin with the second series consonant symbol (គ), and the second series of the inherent vowel is /ɔɔ/ and the second series of vowel symbol ា is /ie/.
  • ក្បាល (ក+្+ប+ា+ល) 'head' is pronounced as /kɓaal/ because ក /k/ and ប /ɓ/ are in the first series, and the first series of vowel symbol ា is /aa/.
  • គ្នា (គ+្+ន+ា) 'first person singular (informal)' is pronounced as /knie/ because គ and ន are both in the second series, and the second series of the vowel symbol ា is /ie/.

After describing three accounts of series assimilation, Sok (ibid:26-27) categorized the orthographic consonants into five groups: IMPLOSIVE, UNASPIRATED PLOSIVE, ASPIRATED PLOSIVE, FRICATIVE, and SONORANT (see the table below). The least sonorous consonant of an initial cluster determines the series of the following vowel (Huffman 1970:44; Kin 2007:63; Chan 2010:107).

The table below is ordered by increasing sonority: IMPLOSIVE consonants are the least sonorous and SONORANT consonants are the most sonorous. For instance, IMPLOSIVE consonants are less sonorous than UNASPIRATED PLOSIVE consonants, which are in turn less sonorous than ASPIRATED PLOSIVE consonants, and so forth.

IMPLOSIVE 1st Series ដ ប
2nd Series ឌ -
Phoneme ɗ ɓ
UNASPIRATED PLOSIVE 1st Series ក ច ត - អ[7]
2nd Series គ ជ ទ ព -
Phoneme k c t p ʔ
ASPIRATED PLOSIVE 1st Series ខ ឆ ឋ ថ ផ
2nd Series ឃ ឈ ឍ ធ ភ
Phoneme kʰ cʰ tʰ tʰ pʰ
FRICATIVE 1st Series ស ហ
2nd Series - -
Phoneme s h
SONORANT 1st Series - - ណ - - - ឡ -
2nd Series ង ញ ន ម យ រ ល វ
Phoneme ŋ ɲ n m j r l w

For example, the vowel of ល្ហុង ‘papaya’ takes the 1st series because the least sonorous consonant of the initial cluster, ហ /h/, is in the 1st series.

ល្ហុង ‘papaya’
character sequence ល ្ហ ុ ង
character mapping l h o|u ŋ
series of character 2nd 1st 1st|2nd 2nd
series of syllable 1st
phonemic transcription /lhoŋ/

ខ្នាយ ‘spur’ is pronounced with the 1st series of the vowel governed by the series of the least sonorous initial consonant, ខ /kʰ/.

ខ្នាយ ‘spur’
character sequence ខ ្ន ា យ
character mapping kʰ n aa|ie j
series of character 1st 2nd 1st|2nd 2nd
series of syllable 1st
phonemic transcription /knaaj/
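
The rule that the least sonorous consonant of an initial cluster governs the vowel can be sketched in Python as follows. The consonant groups are copied from the sonority table above; the function names, the two-entry vowel table and the tie-breaking choice are illustrative assumptions rather than part of the cited analyses.

```python
# Sonority groups from the table above, least sonorous first.
# Each entry: (group name, 1st-series consonants, 2nd-series consonants).
GROUPS = [
    ("IMPLOSIVE", "ដប", "ឌ"),
    ("UNASPIRATED PLOSIVE", "កចតអ", "គជទព"),
    ("ASPIRATED PLOSIVE", "ខឆឋថផ", "ឃឈឍធភ"),
    ("FRICATIVE", "សហ", ""),
    ("SONORANT", "ណឡ", "ងញនមយរលវ"),
]

# consonant -> (sonority rank, series)
CONSONANT_INFO = {}
for rank, (_name, first, second) in enumerate(GROUPS):
    for ch in first:
        CONSONANT_INFO[ch] = (rank, 1)
    for ch in second:
        CONSONANT_INFO[ch] = (rank, 2)

# Only the two vowel signs used in the examples: sign -> {series: phoneme}.
VOWEL_REALIZATION = {
    "\u17b6": {1: "aa", 2: "ie"},  # vowel sign AA
    "\u17bb": {1: "o", 2: "u"},    # vowel sign U
}


def cluster_series(consonants):
    """Series (1 or 2) imposed by the least sonorous consonant.

    `consonants` is a string of base consonant letters without COENG.
    Ties in sonority go to the earlier consonant; this tie-break is an
    assumption, not something the table specifies.
    """
    _rank, series = min((CONSONANT_INFO[c] for c in consonants),
                        key=lambda info: info[0])
    return series


def vowel_phoneme(consonants, vowel_sign):
    """Phonemic value of vowel_sign after the given initial cluster."""
    return VOWEL_REALIZATION[vowel_sign][cluster_series(consonants)]
```

Under these assumptions, the cluster of ល្ហុង yields /o/ and that of ខ្នាយ yields /aa/, matching the two worked examples.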

Series assimilation occurs not only in monosyllabic words but also in disyllabic words. The rule is the same: the vowel of the second syllable (a.k.a. the major syllable) is determined by the series of the least sonorous initial consonant of the first syllable (a.k.a. the minor syllable). However, this comes with a caveat.

Sok (ibid:29) describes:

Khin (2007:63) wrote that, in disyllabic words, if the initial consonant of the major syllable is one of these (ង /ŋ/, ញ /ɲ/, ន /n/, ម /m/, យ /j/, រ /r/, ល /l/, and វ /w/), the vowel of the major syllable should be in the same series as the series of the initial consonant of the minor syllable. [...]
Exceptions:
• អន្លក់ 'vegetable' is pronounced as /ʔɑn.luək/, not /ʔɑn.lɑk/ (if assimilation occurs)
• អង្រែ 'pestle' is pronounced as /ʔɑŋ.rɛɛ/, not /ʔɑŋ.rae/ (if assimilation occurs)

For instance, បន្លា ‘thorn’ is pronounced as /ɓɑn.laa/, not /ɓɑn.lie/, because the minor syllable is in the 1st series and the initial consonant of the major syllable is one of the consonants listed above.

បន្លា ‘thorn’
character sequence ប ន ្ល ា
character mapping ɓ n l aa|ie
series of character 1st 2nd 2nd 1st|2nd
series of syllable 1st 1st
phonemic transcription /ɓɑn.laa/

Sok (ibid:30) continues:

[...] if each syllable of a disyllable word begins with a plosive or fricative, the series of each vowel is determined independently. Thus, for example, បន្ទាយ (ប /ɓ/, ន /n/, ទ /t/, ា /aa|ie/, យ /j/) 'barracks' is pronounced as /ɓɑn.tiej/ (not /ɓɑn.taaj/) and ទន្សាយ (ទ /t/, ន /n/, ស /s/, ា /aa|ie/, យ /j/) 'rabbit' is pronounced as /tun.saaj/ (not /tun.siej/). Khin (2007:63) and Chan (2010:107) also agree that if the initial consonant of the major syllable is a strong consonant group (i.e. implosives, plosives, or fricative) then the series of the vowel of the major and minor syllable should be independent from one another (i.e. no series assimilation between the two syllables), as show in the example below.

បន្ទាយ ‘barracks’
character sequence ប ន ្ទ ា យ
character mapping ɓ n t aa|ie j
series of character 1st 2nd 2nd 1st|2nd 2nd
series of syllable 1st 2nd
phonemic transcription /ɓɑn.tiej/
ទន្សាយ ‘rabbit’
character sequence ទ ន ្ស ា យ
character mapping t n s aa|ie j
series of character 2nd 2nd 1st 1st|2nd 2nd
series of syllable 2nd 1st
phonemic transcription /tun.saaj/
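
The disyllabic rule quoted above can be sketched as a small Python function. It is a simplified illustration under stated assumptions: each syllable is represented only by its initial consonant, the function names are hypothetical, and the exceptions listed above (such as អន្លក់ and អង្រែ) are not handled.

```python
# 1st-series consonants, as listed in the sonority table above.
FIRST_SERIES = set("កខចឆដឋណតថបផសហឡអ")

# Sonorant initials named in the rule quoted from Khin (2007:63).
ASSIMILATING_INITIALS = set("ងញនមយរលវ")


def series_of(consonant):
    """Series of a single base consonant (anything else is treated as 2nd)."""
    return 1 if consonant in FIRST_SERIES else 2


def major_syllable_series(minor_initial, major_initial):
    """Series that governs the vowel of the major syllable.

    If the major syllable starts with one of the listed sonorants, it
    inherits the series of the minor syllable's initial consonant;
    otherwise its series is determined independently.
    """
    if major_initial in ASSIMILATING_INITIALS:
        return series_of(minor_initial)
    return series_of(major_initial)
```

With this sketch, បន្លា (ប, ល) gives the 1st series, បន្ទាយ (ប, ទ) gives the 2nd series, and ទន្សាយ (ទ, ស) gives the 1st series, in line with the three tables above.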

5. Khmer Orthography

5.1. Series System

“Series” is the term used by Huffman (1970:15) to refer to the two distinct groups of consonants, 1st series and 2nd series, which determine how a syllable should be pronounced. They are also called សំឡេងតូច ‘small voice’ and សំឡេង​ធំ ‘big voice’ or សំឡេង​ស្រាល ‘light voice’ and សំឡេង​ធ្ងន់ ‘heavy voice’ respectively. Henderson (1952:151) uses the term “registers” instead, i.e. 1st register and 2nd register. The pitch of the 2nd register tends to be lower than that of the 1st register.

In essence, the vowel quality changes in harmony with the series of the initial consonant or consonant cluster, while its orthographic representation stays the same. For example, consonant ក /k/ is in the 1st series and consonant គ /k/ is in the 2nd series. Vowel ា is realized phonemically as /aa/ in the 1st series and as /ie/ in the 2nd series. When ក is concatenated with ា, it creates a syllable which is pronounced as /kaa/. When គ is concatenated with ា, the syllable is pronounced as /kie/.

ក /k/ (1st series) + ា /aa|ie/ > កា /kaa/

គ /k/ (2nd series) + ា /aa|ie/ > គា /kie/

When a word begins with an initial consonant cluster, Series Assimilation comes into play: the least sonorous consonant determines the series of the vowel.

5.2. Phoneme-Grapheme Correspondence

The phoneme-grapheme correspondences of initial consonants, modified initial consonants, initial consonant digraphs, initial consonant clusters and final consonants are illustrated in the tables in the following sections. Glosses are adopted from the Khmer-English dictionary (Headley 1997).

5.2.1 Initial Consonants

5.2.1.1. Initial Consonant Monographs

The table below illustrates the grapheme-phoneme correspondences with their Unicode code points, series and examples.

Grapheme Code Points Phoneme Series Examples
ក U+1780 k 1st កាត់ /kat/ ‘to cut’
ខ U+1781 kʰ 1st ខាត់ /kʰat/ ‘to polish’
គ U+1782 k 2nd គាត់ /koat/ ‘3P’
ឃ U+1783 kʰ 2nd ឃាត់ /kʰoat/ ‘to forbid’
ង U+1784 ŋ 2nd ងារ /ŋie/ ‘role’
ច U+1785 c 1st ចាត់ /cat/ ‘to designate’
ឆ U+1786 cʰ 1st ឆត្រ /cʰat/ ‘umbrella’
ជ U+1787 c 2nd ជាត់ /coat/ ‘to drain off (liquid)’
ឈ U+1788 cʰ 2nd ឈោង /cʰooŋ/ ‘to reach for (something)’
ញ U+1789 ɲ 2nd ញាត់ /ɲoat/ ‘to stuff (something) in tightly’
ដ U+178A ɗ 1st ដៃ /ɗaj/ ‘hand/arm’
ឋ U+178B tʰ 1st ឋាន /tʰaan/ ‘place/location’
ឌ U+178C ɗ 2nd ឌឺ /ɗɨɨ/ ‘to be obstinate’
ឍ U+178D tʰ 2nd ឍាល /tʰiel/ ‘shield’
ណ U+178E n 1st ណាត់ /nat/ ‘to set (an appointment)’
ត U+178F t 1st តោង /taoŋ/ ‘to hang on to’
ថ U+1790 tʰ 1st ថា /tʰaa/ ‘to say/tell’
ទ U+1791 t 2nd ទា /tie/ ‘duck’
ធ U+1792 tʰ 2nd ធាត់ /tʰoat/ ‘fat’
ន U+1793 n 2nd នាឡិ /niel/ ‘unit of weight (~12oz.)’
ប U+1794 ɓ 1st បាត់ /ɓat/ ‘to disappear’
ផ U+1795 pʰ 1st ផាត់ /pʰat/ ‘to blow away (of the wind)’
ព U+1796 p 2nd ព័ទ្ធ /poat/ ‘to besiege’
ភ U+1797 pʰ 2nd ភក់ /pʰuək/ ‘mud’
ម U+1798 m 2nd មាត់ /moat/ ‘mouth’
យ U+1799 j 2nd យោង /jooŋ/ ‘to pull upward’
រ U+179A r 2nd រោង /rooŋ/ ‘roofed structure’
ល U+179B l 2nd លា /lie/ ‘donkey’
វ U+179C w 2nd វារ /wie/ ‘to crawl’
ស U+179F s 1st សក់ /sɑk/ ‘hair’
ហ U+17A0 h 1st ហក់ /hɑk/ ‘to jump’
ឡ U+17A1 l 1st ឡាន /laan/ ‘car/truck’
អ U+17A2 ʔ 1st អាន /ʔaan/ ‘to read’
5.2.1.2. Subscript Consonants

A subscript (a.k.a. Coeng, lit. ‘foot’) is an alternate form of a consonant, usually placed after another consonant to form a consonant cluster. In Khmer orthography, two full-form consonants written one after another are read as an initial consonant followed by a final consonant. Thus កង and ក្ង are different words with different pronunciations, /kɑɑŋ/ and /kŋɑɑ/ respectively.

A list of subscripts is included below. Thun (2011:17-18) wrote, “ញ and ឡ must be written with their subscript (​្ញ and ្ឡ), but the subscript of ឡ (​្ឡ) /l/ is never used. [...] When subscript ញ (​្ញ) is placed under itself, it has to be replaced by its full form, but smaller in size, i.e. ញ្ញ. Consonant ដ /ɗ/ and ត /t/ have the same subscript because most words written with ត /t/ are pronounced as ដ /ɗ/, for example, មាតាបិតា /mie.ɗaa.ɓej.ɗa/, តំណាង /ɗɑm.naaŋ/ [...].”

Consonant Subscript Unicode Code Points of the Subscript Phoneme Series
ក ្ក U+17D2 U+1780 k 1st
ខ ្ខ U+17D2 U+1781 kʰ 1st
គ ្គ U+17D2 U+1782 k 2nd
ឃ ្ឃ U+17D2 U+1783 kʰ 2nd
ង ្ង U+17D2 U+1784 ŋ 2nd
ច ្ច U+17D2 U+1785 c 1st
ឆ ្ឆ U+17D2 U+1786 cʰ 1st
ជ ្ជ U+17D2 U+1787 c 2nd
ឈ ្ឈ U+17D2 U+1788 cʰ 2nd
ញ ្ញ U+17D2 U+1789 ɲ 2nd
ដ ្ដ U+17D2 U+178A ɗ 1st
ឋ ្ឋ U+17D2 U+178B tʰ 1st
ឌ ្ឌ U+17D2 U+178C ɗ 2nd
ឍ ្ឍ U+17D2 U+178D tʰ 2nd
ណ ្ណ U+17D2 U+178E n 1st
ត ្ត U+17D2 U+178F t 1st
ថ ្ថ U+17D2 U+1790 tʰ 1st
ទ ្ទ U+17D2 U+1791 t 2nd
ធ ្ធ U+17D2 U+1792 tʰ 2nd
ន ្ន U+17D2 U+1793 n 2nd
ប ្ប U+17D2 U+1794 ɓ 1st
ផ ្ផ U+17D2 U+1795 pʰ 1st
ព ្ព U+17D2 U+1796 p 2nd
ភ ្ភ U+17D2 U+1797 pʰ 2nd
ម ្ម U+17D2 U+1798 m 2nd
យ ្យ U+17D2 U+1799 j 2nd
រ ្រ U+17D2 U+179A r 2nd
ល ្ល U+17D2 U+179B l 2nd
វ ្វ U+17D2 U+179C w 2nd
ស ្ស U+17D2 U+179F s 1st
ហ ្ហ U+17D2 U+17A0 h 1st
ឡ ្ឡ U+17D2 U+17A1 l 1st
អ ្អ U+17D2 U+17A2 ʔ 1st

Subscript(s) may occur in the initial clusters, disyllabic words and final clusters. They can be stacked up to two in a row; no more than that is allowed.

Here are some examples of where subscripts occur. In each decomposition, the subscripts are the units written with the coeng sign (U+17D2).

  • in initial cluster: ល្ហុង = ល ្ហ ុ ង
  • in disyllabic word: បន្លា = ប ន ្ល ា
  • in final cluster: សាស្ត្រ = ស ា ស ្ត ្រ
  • after an independent vowel: ឱ្យ = ឱ ្យ
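
The decompositions above can be reproduced in code, because each subscript is encoded as the coeng sign U+17D2 followed by the consonant (or independent vowel) it subscripts. The Python sketch below is a minimal illustration; the function names are assumptions, and it performs no validation of the input.

```python
COENG = "\u17d2"  # KHMER SIGN COENG


def split_units(word):
    """Split a Khmer string into characters, keeping COENG + letter together."""
    units, i = [], 0
    while i < len(word):
        if word[i] == COENG and i + 1 < len(word):
            units.append(word[i:i + 2])  # subscript: COENG plus its letter
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def subscripts(word):
    """Return only the subscript units of a word."""
    return [u for u in split_units(word) if len(u) == 2 and u[0] == COENG]
```

For example, `split_units("\u1780\u1784")` (កង) returns two full-form consonants, whereas `split_units("\u1780\u17d2\u1784")` (ក្ង) returns one consonant and one subscript, which is the distinction described at the start of this section.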

In rare cases, an independent vowel can appear in subscript form, e.g. សុហ្ឫទ /soʔ.rɨt/ ‘buddy’.

5.2.1.3. Modified Initial Consonants

Khmer consonants that do not have a counterpart in the other series can be shifted from one series to the other. The characters used to modify these consonants are described in the following section (Inherent Vowels and Consonant Shifters).

The consonants which can be modified are listed below.

Grapheme Code Points Phoneme Series Examples
ង៉ U+1784 U+17C9 ŋ 1st ង៉ៃ /ŋaj/ ‘day (colloquial)’
ញ៉ U+1789 U+17C9 ɲ 1st ញ៉ាំ /ɲam/ ‘to eat’
ម៉ U+1798 U+17C9 m 1st ម៉ឺន /məən/ ‘ten thousand’
យ៉ U+1799 U+17C9 j 1st យ៉ាង /jaaŋ/ ‘way, kind’
រ៉ U+179A U+17C9 r 1st រ៉ក /rɑɑk/ ‘pulley’
វ៉ U+179C U+17C9 w 1st វ៉ៃ /waj/ ‘hit, fight (colloquial)’
ប៉ U+1794 U+17C9 p 1st ប៉ិន /pən/ ‘to be skillful’
ប៊ U+1794 U+17CA ɓ 2nd ប៊ុត /ɓut/ ‘topaz’
ស៊ U+179F U+17CA s 2nd ស៊ើប /səəp/ ‘to investigate’
ហ៊ U+17A0 U+17CA h 2nd ហ៊ាន /hien/ ‘to be bold’
អ៊ U+17A2 U+17CA ʔ 2nd អ៊ំ /ʔum/ ‘aunt, uncle’
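
The table above can be expressed as a lookup keyed on the base consonant and the shifter that follows it. The Python sketch below copies the eleven rows of the table; the Unicode character names for U+17C9 and U+17CA come from the Unicode character database, while the dictionary layout and function name are illustrative assumptions.

```python
import unicodedata

MUUSIKATOAN = "\u17c9"  # shifts a 2nd-series consonant to the 1st series
TRIISAP = "\u17ca"      # shifts a 1st-series consonant to the 2nd series

# (base consonant, shifter) -> (phoneme, series), copied from the table above.
MODIFIED = {
    ("\u1784", MUUSIKATOAN): ("ŋ", 1),
    ("\u1789", MUUSIKATOAN): ("ɲ", 1),
    ("\u1798", MUUSIKATOAN): ("m", 1),
    ("\u1799", MUUSIKATOAN): ("j", 1),
    ("\u179a", MUUSIKATOAN): ("r", 1),
    ("\u179c", MUUSIKATOAN): ("w", 1),
    ("\u1794", MUUSIKATOAN): ("p", 1),
    ("\u1794", TRIISAP): ("ɓ", 2),
    ("\u179f", TRIISAP): ("s", 2),
    ("\u17a0", TRIISAP): ("h", 2),
    ("\u17a2", TRIISAP): ("ʔ", 2),
}


def modified_consonant(text):
    """Look up a two-code-point string (base + shifter); None if not listed."""
    if len(text) != 2:
        return None
    return MODIFIED.get((text[0], text[1]))


# The character names can be confirmed from the Unicode database.
SHIFTER_NAMES = [unicodedata.name(MUUSIKATOAN), unicodedata.name(TRIISAP)]
```

Note that ប is the only base consonant in the table that takes either shifter: with U+17C9 it is read /p/ in the 1st series, and with U+17CA it stays /ɓ/ but moves to the 2nd series.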
5.2.1.4. Initial Complex Consonants

Eight initial complex consonants are used to transliterate foreign loanwords where there is no parallel single orthographic consonant. To fit into the Khmer spelling convention, each also needs a counterpart in the other series, which, in essence, is formed with a consonant shifter.

Grapheme Unicode Phoneme Series Examples
ហ្វ U+17A0 U+17D2 U+179C f 1st ហ្វូន /foon/ ‘phone’
ហ្វ៊ U+17A0 U+17D2 U+179C U+17CA 2nd ហ្វ៊ីល /fiil/ ‘film’
ស្ហ U+179F U+17D2 U+17A0 ʃ 1st ស្ហេបាប់ /ʃee.ɓap/ ‘Shebab’
ស្ហ៊ U+179F U+17D2 U+17A0 U+17CA 2nd ស្ហ៊ីអ៊ីត /ʃii.ʔiit/ ‘Shiite’
ហ្ស U+17A0 U+17D2 U+179F z 1st អាហ្សង់ទីន /ʔaa.zɑŋ.tin/ ‘Argentina’
ហ្ស៊ U+17A0 U+17D2 U+179F U+17CA 2nd ហ្ស៊ូម /zuum/ ‘zoom’
ហ្ក U+17A0 U+17D2 U+1780 ɠ 1st ហ្កាណា /ɠaa.naa/ ‘Ghana’
ហ្គ U+17A0 U+17D2 U+1782 2nd ហ្គេម /ɠeem/ ‘game’

5.2.2. Consonant Clusters

5.2.2.1. Two Consonant Clusters

This spreadsheet shows the consonant clusters that occur in word-initial position (transparent cells; for examples see Appendix A), in word-medial position (grey cells; see Appendix B) and in final position (yellow cells; see Appendix C). Consonants in the top rows are the first member of the cluster (C1), and those in the far-left column are the second member (C2), which is written in subscript form. The IPA corresponding to each consonant is given in the row and column next to C1 and C2. The orthographic realization of each cluster is placed in the intersecting cell. For example, the first cluster (ឆ្ក) is composed of ឆ and ក. ក and ក, ខ and ក, គ and ក, ឃ and ក, and ច and ក do not make a cluster, which is why the intersecting cells are empty.

Note that ឋ /tʰ/ almost never has a subscript after it; the only instance in which ឋ /tʰ/ is followed by a subscript is ដ្ឋ្យ (ដ ្ឋ ្យ).

Subscript ឡ /l/ is not included here because, as mentioned above, it is never used in contemporary Khmer.

ឱ្យ (or ឲ្យ, as an alternative spelling) “to give” is a commonly used word that does not follow the Khmer spelling convention due to its historical background. It is the one and only instance that a subscript occurs after an independent vowel in contemporary Khmer.

ហ្ឫ is the only instance in which an independent vowel occurs in subscript form, and it is not commonly used. For instance, ហ្ឫទ័យ is usually written as ហឫទ័យ.

Both ឱ្យ and ហ្ឫ are found as headwords in the Chuon Nath Dictionary, the official Khmer-Khmer dictionary published in 1967.

5.2.2.2. Three Consonant Clusters

Initial consonant clusters composed of three consonants are rarely found in Khmer. They usually occur in foreign loanwords. There are two instances of three consonants in a row in the initial position among the headwords of the Khmer-Khmer dictionary.

  • ស្ត្រ- ស ្ត ្រ as in ស្ត្រី /strəj/ ‘woman’
  • ហ្វ្រ- ហ ្វ ្រ as in ហ្វ្រង្ក /frɑŋ/ ‘French currency’

In present-day Khmer, there are more words with initial clusters of three consonants, although these are very rare as well. Here are a few examples:

  • ស្ព្រ- ស ្ព ្រ as in ស្ព្រីង /spriiŋ/ ‘spring’
  • ស្គ្រ- ស ្គ ្រ as in ស្គ្រីន /skriin/ ‘screen’
  • ស្ទ្រ- ស ្ទ ្រ as in ស្ទ្រីត /striit/ ‘street’
  • ហ្គ្រ- ហ ្គ ្រ as in ហ្គ្រីន /ɠriin/ ‘green’
  • ហ្វ្ល- ហ ្វ ្ល as in ហ្វ្លាស /flaah/ ‘flash’

Three consonant clusters in medial position:

  • -ក្ស្ម- ក ្ស ្ម as in លក្ស្មី /leak.sməj/ ‘wellness, glory’
  • -ដ្ឋ្យ- ដ ្ឋ ្យ as in បិដ្ឋ្យដ្ឋិក​សត្វ /pət.tjat.tʰeʔ.kaʔ.sat/ ‘vertebrae’
  • -ស្ក្រ- ស ្ក ្រ as in សំស្ក្រឹត /saŋ.skrət/ ‘Sanskrit’
  • -ស្គ្វ- ស ្គ ្វ as in ប៊ិស្គ្វីត៍ /ɓii.skwii/ ‘biscuit’
  • -ង្ក្រ- ង ្ក ្រ as in អង្ក្រង /ʔɑŋ.krɑɑŋ/ ‘k.o. large red ant’
  • -ង្ខ្យ- ង ្ខ ្យ as in សង្ខ្យា /sɑŋ.kjaa/ ‘counting’
  • -ង្គ្រ- ង ្គ ្រ as in សង្គ្រោះ /sɑŋ.kruəh/ ‘to save/rescue’
  • -ង្គ្ល- ង ្គ ្ល as in អង្គ្លេស /ʔɑŋ.kleh/ ‘English’
  • -ង្ឃ្រ- ង ្ឃ ្រ as in សង្ឃ្រាជ /sɑŋ.kreac/ ‘monk chief’
  • -ញ្ច្រ- ញ ្ច ្រ as in ចិញ្ច្រាំ /cəɲ.cram/ ‘to chop repeatedly’
  • -ញ្ជ្រ- ញ ្ជ ្រ as in កញ្ជ្រោង /kaɲ.crooŋ/ ‘fox’
  • -ន្ទ្រ- ន ្ទ ្រ as in កន្ទ្រាញ /kɑn.trieɲ/ ‘chief of a clan’
  • -ន្ធ្យ- ន ្ធ ្យ as in សន្ធ្យា /sɑn.tjie/ ‘twilight’
  • -ន្ត្រ- ន ្ត ្រ as in កន្ត្រៃ /kɑn.traj/ ‘scissors’

Three consonant clusters are also found in the final position.

  • -ន្ត្រ ន ្ត ្រ as in កន្ត្រ /kɑn.trɑɑ/ ‘wheelless pulley’
  • -ស្ត្រ ស ្ត ្រ as in តារាសាស្ត្រ /ɗaa.raa.sah/ ‘astronomy’
  • -ន្ទ្រ ន ្ទ ្រ as in សុរេន្ទ្រ /soʔ.reen/ ‘Indra’
  • -ក្ត្រ ក ្ត ្រ as in ភក្ត្រ /pʰeak/ ‘face’

រ /r/ seems to play an important role in three-consonant clusters. According to the Chuon Nath dictionary data, it usually occurs as the last member of the cluster and is never found between the first and the last member, where the cluster would not be pronounceable. For instance, ស្ព្រីង should be typed as it is spelled, (ស /s/+ ្ព /p/+ ្រ /r/+ ី /ii/+ ង/ŋ/ → /spriiŋ/), not (ស /s/+ ្រ /r/+ ្ព /p/+ ី /ii/+ ង /ŋ/ → /srpiiŋ/). For a list of 50 words with three consonants in the final position (i.e. one consonant followed by two subscripts), please refer to Appendix D.
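
The two constraints described in this section, at most two stacked subscripts and subscript រ only as the last member of a stack, can be turned into a simple typing check. The Python sketch below is a heuristic lint under those assumptions, not a complete orthographic validator; the function name is hypothetical.

```python
COENG = "\u17d2"  # KHMER SIGN COENG
RO = "\u179a"     # KHMER LETTER RO


def subscript_stacks_ok(word):
    """Check every run of consecutive subscripts in a word.

    A run fails if it has more than two subscripts, or if subscript RO
    appears anywhere other than at the end of the run.
    """
    i = 0
    while i < len(word):
        stack = []
        # Collect consecutive COENG + letter pairs.
        while i + 1 < len(word) and word[i] == COENG:
            stack.append(word[i + 1])
            i += 2
        if stack:
            if len(stack) > 2 or RO in stack[:-1]:
                return False
        else:
            i += 1
    return True
```

With this check, the correctly typed ស្ព្រីង passes, while the same letters typed with ្រ before ្ព are rejected.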

5.2.3. Final Consonants

A final consonant occurs after a vowel in an orthographic syllable. For instance, ក /k/ in ស្អែក /sʔaek/ ‘tomorrow’ is a final consonant because it is in the coda position of the syllable and is pronounced after the vowel ែ /ae/. However, syllable analysis cannot be done reliably by algorithm, because it is ambiguous where the syllable boundaries fall in a sentence, given that spaces are not used between words in Khmer.

According to the data in the Khmer-Khmer dictionary, six orthographic consonants are not found in the coda position of orthographic words: ឆ /cʰ/, ឈ /cʰ/, ផ /pʰ/, ហ /h/, ឡ /l/ and អ /ʔ/. In some cases it can be unclear whether a consonant is a final or an initial consonant. For instance, ក្រឡ ‘jug’ is composed of three orthographic characters, ក ្រ ឡ, and ឡ might be taken for the final consonant. This is not the case: it is the initial consonant of the second syllable, with an inherent vowel. The word is pronounced as /krɑ.lɑɑ/, not /krɑɑl/.

Consonant clusters are found in the coda position of Pali/Sanskrit loanwords. These are unpredictable and the combination can be composed of any two or three consonants which are always left unpronounced, unless an orthographic vowel is placed after them.

សាស្ត្រ /sah/ ‘science or subject’ (no vowel after the final consonant cluster)

សាស្ត្រា /sah.straa/ ‘palm leaf manuscript’ (with a vowel after it)

សាស្ត្រ does not have a vowel after the cluster -ស្ត្រ, which is why the cluster is left out of the pronunciation; សាស្ត្រា, however, has ា after the cluster (-ស្ត្រ). In this case, another syllable is created: ស្ត្រា /straa/.

F = {ក ខ គ ឃ ង ច ជ ញ ដ ឋ ឌ ឍ ណ ត ថ ទ ធ ន ប ព ភ ម យ រ ល វ ស} or
an arbitrary consonant cluster of two or three consonants (see Consonant Clusters)

It is important to note that, unlike in Old Khmer, a subscript never occurs after a dependent vowel in present-day Khmer. Here are some examples from Bernard (1902:40-42) and their corresponding words from the Chuon Nath Dictionary (1967).

Old Khmer Present Day Khmer Gloss
ចក្រី្យ ចក្រី /caʔ.krej/ ‘chakri’
ចំហា្យ ចំហាយ /cɑm.haaj/ ‘steam (n)’
ចំហុយ ចំហុយ /cɑm.hoj/ ‘steam (v)’

Bernard (ibid) mostly, but not always, used the subscript ្យ after a dependent vowel. For instance, ចំហុយ (with consonant យ) ‘to steam’ was not written as ចំហុ្យ (with subscript ្យ).

Maspero (1915:38) explained in his book “Grammar of Khmer Language” that the subscript ្យ after a dependent vowel is a semi-vowel. For example:

ញី្យ ញី /ɲii/ ‘female’
ដី្យ ដី /ɗej/ ‘soil’
ជ្រៃ្យ ជ្រៃ /crej/ ‘fig tree’

He further noted that Sanskrit loanwords ending in “aya” were written with ៃ and a final subscript ្យ. For example:

ជៃ្យ “jaya” ជ័យ /cej/ ‘victory’
ភៃ្យ “bhaya” ភ័យ /pʰej/ ‘fright’

Maspero (ibid:70,138) also described how the subscript ្ង can be placed after a dependent vowel, where it is equivalent to the final consonant. For example:

ទាំ្ង = ទាង ទាំង /teaŋ/ ‘all’

5.2.4. Vowels

Vowels are divided into three types in this paper: Inherent Vowels, Dependent Vowels and Independent Vowels.

5.2.4.1. Inherent Vowels

Inherent vowels are invisible; they have no orthographic representation. Two inherent vowels are used in present-day Khmer: consonants in the 1st series take /ɑɑ/, and those in the 2nd series take /ɔɔ/. Each Khmer orthographic consonant is therefore pronounced as shown in the table below. The grey cells show the modified consonants corresponding to their counterparts. For an explanation of how they are modified, see Consonant Shifters.

1st Series 2nd Series
Orthographic Phonemic Orthographic Phonemic
ក /kɑɑ/ គ /kɔɔ/
ខ /kʰɑɑ/ ឃ /kʰɔɔ/
ង៉ /ŋɑɑ/ ង /ŋɔɔ/
ច /cɑɑ/ ជ /cɔɔ/
ឆ /cʰɑɑ/ ឈ /cʰɔɔ/
ញ៉ /ɲɑɑ/ ញ /ɲɔɔ/
ដ /ɗɑɑ/ ឌ /ɗɔɔ/
ឋ /tʰɑɑ/ ឍ /tʰɔɔ/
ណ /nɑɑ/ ន /nɔɔ/
ត /tɑɑ/ ទ /tɔɔ/
ថ /tʰɑɑ/ ធ /tʰɔɔ/
ប /ɓɑɑ/ ព /pɔɔ/
ផ /pʰɑɑ/ ភ /pʰɔɔ/
យ៉ /jɑɑ/ យ /jɔɔ/
រ៉ /rɑɑ/ រ /rɔɔ/
ឡ /lɑɑ/ ល /lɔɔ/
វ៉ /wɑɑ/ វ /wɔɔ/
ស /sɑɑ/ ស៊ /sɔɔ/
ហ /hɑɑ/ ហ៊ /hɔɔ/
អ /ʔɑɑ/ អ៊ /ʔɔɔ/

Inherent vowels in Pali/Sanskrit loanwords are different from those of Khmer: /aʔ/ in the 1st series and /eaʔ/ in the 2nd series. Therefore, ក and គ, which are /kɑɑ/ and /kɔɔ/ in Khmer, are pronounced as /kaʔ/ and /keaʔ/ respectively.

5.2.4.2. Dependent Vowels

There are two types of written vowels: dependent vowels and independent vowels. The first type is always attached to an initial consonant, and the second can start a syllable on its own. It is important to note that different dependent vowels stand in different places around the base consonant: to the left, to the right, above, below or around the base. Please see the examples below.

Position Example Character Combination IPA
left កេ ក េ /kee/
right កា ក ា /kaa/
above កិ ក ិ /keʔ/
below កុ ក ុ /koʔ/
around កៀ ក ៀ /kiə/

The table below shows the vowels as taught in school (Um & Seng 2012:1).

Dependent Vowels
Grapheme Unicode 1st Series 2nd Series
(inherent) - ក /kɑɑ/ ‘neck’ គ /kɔɔ/ ‘to be mute’
ា U+17B6 តា /taa/ ‘grandfather’ ទា /tie/ ‘duck’
ិ[8] U+17B7 ប៉ិត /pət/ ‘to cut obliquely’ ពិត /pɨt/ ‘to be true or real’
ី U+17B8 សី /səj/ ‘shuttlecock’ ស៊ី /sii/ ‘to eat (impolite)’
ឹ U+17B9 តឹក /tək/ ‘10 cm’ ទឹក /tɨk/ ‘water’
ឺ U+17BA កឺ /kəə/ ‘heart(s) (suit of cards)’ គឺ /kɨɨ/ ‘to be’
ុ U+17BB កុក /kok/ ‘egret’ គុក /kuk/ ‘prison’
ូ U+17BC កូរ /koo/ ‘to stir’ គូរ /kuu/ ‘to draw’
ួ U+17BD កួរ /kuə/ ‘pod’ គួរ /kuə/ ‘should’
ើ U+17BE តើ /taə/ ‘question marker’ ទើរ /təə/ ‘to be held’
ឿ U+17BF តឿ /tɨə/ ‘dwarf’ ជឿ /cɨə/ ‘to believe’
ៀ U+17C0 តៀប /tiəp/ ‘k.o. bowl’ ទៀប /tiəp/ ‘custard apple’
េ U+17C1 កេង /keeŋ/ ‘to cheat or exploit’ គេង /keeŋ/ ‘to sleep (for kid)’
ែ U+17C2 កែ /kae/ ‘to edit or adjust’ គែ /kɛɛ/ ‘craw (of birds)’
ៃ U+17C3 កៃ /kaj/ ‘trigger’ គៃ /kej/ ‘to be frugal’
ោ U+17C4 កោរ /kao/ ‘to shave’ គោ /koo/ ‘cow’
ៅ U+17C5 តៅ /taw/ ‘unit of measurement’ ទៅ /tɨw/ ‘to go’
ុំ U+17BB U+17C6 កុំ /kom/ ‘(don’t)’ គុំ /kum/ ‘to hold a grudge’
ំ U+17C6 ចំ /cɑm/ ‘to be exactly at’ ជំ /cum/ ‘to discuss’
ាំ U+17B6 U+17C6 ចាំ /cam/ ‘to remember’ ជាំ /coam/ ‘to bruise’
ះ U+17C7 តះ /tah/ ‘hurriedly’ ទះ /teah/ ‘to slap’
ុះ U+17BB U+17C7 ចុះ /coh/ ‘to go down’ ជុះ /cuh/ ‘to defecate’
េះ U+17C1 U+17C7 ចេះ /ceh/ ‘to know’ ជេះ /ceh/ ‘to chip (of skin/bark)’
ោះ U+17C4 U+17C7 កោះ /kɑh/ ‘island’ គោះ /kuəh/ ‘to knock’

The vowels ើ ឿ ៀ ែ ៃ ោ are each composed of more than one part, and the parts look like other vowels, but the composite does not have the qualities of its parts (Kin 2007:45). That is to say, each of them is written in two parts, lifting the pen in between. For instance, ើ is made up of េ /ee/ and ី /əj|ii/, but the realization of ើ is /aə/, not /ee+əj/.

Vowels whose second member is ំ (a.k.a. Nikahit) or ះ (a.k.a. Reahmuk) always have /m/ or /h/ in the final position: Nikahit and Reahmuk act as a final /m/ and /h/ respectively. They mark the end of the syllable and cannot be followed by a vowel.

One sign, ៈ (a.k.a. Yuukaleapintu), behaves like a vowel, yet it is not counted as one. It has 1st and 2nd series realizations, and it never takes a final consonant.

Grapheme Code Points 1st Series 2nd Series
U+17C8 សុតៈ /soʔ.t/ ‘listening (to)’ សុទ្ទៈ /sot.teaʔ/ ‘laborer’

Yuukaleapintu can also combine with another vowel, just like Nikahit and Reahmuk.

Grapheme Code Points 1st Series 2nd Series
ែៈ U+17C2 U+17C8 ហែៈ /h/ ‘(startle)’ អ្ហ៊ែៈ /ʔhɛʔ/ ‘(interjection)’

The following vowel combinations are not included in the official alphabetical order.

Grapheme Code Points 1st Series 2nd Series
ិះ U+17B7 U+17C7 តិះ /teh/ ‘to insult’ ជិះ /cih/ ‘to ride’
ឹះ U+17B9 U+17C7 ឆ្កឹះ /ckəh/ ‘to pluck out’ ឆ្ពឹះ /cpɨh/ ‘to be rough (of fabrics)’
ឺះ U+17BA U+17C7 ច្រឺះ /crəəh/ ‘to be tightly packed’ ព្រឺះ /prɨɨh/ ‘to be spirited’
ែះ U+17C2 U+17C7 ច្រែះ /craeh/ ‘rust’ ជ្រែះ /crɛɛh/ ‘to chop away’
ាំង U+17B6 U+17C6 U+1784 តាំង /t/ ‘from/since’ ទាំង /teaŋ/ ‘together with’
5.2.4.3. Independent Vowels

Independent vowels are able to start a syllable without an initial consonant or initial consonant cluster (Um & Seng 2012:3). They are usually used in Pali/Sanskrit loanwords. The phonemic transcription corresponding to each grapheme is given in bold next to the example words. Independent vowels are rarely used in new words in contemporary Khmer. Even in the 1990s, there were not many either. See how often they are used in the headwords of the Chuon Nath dictionary below.

Independent Vowels
Grapheme Code Points Examples Frequency
U+17A5 ឥដ្ឋ /ʔət/ ‘brick’ 75
ឥឡូវ /ʔəj.ləw/ ‘now’
U+17A6 ឦសាន /ʔəj.saan/ ‘north east’ 5
U+17A7 ឧកញ៉ា /ʔok.ɲaa/ ‘tycoon’ 242
ឧបមា /ʔuʔ.paʔ.maa/ ‘to be comparable to’
U+17A9 ឩរុ /ʔuu.ruʔ/ ‘chest (royalty)’ 7
U+17AA ឪ /ʔəw/ ‘father’ 13
U+17AB ឫក /k/ ‘behavior’ 38
U+17AC ឬ /rɨɨ/ ‘or’ 2
U+17AD រឭក /rɔ.k/ ‘to be alert’ 3
U+17AE ឮ /lɨɨ/ ‘to hear’ 8
U+17AF ឯក /ʔaek/ ‘one’ 28
U+17B0 ឰដ៏ /ʔaj.ɗɑɑ/ ‘LOC’ 7
U+17B1 ឱ្យ or ឲ្យ[9] /ʔaoj/ ‘to give’ 55
0
U+17B2
U+17B3 ក្រឳ /krɑ.ʔaw/ ‘edible root of the lotus’ 6

There is no evidence of an independent vowel being used as a subscript in contemporary Khmer, even though the Chuon Nath dictionary has three headwords in which ឫ is used as a subscript (i.e. សុហ្ឫទ, សៅហ្ឫទ, and ហ្ឫទ័យ ), plus more than 190 instances in Sanskrit transliteration. Here are some examples:

Khmer Contemporary Spelling IPA Sanskrit Transliteration Gloss
ក្រិស (ក ្រ ិ ស) /krəh/ ក្ឫឝ (ក ្ឫ ឝ) ‘small, dwarf’
គ្រឹះ (គ ្រ ឹ ះ) /krɨh/ គ្ឫហ (គ ្ឫ ហ) ‘house’
ព្រឹក្ស (ព ្រ ឹ ក ្ស) /prɨʔ.saa/ វ្ឬក្ស (វ ្ឫ ក ្ស) ‘tree’
មុសា (ម ុ ស ា) /muʔ.saa/ ម្ឫឞា (ម ្ឫ ឞា) ‘to be false’
វុឌ្ឍិ (វ ុ ឌ ្ឍ ិ) /wut.tʰiʔ/ វ្ឫទ្ធិ (វ ្ឫ ទ ្ធ ិ) ‘prosperity’
សតិ (ស ត ិ) /saʔ.teʔ/ ស្ម្ឫតិ (ស ្ម ្ឫ ត ិ) ‘consciousness’
សំស្ក្រឹត (ស ំ ស ្ក ្រ ឹ ត) /saŋ.skrət/ សំស្ក្ឫត (ស ំ ស ្ក ្ឨ ត) ‘Sanskrit’
ស្រឹង្គារ (ស ្រ ឹ ង ្គ ា រ) /srəŋ.kie/ ឝ្ឫង្គារ (ឝ ្ឫ ង ្គ ា រ) ‘love, lover’
ឧក្រិដ្ឋ (ឧ ក ្រ ិ ដ ្ឋ) /ʔuʔ.krət/ ឧត្ក្ឫឞ្ដ (ឧ ត ្ឫ ឞ ្ដ) ‘seriously, criminally’

5.3. Diacritics (signs)

Eight diacritical signs are currently used in present-day Khmer. They are Muusikatoan ( ​៉), Triisap ( ​៊), Bantoc (​់), Robat ( ​៌), Kakabat (​៎), Ahsda (​៏), Samyok Sannya (​័) and Toandakhiat (​៍). These play different roles in the Khmer spelling conventions, and the following sections describe the usage of each. The table below shows the frequency of each diacritic in the headwords of the Chuon Nath Dictionary. Two diacritics do not usually occur in a row, except for the combination of a Consonant Shifter and a Samyok Sannya (see Special Treatment of Consonant Shifters).

Diacritic Name Diacritic Character Code Points Frequency in KKD
Muusikatoan U+17C9 722
Triisap U+17CA 215
Bantoc U+17CB 1801
Robat U+17CC 83
Kakabat U+17CE 15
Ahsda U+17CF 10
Samyok Sannya U+17D0 384
Toandakhiat U+17CD 248

5.3.1. Consonant Shifters

Consonant Shifters are also known as “Series Shifters” or “Register Shifters”. As the name suggests, Consonant Shifters change the series of the consonant from the 1st to the 2nd or vice versa. This completes the Khmer consonant chart, so that each consonant has a corresponding counterpart in the other series (see Inherent Vowels above).

Consonant Shifters can also be used to change the series of consonant clusters. The shifter is usually placed after the cluster, for it changes the series of the cluster as a whole, not that of the first or the second member alone (see Consonant Shifters with Consonant Clusters).

5.3.1.1. Muusikatoan ( ​៉)

Muusikatoan ( ​៉), a.k.a. ធ្មេញ​កណ្ដុរ ‘Thmenh Kandol’ or សម្លាប់​ពីរ ‘Samlab Pii’, is used to change the series of a consonant from the 2nd series to the 1st series. Not every consonant can take Muusikatoan. It applies only to the ones that do not have a counterpart in the 1st series.

ង /ŋɔɔ/ + ​៉ > ង៉ /ŋɑɑ/
ញ /ɲɔɔ/ + ​៉ > ញ៉ /ɲɑɑ/
ម /mɔɔ/ + ​៉ > ម៉ /mɑɑ/
យ /jɔɔ/ + ​៉ > យ៉ /jɑɑ/
រ /rɔɔ/ + ​៉ > រ៉ /rɑɑ/
វ /wɔɔ/ + ​៉ > វ៉ /wɑɑ/
ប /ɓɑɑ/ + ​៉ > ប៉ /pɑɑ/
5.3.1.2. Triisap ( ​៊)

Triisap ( ​៊), a.k.a. សក់ក ‘Sakka’, is used to change the series of the consonants -- from the 1st series to the 2nd series.

ប /ɓɑɑ/ + ​៊ > ប៊ /ɓɔɔ/
ស /sɑɑ/ + ​៊ > ស៊ /sɔɔ/
ហ /hɑɑ/ + ​៊ > ហ៊ /hɔɔ/
អ /ʔɑɑ/ + ​៊ > អ៊ /ʔɔɔ/
5.3.1.3. Special Case of ប

ប is the only consonant to which either Muusikatoan or Triisap can be attached. Muusikatoan does not change its series; it changes the consonant instead. Triisap, on the other hand, does change the series of ប from the first to the second.

/ɓɑɑ/
ប ៊ ប៊ /ɓɔɔ/
ប ៉ ប៉ /pɑɑ/

Another special case for ប is that even though its consonant quality is /ɓ/, it would be realized as /p/ when it precedes subscript ្រ in a cluster, i.e. ប្រៃ ‘salty’ is pronounced as /praj/ not /ɓraj/.
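The behaviour of the two consonant shifters described in the preceding sections can be summarized in a small lookup. This is a minimal sketch based only on the consonants listed above; `shifted_series` and the two set names are hypothetical.

```python
MUUSIKATOAN = "\u17C9"
TRIISAP = "\u17CA"

# Consonants shown with Muusikatoan above: ង ញ ម យ រ វ ប
TAKES_MUUSIKATOAN = set("\u1784\u1789\u1798\u1799\u179A\u179C\u1794")
# Consonants shown with Triisap above: ប ស ហ អ
TAKES_TRIISAP = set("\u1794\u179F\u17A0\u17A2")


def shifted_series(consonant: str, shifter: str) -> int:
    """Return the series (1 or 2) of the vowel that follows consonant + shifter.

    For ប with Muusikatoan the result is still series 1; the shifter
    changes the consonant to /p/ rather than the series.
    """
    if shifter == MUUSIKATOAN and consonant in TAKES_MUUSIKATOAN:
        return 1
    if shifter == TRIISAP and consonant in TAKES_TRIISAP:
        return 2
    raise ValueError("combination not listed in this section")
```

Only ប appears in both sets, which matches its status as the one consonant that accepts either shifter.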

5.3.1.4. Special Treatment of Consonant Shifters

According to the spelling convention (Khin 2017:44), Muusikatoan or Triisap has to be turned into a glyph that looks like ុ when there is a vowel on top of the base consonant (a.k.a. Above Vowels). Khin (ibid) does not give a list of these vowels, but other scholars do. The ុ-like shape is not U+17BB; it is a rendered variant of the consonant shifter. Um & Seng (2012:52) and Kul (2008:28-31) state that ិ (U+17B7), ី (U+17B8), ឹ (U+17B9), ឺ (U+17BA), ើ (U+17BE) and ាំ (U+17B6 U+17C6) are the Above Vowels that fall into this special case.

Kul (ibid) and Nuon (1954:វសហ) describe a few exceptional cases in which Shifters should or should not turn into the character that looks like ុ.

  • One caveat is that when Triisap is attached to ប and they are followed by one of the Above Vowels mentioned above, Triisap does not change its orthographic realization to ុ. It should stay the same; if it changed, it would be confused with Muusikatoan.
ប៊ិក ប ៊ ិ ក /ɓɨc/ ‘pen’
ប៊ិន ប ៊ ិ ន /ɓɨn/ ‘cooler box’
ប៉ិន ប ៉ ិ ន /pən/ ‘to be keen on something’
  • Triisap may stay the same when attached to អ even though there is an above vowel after it.
អ៊‌ឹម[10] អ ៊ ZWNJ ឹ ម /ʔɨm/ ‘to be breastfed’
អ៊ំ អ ៊ ំ /ʔum/ ‘uncle/aunty’
  • Triisap may stay the same or be changed when there is an above vowel attached to it. This is only with ហ (U+17A0).
ហ៊ឹម ហ ៊ ឹ ម /hɨm/ ‘(sigh)’

ហ៊‌ឹម ហ ៊ ZWNJ ឹ ម (The ZWNJ is placed in between the Triisap and the vowel in order to prevent the default rendering from happening.)

  • For the above two cases of Triisap ៊ being used with អ and ហ, Um & Seng (2012:52) rules out that the Triisap has to change its glyph to look like ុ, i.e. ហ៊ីង, អ៊ីកអ៊ាក។
  • Samyok Sannya (​័) is seen to influence the orthographic realization of the consonant shifter in the same way the above vowels do. However, no reference has been found to describe this phenomenon.
ប៉័ង ប ៉ ័ ង /paŋ/ ‘bread’
ប៊័រ ប ៊ ័ រ /ɓəə/ ‘butter’
  • Disyllabic words should never take a consonant shifter in the second syllable. The series assimilation does the job of changing the series of the second syllable​ which results in changing the vowel of the syllable (Sok 2016:28-30).
ច្រងាង /crɑ.ŋaaŋ/ ‘scatter in the way’
not ច្រង៉ាង
ព្រហើន /prɔ.həən/ ‘to be insolent’
not ព្រហ៊ើន nor ព្រហ៊‌ើន
  • When the Triisap is followed by this vowel (​​ុំ), one has to change the second part of the vowel (i.e. ំ) to ម (Kul ibid:32).

    • ស៊ុំ should be written as ស៊ុម
    • ហ៊ុំ should be written as ហ៊ុម

    However, both spellings are found in everyday use. It may be because this rule has been overlooked.
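The default rendering rule of this section, together with the ប exception and the ZWNJ override, can be sketched as follows. This is a simplified illustration rather than a description of any real shaping engine: it inspects only the first shifter in the string, ignores the Samyok Sannya case, and the name `shifter_drawn_below` is hypothetical.

```python
ZWNJ = "\u200C"
MUUSIKATOAN = "\u17C9"
TRIISAP = "\u17CA"
BA = "\u1794"  # ប

# Above Vowels listed by Um & Seng and Kul; ាំ is the two-code-point sequence.
ABOVE_VOWELS = ("\u17B7", "\u17B8", "\u17B9", "\u17BA", "\u17BE", "\u17B6\u17C6")


def shifter_drawn_below(text: str) -> bool:
    """Return True if the first shifter in text takes the ុ-like shape."""
    for i, ch in enumerate(text):
        if ch in (MUUSIKATOAN, TRIISAP):
            rest = text[i + 1:]
            if rest.startswith(ZWNJ):
                return False  # ZWNJ prevents the default reshaping
            if ch == TRIISAP and i > 0 and text[i - 1] == BA:
                return False  # Triisap on ប keeps its shape
            return rest.startswith(ABOVE_VOWELS)
    return False
```

With this sketch, ប៉ិន and ហ៊ឹម are reported as reshaped, while ប៊ិក and the ZWNJ spelling of ហ៊‌ឹម are not.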

5.3.1.5. Consonant Shifters with Consonant Clusters

Consonant Shifters can also be used with consonant clusters when the desired vowel is not the default one, that is, the one determined by the least sonorous consonant or by a cluster composed of consonants from the same series. The consonant shifter is usually placed between the subscript and the vowel. Some hold that it should go after the base consonant, but in fact the shifter does not merely change the series of one of the two members; it changes the series of the cluster as a whole.

  • ស្រ៊ឹប ស ្រ ៊ ឹ ប /srɨp/ ‘sound of a heavy, solid object falling, thud’
    as opposed to ស្រឹប /srəp/
  • ស្អ៊ុយ ស ្អ ៊ ុ យ /sʔuj/ ‘to be fat (of the belly)’
    as opposed to ស្អុយ /sʔoj/
  • ម្ង៉ៃ ម ្ង ៉ ៃ /mŋaj/ ‘one day’
    as opposed to ម្ងៃ /mŋej/
  • ស្ប៉ឹម ស ្ប ៉ ឹ ម /spəm/ ‘to be stuck at the mouth or entrance’
    as opposed to ស្បឹម /sɓəm/

For a list of all possible initial consonant clusters with their corresponding counterparts, please see Appendix E.

5.3.2. Bantoc (​់)

Sok (2016:66-68) called this phenomenon “Vowel Modification”. Here is a quote of the account:

Two vowels could be modified by adding a diacritic to it. The diacritic used is called Bantoc "់". The Bantoc is used to shorten as well as change the vowel quality completely. The two vowels are:

  • Inherent vowels in the first series and the second series
  • ា in the first series and the second series

Bantoc is usually placed on the final consonant, and only certain orthographic consonants could have the Bantoc. They could be one of these nine final consonants: -ក /k/, -ង /ŋ/, -ច /c/, -ញ /ɲ/, -ត /t/, -ន​ /n/, ​-ល​ /l/, -ស /s/, or -ប /ɓ/. [...]

No. 1, column a (1st series)
Syllable structures: CF់, C៉F់, C្CF់, C្C៉F់
/ɑɑ/ is shortened to /ɑ/, where F = -ក/-ង/-ច/-ញ/-ត/-ន/-ល/-ស/-ប

No. 1, column b (2nd series)
Syllable structures: CF់, C៊F់, C្CF់, C្C៊F់
/ɔɔ/ is changed to /uə/, where F = -ក/-ស
/ɔɔ/ is changed to /u/ elsewhere, where F = -ង/-ច/-ញ/-ត/-ន/-ល/-ប

No. 2, column a (1st series)
Syllable structures: CាF់, C៉ាF់, C្CាF់, C្C៉ាF់
/aa/ is shortened to /a/, where F = -ក/-ង/-ច/-ញ/-ត/-ន/-ល/-ស/-ប

No. 2, column b (2nd series)
Syllable structures: CាF់, C៊ាF់, C្CាF់, C្C៊ាF់
/ie/ is changed to /ea/, where F = -ក/-ង/-ច/-ញ
/ie/ is changed to /oa/, where F = -ត/-ន/-ល/-ស/-ប

1a. The first series inherent vowel /ɑɑ/ is shortened to /ɑ/ when the Bantoc is placed on the final consonant. For example, កក 'to be hardened/frozen' is transcribed as /kɑɑk/, while កក់ 'to wash (hair)' is transcribed as /kɑk/.

កក ‘to be hardened/frozen’
character sequence
character mapping k ɑɑ|ɔɔ k
series of character 1 1|2
series of syllable 1
phonemic transcription /kɑɑk/
កក់ ‘to wash (hair)’
character sequence
character mapping k ɑɑ|ɔɔ k
series of character 1 1|2
series of syllable 1
phonemic transcription /kɑk/

1b. The second series inherent vowel /ɔɔ/ is changed to /uə/ when the Bantoc is placed on one of these two final consonants (-ក /k/ or -ស /s/), and it is changed to /u/ when the Bantoc is placed on any other final consonant. Therefore, for instance, គក់ 'to wash (clothes)' is transcribed as /kuək/ (see the illustration below).

គក់ ‘to wash (clothes)’
character sequence
character mapping k ɑɑ|ɔɔ k
series of character 2 1|2
series of syllable 2
phonemic transcription /kuək/

2a. The first series vowel ា /aa/ is shortened to /a/ when the Bantoc is placed on the final consonant. Thus, as illustrated below, កាក់ 'coin' is transcribed as /kak/.

កាក់ ‘coin’
character sequence
character mapping k aa|ie k
series of character 1 1|2
series of syllable 1
phonemic transcription /kak/

2b. The second series vowel ា /ie/ is changed to /ea/ when the Bantoc is placed on -ក /-k/, -ង /-ŋ/, -ច /-c/, or -ញ /-ɲ/ as in ទាក់ 'to trap' which is transcribed phonemically as /teak/ (see the illustration of ទាក់ below); and it is changed to /oa/ when the Bantoc is placed on any other final consonant as in គាត់ '3S' phonetically transcribed as /koat/ (see the illustration of គាត់ below).

ទាក់ ‘to trap’
character sequence
character mapping t aa|ie k
series of character 2 1|2
series of syllable 2
phonemic transcription /teak/
គាត់ ‘3S’
character sequence
character mapping k aa|ie t
series of character 2 1|2
series of syllable 2
phonemic transcription /koat/
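Rules 1a to 2b above amount to a small decision table, which can be sketched as a function. This is an illustration of the quoted account only; the name `bantoc_vowel` and its argument convention (an empty string for the inherent vowel) are assumptions made for this example.

```python
AA = "\u17B6"  # ា

# The nine final consonants that can carry Bantoc: ក ង ច ញ ត ន ល ស ប
FINALS = set("\u1780\u1784\u1785\u1789\u178F\u1793\u179B\u179F\u1794")
EA_FINALS = set("\u1780\u1784\u1785\u1789")  # ក ង ច ញ (rule 2b, /ea/)
UE_FINALS = set("\u1780\u179F")              # ក ស (rule 1b, /uə/)


def bantoc_vowel(series: int, vowel: str, final: str) -> str:
    """Return the modified vowel for a syllable whose final carries Bantoc.

    vowel is "" for the inherent vowel or AA for ា.
    """
    if final not in FINALS:
        raise ValueError("final consonant cannot carry Bantoc")
    if vowel not in ("", AA):
        raise ValueError("only the inherent vowel and ា are modified")
    if series == 1:
        return "a" if vowel == AA else "ɑ"      # rules 2a and 1a
    if series == 2:
        if vowel == AA:                          # rule 2b
            return "ea" if final in EA_FINALS else "oa"
        return "uə" if final in UE_FINALS else "u"  # rule 1b
    raise ValueError("series must be 1 or 2")
```

For instance, `bantoc_vowel(2, "", "\u1780")` corresponds to គក់, and `bantoc_vowel(2, AA, "\u178F")` corresponds to គាត់.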

Bantoc never occurs after a consonant cluster.

Bantoc is never used in Pali/Sanskrit loanwords.

* កុហក /koʔ.hɑk/ ‘to tell a lie’
not កុហក់
* សុគត /soʔ.kut/ ‘to die (royal register)’

Note: The combination of ា ង ់ is not allowed in Khmer spelling convention (Kol 2008:33). In cases like that, ាំ ង is used instead.

* តាំង /taŋ/ ‘to exhibit’
not តាង់
* ទាំង /teaŋ/ ‘both, all’
not ទាង់
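The substitution described in the note can be expressed as a one-line string rewrite. This is a minimal sketch; the function name `fix_aa_ngo_bantoc` is hypothetical, and a real spell checker would need to consider more context.

```python
AA = "\u17B6"       # ា
NGO = "\u1784"      # ង
BANTOC = "\u17CB"   # ់
NIKAHIT = "\u17C6"  # ំ


def fix_aa_ngo_bantoc(text: str) -> str:
    """Rewrite the disallowed sequence ា ង ់ as ាំ ង."""
    return text.replace(AA + NGO + BANTOC, AA + NIKAHIT + NGO)
```

Applied to the disallowed spelling តាង់, the function returns តាំង; text without the sequence is returned unchanged.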

5.3.3. Robat ( ​៌)

According to the Khmer-English dictionary, Robat ( ​៌) literally means ‘subscript រ (្រ)’, although it looks like the top part of the consonant រ. It is a diacritical mark which indicates an orthographic ‘r’ in some words borrowed from Sanskrit. It is never used in Pali loanwords or Khmer words. It is used in the same way as Repha in Sanskrit, but it is silent in Khmer (i.e. it does not have any audible pronunciation). It also silences the final consonant when no vowel follows it; when a vowel is attached to the consonant, the consonant is pronounced as normal.

Robat is always put on the consonant, but never on a subscript. It may be followed by a vowel. It never occurs after a consonant cluster. No sign or vowel should be put above it.

គភ៌	/koa/		‘pregnancy’ 
ធម៌	/tʰoa/		‘Dharma’
មាគ៌ា	/mie.kie/	‘road, way’

5.3.4. Kakabat (​៎)

There is no reference on where Kakabat (​៎) should be placed. A manual on how to type Khmer Unicode characters says it is placed after a consonant, a subscript or a dependent vowel. Words with Kakabat have to be pronounced with a loud voice, and the speaker may use high or low pitch (Khin 2007:72).

Here are some examples of words (from the Chuon Nath Dictionary) written with Kakabat.

Khmer IPA Character Gloss
កូ៎ក ! /kook/ ក ូ ៎ ក ! ‘sound made to catch others’ attention’
កូ៎កៗ /kook.kook/ ក ូ ៎ ក ៗ
ណា៎ ! /naa/ ណ ា ៎ ! ‘(affirmation)’
ណ៎ះ[11] /nah/ ណ ៎ ះ ! ‘(affirmation)’
ហ្ន៎ះ ! /nah/ ហ ្ន ៎ ះ ! ‘(affirmation)’
អឺ៎ះ ! /ʔəəh/ អ ឺ ៎ ះ ! ‘(exclamation denoting surprise)’
អុ៎ក /ʔok/ អ ុ ៎ ក ‘to blame’
អោ៎ ! /ʔao/ អ ោ ៎ ! ‘(exclamation of distress)’
អ្ហ៊‌ឺ៎ះ ! /ʔhɨh/ អ ្ហ ៊ ZWNJ ‍ ឺ ៎ ះ ‘(exclamation of when having a burden)’
ឆា៎ /cʰaa/ ឆ ា ៎ ‘(a word used to refrain in a song)’

5.3.5. Ahsda (​៏)

Ahsda (​៏) is used only on five consonants: ក ដ ន ម ហ. They are pronounced with their respective inherent vowel. It is used to disambiguate single-character words.

Khmer IPA Gloss
ក៏ /kɑɑ/ ‘also’
ដ៏ /ɗɑɑ/ ‘so, very’
ន៏ /nɑɑ/ ‘(usually used with នុ៎ះ)’
ម៏ /mɔɔ/ ‘to come (colloquial)’
ហ៏ /hɑɑ/ ‘(say this when passing something to someone)’

Ahsda never occurs after a consonant cluster.

There is only one occurrence of Ahsda placed after a dependent vowel: ទៅ៏ ! /tɨw/ ‘go’. In present-day Khmer, Ahsda has never been seen used after any vowel, but if it is, it should still be rendered above the consonant. If there is an above vowel, it should be rendered right above that vowel.

5.3.6. Samyok Sannya (​័)

Samyok Sannya (​័) is usually placed after a consonant, a subscript or a consonant shifter.

Khmer Character IPA Gloss
ជ័រ ជ ័ រ /coa/ ‘rubber’
ប៉័ង ប ៉ ័ ង /paŋ/ ‘bread’
ត្រ័យ ត ្រ ័ យ /traj/ ‘threefold’

Samyok Sannya is equivalent to the vowel /a/ when the consonant preceding it is in the 1st series, but it can be realized as /ea/, /e/ or /oa/ in other environments.

ខ័ន an sword (royalty) ព័រ poa Pear ethnic group
ល័ខ leak artificial dye
ន័យ nej meaning or sense
័ង ប៉័ង p bread ទីន័ង tii.neaŋ seat (royalty)

In present-day Khmer, Samyok Sannya (​័) has another phonetic value, [əə]. Most words borrowed from English that end in a rhotacized vowel are transliterated with a consonant carrying Samyok Sannya.

Khmer Character IPA Gloss
ម៉ាស្ទ័រ ម ៉ ា ស ្ទ ័ រ /maa.stəə/ ‘master’
កុំព្យូទ័រ ក ុំ ព ្យ ូ ទ ័ រ /kom.pjuu.təə/ ‘computer’
កុងទ័រ ក ុ ង ទ ័ រ /koŋ.təə/ ‘counter’
គូល័រ គ ូ ល ័ រ /kuu.ləə/ ‘color’

5.3.7. Toandakhiat (​៍)

Toandakhiat is used to silence the character (usually a consonant) it is placed on. It is found after a subscript once in the Chuon Nath dictionary (i.e. អ្នកសិល្ប៍). It is also found after certain vowels, in a few instances preceded by a subscript. It becomes ambiguous which character is silenced when Toandakhiat is placed after a consonant cluster and/or a vowel.

Examples of words with Toandakhiat placed:

  • after a subscript and a vowel:
Khmer Character IPA Gloss
កិរ្តិ៍ ក ិ រ ្ត ិ ៍ /kee/ ‘reputation’
កេរ្តិ៍ ក េ រ ្ត ិ ៍ /kee/ ‘reputation’
ប៉ុស្តិ៍ ប ៉ ុ ស ្ត ិ ៍ /poh/ ‘post office’
ប្រសិទ្ធិ៍ ប ្រ ស ិ ទ ្ធ ិ ៍ /prɑ.sət/ ‘a place name’
រាមកិរ្តិ៍ រ ា ម ក ិ រ ្ត ិ ៍ /riem.kee/ ‘Ramayanak’
សួស្តិ៍ ស ួ ស ្ត ិ ៍ /suah/ ‘glory’
  • after a consonant and vowel ិ
Khmer Character IPA Gloss
កឡោបិ៍ ក ឡ ោ ប ិ ៍ /kaʔ.laop/ ‘a kind of basket’
តែនតិ៍ ត ែ ន ត ិ ៍ /taen/ ‘tent’
ទីបេតិ៍ ទ ី ប េ ត ិ ៍ /tii.ɓee/ ‘Tibet mount’
នាំអាទិ៍ ន ាំ អ ា ទ ិ ៍ /noam.ʔaat/ ‘to initiate’
ពោធិ៍ ព ោ ធ ិ ៍ /poo/ ‘banyan’
ពោធិ៍ធំ ព ោ ធ ិ ៍ ធ ំ /poo.tʰum/ ‘a kind of plant’
ពោធិ៍ធ្លេ ព ោ ធ ិ ៍ ធ ្ល េ /poo.tlee/ ‘a kind of plant’
ពោធិ៍បាយ ព ោ ធ ិ ៍ ប ា យ /poo.ɓaaj/ ‘a kind of banyan tree’
ពោធិ៍សាត់ ព ោ ធ ិ ៍ ស ា ត ់ /poo.sat/ ‘Pursat province’
ព្យាធិ៍ ព ្យ ា ធ ិ ៍ /pjie/ ‘leprosy’
ព្រហស្បតិ៍ ព ្រ ហ ស ្ប ត ិ ៍ /prɔ.hoah/ ‘Thursday’
វាទអាទិ៍ វ ា ទ អ ា ទ ិ ៍ /wiet.ʔaat/ ‘to try to gain power’
សកវាទិ៍ ស ក វ ា ទ ិ ៍ /sak.kak.waa/ ‘’
ស្មាធិ៍ ស ្ម ា ធ ិ ៍ /smaat/ ‘meditate’
ស្លឹកពោធិ៍ ស ្ល ឹ ក ព ោ ធ ិ ៍ /slək.poo/ ‘banyan leaf’
  • after ុ
Khmer Character IPA Gloss
រាហុ៍ រ ា ហ ុ ៍ /rie/ ‘name of a giant’
ចារុ៍ ច ា រ ុ ៍ /caa/ ‘small tubes made of gold’

It should be rendered right on the consonant. It has never been seen used with any vowel other than the ones mentioned above.

5.4. Obsolete Characters

Some signs are no longer used in the present-day Khmer texts. They have been known to be used in Pali/Sanskrit loanwords (see sections on Pali, Sanskrit and Inscriptions).

  • Bathamasat (​៓)
  • Atthacan (​៝)
  • Viriam (​៑)
  • Avakrahasanya (ៜ)
  • Obsolete consonants (ឝ and ឞ)
  • Obsolete independent vowel (ឨ)

Here is a list of them with their code point.

Code Point Orthographic Character Name
U+17D3 KHMER SIGN BATHAMASAT
U+17DD KHMER SIGN ATTHACAN
U+17D1 KHMER SIGN VIRIAM
U+17DC KHMER SIGN AVAKRAHASANYA
U+179D KHMER LETTER SHA
U+179E KHMER LETTER SSO
U+17A8 KHMER INDEPENDENT VOWEL QUK
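The code points in the table above make it straightforward to flag obsolete characters in a text. The following is a minimal sketch; `OBSOLETE` and `find_obsolete` are hypothetical names, and the set contains only the seven characters listed in this section.

```python
# Obsolete characters listed in the table above.
OBSOLETE = {
    "\u17D3",  # KHMER SIGN BATHAMASAT
    "\u17DD",  # KHMER SIGN ATTHACAN
    "\u17D1",  # KHMER SIGN VIRIAM
    "\u17DC",  # KHMER SIGN AVAKRAHASANYA
    "\u179D",  # KHMER LETTER SHA
    "\u179E",  # KHMER LETTER SSO
    "\u17A8",  # KHMER INDEPENDENT VOWEL QUK
}


def find_obsolete(text: str) -> list[tuple[int, str]]:
    """Return (index, code point label) for each obsolete character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in OBSOLETE
    ]
```

A checker for present-day texts could report these positions for review, since the characters are still legitimate in Pali/Sanskrit material and inscriptions.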

5.5. Punctuation

Two types of punctuation are used in Khmer: (1) native punctuation and (2) foreign punctuation. According to Khin (2007:87-90), six native punctuation marks were created by Khmer people and are used in ancient and modern texts; ten foreign punctuation marks were borrowed from French for use in writing and printing. Prom (2006:55-69) lists more than that.

5.5.1. Native Punctuation

Name in Khmer Name in English Sign Usage
ភ្នែក​មាន់ Phnaek Muan - mark the beginning of a poem
គោមូត្រ Koomuut - mark the end of a poem
ល្បះ ឬ ខណ្ឌ Lbah or Khan - placed at the end of a sentence or a section of a poem
- used in lunar dates
ល្បះចប់ ឬ ល្បះ​បរិយោសាន Lbah Chob or Lbah Pareyaosan ។៚ - Khin (2007:88) wrote it was used in the old days to mark the end of an article or a book, which is different from Prom (2006:55) who presented ៕ with the same name.
ដកឃ្លា White Space - mark a pause in the speech, not to separate words (Open Forum of Cambodia 2004:15).
- used in the same way as a comma and a semicolon in French.
បេយ្យាលៈ ឬ ឡាក់ Beyyal or Lak ។ល។ or ។ប។ - used at the end of a list to denote that the list goes on
ចំណុច​ពីរ​គូស
ឬ ដំពីរ
Camnuc Pii Kuuh or Dompii - placed before a list or a direct quote in the same way that French and English use the colon (:). The colon is used in place of Camnuc Pii Kuuh because Camnuc Pii Kuuh is hard to find and type on a computer (Prom 2006:58).

5.5.2. Foreign Punctuation

Name in Khmer Name in English Sign Usage
រជ្ជុសញ្ញា hyphen - - begin a list
- placed after a number list
- begin a dialogue exchange
- denote the continuation of a word at the end of a line
- denote “to” or “till”
- connect related two items
- syllable break
- denote omitted syllable
វង់ក្រចក brackets () - denote extra information
ចុច​មួយ point . - place after abbreviation
ចុច​ពីរ colon : - used in place of Chomnuc Pii Kuuh
ចំណុច​រាយ / ពងត្រី ellipsis ... - denote “etc”
- denote unfinished phrase
- denote a section of a quote that is out of focus (usually placed inside square brackets)
ក្បៀស comma , - separate items with spaces in between
ចុចក្បៀស semicolon ; In Chuon Nath dictionary, it is used
- to denote the list has not ended yet
- placed before ប៉ុន្តែ ‘but’
- separate two or more related sentences
សញ្ញាសួរ question mark ? - end interrogative sentence
សញ្ញាឧទាន exclamation mark ! - end an interjection
អព្ភន្តរ double quotes “” - mark a quote or direct quote
- encircle a title or name of a book or news article
- focus point
ឃ្នាប or តង្កៀប square brackets [] - used in place of the normal brackets when there is brackets-in-brackets to avoid confusion
- used for phonetic transcription
សញ្ញា “និង” ampersand & - used to emphasize the items on its left and right
បន្ទាត់​ទ្រេត forward slash / - separate numbers
- used in place of ឬ ‘or’
ផ្កាយ asterisk * - denote errors in a phrase
រ៉ាត់ឃ្នាប curly brackets {} - used in syntax to wrap around certain things
សញ្ញាស្មើ equal; that is = - denote ‘that means’ or ‘equal to’
សញ្ញាធំជាង greater than; to > - denote greater than
- denote what is on its left becomes what is on its right
សញ្ញាតូចជាង smaller than; from < - denote less than
- denote what is on its left is derived from what is on its right
សញ្ញាផ្ទុយ opposite the word on its left and its right are opposite

5.6. Numerals

The table below shows Khmer numerals in the first row, Unicode code points in the second, and their gloss in the third row.

០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩
U+17E0 U+17E1 U+17E2 U+17E3 U+17E4 U+17E5 U+17E6 U+17E7 U+17E8 U+17E9
‘zero’ ‘one’ ‘two’ ‘three’ ‘four’ ‘five’ ‘six’ ‘seven’ ‘eight’ ‘nine’

Note that Thai and Lao numerals are derived from Khmer’s which is why they are very similar.

Numerals can never be used as the base of an orthographic syllable. A numeral has no consonant quality, so the result would be impossible to pronounce if it were ever combined with a vowel.
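Because the ten numerals occupy consecutive code points from U+17E0 to U+17E9, converting between Khmer digits and integers is a matter of offset arithmetic. The helpers below are a minimal sketch with hypothetical names.

```python
KHMER_ZERO = 0x17E0  # code point of the numeral 'zero'


def khmer_digits_to_int(digits: str) -> int:
    """Convert a string of Khmer numerals (U+17E0..U+17E9) to an integer."""
    if not digits:
        raise ValueError("empty string")
    value = 0
    for ch in digits:
        d = ord(ch) - KHMER_ZERO
        if not 0 <= d <= 9:
            raise ValueError(f"not a Khmer numeral: U+{ord(ch):04X}")
        value = value * 10 + d
    return value


def int_to_khmer_digits(value: int) -> str:
    """Convert a non-negative integer to a string of Khmer numerals."""
    if value < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    return "".join(chr(KHMER_ZERO + int(c)) for c in str(value))
```

The two functions are inverses of each other for non-negative integers, so a value can be round-tripped through its Khmer representation.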

5.7. Divination Lore (a.k.a. លេខ​អត្ត “Lek Attak”)

Divination lore (a.k.a. លេខ​អត្ត “Lek Attak”) is used in Old Khmer to calculate and foretell the future of an event, but not in the present-day Khmer. The table below shows the divination lore in the first row and their Unicode code points.

U+17F0 U+17F1 U+17F2 U+17F3 U+17F4 U+17F5 U+17F6 U+17F7 U+17F8 U+17F9
‘zero’ ‘one’ ‘two’ ‘three’ ‘four’ ‘five’ ‘six’ ‘seven’ ‘eight’ ‘nine’

5.8. Lunar Dates

The Khmer traditional date system follows the moon’s phases. It is divided into two main categories: ខ្នើត “waxing moon” and រនោច “waning moon”. The proper way of writing the date is to use Khmer numerals with the Khmer punctuation character “។”. Khin (2007:87-88) illustrates that the number to the left of the Khan (។) denotes the day of the week (i.e. ១ for Sunday, ២ for Monday and so forth), the number to its right is the numeric value of the Khmer month as given in Khmer Months of the Year, and the number above or underneath the sign corresponds to the Phase of the Moon (i.e. if the number is above the sign, it denotes “waxing moon”; underneath, “waning moon”). For instance, ១᧺៥ is read as “Sunday, 10th day of the waning moon,

5.8.1. Phase of the Moon

5.8.1.1. ខ្នើត “Waxing Moon”

Waxing moon, a.k.a. ខ្នើត /knaət/ in Khmer, is the first 15 days of the lunar month.

  • ᧠ U+19E0 /paʔ.tʰaa.mie.saat/ ‘the first Ashadha (eighth month of the lunar calendar)’
  • ᧡ U+19E1 /muəj.koət/ ‘the first day’
  • ᧢ U+19E2 /pii.koət/ ‘the second day’
  • ᧣ U+19E3 /ɓəj.koət/ ‘the third day’
  • ᧤ U+19E4 /ɓuən.koət/ ‘the fourth day’
  • ᧥ U+19E5 /pram.koət/ ‘the fifth day’
  • ᧦ U+19E6 /pram.muəj.koət/ ‘the sixth day’
  • ᧧ U+19E7 /pram.pii.koət/ ‘the seventh day’
  • ᧨ U+19E8 /pram.ɓəj.koət/ ‘the eighth day’
  • ᧩ U+19E9 /pram.ɓuən.koət/ ‘the ninth day’
  • ᧪ U+19EA /ɗɑp.koət/ ‘the tenth day’
  • ᧫ U+19EB /ɗɑp.muəj.koət/ ‘the eleventh day’
  • ᧬ U+19EC /ɗɑp.pii.koət/ ‘the twelfth day’
  • ᧭ U+19ED /ɗɑp.ɓəj.koət/ ‘the thirteenth day’
  • ᧮ U+19EE /ɗɑp.ɓuən.koət/ ‘the fourteenth day’
  • ᧯ U+19EF /ɗɑp.pram.koət/ ‘the fifteenth day’
5.8.1.2. រនោច “Waning Moon”

Waning moon, a.k.a. រនោច /rɔ.nooc/ in Khmer, is a period of 15 days in the lunar calendar counting from the day after the full moon back to the new moon (i.e. complete darkness).

  • ᧰ U+19F0 /tuʔ.tiʔ.jeak.saat/ ‘the second Ashadha during the Adhikameas leap year’
  • ᧱ U+19F1 /muəj.rooc/ ‘the first day’
  • ᧲ U+19F2 /pii.rooc/ ‘the second day’
  • ᧳ U+19F3 /ɓəj.rooc/ ‘the third day’
  • ᧴ U+19F4 /ɓuən.rooc/ ‘the fourth day’
  • ᧵ U+19F5 /pram.rooc/ ‘the fifth day’
  • ᧶ U+19F6 /pram.muəj.rooc/ ‘the sixth day’
  • ᧷ U+19F7 /pram.pii.rooc/ ‘the seventh day’
  • ᧸ U+19F8 /pram.ɓəj.rooc/ ‘the eighth day’
  • ᧹ U+19F9 /pram.ɓuən.rooc/ ‘the ninth day’
  • ᧺ U+19FA /ɗɑp.rooc/ ‘the tenth day’
  • ᧻ U+19FB /ɗɑp.muəj.rooc/ ‘the eleventh day’
  • ᧼ U+19FC /ɗɑp.pii.rooc/ ‘the twelfth day’
  • ᧽ U+19FD /ɗɑp.ɓəj.rooc/ ‘the thirteenth day’
  • ᧾ U+19FE /ɗɑp.ɓuən.rooc/ ‘the fourteenth day’
  • ᧿ U+19FF /ɗɑp.pram.rooc/ ‘the fifteenth day’
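Since the day symbols in the two lists above are laid out consecutively, the phase and day number can be recovered from the code point alone. The function below is a minimal sketch; its name is hypothetical, and it deliberately rejects U+19E0 and U+19F0, which denote the first and second Ashadha rather than a day.

```python
def decode_lunar_symbol(ch: str) -> tuple[str, int]:
    """Return (phase, day) for a lunar date symbol in U+19E1..U+19EF or U+19F1..U+19FF."""
    cp = ord(ch)
    if 0x19E1 <= cp <= 0x19EF:
        return ("waxing", cp - 0x19E0)
    if 0x19F1 <= cp <= 0x19FF:
        return ("waning", cp - 0x19F0)
    raise ValueError(f"not a lunar day symbol: U+{cp:04X}")
```

For example, the symbol U+19FA decodes to the tenth day of the waning moon, matching its entry in the list above.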

5.8.2. Khmer Months of the Year

The Khmer months of the year are used in fortune telling, and each has its own numeric value, which is also used in the lunar dates described in the section on Lunar Dates.

Khmer Name Sanskrit Name[12] English Transliteration Numeric Value
មិគសិរ Mārgaśīrṣa (मार्गशीर्ष) Mikeaser
បុស្ស Pauṣa (पौष) Boss
មាឃ Māgha (माघ) Meakh
ផល្គុន Phālguna (फाल्गुन) Phalkun
ចេត្រ Caitra (चैत्र) Chetr
ពិសាខ Vaiśākha (वैशाख) Pisakh
ជេស្ឋ Jyeṣṭha (ज्येष्ठ) Chesth
អាសាឍ Ashadha (आषाढ) Asath
ស្រាពណ៍ Śrāvaṇa (श्रावण) Srapn
ភទ្របទ Bhadrapada (भाद्रपद) Phutrabot ១០
អស្សុជ Āśvina (अश्विन) Assoch ១១
កក្ដិក Kārtika (कार्तिक) Kakdek ១២

6. Ligatures

6.1. Consonant and Vowel Combination

Khin (2007:41) illustrates the anatomy of Khmer consonants and how Khmer characters should be written. There are three levels to consider: (1) the top level is reserved for a vowel, Consonant Shifters and/or a diacritic, (2) the middle level is for consonants and (3) the bottom level is for subscripts and/or a vowel (see the image below). In level (1), if a vowel and a diacritic occur together, the diacritic stays above the vowel (i.e. អឺ៎), but if the diacritic is a Triisap, the vowel should be placed above the Triisap (i.e. អ៊‌ីត). In level (3), if a subscript is used with a Below Vowel[13], the Below vowel should be placed right below the subscript (i.e. ស្ដុក).

Anatomy of Khmer consonants

However, space is limited when a below subscript and a below vowel are stacked together. Two solutions have been adopted: (a) put the below subscript and below vowel side by side and (b) shrink the base consonant to accommodate the subscript, allowing the below vowel to stay above its default position.

The below subscript and below vowel stacked together

(a)

The below subscript and below vowel side-by-side

(b)

The base consonant shrunk to accommodate the subscript

A consonant consists of two parts: the body and the head, usually called the hair. For instance, ក is composed of an upside-down U shape and a wavy line above it. The upside-down U shape is the body and the wavy line is the hair. When ក stands on its own, the wavy line sits fully on the body; however, when a vowel[14] that looks like ា is attached to it, the wavy line extends beyond the body and merges with the vowel, as illustrated in (b). (a) is not recommended as it goes against the consonant and vowel composition convention.

(a)

កា without the consonant and vowel merged

(b)

កា with the consonant and vowel merged

A proof of how Khmer characters should be presented when combining a consonant with a wavy line and a ា looking vowel exists in Khmer manuscript written on palm leaves (see image below). The word highlighted is ភាសា (ភ ា ស ា) and you may notice the difference in how the wavy line looks when the consonant stands alone and when it is attached to ា.

Khmer manuscript on palm leaves

NOTE: When ប is combined with ា ោ or ៅ , the shape of the consonant changes to បា បោ and បៅ respectively (see the image below).

ប with ា ោ or ៅ

6.2. ញ with a Subscript

ញ has a little curvy line at the bottom level. When a subscript is placed underneath it, that curvy line should disappear and be replaced by the expected subscript.

This is the character when it stands alone.

ញ

This is the character with ្ច attached to it.

ញ with ្ច

If the character (ញ) needs to be combined with its own subscript (្ញ), the combination should look like this:

ញ and its own subscript

6.3. ិ and ៍ Combination

When vowel ិ [U+17B7] is followed by diacritic ៍ [U+17CD], they get connected (ិ៍).

6.4. ក, គ, ត, ភ, វ and ិ, ី, ឹ, ឺ

In some fonts, Khmer OS Muol for example, the combination of the consonants ក (U+1780), គ (U+1782), ត (U+178F), ភ (U+1797) and វ (U+179C) with the above vowels (i.e. ិ (U+17B7), ី (U+17B8), ឹ (U+17B9), ឺ (U+17BA)) becomes a glyph whose top line (i.e. the hair, the wavy line) is straightened into a shape similar to the vowels.

ក (U+1780) and above vowels

គ (U+1782) and above vowels

ត (U+178F) and above vowels

ភ (U+1797) and above vowels

វ (U+179C) and above vowels

7. Unicode Encoding

7.1. Overview

There are discrepancies in character sequences posited by Unicode Standard (2018:646), OpenType (2018) and the Open Forum (2004:7-14). Only the conflicting characters are in the table below.

Feature: Unicode Standard / OpenType / Open Forum
  • RegShift: after the base consonant / between a consonant and a subscript / between a subscript and a vowel
  • Robat: before or after a subscript / considered as an Above Sign which goes after an Above vowel / in between the base consonant and a subscript
  • Subscript at the end: n/a / n/a
  • IndV: can be the base / can’t be the base / can be the base
  • Nikahit: sign / sign / vowel/sign?
  • Reahmuk: sign / sign / vowel/sign?

Unlike Unicode Standard and Open Forum, the OpenType breaks vowels into four subcategories:

(1) Pre-base Vowels (PreV): េ​ [U+17C1], ែ [U+17C2], ​ៃ [U+17C3],

(2) Below-base Vowel (BlwV): ុ [U+17BB], ូ [U+17BC], ួ [U+17BD], ុំ [U+17BB U+17C6],

(3) Above-base Vowel (AbvV): ិ [U+17B7], ី [U+17B8], ឹ [U+17B9], ឺ [U+17BA], ើ [U+17BE], and

(4) Post-base Vowels (PstV): ោ [U+17C4], ៅ [U+17C5], ៀ [U+17C0], ឿ [U+17BF], ា [U+17B6].

In Khmer grammar textbooks, Nikahit ( ំ U+17C6) and Reahmuk ( ះ U+17C7) are included in the vowel inventory, even though they are described elsewhere as diacritics or signs. The public usually considers these two as vowels. Moreover, when these two characters are used with other vowels, i.e. ុ (U+17BB) as in ុះ, េ (U+17C1) as in េះ and ោ (U+17C4) as in ោះ, Khmer linguists usually consider the combination as one vowel unit. The Unicode Standard, however, considers these as a combination of a vowel and a diacritic (Nikahit or Reahmuk).

Below are the character sequences posited by Unicode Standard, OpenType and Open Forum:

Unicode Standard (2018:639)

B {R | C} {S {R}}* {{Z} V} {O} {S}

where

  • B is a base character (consonant character, independent vowel character, and so on)
  • R is a Robat
  • C is a consonant shifter
  • S is a subscript consonant or independent vowel sign
  • V is a dependent vowel sign
  • Z is a zero width non-joiner or a zero width joiner
  • O is any other sign
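The Unicode Standard sequence above can be approximated with a regular expression. The sketch below is an interpretation, not the normative definition: it reads the braces as "optional", and the character ranges chosen for B, V and O are assumptions made for this example (in particular, Nikahit, Reahmuk and Yuukaleapintu are treated as "other signs").

```python
import re

B = "[\u1780-\u17B3]"   # assumed: consonants and independent vowels
R = "\u17CC"            # Robat
C = "[\u17C9\u17CA]"    # consonant shifters (Muusikatoan, Triisap)
S = "\u17D2" + B        # subscript: Coeng + consonant or independent vowel
V = "[\u17B6-\u17C5]"   # assumed: dependent vowel signs
Z = "[\u200C\u200D]"    # ZWNJ or ZWJ
O = "[\u17C6-\u17C8\u17CB\u17CD-\u17D1\u17DD]"  # assumed: other signs

# B {R | C} {S {R}}* {{Z} V} {O} {S}
CLUSTER = re.compile(f"{B}(?:{R}|{C})?(?:{S}{R}?)*(?:{Z}?{V})?{O}?(?:{S})?")


def is_valid_cluster(text: str) -> bool:
    """Return True if text is exactly one cluster under the assumed ranges."""
    return CLUSTER.fullmatch(text) is not None
```

The check applies to a single orthographic cluster, so a whole word has to be segmented into clusters before each one is tested.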

OpenType (2018)

Cons + {COENG + (Cons | IndV)} + [PreV | BlwV] + [RegShift] + [AbvV] + {AbvS} + [PstV] + [PstS]

where

  • Cons – Consonant character
  • IndV – Independent vowel character
  • COENG – The COENG code
  • PreV – Vowel that is positioned before the base glyph
  • BlwV – Vowel that is positioned below the base glyph
  • RegShift – Triisap or Muusikatoan character
  • AbvV - Vowel that is positioned above the base glyph
  • AbvS – A sign character that is positioned above the base glyph
  • PstV – Vowel that is positioned after the base glyph
  • PstS – Sign character that is positioned after the base glyph
  • { } – Indicates 0 to 2 occurrences
  • [ ] – Indicates 0 or 1 occurrence
  • | – Exclusive OR
  • + – Cumulative AND

Note: ZWNJ and ZWJ are to be placed in between the RegShift and the AbvV to prevent the RegShift from changing its shape.

Open Forum

Open Forum of Cambodia wrote a document on “How to type Khmer Unicode” (2004). Khmer character ordering was devised as:

Consonant + Coeng consonant(s) + Vowel + Sign(s)

Consonant + Coeng consonant(s) + Consonant-shifter + Vowel + Sign(s)

Consonant + Robat {+ Vowel} {+ Sign}

Consonant + Coeng consonant(s) + Consonant-shifter + Vowel + Above signs + After signs

where

  • Consonant – [U+1780..U+17A2] or [U+17A5..U+17B3]
  • Coeng consonant – [U+17D2] + [U+1780..U+17A2] or [U+17A5..U+17B3]
  • Vowel – [U+17B6..U+17C7]
  • Sign – [U+17CB..U+17D1, U+17D3]
    • Above sign – [U+17CB, U+17CD..U+17D1, U+17D3, U+17DD]
    • After sign – [U+17C7, U+17C8]
  • Consonant shifter – U+17C9, U+17CA
  • Robat – U+17CC

7.2. New Proposal

The character orderings posited below are an attempt to eliminate the discrepancies among the three schemes mentioned above. Though the proposal here has been discussed with, and has received contributions from, various sources and individuals, there may still be room for improvement. Kanjahn (2012:2) posited a character ordering that allows a Register Shifter in two places, one before the subscript (right after the base) and another after the subscript. He allows Post Signs (which he calls Space Signs) (ៈ [U+17C8] and ះ [U+17C7]) after a subscript.

Here is what we propose.

  • BaseCommon: Consonant [U+1780..U+17A2] or Independent Vowel [U+17A5..U+17B3]
  • BaseOther: Dotted Circle [U+25CC] or non-break space [U+00A0]
  • Robat: the Robat [U+17CC]
  • Ahsda: the Ahsda [U+17CF]
  • AbvV: above-base vowel [U+17B7..U+17BA, U+17BE]
  • PreV: pre-base vowel [U+17C1..U+17C3]
  • BlwV: below-base vowel [U+17BB..U+17BD]
  • PstV: post-base vowel [U+17B6, U+17BF, U+17C0, U+17C4, U+17C5]
  • Coeng: the Coeng [U+17D2]
  • RegShift: Register Shifter, Muusikatoan [U+17C9] or Triisap [U+17CA]
  • AbvS: [U+17C6, U+17CB, U+17CD, U+17CE, U+17D0, U+17D1, U+17D3, U+17DD]
  • Z: zero width non-joiner [U+200C] or zero width joiner [U+200D]

Partial clusters defined:

Base: BaseCommon | BaseOther
VowelGroup: (Z + AbvV) | PreV | BlwV | PstV
CoengGroup: Coeng + Base

Robat/Ahsda cluster defined:

Base [Robat | Ahsda] [PreV | BlwV | PstV] [PstS]

General cluster defined:

Base {CoengGroup} [RegShift] [VowelGroup] [AbvS] [PstS | CoengGroup]

Independent Cluster: All Khmer characters not classified above are considered clusters by themselves.

Restrictions: The following restrictions have been observed in present-day Khmer. It:

  • doesn’t allow digits as bases,
  • doesn’t allow Coeng, consonant shifters, or above-base vowels or signs in robat syllables (but Jarai does, i.e. ំ័ [U+17C6 U+17D0], a combination of two above-base signs),
  • allows Coeng Ro only as the last Coeng when there are two subscripts in a row and one of which is a Coeng Ro,
  • allows at most two subscripts in a syllable, and
  • allows at most one vowel, and at most one above-base and one post-base sign.
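
The two cluster patterns above can be sketched as regular expressions. This is only one possible reading, under several assumptions that are not stated in the proposal: `{ }` is taken to mean 0 to 2 occurrences and `[ ]` to mean optional (as in the OpenType notation), Z is treated as optional before an above-base vowel, and PstS, which the list above does not define, is assumed to be ះ [U+17C7] and ៈ [U+17C8]. The names `GENERAL`, `ROBAT_AHSDA` and `is_cluster` are illustrative.

```python
import re

# Character classes from the list above.
BASE = "[\u1780-\u17A2\u17A5-\u17B3\u25CC\u00A0]"  # BaseCommon | BaseOther
COENG_GROUP = "\u17D2" + BASE                       # Coeng + Base
REG_SHIFT = "[\u17C9\u17CA]"
ABV_V = "[\u17B7-\u17BA\u17BE]"
# PreV | BlwV | PstV
NON_ABV_V = "[\u17C1-\u17C3\u17BB-\u17BD\u17B6\u17BF\u17C0\u17C4\u17C5]"
Z = "[\u200C\u200D]"
ABV_S = "[\u17C6\u17CB\u17CD\u17CE\u17D0\u17D1\u17D3\u17DD]"
PST_S = "[\u17C7\u17C8]"  # assumption: PstS is not defined in the list above

# Assumption: Z is optional in front of an above-base vowel.
VOWEL_GROUP = f"(?:{Z}?{ABV_V}|{NON_ABV_V})"

# Base {CoengGroup} [RegShift] [VowelGroup] [AbvS] [PstS | CoengGroup]
GENERAL = re.compile(
    f"{BASE}(?:{COENG_GROUP}){{0,2}}{REG_SHIFT}?{VOWEL_GROUP}?"
    f"{ABV_S}?(?:{PST_S}|{COENG_GROUP})?"
)
# Base [Robat | Ahsda] [PreV | BlwV | PstV] [PstS]
ROBAT_AHSDA = re.compile(f"{BASE}[\u17CC\u17CF]?{NON_ABV_V}?{PST_S}?")


def is_cluster(text):
    """Return True if text is exactly one cluster under either pattern."""
    return bool(GENERAL.fullmatch(text) or ROBAT_AHSDA.fullmatch(text))
```

The sketch checks the cluster shape only and does not enforce the restrictions listed above (for example, Coeng Ro as the last Coeng). Read this way, the trailing CoengGroup slot also accepts a subscript typed after a vowel.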

7.3. Character Distribution in an Orthographic Syllable

Reahmuk (ះ), Nikahit (ំ), Yuukaleapintu (ៈ), Toandakhiat (៍), Ahsda (៏) and Bantoc (់) always occur at the syllable-final position. They can therefore be treated as orthographic syllable boundaries.

Not every character can be combined with every other one. An independent vowel cannot be placed after another independent vowel in the same syllable. Coeng (្ U+17D2), RegShift, Robat, Vowel, Samyok Sannya or Kakabat can never be followed by itself.
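
The self-repetition rule can be sketched as a simple check. It assumes that "followed by itself" means "followed by a character of the same class", so any two dependent vowels in a row are flagged, and that the Vowel class covers U+17B6 to U+17C5. The name `has_doubled_mark` is illustrative.

```python
import re

# Two adjacent characters of the same class: Coeng, RegShift, Robat,
# dependent vowel, Samyok Sannya (U+17D0) or Kakabat (U+17CE).
SELF_REPEAT = re.compile(
    "\u17D2{2}|[\u17C9\u17CA]{2}|\u17CC{2}|[\u17B6-\u17C5]{2}|\u17D0{2}|\u17CE{2}"
)


def has_doubled_mark(text):
    """Return True if a class listed above is immediately repeated."""
    return bool(SELF_REPEAT.search(text))
```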

The table below illustrates which character can be placed after another in the same orthographic syllable. “v” denotes that the character in the left-most column (2) can follow the one in the top row (1), “x” that 2 can't follow 1, and “v'” that 2 can follow 1 in theory in the same syllable.

                         ​1
2
Cons IndV Coeng RegShift Robat V Samyok Sannya Kakabat
Cons x x v x x x x x
IndV x x v' x x x x x
Coeng v v' x x x v[15] x x
RegShift v x x x x x x x
Robat v x x x x x x x
V v x x v v x x x
Samyok Sannya v x x v x x x x
Kakabat v x x v' x v v' x
Reahmuk v x x v x v x v
Nikahit v x x v x v x v
Yuukaleapintu v x x v' v v x v
Toandakhiat v x x x x v x x
Ahsda v x x x x v x x
Bantoc v x x x x x x x

7.4. Rendering Issues

According to Horton et al. (2017), there are at least eight cases of rendering issues in Khmer Unicode implementation. These eight issues lead to other problems for end users and developers.

  • Confusability: users are not able to make a sound judgement on which is the right way to type a word. Oftentimes, different sequences of the same characters look exactly the same on the screen.
  • Vulnerability: users can be spoofed and taken advantage of.
  • Searchability: users are not able to find what they are looking for because they use different character sequences in the query.
  • Compatibility: when a user on Android sends or shares text documents with a Windows or Mac user, the text may look different because the rendering engines are implemented differently.

The following sections list the issues one by one. The examples are adopted from the paper (Horton et al. 2017). The text was rendered in Google Chrome 58.0 and Android 6.0.1.

7.4.1. Subscript and Vowel Concatenation

This is the case where a “subscript” and a “vowel” are combined. Typing either one before the other makes no difference to the visual output on the screen, but it is invalid to place a Vowel before a Subscript.

  • Subscript + Vowel ខ ្ម ែ រ > ខ្មែរ ‘Khmer’
  • Vowel + Subscript ខ ែ ្ម រ > ខែ្មរ invalid sequence

7.4.2. Concatenation of Two Subscripts

This is the case of two subscripts after a base consonant, one of which is ្រ [U+17D2 U+179A]. Placing either subscript before the other produces the same visual output.

  • Subscript + [U+17D2 U+179A] ស ្ត ្រ ី > ស្ត្រី ‘woman’
  • [U+17D2 U+179A] + Subscript ស ្រ ្ត ី > ស្រ្តី invalid sequence

7.4.3. Concatenation of a Subscript and a Consonant Shifter

This is a case of combining a subscript with a consonant shifter. The Khmer spelling convention (Kol 2008:28-32) does not mention which one should come first, but Open Forum of Cambodia (2004:11) claims that the consonant shifter should always come after the subscript.

  • Subscript + Consonant Shifter ម ្យ ៉ ា ង > ម្យ៉ាង ‘one way’
  • Consonant shifter + Subscript ម ៉ ្យ ា ង > ម៉្យាង invalid sequence
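
The three ordering problems described in sections 7.4.1 to 7.4.3 can be detected with simple patterns. The following is a minimal sketch, not a complete validator. The names `CHECKS` and `ordering_issues` are illustrative, and the consonant and vowel ranges used here (U+1780 to U+17A2 and U+17B6 to U+17C5) are simplifying assumptions.

```python
import re

COENG = "\u17D2"
CONS = "[\u1780-\u17A2]"
VOWEL = "[\u17B6-\u17C5]"
SHIFTER = "[\u17C9\u17CA]"

# One pattern per invalid ordering described in 7.4.1 to 7.4.3.
CHECKS = {
    "vowel before subscript": re.compile(f"{VOWEL}{COENG}{CONS}"),
    "Coeng Ro before another subscript": re.compile(
        f"{COENG}\u179A{COENG}{CONS}"
    ),
    "consonant shifter before subscript": re.compile(
        f"{SHIFTER}{COENG}{CONS}"
    ),
}


def ordering_issues(text):
    """Return the names of the ordering problems found in text."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]
```

Called on the valid spelling of ‘Khmer’ (U+1781 U+17D2 U+1798 U+17C2 U+179A), the function returns an empty list; called on the variant with the vowel typed first, it returns the first label only.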

7.4.4. Concatenation of a Consonant Shifter and a Vowel

This is a special case of Khmer text rendering. The consonant shifter, either ៊ [U+17CA] or ៉ [U+17C9], has to be rendered as a glyph that looks like ុ , when there is an above vowel attached to it (See Special Treatment of Consonant Shifters).

  • Consonant Shifter ( ​៊) + Above vowel ស ៊ ី > ស៊ី ‘to eat (vulgar)’
  • Above vowel + Below vowel ស ី ុ > សុី invalid sequence
  • Below vowel + Above vowel ស ុ ី > សុី invalid sequence
  • Consonant Shifter ( ​៉) + Above vowel ស ៉ ី > ស៉ី incorrect sequence

The last example is considered as incorrect rather than invalid because the character sequence is valid, but according to the usage of Muusikatoan ( ​៉) it cannot be used with the 1st series consonants.

7.4.5. Vowels of Two Unicode Code Points

This is one of the most commonly confused character sequences. Users are not concerned about which code point should come first, though, because, as with the other issues, the output on the screen does not look any different.

  • [U+17BB] [U+17C6] ក ុ ំ > កុំ ‘don’t’
  • [U+17C6] [U+17BB] ក ំ ុ > កំុ invalid sequence
  • [U+17B6] [U+17C6] ច ា ំ > ចាំ ‘to wait’
  • [U+17C6] [U+17B6] ច ំ ា > ចំា invalid sequence

7.4.6. One Unicode Code Point

ោ [U+17C4] can be confused with a combination of េ [U+17C1] and ា [U+17B6] because the outputs of the three encodings look the same on the screen.

  • [U+17C4] ល ោ ក > លោក ‘Mr.’
  • [U+17C1] [U+17B6] ល េ ា ក > លេាក invalid sequence
  • [U+17B6] [U+17C1] ល ា េ ក > លាេក invalid sequence

Similarly, ើ [U+17BE] can be confused with a combination of េ [U+17C1] and ី [U+17B8].

  • [U+17BE] ប ើ > បើ ‘if’
  • [U+17C1] [U+17B8] ប េ ី > បេី invalid sequence
  • [U+17B8] [U+17C1] ប ី េ > បីេ invalid sequence
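
The invalid sequences in sections 7.4.5 and 7.4.6 map one-to-one onto valid ones, so a repair step can be sketched as a small replacement table. This is an illustration only: the table covers just the examples shown in this document, and the names `REPLACEMENTS` and `repair` are hypothetical.

```python
# (invalid sequence, valid sequence) pairs taken from the examples above.
REPLACEMENTS = [
    ("\u17C1\u17B6", "\u17C4"),        # េ + ា  ->  ោ
    ("\u17B6\u17C1", "\u17C4"),        # ា + េ  ->  ោ
    ("\u17C1\u17B8", "\u17BE"),        # េ + ី  ->  ើ
    ("\u17B8\u17C1", "\u17BE"),        # ី + េ  ->  ើ
    ("\u17C6\u17BB", "\u17BB\u17C6"),  # Nikahit typed before ុ
    ("\u17C6\u17B6", "\u17B6\u17C6"),  # Nikahit typed before ា
]


def repair(text):
    """Rewrite the invalid sequences listed above into their valid form."""
    for invalid, valid in REPLACEMENTS:
        text = text.replace(invalid, valid)
    return text
```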

7.4.7. Similar Subscripts

This is the case of visually identical subscripts. Users do not see any difference in the appearance of these two.

  • [U+17D2] [U+178A] ក ណ ្ដ ា ល > កណ្ដាល ‘Kandal province’
  • [U+17D2] [U+178F] ក ណ ្ត ា ល > កណ្តាល incorrect sequence
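
Because the two subscripts look the same, a search tool may want to treat them as equal. One possible approach, sketched here under the assumption that the folded string is used only as a search key and never written back, is to map [U+17D2 U+178F] onto [U+17D2 U+178A]. The name `search_key` is illustrative.

```python
COENG_DA = "\u17D2\u178A"
COENG_TA = "\u17D2\u178F"


def search_key(text):
    """Fold Coeng Ta onto Coeng Da so both spellings compare equal."""
    return text.replace(COENG_TA, COENG_DA)
```

The folding loses the spelling distinction, so it should not be applied to stored text.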

7.4.8. Spaces and Joiners

  • Zero Width Space is an invisible space (i.e. no width) which is usually put in between words in a sentence. It is helpful for text processing tools because it tells where the word boundaries are (Open Forum of Cambodia 2004:16-17).
  • Zero Width non-Joiner can be inserted before consonant shifters in order to prevent them from being rendered as subscript (ុ), and it can also be inserted directly before vowels in order to prevent the formation of ligatures between the base character and the vowel (Kanjahn 2012:3).
  • Zero Width Joiner can also be used before vowels to force a ligature between above vowels (see the section on Special Treatment of Consonant Shifters) and certain consonants.
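
When Zero Width Space [U+200B] has been typed between words, word boundaries can be recovered by splitting on it. This is a minimal sketch that assumes the text was segmented consistently by its author; the name `split_words` is illustrative.

```python
ZWSP = "\u200B"  # Zero Width Space


def split_words(text):
    """Split text on Zero Width Space and drop empty pieces."""
    return [word for word in text.split(ZWSP) if word]
```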

8. Text Processing

8.1. Keyboard

Below is the standard key arrangement approved by the National ICT Development Authority (NiDA). The keyboard layout is divided into three layers: (1) the main layer, (2) the Shift layer and (3) the AltGr layer. A normal press on any key outputs the character on the main layer. To output a character on the Shift layer or AltGr layer, press and hold the Shift or AltGr key and then press the key of the expected character. Here is a list of the characters on each layer:

(1) the main layer

  • first row: « ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩ ០ ឥ ឲ ឮ
  • second row: ឆ ឹ េ រ ត យ ុ ិ ោ ផ ៀ ឪ
  • third row: ា ស ដ ថ ង ហ ្ ក ល ើ ់
  • fourth row: ឋ ខ ច វ ប ន ម ុំ ។ ៊

(2) the Shift layer

  • first row: » ! ៗ “ ៛ % ៍ ័ ៏ ( ) ៌ = ឭ
  • second row: ឈ ឺ ែ ឬ ទ ួ ូ ី ៅ ភ ឿ ឧ
  • third row: ាំ ៃ ឌ ធ អ ះ ញ គ ឡ ោះ ៉
  • fourth row: ឍ ឃ ជ េះ ព ណ ំ ុះ ៕ ?

(3) the AltGr layer

  • first row: ‍‌zwj zwnj @​ ៑ $ € ៙ ៚ * { } × ៎ \
  • second row: ឯ ឫ ឦ ឱ ឰ ឩ ឳ
  • third row: ៖ ៈ
  • fourth row: , . /

Here is how the layout looks:

Khmer Unicode Keyboard

For smartphones, there is no standard keyboard layout in place when it comes to the number of rows on each layer and the number of characters in each row. However, most keyboards use 4x10 on each layer, meaning 4 rows with 10 characters in each row.

8.2. Sorting

The Royal Academy of Cambodia was approached and asked for advice regarding sorting in Khmer. It turns out that there is no documentation on this topic that could help. The Chuon Nath dictionary seems to deploy two ways of sorting: (1) alphabetical order of characters and (2) alphabetical order of sounds. Words written with similar initial consonant sounds may be listed next to one another. For instance, words beginning with the independent vowel ឫ are listed after the consonant រ, which sounds similar to the independent vowel. A paper by PAN Localization[16] (n.d.) entitled “Khmer Collation Development” suggests that the sorting used in the Chuon Nath dictionary has to be adapted. The Chuon Nath dictionary sorts entries based on how they are pronounced, not on the spelling. For instance, បង់ [U+1794 U+1784 U+17CB] is listed before បកតិ [U+1794 U+1780 U+178F U+17B7] even though the second character of the second word (i.e. ក [U+1780]) appears before that of the first word (i.e. ង [U+1784]) in the alphabet chart.
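
The two words in the example above show why a plain code point sort does not reproduce the dictionary order. The short sketch below only demonstrates the mismatch; it does not attempt to implement the dictionary's collation.

```python
bang = "\u1794\u1784\u17CB"            # បង់
bakate = "\u1794\u1780\u178F\u17B7"    # បកតិ

# Python compares strings code point by code point, so the word whose
# second character is ក [U+1780] sorts before the one with ង [U+1784].
by_code_point = sorted([bang, bakate])

# Order described for the Chuon Nath dictionary in the paragraph above.
dictionary_order = [bang, bakate]

orders_differ = by_code_point != dictionary_order
```

Here `orders_differ` is true, so a sort key based on pronunciation rules would be needed to match the dictionary.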

8.3. Fonts

Khmer typefaces have changed significantly since the 6th century. The following image shows how Khmer consonants evolved over time as quoted by Scheuren (2010:8) from Maspero (1915:48).

Khmer consonants evolved over time

8.3.1. Font Style

According to Scheuren (ibid:9-10) there are three main types of Khmer font styles: (a) មូល Mool ‘lit. round’, (b) ជ្រៀង Chrieng ‘lit. slanted’ and (c) Upright, which is the standing version of Chrieng. The Mool style was commonly found in inscriptions, while Chrieng was used in palm-leaf manuscripts (a.k.a. Sastra), which existed before printing types were introduced in 1877.

To date, the Mool style is usually used in banners and titles of books or articles, while the Upright, the standing Chrieng, is in everyday use. The table below shows how the three styles look.

Mool Style in Khmer OS Moul font face Khmer OS Moul font face
Chrieng Style in Khmer OS Metalchrieng font face Khmer OS Metalchrieng font face
Upright style in Khmer OS system font face Khmer OS system font face

8.3.2. Font Rendering

Khmer font rendering is complex because characters are not rendered in a linear order. Vowels are not always found after the base consonant. They can go to the left, right, above, below or even around the base. Similarly, subscripts can be rendered to the left, right or below the base. This makes it confusing for users to know what should be typed when and placed where.

The vowels are listed in the table below according to their position relative to the base.

left above below right around
េ (U+17C1)
ែ (U+17C2)
ៃ (U+17C3)
ិ (U+17B7)
ី (U+17B8)
ឹ (U+17B9)
ឺ (U+17BA)
ំ (U+17C6)
ុ (U+17BB)
ូ (U+17BC)
ួ (U+17BD)
ា (U+17B6)
ះ (U+17C7)
ៈ (U+17C8)
ាំ (U+17B6 U+17C6)
ើ (U+17BE)
ឿ (U+17BF)
ៀ (U+17C0)
ោ (U+17C4)
ៅ (U+17C5)
ុំ (U+17BB U+17C6)
ុះ (U+17BB U+17C7)
េះ (U+17C1 U+17C7)
ោះ (U+17C4 U+17C7)

Here is a list of subscripts and where they should be rendered in their respective categories.

left below right
្រ ្ក
្ខ
្គ
្ង
្ច
្ឆ
្ជ
្ញ
្ដ
្ឋ
្ឌ
្ណ
្ត
្ថ
្ទ
្ធ
្ន
្ផ
្ព
្ភ
្ម
្ល
្វ
្ហ
្អ
្ឃ
្ឈ
្ឍ
្ប
្យ
្ស

8.4. Diacritic Position

All diacritics are placed on top of the base: ៉ ៊ ់ ៌ ៍ ័ ៏ ៎ ៝. Only one diacritic is usually found on a consonant in a syllable, except for ‘a Consonant Shifter + Samyok Sannya’. A consonant shifter also commonly occurs with Nikahit (“Consonant Shifter + Nikahit”), but be aware that Nikahit functions as a vowel in that environment, despite its name.

9. Application of Khmer Script to Other Languages

Khmer alphabet is not only used to write Khmer language, but it is also used to write at least six ethnic minority languages (i.e. Bunong, Tampuan, Brao, Krung, Jarai and Kuay) and two dead languages (i.e. Pali and Sanskrit). Each language requires unique syllable configuration and character sequences. This section describes the two main points in each language:

  • a list of consonants, subscripts, vowels, diacritics, symbols and punctuations if applicable
  • how the writing system is different from Khmer

See Appendix F for a side by side comparison of the characters existing in Khmer and each ethnic language.

The following sections list the characters used in each ethnic minority language in the form of tables, where the orthographic characters are in the first row, the phonemic representations corresponding to the orthographic characters in the second row and the Unicode code points in the third row.

9.1. Bunong

According to the Bunong-Khmer Bilingual Dictionary​ (2011:ទ-ន), 53 Khmer characters are used in writing Bunong.

9.1.1. Bunong Consonants

There are 28 orthographic consonants in Bunong. យ្ស /ç/ and ស /h/ always occur at the final position.

/k/ /kʰ/ /g/ /ŋ/ /c/ /cʰ/ /ɟ/ /ɲ/ /ɗ/ /t/
U+1780 U+1781 U+1782 U+1784 U+1785 U+1786 U+1787 U+1789 U+178A U+178F
យ្ស
/tʰ/ /d/ /n/ /ɓ/ /pʰ/ /p/ /m/ /j/ /ç/ /r/
U+1790 U+1791 U+1793 U+1794 U+1795 U+1796 U+1798 U+1799 U+1799
U+17D2
U+179F
U+179A
អ្យ អ្វ
/l/ /w/ /b/ /h/ /h/ /ʔ/ /ʔj/ /ʔw/
U+179B U+179C U+179E U+179F U+17A0 U+17A2 U+17A2
U+17D2
U+1799
U+17A2
U+17D2
U+179C

9.1.2. Bunong Subscripts

្អ [U+17D2 U+17A2] is not listed in the table because it is not used in the Bunong-Khmer Dictionary; however, it is used by some speakers.

្យ ្រ ្ល ្វ ្ហ
/j/ /r/ /l/ /w/ /h/
U+17D2
U+1799
U+17D2
U+179A
U+17D2
U+179B
U+17D2
U+179C
U+17D2
U+17A0

9.1.3. Bunong Vowels

/a/ [ĭ] /i/ [ɵ̆] /ɨ/ /ŭ/ /u/ /ɵ/ /iᵊ/ /e/
U+17B6 U+17B7 U+17B8 U+17B9 U+17BA U+17BB U+17BC U+17BE U+17C0 U+17C1
/ɛ/ /ăj/ /o/ /ăw/ /ɔ/
U+17C2 U+17C3 U+17C4 U+17C5 U+17DD

ៀ /iᵊ/ [U+17C0] is used in certain words borrowed from Khmer.

ៃ /ăj/ [U+17C3] and ៅ /ăw/ [U+17C5] are phonological compounds of a vowel and a consonant.

Bunong does not have an inherent vowel.

9.1.4. Bunong Additional Vowels

ាៈ or ា​់ ឺៈ េៈ ែៈ ោៈ ៝ៈ or ៝​់
/ă/ /ə̆/ [ĕ] /ɛ̆/ [ŏ] /ɔ̆/ /ɨᵊ/
U+17B6
U+17C8
or
U+17B6
U+17CB
U+17BA
U+17C8
U+17C1
U+17C8
U+17C2
U+17C8
U+17C4
U+17C8
U+17DD
U+17C8
or
U+17DD
U+17CB
U+17BF

ឿ /ɨᵊ/ [U+17BF] is used in certain words borrowed from Khmer.

9.1.5. Bunong Symbols and Punctuation

In Bunong writing, four symbols (i.e. ៗ ។ ៕ and ៖) are used in the same way as in the Khmer language, along with others which are borrowed.

9.1.6. How is the writing system different from Khmer?

  • The use of two Khmer obsolete characters:
    • ឞ /b/ [U+179E] as in ឞារ (ឞ ា រ) /ɓar/ ‘two’
    • ៝ /ɔ/ [U+17DD] as in ក៝ន (ក ៝ ន) /kɔn/ ‘child by birth; son’
  • ់ [U+17CB] can be placed on consonant រ [U+179A] as in ព៝រ់ (ព ៝ រ ់) /pɔ̆r/ ‘to burn’.
  • ៈ [U+17C8] cannot occur with a consonant on its own; it has to be preceded by a vowel as shown in the “Additional Vowels” list above. This is the opposite of the Khmer spelling convention, where ៈ [U+17C8] is usually attached to the consonant without any intervening vowel.
  • ៝ [U+17DD] behaves like a vowel, and it can be followed by a final consonant with or without ់ as in គ៝ង (គ ៝ ង) /gɔŋ/ ‘k.o. gong’ and គ៝ង់ (គ ៝ ង ់) /gɔ̆ŋ/ ‘to roast’.
  • Sequences uniquely used in Bunong (i.e. ប្ហ្យៅ, ប្ហ្វៃ, អ្យ្រ៝ស).
  • The writing system does not follow the Khmer two series system where one vowel symbol can represent two sounds, depending on which consonant series precedes it. In Bunong, each vowel symbol represents only one sound in every instance.
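
The Bunong rule that ៈ [U+17C8] must be preceded by a vowel can be sketched as a check. The vowel set is taken from the Bunong vowel tables above (including ៝ [U+17DD] and the borrowed ឿ [U+17BF]); the names `BUNONG_VOWELS` and `yuukaleapintu_ok` are illustrative.

```python
YUUKALEAPINTU = "\u17C8"

# Vowel code points from the Bunong vowel tables above.
BUNONG_VOWELS = set(
    "\u17B6\u17B7\u17B8\u17B9\u17BA\u17BB\u17BC\u17BE\u17BF"
    "\u17C0\u17C1\u17C2\u17C3\u17C4\u17C5\u17DD"
)


def yuukaleapintu_ok(word):
    """Return True if every U+17C8 in word directly follows a vowel."""
    return all(
        i > 0 and word[i - 1] in BUNONG_VOWELS
        for i, ch in enumerate(word)
        if ch == YUUKALEAPINTU
    )
```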

9.2. Tampuan

The number of consonants and vowels in the Tampuan Alphabet book (Pech 2006) and the Tampuan-Khmer Dictionary (Muang 2012) varies. The first lists 30 consonants and 27 vowels. The latter lists 29 consonants and 28 vowels. The following details are adapted from the Tampuan Alphabet book (ibid), the Tampuan-Khmer dictionary (ibid) and other documents obtained from SIL members.

9.2.1. Tampuan Consonants

/k/ /kh/ /k/ /kh/ /ŋ/ /c/ /c/ /ɲ/ /ʔd/ /ʔd/
U+1780 U+1781 U+1782 U+1783 U+1784 U+1785 U+1787 U+1789 U+178A U+178C
/n/ /t/ /th/ /t/ /th/ /n/ /ʔb/ /ph/ /p/ /ph/
U+178E U+178F U+1790 U+1791 U+1792 U+1793 U+1794 U+1795 U+1796 U+1797
អ្យ
/m/ /j/ /r/ /l/ /w/ /ç/ /h/ /l/ /ʔ/ /ʔj/
U+1798 U+1799 U+179A U+179B U+179C U+179F U+17A0 U+17A1 U+17A2 U+17A2
U+17D2
U+1799
ច្យ ហ្ញ
/cj/ /hɲ/
U+1785
U+17D2
U+1799
U+17A0
U+17D2
U+1789

“Tampuan Khmer English Dictionary with English Khmer Tampuan Glossary” (2007)​ uses /ñ/ instead of /ɲ/ for ញ [U+1789] and /d/ instead of /ʔd/ for ដ [U+178A] and ឌ [U+178C].

9.2.2. Tampuan Subscripts

្គ ្ង ្ញ ្ន ្ម ្យ ្រ ្ល ្វ
/ʔ/ /ŋ/ /ɲ/ /n/ /m/ /j/ /r/ /l/ /w/
U+17D2
U+1782
U+17D2
U+1784
U+17D2
U+1789
U+17D2
U+1793
U+17D2
U+1798
U+17D2
U+1799
U+17D2
U+179A
U+17D2
U+179B
U+17D2
U+179C

្ង [U+17D2 U+1784] and ្ញ [U+17D2 U+1789] are used to write proper names only. They are not usually used in common Tampuan words.

្គ [U+17D2 U+1782] only occurs word finally under វ and យ.

9.2.3. Tampuan Vowels

[17]
/ɒː/ /ɔː/ /aː/
N/A
/ɛ/
/i̤/
/əi/
/i̤ː/
/ə/
/ɨ̤/
/əɨ/
/ɨ̤ː/
/o/
/ṳ/
/oː|ɔː|ou /
/ṳː/
/uə/
/ṳə/
/aə/
/əː/
U+17B6 U+17B7 U+17B8 U+17B9 U+17BA U+17BB U+17BC U+17BD U+17BE
ុំ ាំ
/ɨə/
/ɨ̤ə/
/iə/
/i̤ə/
/eː/
/e̤ː|ɛ̤ː/
N/A
/ɛː/
/ai/
N/A
/ao/
/o̤ː/
N/A
/ɨ̤w/
/om/
/ṳm/
/ɒm/
N/A
/am|a/
N/A
U+17BF U+17C0 U+17C1 U+17C2 U+17C3 U+17C4 U+17C5 U+17BB
U+17C6
U+17C6 U+17B6
U+17C6
ុះ េះ ោះ ិះ ឹះ ើះ ែះ ូះ
/ah/
N/A
/oh/
/ṳh/
/ɛh/
N/A
/ɒh/
N/A
N/A
/i̤h/
/əh/
/ɨ̤h/
/aəh/
N/A
N/A
N/A
/ouh/
/ṳːh/
U+17C7 U+17BB
U+17C7
U+17C1
U+17C7
U+17C4
U+17C7
U+17B7
U+17C7
U+17B9
U+17C7
U+17BE
U+17C7
U+17C2
U+17C7
U+17BC
U+17C7

ែះ exists in the Tampuan-Khmer dictionary (ibid), but not in the Tampuan alphabet book (ibid). One of the reference documents states that ែះ is not used.

The empty cell in the first column of the first vowel table above is the inherent vowel.

9.2.4. Tampuan Diacritics

The Tampuan Alphabet book (ibid) illustrates examples of the usage of ៉ (U+17C9) and ៊ (U+17CA). They are used to change the series of the consonant.

  • ប៉ ម៉ ង៉ យ៉ រ៉ វ៉ ញ៉
  • ស៊ ហ៊ ប៊ អ៊

9.2.5. How is the writing system different from Khmer?

  • The ័ [U+17D0] symbol indicates that the main vowel is pronounced with a breathy phonation. The symbol only ever occurs word finally. It is only used in situations where the vowel would otherwise be tense. In most cases a second series consonant will indicate breathy phonation on the following vowel too. It should be noted that the ័ is supposed to stand at the word boundary (i.e. usually on the final consonant), but since there is a restriction in the Khmer Unicode character ordering when combining ះ [U+17C7] with ័ , it has to be encoded before the vowel. If ័ is placed after ះ , the text does not look right (i.e. the dotted circle appears in between the two characters).
  • Only nine consonants have subscript forms.
  • ៉ [U+17C9] is used with clusters like ប្រ ប្ល to indicate that ប sounds [p], not [b]. Therefore, ប្រ is [br] and ប្រ៉ is [pr]; ប្ល is [bl] and ប្ល៉ is [pl].
  • ់ can be put on any final consonant. It does not have the same restriction as in Khmer.
  • A white space is used in between each word to denote a word boundary.
  • Subscript ្វ [U+17D2 U+179C] is placed after subscript ្រ [U+17D2 U+179A].

9.3. Brao

The following is the character inventory of Brao language (a.k.a. Brao Ombaa). There are 36 consonants, 2 subscripts, 19 vowels and 5 diacritics.

9.3.1. Brao Consonants

/k/ /kʰ/ /k/ /kʰ/ /ŋ/ /c/ /ɟ/ /c/ /ɟ/ /ɲ/
U+1780 U+1781 U+1782 U+1783 U+1784 U+1785 U+1786 U+1787 U+1788 U+1789
/d/ /ʔd/ /d/ /ʔn/ /n/ /t/ /tʰ/ /t/ /tʰ/ /n/
U+178A U+178B U+178C U+178D U+178E U+178F U+1790 U+1791 U+1792 U+1793
/b/ /pʰ/ /p/ /pʰ/ /m/ /j/ /r/ /l/ /w/ /ç/
U+1794 U+1795 U+1796 U+1797 U+1798 U+1799 U+179A U+179B U+179C U+179F
អ្យ
/h/ /l/ /ʔ/ /ʔɟ/ /g/ /ʔb/
U+17A0 U+17A1 U+17A2 U+17A2
U+17D2
U+1799
U+179D U+179E

9.3.2. Brao Subscripts

្រ ្ល
/r/ /l/
U+17D2 U+179A U+17D2 U+179B

9.3.3. Brao Vowels

Some vowels in Brao have two phonemic representations: one for the 1st series and another for the 2nd series. For instance, ិ is realized as /ɛ/ in the 1st series and /i/ in the 2nd series.

inherent
/ɔɔ/
N/A
/aa/
N/A
/ɛ/
/i/
/ɨj/
/ii/
/ə/
/ɨ/
N/A
/ɨɨ/
/o/
/u/
/oo/
/uu/
/uə/
N/A
/ɨə/
N/A
U+17B6 U+17B7 U+17B8 U+17B9 U+17BA U+17BB U+17BC U+17BD U+17BF
ោះ ុំ ាំ
/iə/
N/A
/əə/
N/A
N/A
/ɛɛ/
/aj/
N/A
/ɔm/
N/A
/ah/
N/A
/ɔh/
/uəh/
/om/
/um/
/am/
N/A
U+17C0 U+17BE U+17C2 U+17C3 U+17C6 U+17C7 U+17C4
U+17C7
U+17BB
U+17C6
U+17B6
U+17C6
ាំង
/aŋ/
N/A
U+17B6
U+17C6
U+1784

9.3.4. Brao Diacritics

U+17CB U+17C9 U+17CA U+17DD U+17CE

9.3.5. How is the writing system different from Khmer?

  • Two obsolete characters (i.e. ឝ and ឞ) are used as consonants.
  • Only two subscripts are used in Brao.
  • Khmer does not have អ្យ as a cluster, nor អ្យ្រ៊ីប /ʔə̆rʔjiip/ ‘very black’.
  • ់ is placed on any final consonant when the vowel before it (either /aa/ or /ɔɔ/) is shortened.
  • A white space is used in between each word to denote a word boundary.
  • ៎ lengthens the vowel in the /ɔɔh/ sequence.

9.4. Krung

In the Krung language, there are 33 consonants, 6 subscripts, 16 vowels and 3 diacritics. The Krung series system mostly conforms to that of the Khmer writing system.

9.4.1. Krung Consonants

The source does not provide phonemic representations, only a romanized version of each orthographic consonant. These are listed in the second row.

k kh k kh ng c j c j nh
U+1780 U+1781 U+1782 U+1783 U+1784 U+1785 U+1786 U+1787 U+1788 U+1789
d qd d qd n t t n b ph
U+178A U+178B U+178C U+178D U+178E U+178F U+1791 U+1793 U+1794 U+1795
p ph m j r l w s h l
U+1796 U+1797 U+1798 U+1799 U+179A U+179B U+179C U+179F U+17A0 U+17A1
q g qb
U+17A2 U+179D U+179E

9.4.2. Krung Subscripts

្ង ្ន ្ម ្រ ្ល ្អ
ng n m r l q
U+17D2
U+1784
U+17D2
U+1793
U+17D2
U+1798
U+17D2
U+179A
U+17D2
U+179B
U+17D2
U+17A2

9.4.3. Krung Vowels

Inherent
/àà/
N/A
/aa/
N/A
/è/
/i/
/ùy/
/ii/
/e/
/ù/
N/A
/ùù/
/o/
/u/
/oo/
/uu/
/ue/
/ue/
/ee/
N/A
U+17B6 U+17B7 U+17B8 U+17B9 U+17BA U+17BB U+17BC U+17BD U+17BE
ោះ ិះ ាំង
/ie/
/ie/
N/A
/èè/
/ay/
N/A
/ah/
N/A
/àh/
N/A
/èh/
N/A
/ang/
N/A
U+17C0 U+17C2 U+17C3 U+17C7 U+17C4
U+17C7
U+17B7
U+17C7
U+17B6
U+17C6
U+1784

9.4.4. Krung Diacritics

Three diacritics are used:

  • ់ shortens the vowel length.
  • ៉ changes the consonant to the 1st series
  • ៊ changes the consonant to the 2nd series
U+17CB U+17C9 U+17CA

9.4.5. How is the writing system different from Khmer?

  • Two obsolete characters (i.e. ឝ and ឞ) are used as consonants.
  • The consonant shifters are used with the obsolete characters (i.e. ឝ៊ and ឞ៊).
  • A white space is used in between each word to denote a word boundary.

9.5. Jarai

In Jarai language, there are 35 consonants, nine subscripts, 22 vowels and four diacritics.

9.5.1. Jarai Consonants

/k/ /kʰ/ /k/ /kʰ/ /ŋ/ /c/ /cV̤/ /c/ /cV̤/ /ɲ/
U+1780 U+1781 U+1782 U+1783 U+1784 U+1785 U+1786 U+1787 U+1788 U+1789
/tV̤/ /ˀd/ /tV̤/ /ˀd/ /n/ /t/ /tʰ/ /t/ /tʰ/ /n/
U+178A U+178B U+178C U+178D U+178E U+178F U+1790 U+1791 U+1792 U+1793
/bV̤/ /pʰ/ /p/ /pʰ/ /m/ /j/ /ɣ/ /l/ /w/ /s/
U+1794 U+1795 U+1796 U+1797 U+1798 U+1799 U+179A U+179B U+179C U+179F
/h/ /l/ /ʔ/ /kV̤/ /ˀb/
U+17A0 U+17A1 U+17A2 U+179D U+179E

9.5.2. Jarai Subscripts

្គ ្ង ្ញ ្ន ្ម ្យ ្ល ្វ
/k/ /ŋ/ /ɲ/ /n/ /m/ /j/ /l/ /w/
U+17D2
U+1782
U+17D2
U+1784
U+17D2
U+1789
U+17D2
U+1793
U+17D2
U+1798
U+17D2
U+1799
U+17D2
U+179B
U+17D2
U+179C

9.5.3. Jarai Vowels

inherent
/ə/
/ɔ/
/aː/
N/A
/ɛʔ/
/iʔ/
N/A
/iː/
/əʔ/
/ɨʔ/
N/A
/ɨː/
/oʔ/
/uʔ/
/oː/
/uː/
N/A
/uə/
N/A
/əː/
U+17B6 U+17B7 U+17B8 U+17B9 U+17BA U+17BB U+17BC U+17BD U+17BE
ោះ ិះ ែះ
/ɨɑ/
/ɨa/
/ie/
N/A
N/A
/eː/
N/A
/ɛː/
/ɑm/
N/A
/ah/
N/A
/aʔ/
N/A
/ɑh/
N/A
/ɛh/
/ih/
N/A
/ɛɛh/
U+17BF U+17C0 U+17C1 U+17C2 U+17C6 U+17C7 U+17C8 U+17C4
U+17C7
U+17B7
U+17C7
U+17C2
U+17C7
ុំ ាំ ាំង ឹះ ុះ
/om/
/um/
/am/
N/A
/aŋ/
N/A
/əh/
/ɨh/
/oh/
/uh/
U+17BB
U+17C6
U+17B6
U+17C6
U+17B6
U+17C6
U+1784
U+17B9
U+17C7
U+17BB
U+17C7

9.5.4. Jarai Diacritics

Four diacritics are used.

  • ់ [U+17CB] marks short vowel. *
  • ៉ [U+17C9] changes second series consonant to the first series.
  • ៊ [U+17CA] changes the first series consonant to the second series.
  • ័ [U+17D0] marks nasalization, but since it can be confused in function with the Khmer Samyok Sannya, another form is proposed: ម៍, placed before the syllable it modifies.

9.5.5. How is the writing system different from Khmer?

  • Two obsolete characters (i.e. ឝ and ឞ) are used as consonants.
  • A white space is used in between each word to denote a word boundary.
  • The placement of ័ is unclear, but it seems to be inconsistent with how it is used in Khmer.
    • ឝ៉្លំ័ ឝ ៉ ្ល ំ ័
    • ឆ័រុម ឆ ័ រ ុ ម [ɟɨɣũm] ‘needle’
    • ប្វៈ័ ? [bũaʔ] ‘work’
    • វុៈ័ ? [wãʔ] ‘oil’
    • ក្លា័ប ? [klaap] ‘difficult’
    • ឝ្វ័ះ ? [gũah] ‘morning’

9.6. Kuay

In Kuay language, there are 34 consonants, 21 subscripts, 25 vowels, and 4 diacritics.

9.6.1. Kuay Consonants

/k/ /kʰ/ /k/ /kʰ/ /ŋ/ /c/ /cʰ/ /c/ /cʰ/ /ɲ/
U+1780 U+1781 U+1782 U+1783 U+1784 U+1785 U+1786 U+1787 U+1788 U+1789
/ɗ/ /ɗ/ /n/ /t/ /tʰ/ /t/ /tʰ/ /n/ /ɓ/ /pʰ/
U+178A U+178C U+178E U+178F U+1790 U+1791 U+1792 U+1793 U+1794 U+1795
/p/ /m/ /j/ /r/ /l/ /w/ /s/ /h/ /l/ /ʔ/
U+1796 U+1798 U+1799 U+179A U+179B U+179C U+179F U+17A0 U+17A1 U+17A2
អ្ច
/ʄ/
U+17A2
U+17D2
U+1785

9.6.2. Kuay Subscripts

្ក ្ខ ្គ ្ឃ ្ង ្ច ្ជ ្ញ ្ដ ្ឌ
[k] [kʰ] [k] [kʰ] [ŋ] [c] [c] [ɲ] [ɗ] [ɗ]
U+17D2
U+1780
U+17D2
U+1781
U+17D2
U+1782
U+17D2
U+1783
U+17D2
U+1784
U+17D2
U+1785
U+17D2
U+1787
U+17D2
U+1789
U+17D2
U+178A
U+17D2
U+178C
្ត ្ថ ្ទ ្ន ្ប ្ព ្ភ ្ម ្យ ្រ
[t] [tʰ] [t] [n] [ɓ] [p] [pʰ] [m] [j] [r]
U+17D2
U+178F
U+17D2
U+1790
U+17D2
U+1791
U+17D2
U+1793
U+17D2
U+1794
U+17D2
U+1796
U+17D2
U+1797
U+17D2
U+1798
U+17D2
U+1799
U+17D2
U+179A
្ល ្វ ្ស ្ហ ្អ
[l] [β] [s] [h] [ʔ]
U+17D2
U+179B
U+17D2
U+179C
U+17D2
U+179F
U+17D2
U+17A0
U+17D2
U+17A2

9.6.3. Kuay Vowels

inherent
/ɐː/
N/A
/aː/
/ia/
/ɛ/
/i/
[ɐj]
/i̤ː/
/ɜ/
/ə̤/
/əː/
/ɨ̤ː/
/ɔ|o/
/ṳː/
/oː/
/ṳː/
/uə/
N/A
/ɜː/
/ə̤ː/
U+17B6 U+17B7 U+17B8 U+17B9 U+17BA U+17BB U+17BC U+17BD U+17BE
េះ
/ɨə/
/ɨ̤ə/
/iə/
/i̤ə/
/eː/
/e̤ː/
/ɛː/
N/A
[aj]
[ej]
/ɔː/
/o̤ː/
N/A
[o̤w]
[ɐm]
[ɔ̤m]
[ah]
[a̤h]
[ɛh]
[ɛ̤h]
U+17BF U+17C0 U+17C1 U+17C2 U+17C3 U+17C4 U+17C5 U+17C6 U+17C7 U+17C1
U+17C7
ោះ ុះ ិះ ឹះ ាំ ាំង
[ɐh]
[a̤h]
[ɔh]
[uh]
N/A
[ih]
[ɜh]
N/A
[am]
[ɜ̤m]
[aŋ]
[a̤ŋ]
U+17C4
U+17C7
U+17BB
U+17C7
U+17B7
U+17C7
U+17B9
U+17C7
U+17B6
U+17C6
U+17B6
U+17C6
U+1784

9.6.4. Kuay Diacritics

Four diacritics are used.

U+17CB U+17C9 U+17CA U+17D0

Punctuation marks are used in the same way as in the Khmer language (i.e. ។ “” «» ? ! ៖ ៗ … ៕ ‌៚).

9.6.5. How is the writing system different from Khmer?

  • ់ can be used on រ, which never occurs in Khmer.
  • ់ can be used on a final consonant preceded by a Samyok Sannya or ែ.

9.7. Sastras

9.7.1. Pali

Nhok (1962:1-2) wrote that there are 41 characters in the Pali alphabet inventory: 8 vowels and 33 consonants. Note that vowels are divided into two groups: independent and dependent. Independent vowels are observed to usually start a syllable, while dependent vowels are always attached to an initial consonant.

No diacritic is used in Pali.

The tables below present Khmer characters used in Pali in the first row, romanization of them in the second and the unicode code points in the third row.

9.7.1.1. Pali Consonants
k kh g gh c ch j jh ñ
U+1780 U+1781 U+1782 U+1783 U+1784 U+1785 U+1786 U+1787 U+1788 U+1789
ṭh ḍh t th d dh n
U+178A U+178B U+178C U+178D U+178E U+178F U+1790 U+1791 U+1792 U+1793
p ph b bh m y r l v s
U+1794 U+1795 U+1796 U+1797 U+1798 U+1799 U+179A U+179B U+179C U+179F
h
U+17A0 U+17A1 U+17C6
9.7.1.2. Pali Subscripts
្ក ្ខ ្គ ្ឃ ្ង ្ច ្ឆ ្ជ ្ឈ ្ញ
k kh g gh c ch j jh ñ
U+17D2
U+1780
U+17D2
U+1781
U+17D2
U+1782
U+17D2
U+1783
U+17D2
U+1784
U+17D2
U+1785
U+17D2
U+1786
U+17D2
U+1787
U+17D2
U+1788
U+17D2
U+1789
្ដ ្ឋ ្ឌ ្ឍ ្ណ ្ត ្ថ ្ទ ្ធ ្ន
ṭh ḍh t th d dh n
U+17D2
U+178A
U+17D2
U+178B
U+17D2
U+178C
U+17D2
U+178D
U+17D2
U+178E
U+17D2
U+178F
U+17D2
U+1790
U+17D2
U+1791
U+17D2
U+1792
U+17D2
U+1793
្ប ្ផ ្ព ្ភ ្ម ្យ ្រ ្ល ្វ ្ស
p ph b bh m y r l v s
U+17D2
U+1794
U+17D2
U+1795
U+17D2
U+1796
U+17D2
U+1797
U+17D2
U+1798
U+17D2
U+1799
U+17D2
U+179A
U+17D2
U+179B
U+17D2
U+179C
U+17D2
U+179F
្ហ
h
U+17D2
U+17A0
9.7.1.3. Pali Independent Vowels
a ā i ī u ū e o
U+17A3 U+17A4 U+17A5 U+17A6 U+17A7 U+17A9 U+17AF U+17B1
9.7.1.4. Pali Dependent Vowels
inherent
a ā i ī u ū e o
N/A U+17B6 U+17B7 U+17B8 U+17BB U+17BC U+17C1 U+17C4
9.7.1.5. Pali Vowel Combination
ិំ ុំ
iṃ uṃ
U+17C6 U+17B7
U+17C6
U+17BB
U+17C6
9.7.1.6. How is the writing system different from Khmer?
  • Nikahit (ំ) is in the consonant chart and used in the same way as a consonant. Chin et al. (2012:2) wrote that it is not only placed on top of a consonant, but also on one of these three independent vowels: ឣ ឥ ឧ (i.e. ឣំ ឥំ ឧំ).
  • Nikahit is used to combine with two vowels: ិ and ុ to make ិំ and ុំ. ិំ is exceptionally found in Pali, and never in Khmer spelling convention. (It may be replaced by ឹ in Khmer alphabet.)
  • Consonants can be stacked together but not pronounced as a cluster. The subscript is pronounced as an initial consonant of the next syllable whether or not there is a vowel after it.

9.7.2. Sanskrit

This inventory is adapted from Huot (1956:1-5,18-21). In a more recent book, Hum (2005:ឌ,ឍ,ធ) includes three additional consonants: ឡ ក្ស ជ្ញ. Sanskrit has 33 consonants, 33 subscripts, nine dependent vowels, 13 independent vowels and six diacritics.

The tables that follow present the Khmer characters used in Sanskrit in the first row, their romanization in the second row and their Unicode code points in the third row.

9.7.2.1. Sanskrit Consonants
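
| ក | ខ | គ | ឃ | ង | ច | ឆ | ជ | ឈ | ញ |
|---|---|---|---|---|---|---|---|---|---|
| k | kh | g | gh | ṅ | c | ch | j | jh | ñ |
| U+1780 | U+1781 | U+1782 | U+1783 | U+1784 | U+1785 | U+1786 | U+1787 | U+1788 | U+1789 |

| ដ | ឋ | ឌ | ឍ | ណ | ត | ថ | ទ | ធ | ន |
|---|---|---|---|---|---|---|---|---|---|
| ṭ | ṭh | ḍ | ḍh | ṇ | t | th | d | dh | n |
| U+178A | U+178B | U+178C | U+178D | U+178E | U+178F | U+1790 | U+1791 | U+1792 | U+1793 |

| ប | ផ | ព | ភ | ម | យ | រ | ល | វ | ឝ |
|---|---|---|---|---|---|---|---|---|---|
| p | ph | b | bh | m | y | r | l | v | ç |
| U+1794 | U+1795 | U+1796 | U+1797 | U+1798 | U+1799 | U+179A | U+179B | U+179C | U+179D |

| ឞ | ស | ហ |
|---|---|---|
| ṣ | s | h |
| U+179E | U+179F | U+17A0 |
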
9.7.2.2. Sanskrit Subscripts
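
| ្ក | ្ខ | ្គ | ្ឃ | ្ង | ្ច | ្ឆ | ្ជ | ្ឈ | ្ញ |
|---|---|---|---|---|---|---|---|---|---|
| k | kh | g | gh | ṅ | c | ch | j | jh | ñ |
| U+17D2 U+1780 | U+17D2 U+1781 | U+17D2 U+1782 | U+17D2 U+1783 | U+17D2 U+1784 | U+17D2 U+1785 | U+17D2 U+1786 | U+17D2 U+1787 | U+17D2 U+1788 | U+17D2 U+1789 |

| ្ដ | ្ឋ | ្ឌ | ្ឍ | ្ណ | ្ត | ្ថ | ្ទ | ្ធ | ្ន |
|---|---|---|---|---|---|---|---|---|---|
| ṭ | ṭh | ḍ | ḍh | ṇ | t | th | d | dh | n |
| U+17D2 U+178A | U+17D2 U+178B | U+17D2 U+178C | U+17D2 U+178D | U+17D2 U+178E | U+17D2 U+178F | U+17D2 U+1790 | U+17D2 U+1791 | U+17D2 U+1792 | U+17D2 U+1793 |

| ្ប | ្ផ | ្ព | ្ភ | ្ម | ្យ | ្រ | ្ល | ្វ | ្ឝ |
|---|---|---|---|---|---|---|---|---|---|
| p | ph | b | bh | m | y | r | l | v | ç |
| U+17D2 U+1794 | U+17D2 U+1795 | U+17D2 U+1796 | U+17D2 U+1797 | U+17D2 U+1798 | U+17D2 U+1799 | U+17D2 U+179A | U+17D2 U+179B | U+17D2 U+179C | U+17D2 U+179D |

| ្ឞ | ្ស | ្ហ |
|---|---|---|
| ṣ | s | h |
| U+17D2 U+179E | U+17D2 U+179F | U+17D2 U+17A0 |
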
9.7.2.3. Sanskrit Independent Vowels
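
| ឣ | ឤ | ឥ | ឦ | ឧ | ឩ | ឫ | ឬ | ឭ | ឯ |
|---|---|---|---|---|---|---|---|---|---|
| a | ā | i | ī | u | ū | ṛ | ṝ | ḷ | e |
| U+17A3 | U+17A4 | U+17A5 | U+17A6 | U+17A7 | U+17A9 | U+17AB | U+17AC | U+17AD | U+17AF |

| ឰ | ឱ | ឪ |
|---|---|---|
| ai | o | au |
| U+17B0 | U+17B1 | U+17AA |

9.7.2.4. Sanskrit Dependent Vowels

| inherent | ា | ិ | ី | ុ | ូ | េ | ៃ | ោ | ៅ |
|---|---|---|---|---|---|---|---|---|---|
| a | ā | i | ī | u | ū | e | ai | o | au |
| N/A | U+17B6 | U+17B7 | U+17B8 | U+17BB | U+17BC | U+17C1 | U+17C3 | U+17C4 | U+17C5 |
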
9.7.2.5. Sanskrit Combinations of Consonant and Independent Vowel

ក [U+1780] serves as a placeholder for any consonant.

| ក្ឫ | ក្ឬ | ក្ឭ[18] |
|---|---|---|
| kṛ | kṝ | kḷ |
| U+1780 U+17D2 U+17AB | U+1780 U+17D2 U+17AC | U+1780 U+17D2 U+17AD |

ក្ឭ is rendered incorrectly here; it should be rendered as shown below:

ក្ឭ

9.7.2.6. Sanskrit Diacritics

| ៑ | ៜ | ំ | ៈ | ៌ | Anuneaseka |
|---|---|---|---|---|---|
| Virama វិរាមៈ | Avakraha អវគ្រហៈ | Anusvara អនុស្វរៈ | Visarga វិសគ៌ៈ | Athisvara អធ៌ស្វរៈ | Anuneaseka[19] អនុនាសិកៈ |
| U+17D1 | U+17DC | U+17C6 | U+17C8 | U+17CC | N/A |
| VIRIAM | AVAKRAHASANYA | NIKAHIT | YUUKALEAPINTU | ROBAT | - |

  • ៑ Virama (Huot 1956:20-21)
  • ៜ Avakraha (Huot 1956:21)
  • ំ Anusvara or Nikahit (Huot 1956:18-19)
  • Anusvara is placed on a vowel (ibid:117)
    ហវីំឞិ

ហវីំឞិ

  • Anusvara is placed on an independent vowel (ibid:115)
    ឧំឞិ

ឧំឞិ

  • ះ Visarga or Visachani (Huot 1956:19-20)
  • Anuneaseka (Huot 1956:19)

Anuneaseka

  • ៌ Athisvara RO in a cluster as the second member (Huot 1956:26). It is not listed as one of the 5 diacritics. It is instead an alternative appearance of រ when occurring after another consonant in an initial cluster.
  • AbvV can be placed above the Athisvara (ibid:133)
    ចតុថ៌ី

ចតុថ៌ី

  • The Athisvara is placed on a subscript (ibid:160)
    ចតុក្យ៌ស៑

ចតុក្យ៌ស៑

9.7.2.7. How is the writing system different from Khmer?
  • Obsolete characters are used (i.e. ឞ ឝ ៑).
  • Anuneaseka does not exist in the current Khmer Unicode character inventory.
  • Independent vowels are used like subscripts.
    • ក្ឫឞ្ណ kṛṣṇa ‘black’ (Huot 1956:12)
    • ស្ប្ឫហា spṛha ‘wish (n)’ (Huot 1956:12)
  • An independent vowel used like a subscript can be placed under another subscript (ibid:87)
    ស្ម្ឫតិ

ស្ម្ឫតិ

  • Consonant clusters which never occur in the Khmer language:
    • សត្ត្វ sattva ‘animal’
    • មត្ស្យ matsya ‘fish’
    • វ្រត vrata ‘Buddhist temple’

មត្ស្យ is not rendered correctly here; it should be rendered as shown below:

មត្ស្យ
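
The use of independent vowels as subscripts can be detected mechanically, because section 9.7.2.5 encodes it as U+17D2 followed by one of U+17AB, U+17AC or U+17AD. The Python sketch below searches a string for such sequences. The function name `vowel_subscripts` is hypothetical, and the code points given for ក្ឫឞ្ណ are an assumption derived from the encoding shown in 9.7.2.5.

```python
import re

# COENG (U+17D2) followed by the independent vowel ṛ, ṝ or ḷ (U+17AB..U+17AD).
VOWEL_SUBSCRIPT = re.compile("\u17D2[\u17AB-\u17AD]")


def vowel_subscripts(text):
    """Return every COENG + independent-vowel sequence found in text."""
    return VOWEL_SUBSCRIPT.findall(text)


word = "\u1780\u17D2\u17AB\u179E\u17D2\u178E"  # ក្ឫឞ្ណ kṛṣṇa (assumed encoding)
print(len(vowel_subscripts(word)))  # 1
```

A text that yields matches here follows the Sanskrit convention and is unlikely to be ordinary modern Khmer.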

10. Stone Inscriptions (Pre-Angkor, Angkor, Post-Angkor Era)

The oldest stone inscriptions written in the Khmer language date back to the 5th century. Stone inscriptions use 33 consonants, 14 vowels and three diacritics (Vong 2011:15). These characters are used to transcribe inscriptions as they were carved in the stones. The table below shows the equivalents of each of them. For more details on character comparison, see https://drive.google.com/drive/folders/1jxv9xxrWNPd0U7j1wDW838elUQMUMrV0?usp=sharing.

Consonant Subscript Vowels Special Signs
្ក
្ខ
្គ [20]
្ឃ
្ង
្ច
្ឆ
្ជ
្ឈ
្ញ
្ដ
្ឋ
្ឌ
្ឍ
្ណ
្ត
្ថ
្ទ
្ធ
្ន
្ប
្ផ
្ព
្ភ
្ម
្យ
្រ
្ល
្វ
្ឝ
្ឞ
្ស
្ហ

11. Sample Texts for Orthography Check

This section includes a sample text for each ethnic minority language, as well as for Pali and inscriptions.

11.1. Bunong Sample Text

នអើន​ទែស​ក៝ន​នទ្រោក.. នើម​ងក៝ច​ម៝ស ក្រូយ ចិត.. នតើម​ឆា​អើម ឞុត​ឞូនុយ្ស​ឞារ​ហៃ​គុ​ប៝ន​នហាញ។ ឞុត​នទ្រោក​ទូ​ហៃ​ម្វាយ​ៗ ទូ​ហៃ​ឞុត​នទ្រោក​ងក្វាង់ ទូ​ហៃ​ជឹត​ឞុត​នទ្រោក​មែ គុ​នហាញ ច្យាប់​នហាញ។ ជ៝ៈ​នារ​ជ៝ៈ​ខៃ​ជ៝ៈ​នាម់​ពាង់​អី​ច្យាប់​នទ្រោក​មែ នទ្រោក​មែ​រី​ឞុន ឞុន​រី​គែស​ក៝ន។ ពាង់​ឞារ​ហៃ​ហោម​ច្យាប់​នទ្រោក​នហាញ​ដ៝ង់។ ទូ​នារ​រី​ពាង់​អី​ច្យាប់​នទ្រោក​ងក្វ៝ង់​គែស​រាវេ។ រាវេ​ឞារ​ពែ​នារ ឞិច​មោ​រើយ ឆោង​មោ​កាស។ ទាស​អី​ពាង់​ទូ​ហៃ​ជាៈ​ឆោង​នហាញ ឆោង​នហាញ​មោ​អុច​ដ៝ង់។ ងើយ​ឞាស​ពាង់​អី​ច្យាប់​នទ្រោក​មែ ពាង់​អី​ច្យាប់​នទ្រោក​មែ​គែស​នាវ​រាវេ​មា​ពាង់​អី​ច្យាប់​នទ្រាក​ងក្វ៝ង់។ ពាង់​អោប ៖ “ហៃ​កោញ! ម៝ស​នាវ​រាវេ​ម្រែ​នៃ? ឞារ​ពែ​នារ​ហើយ មោ​លាង់​ឆោង​មោ​លាង់​ឆា​រ៝​នៃ”។ ពាង់​រី កោញ​ពាង់​អើស ៖ “មោ​ឞុត​នាវ​រាវេ​អោស គ៝ប់​រាវេ​អាប់​នារ​កាល់​ទឹ​រាវេ​នាវ​ក៝ន​នទ្រោក​នៃ​ទើម”។ ពាង់​អី​ច្យាប់​នទ្រោក​មែ​អោប ៖ “មើម​ក៝ន​នទ្រោក?” “អើ ក៝ន​នទ្រោក​មៃ​នៃ ក៝ន​នទ្រោក​គ៝ប់”។ ពាង់​ព្លើង​អើស ៖ “ក៝ន​នទ្រោក​មៃ?” ពាង់​អើស ៖ “អើ ក៝ន​នទ្រោក​នៃ​ក៝ន​នទ្រោក​គ៝ប់​ងាន់​គ៝ប់​ឞិច​អា​មាង់​ងក្ល៝ន់​នហេល​មៃ​រាលាច់​មា​គ៝ប់​មោ​ទើយ​អោស”។ ព្លើង ៖ “មោ​ក៝ន​នទ្រោក​មៃ ក៝ន​នទ្រោក​គ៝ប់​ងាន់ នទ្រោក​មៃ​នទ្រោក​ងក្វ៝ង់​មោ​ឞ្លោវ​គែស​ក៝ន។ នទ្រោក​មែ​ជេង​ឞុត​ក៝ន”។ “មៃ​លើយ​រាលាច់ រាលាវ​នទ្រោក​គ៝ប់​ជើង​នទ្រោក​មែ​មៃ នទ្រី​នទ្រោក​នៃ​ក៝ន​នទ្រោក​គ៝ប់​ងាន់ គែស​នទ្រោក​ងក្វ៝ង់​គ៝ប់​ទឹង​គែស​ក៝ន​នទ្រោក​មែ​មៃ”។ ជេសរី​ពាង់​អី​ច្យាប់​នទ្រោក​មែ​មោ​ទើយ​រាលាច់ ពាង់​គុ​ឆ្រុង​ជេសរី​ហាន់​ឆឹត​អា​ច្វាញ។ ត៝ត់​ឆឹត​អា​ច្វាញ​គែស​នាវ​រាវេ​ឞឹច​មោ​រើយ​ឆោង​មោ​កាស​ងោយ​មា​រាវេ​ក៝ន​នទ្រោក ក៝ប់​ត៝ត់​អាង​អោយ​មា​ព្លឹ​រាលាច់​នាវ​ក៝ន​នទ្រោក​ជឹត។ រី​រាលាច់​មោ​ទើយ តា​ឆាក់​ពាង់​នូយ្ស​ហោ​ងាន់ មោ​លែៈ​នូយ្ស​អោស ៖ ​“ក៝ន​នទ្រោក​នៃ​ក៝ន​នទ្រោក​គ៝ប់​ងាន់ មើម​ទឹង​លាស​ក៝ន​នទ្រោក​ពាង់​ច្រាវ”។ ជេសរី​ពាង់​ងើយ​អ៝ន់​មា​ពាង់​ច្យាប់​នទ្រោក​ងក្វ៝ង់ ៖ ​“មៃ​លាស​ក៝ន​នទ្រោក​មៃ​នឹង​នែង នទ្រី​ហាន់​ជ៝យ​ឞូ​រាញ​ញច្រាៈ​នាវ​ទោយ្ស​អ៝ន់​រាង្លាច់​នាវ​អា មៃ​ហាន់​ជ៝យ​ឞូរាញ​មៃ គ៝ប់​ជ៝យ​ឞូរាញ​គ៝ប់”។ ជេសរី​ផូង​ខាន់ពាង់​​តឹមនាល​នារ ឞារ​នារ​ហែ​តឹម​ម៝ប់​ឞាល់។ “ល៝រ​មៃ មៃ​ក៝ប់​គ៝ប់ ល៝រ​គ៝ប់​គ៝ប់​ក៝ប​មៃ”។ ជេសរី​កោញ​អី​ច្យាប់​នទ្រោក​ងក្វ៝ង់ ជ៝យ​ឞូ​រាញ​គែស​ល៝រ​ពាង់​អី​ច្យាប់​នទ្រោក​មែ​ងោយ​មា​រាវេ ឞើយ​មោ​គែស​ឞូរាញ​ញច្រាៈ។ 
ជេសរី​ពាង់​កាស​ឞើស​អោយ​ហាន់​ជ៝យ​ឞូនុយ្ស​ហាន់​ត៝ត់​អា​ត្រ៝ង​រី​ម៝ប់​មា​កោញ​រាពាយ​ញច្វាត់ នទ្រ៝ត់​ព្រុស មផារ​នទ្រ៝ត់​រាពាយ​រី​ពាង់​រាក​រាពាយ​ឞើស​កើយ​លាស៖ “​ច្យាក​រាពាយ​មពីក​កាប់​យៅ​ឆា​ម៝ស​ញច្វាត់​ពាង់​តើម​អោយ​តើម​ព្លា​នៃ”។ ជេសរី​រាពាយ​តាង់​ពាង់​រាក រាពាយ​លាស ៖ “ហៃ​កោញ! ម៝ស​មៃ​ទឹង​រាក​គ៝ប់​មេស?” ពាង់​អើស ៖ “គ៝ប់​នទ្រ៝ត់​មៃ​នៃ​ហើយ មៃ​ម៝ន​រាពាយ​ឞើស?” “អើ​គ៝ប់។ គ៝ប់​ហើយ​រាពាយ”។ “នទ្រី​កោញ​ទាន់​ទោយ្ស​ម៝ន​អើ គ៝ប់​លិច គឹត”។ “លិច​ច្រាវ​អោស គ៝ប់​មោ​នូយ្ស​មា​មៃ​អោស រាពាយ​អោប៖ ​មៃ​អាស​ហាន់​ហាៈ​កោញ​ឞើស​អើម​អោយ​នៃ​ញអោត​នទ៝ស​មោ​នាន់​ញអោត​នទើ​មោ​នាន់​ក្វាន់​ព្រីត​តា​តី​ឞារ​អឹ”។ ពាង់​អើស ៖ “គ៝ប់​គើញ​ហាន់​ជ៝យ​ឞូរាញ​ច្រាៈ​ទោយ្ស​ម៝ស​ម្រែ​ហៃ”។ រាពាយ​អោប ៖ “មើម​នាវ​មៃ​រី កោញ?” កោញ​ពាង់​ងក៝ច​លែៈ​រាងោច​នាវ​ផូង​ខាន់ពាង់​វៃ​គុ​នហាញ​ច្យាប់​នទ្រោក​នហាញ​មា​ត៝ត់​នទ្រោក​គែស​ក៝ន រាពាយ​អ្យាត់​លែៈ​នោ​នាវ​រាពាយ​អោប ៖ “នទ្រី​នតើម​នាវ​មៃ​រី​កោញ”។ “អើ​ទាស​រី​ទើម​ម៝ន មេៈ”។ “មៃ​លិច​ច្រាវ​អោស​តៃ​គ៝ប់​កើល​មៃ នទ្រី​មៃ​ហាន់​ជ៝យ​ព្រីត​អ៝ន់​គ៝ប់​ទោស​មៃ​ជ៝យ​ព្រីត​អ៝ន់​គ៝ប់ គែស​មៃ​ហាន់​ល៝រ​អា​នតិច គ៝ប់​ហាន់​ឞើស​កើយ”។ ជេសរី​កោញ​អី​ច្យាប់​នទ្រោក​មែ ហាន់​ឆឹត​ល៝រ។ ​ពាង់​ទឹង​លែៈ​គុ​ក៝ប់ ក៝ប់​ទុត​មា​ជ៝ៈ ជេសរី​ត៝ត់​រាពាយ​ឆៃ​ត៝ត់​រាពាយ​ឞូរាញ​បាៈៗ អោប​អ្វែស​លាង។ ត្រុយ្ស​កោញ​អី​ច្យាប់​នទ្រាក​ងក្វ៝ង់ ៖ “ម៝ស​ជ៝ៈ​មៃ? 
ជ៝ៈ​ឞូរាញ​គុ​ក៝ប់​មៃ​ទូ​ហៃ​ទើម”។ រាពាយ​អើស ៖ “គែស​នាវ​ដ៝ង់”។ ពាង់​អីរី​អោប​ជឹត ៖ “ម៝ស​នាវ​មៃ?” “មោ​នី​អោស​នាវ។ នទ្រី​រឹង​លែៈ​ឞូរាញ​ឞ៝ន់?” អើស ៖ “រឹង​លែៈ​ហើយ ក៝ប់​មៃ​ទើម”។ “គ៝ប់​អោយ​អា​នើស​ម៝ស​មោ​ជ៝ៈ​ម្រែ​ហៃ ពើស​អ្វាញ់​ទែស​ក៝ន​កោញ​គ៝ប់​មេៈ”។ ផូង​ខាន់ពាង់​ហីស​រាហ៝ល់។ ជេសរី អោប ៖ “អាស​ទែស​ក៝ន​កោញ?” “អើ​ទែស​ក៝ន​កោញ​គ៝ប់​ងាន់​ហែស”។ ពាង់​អី​ច្យាប់​នទ្រោក​ងក្វ៝ង់​អើស ៖ “គ៝ប់​មោ​អ្យាត់​លិច​វៃ​ឆៃ​ឞូនុយ្ស​ឞូក្លោ​គែស​ក៝ន​អោស”។ “ងាន់​ហែស គ៝ប់​នើស​អ្វាញ់​អុញ។ អូរ​ពាង់​នទុត​ហាន់​អា​មីរ កោញ​គ៝ប់​នើស​ទែស​ឞើស​កើយ”។ ពាង់​អី​ច្យាប់​នទ្រោក​ងក្វ៝ង់​រី​រាលាច់ ៖ “គ៝ប់​មោ​តាង់​រ៝​វៃ​លាស​ឞូនុយ្ស​ឞូក្លោ​គែស​ក៝ន​អោស”។ រាពាយ​អើស ៖ “មៃ​មោ​អ្យាត់​គ៝ប់ គ៝ប់​មោ​អ្យាត់​មៃ​ដ៝ង់។ គ៝ប់​មោ​វៃ​តាង់​នទ្រោក​ងក្វ៝ង់​ទែស​ក៝ន។ នទ្រី​ឞូរាញ​អើម​វៃ​ឆៃ វៃ​តាង់​នទ្រោក​ងក្វ៝ង់​ទែស​ក៝ន​ដ៝ង់?” ឞើស​ផូង​ឞូរាញ​អើស ៖ “មោ​វៃ​ឆៃ​អោស”។ ទឹង​លែៈ​ឞូ​លាស​កើត​រី​ទាទេ។ “លាស​នទ្រី​មើម​លាស​ក៝ន​នទ្រោក​ពាង់​ច្រាវ? លាស​នទ្រោក​ងក្វ៝ង់​ពាង់​ឞ្លាវ​ទែស​ក៝ន កោញ​គ៝ប់​ឞ្លាវ​ទែស​ក៝ន​ដ៝ង់។ លាស​នទ្រី​ក៝ន​នទ្រោក​នៃ​ក៝ន​នទ្រោក​ឞូ មោ​ទី​ក៝ន​នទ្រោក​មៃ​ក៝ន​នទ្រោក​មែ​ពាង់​អី​តី នទ្រោក​មៃ​ងក្វ៝ង់”។ កោញ​អី​ច្យាប់​នទ្រោក​ងក្វ៝ង់​តាង់​រាពាយ​លាស​នទ្រី មឞើស​ឞូ​លាស​ហាន់​អាច់​ទាច់​មា​ទុត​ញច្វាត់​ឆឹត​អា​ច្វាញ​រី៕

11.2. Tampuan Sample Text

(A drowning boy)

ទី ដារ់ ម៉ោញ អាញ់ លូ គួប អាញ់ ទី ជៀក គួប សោប ឡាំ ប៉ាគ់ ទៀក តាំងលោ័ ភឿ ហៀន រែ ទៀក។ ផះ ឡាំ ទឹល ប៉ាគ់ ទៀក ណោះ អាញ់ កា ប៉ាស្រាំ រែ ទៀក។ អាញ់ សាំលឹះ រែ អាញ់ កា ឡង់ ទៀក។ ផះ អាញ់ ឡង់ ណោះ អាញ់ អ្យូគ ខាក់។ អាញ់ កា ឡុង អើ ពូ ណាំង ប៉ាណូស តង័ អាញ់។ ណោះ ហង អាញ់ កា កាវ៉ះ័ ណាំង គួប អាញ់ គួប អាញ់ ណោះ អ៊ែ អ្លុ រែ ទៀក អ៊ែ ប៉ប័ អាញ់ ឡង់។ អ៊ែ កា ទី តង័ អាញ់ អ្យក់ អាញ់ ហាវ ឡាំ ប៉ាគ់ គូក។ កេះណោះ អ៊ែ ប៉ាំងហៀន ឡឹង កាន រែ ទៀក ណោះ អន់ កា អាញ់។ ឡៃង ឡឹង ណោះ អាញ់ កា ទី ប្រ៉គ័ ពួយ កាន អ៊ែ ប៉ាំងហៀន ណោះ។ កេះណោះ អាញ់ កា អ្លុ រែ សឹត រែ ឡាំ រែ សឹត រែ ឡាំ។ ទឹល ទុញ ទុញ អាញ់ កា អ្លុ រែ ទៀក ទឹល រ៉ប់ ដារ់ អា។ ‌៚

11.3. Brao Sample Text

លឿង ឝ្រុង ប៊ិះ

ប៊ិច លឿង មូយ អៃ អំម៉ាច ហំម៉ាច លឿង យ៉ាគ់ អាត់ញ៉ា។

យ៉ាគ់ អាត់ញ៉ា នែ ឡើ ប៊ិច កួន ប្រោះ ប៉ឹះ រ៉ា។ តង៉ៃ មូយ យ៉ាគ់ អាត់ញ៉ា ឡើ ដក់ កូវ ឡង ណគ ហឹ មឺរ អ៊ែ ទឹង ឡើ កូវ ឡង ណគ ឡើ ត្រប្លូច ប្រយ ជូង ណគ ហឹ ត្រម ឡង។ អ៊ែ ឡើ តង៉ូក ប្រយ ដើ ជូង ណគ តៃ ដើ ង៉ាយ យ៉ាវ លំកូវ ឡង ណគ អ៊ែ ឡើ ជឹ ហឹ ហន់ណាម។ អ៊ែ ឡើ ជឹ កី ក្រសឹប កឌឹប ក្លើម អ៊ែ ទ្រី ណគ ឡើ ឌឹក តង៉ា៖ «អើយ បើគ ណគ ចង់ ចា អ៊ឺម ឡះ? ហៃ ហឈិ ឡះ?» តៃ ត្រណើវ ហំប៉ើវ។ អ៊ែ ឡើ ឌឹក តង៉ា ឡឹះ អន់ណាវ៖ «អើយ ហឈិ ឡះ អ៊ែ បក់ អន់ណោះ»។ អ៊ែ ក្ល ណគ ឡើ ត្រណើវ៖ «អ៊ឺម អឈិ អ៊ឺម អតង៉ូក ដើ ជូង អៃ ឡើ ប៉ាត់ ញឹះ តៃ ដើ ង៉ាយ យ៉ាវ អង់កូវ ឡង អៃ។ អ៊ែ ប៉ាគ់ អ៊ិន អ៊ែ ទ្រី ណគ ប្រ៉ៃ ប្រយ មែ ខំឡាំង ឞាវ ដក់ សាត ជូង ណគ ទឹង មឺរ។ មែ អិះ សាត តៃ តៃ អ៊ឺម ប្រសាវ កតាម មែ ង៉ាយ កតាម «ប៉ាគ់ មន់តៃ អ៊ែ អន់អាំ កួន ប្រោះ អន់សូច ណគ អន់នែ»។

អ៊ែ ឡើ ម៉ាង ដើ មែ ទឹង ស្រ៊ុក រៀន៖ «មែ ង៉ាយ មន់តៃ ជូង អៃ ដើ មែ អ៊ែ អៃ អន់អាំ កួន ប្រោះ ដើ មែ អ៊ែ»។ អ៊ែ ម៉ើ សាត ប្រយ ត្រំ ក្រាន ម៉ើ ដក់ មែ អ៊ិះ តៃ ម៉ើ តៃ អ៊ឺម។

អែ ឡើ កឡឹ ឡើ ដក់ សាត ឡឹះ អន់ណាវ តគ់ កណូវ ឡង ណគ ឡឹះ អន់ណាវ ណាគ់ តៃ ឡើ តៃ។ អ៊ែ ឡើ កឡូវ ប្រយ៖ «ឡា អុះ ប្រះ ប្រ៊ី យ៉ាង បង៉ាង ជ្រឺវ អរ៉ាក់ ប្រ៊ី ដាក គ្រែដៃ ថែ ពឋា វន់សាត ជូង អៃ អុះ ណាគ់ អន់ឡាប់ អាំ កួន ប្រោះ ហឹ ហន់ណាម ប៉ាគ់ មន់តៃ ជូង អៃ ណាគ់ អន់ឡាប់ អាំ កួន ប្រោះ អន់សូច ហឹ ហន់ណាម តគ់»។

អ៊ែ គ្រុង ប៊ិះ នែ ឡើ ដុង ប៉ាគ់ អ៊ិន ឌិះ ម៉ាត។ អែ ឡើ តទឺត ចាក់ ណគ ឡើ វឹរ ប៊ិះ ក្រាគ់ អ៊ែ ឡើ ដក់ ប្រយ តគ់ យ៉ាគ់ អាត់ញ៉ា អ៊ែ ឡើ តង៉ា៖ «អង់ង៉ាយ យ៉ាគ់ អ ហដាំង?» យ៉ាគ់ អាត់ញ៉ា ឡើ ត្រណើវ រៀន៖ «អដាំង ជូង អៃ»។ គ្រុង ប៊ិះ ឡើ តង៉ា៖ «ឡើ បើម ង៉ាយ ដឹះ ជូង ហៃ អ៊ិន?» យ៉ាគ់ អាត់ញ៉ា ឡើ ត្រណើវ៖ «អកូវ ឡង ឡើយ ឡើ ប៉ាត់ ទឹង ណិះ ឡើយ។ ប៉ាគ់ ហន់សាត តៃ ចូវ អើយ ណាគ់ អន់ឡាប់ អង់អាំ កួន ប្រោះ ដើ ហៃ»។ អ៊ែ ឡើ ត្រណើវ៖ «ណោះ អន់សាត រួយ ឞ»។ អែ ឡើ សាត ប្រយ ឈុំ តើម ឡង អ៊ែ ឡើ តៃ ប្រយ ទឹង ត្រម ឡង។ អ៊ែ ឡើ រៀន៖ «នែ យ៉ាគ់ អ ជូង ហៃ ឡើ ទឹប ទឹង ត្រម ឡង»។ អ៊ែ យ៉ាគ់ អាត់ញ៉ា ឡើ ហួត ប្រយ ជឹ តៀត ហឹ ហន់ណាម។ អ៊ែ ឡើ កឡូវ ប្រយ មែ ខំឡាំង ឞាវ ណគ៖ «ម៉ិច វន់កួន ចូវ វន់ដក់ ទូង ប៊ិះ ក្រាគ់ ហឹ មឺរ អៃ តគ់»។ អ៊ែ ម៉ើ ដក់ ប្រយ អ៊ែ ម៉ើ តៃ ប៊ិះ ក្រាគ់ ទិះ អ៊ែ ម៉ើ រៀន៖ «យ៉ើយ ប៊ិះ ទិះ ឌិះ ងំចា អាត លំញឹម ប្រយ!»។ អ៊ែ យ៉ាគ់ អាត់ញ៉ា ឡើ ត្រណើវ៖ «ងំចា បើម ង៉ាយ យ៉ាក់ ណគ ឡើយ ឡើ តៃ ជូង អៃ ទឹង ត្រម ឡង។ ណិះៗ នែ អំប្រយ៉ង់ ប៊ឹង កួន អៃ»។ អ៊ែ ម៉ើ ជឹ ទូង ម៉ើ ចក់ ឞ ម៉ើ ទូង ជឹ តៀត ហឹ ហន់ណាម តគ់។ អ៊ែ ឡើ ប៉្រៃ ប្រយ កួន អិះ តៃ ឌីៗ មន់ចក់ បើម ក្ល។ អ៊ែ ប៊ិះ រៀន៖ «យ៉ាគ់ អើយ ហន់ដុង ប៉ាគ់ តៃ ហរែម ហង់កោះ កជែត តាក់»។ អ៊ែ ណាំង អន់សូច អ៊ែ លំចក់ បើម ក្ល។

11.4. Jarai Sample Text

  • ហ្មវ ហា រ៉ាំង មនូស ញូ ណាវ ច្យៈ បាវ ញូ ពគ ចល
  • ឆ្រង់ ទើល រើយ ដង់ ញូ កៈ លើយ បាវ ញូ ញូ ឝ៉្លៃគ ណាវ ពគ សាង ញូ ឞង់ សយ។
  • វីគ មឹង នុន អ៊ើយ មង បាត យ ញូ ប៉គ ហិ បាវ។
  • ញូ ហ្យាប់ម៍ ហ្ងំ បាវ ឡៃគ៖ «យ្វ៉ា យ៉ិត អ៊ិះម៍ លហូតម៍ ឆាត ប៊្រើយ មនូស ចុត ឌុង អ៊ិះម៍?
  • ញូ អ្និតអ្និវម៍ ឍើច អ៊ិះម៍ ឌុត ឋល នុន យ៉ិត អ៊ិះម៍ ប៊្រើយ ញូ ចុត ឌុង អ៊ិះម៍?»
  • បាវ ឡៃគ់ ឝ្ល៉ៃគ៖ «អ៊ិះម៍ អ្នាំ ឡៃគ អុះ! មនូស ផា រ៉ា ណោះ! កតាំង ណោះ! តា អូ ឃិន ញូ អុះ!» «ហ្វ៊ិម៍ យ៉ិត ឝញូ? ពគ ប៉ៈ ញូ ណាវ?»
  • បាវ នុន ឡៃគ ឝ៉្លៃគ៖ «ញូ ឝ្ល៉ៃគ ណាវ ពគ សាង បស ញូ ណាវ ឞង់ សយ ញូ។
  • គើញ ញូ រ៉ៃ បស រើយ ដង់ គើញ ញូ រ៉ៃ ដគ ឝ៉ន មើយ្គ។»

11.5. Kuay Sample Text and the Translation in English

(The Story about Angels and Men)

រើង​ទេវតា​រ៉ើ​​កួយ

រ៉ើដើម​ កួយ​ណាវ​ប៉ាយ ​ក្តែក​រ៉ើ​ពព័ក​កួ​​ជាប់​ឃ្នា​ កើត​​ទេវតា​កួ​អាទី​រ៉ើ​​កួយ​កួ​អា​ទាប​ ​ប៉ក​ជួប​ឃ្នា​ មោះ​​ឃ្នា​​ចៀ​ចូវ​បឺន​ ម៉ឹ​ទឹត​លោះ​កួ​ឆ្ងាយ​​ឃ្នា​យ៉ាំង​រងៃ​អឺ​ កើត​​អាលឹម​អិង ​ទេវតា​ណាវ​សួរ​អន​កួយ​កួ​អា​ទាប​​​ អន​អែល​​ម្ហូ​ប​អន​​ណាវ​ចា​រាល់ត្ងៃ​​ បើ​បឺន​ប៉សមូយ​ អន​ណាវលូមូយ​ បើ​បឺន​លីកកឹ​ណាវ​ទារ​ដក​អន​ណាវ​លូ​មូយ​ ។

តក់​ឌូញ​ៗ​​ចៀ​ កួយ​ណាវ​ខ្ជិល​រេប​តូន​ កឹ​ទេវតា​​ណាវ​ច្លក់​ កឹ​ណាវ​ប៉ាយ ”ផូក​​ម័ង ​កើត​ន្ទ័​បឺន​ម៉ឹ​អែល​ចូវ​អន​កឺ”​ កឹ​បេក​នឺច​ណាវ​ប៉ាយណាវ​​ក្លក់ ​បេក​នឺច​ណាវ​ប៉ាយណាវ​ខ្ជឹល​ កឹ​ទេវតា​ណាវ​វ័​​អន​​ន្ទ្រុះដាក​មា​ចូវ​ច្នាប់​​ កឹ​លិច​ក្តែក​ក្ចែត​ម៉ាត់​កួយ​ ណង់​កឹ​មេៗ​​ប៉ៃ​ណាក់ម្អិង​​ ប៉ាច់​ស្រង់​វ័​ក្បូនកឹ​នាំ​ឃ្នា​សោះ​ជិះ​ប៉ង​ក្បូន​អិង ដាក​សោះ​ម្នា​ ក្បូនកឹ​ចេះកឹ​ប៉្តល​សោះម្អិង​ កឹ​ដាក​សោះ​ច្នាប់​​ចៀ​ច្នាប់​ ក្បូន​កឹ​ប៉្តល​កឹប​រ៉ើ​ពព័ក​​ កឹ​នាំ​ឃ្នា​សោះ​ចៀ​ថ្វាយ​ង្គំ​ទេវតា​ កឹ​ទេវតា​ប៉ាយ​ ”ផូក​ម័ង​ចូវ​វ័ន្ទ័​ កឺរ៉ើផូក​​ម័ង​ម៉ឹ​ក្រៃ​ចួប​ឃ្នា​ទ័ន​អឺ​ ច្បោះផូក​ម័ង​ម៉ឹ​ច្ងាត់​តូនកឺ​ ​ក្រៃ​កឹ​យ៉ាំង​អិងបឺន​សំមុក​ផូក​ម័ង”​ កឹ​ផូក​អិងទារ​ ”អូ​ ! លោក​ទេវតា​អើយ ​អន​ហៃ​កួ​ឡង់​ ក្តែក​អា​ទាប​ដាក​លិច​ម៉ាត់​ហឺយ​ ម៉ឹ​កើត​បន​កួ​អឺ​ ហើ​យ​បេក​ស្រុក​កឹ​ក្ចែត​ម៉ាត់​ ណង់​ត្មង់​កឹ​ហៃ​ប៉ៃ​ណាក់​អឺ”​ កឹ​ទេវតា​សង់​ប៉ាយ​កួយ​អាទាប​ក្ចែត​ម៉ាត់​ក្លឹង​ កឹម្នះច្លក់​​​ កឺ​ព្រ័ម​អន​ផូក​ម័ង​កួ​អាទី មូយ​រយាក់ ។​

កឹ​កួ​ចៀ​ៗ​ផូក​អិង​អាត់​ប៉្សុក​ កឹ​ទារ​ទេវតា​ប៉ាយ ”​ហៃ​ស្អៀ​ចៀ​កួ​អាទាបចៀ​ក្រោយ​​ ហៃ​ស្អៀ​ប៉្ចួរ​រះ​យ៉ាំង​រ៉ើ​ញោង​​ ចោះ​ស្រ​ ចោះ​ប៉ន្លែ​ចា”​ កឹ​ទេវតា​ប៉ាយ​ ”តូន​កឹ​​​ចិត​ម័ង​អឺ​ កឹ​រេប​តឹង​សែង​ចៀ”​ ទេវតា​ប៉ាយ ”​អែល​ត្រៀក​នី​មូយ​ចៀ ​ដក​ប៉្ចួរ​​ប៉ៃ​ណាក់​ ចោះ​ស្រ ​ចោះ​ច្តង” បឺន​ប៉្លីចូវ​កឹ​នាំ​ឃ្នា​តាក់​ចា​ ម៉ឹ​កើត​ក្រប៉ៃ​អឺ​ ។

កឹ​កន្ទ្រូស​តឹងប៉ៃ​ណាក់​​ កួ​ឌូញៗ​​ចៀ ​កឹ​ប៉ឹតត្រៀក​អិង​ ត្រៀកអិង​​ណាវ​បោល​សោះ​ទី​ចៀ​ក្រោយ​ កឹ​នាំ​ឃ្នា​ប៉ឺរ​កឹ​ម៉ឹក​ពើក​ កឹ​សោះចៀ​​ប៉ឺរ​អា​ទី ​បឺន​ពើក​ កឹ​តឹក​អែល​ចូវ​ចៀ​ក្រោយ​ កឹ​តក់​ត្ងៃ​មូយ​អិង​​ ត្រៀក​ណាវ​បោល​សោះទ័ន​ កឹ​ចៀ​អែល​ចូវ​ទ័ន​ កឹ​ទេវតា​ណាវ​វ័រ៍​អន​ទឹត​ក្តែក​លោះ​រ៉ើ​ពព័ក​ ឆុប​​អន​ត្រៀក​បោល​សោះទ័ន​​ កឹ​ប៉ឹត​ក្រោយ​ៗ​ទ័ន ​ឆុប​​បោល​សោះ​រោច​ទ័ន​ហឺយ​ កឹ​ត្រៀក​​កួ​អាទាប​ទាល់​កឹ​ក្ចែត ។​

ត្រៀក​អិង​ក្ចែត​​ចៀ ​ពើកហន​តោល​លោះ​ចូវ​រ៉ើ​មុះ​​ កឹ​តោល​អិង​ប៉្លី​មូយ​ពឺត​ កឹតក់​តោល​អិង​ចែន​ច្នាប់ ​ណាវ​ស្លេះ​លោះរ៉ើ​ទ័ង កឹ​សង់​សម្លេង​កួ​ឡឹង​តោល​អិង​ កឹ​ផូក​អិង​ណាវ​ចោះ ចោះ​ចៀ​ពើក​កួយ​ កឹ​កួយ​លោះ​ចូវ​ម៉្រង​មូយ​ៗ​ កើត​កន្ទ្រូស​កើត​ក្រប៉ៃ​ កឹ​កួយ​លោះ​រ៉ើ​តោល​អិង​ ប៉ៃ​ត្រប៊ឺប៉ៃ​ត្ងៃ​ បឺន​ម៉ាត់​កួយ​ឡឹង​តោល​អិង​ កឹ​កើត​កូយ​ក្លឹង​កួ​ឡឹងក្តែក​តក់​រងៃ ៕

11.6. Pali Sample Text

(The Verses of the Buddha's Auspicious Victories)

(១)​ ពាហុំ សហស្សមភិនិមិ្មតសា វុធន្តំ គ្រីមេខលំ ឧទិតឃោ រសសេនមារំ

ទានាទិធម្មវិធិនា ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(២)​ មារាតិរេកមភិយុជ្ឈិតសព្វរត្តឹ ឃោរម្បនា ឡវក មក្ខមថទ្ធយក្ខំ ខន្តីសុទន្ត

វិធិនា ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(៣)​ នាឡាគិរី គជវរំ អតិមត្តភូតំ ទាវគ្គិចក្កមសនីវ សុទារុណន្តំ មេត្តម្ពុសេក

វិធិនា ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(៤)​ ឧក្ខិត្តខគ្គ មតិហត្ថសុទារុណន្តំ ធាវន្តិយោ ជនបថង្គុលិមា លវន្តំ ឥទ្ធីភិសង្ខ

តមនោ ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(៥)​ កត្វាន កដ្ឋមុទរំ ឥវ គព្ភិនីយា ចិញ្ចាយ ទុដ្ឋវចនំ ជនកាយ មជ្ឈេ សន្តេន

សោមវិធិនា ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(៦)​ សច្ចំ វិហាយមតិ សច្ចកវាទកេតុំ វាទាភិរោ បិតមនំ អតិអន្ធ ភូតំ បញ្ញាបទី

បជលិតោ ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(៧)​ នន្ទោបនន្ទភុជគំ វិពុធំ មហិទ្ធឹ បុត្តេន ថេរភុជគេន ទមាបយន្តោ ឥទ្ធូបទេស

វិធិនា ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(៨)​ ទុគ្គាហទិដ្ឋិ ភុជគេន សុទដ្ឋហត្ថំ ព្រហ្មំ វិសុទ្ធិ ជុតិមិទ្ធិ ពកាភិធានំ ញាណគទេន

វិធិនា ជិតវា មុនិន្ទោ តន្តេជសា ភវតុ តេ ជយមង្គលានិ ។

(៩)​ ឯតាបិ ពុទ្ធជយមង្គលអដ្ឋគាថា យោ វាចនោ ទិនទិនេ សរតេ មតន្ទី ហិត្វាននេកវិ

វិធានី ចុបទ្ទវានី មោក្ខំ សុខំ អធិគមេយ្យ នរោ សបញ្ញោ ។

11.7. Sanskrit Sample Text

(To be obtained)

11.8. Inscription Sample Text

(K.557 Inscription, Mungkol Borey, Takeo)

១- ត្រៃត្រីឝោត្តរបញ្ចឝតឝកបរិគ្រហត្រយោទឝីកេតមាឃបុឞ្យនក្ឞត្រតូលលគ្ន។ បោញឧយឱយក្ញុំឰតក្បោញកម្រតាងអញ។ វក្លបិត១វកន្តាងស្រាង១វត្លោង១

២- វក្ចារ១កុកន្តោ១កោនកុវអលង១កុយលេង១ត្មុរ៦០ក្របិ២វវេ១០តោងត្នេំ៤០ស្រេសន្រេ២ឰអំបោង។ ក្ញុំអំនោយជំអញឰតវ្រះកម្រងតាងអញមហាគណបតិ។

៣- វន្ញា១វកន្តាង១វក្នោច១វត្មោ១វទឝមី១កុកោញវ្រះ១កុជុងបោញ១កោនកុ១កុមាន្រអញ១កុប្លស១ត្មុរ២០កន្តៃតបោសឱយ(យ)ជមានក្បោញ១តចុះត្ងៃវ្រះជោនវ្ងេជ្នៅទន្ហុំ១ចិអញ១តាញ១

12. Wordlist for Orthography Check

The database of the Chuon Nath dictionary is available at: https://code.google.com/archive/p/khmer-dictionary-tools/downloads. This is a complete dictionary with headwords, subentries, part of speech, Khmer pronunciation of certain borrowed words, meanings and cross references.

13. Summary

The Khmer script is used to write Khmer, minority languages (i.e. Bunong, Tampuan, Brao, Jarai and Kuay), Pali, Sanskrit and inscriptions. When working with the Khmer script on a computer, it is important to follow the order of characters within each word that the Unicode Standard prescribes. To compensate for sounds which do not exist in Khmer, the minority languages make use of characters that are no longer used in modern Khmer (i.e. ឝ ឞ ៝). This paper seeks to describe each character in use in the Khmer script, including its usage in ligatures, Unicode encoding and text processing, and how usage of the characters may differ in minority languages.

Two observations follow. (1) Some obsolete characters are commonly used in minority languages, Sastra and inscriptions, so the appearance and rendering of those characters have to be considered. (2) Character ordering is another issue when a one-size-fits-all approach is applied: the Khmer language uses certain characters in one way for one purpose, while minority languages may use them differently for different purposes.
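
As a rough aid for the first observation, the Python sketch below counts the three characters named in this summary (ឝ U+179D, ឞ U+179E and ៝ U+17DD) in a piece of text, which gives an indication of whether a font or renderer needs to support them. The function name `count_obsolete` is hypothetical, and the code points of the two sample words are an assumed encoding of words taken from the Bunong sample text.

```python
from collections import Counter

# Characters no longer used in modern Khmer, as listed in the summary.
OBSOLETE = {"\u179D", "\u179E", "\u17DD"}


def count_obsolete(text):
    """Count occurrences of each obsolete character in text."""
    return Counter(ch for ch in text if ch in OBSOLETE)


sample = "\u1780\u17DD\u1793 \u179E\u17BB\u178F"  # ក៝ន ឞុត (assumed encoding)
for ch, n in sorted(count_obsolete(sample).items()):
    print(f"U+{ord(ch):04X}: {n}")
```

Running the same count over a whole corpus would show how heavily a given language relies on these characters.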

References

Bernard, J. B. (1902). Dictionnaire cambodgien-français. Hongkong: Imprimerie de la Société des missions étrangères.

Bunong-Khmer Bilingual Dictionary. (n.d.). Retrieved from http://icc.org.kh/download/Bunong-Khmer_Bilingual%20_Dictionary.pdf.

Chan, S. (2010). វិធី​បង្កើត​ពាក្យ កម្ចី​ពាក្យ និង ការ​ប្រើប្រាស់​ពាក្យ “Creating Words, Borrowing Words and Using Words.” Phnom Penh: Royal Academy of Cambodia.

Chin, C., Ung, S., Men, P., Em, U., & Kheang, S. (2012). ភាសាបាលី ថ្នាក់​ទី១ “Pali Language, Grade 1.” National Buddhist Studies.

Chuon, N. (1967). វចនានុក្រមខ្មែរ “Khmer Dictionary.” Phnom Penh: Buddhist Institute.

Doek, K. (2000). A Study on the Evolution of Khmer Letters. Apsara. Retrieved from http://www.elibraryofcambodia.org/ka-seuk-sa-ompi-viwat-ney-aksor-khmer-ebook/.

Ehrman, M. E., & Kem, S. (1972). Contemporary Cambodian: A Grammatical Sketch. Washington, DC: Foreign Service Institute.

Ethnolinguistic Groups of Cambodia. (2011, December). Retrieved January 24, 2018, from http://www.unesco.org/new/fileadmin/MULTIMEDIA/FIELD/Phnom_Penh/pdf/ethnolinguistic_groups_of_cambodia_poster.pdf.

Headley, R. K., Rath Chim, Ok Soeum. (1997). Modern Cambodian-English Dictionary. Kensington, Md: Dunwoody Pr.

Headley, R. K., Rath Chim. (2014). Modern Cambodian-English Dictionary, Second Edition. Hyattsville, Md: Dunwoody Pr.

Henderson, E. J. A. (1952). The Main Features of Cambodian Pronunciation. Bulletin of the School of Oriental and African Studies, University of London, 14(1), 149–174.

Horton, J., Sok, M., Durdin, M., & Ty, R. (2017). Spoof-Vulnerable Rendering in Khmer Unicode Implementations. Presented at the ACIS2017.

Huffman, F. E. (1970). Cambodian System of Writing and Beginning Reader with Drills and Glossary. Retrieved from http://archive.org/details/CambodianSystemOfWritingAndBeginningReader.

Hum, S. (2005). មេរៀន​ភាសាសំស្ក្រឹត “Sanskrit Language Lessons.” Retrieved February 14, 2018, from http://www.elibraryofcambodia.org/mereang-pheasa-somskrert-prer-lout-proyouk-ning-somrourl-veyeakor-banthem-ebook/.

Huot, T. (1956). វេយ្យាករណ៍សំស្ក្រឹត “Sanskrit Grammar” (2nd ed.). Phnom Penh: Buddhist Institute. Retrieved February 14, 2018, from http://www.elibraryofcambodia.org/veyeakor-sam-skert-ebook.

Jordi, J. (n.d.). Brao Ombaa Writing System. ICC & SIL.

Kanjahn, D. (2012, November 1). The Mondulkiri Font Family. Retrieved April 30, 2019, from http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=mondulkiri#23f7dd25.

Keller, C. E. (2005). Two Systems for Writing Krung. SIL.

Khin, S. (2007). វេយ្យាករណ៍ភាសាខ្មែរ “Khmer Grammar.” Royal Academy of Cambodia. Retrieved from http://www.elibraryofcambodia.org/veyeakor-peasa-khmer/.

Kul, S. (2008). ភាសាខ្មែរ “Khmer Language.” Nikrotheawoan Pagoda.

Maspero, G. (1915). Grammaire de la langue khmère (cambodgien). Paris, Impr. nationale. Retrieved from http://archive.org/details/grammairedelala00maspgoog.

Muang, P. (2012). វចនានុក្រម​ទំពួន-ខ្មែរ “Tampuan-Khmer Dictionary.” (P. Tuy & S. Chhuk, Eds.) (First Draft). Phnom Penh. Retrieved from http://www.tampuanreader.com/en/dictionary.

Nhok, T. (1962). បែបរៀនថ្មី ភាសាបាលីជាន់ដំបូង “New Method to Learn Basic Pali.” Retrieved February 14, 2018, from http://www.elibraryofcambodia.org/beb-rean-thmey-pheasa-balei-jon-dombong/.

Nuon, B. (1954). អក្ខរានុក្រមខ្មែរ “Khmer Lexicon.” Phnom Penh.

Open Forum of Cambodia. (2004) How to Type Khmer Unicode, Version 1.0:7–14. Retrieved from http://khmeros.info/download/KhmerUnicodeTyping.pdf.

Developing OpenType Fonts for Khmer Script - Typography. (2018, February 8). Retrieved from https://docs.microsoft.com/en-us/typography/script-development/khmer.

Pawley, M.-S., & Pawley, E. (2013). Cambodian Jarai Khmer Based Writing System.

Pech, Y. (2006). សៀវភៅ​អក្សរ​ទំពួន “Tampuan Alphabet Book.” (T. Way, W. Wang, K. Sraen, S. Thieng, L. Thoung, T. Lan, … S. Chhuk, Eds.). Retrieved from http://tampuanreader.com/en/mlsp-dl/481/28096/76388?r=0.

Scheuren, Z. Q. (2010). Khmer Printing Types and the Introduction of Print in Cambodia: 1877–1977 (Dissertation). University of Reading. Retrieved from https://issuu.com/typefacedesign/docs/zachary_scheuren_matd_dissertationw.

Schiller, E. (1994). Khmer nominalizing and causativizing infixes. Papers from the Second Annual Meeting of the Southeast Asian Linguistics Society, 309–326.

Eberhard, David M., Gary F. Simons, and Charles D. Fennig (eds.). 2019. Ethnologue: Languages of the World. Twenty-second edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com.

Sok, M. (2016). Phonological principles and automatic phonemic and phonetic transcription of Khmer words (Master’s Thesis). Payap University, Thailand. Retrieved from http://inter.payap.ac.th/wp-content/uploads/linguistics_students/Makaras-Thesis.pdf.

The Unicode Consortium. The Unicode Standard, Version 10.0.0, (Mountain View, CA: The Unicode Consortium, 2017. ISBN 978-1-936213-16-0). Retrieved January 23, 2018, from http://www.unicode.org/versions/Unicode10.0.0/.

Thun, H. (2011). វេយ្យាករណ៍ខ្មែរ “Khmer Grammar.” Retrieved January 16, 2019, from http://www.elibraryofcambodia.org/veyeakor-khmer-somrab-krob-phom-siksa-ebook/.

Um, B., & Seng, T. (2012). Khmer Grammar for Primary School. Phnom Penh: Publishing and Distribution House.

Vong, S. (2011). សិលាចារឹក​នៃ​ប្រទេស​កម្ពុជា​សម័យ​មុន​អង្គរ “Pre-Angkor Inscriptions of Cambodia” (2nd ed., Vol. 1). Phnom Penh: Editions Angkor.

Appendixes

A. Word-Initial Consonant Clusters

All possible word-initial consonant clusters found in the headwords of the Khmer-Khmer Dictionary are listed in the table below, together with their phonemic representations, series, number of instances in the dictionary and examples. In the official Khmer-Khmer dictionary, 4397 instances of word-initial consonant clusters are found, and there are at least 164 possible unique clusters. Nine consonants (#33, #55, #56, #57, #58, #59, #134, #135 and #169) do not take a subscript in word-initial position.

The number of instances found in the Khmer-Khmer Dictionary (KKD) is obtained by running RegEx[21] searches for words that begin with a consonant followed by a single subscript, with no further subscript after it. For instance, to find words beginning with ក្ង, search for the pattern “^ក្ង([^្]|$)” in an online dictionary (i.e. http://dictionary.tovnah.com/reg-search). This returns the words which begin with ក្ង followed either by any character other than the subscript sign or by nothing at all.
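
The search described above can be reproduced offline with any RegEx engine. The Python sketch below builds the same pattern for an arbitrary pair of consonants and counts the matching words in a list. The function names and the three-word list are hypothetical; the real counts in the table come from the full dictionary, not from this sample.

```python
import re

COENG = "\u17D2"  # the subscript sign ្


def initial_cluster_pattern(first, second):
    """Compile ^C1 ្ C2 ([^្]|$): a two-member cluster with no further subscript."""
    return re.compile(f"^{first}{COENG}{second}([^{COENG}]|$)")


def count_initial(words, first, second):
    """Count the words that begin with the given two-member cluster."""
    pattern = initial_cluster_pattern(first, second)
    return sum(1 for word in words if pattern.search(word))


# Tiny illustrative word list (not the dictionary itself).
words = [
    "\u1780\u17D2\u1784\u17C4\u1780",        # ក្ងោក 'peacock'
    "\u1780\u17D2\u179A\u17C4\u1798",        # ក្រោម 'under'
    "\u179F\u17D2\u178F\u17D2\u179A\u17B8",  # ស្ត្រី: a second subscript follows ស្ត
]
print(count_initial(words, "\u1780", "\u1784"))  # 1
print(count_initial(words, "\u179F", "\u178F"))  # 0
```

The second call returns 0 because the trailing `[^្]` excludes words in which the cluster is followed by another subscript, which is what keeps three-member clusters out of the counts.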

No Cluster Phoneme Series Instances in KKD Grapheme Example Phonemic Transcription Gloss
1 ក្ង- 1st 21 ក្ងោក kŋaok peacock
2 ក្ដ- 1st 44 ក្ដាម kɗaam crab
3 ក្ឌ- 2nd 5 ក្ឌុក kɗuk sound of something falling
4 ក្ទ- kt 2nd 2 ក្ទម្ព ktum tree of paradise
5 ក្ន- kn 1st 6 ក្នុង knoŋ in
6 ក្ប- 1st 32 ក្បាល kɓaal head
7 ក្ម- km 1st 4 ក្មេង kmeeŋ young
8 ក្រ- kr 1st 313 ក្រោម kraom under
9 ក្ល- kl 1st 36 ក្លៀក kliək armpit
10 ក្វ- kw 1st 4 ក្វែន kwaen agile
11 ក្ស- ks 1st 17 ក្សាន្ត ksaan serene
12 ក្អ- 1st 24 ក្អម kʔɑɑm clay pot
13 ខ្ច- kc 1st 16 ខ្ចៅ kcaw snail
14 ខ្ជ- kc 2nd 23 ខ្ជិល kcɨl lazy
15 ខ្ញ- 1st 32 ខ្ញី kɲəj ginger
16 ខ្ត- kt 1st 1 ខ្តត ktɑɑt to cackle (of hen)
17 ខ្ទ- kt 2nd 30 ខ្ទម ktɔɔm hut
18 ខ្ន- kn 1st 34 ខ្នង knɑɑŋ back (of a person)
19 ខ្ព- kp 2nd 16 ខ្ពស់ kpuəh tall, high
20 ខ្ម- km 1st 34 ខ្មែរ kmae Khmer
21 ខ្យ- kj 1st 3 ខ្យង kjɑɑŋ mollusc
22 ខ្ល- kl 1st 25 ខ្លា klaa tiger
23 ខ្វ- kw 1st 53 ខ្វិត kwət wood apple
24 ខ្ស- ks 1st 33 ខ្សាច់ ksac sand
25 គ្ន- kn 2nd 6 គ្នា knie 1S, 3S (informal)
26 គ្រ- kr 2nd 143 គ្រាប់ kroap seed
27 គ្វ- kw 2nd 1 គ្វាម kwaam text, composition (Thai loanword)
28 ឃ្ញ- 2nd 1 ឃ្ញើច kɲəəc in a jerking manner
29 ឃ្ន- kn 2nd 7 ឃ្នាប kniep pincers
30 ឃ្ម- km 2nd 10 ឃ្មុំ kmum bee
31 ឃ្ល- kl 2nd 27 ឃ្លង់ kluəŋ leprosy
32 ឃ្វ- kw 2nd 9 ឃ្វាល kwiel to herd
33 ង្- 0
34 ច្ន- cn 1st 4 ច្នៃ cnaj to polish a gem
35 ច្ប- 1st 23 ច្បាំង cɓaŋ to do battle
36 ច្យ- cj 1st 2 ច្យុត cjot to die (of angel)
37 ច្រ- cr 1st 132 ច្រើន crəən much/many/a lot
38 ឆ្ក- ck 1st 20 ឆ្កែ ckae dog
39 ឆ្គ- ck 2nd 2 ឆ្គង ckɔɔŋ to be awkward
40 ឆ្ង- 1st 12 ឆ្ងាយ cŋaaj far
41 ឆ្ដ- 1st 4 ឆ្ដោ cɗao k.o. fish
42 ឆ្ន- cn 1st 21 ឆ្នុក cnok cork, stopper
43 ឆ្ព- cp 2nd 7 ឆ្ពិន cpɨn k.o. fish
44 ឆ្ម- cm 1st 11 ឆ្មា cmaa cat
45 ឆ្ល- cl 1st 26 ឆ្លើយ claəj to answer
46 ឆ្វ- cw 1st 16 ឆ្វេង cweeŋ left (side)
47 ឆ្អ- 1st 15 ឆ្អឹង cʔəŋ bone
48 ជ្រ- cr 2nd 83 ជ្រូក cruuk pig
49 ជ្វ- cw 2nd 2 ជ្វា cwie Java
50 ឈ្ង- 2nd 5 ឈ្ងោក cŋook to bow the head
51 ឈ្ន- cn 2nd 13 ឈ្នួត cnuət headband
52 ឈ្ម- cm 2nd 11 ឈ្មោល cmool male (non-human)
53 ឈ្ល- cl 2nd 19 ឈ្លើង cləəŋ leech
54 ឈ្វ- cw 2nd 4 ឈ្វេង cweeŋ to be clear, pure
55 ញ្- 0
56 ដ្- 0
57 ឋ្- 0
58 ឌ្- 0
59 ឍ្- 0
60 ណ្ហ- nh 1st 2 ណ្ហើយ nhaəj Don't bother
61 ត្ង- 1st 1 ត្ងោក tŋaok neck-fetter (for criminals)
62 ត្ន- tn 1st 1 ត្នោត tnaot sugar-palm tree
63 ត្ប- 1st 15 ត្បាល់ tɓal mortar
64 ត្ម- tm 1st 6 ត្មាត tmaat vulture
65 ត្រ- tr 1st 230 ត្រល់ trɑl weaver's shuttle
66 ត្ល- tl 1st 9 ត្លុក tlok clown
67 ត្វ- tw 1st 3 ត្វា twaa beef sausage
68 ត្អ- 1st 7 ត្អើក tʔaək to hiccup
69 ថ្ក- tk 1st 14 ថ្កល់ tkɑl to prop up
70 ថ្គ- tk 2nd 2 ថ្គាម tkiem molar (tooth)
71 ថ្ង- 1st 13 ថ្ងាស tŋaah forehead
72 ថ្ដ- 1st 4 ថ្ដោក tɗaok small wooden bells
73 ថ្ន- tn 1st 22 ថ្នាល tnaal seed bed
74 ថ្ព- tp 2nd 9 ថ្ពាល់ tpoal cheek(s)
75 ថ្ម- tm 1st 11 ថ្ម tmɑɑ rock, stone, concrete
76 ថ្ល- tl 1st 20 ថ្លើម tlaəm liver
77 ថ្វ- tw 1st 15 ថ្វាយ twaaj to give, offer (to royalty, clergy or deities)
78 ទ្យ- tj 2nd 1 ទ្យោតិសាស្ត្រ tjoo.teʔ. sah a sacred script foretelling future
79 ទ្រ- tr 2nd 129 ទ្រ trɔɔ Cambodian stringed fiddle
80 ទ្វ- tw 2nd 30 ទ្វារ twie door
81 ធ្ង- 2nd 8 ធ្ងន់ tŋuən to be heavy
82 ធ្ន- tn 2nd 13 ធ្នើរ tnəə shelf
83 ធ្ម- tm 2nd 21 ធ្មេញ tmɨɲ tooth
84 ធ្យ- tj 2nd 9 ធ្យូង tjuuŋ charcoal
85 ធ្ល- tl 2nd 21 ធ្លាក់ tleak to fall (unintentionally)
86 ធ្វ- tw 2nd 5 ធ្វើ twəə to do
87 ន្រ- nr 2nd 2 ន្រាយ nriej Narayana (epithet of Vishnu)
88 ន្អ- 1st 1 ន្អាលនឹង nʔaal.nɨŋ in order that
89 ប្ដ- 1st 4 ប្ដី pɗəj husband
90 ប្រ- pr 1st 517 ប្រាក់ prak silver
91 ប្ល- ɓl 1st 39 ប្លែក plaek to be odd
92 ប្អ- ɓʔ 1st 2 ប្អូន pʔoon younger sibling
93 ផ្ក- pk 1st 20 ផ្កាយ pkaaj star
94 ផ្គ- pk 2nd 13 ផ្គុំ pkum to group/assemble
95 ផ្ង- 1st 9 ផ្ងារ pŋaa to be face up
96 ផ្ច- pc 1st 13 ផ្ចាញ់ pcaɲ to defeat
97 ផ្ញ- 1st 7 ផ្ញើ pɲaə to send
98 ផ្ដ- 1st 33 ផ្ដិត pɗət to pat dry
99 ផ្ត- pt 1st 7 ផ្តិល ptəl copper/silver bowl for water
100 ផ្ទ- pt 2nd 30 ផ្ទះ pteah house
101 ផ្ន- pn 1st 21 ផ្នូរ pnoo tomb
102 ផ្ល- pl 1st 28 ផ្លូវ pləw road, street, path
103 ផ្ស- ps 1st 32 ផ្សែង psaeŋ smoke (n)
104 ផ្អ- 1st 31 ផ្អែម pʔaem to be sweet
105 ព្ក- pk 1st 1 ព្កុល pkol k.o. large tree
106 ព្ន- pn 2nd 8 ព្នង pnɔɔŋ Bunong ethnic group
107 ព្យ- pj 2nd 34 ព្យុះ pjuh storm
108 ព្រ- pr 2nd 201 ព្រាន prien hunter
109 ព្ល- pl 2nd 3 ព្លុក pluk ceremonial leader in cremating a corpse
110 ព្អ- 2nd 1 ព្អឹះ pʔɨh tightly
111 ភ្ង- 2nd 10 ភ្ងូត pŋuut to give someone a bath
112 ភ្ជ- pc 2nd 12 ភ្ជិត pcɨt to seal
113 ភ្ញ- 2nd 7 ភ្ញាក់ pɲeak to wake up
114 ភ្ន- pn 2nd 28 ភ្នែក pnɛɛk eyes
115 ភ្ម- pm 2nd 1 ភ្មាស pmieh entirely
116 ភ្ល- pl 2nd 49 ភ្លុក pluk tusk
117 ភ្រ- pr 2nd 4 ភ្រូន pruun stomach worm
118 ភ្ស- ps 2nd 1 ភ្សាំ psoam to accustom to
119 ម្ក- mk 1st 2 ម្កាក់ mkak k.o. fruit tree
120 ម្ខ- mkʰ 1st 1 ម្ខាង mkʰaaŋ one side
121 ម្ង- 2nd 2 ម្ង៉ៃ mŋaj one day
122 ម្ច- mc 1st 6 ម្ចាស់ mcah owner
123 ម្ជ- mc 2nd 2 ម្ជុល mcul needle
124 ម្ញ- 2nd 2 ម្ញ៉ែម្ញ៉ mɲae.mɲɑɑ to be always making excuses
125 ម្ដ- 1st 5 ម្ដង mɗɑɑŋ one time
126 ម្ន- mn 2nd 14 ម្នាក់ mneak one person
127 ម្ភ- mpʰ 2nd 1 ម្ភៃ mpʰej twenty
128 ម្យ- mj 2nd 6 ម្យ៉ាង mjaaŋ one way
129 ម្រ- mr 2nd 23 ម្រាម mriem finger/toe
130 ម្ល- ml 2nd 10 ម្លប់ mlup shade
131 ម្ស- ms 1st 7 ម្សៅ msaw flour
132 ម្ហ- mh 1st 10 ម្ហូប mhoop food
133 ម្អ- 1st 2 ម្អម mʔɑɑm k.o. aromatic grass
134 យ្- 0
135 រ្- 0
136 ល្ក- lk 1st 2 ល្កម lkɑɑm to be very tender
137 ល្ខ- lkʰ 1st 1 ល្ខោន lkʰaon theatrical performance
138 ល្គ- lk 2nd 6 ល្គឹក lkɨk if only
139 ល្ង- 2nd 10 ល្ង lŋɔɔ sesame
140 ល្ប- 1st 29 ល្បាយ lɓaaj solution, blended
141 ល្ព- lp 2nd 2 ល្ពៅ lpɨw pumpkin
142 ល្ម- lm 2nd 23 ល្មុត lmut sapodilla tree
143 ល្យ- lj 2nd 1 ល្យំ ljum to be dangling
144 ល្វ- lw 2nd 16 ល្វីង lwiiŋ bitter
145 ល្ហ- lh 1st 16 ល្ហុង lhuŋ papaya
146 ល្អ- 1st 21 ល្អាង lʔaaŋ cave
147 វ្ហ- wh 1st 3 វ្ហី whəj in a daze
148 ស្ក- sk 1st 29 ស្ករ skɑɑ sugar
149 ស្គ- sk 2nd 19 ស្គរ skɔɔ drum
150 ស្ង- 1st 12 ស្ងាប sŋaap to yawn
151 ស្ញ- 1st 15 ស្ញើប sɲaəp to shudder (in fear)
152 ស្ដ- 1st 29 ស្ដាំ sɗam right (hand side)
153 ស្ត- st 1st 16 ស្តុក stok stock
154 ស្ថ- stʰ 1st 14 ស្ថាន stʰaan place
155 ស្ទ- st 2nd 54 ស្ទង stɔɔŋ bunch (of banana)
156 ស្ន- sn 1st 60 ស្នា snaa crossbow
157 ស្ប- 1st 33 ស្បូវ sɓəw k.o. coarse grass
158 ស្ព- sp 2nd 28 ស្ពាន spien bridge
159 ស្ម- sm 1st 69 ស្មា smaa shoulder
160 ស្រ- sr 1st 318 ស្រះ srah pond
161 ស្ល- sl 1st 74 ស្លឹក slək leaf
162 ស្វ- sw 1st 67 ស្វាយ swaaj mango
163 ស្អ- 1st 25 ស្អែក sʔaek tomorrow
164 ហ្ន- n 1st 6 ហ្នឹង nəŋ this, these
165 ហ្ម- m 1st 14 ហ្មត់ mɑt all gone
166 ហ្រ- r 1st 3 ហ្រស្វ hrah.swaʔ dwarf
167 ហ្ល- l 1st 10 ហ្លួង luəŋ king
168 ហ្វ- w 1st 10 ហ្វ៊‌ីល fiil film
169 ឡ្- 0
170 អ្ង- ʔŋ 1st 1 អ្ងែង ʔŋaeŋ 2P (to younger girls)
171 អ្ន- ʔn 1st 15 អ្នក neak 2P
172 អ្វ- ʔw 1st 2 អ្វី ʔwəj what
173 អ្ហ- ʔh 1st 17 អ្ហែង ʔhaeŋ 2P (to younger people)

B. Word-Medial Consonant Clusters

The table below lists all possible consonant clusters that occur in word-medial position. A RegEx pattern is used to filter for them: “[^្]C្C[^្]”, which guarantees that there is no preceding or trailing subscript. Figures and examples are obtained from an online dictionary (http://dictionary.tovnah.com/reg-search). For example, [^្]ក្.[^្] matches any word containing ក and ្ followed by a consonant which has no subscript after it.

To avoid repetition, only the clusters whose sequences differ from those occurring in word-initial position are presented.
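
The medial-cluster pattern can be sketched in Python as follows. Here the placeholder C is taken to be any character in the range U+1780 to U+17A2, which is an assumption about what the pattern intends; the function name `medial_clusters` is likewise hypothetical.

```python
import re

COENG = "\u17D2"           # the subscript sign ្
CONS = "[\u1780-\u17A2]"   # assumed reading of "C": any Khmer consonant

# [^្] C ្ C [^្], with the cluster itself captured in a group.
MEDIAL = re.compile(f"[^{COENG}]({CONS}{COENG}{CONS})[^{COENG}]")


def medial_clusters(word):
    """Return the two-member clusters found in word-medial position."""
    return MEDIAL.findall(word)


print(medial_clusters("\u1794\u17BB\u1782\u17D2\u1782\u179B") == ["\u1782\u17D2\u1782"])  # បុគ្គល -> True
print(medial_clusters("\u1780\u17D2\u1784\u17C4\u1780"))  # ក្ងោក -> []
```

Because the pattern requires one character before the cluster, a word-initial cluster such as the one in ក្ងោក is never reported as medial.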

| -CS- | Instances | Example |
|---|---|---|
| -ក្.- | 430 | |
| -ក្ក- | 46 | តក្កមា /tak.kaʔ.maa/ ‘to be stupefied’ |
| -ក្ខ- | 117 | អក្ខរា /ʔak.kʰaʔ.raa/ ‘letters’ |
| -ក្យ- | 4 | ពាក្យកំព្រា /piek.kɑm.prie/ ‘a single character word’ |
| -ក្ដ-[22] | 27 | ភក្ដី /pʰeaʔ.kɗəj/ ‘loyalty’ |
| -ក្ត- | 9 | អបសក្តិ /ʔap.sak/ ‘powerless’ |
| -ក្ស- | 61 | ទក្សិណ /teak.sən/ ‘the south’ |
| -ខ្.- | 68 | (All are similar to word-initial consonant clusters.) |
| -គ្.- | 157 | |
| -គ្គ- | 50 | បុគ្គល /ɓok.kul/ ‘individual’ |
| -គ្យ- | 2 | យោគ្យភាព /joo.kjeaʔ.pʰiep/ ‘intelligence’ |
| -គ្ឃ- | 2 | ឧគ្ឃោសនា /ʔuk.kʰoo.saʔ.naa/ ‘public address’ |
| -គ្ល- | 2 | អង់គ្លេស /ʔaŋ.kleh/ ‘English’ |
| -ឃ្.- | 16 | |
| -ឃ្រ- | 1 | វិឃ្រភាព /wiʔ.kreaʔ.pʰiep/ ‘destruction’ |
| -ង្.- | 640 | |
| -ង្ក- | 172 | កង្កែប /kɑŋ.kaep/ ‘frog’ |
| -ង្ខ- | 33 | ដង្ខៅ /ɗɑŋ.kʰaw/ ‘head of a commercial house’ |
| -ង្គ- | 153 | ដង្គុំ /ɗɑŋ.kum/ ‘bunched together (of trees)’ |
| -ង្ឃ- | 27 | សង្ឃឹម /sɑŋ.kʰɨm/ ‘hope (n)’ |
| -ង្ប- | 1 | បង្បោយ /ɓɑŋ.ɓaoj/ ‘to swing the arms while walking’ |
| -ង្រ- | 59 | ពង្រីក /puŋ.riik/ ‘to magnify’ |
| -ង្វ- | 58 | កង្វល់ /kɑŋ.wɑl/ ‘worry (n)’ |
| -ង្ស- | 30 | សង្ស័យ /sɑŋ.saj/ ‘to suspect’ |
| -ង្ហ- | 62 | កង្ហារ /kɑŋ.haa/ ‘fan’ |
| -ង្អ- | 46 | បង្អែម /ɓɑŋ.ʔaem/ ‘dessert’ |
| -ច្.- | 167 | |
| -ច្ច- | 53 | បច្ច័យ /pac.caj/ ‘suffix’ |
| -ច្ឆ- | 88 | មច្ឆា /mac.cʰaa/ ‘fish’ |
| -ឆ្.- | 28 | (All are similar to the word-initial consonant cluster.) |
| -ជ្.- | 120 | |
| -ជ្ញ- | 6 | ប្ដេជ្ញា /pɗac.ɲaa/ ‘to commit’ |
| -ជ្ជ- | 64 | ពាណិជ្ជកម្ម /pie.nɨc.ceaʔ.kam/ ‘commercial’ |
| -ជ្ឈ- | 23 | មជ្ឈដ្ឋាន /mac.cʰeaʔ.tʰaan/ ‘habitat’ |
| -ជ្យ- | 2 | រាជ្យាង្គ /riec.jieŋ.keaʔ/ ‘royal administration’ |
| -ឈ្.- | 6 | (All are similar to the word-initial consonant cluster.) |
| -ញ្.- | 320 | |
| -ញ្ច- | 71 | កញ្ចក់ /kaɲ.cɑk/ ‘mirror, glass’ |
| -ញ្ឆ- | 33 | កញ្ឆា /kaɲ.cʰaa/ ‘marijuana’ |
| -ញ្ជ- | 91 | កញ្ជើ /kaɲ.cəə/ ‘basket’ |
| -ញ្ឈ- | 9 | កញ្ឈូស /kaɲ.cʰuuh/ ‘to scrape the foot on the ground’ |
| -ញ្ញ- | 113 | កញ្ញា /kaɲ.ɲaa/ ‘young woman’ |
| -ញ្ហ- | 1 | បញ្ហា /paɲ.haa/ ‘problem’ |
| -ដ្.- | 114 | |
| -ដ្ឋ- | 109 | សេដ្ឋី /see.tʰəj/ ‘wealthy man’ |
| -ដ្ដ- | 5 | អដ្ដប្រតិភូ /ʔat.ɗaʔ.praʔ.teʔ.pʰuu/ ‘advocate’ |
| -ឋ្.- | 0 | (no result found) |
| -ឌ្.- | 18 | |
| -ឌ្ឍ- | 16 | វុឌ្ឍិ /wut.tʰiʔ/ ‘prosperity’ |
| -ឌ្ឌ- | 2 | លេឌ្ឌុបាត /leet.duʔ.baat/ ‘the fall of a clump of earth’ |
| -ឍ្.- | 1 | |
| -ឍ្យ- | 1 | អាឍ្យចរ /ʔaa.tjeaʔ.cɑɑ/ ‘to have been formerly wealthy’ |

-ណ្.-
-ណ្ដ-
-ណ្ឋ-
-ណ្ឌ-
-ណ្ឍ-
-ណ្ណ-
-ណ្យ-
-ណ្ហ-
138
7
75
1
11
32
1
កណ្ដឹង /kɑn.ɗəŋ/ ‘bell’
សណ្ឋាគារ /san.tʰaa.kie/ ‘hotel’
ឥណ្ឌូ /ʔən.duu/ ‘Hindu’
ឍុណ្ឍិ /tʰun.tʰiʔ/ ‘Ganesha (son of Shiva)’
បណ្ណាគារ /pan.naa.kie/ ‘bookstore’
បុណ្យទិន /ɓon.jeaʔ.tɨn/ ‘holiday’
តណ្ហា /tɑn.haa/ ‘desire, passion’
-ត្.- 479
-ត្ត-[23] 236 កិត្តិយស /kət.teʔ.juəh/ ‘reputation’
-ត្ដ​- 7 ឧត្ដម /ʔut.ɗɑm/ ‘excellent’
-ត្យ- 17 សត្យា /saʔ.tjaa/ ‘true words’
-ត្ថ- 82 វត្ថុ /woat.tʰoʔ/ ‘thing’
-ត្ស- 4 ទសវត្សរ៍ /tuə.saʔ.woat/ ‘a period of ten years’
-ត្វ- 4 ចត្វា /cat.twaa/ ‘four’
-ថ្.- 25
-ថ្យ- 1 មិថ្យាចារ /miʔ.tjaa.caa/ ‘adultery’
-ទ្.- 235
-ទ្ធ- 102 ពុទ្ធិ /put.tʰiʔ/ ‘intellect’
-ទ្ទ- 40 សទ្ទជាតិ /sat.teaʔ.ciet/ ‘sound, voice, speech’
-ធ្.- 30 (All are similar to the word-initial consonant cluster.)
-ន្.- 735
-ន្ត- 141 ខន្តី /kʰan.təj/ ‘tolerance’
-ន្ថ- 25 កន្ថោរ /kɑn.tʰao/ ‘spittoon’
-ន្ទ- 224 កន្ទុយ /kɑn.tuj/ ‘tail’
-ន្ធ- 93 កន្ធាយ /kɑn.tʰiej/ ‘soft-shelled turtle’
-ន្ន- 34 ទិន្នន័យ /tɨn.neaʔ.nej/ ‘data’
-ន្ម- 7 ប៉ុន្មាន /pon.maan/ ‘how much/many’
-ន្យ- 21 កន្យា /kɑn.jaa/ ‘unmarried girl’
-ន្រ- 1 ពន្រាយ /pun.riej/ ‘bright and shining with different colors’
-ន្ល- 141 កន្លាត /kɑn.laat/ ‘cockroach’
-ន្ស- 34 ទន្សាយ /tun.saaj/ ‘rabbit’
-ប្.- 319
-ប្ប- 116 កប្បាស /kap.ɓah/ ‘cotton’
-ប្ផ- 9 បុប្ផា /ɓop.pʰaa/ ‘flower’
-ប្ត- 8 ប្រញប្តិ /prɑ.ɲap/ ‘prohibition’
-ប្ដ- 2 ប្រាប្ដាភិលាភ /praa.pɗaa.pʰiʔ.liep/ ‘full of luck’
-ប្ស- 4 អប្សរា /ʔap.saʔ.raa/ ‘Apsara’
-ប្យ- 1 អឺរ៉ូប្យាំង /ʔəə.roo.pjaŋ/ ‘European’
-ផ្.- 76 (All are similar to the word-initial consonant cluster.)
-ព្.- 225
-ព្វ- 53 និព្វាន /nɨp.pien/ ‘Nirvana’
-ព្ភ- 25 ទុព្ភាសិត /tuʔ.pʰie.sət/ ‘bad language/advice’
-ព្ត- 1 ទេព្តា /tep.ɗaa/ ‘angel’
-ភ្.- 63
-ភ្យ- 3 អភ្យាគម /ʔaʔ.pjie.kum/ ‘war’
-ម្.-
-ម្គ-
-ម្ប-
-ម្ផ-
-ម្ព-
-ម្ភ-
-ម្ម-
660
5
81
9
83
61
66
សម្គុល /sɑm.kul/ ‘to be swollen and ugly’
តម្បៀត /tɑm.ɓiət/ ‘tweezers’
សម្ផស្ស /sɑm.pʰoah/ ‘perception, sensation’
សម្ពៀត /sɑm.piət/ ‘school bag’
អារម្មណ៍ /ʔaa.rɑm/ ‘feeling’
រម្លើង /rom.ləəŋ/ ‘to uproot’
-យ្.- 51
-យ្យ- 50 ទេយ្យទាន /tej.jeaʔ.tien/ ‘gift given to a Buddhist monk’
-យ្ហ- 1 គុយ្ហៈ /kuj.hak/ ‘genitals’
-រ្.- 18
-រ្ត- 5 កេរ្តិ៍ /kee/ ‘honor’
-រ្ម- 2 វរ្ម័ន /weaʔ.reaʔ.man/ ‘Varman (title used by Khmer kings)’
-រ្យ- 7 សូរ្យកាន្ត /soo.reaʔ.kaan/ ‘k.o. mythical stone’
-រ្ថ- 1 អន្យតិរ្ថិយ /ʔɑn.ɗee.raʔ.tʰəj/ ‘non-Buddhist’
-រ្ព- 1 អាសិរ្ពិស /ʔaa.see.reaʔ.pɨh/ ‘poisonous snake’
-រ្ភ- 1 ទុរ្ភិក្ស /tuʔ.reaʔ.pʰɨk/ ‘famine’
-រ្ស- 1 ពរ្សឺឡែន /poa.səə.laen/ ‘porcelain’
-ល្.- 68
-ល្ល- 26 ដុល្លារ /ɗol.laa/ ‘dollar’
-វ្.- 4
-វ្ហ- 4 ជិវ្ហា /ciʔ.whaa/ ‘tongue’
-ស្.- 442
-ស្ស- 110 និស្សិត /niʔ.sət/ ‘student’
-ស្ណ- 5 ពិស្ណុ​លោក /piʔ.snuʔ.look/ ‘rank given to provincial chiefs’
-ស្ត- 41 ប៉ុស្តិ៍ /poh/ ‘post’
-ស្ច- 7 អស្ចារ្យ /ʔɑh.caa/ ‘wonderful’
-ហ្.- 40
-ហ្ឫ- 2 សុហ្ឫទ /soʔ.rɨt/ ‘(close) friend’
-ឡ្.- 12
-ឡ្ហ- 12 ទឡ្ហីករណ៍ /teaʔ.lhəj.kɑɑ/ ‘proof’
-អ្.- 2 (All are similar to the word-initial consonant cluster.)

C. Word-Final Consonant Clusters

Words ending in consonant clusters are Pali/Sanskrit loanwords.
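A tally of this kind can be reproduced from any wordlist. The sketch below is a minimal Python illustration, assuming one word per string; the names `final_cluster` and `tally_final_clusters` are hypothetical, and the pattern is an assumption modelled on the word-medial one, not the query actually used for the table. Words that carry a further sign after the cluster (for example ៍) are deliberately not matched.

```python
import re
from collections import Counter

COENG = "\u17D2"  # KHMER SIGN COENG (្)

# A letter, COENG and one more letter at the very end of the word. The
# lookbehind skips endings whose first letter is itself a subscript, so
# three-consonant endings are not counted here.
FINAL_CLUSTER = re.compile(f"(?<!{COENG})(.{COENG}.)$")


def final_cluster(word: str):
    """Return the word-final two-consonant cluster, or None if there is none."""
    m = FINAL_CLUSTER.search(word)
    return m.group(1) if m else None


def tally_final_clusters(words) -> Counter:
    """Count how often each word-final cluster occurs in an iterable of words."""
    return Counter(c for c in map(final_cluster, words) if c is not None)
```

For instance, `tally_final_clusters(["អង្គ", "វគ្គ"])` counts one instance each of the -ង្គ and -គ្គ endings.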

-CS Instances Example
-ក្ក 6 សក្ក
-ក្ខ 37 ទុក្ខ
-ក្ដ 1 បុក​ល័ក្ដ
-ក្ត 4 ល័ក្ត
-ក្យ 8 ពាក្យ
-ក្រ 6 សុក្រ
-ក្ស 39 ប្រត្យក្ស
-ខ្យ 1 ប្រមុខ្យ
-គ្គ 20 វគ្គ
-គ្ឃ 3 អនគ្ឃ
-គ្ធ 2 ពិទ័គ្ធ
-គ្យ 3 អារោគ្យ
-គ្រ 2 ស្ម័គ្រ
-ង្ក 6 បល្ល័ង្ក
-ង្គ 50 អង្គ
-ង្ឃ 4 សង្ឃ
-ង្រ 1 ពង្រ
-ង្ស 19 ហង្ស
-ង្ហ 1 សិង្ហ
-ច្ច 32 កិច្ច
-ច្ឆ 2 អប្បិច្ឆ
-ជ្ជ 8 ពាណិជ្ជ
-ជ្ឈ 1 សជ្ឈ
-ជ្ញ 6 ប្រាជ្ញ
-ជ្យ 9 រាជ្យ
-ជ្រ 1 ពេជ្រ
-ញ្ច 4 ប្រវ័ញ្ច
-ញ្ញ 15 សាមញ្ញ
-ដ្ដ 5 វដ្ដ
-ដ្ឋ 34 រដ្ឋ
-ឌ្គ 1 ខ័ឌ្គ
-ឌ្ឍ 3 អភិវឌ្ឍ
-ណ្ដ 4 វណ្ដ
-ណ្ឌ 37 ខណ្ឌ
-ណ្ណ 16 វណ្ណ
-ណ្យ 7 បុណ្យ
-ណ្ហ 3 ឧណ្ហ
-ត្ត 93 មិត្ត
-ត្ថ 10 បរមត្ថ
-ត្ន 5 នារី​រត្ន
-ត្ម 1 អាត្ម
-ត្យ 19 ពិនិត្យ
-ត្រ 72 ក្សត្រ
-ត្វ 15 សត្វ
-ត្ស 2 បៀវត្ស
-ថ្ម 1 ពពូលថ្ម
-ទ្ទ 4 សទ្ទ
-ទ្ធ 40 ប្រយុទ្ធ
-ទ្ម 2 រចនាប័ទ្ម
-ទ្យ 13 ពេទ្យ
-ទ្រ 4 សូទ្រ
-ធ្យ 1 សូធ្យ
-ន្ត 62 យន្ត
-ន្ថ 2 និគ្រន្ថ
-ន្ទ 37 សុរិន្ទ
-ន្ធ 33 ប្រពន្ធ
-ន្ន 13 បច្ចុប្បន្ន
-ន្ម 2 ជន្ម
-ន្យ 5 សូន្យ
-ន្ល 1 ជន្ល
-ប្ត 5 ប្រញប្ត
-ប្ន 1 យល់សប្ន
-ប្ប 11 កប្ប
-ព្ទ 20 ទូរសព្ទ
-ព្ធ 3 ប្រារព្ធ
-ព្ភ 1 អព្ភ
-ព្យ 7 ទ្រព្យ
-ព្វ 14 សព្វ
-ភ្រ 1 អ័ភ្រ
-ម្ព 6 ពុម្ព
-ម្ភ 9 បារម្ភ
-ម្ម 103 កសិកម្ម
-ម្យ 6 មនោរម្យ
-ម្រ 4 កម្រ
-ម្ល 3 សម្ល
-ម្ហ 2 លម្ហ
-ម្អ 1 លម្អ
-យ្យ 19 គណនេយ្យ
-រ្យ 22 អាចារ្យ
-រ្ស 1 សិរ្ស
-ល្ក 1 សុល្ក
-ល្ង 1 បំណែកល្ង
-ល្ប 3 កល្ប
-ល្ម 1 ពាយុគុល្ម
-ល្យ 4 គរុកោសល្យ
-ល្ល 6 កោសល្ល
-ស្ក 1 សិលោរស្ក
-ស្ដ 2 ស្ប័ស្ដ
-ស្ឋ 4 ជេស្ឋ
-ស្ថ 2 គ្រហស្ថ
-ស្ន 1 ប្រតិស្ន
-ស្ប 2 អង្គាបុស្ប
-ស្ម 1 ស្លេស្ម
-ស្ស 24 សិស្ស
-ហ្ម 2 ព្រហ្ម
-ឡ្ហ 2 អវិរូឡ្ហ

D. Three Consonant Clusters

Consonant clusters with two subscripts occur mostly in word-medial position (170 instances found in the Khmer-Khmer Dictionary, KKD), less frequently in word-final position (50 instances), and rarely in word-initial position (only 4 instances). No instance of three subscripts following a consonant exists in the KKD. See the table below.
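The three positions can be told apart mechanically. The following Python sketch is one possible way to do so, assuming a word is a plain Unicode string; the function names are hypothetical, and treating every match that touches neither edge of the word as "medial" is a simplifying assumption.

```python
import re

COENG = "\u17D2"  # KHMER SIGN COENG (្)

# A base letter followed by two subscripts, and the same with three.
DOUBLE_SUBSCRIPT = re.compile(f".{COENG}.{COENG}.")
TRIPLE_SUBSCRIPT = re.compile(f".{COENG}.{COENG}.{COENG}.")


def double_subscript_positions(word: str) -> list[tuple[str, str]]:
    """Return (cluster, position) pairs for each double-subscript cluster."""
    results = []
    for m in DOUBLE_SUBSCRIPT.finditer(word):
        if m.start() == 0:
            position = "initial"   # corresponds to ^.្.្.
        elif m.end() == len(word):
            position = "final"     # corresponds to .្.្.$
        else:
            position = "medial"
        results.append((m.group(0), position))
    return results


def has_triple_subscript(word: str) -> bool:
    """True if a base letter is followed by three subscripts."""
    return TRIPLE_SUBSCRIPT.search(word) is not None
```

Running `has_triple_subscript` over a wordlist is a quick way to check the statement that no such sequence occurs.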

Word-Initial
^.្.្. 4
ស្ត្រ- 2 ស្ត្រី /strəj/ ‘woman’
ហ្វ្រ-[24] 2 ហ្វ្រ័ង /fraŋ/ ‘brake’
Word-Medial
-.្.្- 170
-ក្ស្ម- 3 លក្ស្មី /leak.sməj/ ‘prosperous, wealthy’
-ង្ក្រ- 14 បង្ក្រាប /ɓɑŋ.kraap/ ‘to defeat’
-ង្ខ្យ- 2 សង្ខ្យា /sɑŋ.kjaa/ ‘counting’
-ង្គ្ល- 1 អង្គ្លេស /ʔɑŋ.kleh/ ‘English’
-ង្គ្រ- 21 ចង្គ្រុង /cɑŋ.kruŋ/ ‘wide open’
-ង្ឃ្រ- 1 សង្ឃ្រាជ /sɑŋ.kriec/ ‘head master (Buddhist monk)’
-ញ្ច្រ- 10 ចិញ្ច្រាំ /cəɲ.cram/ ‘to chop’
-ញ្ជ្រ- 21 បញ្ជ្រួស /ɓaɲ.cruəh/ ‘to purposely avoid (of travelling)’
-ដ្ឋ្យ- 1 បិដ្ឋ្យដ្ឋិក​សត្វ /pət.tjat.tʰeʔ.kaʔ.sat/ ‘vertebrate’
-ន្ត្រ- 33 កន្ត្រោង /kɑn.traoŋ/ ‘to jump up (to get something)’
-ន្ទ្រ- 43 កន្ទ្រួក /kɑn.truək/ ‘to be worn out’
-ន្ធ្យ- 1 សន្ធ្យា /sɑn.tjie/ ‘twilight’
-ស្គ្វ- 1 ប៊ិស្គ្វីត៍ /ɓih.kwii/ ‘biscuit’
-ហ្វ្រ- 3 អាហ្រ្វិក /ʔaa.frik/ ‘Africa’
Word-Final
.្.្.$ 50
-ក្ត្រ 1 ភក្ត្រ /peak/ ‘face’
-ន្ទ្រ 13 នរេន្រ្ទ /nɔɔ.reen/ ‘reigning king’
-ស្ត្រ 36 រាស្រ្ត /rieh/ ‘citizen’

E. Initial Consonant Clusters in Both Series

The following table is an extended list of initial consonant clusters. Clusters modified by consonant shifters are included next to their respective pairs; however, they are not yet officially used or recognized by the public. They are included as a reminder that they might be needed in the future, especially for transliterated words.

No attempt has been made to create a list of the more complex initial consonant clusters (i.e. three consonant clusters).

1st Series 2nd Series Phoneme 1st Series 2nd Series Phoneme
ក្ង គ្ង ផ្ន ភ្ន pn
ក្ដ ក្ឌ / គ្ឌ? ផ្ល ភ្ល pl
ក្ត ក្ទ kt ផ្ស ភ្ស ps
ក្ន គ្ន kn ផ្អ ផ្អ៊ / ភ្អ៊?
ក្ប ក្ប៊ / គ្ប៊? ព្ក ព្គ pk
ក្ម គ្ម km ផ្ន ព្ន pn
ក្រ គ្រ kr ផ្យ ព្យ pj
ក្ល គ្ល kl ប្រ ព្រ pr
ក្វ គ្វ kw ផ្ល ព្ល pl
ក្ស គ្ស ks ផ្អ ព្អ
ក្អ ក្អ៊ / គ្អ៊? ផ្ង ភ្ង
ខ្ច ខ្ជ kc ផ្ច ភ្ជ pc
ខ្ញ ឃ្ញ ផ្ញ ភ្ញ
ខ្ត ខ្ទ kt ផ្ន ភ្ន pn
ខ្ន ឃ្ន kn ផ្ម ភ្ម pm
ឃ្ព ខ្ព kp ផ្រ ភ្រ pr
ខ្ម ឃ្ម km ផ្ល ភ្ល pl
ខ្យ ឃ្យ kj ផ្ស ភ្ស ps
ខ្ល ឃ្ល kl ម្ក ម្គ mk
ខ្វ ឃ្វ kw ម្ខ ម្ឃ mkʰ
ខ្ស ខ្ស៊ / ឃ្ស? ks ម្ង៉ ម្ង
ច្ន ឈ្ន cn ម្ច ម្ជ mc
ច្ប ច្ប៊ / ជ្ប៊? ម្ញ៉ ម្ញ
ច្យ ជ្យ cj ម្ដ ម្ឌ
ច្រ ជ្រ cr ម្ន៉ ម្ន mn
ឆ្ក ឆ្គ ck ម្ផ ម្ភ mpʰ
ឆ្ង ឈ្ង ម្យ៉ ម្យ mj
ឆ្ដ ឆ្ឌ ម្រ៉ ម្រ mr
ឆ្ន ឈ្ន cn ម្ល៉ ម្ល ml
ឆ្ប៉ ឆ្ព cp ម្ស ម្ស៊ ms
ឆ្ម ឈ្ម cm ម្ហ ម្ហ៊ mh
ឆ្ល ឈ្ល cl ម្អ ម្អ៊
ឆ្វ ឈ្វ cw ល្ក ល្គ lk
ឆ្អ ឆ្អ៊ / ឈ្អ៊? ល្ខ ល្ឃ lkʰ
ច្វ ជ្វ cw ល្ង៉ ល្ង
ណ្ហ ណ្ហ៊ nh ល្ប ល្ប៊
ត្ង ទ្ង ល្ប៉ ល្ព lp
ត្ន ទ្ន tn ល្ម៉ ល្ម lm
ត្ប ត្ប៊ / ទ្ប៊? ល្យ៉ ល្យ lj
ត្ម ទ្ម tm ល្វ៉ ល្វ lw
ត្រ ទ្រ tr ល្ហ ល្ហ៊ lh
ត្ល ទ្ល tl ល្អ ល្អ៊
ត្វ ទ្វ tw វ្ហ វ្ហ៊ wh
ត្អ ត្អ៊ / ទ្អ៊? ស្ក ស្គ sk
ថ្ក ថ្គ tk ស្ង ស្ង៊
ថ្ង ធ្ង ស្ញ ស្ញ៊
ថ្ដ ធ្ឌ ថ្ឌ ស្ដ ស្ឌ
ថ្ន ធ្ន tn ស្ត ស្ទ st
ថ្ប៉ ថ្ព tp ស្ថ ស្ធ stʰ
ថ្ម ធ្ម tm ស្ន ស្ន៊ sn
ថ្ល ធ្ល tl ស្ប ស្ប៊
ថ្វ ធ្វ tw ស្ប៉ ស្ព sp
ត្យ ទ្យ tj ស្ម ស្ម៊ sm
ថ្យ ធ្យ tj ស្រ ស្រ៊ sr
ន្រ៉ / ណ្រ៉ ន្រ nr ស្ល ស្ល៊ sl
ន្អ ន្អ៊ ស្វ ស្វ៊ sw
ប្ដ ប្ឌ ស្អ ស្អ៊
ប្រ ព្រ pr ហ្ន ហ្ន៊ n
ប្ល ព្ល pl ហ្ម ហ្ម៊ m
ប្អ ប្អ៊ ហ្រ ហ្រ៊ r
ផ្ក ផ្គ pk ហ្ល ហ្ល៊ l
ផ្ង ភ្ង ហ្វ ហ្វ៊ w
ផ្ច ភ្ជ pc អ្ង អ្ង៊ ʔŋ
ផ្ញ ភ្ញ អ្ន អ្ន៊ ʔn
ផ្ដ ភ្ឌ អ្វ អ្វ៊ ʔw
ផ្ត ផ្ទ pt អ្ហ អ្ហ៊ ʔh

F. Orthographic comparison charts

G. Khmer signs distribution chart

Notes

[1] Mnong is usually referred to as “Bunong” in Cambodia.

[2] Kuay is usually referred to as “Kuy” or “Kui” in Cambodia.

[3] Headley (2014:x) uses /ɯ/.

[4] Headley (2014:x) and Huffman (1970:8-9) use /ei/.

[5] Huffman (1970:8-9) uses /əɨ/.

[6] Headley (2014:x) and Huffman (1970:8-9) use /ou/.

[7] There is still discussion over whether អ is a consonant or a vowel. It is widely considered a consonant, though.

[8] When there is no final orthographic consonant, ិ is realized as /eʔ/ (Huffman 1970:26).

[9] ឲ្យ is discouraged in formal writing.

[10] ZWNJ is used to prevent the default rendering from happening.

[11] ះ (Reahmuk) in many ways behaves like a vowel, but since Unicode treats it as an After Diacritic, no other diacritic should be placed after it, or it will not be rendered correctly (i.e. ណ ះ ៎ = ណះ៎).

[12] The Sanskrit names of these months are adopted from https://en.wikipedia.org/wiki/Month#Hindu_calendar.

[13] Below Vowels are: ុ ូ ួ.

[14] The vowel can be any of these: ា, ោ and ៅ.

[15] This is only found in documents published for the fifth edition of the Chuon Nath dictionary, i.e. បាយ (ប ា យ) ‘rice’ in present-day Khmer is written as បា្យ (ប ា ្យ) in the Khmer-French dictionary (Bernard 1902).

[16] The paper can be accessed at: [http://www.panl10n.net/english/final%20reports/pdf%20files/Cambodia/CAM01.pdf](http://www.panl10n.net/english/final%20reports/pdf%20files/Cambodia/CAM01.pdf)

[17] The alphabet book only includes the first series of the vowels.

[18] Chrome does not render this sequence correctly; it should appear like the other two in the table.

[19] This sign should be added to the Unicode character inventory.

[20] Viriam is no longer used due to readability issues (ibid:16).

[21] RegEx is a shorthand for [Regular Expression](https://en.wikipedia.org/wiki/Regular_expression).

[22] -ក្ដ- and -ក្ត- look identical, but they are different.

[23] -ត្ត- and -ត្ដ​- look identical, but they are different.

[24] The data in the Khmer-Khmer online dictionary uses a different character sequence. Instead of ហ្វ្រ (ហ ្វ ្រ), its sequence is ហ្រ្វ (ហ ្រ ្វ). The one used in this documentation is ហ្វ្រ (ហ ្វ ្រ).
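The difference described in note [24] is invisible in rendered text but matters for searching and counting. As a minimal sketch, assuming one wants dictionary data to follow the sequence used in this documentation, the hypothetical Python function below moves a subscript រ that precedes another subscript consonant to the end of the pair. It is an illustration only, not a procedure prescribed by the dictionary or by this document.

```python
import re

COENG = "\u17D2"  # KHMER SIGN COENG (្)
RO = "\u179A"     # KHMER LETTER RO (រ)

# Subscript RO immediately followed by another subscript consonant
# (KHMER LETTER KA .. KHMER LETTER QA).
RO_FIRST = re.compile(f"({COENG}{RO})({COENG}[\u1780-\u17A2])")


def coeng_ro_last(text: str) -> str:
    """Rewrite C ្រ ្C' as C ្C' ្រ, leaving all other text unchanged."""
    return RO_FIRST.sub(r"\2\1", text)
```

With this, ហ្រ្វ (ហ ្រ ្វ) becomes ហ្វ្រ (ហ ្វ ្រ), while text already in that order is returned as is.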
