A Tamil romanization scheme for casual microblogging and frequent Web use.
New features in brief:
- ழ் can be written as 1 or 2
- long vowels are depicted with colons (a: e: i: o: u:)
- retroflex consonants are usually written with a period before (.t .d .l .n etc)
- once accustomed to the scheme, readers can read Tamil text in Latin characters rapidly and with phonetic accuracy
tamil script | IPA | ITRANS | ISO | natchat |
---|---|---|---|---|
மனிதப் பிறவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர் | mənid̪əp‿piriʋijinər səgələrum sud̪ən̪d̪irəmaːgəʋeː pirəkːin̺d̺ranər | ma^nitap piRaviyi^nar chakalarum chutantiramAkave piRakki^nRa^nar | maṉitap piṟaviyiṉar cakalarum cutantiramākavē piṟakkiṉṟaṉar | manidha piraviyinar sagalarum sudhandhirama:gave: pirakkindranar |
ஏழை கிழவன் வாழைப் பழத் தோல் மேல் சருசருக்கி வழுவழுக்கி கீழே விழுந்தான். | ʲeːɻəj kɪɻəʋən ʋɑːɻəjp pəɻət̪ t̪oːl meːl səɾʉsəɾʉkkɪˑ ʋəɻʉʋəɻʉkkɪˑ kiːɻeˑ ʋɪɻʉn̪d̪ɑːn. | Ezhai kizhavan vAzhaippazhath thOl mEl charucharukki vazhuvazhukki kIzhE vizhundhAn. | ēḻai kiḻavaṉ vāḻaip paḻat tōl mēl carucarukki vaḻuvaḻukki kīḻē viḻuntāṉ. | e:2ai ki2avan va:2aip pa2ath tho:l me:l sarusarukki va2uva2ukki ki:2e: vi2undha:n. |
அவளும் இவளும் அவல் அளக்காவிட்டால், எவள் அவலளப்பாள் ? | əʋəɭʉm ʲɪʋəɭʉm əʋəl əɭəkkɑːʋɪʈʈɑːl , ʲɛʋəɭ əʋələɭəppɑːɭ ? | avaLum ivaLum aval aLakkAviTTAl, evaL avalaLappAL ? | avaḷum ivaḷum aval aḷakkāviṭṭāl, evaḷ avalaḷappāḷ ? | ava.lum iva.lum aval a.lakka:vi.tta:l, eva.l avala.lappa:.l? |
கொக்கு நெட்ட கொக்கு. நெட்ட கொக்கு இட்ட முட்ட, கட்ட முட்ட. | kokkʉ nɛʈʈə kokkʉ . nɛʈʈə kokkʉ ʲɪʈʈə mʊʈʈə , kəʈʈə mʊʈʈə . | kokku neTTa kokku. neTTa kokku iTTa muTTa, kaTTa muTTa. | kokku neṭṭa kokku. neṭṭa kokku iṭṭa muṭṭa, kaṭṭa muṭṭa. | kokku netta kokku. netta kokku itta mutta, katta mutta. |
குலை குலையாய் வாழைப்பழம், மழையில் அழுகி கீழே விழுந்தது. | kʊləj kʊləjjɑːj ʋɑːɻəjppəɻəm , məɻəjjɪl əɻʉgɪˑ kiːɻeˑ ʋɪɻʉn̪d̪əd̪ʉ . | kulai kulaiyAy vAzhaippazham, mazhaiyil azhuki kIzhE vizhuntatu. | kulai kulaiyāy vāḻaippaḻam, maḻaiyil aḻuki kīḻē viḻuntatu. | kulai kulaiya:y va:2aippa2am, ma2aiyil a2ugi ki:2e: vi2undhadhu. |
tamil vowel | natchat equivalent | IPA |
---|---|---|
அ | a | ä |
ஆ | a: | äː |
இ | i | i |
ஈ | i: | iː |
உ | u | u |
ஊ | u: | uː |
எ | e | e |
ஏ | e: | eː |
ஐ | ai | aɪ̯ |
ஒ | o | o |
ஓ | o: | oː |
ஔ | au | aʊ̯ |
ஃ | kh or x 1 | archaic, g ~ x ~ ɣ |
(nasalization, spoken only) 2 | - (hyphen following vowel) | nasal vowel ([ãː], [õː], etc.) |
- This letter is not used in conversational Tamil; it is included only for completeness. The same representation may be used for intervocalic க, which in spoken Tamil shifts from [k] to [g ~ x ~ ɣ].
- Some may wonder why a spoken-only feature like nasalization is included in this romanization. This is discussed further below, but the short answer is that it preserves a distinction important to spoken Tamil, e.g. அவ(ன்) /ʌʋə̃/ vs. அவ(ள்) /ʌʋə/
- In spoken Tamil, word-final உ often shifts from [u] to [ɯ ~ ʉ ~ ɨ], a phenomenon called குற்றியலுகரம் (kut^riyalugaram). It is semantically unimportant and inherently understood by fluent speakers, so it has been left unrepresented. However, u/ can be used to represent it if need be.
Doubled retroflex consonants may be written with only one preceding period for simplicity, for example ku.tti குட்டி.
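For text that already uses ISO-style macrons, the colon convention from the vowel table above can be applied mechanically. The following is a minimal sketch, not part of the scheme itself; the function name `macrons_to_colons` is made up for illustration, and it only handles lowercase vowels.

```python
import unicodedata

COMBINING_MACRON = "\u0304"  # the bar over ā, ī, ū, ē, ō


def macrons_to_colons(text: str) -> str:
    """Rewrite macron-marked long vowels as vowel + colon (ā -> a:)."""
    out = []
    # NFD splits each precomposed letter into base letter + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if ch == COMBINING_MACRON and out and out[-1] in "aeiou":
            out.append(":")
        else:
            out.append(ch)
    # Recompose any diacritics that remain (e.g. the bar below in ḻ).
    return unicodedata.normalize("NFC", "".join(out))


print(macrons_to_colons("tōl mēl"))  # to:l me:l
```

Only vowel length is converted here; consonant diacritics such as ḻ or ṭ are left untouched and would need their own mapping.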
tamil consonant | natchat transliteration (with 'phonetic' alternatives) | IPA |
---|---|---|
க் | k (or g) | k, medial g ~ x ~ ɣ |
ங் | ng | ŋ |
ச் | c (or s) | t͡ɕ ~ t͡ʃ, medial s, postnasal dʑ~dʒ |
ஞ் | ~n | ɲ |
ட் | .t (or .d, or t or d 1) | ʈ, medial ɖ~ɽ |
ண் | .n | ɳ |
த் | th (or dh) | t̪, medial d̪ ~ ð |
ந் | n 2 | n̪ |
ப் | p (or b) | p, medial b~β |
ம் | m | m |
ய் | y | j |
ர் | r | ɾ |
ல் | l | l |
வ் | v | ʋ |
ழ் | 1 or 2 or zh 3 | ɻ |
ள் | .l | ɭ |
ற் | ^r (or r 4) | r |
ன் | n | n |
ற்ற | t^r (or tr 5) | tːr |
ன்ற | nd^r (or ndr 5) | ndr |
ஞ்ச | nj 5 | n̠ʲd̠ʒ |
Notes:
- Unvoiced and voiced ட can alternatively be written t and d: Tamil has no equivalent alveolar sound, so there is no confusion in a Tamil context. However, since they are distinct from AmEng "t" and "d", they are written here with a preceding period to preserve the retroflex representation.
- It is generally unnecessary to distinguish ன் from ந், which are allophonic in modern speech and can be distinguished purely grammatically. If necessary, denti-alveolar ந் can be written as _n.
- For representing the retroflex approximant ழ், several options are possible depending on what is easiest to read. The new romanization 2, which slightly resembles a capital R, reflects the similarity between ழ் and AmEng "r" or Chinese pinyin "r". The alternative 1 resembles a lowercase "l" and reflects its "L-ness", its merger with ள் for many speakers, and its existing romanization as L in words such as "Tamil" and "Eelam". zh is carried over from legacy transliteration schemes such as ITRANS and may be easier for some readers.
- If it is unnecessary to distinguish ர் from ற் in casual typing, both can be written as r.
- For ease of representing the phonetic change caused by Tamil grammar for digraphs ற்ற and ன்ற, a t or d is placed in between the consonants, respectively. Similarly, for representing the phonetic change in the digraph ஞ்ச, it can simply be written nj rather than ~nc.
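The consonant and vowel tables above can be turned into a small script-to-natchat converter. This is a rough sketch under several assumptions: the name `to_natchat` is hypothetical, ழ் is always written 2, only the strict grapheme forms are produced (so it emits th and k where the sample sentences use the phonetic alternatives dh and g), and only the ற்ற, ன்ற and ஞ்ச digraph rules from the notes are applied.

```python
import unicodedata

PULLI = "\u0bcd"  # vowel-killer mark

VOWELS = {"அ": "a", "ஆ": "a:", "இ": "i", "ஈ": "i:", "உ": "u", "ஊ": "u:",
          "எ": "e", "ஏ": "e:", "ஐ": "ai", "ஒ": "o", "ஓ": "o:", "ஔ": "au"}

# Dependent vowel signs, keyed by code point.
VOWEL_SIGNS = {"\u0bbe": "a:", "\u0bbf": "i", "\u0bc0": "i:", "\u0bc1": "u",
               "\u0bc2": "u:", "\u0bc6": "e", "\u0bc7": "e:", "\u0bc8": "ai",
               "\u0bca": "o", "\u0bcb": "o:", "\u0bcc": "au"}

CONSONANTS = {"க": "k", "ங": "ng", "ச": "c", "ஞ": "~n", "ட": ".t", "ண": ".n",
              "த": "th", "ந": "n", "ப": "p", "ம": "m", "ய": "y", "ர": "r",
              "ல": "l", "வ": "v", "ழ": "2", "ள": ".l", "ற": "^r", "ன": "n"}

# Strict spelling -> digraph spelling described in the notes.
DIGRAPHS = [("^r^r", "t^r"), ("n^r", "nd^r"), ("~nc", "nj")]


def to_natchat(tamil: str) -> str:
    text = unicodedata.normalize("NFC", tamil)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt == PULLI:
                i += 1  # bare consonant, no vowel
            elif nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 1
            else:
                out.append("a")  # inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        else:
            out.append(ch)  # spaces, punctuation, unmapped letters
        i += 1
    result = "".join(out)
    for strict, digraph in DIGRAPHS:
        result = result.replace(strict, digraph)
    return result


print(to_natchat("கீழே"))  # ki:2e:
```

Voicing alternatives (g, dh, b, s) depend on position and speaker, so a converter like this would need extra rules, or a human pass, to reproduce the more phonetic spellings.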
Loaned Grantha consonants:
grantha consonant | natchat equivalent | IPA |
---|---|---|
ஜ | j | dʑ |
ஶ் | `s | ɕ |
ஷ் | sh | ʃ ~ ʂ |
ஸ் | s 1 | s |
ஹ் | h | h |
க்ஷ் | ksh | kʂ |
- This technically creates an overlap with word-initial ச், but ஸ் is a marginal grapheme that only occurs in loanwords so I'm not worried about it.
Table of existing Tamil transliteration schemes, focusing primarily on the areas of difference between schemes - long vowels and particular consonants. Hunterian transliteration, while the official romanization of the Government of India, does not account for non-Devanagari scripts such as Tamil.
tamil letter | IPA pronunciation | ITRANS (1991-2013) | ISO 15919 (2001) | National Library romanization | Common web usage (e.g. social media, blogs) |
---|---|---|---|---|---|
ஆ | äː | A | ā, aa | ā | a or aa |
ஈ | iː | I | ī, ii | ī | i or ee |
ஊ | uː | U | ū, uu | ū | u or uu or oo |
ஏ | eː | E | ē, ee | ē | e or ee or E or ae, etc |
ஓ | oː | O | ō, oo | ō | o |
ஃ | archaic, g ~ x ~ ɣ | q, aH | ḵ, _k | ||
க் | k, medial g ~ x ~ ɣ | k, g | k | k, g | k or g |
ங் | ŋ | ~N | ṅ, ;n | n (ṇ) | ng |
ச் | t͡ɕ ~ t͡ʃ, medial s, postnasal dʑ~dʒ | ch | c | c | ch or s or c |
ஞ் | ɲ | ~n | ñ, ~n | ñ | ny or nj or gn |
ட் | ʈ, medial ɖ~ɽ | T/D | ṭ, .t | ṭ | t |
ண் | ɳ | N | ṇ, .n | ṇ | n or nn or N |
த் | t̪, medial d̪ ~ ð | t/th/dh | t | | t or th, d, dh, etc |
ந் | n̪ | n | n | n | n |
ப் | p, medial b~β | p, b | p | p, b | p or b |
ழ் | ɻ | zh, J | ḻ, _l | ẕ | zh or l |
ள் | ɭ | L | ḷ, .l | ḷ | l |
ற் | r | R | ṟ, _r | ṟ | r or rr |
ன் | n | ^n | ṉ, _n | ṉ | n |
ற்ற | tːr | | | | rr or tr |
ன்ற | ndr | | | | nr or ndr or ntr |
ஞ்ச | n̠ʲd̠ʒ | GY | | | nj or nch |
ஶ் | ɕ | sh | ś, ;s | ś | |
ஷ் | ʂ | Sh | ṣ, .s | ṣ | |
க்ஷ் | kʂ | x | | | ksh |
Current schemes are either non-intuitive to new readers (e.g. ITRANS's mixed capitalization, or 'zh' producing neither a z nor an h sound) or rely on diacritics that are hard to type on a standard American English keyboard and are therefore rarely used in casual typing. This creates several problems:
- Casual romanizations often overlap or fail to distinguish vowel length, vowel quality, consonant articulation, etc., creating major difficulties in reading casually-typed Tamil across the web. For example, "oo" can be used to mean [o:] or [u:]; "ee" can mean [i:] or [e:]; "l" can be [l], [ɭ] or [ɻ]; "t" can denote [t̪], [ʈ], [ɖ], etc.
- Most existing schemes are built first for Devanagari and therefore carry a four-way distinction of aspiration and voicing. This is unnecessary for Tamil (which has no such distinction) and takes up letters that could be used for Tamil-specific consonants. Hunterian is the scheme used by the Indian government, but is particularly unsuited to Dravidian languages.
- Some ISO diacritics lack internal logic: for example, a bar below a letter can indicate an alveolar ṉ, a retroflex ḻ, or a trill ṟ.
- Current schemes do not represent spoken changes that occur due to Tamil grammar rules (e.g. linking [t, d] in ற்ற [tːr], ன்ற [ndr]; ச் [t͡ɕ] to ஞ்ச் [n̠ʲd̠ʒ]) or nasalization (for example between reduced forms of அவ(ன்) /ʌʋə̃/ and அவ(ள்) /ʌʋə/, which might both be casually typed "ava" if nasalization is not preserved)
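The overlap problem can be made concrete with a toy example. The sketch below assumes a hypothetical "casual" spelling that simply drops the length colons and retroflex periods, then groups words that become indistinguishable; the helper names and the four-word sample are chosen for illustration only.

```python
from collections import defaultdict


def casual_spelling(natchat_word: str) -> str:
    """Hypothetical lossy spelling: drop length colons and retroflex periods."""
    return natchat_word.replace(":", "").replace(".", "")


# natchat spelling -> (Tamil script, English gloss)
WORDS = {
    "kal": ("கல்", "stone"),
    "ka:l": ("கால்", "leg"),
    "palli": ("பல்லி", "lizard"),
    "pa.lli": ("பள்ளி", "school"),
}


def find_collisions(words):
    """Group natchat spellings that collapse to the same casual spelling."""
    groups = defaultdict(list)
    for word in words:
        groups[casual_spelling(word)].append(word)
    return {casual: group for casual, group in groups.items() if len(group) > 1}


for casual, group in find_collisions(WORDS).items():
    print(casual, "<-", ", ".join(f"{w} ({WORDS[w][1]})" for w in group))
```

With the markers kept, each word stays distinct; with them dropped, "stone" and "leg" both become kal, and "lizard" and "school" both become palli.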
Therefore, goals of a new romanization scheme:
- intuitive representation of Tamil sounds
- easy to type & read by an American English speaker
- easy to mentally convert into accurate spoken Tamil
- not strict with regard to graphemes, allowing for representation of allophones such as [k, g] and [p, b]
- most importantly, easy enough to use on a regular basis, like a Tamil version of Arabizi
Qualities of this romanization scheme might be as follows:
- English loanwords may be typed with English spelling
- simple way of distinguishing consonants, retroflex from alveolar in particular
- simple and unambiguous way of distinguishing short and long vowels
- no mixed case
- no diacritics that require anything other than an ASCII keyboard
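The last two qualities are easy to check mechanically. Below is a small sketch of such a check; the function name `check_constraints` is an assumption for illustration, and since English loanwords may be typed with English spelling, an uppercase warning is a hint rather than an error.

```python
def check_constraints(text: str) -> list[str]:
    """Return a list of scheme qualities that the text appears to violate."""
    problems = []
    if not text.isascii():
        problems.append("non-ASCII character")
    if any(ch.isupper() for ch in text):
        problems.append("uppercase letter")
    return problems


print(check_constraints("ki:2e: vi2undha:n"))  # []
print(check_constraints("kIzhE"))              # ['uppercase letter']
print(check_constraints("kīḻē"))               # ['non-ASCII character']
```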
The two main problems are in
- allocating symbols to the L's ல் and ள், the R's ர் and ற், and the retroflex approximant ழ், which has characteristics of both L and R. There are simply not enough similar consonants in the Latin script, necessitating either diacritics or creative solutions.
- distinguishing vowel length in a way that does not lead to confusion about vowel quality.
References:
- Anunaadam's excellent description of modern Tamil phonology and IPA: https://anunaadam.appspot.com/transcription
- Aksharamukha script converter: https://www.aksharamukha.com/converter