A Tamil transliteration and romanization scheme for microblogging and Web use

n-stl/natchat

natchat

A Tamil romanization scheme for casual microblogging and frequent Web use.

New features in brief:

  • ழ் can be written as 1 or 2
  • long vowels are depicted with colons (a: e: i: o: u:)
  • retroflex consonants are usually written with a preceding period (.t .d .l .n, etc.)
  • once you are accustomed to the scheme, it allows rapid and phonetically accurate reading of Tamil text in Latin characters

sample text

| tamil script | IPA | ITRANS | ISO | natchat |
|---|---|---|---|---|
| மனிதப் பிறவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர் | mənid̪əp‿piriʋijinər səgələrum sud̪ən̪d̪irəmaːgəʋeː pirəkːin̺d̺ranər | ma^nitap piRaviyi^nar chakalarum chutantiramAkave piRakki^nRa^nar | maṉitap piṟaviyiṉar cakalarum cutantiramākavē piṟakkiṉṟaṉar | manidha piraviyinar sagalarum sudhandhirama:gave: pirakkindranar |
| ஏழை கிழவன் வாழைப் பழத் தோல் மேல் சருசருக்கி வழுவழுக்கி கீழே விழுந்தான். | ʲeːɻəj kɪɻəʋən ʋɑːɻəjp pəɻət̪ t̪oːl meːl səɾʉsəɾʉkkɪˑ ʋəɻʉʋəɻʉkkɪˑ kiːɻeˑ ʋɪɻʉn̪d̪ɑːn. | Ezhai kizhavan vAzhaippazhath thOl mEl charucharukki vazhuvazhukki kIzhE vizhundhAn. | ēḻai kiḻavaṉ vāḻaip paḻat tōl mēl carucarukki vaḻuvaḻukki kīḻē viḻuntāṉ. | e:2ai ki2avan va:2aip pa2ath tho:l me:l sarusarukki va2uva2ukki ki:2e: vi2undha:n. |
| அவளும் இவளும் அவல் அளக்காவிட்டால், எவள் அவலளப்பாள் ? | əʋəɭʉm ʲɪʋəɭʉm əʋəl əɭəkkɑːʋɪʈʈɑːl , ʲɛʋəɭ əʋələɭəppɑːɭ ? | avaLum ivaLum aval aLakkAviTTAl, evaL avalaLappAL ? | avaḷum ivaḷum aval aḷakkāviṭṭāl, evaḷ avalaḷappāḷ ? | ava.lum iva.lum aval a.lakka:vi.tta:l, eva.l avala.lappa:.l? |
| கொக்கு நெட்ட கொக்கு. நெட்ட கொக்கு இட்ட முட்ட, கட்ட முட்ட. | kokkʉ nɛʈʈə kokkʉ . nɛʈʈə kokkʉ ʲɪʈʈə mʊʈʈə , kəʈʈə mʊʈʈə . | kokku neTTa kokku. neTTa kokku iTTa muTTa, kaTTa muTTa. | kokku neṭṭa kokku. neṭṭa kokku iṭṭa muṭṭa, kaṭṭa muṭṭa. | kokku netta kokku. netta kokku itta mutta, katta mutta. |
| குலை குலையாய் வாழைப்பழம், மழையில் அழுகி கீழே விழுந்தது. | kʊləj kʊləjjɑːj ʋɑːɻəjppəɻəm , məɻəjjɪl əɻʉgɪˑ kiːɻeˑ ʋɪɻʉn̪d̪əd̪ʉ . | kulai kulaiyAy vAzhaippazham, mazhaiyil azhuki kIzhE vizhuntatu. | kulai kulaiyāy vāḻaippaḻam, maḻaiyil aḻuki kīḻē viḻuntatu. | kulai kulaiya:y va:2aippa2am, ma2aiyil a2ugi ki:2e: vi2undhadhu. |

vowels

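| tamil vowel | natchat equivalent | IPA |
|---|---|---|
| அ | a | ä |
| ஆ | a: | äː |
| இ | i | i |
| ஈ | i: | |
| உ | u | u |
| ஊ | u: | |
| எ | e | e |
| ஏ | e: | |
| ஐ | ai | aɪ̯ |
| ஒ | o | o |
| ஓ | o: | |
| ஔ | au | aʊ̯ |
| ஃ | kh or x (note 1) | archaic, g ~ x ~ ɣ |
| (nasalization, spoken only) (note 2) | - (hyphen following vowel) | nasal vowel ([ãː], [õː], etc.) |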
  1. This letter is not used in conversational Tamil; it is included only for completeness. The same representation may be used for intervocalic க, which in spoken Tamil shifts from [k] to [g ~ x ~ ɣ].
  2. Some may wonder why a spoken-only feature like nasalization is included in this romanization. This is discussed further below, but the short answer is that it preserves a distinction important to spoken Tamil, e.g. அவ(ன்) /ʌʋə̃/ vs. அவ(ள்) /ʌʋə/.
  3. In spoken Tamil, word-final உ often shifts from [u] to [ɯ ~ ʉ ~ ɨ]; this phenomenon is called குற்றியலுகரம் (kut^riyalugaram). It is semantically unimportant and is inherently understood by fluent speakers, so it has been left unrepresented. However, u/ can be used to represent it if need be.
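
As a rough illustration of how mechanical the colon notation is, the sketch below rewrites natchat long vowels into the macron forms that the comparison table further down lists for ISO 15919 (ā ī ū ē ō). The function name `long_vowels_to_macrons` is made up for this example and is not part of the scheme. It handles only vowel length and leaves every consonant untouched.

```python
import re

# natchat long vowels (vowel + colon) paired with the ISO 15919 macron forms.
# Only vowel length is handled; consonants pass through unchanged.
LONG_VOWELS = {"a:": "ā", "i:": "ī", "u:": "ū", "e:": "ē", "o:": "ō"}

# A long vowel is any of a/e/i/o/u immediately followed by a colon.
_LONG_VOWEL_RE = re.compile(r"[aeiou]:")


def long_vowels_to_macrons(text: str) -> str:
    """Replace each natchat long vowel with its macron equivalent (hypothetical helper)."""
    return _LONG_VOWEL_RE.sub(lambda m: LONG_VOWELS[m.group(0)], text)


if __name__ == "__main__":
    print(long_vowels_to_macrons("ki:2e:"))    # kī2ē
    print(long_vowels_to_macrons("kulaiya:y"))  # kulaiyāy
```

One caveat of this simple approach: an ordinary punctuation colon directly after a vowel would also be rewritten, so real text would need extra care.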

consonants

Doubled retroflex consonants may be written with only one preceding period for simplicity, for example ku.tti குட்டி.

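| tamil consonant | natchat transliteration (with 'phonetic' alternatives) | IPA |
|---|---|---|
| க் | k (or g) | k, medial g ~ x ~ ɣ |
| ங் | ng | ŋ |
| ச் | c (or s) | t͡ɕ ~ t͡ʃ, medial s, postnasal dʑ~dʒ |
| ஞ் | ~n | ɲ |
| ட் | .t (or .d, or t or d; note 1) | ʈ, medial ɖ~ɽ |
| ண் | .n | ɳ |
| த் | th (or dh) | t̪, medial d̪ ~ ð |
| ந் | n (note 2) | |
| ப் | p (or b) | p, medial b~β |
| ம் | m | m |
| ய் | y | j |
| ர் | r | ɾ |
| ல் | l | l |
| வ் | v | ʋ |
| ழ் | 1 or 2 or zh (note 3) | ɻ |
| ள் | .l | ɭ |
| ற் | ^r (or r; note 4) | r |
| ன் | n | n |
| ற்ற | t^r (or tr; note 5) | tːr |
| ன்ற | nd^r (or ndr; note 5) | ndr |
| ஞ்ச | nj (note 5) | n̠ʲd̠ʒ |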

Notes:

  1. Unvoiced and voiced ட can alternatively be written t and d, since Tamil has no equivalent alveolar sound and thus no confusion arises in a Tamil context. However, because they differ from AmEng "t" and "d", they are written here with a preceding period to preserve the retroflex representation.
  2. It is generally unnecessary to distinguish ன் from ந், which are allophonic in modern speech and can be distinguished purely grammatically. If necessary, denti-alveolar ந் can be written as _n.
  3. Several options are possible for representing the retroflex approximant ழ், depending on what is easiest to read. The new romanization 2, which slightly resembles a capital R, reflects the similarity between ழ் and AmEng "r" or Chinese pinyin "r". The other new romanization, 1, resembles a lowercase "l" and reflects its "L-ness", its merger with ள் for many speakers, and its existing romanization as L in words such as "Tamil" and "Eelam". zh is carried over from legacy transliteration schemes such as ITRANS and may be easier for some readers.
  4. If it is unnecessary to distinguish ர் from ற் in casual typing, both can be written as r.
  5. For ease of representing the phonetic change caused by Tamil grammar for digraphs ற்ற and ன்ற, a t or d is placed in between the consonants, respectively. Similarly, for representing the phonetic change in the digraph ஞ்ச, it can simply be written nj rather than ~nc.
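
Because several natchat units span more than one character (.t, th, ~n, ^r, nd^r, a:, and so on), any tool that processes natchat text first has to split it into units. The following is a minimal sketch of a longest-match tokenizer. The unit list is copied from the tables above, while the names `UNITS` and `tokenize` are assumptions made for this example, and the 'phonetic' alternatives such as tr and ndr are deliberately left out.

```python
# Multi-character natchat units taken from the vowel and consonant tables.
# Hypothetical helper: not part of the scheme itself.
UNITS = [
    "nd^r", "t^r", "ksh",
    ".t", ".d", ".n", ".l", "th", "dh", "ng", "nj", "zh", "sh",
    "~n", "^r", "_n", "`s",
    "a:", "i:", "u:", "e:", "o:", "ai", "au",
]
# Try longer units first so that "nd^r" wins over "n" + "d" + "^r".
UNITS.sort(key=len, reverse=True)


def tokenize(word: str) -> list[str]:
    """Split a natchat string into units using greedy longest match."""
    tokens = []
    i = 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No multi-character unit starts here: emit a single character.
            tokens.append(word[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("ava.lum"))  # ['a', 'v', 'a', '.l', 'u', 'm']
    print(tokenize("va:2aip"))  # ['v', 'a:', '2', 'ai', 'p']
    print(tokenize("ku.tti"))   # ['k', 'u', '.t', 't', 'i']
```

In the last example the doubled retroflex is written with a single period, so the second t comes out as a bare `t`. A fuller implementation would need to treat it as retroflex too. Similarly, a + i is always read as the diphthong ai here, which is a simplification.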

Loaned Grantha consonants:

| grantha consonant | natchat equivalent | IPA |
|---|---|---|
| ஜ் | j | |
| ஶ் | `s | ɕ |
| ஷ் | sh | ʃ ~ ʂ |
| ஸ் | s (note 1) | s |
| ஹ் | h | h |
| க்ஷ் | ksh | |
  1. This technically creates an overlap with word-initial ச், but ஸ் is a marginal grapheme that only occurs in loanwords so I'm not worried about it.

discussion of existing romanization schemes and issues

The table below compares existing Tamil transliteration schemes, focusing on where they differ: long vowels and particular consonants. Hunterian transliteration, while the official romanization of the Government of India, does not account for non-Devanagari scripts such as Tamil.

tamil letter IPA pronunciation ITRANS (1991-2013) ISO 15919 (2001) National Library romanization Common web usage (e.g. social media, blogs)
äː A ā, aa ā a or aa
I ī, ii ī i or ee
U ū, uu ū u or uu or oo
E ē, ee ē e or ee or E or ae, etc
O ō, oo ō o
archaic, g ~ x ~ ɣ q, aH ḵ, _k
க் k, medial g ~ x ~ ɣ k, g k k, g k or g
ங் ŋ ~N ṅ, ;n n (ṇ) ng
ச் t͡ɕ ~ t͡ʃ, medial s, postnasal dʑ~dʒ ch c c ch or s or c
ஞ் ɲ ~n ñ, ~n ñ ny or nj or gn
ட் ʈ, medial ɖ~ɽ T/D ṭ, .t t
ண் ɳ N ṇ, .n n or nn or N
த் t̪, medial d̪ ~ ð t/th/dh t t or th, d, dh, etc
ந் n n n n
ப் p, medial b~β p, b p p, b p or b
ழ் ɻ zh, J ḻ, _l zh or l
ள் ɭ L ḷ, .l l
ற் r R ṟ, _r r or rr
ன் n ^n ṉ, _n n
ற்ற tːr rr or tr
ன்ற ndr nr or ndr or ntr
ஞ்ச n̠ʲd̠ʒ GY nj or nch
ஶ் ɕ sh ś, ;s ś
ஷ் ʂ Sh ṣ, .s
க்ஷ் x ksh

Current schemes are non-intuitive to new readers (e.g. ITRANS's use of mixed capitalization, or 'zh' producing neither a z nor an h sound), or they rely on diacritics that are not easy to type on a standard American English keyboard and are therefore rarely used in casual typing. This creates several problems:

  • Casual romanizations often overlap or fail to distinguish vowel length, vowel quality, consonant articulation, etc., creating major difficulties in reading casually typed Tamil across the web. For example, "oo" can mean [o:] or [u:]; "ee" can mean [i:] or [e:]; "l" can be [l], [ɭ] or [ɻ]; "t" can denote [t̪], [ʈ], [ɖ], etc.
  • Most existing schemes are built first for Devanagari scripts and therefore encode a four-way distinction of aspiration and voicing. This is unnecessary for Tamil (which has no such distinction) and takes up letters that could be used for Tamil-specific consonants. Hunterian is the scheme used by the Indian government, but is particularly unsuited to Dravidian languages.
  • ISO lacks internal logic in some of its diacritics: for example, a bar below a letter can indicate an alveolar ṉ, a retroflex ḻ, or a trill ṟ.
  • Current schemes do not represent spoken changes that occur due to Tamil grammar rules (e.g. linking [t, d] in ற்ற [tːr], ன்ற [ndr]; ச் [t͡ɕ] to ஞ்ச் [n̠ʲd̠ʒ]) or nasalization (for example between reduced forms of அவ(ன்) /ʌʋə̃/ and அவ(ள்) /ʌʋə/, which might both be casually typed "ava" if nasalization is not preserved).
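
To give a feel for how quickly the ambiguity in casual romanization compounds, here is a small sketch that counts the possible readings of a casually typed string, using only the four ambiguous spellings named in the first bullet. The names `AMBIGUOUS` and `count_readings` and the sample string "tool" are made up for illustration. Since the source lists end in "etc.", the result is a lower bound.

```python
from math import prod

# Casual spellings and the sounds each may stand for, per the first bullet above.
# The lists are not exhaustive, so counts are lower bounds.
AMBIGUOUS = {
    "oo": ["oː", "uː"],
    "ee": ["iː", "eː"],
    "l": ["l", "ɭ", "ɻ"],
    "t": ["t̪", "ʈ", "ɖ"],
}


def count_readings(word: str) -> int:
    """Count the combinations of readings for a casual spelling (hypothetical helper)."""
    counts = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in AMBIGUOUS:           # two-letter spellings such as "oo"
            counts.append(len(AMBIGUOUS[pair]))
            i += 2
        elif word[i] in AMBIGUOUS:      # one-letter spellings such as "t"
            counts.append(len(AMBIGUOUS[word[i]]))
            i += 1
        else:                           # treated as unambiguous in this sketch
            counts.append(1)
            i += 1
    return prod(counts)


if __name__ == "__main__":
    print(count_readings("tool"))  # 3 * 2 * 3 = 18
```

Even a four-letter string can stand for eighteen different sound sequences, which is the reading difficulty described above.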

Therefore, the goals of a new romanization scheme are:

  • intuitive representation of Tamil sounds
  • easy for an American English speaker to type and read
  • easy to mentally convert into accurate spoken Tamil
  • not strict with regards to graphemes, allowing for representation of allophones such as [k, g] and [p, b]
  • most importantly, easy enough to use on a regular basis, like a Tamil version of Arabizi

Qualities of this romanization scheme might be as follows:

  • English loanwords may be typed with English spelling
  • simple way of distinguishing retroflex and alveolar consonants, in particular
  • simple and unambiguous way of distinguishing short and long vowels
  • no mixed case
  • no diacritics that require anything other than an ASCII keyboard

The two main problems are:

  • allocating symbols to the L's ல் and ள், the R's ர் and ற், and the retroflex approximant ழ், which has characteristics of both L and R. There are simply not enough similar consonants in the Latin script, necessitating either diacritics or creative solutions.
  • distinguishing vowel length in a way that does not lead to confusion about vowel quality.
