From 436382f306fee2006f044a8601e08ab484a59ceb Mon Sep 17 00:00:00 2001 From: bilash saha Date: Tue, 27 Oct 2020 14:51:55 -0600 Subject: [PATCH] un-ori-Orya-Latn-1972 - implemented un-ori-Orya-Latn-1972 - added digits to bis-ori-Orya-Latn-13194-1991 --- maps/bis-ori-Orya-Latn-13194-1991.yaml | 19 +- maps/un-ori-Orya-Latn-1972.yaml | 247 +++++++++++++++++++++++++ 2 files changed, 264 insertions(+), 2 deletions(-) create mode 100644 maps/un-ori-Orya-Latn-1972.yaml diff --git a/maps/bis-ori-Orya-Latn-13194-1991.yaml b/maps/bis-ori-Orya-Latn-13194-1991.yaml index 9cf19ee6..f6b11f10 100644 --- a/maps/bis-ori-Orya-Latn-13194-1991.yaml +++ b/maps/bis-ori-Orya-Latn-13194-1991.yaml @@ -40,9 +40,11 @@ tests: - source: "ନବନିଯୁକ୍ତ ଓଡିଶା କଂଗ୍ରେସ ପ୍ରଭାରୀ ଏ.ଚେଲ୍ଲା କୁମାରଙ୍କୁ କରୋନା" expected: "nbniyukt ŏḍiśā kṅgrēs prbhārī ē.cēllā kumārṅku krŏnā" - source: "ଦିଲ୍ଲୀ: ଦିନ ଦ୍ବିପହରରେ ଗାଡ଼ି ଉପରକୁ ଦୁର୍ବୃତ୍ତ ଚଳାଇଲେ ୮ ରାଉଣ୍ଡ ଗୁଳି: ଚାଳକଙ୍କ ମୃତ୍ୟୁ" - expected: "dillī: din dbiphrrē gād̂i uprku durbṛtt cḷāilē ୮ rāuṇḍ guḷi: cāḷkṅk mṛtẏu" + expected: "dillī: din dbiphrrē gād̂i uprku durbṛtt cḷāilē 8 rāuṇḍ guḷi: cāḷkṅk mṛtẏu" - source: "ବୟସରେ ଆର ପାରିକୁ ଚାଲିଗଲେ କଣ୍ଠଶିଳ୍ପୀ ଅନୁରାଧା ପୋଡୱାଲଙ୍କ ପୁଅ ଆଦିତ୍ୟ" expected: "bẏsrē ār pāriku cāliglē kṇṭhśiḷpī anurādhā pēāḍୱālṅk pua āditẏ" + - source: "୦୧୭୧୬୪୨୯୭୦୦" + expected: "01716429700" map: @@ -157,4 +159,17 @@ map: '଼': '' '।': '.' 
"‍": ''# Used for joining - "‌": ''# Used for non joining \ No newline at end of file + "‌": ''# Used for non joining + + # Numbers + + '୦': '0' + '୧': '1' + '୨': '2' + '୩': '3' + '୪': '4' + '୫': '5' + '୬': '6' + '୭': '7' + '୮': '8' + '୯': '9' \ No newline at end of file diff --git a/maps/un-ori-Orya-Latn-1972.yaml b/maps/un-ori-Orya-Latn-1972.yaml new file mode 100644 index 00000000..5df1beb6 --- /dev/null +++ b/maps/un-ori-Orya-Latn-1972.yaml @@ -0,0 +1,247 @@ +--- +authority_id: ungegn +id: 1972 +language: iso-639-2:ori +source_script: Orya +destination_script: Latn +name: REPORT ON THE CURRENT STATUS OF UNITED NATIONS ROMANIZATION SYSTEMS FOR GEOGRAPHICAL NAMES -- Oriya Romanization, 1972 +url: http://www.eki.ee/wgrs/v2_2/rom1_or.pdf +creation_date: 1972 +confirmation_date: 2003 +description: | + The United Nations recommended system was approved in 1972 (II/11), based on a report + prepared by D. N. Sharma. The note on the system was published in volume II of the + conference reports. + + There is no evidence of the use of the system either in India or in international cartographic + products. + + Oriya uses an alphasyllabic script whereby each character represents a syllable rather than one sound. + Vowels and diphthongs are marked in two ways: as independent characters (used syllable-initially) and in an + abbreviated form, to denote vowels after consonants. The romanization table is unambiguous. The system is mostly + reversible but there may exist some ambiguities in the romanization of vowels (independent vs. abbreviated characters) + and consonants (combinations with subscript consonants vs. character sequences). + +notes: + - Combinations with r as the first component are written with a special superscript symbol, e.g. ର୍କ rka. 
+ +tests: + - source: "ର୍କ" + expected: "rka" + - source: "ଓଡ଼ିଆ" + expected: "oṙiā" + - source: "ଓଡ଼ିଶା" + expected: "oṙishā" + - source: "ଭୁବନେଶ୍ୱର" + expected: "bhubaneshvara" + - source: "ଆଇପିଏଲ୍‌-୧୩: ଦିଲ୍ଲୀ କ୍ୟାପିଟାଲ୍ସକୁ ୮୮ ରନ୍‌ ପରାସ୍ତ କଲା ସନରାଇଜର୍ସ ହାଇଦ୍ରାବାଦ" + expected: "āipiel-13: dillī kyāpiṭālsaku 88 ran parāsta kalā sanarāijarsa hāidrābāda" + - source: "ପ୍ରେମ ସମ୍ପର୍କରେ ଭଟ୍ଟା: ରାଗରେ ପ୍ରେମିକାର ତଣ୍ଟି କାଟି ନିଜେ ବିଷ ପିଇଲା ପ୍ରେମିକ" + expected: "prema samparkare bhaṭṭā: rāgare premikāra taṇṭi kāṭi nije biṣha piilā premika" + - source: "ପ୍ରେମ ସମ୍ପର୍କରେ ଭଟ୍ଟା: ରାଗରେ ପ୍ରେମିକାର ତଣ୍ଟି କାଟି ନିଜେ ବିଷ ପିଇଲା ପ୍ରେମିକ" + expected: "prema samparkare bhaṭṭā: rāgare premikāra taṇṭi kāṭi nije biṣha piilā premika" + - source: "ହୋଟେଲ, ଲଜ୍‌ରେ ରୁମ୍‌ ମିଳୁନି: ନେତା‌ଙ୍କ ନାଁରେ ଆଗୁଆ ହୋଇଯାଇଛି ବୁକିଂ" + expected: "heāṭela, lajre rum miḷuni: netāṅka nāmre āguā heāiỵāichhhi bukiṃ" + - source: "ପର୍ଯ୍ୟଟକମାନଙ୍କ ନିମନ୍ତେ ନଭେମ୍ବର ୧ରୁ ଖୋଲିବ ଶିମିଳିପାଳ ଅଭୟାରଣ୍ୟ" + expected: "parỵyaṭakamānaṅka nimante nabhembara 1ru kholiba shimiḷipāḷa abhayāraṇya" + - source: "ପାରିବାରିକ ଅଶାନ୍ତିର କରୁଣ ପରିଣତି: କୂଅକୁ ଡେଇଁଲେ ମା’-ଝିଅ, ଝିଅ ମୃତ" + expected: "pāribārika ashāntira karuṇa pariṇati: kūaku ḍeimle mā’-jhia, jhia mṛta" + - source: "‘ଭ୍ରଷ୍ଟାଚାରର ବଂଶବାଦ’ ଏବେ ସାଜିଛି ଦେଶ ପାଇଁ ନୂଆ ସମସ୍ୟା; ପ୍ରଧାନମନ୍ତ୍ରୀ ମୋଦୀ" + expected: "‘bhraṣhṭāchārara baṃshabāda’ ebe sājichhhi desha pāim nūā samasyā; pradhānamantrī modī" + - source: "ପାହାଡ଼ି ଇଲାକାବାସୀଙ୍କ ଆଶାର ବତୀ ‘ପାର୍ବତୀ’" + expected: "pāhāṙi ilākābāsīṅka āshāra batī ‘pārbatī’" + + +map: + + rules: + - pattern: ([କ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'k' + - pattern: ([ଖ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'kh' + - pattern: ([ଗ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'g' + - pattern: ([ଘ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'gh' + - pattern: 
([ଙ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ṅ' + - pattern: ([ଚ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ch' + - pattern: ([ଛ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'chhh' + - pattern: ([ଜ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'j' + - pattern: ([ଝ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'jh' + - pattern: ([ଞ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ñ' + - pattern: ([ଟ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ṭ' + - pattern: ([ଠ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ṭh' + - pattern: ([ଡ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ḍ' + - pattern: ([ଡ଼]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ṙ' + - pattern: ([ଢ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ḍh' + - pattern: ([ଢ଼]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ṙh' + - pattern: ([ଣ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ṇ' + - pattern: ([ତ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 't' + - pattern: ([ଥ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'th' + - pattern: ([ଦ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'd' + - pattern: ([ଧ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'dh' + - pattern: ([ନ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'n' + - pattern: 
([ପ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'p' + - pattern: ([ଫ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ph' + - pattern: ([ବ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'b' + - pattern: ([ଭ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'bh' + - pattern: ([ମ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'm' + - pattern: ([ଯ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ỵ' + - pattern: ([ୟ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'y' + - pattern: ([ର]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'r' + - pattern: ([ଲ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'l' + - pattern: ([ଳ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ḷ' + - pattern: ([ଶ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'sh' + - pattern: ([ଷ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'ṣh' + - pattern: ([ସ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 's' + - pattern: ([ହ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'h' + - pattern: ([କ୍ଷ]=?)(?=[\u0b4d\u0b3e\u0b3f\u0b40\u0b41\u0b42\u0b43\u0b47\u0b48\u0b4b\u0b4c]) + result: 'kṣh' + + characters: + 'ଅ': 'a' + 'ଆ': 'ā' + 'ଇ': 'i' + 'ଈ': 'ī' + 'ଉ': 'u' + 'ଊ': 'ū' + 'ଋ': 'ṛ' + 'ୠ': 'ṝ' + 'ଌ': 'ḻ' + 'ଏ': 'e' + 'ଐ': 'ai' + 'ଓ': 'o' + 'ୱ': 'va' + 'ଔ': 'au' + + # II. 
Consonants (see Note 2) + # Gutturals + 'କ': 'ka' + 'ଖ': 'kha' + 'ଗ': 'ga' + 'ଘ': 'gha' + 'ଙ': 'ṅa' + + # Palatals + 'ଚ': 'cha' + 'ଛ': 'chhha' + 'ଜ': 'ja' + 'ଝ': 'jha' + 'ଞ': 'ña' + + # Cerebrals + 'ଟ': 'ṭa' + 'ଠ': 'ṭha' + 'ଡ': 'ḍa' + 'ଡ଼': 'ṙa' + 'ଢ': 'ḍha' + 'ଢ଼': 'ṙha' + 'ଣ': 'ṇa' + + # Dentals + 'ତ': 'ta' + 'ଥ': 'tha' + 'ଦ': 'da' + 'ଧ': 'dha' + 'ନ': 'na' + + # Labials + 'ପ': 'pa' + 'ଫ': 'pha' + 'ବ': 'ba' + 'ଭ': 'bha' + 'ମ': 'ma' + + # Semivowels + 'ଯ': 'ỵa' + 'ୟ': 'ya' + 'ର': 'ra' + 'ଲ': 'la' + 'ଳ': 'ḷa' + + # Sibilants + 'ଶ': 'sha' + 'ଷ': 'ṣha' + 'ସ': 'sa' + + + # Aspirate + 'ହ': 'ha' + + 'କ୍ଷ': 'kṣha' + + # Chandrabindu + 'ଁ': 'm' + + # Bisarga + 'ଃ': 'ḥ' + + # Anusvāra + 'ଂ': 'ṃ' + + # Medials # Needed for connecting consonants + + 'ା': 'ā' + 'ି': 'i' + 'ୀ': 'ī' + 'ୁ': 'u' + 'ୂ': 'ū' + 'ୃ': 'ṛ' + 'େ': 'e' + 'ୈ': 'ai' + 'ୋ': 'o' + 'ୌ': 'au' + + '्': '' + '୍': '' + '़': '' + '଼': '' + '।': '.' + "‍": ''# Used for joining + "‌": ''# Used for non joining + + # Numbers + + '୦': '0' + '୧': '1' + '୨': '2' + '୩': '3' + '୪': '4' + '୫': '5' + '୬': '6' + '୭': '7' + '୮': '8' + '୯': '9' +