diff --git a/.gitignore b/.gitignore index 7c02ddd1..4497f41f 100644 --- a/.gitignore +++ b/.gitignore @@ -2,3 +2,4 @@ *.swp .vscode/ pkg/ +untracked-files/ diff --git a/maps/bgnpcgn-prs-Arab-Latn-2007.yaml b/maps/bgnpcgn-prs-Arab-Latn-2007.yaml new file mode 100755 index 00000000..bcce5859 --- /dev/null +++ b/maps/bgnpcgn-prs-Arab-Latn-2007.yaml @@ -0,0 +1,492 @@ +--- +authority_id: bgnpcgn +id: 2007 +language: prs # prs stands for Dari (https://iso639-3.sil.org/code/prs&_ga=GA1.2.2054538372.1574092823) +source_script: Arab +destination_script: Latn +name: BGN/PCGN NATIONAL ROMANIZATION SYSTEM FOR AFGHANISTAN -- BGN/PCGN 2007 System +url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693661/ROMANIZATION_FOR_AFGHANISTAN.pdf +creation_date: 2007 +confirmation_date: 2017-11 +description: | + This romanization system agreed by BGN and PCGN in November 2007, + accommodates the linguistic complexity of Afghanistan as manifest in + its geographical names. + + The following tabulation shows the original Perso-Arabic script with + accompanying Unicode value (columns 1a and b), the Yaghoubi + romanization (column 2), the BGN/PCGN romanization with accompanying + Unicode value (columns 3a and b), an English phonetic example (column + 4), and an example toponym (columns 5b and c). + + [The Yaghoubi romanization system was developed in 1959 by + Muzaffarud Din Yaqubi (commonly seen as Yaghoubi). It is a native + official system designed to reflect Afghan names, both Dari and Pashto, + and both pronunciation and genuine linguistic truth.] + + The tables function as both a romanization system for Afghanistan (i.e. 
+ with access to the original script, these tables can be applied to get + a standardized Roman result - moving from columns 1 to 3) and as a + means of converting the available Yaghoubi Roman-script spellings, as they + appear on the Fairchild Aerial Surveys map series, to standard BGN/PCGN + spellings (moving from columns 2 to 3). + + The points used in Arabic to mark short vowels and certain other + diacritical marks are infrequently written in Afghanistan. + Consequently, a reference source may sometimes be required to aid + correct identification of the standard spellings and proper vowels and + elimination of dialectal and idiosyncratic variations. In the interests + of clarity, the example columns show script with vowel pointing from + Arabic to indicate the short vowels that are included alongside the + unpointed form that will usually be encountered. However, it should be + noted that the pronunciation of short vowels will vary. + + Note: it is recommended that a font such as Scheherazade, available + from www.sil.org, which includes the Unicode extended Arabic sub-range, + be used to view this system. [Please note that the identification of a + particular font does not represent an endorsement of any specific + product or manufacturer.] + +notes: + - | + Alif (ا) should be romanized as follows: + + a. Initially, it indicates that the word begins with a vowel or + diphthong; the alif itself is not romanized, but rather the short vowel + it “carries” is romanized; e.g., ميړ أَسَلم ژرَندَه → Mī Aslam Zhrandah + b. When it carries a maddah (آ) (see vowel table, row 6), it represents ā; e.g., آب بَند → Āb Band. + c. Medially and finally it represents ā (see vowel table, row 5); e.g., ماڼۍ → Māṉêy + d. Medially and finally in words of Arabic origin, alif may serve as the bearer of hamzah, e.g. رأس → ra’s. + + - Occasionally the letter sequences سه ,زه ,که, and گه occur without + intervening vowels.
They may be romanized k·h, z·h, s·h, and g·h in + order to differentiate these romanizations from the digraphs kh, zh, + sh, and gh, which are used to represent the letters ش ,ژ ,خ, and غ. + Additionally, the Pashto letters څ and ځ, routinely romanized ts and + dz, may be alternatively romanized t·s and d·z when for special reasons + it is desired that confusion be avoided with the character sequences + تس (ts) and دز (dz), respectively. + + - "The vagaries of written Afghan languages, as pertains to spacing + and word division, are addressed as follows: + Spaces may be added to or subtracted from Afghan words written in + Arabic script, for the purposes of standardization. This is + particularly relevant when the words are hand-written, are rendered + “artistically”, or express other such non-standard flourishes, as long + as the sense of the toponym, word, or phrase is not compromised. + Romanized toponyms are typically divided into constituent words + (spaces and other grammatical rules applied) when those words can stand + independently, for purposes of standardization and minimization of + confusion, particularly in situations where Afghan writers are + inconsistent in their application of spacing and word breaks. When the + Afghan word or suffix is only used in combination with other nouns or + adjectives, then it should be appended to the preceding word in its + romanization. This includes (but is not limited to) -ābād, -zaī, -zā + ah, - ū, -wand, -gaī, -kaī, -pūr, - ēsh, -lar, -lī, -lū and ullāh, as, + for example, seen in Raḩmatābād (رحمت آباد) and Raḩmatullāh (رحمت االله), + but Raḩmat Khēl (رحمتخيل) and Raḩmat Shahr (رحمتشهر)." + + - The one-letter words د (Pashto) and و (Dari) are romanized dê and + wa, respectively. + + - The word الله, meaning God, should always be romanized Allāh, + except as specified in note 3. Note that the Unicode value FDF2 spells + Allāh, but omits the alif in some common fonts, including Times New + Roman.
If in doubt, try in Arial Unicode MS to verify. Also note that + the “dagger alif” ( ٰ ) above the second ل (lām) in the word الله is not + written but should be romanized ā, like a full-size alif. + + - In names of Arabic origin, the l of the definite article al is + assimilated before the ‘sun letters’ t, s̄, d, z̄, r, z, s, sh, ş, ẕ, ţ, z̧, l and n. + In its romanization, the article should be separated from the name it + precedes and should not be capitalized except at the beginning of a + name, e.g. جبل السراج → Jabal + as Sarāj + + - In Arabic names, a shaddah, ّ is used to denote the doubling of a + particular consonant character, e.g. ُم َح َمد → Muḩammad. However, in + Pashto this ‘doubling’ is frequently omitted in both Perso-Arabic script + and the resulting romanization. Guidance on doubling may be taken from + an authoritative names source, such as an Afghan government source or + Pashto dictionary; for example, it is usual to see Ḩājī without and + ‘Abbās with the doubled consonant. The doubled y consonant is almost + always retained, as in Sayyid or Qayyūm. + + - In Afghan names which contain an iẕāfah, it should be romanized as + -e or -ye according to + common pronunciation, but generally, -e is used if the preceding word + ends with a consonant other + than silent heh, and -ye if the preceding word ends with a vowel + sound, e.g. غر ِحصار → Ghar-e + Ḩişār; َقل َع ٔه َنو → Qal‘ah-ye Now. Scholarly sources indicate that + heh is silent in darah and qal‘ah (thus darah-ye, qal‘ah-ye), but + lightly spoken in kōh and chāh (thus kōh-e, chāh-e). + + - The character sequence خو, where followed by ا or ی, should be + romanized khwā or khwī, although the w is either not pronounced, or + only weakly so, as in خواجه → khwājah. + + - Plural nouns ending in -hā or -ān should always be romanized as a + single word, regardless of whether a space appears in a Perso-Arabic + script source.
+ + - Unicode values listed in the tables above are required to ensure + standardization and to minimize confusion from competing + representations of a given character. It should be noted that the + Persian ک (Unicode value 06A9) is + recommended rather than the Arabic ك (Unicode value 0643 or FEDA or FED9), the Persian گ (Unicode + value 06AF) is recommended rather than ګ (Unicode value 06AB) or ڰ + (Unicode value 06B0) or ك (Unicode value 0643 or FEDA or FED9), and the + Pashto character ځ (Unicode value 0681) is recommended rather than the + heh with a dot above and a dot below (no Unicode value). For the letter ی + in its many variations, care must be exercised to follow this romanization + guide's recommendations to eliminate confusion for search engines + and software. BGN/PCGN does not use the Unicode encoding FEEF for the + character ی in any Afghan word. + + - | + An inventory of letter-diacritic combinations in addition to the + unmodified letters of the basic Roman script is: + + ‘ (U+2018) + Ā (U+0100) + Á (U+00C1) + Ḏ (U+0044+0331) + Ē (U+0112) + Ê (U+00CA) + Ḩ (U+1E28) + Ī (U+012A) + N-bar-top (U+004E+0304) + Ō (U+014C) + R-bar-bottom (U+0052+0331) + Ş (U+015E) + S-bar-top (U+0053+0304) + Ṯ (U+0054+0331) + Ţ (U+0162) + Ū (U+016A) + Z-comma-bottom (U+005A+0327) + Z-bar-top (U+005A+0304) + Ẕ (U+005A+0331) + ẔH (U+005A+0048+035F) + + + ’ (U+2019) + ā (U+0101) + á (U+00E1) + ḏ (U+0064+0331) + ē (U+0113) + ê (U+00EA) + ḩ (U+1E29) + ī (U+012B) + n-bar-top (U+006E+0304) + ō (U+014D) + r-bar-bottom (U+0072+0331) + ş (U+015F) + s-bar-top (U+0073+0304) + ṯ (U+0074+0331) + ţ (U+0163) + ū (U+016B) + z-comma-bottom (U+007A+0327) + z-bar-top (U+007A+0304) + ẕ (U+007A+0331) + zh-under-bar (U+007A+0068+035F) + + + - The Romanization columns show only lowercase forms but, when + romanizing, uppercase and lowercase Roman letters as appropriate should + be used.
+ + +tests: + - source: بَغْلان + expected: Baghlān + + - source: پوټكى + expected: Pōṯakay + + - source: شِرين تَگَاب + expected: Shīrīn Tagāb + + - source: کُوْټ + expected: Kōṯ + + - source: ثَابِر + expected: Sā̄bir + + - source: جَلال آبَاد + expected: Jalālābād + + - source: چَاريكَار + expected: Chārīkār + + - source: سُلْطَان حَضْرَتِ + expected: Ḩaẕrat-e Sulţān + + - source: خُوْسْت + expected: Khōst + + - source: ځَدْرَاڼ + expected: Dzadrāṉ + + - source: څَوْآۍ + expected: Tsowkêy + + - source: سْپِين بُوْلْدَک + expected: Spīn Bōldak + + - source: ډَنْډ وَ پَتَان + expected: Ḏanḏ wa Patān + + - source: گُذَرْگَاهٔ نُور + expected: Guz̄argāh-e Nūr + + - source: آَنْدَهَار + expected: Kandahār + + - source: اَنْدَړ + expected: Andaṟ + + - source: آُنْدُز + expected: Kunduz + + - source: ژْرَنْدَه مِيراَسْلَم + expected: Mīr Aslam Zhrandah + + - source: ږِيَره + expected: Zh̲ī̲rah + + - source: سَمَنْگَان + expected: Samangān + + - source: مَزَارِ شَريف + expected: Mazār-e Sharīf + + - source: آښتَه آَلا + expected: Ks̲h̲êtah Kalā + + - source: قَيْصَار + expected: Qayşār + + - source: فَيْض آبَاد + expected: Faīẕābād + + - source: سُلْطَان حَضْرَتِ + expected: Ḩaẕrat-e Sulţān + + - source: ظَاهِر آَلا + expected: Zā̧hir Kalā + + - source: پُلِ عَلَم + expected: Pul-e ‘Alam + + - source: غَزْنِى + expected: Ghaznī + + - source: مَزَارِ شَريف + expected: Mazār-e Sharīf + + - source: قَيْصَار + expected: Qayşār + + - source: آَنْدَهَار + expected: Kandahār + + - source: گَرْدېز + expected: Gardēz + + - source: کَابُل + expected: Kābul + + - source: مَيمَنَه + expected: Maīmanah + + - source: خَان آبَاد + expected: Khānābād + + - source: مَاڼۍ + expected: Māṉêy + + - source: وَاخَان + expected: Wākhān + + - source: هِرَات + expected: Herāt + + - source: يَنْگِی قَلعَه + expected: Yangī Qal‘ah + + - source: جَلال آبَاد + expected: Jalālābād + + - source: پُلِ حِصَار هِرات + expected: Herāt, Pul-e Ḩişār + + - source: کَابُل مُرْغَاب + expected: Murghāb, Kābul + 
+ - source: گردُون + expected: Gêrdōn + + - source: آب بَنْد + expected: Āb Band + + - source: بُوْلْدَک سْپِين + expected: Spīn Bōldak + + - source: بَالا بُلُوک + expected: Bālā Bulūk + + - source: جَوزجَان + expected: Jowzjān + + - source: ، سْپِين غَزْنِى + expected: Ghaznī, Spīn + + - source: ، ريگ مَيوَنْد + expected: Maywand, Rēg + + - source: گَرْدېز + expected: Gardēz + + - source: مَيدان شَهْر + expected: Maīdān Shahr + + - source: ډَنْډِ سُفْلىٰ + expected: Ḏanḏ-e Suflá + + - source: څَوْآۍ + expected: Tsowkêy + + - source: هَوائِى ډَگَر + expected: Hawā’ī D̲agar + + - source: شَريف مَزارِ + expected: Mazār-e Sharīf + + - source: دايکندی + expected: Dāykundī + + - source: زيارت + expected: Zīārat + + - source: غوريان + expected: Ghōriyān + + - source: ميا + expected: Myā + +map: + characters: + + # These characters are not available with a single Unicode + # codepoint, so cannot be displayed here. When typing, the independent + # character’s codepoint will automatically display the the appropriate + # word-medial or word-final form where so appearing in a word. + '\u0627': '-' + + '\u0628': 'b' + '\u067E': 'p' + '\u062A': 't' + '\u067C': 'ṯ' + '\u062B': '\u0073\u0304' + '\u062C': 'j' + '\u0686': 'ch' + + # The variant form ج is seen infrequently and does not have a + # single Unicode encoding. 
+ '\u0681': 'dz' # Note 2 + '\u0685': 'ts' # Note 2 + + '\u062D': 'ḩ' + '\u062E': 'kh' + '\u062F': 'd' + '\u0689': 'ḏ' + '\u0630': '\u007A\u0304' + '\u0631': 'r' + '\u0693': '\u1E5F' + '\u0632': 'z' + '\u0698': 'zh' + '\u0696': '\u007A\u0332\u0068\u0332' + '\u0633': 's' + '\u0634': 'sh' + '\u069A': '\u0073\u0332\u0068\u0332' + '\u0635': 'ş' + '\u0636': 'ẕ' + '\u0637': 'ţ' + '\u0638': '\u007A\u0327' + '\u0639': '‘' + '\u063A': 'gh' + '\u0641': 'f' + '\u0642': 'q' + '\u06A9': 'k' + '\u06AF': 'g' + '\u0644': 'l' + '\u0645': 'm' + '\u0646': 'n' + '\u06BC': 'ṉ' + '\u0648': 'w' + '\u0647': 'h' + '\u0649': 'y' + + # Vowel, Diphthong and Diacritical Characters + + '\u064E': 'a' + + # Both e and i are available to romanize this short vowel, + # depending on local usage and/or root language. In cases where the sound + # is uncertain, i is the default romanization in BGN/PCGN standardization + # procedures. + '\u0650': + - 'e' + - 'i' + + # Both o and u are available to romanize this short vowel, + # depending on local usage and/or root language. In cases where the sound + # is uncertain, u is the default romanization in BGN/PCGN standardization + # procedures. + '\u064F': + - 'o' + - 'u' + '\u0659': 'ê' + + # An alif with mad ( آ ) is written only in the initial position by + # BGN/PCGN standardization procedures, in keeping with Persian language + # family standards of use of the Arabic alphabet. The same letter written + # in a medial or final position is written . . . + '\u0622': 'ā' + + '\u0648': 'ō' + '\u0648': 'ū' + '\u0648': '\u006F\u0077' + '\u06CC': 'ī' + + # Or 'ē'. The character ی should be romanized ay or ē according to + # its root language or local pronunciation. In case of uncertainty a + # reference source (such as the Fairchild Aerial Surveys map series, or a + # BGN/PCGN approved policy document/list of recommended spellings) should + # be consulted. + '\u06CC': 'ay' + '\u06D0': 'ē' + + # Or 'aī'. 
Both the combination ay and aī are available to romanize + # this character according to its root language or local pronunciation. + # In cases where the sound is uncertain ay is the default romanization in + # BGN/PCGN standardization procedures + '\u06CC': + - 'ay' + - 'á' + '\u06CD': 'êy' + '\u0621': '’' + '\u0674': + - '-e' + - '-ye' + + # Other Diacritical Marks and Language Conventions + + '\u0627': 'āy' + + '\u0648': 'w' + '\u0626': '’' + '\u06C0': '' + '\u0651': '' + '\uFDF2': 'Allāh' # See note 5 diff --git a/maps/bgnpcgn-prs-Arab-Latn-yaghoubi.yaml b/maps/bgnpcgn-prs-Arab-Latn-yaghoubi.yaml new file mode 100644 index 00000000..609d6019 --- /dev/null +++ b/maps/bgnpcgn-prs-Arab-Latn-yaghoubi.yaml @@ -0,0 +1,335 @@ +--- +authority_id: bgnpcgn +id: yaghoubi +language: prs # prs stands for Dari (https://iso639-3.sil.org/code/prs&_ga=GA1.2.2054538372.1574092823) +source_script: Arab +destination_script: Latn +name: BGN/PCGN NATIONAL ROMANIZATION SYSTEM FOR AFGHANISTAN -- BGN/PCGN 2007 System (Yaghoubi) +url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693661/ROMANIZATION_FOR_AFGHANISTAN.pdf +creation_date: 2007 +confirmation_date: 2017-11 +description: | + This romanization system agreed by BGN and PCGN in November 2007, + accommodates the linguistic complexity of Afghanistan as manifest in + its geographical names. + + The following tabulation shows the original Perso-Arabic script with + accompanying Unicode value (columns 1a and b), the Yaghoubi + romanization (column 2), the BGN/PCGN romanization with accompanying + Unicode value (columns 3a and b), an English phonetic example (column + 4), and an example toponym (columns 5b and c). + + [The Yaghoubi romanization system was developed in 1959 by + Muzaffarud Din Yaqubi (commonly seen as Yaghoubi). It is a native + official system designed to reflect Afghan names, both Dari and Pashto, + and both pronunciation and genuine linguistic truth.] 
+ + The tables function as both a romanization system for Afghanistan (i.e. + with access to the original script, these tables can be applied to get + a standardized Roman result - moving from columns 1 to 3) and as a + means of converting the available Yaghoubi Roman-script spellings, as they + appear on the Fairchild Aerial Surveys map series, to standard BGN/PCGN + spellings (moving from columns 2 to 3). + + The points used in Arabic to mark short vowels and certain other + diacritical marks are infrequently written in Afghanistan. + Consequently, a reference source may sometimes be required to aid + correct identification of the standard spellings and proper vowels and + elimination of dialectal and idiosyncratic variations. In the interests + of clarity, the example columns show script with vowel pointing from + Arabic to indicate the short vowels that are included alongside the + unpointed form that will usually be encountered. However, it should be + noted that the pronunciation of short vowels will vary. + + Note: it is recommended that a font such as Scheherazade, available + from www.sil.org, which includes the Unicode extended Arabic sub-range, + be used to view this system. [Please note that the identification of a + particular font does not represent an endorsement of any specific + product or manufacturer.] + +notes: + - | + Alif (ا) should be romanized as follows: + + a. Initially, it indicates that the word begins with a vowel or + diphthong; the alif itself is not romanized, but rather the short vowel + it “carries” is romanized; e.g., ميړ أَسَلم ژرَندَه → Mī Aslam Zhrandah + b. When it carries a maddah (آ) (see vowel table, row 6), it + represents ā; e.g., آب بَند → Āb Band. + c. Medially and finally it represents ā (see vowel table, row 5); + e.g., ماڼۍ → Māṉêy + d. Medially and finally in words of Arabic origin, alif may serve + as the bearer of hamzah, e.g. رأس → ra’s.
+ + - Occasionally the letter sequences سه ,زه ,که, and گه occur without + intervening vowels. They may be romanized k·h, z·h, s·h, and g·h in + order to differentiate these romanizations from the digraphs kh, zh, + sh, and gh, which are used to represent the letters ش ,ژ ,خ, and غ. + Additionally, the Pashto letters څ and ځ, routinely romanized ts and + dz, may be alternatively romanized t·s and d·z when for special reasons + it is desired that confusion be avoided with the character sequences + تس (ts) and دز (dz), respectively. + + - "The vagaries of written Afghan languages, as pertains to spacing + and word division, are addressed as follows: + Spaces may be added to or subtracted from Afghan words written in + Arabic script, for the purposes of standardization. This is + particularly relevant when the words are hand-written, are rendered + “artistically”, or express other such non-standard flourishes, as long + as the sense of the toponym, word, or phrase is not compromised. + Romanized toponyms are typically divided into constituent words + (spaces and other grammatical rules applied) when those words can stand + independently, for purposes of standardization and minimization of + confusion, particularly in situations where Afghan writers are + inconsistent in their application of spacing and word breaks. When the + Afghan word or suffix is only used in combination with other nouns or + adjectives, then it should be appended to the preceding word in its + romanization. This includes (but is not limited to) -ābād, -zaī, -zā + ah, - ū, -wand, -gaī, -kaī, -pūr, - ēsh, -lar, -lī, -lū and ullāh, as, + for example, seen in Raḩmatābād (رحمت آباد) and Raḩmatullāh (رحمت االله), + but Raḩmat Khēl (رحمتخيل) and Raḩmat Shahr (رحمتشهر)." + + - The one-letter words د (Pashto) and و (Dari) are romanized dê and + wa, respectively. + + - The word الله, meaning God, should always be romanized Allāh, + except as specified in note 3.
Note that the Unicode value FDF2 spells + Allāh, but omits the alif in some common fonts, including Times New + Roman. If in doubt, try in Arial Unicode MS to verify. Also note that + the “dagger alif” ( ٰ ) above the second ل (lām) in the word الله is not + written but should be romanized ā, like a full-size alif. + + - In names of Arabic origin, the l of the definite article al is + assimilated before the ‘sun letters’ t, s̄, d, z̄, r, z, s, sh, ş, ẕ, ţ, z̧, l and n. + In its romanization, the article should be separated from the name it + precedes and should not be capitalized except at the beginning of a + name, e.g. جبل السراج → Jabal + as Sarāj + + - In Arabic names, a shaddah, ّ is used to denote the doubling of a + particular consonant character, e.g. ُم َح َمد → Muḩammad. However, in + Pashto this ‘doubling’ is frequently omitted in both Perso-Arabic script + and the resulting romanization. Guidance on doubling may be taken from + an authoritative names source, such as an Afghan government source or + Pashto dictionary; for example, it is usual to see Ḩājī without and + ‘Abbās with the doubled consonant. The doubled y consonant is almost + always retained, as in Sayyid or Qayyūm. + + - In Afghan names which contain an iẕāfah, it should be romanized as + -e or -ye according to + common pronunciation, but generally, -e is used if the preceding word + ends with a consonant other + than silent heh, and -ye if the preceding word ends with a vowel + sound, e.g. غر ِحصار → Ghar-e + Ḩişār; َقل َع ٔه َنو → Qal‘ah-ye Now. Scholarly sources indicate that + heh is silent in darah and qal‘ah (thus darah-ye, qal‘ah-ye), but + lightly spoken in kōh and chāh (thus kōh-e, chāh-e). + + - The character sequence خو, where followed by ا or ی, should be + romanized khwā or khwī, although the w is either not pronounced, or + only weakly so, as in خواجه → khwājah.
+ + - Plural nouns ending in -hā or -ān should always be romanized as a + single word, regardless of whether a space appears in a Perso-Arabic + script source. + + - Unicode values listed in the tables above are required to ensure + standardization and to minimize confusion from competing + representations of a given character. It should be noted that the + Persian ک (Unicode value 06A9) is + recommended rather than the Arabic ك (Unicode value 0643 or FEDA or FED9), the Persian گ (Unicode + value 06AF) is recommended rather than ګ (Unicode value 06AB) or ڰ + (Unicode value 06B0) or ك (Unicode value 0643 or FEDA or FED9), and the + Pashto character ځ (Unicode value 0681) is recommended rather than the + heh with a dot above and a dot below (no Unicode value). For the letter ی + in its many variations, care must be exercised to follow this romanization + guide's recommendations to eliminate confusion for search engines + and software. BGN/PCGN does not use the Unicode encoding FEEF for the + character ی in any Afghan word.
+ + - | + An inventory of letter-diacritic combinations in addition to the + unmodified letters of the basic Roman script is: + + ‘ (U+2018) + Ā (U+0100) + Á (U+00C1) + Ḏ (U+0044+0331) + Ē (U+0112) + Ê (U+00CA) + Ḩ (U+1E28) + Ī (U+012A) + N-bar-top (U+004E+0304) + Ō (U+014C) + R-bar-bottom (U+0052+0331) + Ş (U+015E) + S-bar-top (U+0053+0304) + Ṯ (U+0054+0331) + Ţ (U+0162) + Ū (U+016A) + Z-comma-bottom (U+005A+0327) + Z-bar-top (U+005A+0304) + Ẕ (U+005A+0331) + ẔH (U+005A+0048+035F) + + + ’ (U+2019) + ā (U+0101) + á (U+00E1) + ḏ (U+0064+0331) + ē (U+0113) + ê (U+00EA) + ḩ (U+1E29) + ī (U+012B) + n-bar-top (U+006E+0304) + ō (U+014D) + r-bar-bottom (U+0072+0331) + ş (U+015F) + s-bar-top (U+0073+0304) + ṯ (U+0074+0331) + ţ (U+0163) + ū (U+016B) + z-comma-bottom (U+007A+0327) + z-bar-top (U+007A+0304) + ẕ (U+007A+0331) + zh-under-bar (U+007A+0068+035F) + + - The Romanization columns show only lowercase forms but, when + romanizing, uppercase and lowercase Roman letters as appropriate should + be used. + + +tests: + - source: بَغْلان + expected: Baghlān + - source: پوټكى + expected: Pōṯakay + - source: شِرين تَگَاب + expected: Shīrīn Tagāb + - source: کُوْټ + expected: Kōṯ + - source: ثَابِر + expected: Sā̄bir + +map: + characters: + + # These characters are not available with a single Unicode + # codepoint, so cannot be displayed here. When typing, the independent + # character’s codepoint will automatically display the appropriate + # word-medial or word-final form where so appearing in a word. + '\u0627': '-' + + '\u0628': 'b' + '\u067E': 'p' + '\u062A': '\u0074\u0304' + '\u067C': 't' + '\u062B': '\u0073\u0304' + '\u062C': 'j' + '\u0686': 'č' + + # The variant form ج is seen infrequently and does not have a single Unicode encoding.
+ '\u0681': '\u006A\u0304' # Note 2 + '\u0685': 'c' # Note 2 + + '\u062D': 'ẖ' + '\u062E': 'kh' + '\u062F': 'ḏ' + '\u0689': 'd' + '\u0630': '\u007A\u0304' + '\u0631': 'ṟ' + '\u0693': 'r' + '\u0632': 'z' + '\u0698': 'ž' + '\u0696': '\u017E\u0332' + '\u0633': 's' + '\u0634': 'š' + '\u069A': '\u0161\u0332' + '\u0635': '\u0073\u0332' + '\u0636': '\u0064\u0332\u007A' + '\u0637': 'ṯ' + '\u0638': 'ẕ' + '\u0639': '’' + '\u063A': 'gh' + '\u0641': 'f' + '\u0642': 'q' + '\u06A9': 'k' + '\u06AF': 'g' + '\u0644': 'l' + '\u0645': 'm' + '\u0646': 'n' + '\u06BC': 'ṉ' + '\u0648': 'w' + '\u0647': 'h' + '\u0649': 'y' + + # Vowel, Diphthong and Diacritical Characters + + '\u064E': + - 'a' + - 'â' + + # Both e and i are available to romanize this short vowel, + # depending on local usage and/or root language. In cases where the sound + # is uncertain, i is the default romanization in BGN/PCGN standardization + # procedures. + '\u0650': + - 'e' + - 'i' + + # Both o and u are available to romanize this short vowel, + # depending on local usage and/or root language. In cases where the sound + # is uncertain, u is the default romanization in BGN/PCGN standardization + # procedures. + '\u064F': + - 'o' + - 'u' + + '\u0659': + - 'ə' + - 'ê' + '\u0622': 'ā' + + # An alif with mad ( آ ) is written only in the initial position by + # BGN/PCGN standardization procedures, in keeping with Persian language + # family standards of use of the Arabic alphabet. The same letter written + # in a medial or final position is written . . . + '\u0648': 'ō' + + '\u0648': + - 'u' + - 'ū' + + '\u0648': 'aw' # Or 'āw' + '\u06CC': 'i' # Or 'ī' + + # Or 'ē'. The character ی should be romanized ay or ē according to + # its root language or local pronunciation. In case of uncertainty a + # reference source (such as the Fairchild Aerial Surveys map series, or a + # BGN/PCGN approved policy document/list of recommended spellings) should + # be consulted. 
+ '\u06CC': 'ay' + + '\u06D0': 'ē' # Or 'ay' + '\u06CC': 'ay' # Or 'āy'. + + # Both the combination ay and aī are available to romanize this + # character according to its root language or local pronunciation. In + # cases where the sound is uncertain ay is the default romanization in + # BGN/PCGN standardization procedures + '\u06CC': 'ā' + + '\u06CD': 'ə y' # Or 'ay' + '\u0621': '’' + '\u0674': + - '-i-' + - 'e' + - 'ī' + + # Other Diacritical Marks and Language Conventions + + '\u0627': 'ay' # Or 'āy' + '\u06CC': 'ya' # Or 'yā' + + '\u0648': 'w' + '\u06C0': '. . .h-e' diff --git a/reference-docs/bgn-pcgn/ROMANIZATION_FOR_AFGHANISTAN.pdf b/reference-docs/bgn-pcgn/ROMANIZATION_FOR_AFGHANISTAN.pdf new file mode 100644 index 00000000..6afb9f2e Binary files /dev/null and b/reference-docs/bgn-pcgn/ROMANIZATION_FOR_AFGHANISTAN.pdf differ
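
Review note: both new maps repeat some keys under `characters` (for example `'\u0648'` and `'\u06CC'` appear several times, and `'\u0627'` twice). YAML mappings are expected to have unique keys, and some loaders silently keep only the last value for a repeated key, so the earlier entries may never be applied. The following is a minimal sketch for spotting such repeats, assuming the map file is read as plain text; the helper name `duplicate_keys` is hypothetical and not part of this repository.

```python
import re
from collections import Counter

# Matches a quoted \uXXXX key at the start of a mapping line, e.g. '\u0648': 'w'
KEY_RE = re.compile(r"^\s*'\\u([0-9A-Fa-f]{4})'\s*:")


def duplicate_keys(yaml_text):
    """Return {codepoint_hex: count} for every quoted \\uXXXX key seen more than once."""
    counts = Counter()
    for line in yaml_text.splitlines():
        match = KEY_RE.match(line)
        if match:
            # Normalise the hex digits so '\u06cc' and '\u06CC' count as one key.
            counts[match.group(1).upper()] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical excerpt in the same shape as the maps added in this diff.
    sample = r"""
    '\u0648': 'w'
    '\u0647': 'h'
    '\u0648': 'ō'
    '\u0648': 'ū'
    """
    print(duplicate_keys(sample))
```

To check a real file, pass its contents, e.g. `duplicate_keys(open(path, encoding="utf-8").read())`. The scan is line-based rather than a full YAML parse, so it only sees keys written in the quoted `'\uXXXX'` form used in these maps.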