diff --git a/.gitignore b/.gitignore
index 7c02ddd1..4497f41f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,3 +2,4 @@
 *.swp
 .vscode/
 pkg/
+untracked-files/
diff --git a/maps/bgnpcgn-prs-Arab-Latn-2007.yaml b/maps/bgnpcgn-prs-Arab-Latn-2007.yaml
new file mode 100755
index 00000000..bcce5859
--- /dev/null
+++ b/maps/bgnpcgn-prs-Arab-Latn-2007.yaml
@@ -0,0 +1,492 @@
+---
+authority_id: bgnpcgn
+id: 2007
+language: prs # prs stands for Dari (https://iso639-3.sil.org/code/prs)
+source_script: Arab
+destination_script: Latn
+name: BGN/PCGN NATIONAL ROMANIZATION SYSTEM FOR AFGHANISTAN -- BGN/PCGN 2007 System
+url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693661/ROMANIZATION_FOR_AFGHANISTAN.pdf
+creation_date: 2007
+confirmation_date: 2017-11
+description: |
+  This romanization system, agreed by BGN and PCGN in November 2007,
+  accommodates the linguistic complexity of Afghanistan as manifest in
+  its geographical names.
+
+  The following tabulation shows the original Perso-Arabic script with
+  accompanying Unicode value (columns 1a and b), the Yaghoubi
+  romanization (column 2), the BGN/PCGN romanization with accompanying
+  Unicode value (columns 3a and b), an English phonetic example (column
+  4), and an example toponym (columns 5b and c).
+
+  [The Yaghoubi romanization system was developed in 1959 by
+  Muzaffarud Din Yaqubi (commonly seen as Yaghoubi). It is a native
+  official system designed to reflect Afghan names, both Dari and Pashto,
+  and both pronunciation and genuine linguistic truth.]
+
+  The tables function as both a romanization system for Afghanistan (i.e.
+  with access to the original script, these tables can be applied to get
+  a standardized Roman result - moving from columns 1 to 3) and as a
+  means of converting the available Yaghoubi Roman-script spellings, as
+  appear on the Fairchild Aerial Surveys map series, to standard BGN/PCGN
+  spellings (moving from columns 2 to 3).
+
+  The points used in Arabic to mark short vowels and certain other
+  diacritical marks are infrequently written in Afghanistan.
+  Consequently, a reference source may sometimes be required to aid
+  correct identification of the standard spellings and proper vowels and
+  elimination of dialectal and idiosyncratic variations. In the interests
+  of clarity, the example columns show script with vowel pointing from
+  Arabic to indicate the short vowels that are included alongside the
+  unpointed form that will usually be encountered. However it should be
+  noted that the pronunciation of short vowels will vary.
+
+  Note: it is recommended that a font such as Scheherazade, available
+  from www.sil.org, which includes the Unicode extended Arabic sub-range,
+  be used to view this system. [Please note that the identification of a
+  particular font does not represent an endorsement of any specific
+  product or manufacturer.]
+
+notes:
+  - |
+    Alif (ا) should be romanized as follows:
+
+    a. Initially, it indicates that the word begins with a vowel or
+      diphthong; the alif itself is not romanized, but rather the short vowel
+      it “carries” is romanized; e.g., ميړ أَسَلم ژرَندَه → Mīr Aslam Zhrandah
+    b. When it carries a maddah (آ) (see vowel table, row 6), it represents ā; e.g., آب بَند → Āb Band.
+    c. Medially and finally it represents ā (see vowel table, row 5); e.g., ماڼۍ → Māṉêy
+    d. Medially and finally in words of Arabic origin, alif may serve as the bearer of hamzah, e.g. رأس → ra’s.
+
+  - Occasionally the letter sequences سه ,زه ,که, and گه occur without
+    intervening vowels. They may be romanized k·h, z·h, s·h, and g·h in
+    order to differentiate these romanizations from the digraphs kh, zh,
+    sh, and gh, which are used to represent the letters ش ,ژ ,خ, and غ.
+    Additionally, the Pashto letters څ and ځ, routinely romanized ts and
+    dz, may be alternatively romanized s and z when for special reasons
+    it is desired that confusion be avoided with the character sequences
+    تس (ts) and دز (dz), respectively.
+
+  - "The vagaries of written Afghan languages, as pertains to spacing
+    and word division, are addressed as follows:
+    Spaces may be added to or subtracted from Afghan words written in
+    Arabic script, for the purposes of standardization. This is
+    particularly relevant when the words are hand-written, are rendered
+    “artistically”, or express other such non-standard flourishes, as long
+    as the sense of the toponym, word, or phrase is not compromised.
+    Romanized toponyms are typically divided into constituent words
+    (spaces and other grammatical rules applied) when those words can stand
+    independently, for purposes of standardization and minimization of
+    confusion, particularly in situations where Afghan writers are
+    inconsistent in their application of spacing and word breaks. When the
+    Afghan word or suffix is only used in combination with other nouns or
+    adjectives, then it should be appended to the preceding word in its
+    romanization. This includes (but is not limited to) -ābād, -zaī,
+    -zādah, - ū, -wand, -gaī, -kaī, -pūr, - ēsh, -lar, -lī, -lū and ullāh, as,
+    for example, seen in Raḩmatābād (رحمت آباد) and Raḩmatullāh (رحمت االله),
+    but Raḩmat Khēl (رحمتخيل) and Raḩmat Shahr (رحمتشهر)."
+
+  - The one-letter words د (Pashto) and و (Dari) are romanized dê and
+    wa, respectively.
+
+  - The word الله, meaning God, should always be romanized Allāh,
+    except as specified in note 3. Note that the Unicode value FDF2 spells
+    Allāh, but omits the alif in some common fonts, including Times New
+    Roman. If in doubt, try in Arial Unicode MS to verify. Also note that
+    the “dagger alif” ( ) above the second ل (lām) in the word الله, is not
+    written but should be romanized ā, like a full-size alif.
+
+  - In names of Arabic origin, the l of the definite article al is
+    assimilated before the ‘sun letters’ t, s̄, d, z̄, r, z, s, sh, ş, ẕ,
+    ţ, z̧, l and n. In its romanization, the article should be separated
+    from the name it precedes and should not be capitalized except at
+    the beginning of a name, e.g.
+    جبل السراج → Jabal as Sarāj
+
+  - In Arabic names, a shaddah, ّ is used to denote the doubling of a
+    particular consonant character, e.g. ُم َح َمد → Muḩammad. However, in
+    Pashto this ‘doubling’ is frequently omitted in both Perso-Arabic script
+    and the resulting romanization. Guidance on doubling may be taken from
+    an authoritative names source, such as an Afghan government source or
+    Pashto dictionary; for example, it is usual to see Ḩājī without and
+    ‘Abbās with the doubled consonant. The doubled y consonant is almost
+    always retained, as in Sayyid or Qayyūm.
+
+  - In Afghan names which contain an iẕāfah, it should be romanized as
+    -e or -ye according to common pronunciation, but generally, -e is
+    used if the preceding word ends with a consonant other than silent
+    heh, and -ye if the preceding word ends with a vowel sound, e.g.
+    غر ِحصار → Ghar-e Ḩişār;
+    َقل َع ٔه َنو → Qal‘ah-ye Now.
+    Scholarly sources indicate that heh is silent in darah and qal‘ah
+    (thus darah-ye, qal‘ah-ye), but lightly spoken in kōh and chāh
+    (thus kōh-e, chāh-e).
+
+  - The character sequence خو, where followed by ا or ی should be
+    romanized khwā or khwī, although the w is either not pronounced, or
+    only weakly so, as in خواجه → khwājah.
+
+  - Plural nouns ending in -hā or -ān should always be romanized as a
+    single word, regardless of whether a space appears in a Perso-Arabic
+    script source.
+
+  - Unicode values listed in the tables above are required to ensure
+    standardization and to minimize confusion from competing
+    representations of a given character. It should be noted that the
+    Persian ک (Unicode value 06A9) is recommended rather than the Arabic
+    ك (Unicode value 0643 or FEDA or FED9), the Persian گ (Unicode
+    value 06AF) is recommended rather than ګ (Unicode value 06AB) or ڰ
+    (Unicode value 06B0) or ك (Unicode value 0643 or FEDA or FED9), and the
+    Pashto character ځ (Unicode value 0681) is recommended rather than the
+    heh with a dot above and a dot below (no Unicode value). For the letter ی
+    in its many variations, care must be exercised to follow this romanization
+    guide's recommendations to eliminate confusion for search engines
+    and software. BGN/PCGN does not use the Unicode encoding FEEF for the
+    character ی in any Afghan word.
+
+  - |
+    An inventory of letter-diacritic combinations in addition to the
+    unmodified letters of the basic Roman script is:
+
+    ‘ (U+2018)
+    Ā (U+0100)
+    Á (U+00C1)
+    Ḏ (U+0044+0331)
+    Ē (U+0112)
+    Ê (U+00CA)
+    Ḩ (U+1E28)
+    Ī (U+012A)
+    N-bar-top (U+004E+0304)
+    Ō (U+014C)
+    R-bar-bottom (U+0052+0331)
+    Ş (U+015E)
+    S-bar-top (U+0053+0304)
+    Ṯ (U+0054+0331)
+    Ţ (U+0162)
+    Ū (U+016A)
+    Z-comma-bottom (U+005A+0327)
+    Z-bar-top (U+005A+0304)
+    Ẕ (U+005A+0331)
+    ẔH (U+005A+0048+035F)
+
+
+    ʼ (U+2019)
+    ā (U+0101)
+    á (U+00E1)
+    ḏ (U+0064+0331)
+    ē (U+0113)
+    ê (U+00EA)
+    ḩ (U+1E29)
+    ī (U+012B)
+    n-bar-top (U+006E+0304)
+    ō (U+014D)
+    r-bar-bottom (U+0072+0331)
+    ş (U+015F)
+    s-bar-top (U+0073+0304)
+    ṯ (U+0074+0331)
+    ţ (U+0163)
+    ū (U+016B)
+    z-comma-bottom (U+007A+0327)
+    z-bar-top (U+007A+0304)
+    ẕ (U+007A+0331)
+    zh-under-bar (U+007A+0068+035F)
+
+
+  - The Romanization columns show only lowercase forms but, when
+    romanizing, uppercase and lowercase Roman letters as appropriate should
+    be used.
+
+
+tests:
+  - source: بَغْلان
+    expected: Baghlān
+
+  - source: پوټكى
+    expected: Pōṯakay
+
+  - source: شِرين تَگَاب
+    expected: Shīrīn Tagāb
+
+  - source: کُوْټ
+    expected: Kōṯ
+
+  - source: ثَابِر
+    expected: S̄ābir
+
+  - source: جَلال آبَاد
+    expected: Jalālābād
+
+  - source: چَاريكَار
+    expected: Chārīkār
+
+  - source: سُلْطَان حَضْرَتِ
+    expected: Ḩaẕrat-e Sulţān
+
+  - source: خُوْسْت
+    expected: Khōst
+
+  - source: ځَدْرَاڼ
+    expected: Dzadrāṉ
+
+  - source: څَوْآۍ
+    expected: Tsowkêy
+
+  - source: سْپِين بُوْلْدَک
+    expected: Spīn Bōldak
+
+  - source: ډَنْډ وَ پَتَان
+    expected: Ḏanḏ wa Patān
+
+  - source: گُذَرْگَاهٔ نُور
+    expected: Guz̄argāh-e Nūr
+
+  - source: آَنْدَهَار
+    expected: Kandahār
+
+  - source: اَنْدَړ
+    expected: Andaṟ
+
+  - source: آُنْدُز
+    expected: Kunduz
+
+  - source: ژْرَنْدَه مِيراَسْلَم
+    expected: Mīr Aslam Zhrandah
+
+  - source: ږِيَره
+    expected: Z̲h̲īrah
+
+  - source: سَمَنْگَان
+    expected: Samangān
+
+  - source: مَزَارِ شَريف
+    expected: Mazār-e Sharīf
+
+  - source: آښتَه آَلا
+    expected: Ks̲h̲êtah Kalā
+
+  - source: قَيْصَار
+    expected: Qayşār
+
+  - source: فَيْض آبَاد
+    expected: Faīẕābād
+
+  - source: سُلْطَان حَضْرَتِ
+    expected: Ḩaẕrat-e Sulţān
+
+  - source: ظَاهِر آَلا
+    expected: Z̧āhir Kalā
+
+  - source: پُلِ عَلَم
+    expected: Pul-e ‘Alam
+
+  - source: غَزْنِى
+    expected: Ghaznī
+
+  - source: مَزَارِ شَريف
+    expected: Mazār-e Sharīf
+
+  - source: قَيْصَار
+    expected: Qayşār
+
+  - source: آَنْدَهَار
+    expected: Kandahār
+
+  - source: گَرْدېز
+    expected: Gardēz
+
+  - source: کَابُل
+    expected: Kābul
+
+  - source: مَيمَنَه
+    expected: Maīmanah
+
+  - source: خَان آبَاد
+    expected: Khānābād
+
+  - source: مَاڼۍ
+    expected: Māṉêy
+
+  - source: وَاخَان
+    expected: Wākhān
+
+  - source: هِرَات
+    expected: Herāt
+
+  - source: يَنْگِی قَلعَه
+    expected: Yangī Qal‘ah
+
+  - source: جَلال آبَاد
+    expected: Jalālābād
+
+  - source: پُلِ حِصَار هِرات
+    expected: Herāt, Pul-e Ḩişār
+
+  - source: کَابُل مُرْغَاب
+    expected: Murghāb, Kābul
+
+  - source: گردُون
+    expected: Gêrdōn
+
+  - source: آب بَنْد
+    expected: Āb Band
+
+  - source: بُوْلْدَک سْپِين
+    expected: Spīn Bōldak
+
+  - source: بَالا بُلُوک
+    expected: Bālā Bulūk
+
+  - source: جَوزجَان
+    expected: Jowzjān
+
+  - source: ، سْپِين غَزْنِى
+    expected: Ghaznī, Spīn
+
+  - source: ، ريگ مَيوَنْد
+    expected: Maywand, Rēg
+
+  - source: گَرْدېز
+    expected: Gardēz
+
+  - source: مَيدان شَهْر
+    expected: Maīdān Shahr
+
+  - source: ډَنْډِ سُفْلىٰ
+    expected: Ḏanḏ-e Suflá
+
+  - source: څَوْآۍ
+    expected: Tsowkêy
+
+  - source: هَوائِى ډَگَر
+    expected: Hawā’ī D̲agar
+
+  - source: شَريف مَزارِ
+    expected: Mazār-e Sharīf
+
+  - source: دايکندی
+    expected: Dāykundī
+
+  - source: زيارت
+    expected: Zīārat
+
+  - source: غوريان
+    expected: Ghōriyān
+
+  - source: ميا
+    expected: Myā
+
+map:
+  characters:
+
+    # These characters are not available with a single Unicode
+    # codepoint, so cannot be displayed here. When typing, the independent
+    # character’s codepoint will automatically display the appropriate
+    # word-medial or word-final form where so appearing in a word.
+    '\u0627': '-'
+
+    '\u0628': 'b'
+    '\u067E': 'p'
+    '\u062A': 't'
+    '\u067C': 'ṯ'
+    '\u062B': '\u0073\u0304'
+    '\u062C': 'j'
+    '\u0686': 'ch'
+
+    # The variant form ج is seen infrequently and does not have a
+    # single Unicode encoding.
+    '\u0681': 'dz' # Note 2
+    '\u0685': 'ts' # Note 2
+
+    '\u062D': 'ḩ'
+    '\u062E': 'kh'
+    '\u062F': 'd'
+    '\u0689': 'ḏ'
+    '\u0630': '\u007A\u0304'
+    '\u0631': 'r'
+    '\u0693': '\u1E5F'
+    '\u0632': 'z'
+    '\u0698': 'zh'
+    '\u0696': '\u007A\u0332\u0068\u0332'
+    '\u0633': 's'
+    '\u0634': 'sh'
+    '\u069A': '\u0073\u0332\u0068\u0332'
+    '\u0635': 'ş'
+    '\u0636': 'ẕ'
+    '\u0637': 'ţ'
+    '\u0638': '\u007A\u0327'
+    '\u0639': '‘'
+    '\u063A': 'gh'
+    '\u0641': 'f'
+    '\u0642': 'q'
+    '\u06A9': 'k'
+    '\u06AF': 'g'
+    '\u0644': 'l'
+    '\u0645': 'm'
+    '\u0646': 'n'
+    '\u06BC': 'ṉ'
+    '\u0648': 'w'
+    '\u0647': 'h'
+    '\u0649': 'y'
+
+    # Vowel, Diphthong and Diacritical Characters
+
+    '\u064E': 'a'
+
+    # Both e and i are available to romanize this short vowel,
+    # depending on local usage and/or root language. In cases where the sound
+    # is uncertain, i is the default romanization in BGN/PCGN standardization
+    # procedures.
+    '\u0650':
+      - 'e'
+      - 'i'
+
+    # Both o and u are available to romanize this short vowel,
+    # depending on local usage and/or root language. In cases where the sound
+    # is uncertain, u is the default romanization in BGN/PCGN standardization
+    # procedures.
+    '\u064F':
+      - 'o'
+      - 'u'
+    '\u0659': 'ê'
+
+    # An alif with mad ( آ ) is written only in the initial position by
+    # BGN/PCGN standardization procedures, in keeping with Persian language
+    # family standards of use of the Arabic alphabet. The same letter written
+    # in a medial or final position is written . . .
+    '\u0622': 'ā'
+
+    '\u0648': 'ō'
+    '\u0648': 'ū'
+    '\u0648': '\u006F\u0077'
+    '\u06CC': 'ī'
+
+    # Or 'ē'. The character ی should be romanized ay or ē according to
+    # its root language or local pronunciation. In case of uncertainty a
+    # reference source (such as the Fairchild Aerial Surveys map series, or a
+    # BGN/PCGN approved policy document/list of recommended spellings) should
+    # be consulted.
+    '\u06CC': 'ay'
+    '\u06D0': 'ē'
+
+    # Or 'aī'. Both the combination ay and aī are available to romanize
+    # this character according to its root language or local pronunciation.
+    # In cases where the sound is uncertain ay is the default romanization in
+    # BGN/PCGN standardization procedures
+    '\u06CC':
+      - 'ay'
+      - 'á'
+    '\u06CD': 'êy'
+    '\u0621': '’'
+    '\u0674':
+      - '-e'
+      - '-ye'
+
+    # Other Diacritical Marks and Language Conventions
+
+    '\u0627': 'āy'
+
+    '\u0648': 'w'
+    '\u0626': '’'
+    '\u06C0': ''
+    '\u0651': ''
+    '\uFDF2': 'Allāh' # See note 5
diff --git a/maps/bgnpcgn-prs-Arab-Latn-yaghoubi.yaml b/maps/bgnpcgn-prs-Arab-Latn-yaghoubi.yaml
new file mode 100644
index 00000000..609d6019
--- /dev/null
+++ b/maps/bgnpcgn-prs-Arab-Latn-yaghoubi.yaml
@@ -0,0 +1,335 @@
+---
+authority_id: bgnpcgn
+id: yaghoubi
+language: prs # prs stands for Dari (https://iso639-3.sil.org/code/prs)
+source_script: Arab
+destination_script: Latn
+name: BGN/PCGN NATIONAL ROMANIZATION SYSTEM FOR AFGHANISTAN -- BGN/PCGN 2007 System (Yaghoubi)
+url: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/693661/ROMANIZATION_FOR_AFGHANISTAN.pdf
+creation_date: 2007
+confirmation_date: 2017-11
+description: |
+  This romanization system, agreed by BGN and PCGN in November 2007,
+  accommodates the linguistic complexity of Afghanistan as manifest in
+  its geographical names.
+
+  The following tabulation shows the original Perso-Arabic script with
+  accompanying Unicode value (columns 1a and b), the Yaghoubi
+  romanization (column 2), the BGN/PCGN romanization with accompanying
+  Unicode value (columns 3a and b), an English phonetic example (column
+  4), and an example toponym (columns 5b and c).
+
+  [The Yaghoubi romanization system was developed in 1959 by
+  Muzaffarud Din Yaqubi (commonly seen as Yaghoubi). It is a native
+  official system designed to reflect Afghan names, both Dari and Pashto,
+  and both pronunciation and genuine linguistic truth.]
+
+  The tables function as both a romanization system for Afghanistan (i.e.
+  with access to the original script, these tables can be applied to get
+  a standardized Roman result - moving from columns 1 to 3) and as a
+  means of converting the available Yaghoubi Roman-script spellings, as
+  appear on the Fairchild Aerial Surveys map series, to standard BGN/PCGN
+  spellings (moving from columns 2 to 3).
+
+  The points used in Arabic to mark short vowels and certain other
+  diacritical marks are infrequently written in Afghanistan.
+  Consequently, a reference source may sometimes be required to aid
+  correct identification of the standard spellings and proper vowels and
+  elimination of dialectal and idiosyncratic variations. In the interests
+  of clarity, the example columns show script with vowel pointing from
+  Arabic to indicate the short vowels that are included alongside the
+  unpointed form that will usually be encountered. However it should be
+  noted that the pronunciation of short vowels will vary.
+
+  Note: it is recommended that a font such as Scheherazade, available
+  from www.sil.org, which includes the Unicode extended Arabic sub-range,
+  be used to view this system. [Please note that the identification of a
+  particular font does not represent an endorsement of any specific
+  product or manufacturer.]
+
+notes:
+  - |
+    Alif (ا) should be romanized as follows:
+
+    a. Initially, it indicates that the word begins with a vowel or
+      diphthong; the alif itself is not romanized, but rather the short vowel
+      it “carries” is romanized; e.g., ميړ أَسَلم ژرَندَه → Mīr Aslam Zhrandah
+    b. When it carries a maddah (آ) (see vowel table, row 6), it
+      represents ā; e.g., آب بَند → Āb Band.
+    c. Medially and finally it represents ā (see vowel table, row 5);
+      e.g., ماڼۍ → Māṉêy
+    d. Medially and finally in words of Arabic origin, alif may serve
+      as the bearer of hamzah, e.g. رأس → ra’s.
+
+  - Occasionally the letter sequences سه ,زه ,که, and گه occur without
+    intervening vowels. They may be romanized k·h, z·h, s·h, and g·h in
+    order to differentiate these romanizations from the digraphs kh, zh,
+    sh, and gh, which are used to represent the letters ش ,ژ ,خ, and غ.
+    Additionally, the Pashto letters څ and ځ, routinely romanized ts and
+    dz, may be alternatively romanized s and z when for special reasons
+    it is desired that confusion be avoided with the character sequences
+    تس (ts) and دز (dz), respectively.
+
+  - "The vagaries of written Afghan languages, as pertains to spacing
+    and word division, are addressed as follows:
+    Spaces may be added to or subtracted from Afghan words written in
+    Arabic script, for the purposes of standardization. This is
+    particularly relevant when the words are hand-written, are rendered
+    “artistically”, or express other such non-standard flourishes, as long
+    as the sense of the toponym, word, or phrase is not compromised.
+    Romanized toponyms are typically divided into constituent words
+    (spaces and other grammatical rules applied) when those words can stand
+    independently, for purposes of standardization and minimization of
+    confusion, particularly in situations where Afghan writers are
+    inconsistent in their application of spacing and word breaks. When the
+    Afghan word or suffix is only used in combination with other nouns or
+    adjectives, then it should be appended to the preceding word in its
+    romanization. This includes (but is not limited to) -ābād, -zaī,
+    -zādah, - ū, -wand, -gaī, -kaī, -pūr, - ēsh, -lar, -lī, -lū and ullāh, as,
+    for example, seen in Raḩmatābād (رحمت آباد) and Raḩmatullāh (رحمت االله),
+    but Raḩmat Khēl (رحمتخيل) and Raḩmat Shahr (رحمتشهر)."
+
+  - The one-letter words د (Pashto) and و (Dari) are romanized dê and
+    wa, respectively.
+
+  - The word الله, meaning God, should always be romanized Allāh,
+    except as specified in note 3. Note that the Unicode value FDF2 spells
+    Allāh, but omits the alif in some common fonts, including Times New
+    Roman. If in doubt, try in Arial Unicode MS to verify. Also note that
+    the “dagger alif” ( ) above the second ل (lām) in the word الله, is not
+    written but should be romanized ā, like a full-size alif.
+
+  - In names of Arabic origin, the l of the definite article al is
+    assimilated before the ‘sun letters’ t, s̄, d, z̄, r, z, s, sh, ş, ẕ,
+    ţ, z̧, l and n. In its romanization, the article should be separated
+    from the name it precedes and should not be capitalized except at
+    the beginning of a name, e.g.
+    جبل السراج → Jabal as Sarāj
+
+  - In Arabic names, a shaddah, ّ is used to denote the doubling of a
+    particular consonant character, e.g. ُم َح َمد → Muḩammad. However, in
+    Pashto this ‘doubling’ is frequently omitted in both Perso-Arabic script
+    and the resulting romanization. Guidance on doubling may be taken from
+    an authoritative names source, such as an Afghan government source or
+    Pashto dictionary; for example, it is usual to see Ḩājī without and
+    ‘Abbās with the doubled consonant. The doubled y consonant is almost
+    always retained, as in Sayyid or Qayyūm.
+
+  - In Afghan names which contain an iẕāfah, it should be romanized as
+    -e or -ye according to common pronunciation, but generally, -e is
+    used if the preceding word ends with a consonant other than silent
+    heh, and -ye if the preceding word ends with a vowel sound, e.g.
+    غر ِحصار → Ghar-e Ḩişār;
+    َقل َع ٔه َنو → Qal‘ah-ye Now.
+    Scholarly sources indicate that heh is silent in darah and qal‘ah
+    (thus darah-ye, qal‘ah-ye), but lightly spoken in kōh and chāh
+    (thus kōh-e, chāh-e).
+
+  - The character sequence خو, where followed by ا or ی should be
+    romanized khwā or khwī, although the w is either not pronounced, or
+    only weakly so, as in خواجه → khwājah.
+
+  - Plural nouns ending in -hā or -ān should always be romanized as a
+    single word, regardless of whether a space appears in a Perso-Arabic
+    script source.
+
+  - Unicode values listed in the tables above are required to ensure
+    standardization and to minimize confusion from competing
+    representations of a given character. It should be noted that the
+    Persian ک (Unicode value 06A9) is recommended rather than the Arabic
+    ك (Unicode value 0643 or FEDA or FED9), the Persian گ (Unicode
+    value 06AF) is recommended rather than ګ (Unicode value 06AB) or ڰ
+    (Unicode value 06B0) or ك (Unicode value 0643 or FEDA or FED9), and the
+    Pashto character ځ (Unicode value 0681) is recommended rather than the
+    heh with a dot above and a dot below (no Unicode value). For the letter ی
+    in its many variations, care must be exercised to follow this romanization
+    guide's recommendations to eliminate confusion for search engines
+    and software. BGN/PCGN does not use the Unicode encoding FEEF for the
+    character ی in any Afghan word.
+
+  - |
+    An inventory of letter-diacritic combinations in addition to the
+    unmodified letters of the basic Roman script is:
+
+      ‘ (U+2018)
+      Ā (U+0100)
+      Á (U+00C1)
+      Ḏ (U+0044+0331)
+      Ē (U+0112)
+      Ê (U+00CA)
+      Ḩ (U+1E28)
+      Ī (U+012A)
+      N-bar-top (U+004E+0304)
+      Ō (U+014C)
+      R-bar-bottom (U+0052+0331)
+      Ş (U+015E)
+      S-bar-top (U+0053+0304)
+      Ṯ (U+0054+0331)
+      Ţ (U+0162)
+      Ū (U+016A)
+      Z-comma-bottom (U+005A+0327)
+      Z-bar-top (U+005A+0304)
+      Ẕ (U+005A+0331)
+      ẔH (U+005A+0048+035F)
+
+
+      ʼ (U+2019)
+      ā (U+0101)
+      á (U+00E1)
+      ḏ (U+0064+0331)
+      ē (U+0113)
+      ê (U+00EA)
+      ḩ (U+1E29)
+      ī (U+012B)
+      n-bar-top (U+006E+0304)
+      ō (U+014D)
+      r-bar-bottom (U+0072+0331)
+      ş (U+015F)
+      s-bar-top (U+0073+0304)
+      ṯ (U+0074+0331)
+      ţ (U+0163)
+      ū (U+016B)
+      z-comma-bottom (U+007A+0327)
+      z-bar-top (U+007A+0304)
+      ẕ (U+007A+0331)
+      zh-under-bar (U+007A+0068+035F)
+
+  - The Romanization columns show only lowercase forms but, when
+    romanizing, uppercase and lowercase Roman letters as appropriate should
+    be used.
+
+
+tests:
+  - source: بَغْلان
+    expected: Baghlān
+  - source: پوټكى
+    expected: Pōṯakay
+  - source: شِرين تَگَاب
+    expected: Shīrīn Tagāb
+  - source: کُوْټ
+    expected: Kōṯ
+  - source: ثَابِر
+    expected: S̄ābir
+
+map:
+  characters:
+
+    # These characters are not available with a single Unicode
+    # codepoint, so cannot be displayed here. When typing, the independent
+    # character’s codepoint will automatically display the appropriate
+    # word-medial or word-final form where so appearing in a word.
+    '\u0627': '-'
+
+    '\u0628': 'b'
+    '\u067E': 'p'
+    '\u062A': '\u0074\u0304'
+    '\u067C': 't'
+    '\u062B': '\u0073\u0304'
+    '\u062C': 'j'
+    '\u0686': 'č'
+
+    # The variant form ج is seen infrequently and does not have a single Unicode encoding.
+    '\u0681': '\u006A\u0304' # Note 2
+    '\u0685': 'c' # Note 2
+
+    '\u062D': 'ẖ'
+    '\u062E': 'kh'
+    '\u062F': 'ḏ'
+    '\u0689': 'd'
+    '\u0630': '\u007A\u0304'
+    '\u0631': 'ṟ'
+    '\u0693': 'r'
+    '\u0632': 'z'
+    '\u0698': 'ž'
+    '\u0696': '\u017E\u0332'
+    '\u0633': 's'
+    '\u0634': 'š'
+    '\u069A': '\u0161\u0332'
+    '\u0635': '\u0073\u0332'
+    '\u0636': '\u0064\u0332\u007A'
+    '\u0637': 'ṯ'
+    '\u0638': 'ẕ'
+    '\u0639': '’'
+    '\u063A': 'gh'
+    '\u0641': 'f'
+    '\u0642': 'q'
+    '\u06A9': 'k'
+    '\u06AF': 'g'
+    '\u0644': 'l'
+    '\u0645': 'm'
+    '\u0646': 'n'
+    '\u06BC': 'ṉ'
+    '\u0648': 'w'
+    '\u0647': 'h'
+    '\u0649': 'y'
+
+    # Vowel, Diphthong and Diacritical Characters
+
+    '\u064E':
+      - 'a'
+      - 'â'
+
+    # Both e and i are available to romanize this short vowel,
+    # depending on local usage and/or root language. In cases where the sound
+    # is uncertain, i is the default romanization in BGN/PCGN standardization
+    # procedures.
+    '\u0650':
+      - 'e'
+      - 'i'
+
+    # Both o and u are available to romanize this short vowel,
+    # depending on local usage and/or root language. In cases where the sound
+    # is uncertain, u is the default romanization in BGN/PCGN standardization
+    # procedures.
+    '\u064F':
+      - 'o'
+      - 'u'
+
+    '\u0659':
+      - 'ə'
+      - 'ê'
+    '\u0622': 'ā'
+
+    # An alif with mad ( آ ) is written only in the initial position by
+    # BGN/PCGN standardization procedures, in keeping with Persian language
+    # family standards of use of the Arabic alphabet. The same letter written
+    # in a medial or final position is written . . .
+    '\u0648': 'ō'
+
+    '\u0648':
+      - 'u'
+      - 'ū'
+
+    '\u0648': 'aw' # Or 'āw'
+    '\u06CC': 'i' # Or 'ī'
+
+    # Or 'ē'. The character ی should be romanized ay or ē according to
+    # its root language or local pronunciation. In case of uncertainty a
+    # reference source (such as the Fairchild Aerial Surveys map series, or a
+    # BGN/PCGN approved policy document/list of recommended spellings) should
+    # be consulted.
+    '\u06CC': 'ay'
+
+    '\u06D0': 'ē' # Or 'ay'
+    '\u06CC': 'ay' # Or 'āy'.
+
+    # Both the combination ay and aī are available to romanize this
+    # character according to its root language or local pronunciation. In
+    # cases where the sound is uncertain ay is the default romanization in
+    # BGN/PCGN standardization procedures
+    '\u06CC': 'ā'
+
+    '\u06CD': 'ə y' # Or 'ay'
+    '\u0621': '’'
+    '\u0674':
+      - '-i-'
+      - 'e'
+      - 'ī'
+
+    # Other Diacritical Marks and Language Conventions
+
+    '\u0627': 'ay' # Or 'āy'
+    '\u06CC': 'ya' # Or 'yā'
+
+    '\u0648': 'w'
+    '\u06C0': '. . .h-e'
diff --git a/reference-docs/bgn-pcgn/ROMANIZATION_FOR_AFGHANISTAN.pdf b/reference-docs/bgn-pcgn/ROMANIZATION_FOR_AFGHANISTAN.pdf
new file mode 100644
index 00000000..6afb9f2e
Binary files /dev/null and b/reference-docs/bgn-pcgn/ROMANIZATION_FOR_AFGHANISTAN.pdf differ
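
Both new map files define some keys more than once under map.characters (for example '\u0648' and '\u06CC'). YAML requires mapping keys to be unique, and a loader such as PyYAML silently keeps only the last value for a repeated key. As a rough pre-merge check, the added lines of a diff like this one could be scanned for repeats. The sketch below is illustrative only: the helper name duplicate_keys and the regular expression are assumptions, not part of this repository's tooling, and all lines passed in are treated as belonging to a single file.

```python
import re
from collections import Counter

# Matches an added diff line whose YAML key is a quoted '\uXXXX' escape,
# e.g. "+    '\u0648': 'w'". The pattern is an assumption about this layout.
KEY_RE = re.compile(r"^\+\s+'(\\u[0-9A-Fa-f]{4})':")


def duplicate_keys(diff_lines):
    """Return {key: count} for keys that appear more than once.

    Hypothetical helper: all lines are assumed to come from one file's hunk.
    """
    counts = Counter()
    for line in diff_lines:
        match = KEY_RE.match(line)
        if match:
            # Normalise hex case so differently cased escapes count as one key.
            counts[match.group(1).lower()] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = [
        r"+    '\u0648': 'w'",
        r"+    '\u0647': 'h'",
        r"+    '\u0648': 'ō'",
        r"+    '\u0648': '\u006F\u0077'",
    ]
    print(duplicate_keys(sample))
```

Running the helper separately over each file's hunk would list the keys whose earlier values are at risk of being overwritten; whether those entries should become lists of alternatives, as '\u0650' and '\u064F' already are, is a design decision for the map schema.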