# English-Gujarati Transliteration

Transliteration is the process of converting a word from the script of one language into the script of another. Here, I, Manav Desai, aim to transliterate English words into Gujarati.

The main challenge is that English spelling does not reliably encode pronunciation, whereas Gujarati is written largely the way it is spoken.

This is a transliteration scheme that I designed for transliterating English words into Gujarati text. I have written a parser that reads English words and prints their transliterations. The parser does not yet handle every sound, so there is scope for improvement.

## Method

First, the major sounds of English are listed and mapped to the corresponding Gujarati letters. Note the following:
- Sounds with 'h' in them (like kha, gha, tha) are not in the map; only the base sounds (like ka, ga, ta) are.
- h is listed with the vowels because it usually modifies the preceding consonant instead of standing as a separate letter in Gujarati.

In [8]:
consonant_list = ['M', 'N', 'V', 'T', 'S', 'R', 'Y', 'D', 'L', 'G', 'K', 'C', 'J', 'P', 'B']
vowel_list = ['A', 'E', 'I', 'O', 'U', 'H']

half_letter = '\u0ACD'

consonant_map = {
    'M':'\u0AAE',
    'N':'\u0AA8',
    'V':'\u0AB5',
    'T':'\u0AA4',
    'S':'\u0AB8',
    'R':'\u0AB0',
    'Y':'\u0AAF',
    'D':'\u0AA6',
    'L':'\u0AB2',
    'G':'\u0A97',
    'K':'\u0A95',
    'C':'\u0A9A',
    'J':'\u0A9C',
    'P':'\u0AAA',
    'B':'\u0AAC'  # ba (U+0AAD is bha, which BH must produce)
}

independent_vowel_map = {
    'A':'\u0A85',
    'AA':'\u0A86',
    'I':'\u0A87',
    'II':'\u0A88',
    'U':'\u0A89',
    'UU':'\u0A8A',
    'H':'\u0AB9',
    'E':'\u0A8F',
    'O':'\u0A93'
}

dependent_vowel_map = {
    '\u0A85':'',
    '\u0A86':'\u0ABE',
    '\u0A87':'\u0ABF',
    '\u0A88':'\u0AC0',
    '\u0A89':'\u0AC1',
    '\u0A8A':'\u0AC2',
    '\u0A8F':'\u0AC7',
    '\u0A93':'\u0ACB'
}
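
Because the maps are written as raw code points, a wrong digit is easy to miss. The following is a minimal sketch, using only Python's standard `unicodedata` module, that prints the official Unicode name of a character so that entries can be spot-checked. The helper name `describe` is hypothetical and not part of the notebook.

```python
import unicodedata

def describe(char):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

# Spot-check a few entries; BA and BHA are adjacent and easy to confuse.
for char in ['\u0A95', '\u0AAC', '\u0AAD', '\u0ABE']:
    print(describe(char))
```

Running the same helper over the values of each map gives a quick review of the whole table.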

This function looks for h sounds in the word and updates the preceding letter accordingly. In the Unicode block for Gujarati, every aspirated (h-sounding) letter except sha sits immediately after its base letter. For example, tha comes just after ta. The pair sa and sha is therefore handled as a special case.

In [9]:
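def check_h_sound(transliterated_word):
    # Merge each H with the letter before it to form the aspirated letter.
    # Note: the merge is applied to whatever precedes H, including vowels.
    i = 1
    while i < len(transliterated_word):
        if transliterated_word[i] == '\u0AB9':  # H
            prefix = transliterated_word[:i - 1]
            suffix = transliterated_word[i + 1:]

            if transliterated_word[i - 1] == '\u0AB8':  # sa -> sha (not adjacent in Unicode)
                middle = '\u0AB6'
            else:
                # the aspirated letter sits one code point after its base letter
                middle = chr(ord(transliterated_word[i - 1]) + 1)

            # the word is now one character shorter, so i already
            # points at the character that followed H
            transliterated_word = prefix + middle + suffix
        else:
            i = i + 1
    return transliterated_word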
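
The code-point observation above can be checked directly. The sketch below assumes only the standard `unicodedata` module and prints the name of the character that follows each base consonant, then the name of sha. `BASE_LETTERS` and `next_letter_name` are names introduced here for illustration.

```python
import unicodedata

# Base consonants whose aspirated form is expected at the next code point.
BASE_LETTERS = ['\u0A95', '\u0A97', '\u0A9A', '\u0A9C',   # ka, ga, ca, ja
                '\u0AA4', '\u0AA6', '\u0AAA', '\u0AAC']   # ta, da, pa, ba

def next_letter_name(char):
    """Unicode name of the character one code point after `char`."""
    return unicodedata.name(chr(ord(char) + 1))

for base in BASE_LETTERS:
    print(unicodedata.name(base), '->', next_letter_name(base))

# sa (U+0AB8) is the exception: sha is U+0AB6, two positions before it.
print(unicodedata.name('\u0AB6'))
```

Letters such as ma or na have no aspirated neighbour, so adding one to their code point does not give a meaningful result.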

When two consonants are adjacent in the English word, the first is written as a half letter and the second as a full letter in Gujarati. This function inserts the virama sign (U+0ACD) after the first consonant to turn it into a half letter.

In [10]:
def insert_half_letters(transliterated_word):
    i = 1
    while i < len(transliterated_word):
        curr_unicode = ord(transliterated_word[i])
        prev_unicode = ord(transliterated_word[i - 1])
        # U+0A95..U+0AB9 is the Gujarati consonant range (ka to ha)
        if (0x0A95 <= curr_unicode <= 0x0AB9) and (0x0A95 <= prev_unicode <= 0x0AB9):
            prefix = transliterated_word[:i - 1]
            suffix = transliterated_word[i + 1:]
            middle = transliterated_word[i - 1] + half_letter + transliterated_word[i]

            transliterated_word = prefix + middle + suffix
            # skip only the inserted virama, so the current consonant is still
            # compared with the next letter (e.g. the t-r pair in MANTRA)
            i = i + 1
        i = i + 1

    return transliterated_word
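
The same rule can also be written as a single regular-expression substitution. This is an alternative sketch, not the notebook's method: it uses a lookahead so that every consonant directly followed by another consonant receives a virama, including clusters of three. The names `VIRAMA`, `CLUSTER` and `insert_half_letters_regex` are hypothetical.

```python
import re

VIRAMA = '\u0ACD'
# A consonant (U+0A95..U+0AB9) that is immediately followed by another consonant.
CLUSTER = re.compile('([\u0A95-\u0AB9])(?=[\u0A95-\u0AB9])')

def insert_half_letters_regex(word):
    """Append a virama to every consonant directly followed by a consonant."""
    return CLUSTER.sub(lambda match: match.group(1) + VIRAMA, word)

# M A N T R A after the first pass: ma, a, na, ta, ra, a
mantra = '\u0AAE\u0A85\u0AA8\u0AA4\u0AB0\u0A85'
print(insert_half_letters_regex(mantra))
```

Because the lookahead does not consume the second consonant, that consonant can itself be matched as the start of the next pair.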

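Long maatras are needed when the English input contains aa, ii or uu. This function handles them by merging two consecutive identical short vowels into the corresponding long vowel, which sits one code point after the short one. The same merge is applied to ee and oo, where the next code point is the ai or au vowel rather than a long e or o, so those pairs are not yet handled correctly.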

In [11]:
def update_maatras(transliterated_word):
    i = 1
    while i < len(transliterated_word):
        current = transliterated_word[i]
        # two identical independent vowels in a row form a long vowel
        if current in independent_vowel_map.values() and transliterated_word[i - 1] == current:
            prefix = transliterated_word[:i - 1]
            suffix = transliterated_word[i + 1:]
            # the long vowel is one code point after the short one (valid for a, i, u)
            middle = chr(ord(current) + 1)

            # the pair collapsed into one character, so i already points at the next one
            transliterated_word = prefix + middle + suffix
        else:
            i = i + 1
    return transliterated_word
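
Adding one to the code point works for a, i and u, but for e (U+0A8F) and o (U+0A93) the next code points are the ai and au vowels. One alternative, sketched here as an assumption and not as the notebook's method, is an explicit lookup table that merges only the pairs with a known long form. `LONG_VOWELS` and `merge_long_vowels` are hypothetical names.

```python
# Pairs of short independent vowels and the long vowel that replaces them.
LONG_VOWELS = {
    '\u0A85\u0A85': '\u0A86',  # a + a -> aa
    '\u0A87\u0A87': '\u0A88',  # i + i -> ii
    '\u0A89\u0A89': '\u0A8A',  # u + u -> uu
}

def merge_long_vowels(word):
    """Replace each listed vowel pair with its long vowel; leave other pairs alone."""
    for pair, long_vowel in LONG_VOWELS.items():
        word = word.replace(pair, long_vowel)
    return word

# ma, a, a, na -> ma, aa, na
print(merge_long_vowels('\u0AAE\u0A85\u0A85\u0AA8'))
```

With a table, a rule for ee or oo can be added later as one more entry once the intended Gujarati vowel has been decided.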

Except at the start of a word, Gujarati vowels are written as dependent vowel signs (maatras) attached to the preceding consonant. This function replaces every independent vowel after the first character with its dependent form.

In [12]:
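def convert_to_dependent_vowels(transliterated_word):
    # start at 1 so that a word-initial vowel stays independent
    i = 1
    while i < len(transliterated_word):
        # only letters with a known dependent form are converted; anything else
        # (such as H, which has no entry in the map) is left untouched
        if transliterated_word[i] in dependent_vowel_map:
            prefix = transliterated_word[:i]
            suffix = transliterated_word[i + 1:]
            # short 'a' maps to '' because it is inherent in the consonant
            middle = dependent_vowel_map[transliterated_word[i]]

            transliterated_word = prefix + middle + suffix
        i = i + 1
    return transliterated_word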

The transliteration algorithm converts the English word to Gujarati text in multiple passes. As a preprocessing step, the word is converted to upper case. This does not affect the result, as Gujarati does not distinguish between upper and lower case letters. A trailing E is also removed, as it is usually silent. The details of each pass are as follows:

1. Replace the English letters with the corresponding Gujarati consonants and vowels. We can now discard the English word.
2. Update the consonants with an h sound in the transliterated word.
3. Insert half letters by adding the virama at the appropriate places.
4. Add longer maatras wherever required.
5. Convert independent vowels to dependent vowels.

In [13]:
# list of English words to be transliterated
# edit this list to check for other words
english_word_list = ['mAAnAv', 'maanath', 'maanyataa', 'shriimaan', 'dhruvin', 'devAans', 'vatsal', 'SORI', 'agatya']

# initialize transliterated word list
gujarati_words = []

# loop through the words
for eng_word in english_word_list:
    eng_word = eng_word.upper()

    # drop e if it occurs at the end
    if eng_word.endswith('E'):
        eng_word = eng_word[:-1]

    transliterated_word = ''

    # get consonant and independent vowel representation of word
    try:
        for letter in eng_word:
            if letter in consonant_list:
                transliterated_word += consonant_map.get(letter)
            else:
                transliterated_word += independent_vowel_map.get(letter)

    except TypeError:
        # the letter has no mapping (.get returned None); report it and skip this word only
        print('unsupported letter:', letter)
        continue

    # update consonants with h sound
    transliterated_word = check_h_sound(transliterated_word)

    # handle half letters
    transliterated_word = insert_half_letters(transliterated_word)

    # update maatras
    transliterated_word = update_maatras(transliterated_word)

    # convert to dependent vowels
    transliterated_word = convert_to_dependent_vowels(transliterated_word)

    gujarati_words.append(transliterated_word)
    print(eng_word, transliterated_word)

MAANAV માનવ
MAANATH માનથ
MAANYATAA માન્યતા
SHRIIMAAN શ્રીમાન
DHRUVIN ધ્રુવિન
DEVAANS દેવાન્સ
VATSAL વત્સલ
SORI સોરિ
AGATYA અગત્ય
