# Pig Latin Translator

## Pig Latin is a playful language game for encoding messages in English. A first approach to a translator is implemented here in Python. To speak Pig Latin, follow the rules below:

### Rules: 

#### If the word starts with a consonant and the second letter is a vowel, the first letter is moved to the end, followed by -ay. E.g. pelican => elican-pay
#### If the word starts with a vowel, -yay is added to the end. E.g. american => american-yay
#### If the word starts with a cluster of consonants, the cluster is moved to the end, followed by -ay. E.g. smart => art-smay
#### However, if the cluster contains a y, the cluster is cut just before the y and only that part is moved to the end, followed by -ay. E.g. Gryffindor => Yffindor-Gray
#### If a word is a compound, it must be split into its parts and the previous rules applied to each part individually. E.g. waistcoat => aist-way oat-cay
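
The single-word rules above can be sketched compactly with one regular expression. This is only an illustration under stated assumptions: `pig_latin_sketch` and `_CLUSTER` are hypothetical names that do not appear in the implementation below, the input is assumed to be a non-empty alphabetic word, and words without any vowel fall outside the rules and are not handled meaningfully.

```python
import re

# Hypothetical helper, for illustration only.
# Group 1 is the leading consonant cluster: one non-vowel letter followed by
# any further letters that are neither vowels nor 'y' (so the cluster is cut
# just before a 'y'). Group 2 is the rest of the word.
_CLUSTER = re.compile(r"^([^aeiou][^aeiouy]*)(.+)$")

def pig_latin_sketch(word):
    """Apply the single-word rules to one lowercase-normalized word."""
    word = word.strip().lower()
    match = _CLUSTER.match(word)
    if match is None:
        # The word starts with a vowel (or is a single letter): add -yay.
        return word + "-yay"
    cluster, rest = match.groups()
    # Move the cluster to the end and add -ay.
    return f"{rest}-{cluster}ay"

for example in ["pelican", "american", "smart", "Gryffindor"]:
    print(example, "=>", pig_latin_sketch(example))

# Compound rule: translate each part separately (parts given by hand here).
print(" ".join(pig_latin_sketch(part) for part in ["waist", "coat"]))
```

Note that the sketch lowercases its output, so `Gryffindor` comes out as `yffindor-gray`.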

## To assess whether a word is a compound, an implementation using web scraping and multithreading was developed. The linguistic source is [WordReference.com](https://www.wordreference.com/), and the part of the page examined is the Etymology section.
## The ```isWordCompound``` function takes care of the scraping, and the ```compoundWordsInString``` function spawns the threads and returns, for each word, either its parts if it is a compound or False otherwise.
## The ```pigLatinWord2Word``` function implements the rules explained above, and ```pigLatinStringTranslator``` brings all the previous functions together to translate full documents.
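
The parallel lookup pattern described above can be tried offline with a minimal sketch. Everything named here is an assumption made for illustration: `KNOWN_COMPOUNDS`, `lookup_compound`, and `lookup_all` are hypothetical, and a small in-memory table stands in for the request to WordReference.com.

```python
import concurrent.futures

# Hypothetical stand-in for the scraper: a tiny in-memory table replaces
# the web request so the sketch runs without network access.
KNOWN_COMPOUNDS = {"waistcoat": ["waist", "coat"], "football": ["foot", "ball"]}

def lookup_compound(word):
    """Return the parts of a known compound word, or False otherwise."""
    return KNOWN_COMPOUNDS.get(word.lower(), False)

def lookup_all(words, max_threads=30):
    """Run lookup_compound over all words in parallel, preserving input order."""
    if not words:
        # A thread pool needs at least one worker, so skip empty input.
        return []
    workers = min(max_threads, len(words))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the same order as the inputs.
        return list(executor.map(lookup_compound, words))

results = lookup_all("I wear a waistcoat".split())
print(results)
```

Because `executor.map` keeps results aligned with the inputs, each result can be paired with its word using `zip`, which is how the translator consumes the list.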

### Libraries:

#### BeautifulSoup 4 (HTML parsing)
#### requests (HTTP requests)
#### concurrent.futures (multithreading)
#### re (regular expressions)

In [1]:
from bs4 import BeautifulSoup as bs
import requests
import concurrent.futures
import re

def isWordCompound(word):
    '''
        Check whether a word is a compound word in English.
        If it is, return a list with its parts;
        otherwise, return False.
        
        This function scrapes the WordReference.com website and
        extracts the parts from the Etymology section.
        
        input -> word: String
        output -> words: List or False
    '''
    
    url = ''.join(["https://www.wordreference.com/definition/", word]) # build URL
    try:
        r = requests.get(url)
        # keep only letters and spaces from the text content of each span
        words = [re.sub(r'[^a-zA-Z ]', '', span.get_text()) for span in bs(r.text, "html.parser").find(class_ = "etyLi").find_all("span")]
    except Exception:
        # words that are not found, or that raise any other error, count as non-compound
        words = []
    
    # only return the parts if there are at least two and they rebuild the word
    if len(words) >= 2 and ''.join(words) == word: return words
    else: return False

def compoundWordsInString(string):
    '''
        To speed up the scraping done by the isWordCompound function,
        this multithreading approach lowers the execution time by running
        the GET requests in parallel when there is more than one word.
        The output is a list with the result of the isWordCompound
        function for each word examined.
        
        input -> string: String or List of words
        output -> compoundWords: List
    '''
    MAX_THREADS = 30
    if isinstance(string, str): string = string.split()
    if not string: return [] # nothing to check; a thread pool needs at least one worker
    threads = min(MAX_THREADS, len(string)) 
    
    with concurrent.futures.ThreadPoolExecutor(max_workers = threads) as executor:
        compoundWords = list(executor.map(isWordCompound, string))
        
    return compoundWords

def pigLatinWord2Word(word):
    '''
        Translate a single word into its lowercase Pig Latin version.
        
        input -> word: String
        output -> pl_word: String
    '''
    
    vowels = "aeiou" # vowels pool
    word = word.strip().lower() # normalize to lowercase and strip surrounding whitespace
    
    # nothing to translate for empty input
    if not word:
        return word
    
    # handle single-letter words, e.g. I and a
    if len(word) == 1:
        pl_word = '-'.join([word, "yay"])
        return pl_word
    
    fstLetter = word[0]
    sndLetter = word[1]
    # Vowel rule: if the word starts with a vowel, just add -yay to the end
    if fstLetter in vowels:
        pl_word = '-'.join([word, "yay"])
        return pl_word
    
    # Consonant rules
    elif sndLetter in vowels:
        # If the word starts with a single consonant, send it to the end and add -ay
        pl_word = '-'.join([word[1:], fstLetter + 'ay'])
        return pl_word
    else:
        # If the word starts with a consonant cluster, send the cluster to the end and add -ay.
        # If the cluster contains a y, cut the cluster just before the y.
        i = len(word)
        for vowel in vowels + 'y':
            i_ = word.find(vowel)
            if 0 < i_ < i: i = i_ # find which vowel (or y) appears first
        if i == len(word):
            # fallback, not covered by the rules: no vowel or y found, keep the word whole
            return '-'.join([word, "ay"])
        pl_word = '-'.join([word[i:], word[:i] + 'ay'])
        return pl_word

def pigLatinStringTranslator(string):
    '''
        Return the Pig Latin translation of a string containing one or more
        English words. Non-letter characters are dropped and the output is lowercase.
        
        input -> string: String
        output -> pl_string: String
    '''
    string_ = re.sub(r'[^a-zA-Z ]', '', string) # keep letters only
    pl_string = ' ' + string_ + ' ' # copy to replace translations in
    string_ = string_.split() # copy to clean up string and run per-word translation
    
    # check for compound words
    compoundWords = compoundWordsInString(string_)
    
    for cW, word in zip(compoundWords, string_):
        
        # translate accordingly whether it's a compound word or not
        if cW: pl_word = ' '.join([pigLatinWord2Word(word_) for word_ in cW])
        else: pl_word = pigLatinWord2Word(word)
        
        # replace translations; add padding spaces to avoid in-word replacements
        pl_string = pl_string.replace(' ' + word + ' ', ' ' + pl_word + ' ') 

    return pl_string.strip()

In [2]:
sentence = "I wear an American waistcoat when I leave my bedroom, but I do not use it if I play football in Canada."
print(f"The word by word Pig Latin translation of \n {sentence} \n is \n {pigLatinStringTranslator(sentence)} \n ")

The word by word Pig Latin translation of 
 I wear an American waistcoat when I leave my bedroom, but I do not use it if I play football in Canada. 
 is 
 i-yay ear-way an-yay american-yay aist-way oat-cay en-whay i-yay eave-lay y-may ed-bay oom-ray ut-bay i-yay o-day ot-nay use-yay it-yay if-yay i-yay ay-play oot-fay all-bay in-yay anada-cay 
 
