## Part2-NLP Language Model : *Text Generation, AutoComplete, Autocorrection*

In this notebook we will improve the performance of the Ngram Model then implement the text generation, autocorrector and a method for autocomplete

### 1.0 Importing Necessary Libraries

In [1]:
from typing import Counter
from collections import defaultdict
from ast import literal_eval
from auto_corrector import Auto_Corrector
import configparser
import numpy as np
import logging
import json
import math
import re
import os

### 2.0 Handling Configuration File

These functions manage the `configuration.json` file:

- **Checking File Existence:** Verifying if the configuration file exists.
- **Reading File Line by Line:** Reading the contents of the configuration file line by line.
- **Loading JSON Configuration:** Loading the JSON data from the configuration file and handling any exceptions that may occur during the process.

In [2]:
def read_file(text):
    with open(text, "r", encoding="utf-8") as file:
        all_lines = file.readlines()
    return all_lines

In [3]:
def file_exists(file_path):
    return os.path.exists(file_path)

In [4]:
def load_json(file_path):
    try:
        with open(file_path, 'r') as json_file:
            data = json.load(json_file)
        return data

    except FileNotFoundError:
        logging.error(f"File '{file_path}' not found.")
        return None

    except json.JSONDecodeError as e:
        logging.error(f"decoding JSON... {e}")
        return None

    except Exception as e:
        logging.error(f"An error occurred:{e}")
        return None

In [5]:
def save_dict_to_json(data, file_path):
    try:
        with open(file_path, 'w') as json_file:
            json.dump(data, json_file)
            logging.info(f"Dictionary saved to '{file_path}' successfully.")
    except Exception as e:
        logging.error(f"An error occurred:{e}")

### 3.0 Building the Ngram Language Model

In [20]:

class NgramLanguageModel:

    """ 
        This class represents an N-gram language model for text generation, autocorrection and textcomplete.

        Attributes
        ----------
        ngram_size : int
            The size of the N-grams used in the model.

        unigram_counts : Counter
            Counter object to store counts of individual words.

        trigram_counts : Counter
            Counter object to store counts of trigrams.

        bigram_counts : Counter
            Counter object to store counts of bigrams.

        probabilities_bigram : dict
            Dictionary to store probabilities of bigrams.

        probabilities_trigram : dict
            Dictionary to store probabilities of trigrams.

        k : float
            Smoothing parameter for Laplace smoothing.

        alphabets : str
            Regular expression pattern for alphabetic characters.

        prefixes : str
            Regular expression pattern for prefixes.

        suffixes : str
            Regular expression pattern for suffixes.

        starters : str
            Regular expression pattern for sentence starters.

        acronyms : str
            Regular expression pattern for acronyms.

        websites : str
            Regular expression pattern for website URLs.

        digits : str
            Regular expression pattern for digits.

        multiple_dots : str
            Regular expression pattern for multiple consecutive dots.

        characters : list
            List of characters used for replacements.

        replacements : dict
            Dictionary for character replacements.

        lines_processed : int
            Number of lines processed during training.
        """ 

    def __init__(self, ngram_size):
        
        logging.basicConfig(filename='ngram_language_model.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

        """ 
        Initialize the NgramLanguageModel object with the specified N-gram size.

        Parameters
        ----------
        ngram_size : int
            The size of the N-grams used in the model.
        """ 
        
        logging.info("Initializing NgramLanguageModel with ngram size: %d", ngram_size)
        file_path = 'configuration.json'

        # cheking if the configuration file exist if not the constructor initialize the variale and save them in the configuration file

        if  not file_exists(file_path):
            logging.info("Configuration file doesn't exist. Initializing variables and saving to configuration file.")

            self.ngram_size = ngram_size

            self.unigram_counts = Counter()
            self.trigram_counts = Counter()
            self.bigram_counts = Counter()

            self.probabilities_bigram = defaultdict()
            self.probabilities_trigram = defaultdict()

            self.k = 0.01

            self.alphabets= "([A-Za-z])"
            self.prefixes = "(Mr|St|Mrs|Ms|Dr)[.]"
            self.suffixes = "(Inc|Ltd|Jr|Sr|Co|www)"
            self.starters = "(Mr|Mrs|Ms|Dr|Prof|Capt|Cpt|Lt|He\s|She\s|It\s|They\s|Their\s|Our\s|We\s|But\s|However\s|That\s|This\s|Wherever)"
            self.acronyms = "([A-Z][.][A-Z][.](?:[A-Z][.])?)"
            self.websites = "[.](com|net|org|io|gov|edu|me)"
            self.digits = "([0-9])"
            self.multiple_dots = r'\.{2,}'

            self.characters = ["”", "\"", "!", "!", "?"]
            self.replacements = {'.': '.<stop>', '?': '?<stop>', '!': '!<stop>',':)':':)<stop>', '<prd>': '.','/n':' ', '#': '<hashtag>'}

            self.lines_processed = 0

            self.save_configurations()

        # if the configuration file already exist the contrustor load it and intialize the variable with the value in the configuration file
        else:

          try:

            logging.info("Loading configurations from existing configuration file.")

            setup = configparser.ConfigParser()
            setup.read('config.ini')

            configuration = load_json(file_path)
            dictionary = configuration["dictionary"]

            self.ngram_size = dictionary["ngram_size"]

            self.ngram_size = ngram_size
            
            self.unigram_counts =  Counter(dictionary["unigram_count"])
            self.trigram_counts = Counter(dictionary["trigram_count"])
            self.bigram_counts = Counter(dictionary["bigram_count"])

            self.probabilities_bigram = dictionary["probabilities_bigram"]
            self.probabilities_trigram = dictionary["probabilities_trigram"]


            self.k = literal_eval(setup['settings']["k"]) 

            self.alphabets= setup['settings']["alphabets"]
            self.prefixes = setup['settings']["prefixes"]
            self.suffixes =  setup['settings']["suffixes"]
            self.starters = setup['settings']["starters"]
            self.acronyms = setup['settings']["acronyms"]
            self.websites = setup['settings']["websites"]
            self.digits = setup['settings']["digits"]
            self.multiple_dots = setup['settings']["multiple_dots"]

            self.characters =literal_eval(setup['settings']["characters"]) 
            self.replacements =literal_eval(setup['settings']["replacements"]) 

            self.lines_processed =literal_eval( setup['settings']["lines_processed"])
            

          except Exception as e :
            logging.error("Error occurred during initialization: %s", e)


    def split_into_sentences(self,text):

        """ 
        Split the input text into sentences and preprocess each sentence.

        This method splits a text into sentences by the tag <stop>, considering various sentence delimiters 
        and respecting prefixes, suffixes, acronyms, etc. It also transforms the text into lowercase format.
        ps : an end of a sentence is defined by !, ?, ., :) + the . does not only define a end of a sentence! 

        Parameters
        ----------
        text : str
            The input text to be split into sentences.

        Returns
        -------
        str
            The preprocessed text with <stop> tags indicating the end of sentences.
        """ 

        text = text.replace("\n"," ")
        text = re.sub(self.prefixes,"\\1<prd>",text)
        text = re.sub(self.websites,"<prd>\\1",text)
        text = re.sub(self.digits + "[.]" + self.digits,"\\1<prd>\\2",text)
        text = re.sub(self.multiple_dots, lambda match: "<prd>" * len(match.group(0)) + "<stop>", text)

        if "Ph.D" in text: text = text.replace("Ph.D","Ph<prd>D")

        text = re.sub("\s" + self.alphabets + "[.] "," \\1<prd> ",text)
        text = re.sub(self.acronyms+" "+self.starters,"\\1<stop> \\2",text)
        text = re.sub(self.alphabets + "[.]" + self.alphabets + "[.]" + self.alphabets + "[.]","\\1<prd>\\2<prd>\\3<prd>",text)
        text = re.sub(self.alphabets + "[.]" + self.alphabets + "[.]","\\1<prd>\\2<prd>",text)
        text = re.sub(" "+self.suffixes+"[.] "+self.starters," \\1<stop> \\2",text)
        text = re.sub(" "+self.suffixes+"[.]"," \\1<prd>",text)
        text = re.sub(" " + self.alphabets + "[.]"," \\1<prd>",text)
        text = re.sub(r'[!?.,|]+', lambda match: match.group(0)[0], text)

        text = text.replace(".”","”.")
        for char in self.characters:
            if char in text:
                text = text.replace(f"{char}\"", f"\"{char}")

        for char, replacement in self.replacements.items():
            text = text.replace(char, replacement)

        text = text.lower()

        return text


    def prepare_data(self,file,batch_size = 300) -> list[str]:

        """ 
        Preprocess the training data and count N-grams.

        This method processes the training data by splitting it into sentences, extracting tokens, 
        replacing low-frequency words with '<unk>', and counting unigrams, bigrams, and trigrams.

        Parameters
        ----------
        file : str
            File path to the training corpus.

        batch_size : int, optional
            Number of lines to process at once, by default 1.

        Returns
        -------
        dict
            counts for unigrams, bigrams, and trigrams.
        """ 

        text_lines = read_file(file)

        for i in range(self.lines_processed, len(text_lines),batch_size):

            selected_lines = text_lines[i:(min(i + batch_size, len(text_lines)))]
            text = ""
            for s in selected_lines:
                text +=  s + " "
            text = self.split_into_sentences(text)
            sentences = text.split("<stop>")
            sentences = [s.strip().lower() for s in sentences]
            sentences = ['<s> '* (self.ngram_size -1) + item + " </s>" for item in sentences]

            if sentences and not sentences[-1]: sentences = sentences[:-1]

            for s in sentences:
                matches = re.findall(r'\b(?![<>\w]*>)\w+\b|<s>|<unk>|</s>|[.,?!]', s)
                self.unigram_counts += Counter(matches)
                self.bigram_counts += Counter([matches[i].strip()+" "+matches[i+1].strip() for i in range(len(matches)-1)])
                self.trigram_counts += Counter([matches[i].strip()+" "+matches[i+1].strip()+" "+matches[i+2].strip() for i in range(len(matches)-2)])

                
            self.lines_processed = min(i + batch_size,len(text_lines))
            print(f"Line {i} Successfully proccessed")
            logging.info(f"Line {i} Successfully proccessed")
            try:
                self.save_configurations()
            except:
                print(f"failed to save batch number {i}")
                logging.warning(f"failed to save batch number {i}")

        counter_one = 0
        for key, value in self.unigram_counts.items():
            if value <= 1:
                counter_one+=value

        modified_counter = Counter({key if value > 1 else '<unk>': value for key, value in self.unigram_counts.items()})
        modified_counter['<unk>'] = counter_one

        modified_bigram_counter = Counter()

        for combo in self.bigram_counts.keys():
            word1, word2 = combo.split(" ")[0].strip(), combo.split(" ")[1].strip()
            word1 = word1 if word1 in modified_counter.keys() else '<unk>'
            word2 = word2 if word2 in modified_counter.keys() else '<unk>'
            if word1 + " " + word2 in modified_bigram_counter.keys():
                modified_bigram_counter[word1 + " " + word2] +=  self.bigram_counts[combo]
            else:
                modified_bigram_counter[word1 + " " + word2] =  self.bigram_counts[combo]

        modified_trigram_counter = Counter()

        for combo in self.trigram_counts.keys():

            word1, word2, word3, = combo.split(" ")[0].strip(), combo.split(" ")[1].strip(), combo.split(" ")[2].strip()
            word1 = word1 if word1 in modified_counter.keys() else '<unk>'
            word2 = word2 if word2 in modified_counter.keys() else '<unk>'
            word3 = word3 if word3 in modified_counter.keys() else '<unk>'

            if word1 + " " + word2 + " " + word3 in modified_trigram_counter.keys():
                modified_trigram_counter[word1 + " " + word2 + " " + word3] +=  self.trigram_counts[combo]
            else:
                modified_trigram_counter[word1 + " " + word2+ " " + word3] =  self.trigram_counts[combo]

        return modified_counter , modified_bigram_counter, modified_trigram_counter


    def train(self, infile,batch_size = 300):

        """ 
        Train the N-gram language model on the provided corpus.

        This method preprocesses the training data, calculates probabilities of N-grams(using smoothing algorithm),
        and saves the model configurations.

        Parameters
        ----------
        infile : str 
            File path to the training corpus.

        batch_size : int, optional
            Number of lines to process at once, by default 1.

        Returns
        -------
        dict
            Probabilities of bigrams or trigrams depending on the ngram_size.
        """ 
        

        unigram , bigram , trigram = self.prepare_data(infile,batch_size)
        self.unigram_counts = unigram
        self.bigram_counts = bigram
        self.trigram_counts = trigram

        if self.ngram_size == 2:

            nb_tokens = len(set(self.unigram_counts)) 

            for bigram in self.bigram_counts:
                self.probabilities_bigram[bigram] = math.log((self.bigram_counts[bigram] + 1 * self.k) / (self.unigram_counts[bigram.split(" ")[0].strip()] + nb_tokens * self.k))

            self.save_configurations()
            return self.probabilities_bigram
            
        
        if self.ngram_size == 3:

            nb_tokens = len(set(self.bigram_counts))

            for trigram in self.trigram_counts.keys():

                self.probabilities_trigram[trigram] = math.log((self.trigram_counts[trigram] + 1 * self.k) / (self.bigram_counts[trigram.split(" ")[0].strip() + " " +trigram.split(" ")[1].strip()] + nb_tokens * self.k))
            
            self.save_configurations()
            return self.probabilities_trigram
       

    def predict_ngram(self, sentence):

        """ 
        Predict the probability of a given sentence using the N-gram language model.

        This method preprocesses the input sentence, calculates the probability of the sentence
        based on the probabilities of N-grams calculated in the train step, and returns the probability.
        In an other way, it look in the dictionary of probabilities, if the combinaison of words exists
        it returns its probability else it calculates it

        Parameters
        ----------
        sentence : str
            The input sentence to predict its probability.

        Returns
        -------
        float
            The probability of the input sentence according to the N-gram language model.
        """ 

        probability = 0.0
        text  = self.split_into_sentences(sentence)

        sentences = text.split("<stop>")
        sentences = [s.strip().lower() for s in sentences]

        tokens = [s.split(" ") for s in sentences][0]

        corpus = ['<UNK>' if word not in self.unigram_counts else word for word in tokens]


        if self.ngram_size==2:

            nb_tokens = len(set(self.unigram_counts))
            bigrams = Counter(zip(corpus, corpus[1:]))

            for bigram in bigrams :

                if bigram in self.bigram_counts:
                    probability += self.probabilities_bigram[bigram]

                else :
                    previous_word_count = self.unigram_counts[bigram[0]]
                    probability += math.log(1 * self.k / (previous_word_count + nb_tokens * self.k))


        elif self.ngram_size == 3:

            nb_tokens = len(set(self.bigram_counts))
            trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))

            for trigram in trigrams:

                if trigram in self.trigram_counts:
                    
                    probability += self.probabilities_trigram[trigram]

                else:

                    previous_bigram = (trigram[0], trigram[1])

                    if previous_bigram in self.bigram_counts:

                        previous_words_count = self.bigram_counts[previous_bigram]
                        probability += math.log(1 * self.k / (previous_words_count + nb_tokens * self.k))

                    else :
                        probability +=  math.log(1 * self.k / nb_tokens * self.k)
    
        return probability


    def generate_text(self):

        """ 
        Generate a new sentence using the N-gram language model.

        This method generates a new sentence by randomly choosing tokens based on the probabilities 
        of N-grams stored in the model.

        Returns
        -------
        str
            A newly generated sentence.
        """ 

        current_token = "<s> " * (self.ngram_size - 1)
        generated_text = current_token.strip() + " "

        if self.ngram_size == 2:
            while current_token != "</s> ":
                next_tokens = [token for token in self.probabilities_bigram.keys() if token.startswith(current_token)]
                if not next_tokens:
                    break  

                probabilities = [np.exp(-self.probabilities_bigram[token]) for token in next_tokens]
                probabilities_sum = sum(probabilities)
                probabilities_normalized = [p / probabilities_sum for p in probabilities]

                next_token = np.random.choice(next_tokens, p=probabilities_normalized).split(" ")[1].strip()

                current_token = next_token
                generated_text += current_token + " "

        elif self.ngram_size == 3:
            while current_token.split(" ")[1] != "</s>":
                next_tokens = [token for token in self.probabilities_trigram.keys() if token.startswith(current_token)]
                if not next_tokens:
                    break  

                probabilities = [np.exp(-self.probabilities_trigram[token]) for token in next_tokens]
                probabilities_sum = sum(probabilities)
                probabilities_normalized = [p / probabilities_sum for p in probabilities]

                next_token = np.random.choice(next_tokens, p=probabilities_normalized).split(" ")[2].strip()

                previous_token = current_token.split(" ")[1].strip()
                current_token = previous_token + " " + next_token

                generated_text += next_token + " "

        else:
            logging.error("[Number Exception] number different than 2 or 3")

        return generated_text.strip()
    

    def auto_complete(self, sentence):

        """ 
        Generate a the rest of a sentence using the N-gram language model.

        Returns
        -------
        str
            The rest of a sentence.
        """ 

        probabilities = self.probabilities_bigram if self.ngram_size==2 else self.probabilities_trigram

        text  = self.split_into_sentences(sentence)
        sentences = text.split("<stop>")
        sentences = [s.strip().lower() for s in sentences]
        tokens = [s.split(" ") for s in sentences][0]


        # selecting the last token to base on
        words = ['<unk>' if word not in self.unigram_counts else word for word in tokens]
        corpus = words[-(self.ngram_size)+1:]

        current_token = ""
        generated_text = ""

        for word in corpus:
            current_token += word.strip() +" "

        if self.ngram_size == 2:

            while current_token != "</s> ":

                next_tokens = [token for token in probabilities.keys() if token.startswith(current_token.strip())]

                if not next_tokens:
                    break  # No next tokens available, end the generation

                probabilities_ = [np.exp(-probabilities[token]) for token in next_tokens]
                probabilities_sum = sum(probabilities_)
                probabilities_normalized = [p / probabilities_sum for p in probabilities_]

                next_token = np.random.choice(next_tokens, p=probabilities_normalized).split(" ")[1].strip()

                current_token = next_token
                if next_token != '</s>':
                    generated_text += next_token + " "

        elif self.ngram_size == 3:

            while current_token.split(" ")[1] != "</s>":

                next_tokens = [token for token in probabilities.keys() if token.startswith(current_token.strip())]

                if not next_tokens:
                    break  # No next tokens available, end the generation

                probabilities_ = [np.exp(-probabilities[token]) for token in next_tokens]
                probabilities_sum = sum(probabilities_)
                probabilities_normalized = [p / probabilities_sum for p in probabilities_]

                next_token = np.random.choice(next_tokens, p=probabilities_normalized).split(" ")[2].strip()

                previous_token = current_token.split(" ")[1].strip()
                current_token = previous_token + " " + next_token

                if next_token != '</s>':
                    generated_text += next_token + " "

        else:
            logging.error("[Number Exception] number different than 2 or 3")

        return sentence + " " + generated_text.strip()


    def auto_correct(self,sentence):

        """ 
        Perform auto-correction using the N-gram language model.

        Parameters
        ----------
        sentence : str
            The input sentence to be auto-corrected.

        Returns
        -------
        list of tuples
            A list of candidate words with their scores, sorted by score in descending order.
        """

        probabilities = self.probabilities_bigram if self.ngram_size==2 else self.probabilities_trigram

        text  = self.split_into_sentences("<s> "*(self.ngram_size-1) +  sentence)
        sentences = text.split("<stop>")
        sentences = [s.strip().lower() for s in sentences]
        tokens = [s.split(" ") for s in sentences][0]


        # selecting the last token to base on
        words = ['<unk>' if word not in self.unigram_counts else word for word in tokens]
        auto_corrector = Auto_Corrector(self.unigram_counts)

        for i in range(len(words)):

            if words[i] != tokens[i] : # found an unkonwn word
                surrounding_words = tokens[i-(self.ngram_size)+1:i]
                pre_words = ""

                for word in surrounding_words:
                    pre_words+= word + " "

                candidates = auto_corrector.candidates(tokens[i]) #bring the candidates
                list_candidates = []

                for j in range(len(candidates)):

                    if candidates[j][0] != '<unk>':

                        score = candidates[j][1]**(2) + probabilities[pre_words + candidates[j][0]] if probabilities.get(pre_words + candidates[j][0]) is not None else candidates[j][1]**(2)
                        list_candidates.append((candidates[j][0],score))

        return(sorted(list_candidates, key=lambda x: x[1]))


    def save_configurations(self):

        """ 
        Save the configurations of the N-gram language model to files.

        This method saves the configurations of the model, including counts, probabilities, and other parameters, 
        to configuration files(config.ini + configuration.json).
        """ 
        
        configuration = {}
        dictionary = {}

        setup = configparser.ConfigParser()

        dictionary["ngram_size"] = self.ngram_size

        dictionary["unigram_count"] = self.unigram_counts
        dictionary["bigram_count"] = self.bigram_counts
        dictionary["trigram_count"] = self.trigram_counts

        dictionary["probabilities_bigram"] = self.probabilities_bigram 
        dictionary["probabilities_trigram"] = self.probabilities_trigram  # Add this line

        setup['settings'] = {
            "k" : self.k,
            "alphabets" : self.alphabets,
            "prefixes" : self.prefixes,
            "suffixes" : self.suffixes,
            "starters" : self.starters,
            "acronyms" : self.starters,
            "websites" : self.websites,
            "digits" : self.digits,
            "multiple_dots" : self.multiple_dots,
            "characters" : self.characters,
            "replacements" : self.replacements,
            "lines_processed" : self.lines_processed
        }

        configuration["dictionary"] = dictionary
        with open('config.ini', 'w') as configfile:
            setup.write(configfile)

        save_dict_to_json(configuration, "configuration.json")


### 4.0 Testing the model

In [21]:
# Initializing the model
model = NgramLanguageModel(ngram_size=2)

In [22]:
model.lines_processed

47961

In [23]:
# Training the model
model.train("data/big_data.txt")

{'<s> how': -4.937424477244247,
 'how are': -3.034988894618542,
 'are you': -1.9515964826892505,
 'you ?': -4.155291844604722,
 '? </s>': -0.021870640597409487,
 '<s> btw': -7.512744329500963,
 'btw thanks': -4.776947122604469,
 'thanks for': -0.8627461830802949,
 'for the': -1.6931693419110425,
 'the rt': -5.483720762934675,
 'rt .': -4.287381265303947,
 '. </s>': -0.3480631950291979,
 '<s> you': -3.8325038614856166,
 'you gonna': -6.542530363754041,
 'gonna be': -1.7126321365784125,
 'be in': -3.5291803338267957,
 'in dc': -5.797244494798235,
 'dc anytime': -5.514187002212487,
 'anytime soon': -3.147199794536901,
 'soon ?': -3.5059757173040675,
 '<s> love': -5.282380625035937,
 'love to': -2.894914021895187,
 'to see': -3.4184375298332896,
 'see you': -1.5901129404961984,
 'you .': -3.2785781977283666,
 '<s> been': -7.246180662125708,
 'been way': -6.287201776067501,
 'way ,': -3.6537970589539697,
 ', way': -8.038363179092046,
 'way too': -3.23409222803467,
 'too long': -4.2670859195

In [24]:
# predicting the probability of the sentence "i am curious"
model.predict_ngram("i am curious")

['i', 'am', 'curious']


-25.961718175430185

In [25]:
# generating a sentence
model.generate_text()

'<s> remarkable wife crazy horse se page ? </s>'

In [26]:
model.auto_complete("i don't know what")




"i don't know what one job breaking responding . 4k fans suck miss 3 waiting come yikes i repeatedly with several surrounding areas . pat for photo and approaches <unk> emerald city schools bullying needs <unk> guests can count no charge oh emma wright atl so anything more acting and fought everyone have created this update for yah successful prank call hope things alone cause <unk> 995 <unk> turtle mountain sound weird on california dairy sorbet , blunts on slow for thinking if austin protecting pittsburgh pa you babe on pretty light platinum !"

In [27]:
model.auto_correct("i see you evyrday")

[('everyday', -0.42350719208873855),
 ('day', 7.264677199129078),
 ('every', 7.264677199129078),
 ('vday', 9.0),
 ('pray', 15.576492807911261),
 ('1day', 15.576492807911261),
 ('ya', 15.576492807911261),
 ('ed', 15.576492807911261),
 ('era', 16.0),
 ('rda', 16.0),
 ('eva', 16.0),
 ('yay', 16.0),
 ('ray', 16.0),
 ('nerdy', 16.0),
 ('da', 16.668482555818876),
 ('friday', 17.177978392154866),
 ('very', 17.764877548678342),
 ('ever', 19.255671899927236),
 ('yr', 25.0),
 ('vera', 25.0),
 ('daya', 25.0),
 ('levy', 25.0),
 ('yard', 25.0),
 ('bday', 25.0),
 ('er', 25.0),
 ('heyy', 25.0),
 ('dayy', 25.0),
 ('byrd', 25.0),
 ('days', 25.0),
 ('tray', 25.0),
 ('sevy', 25.0),
 ('herd', 25.0),
 ('every1', 25.0),
 ('2day', 25.0),
 ('yoda', 25.0),
 ('rays', 25.0),
 ('va', 25.0),
 ('nerd', 25.0),
 ('evan', 25.0),
 ('payday', 25.0),
 ('cray', 25.0),
 ('ra', 25.0),
 ('yayy', 25.0),
 ('sday', 25.0),
 ('earthday', 25.0),
 ('vary', 25.0),
 ('oneday', 25.0),
 ('gray', 25.0),
 ('davy', 25.0),
 ('ry', 25.0),
 

***
**Made By :**
- *Houda Moudni* : houda.moudni@etu.uae.ac.ma
- *Chadi Mountassir* : chadi.mountassir@etu.uae.ac.ma