# Team: Jinfeng Hong (jh6011), Hejing Liu (hl3620)

## Task Assignments:
- "Utility for Cleaning Text" function, preprocessingText(): Hejing Liu
- "Text Cleaning" parts: Hejing Liu
- "Read Onion / Economist" parts: Jinfeng Hong
- Functions:
    - Test_Train_Split(): Jinfeng Hong
    - counting(): Jinfeng Hong
    - P_class_doc(): Hejing Liu
    - CM_Description(): Hejing Liu
- Experiments parts:
    - Create 10-fold Cross Validation: Jinfeng Hong
    - Create test-train set and counting works: Jinfeng Hong
    - Computations of prior and posterior probabilities: Hejing Liu
    - Prediction work: Hejing Liu
- "Most/Least Representative Words" parts: Jinfeng Hong

Import packages

In [38]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import fnmatch
import os

# Packages for cleaning data
import string
import re
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
from collections import Counter
from nltk.stem import WordNetLemmatizer
from spellchecker import SpellChecker
import random
from sklearn.metrics import confusion_matrix
from math import log
from sklearn.metrics import classification_report
import itertools 

# Utility for cleaning text

In [39]:
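def preprocessingText(s, stopword = 0, freqword = 0, rareword = 0, stem_lemma = -1, spellcheck = 0):
    """Clean a document string; each flag set to 1 enables the matching step.

    stem_lemma: 1 = lemmatize, 0 = stem, any other value = neither.
    """
    s = s.lower()
    # Drop bracketed numeric markers such as [12], then replace punctuation with spaces
    s = re.sub(r'\[\d+\]',' ',s)
    s = re.sub(r"[^\w\s]",' ',s)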
    
    # Remove stop words
    if stopword == 1:
        STOPWORDS = set(stopwords.words('english'))
        s = " ".join([word for word in str(s).split() if word not in STOPWORDS])
                      
    # Remove the 10 most frequent words in this document
    if freqword == 1:
        cnt = Counter()
        for word in s.split():
            cnt[word] += 1
        FREQWORDS = set([w for (w, wc) in cnt.most_common(10)])
        s = " ".join([word for word in str(s).split() if word not in FREQWORDS])
    
    # Remove the 10 rarest words in this document
    if rareword == 1:
        cnt = Counter()
        for word in s.split():
            cnt[word] += 1
        n_rare_words = 10
        RAREWORDS = set([w for (w, wc) in cnt.most_common()[:-n_rare_words-1:-1]])
        s = " ".join([word for word in str(s).split() if word not in RAREWORDS])
        
    # Stemming or lemmatization
    if stem_lemma == 1:
        lemmatizer = WordNetLemmatizer()
        s = " ".join([lemmatizer.lemmatize(word) for word in s.split()])
        
    elif stem_lemma == 0:
        stemmer = PorterStemmer()
        s = " ".join([stemmer.stem(word) for word in s.split()])
    
    # Spelling check:
    if spellcheck == 1:
        spell = SpellChecker()
        corrected_text = []
        misspelled_words = spell.unknown(s.split())
        for word in s.split():
            if word in misspelled_words:
                corrected_text.append(spell.correction(word))
            else:
                corrected_text.append(word)
        s = " ".join(corrected_text)
#     print(s)
    return s
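
The first, unconditional steps of `preprocessingText()` can be tried in isolation. The following is a minimal sketch using only the standard library; the helper name `basic_clean` and the sample sentence are made up for illustration and are not part of the notebook.

```python
import re

def basic_clean(s):
    """Hypothetical helper: mirrors only the lowercase and regex steps above."""
    s = s.lower()
    s = re.sub(r'\[\d+\]', ' ', s)   # drop bracketed numeric markers such as [3]
    s = re.sub(r"[^\w\s]", ' ', s)   # replace punctuation with spaces
    return s.split()

tokens = basic_clean("Hello, World![3] It's 2009.")
print(tokens)
```

Note that the apostrophe is treated as punctuation, so a contraction such as "It's" is split into two tokens before any stop-word removal happens.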

## Read Onions

In [40]:
onion_txt = []
for root, dirs, files in os.walk('./onion'):
    for _file in files:
        if fnmatch.fnmatch(_file,'*.txt'):
            onion_txt.append(_file)

# N is for number of total files, N_onion is number of onion files
N_onion = len(onion_txt)
N = N_onion 
print('onion',N)

# txtnames contains all files' name for cross-validation purpose
txtnames = onion_txt.copy()

onion 192


## Read Economist

In [41]:
econ_dir = os.listdir('./economist')
location_txt = {} # maps each area to the list of its Economist txt file names
N_econ = {} # records the number of files in each area

for d in econ_dir:
    txt = []
    for root, dirs, files in os.walk(f'./economist/{d}/'):
        for _file in files:
            if fnmatch.fnmatch(_file,'*.txt'):
                txt.append(_file)
    location_txt[d] = txt
    txtnames += txt.copy()

for key in location_txt.keys():
    print(key, len(location_txt[key]))
    N += len(location_txt[key])
    N_econ[key] = len(location_txt[key])
    
print('\nNumbers of total files: ',N, N==len(txtnames))

africa 74
asia 83
britain 100
europe 112
international 38
latin_america 66
north_america 60

Numbers of total files:  725 True


## Text Cleaning

In [42]:
onions = []
for txt in onion_txt:
    path = "./onion/" + txt
    s = ''
    with open(path, 'r', errors='ignore') as f:
        for line in f:
            # look at line in loop
            line = line.replace("\n"," ").strip() + ' '
            s += line
    clean_s = preprocessingText(s.strip(),stopword = 1, freqword = 1, rareword = 0, stem_lemma = 0, spellcheck=0)
    onions.append(list(clean_s.split()))
    
print('onions',len(onions))

onions 192


In [43]:
economists = {}
for key in location_txt.keys():
    temp = []
    for txt in location_txt[key]:
        path =  f"./economist/{key}/" + txt
        s = ''
        with open(path, 'r', errors='ignore') as f:
            for line in f:
                # look at line in loop
                line = line.replace("\n"," ").strip() + ' '
                s += line
        clean_s = preprocessingText(s.strip(),stopword = 1, freqword = 1, rareword = 0, stem_lemma = 0, spellcheck=0)
        temp.append(list(clean_s.split()))
    economists[key] = temp
    print(key,len(economists[key]))

africa 74
asia 83
britain 100
europe 112
international 38
latin_america 66
north_america 60


Right now we have all the clean data for Onion and Economist
  - where `onions` is a list of tokenized documents, and `economists` is a dictionary that maps each area to its list of tokenized documents
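
As a rough illustration of these two containers, the sketch below builds toy versions of `onions` and `economists` (the tokens are invented) and flattens the dictionary area by area, which is the same operation the experiments use to build `econ`.

```python
# Toy stand-ins for the real variables (the tokens here are made up)
onions = [["one", "year"], ["ad", "imag", "time"]]
economists = {
    "africa": [["govern", "countri"]],
    "asia": [["parti", "two"], ["year", "say"]],
}

# Flatten the dictionary into one list, area by area
econ = []
for docs in economists.values():
    econ += docs

n_docs_per_area = {area: len(docs) for area, docs in economists.items()}
print(len(onions), len(econ), n_docs_per_area)
```

Because dictionaries keep insertion order, the position of a document in `econ` identifies its area, which is what the index ranges in the experiments rely on.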

# Experiment 1

In [44]:
from math import log

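def Test_Train_Split(fold, alldata, num, org_n):
    """
    Split alldata into the test documents indexed by fold and the remaining training documents.

    org_n lists the class sizes in index order. For Experiment 1 (alldata = onions + econ):
    [0,192) = Onion -> 0
    [192,725) = Econ -> 1
    """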
    accum = list(itertools.accumulate(org_n))
    
    count_class = [0] * len(org_n)
    test = []
    for ind in fold:
        test.append(alldata[ind])
        num.remove(ind)
        res = list(map(lambda i: i > ind, accum)).index(True)
        count_class[res] += 1
        
    train = []
    for i in num:
        train.append(alldata[i])
    
    counts = [org_n[i] - count_class[i] for i in range(len(org_n))]
        
    return test, train, counts, num.copy()

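def counting(train, order, org_n):
    """
    Count word occurrences per class in the training documents.

    Returns the per-class word counts and, for each class, the smoothing
    denominator: number of tokens in the class plus the vocabulary size |V|.
    """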
    V = set()
    for lst in train:
        V.update(lst)
    
    lst = [{} for _ in range(len(org_n))]
    C_class = [len(V)]*len(org_n)
    
    accum = list(itertools.accumulate(org_n))

    for i,pos in enumerate(order):
        cnt = Counter(train[i])
        res = list(map(lambda i: i > pos, accum)).index(True)
        C_class[res] += len(train[i])
        for k,v in cnt.items():
            if k not in lst[res]:
                lst[res][k] = 0
            lst[res][k] += v
              
    return lst,C_class.copy()

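def P_class_doc(P_x_,P_,doc,denom):
    """
    Return log P(class) + sum of log P(word | class) over the words in doc.

    Words unseen in the class get the smoothed probability 1/(denom+1).
    """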
    log_p = log(P_)
    diff = set(doc).difference(P_x_.keys())
    for i in doc:
        if i in diff:
            log_p += log(1/(denom+1))
        else:
            log_p += log(P_x_[i])
    return log_p

def CM_Descprition(y_test, y_pred):
    # Compute the confusion matrix once: rows are true classes, columns are predicted classes
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
    n = len(cm)
    
    # Precision:
    precisions = []
    for i in range(n):
        c_ii = cm[i][i]
        sum_p = 0
        for j in range(n):
            sum_p += cm[j][i]
        if sum_p != 0: 
            precisions.append(c_ii/sum_p)
        else:
            precisions.append(0)
        
    # Recall:
    recalls = []
    for i in range(n):
        c_ii = cm[i][i]
        sum_r = 0
        for j in range(n):
            sum_r += cm[i][j]
        if sum_r != 0:
            recalls.append(c_ii/sum_r)
        else:
            recalls.append(0)
    
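    # Accuracy: share of all test documents that belong to class i and are
    # classified correctly; summing over the classes gives the overall accuracy
    total = sum(map(sum, cm))
    accuracies = []
    for i in range(n):
        accuracies.append(cm[i][i]/total)
    
    # F1 score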
    F = []
    for i in range(n):
        if precisions[i] + recalls[i] != 0:
            F.append(2 * precisions[i] * recalls[i] / (precisions[i] + recalls[i]))
        else:
            F.append(0)
        
    return precisions, recalls, accuracies, F
    

### 1-1

In [45]:
# Create 10 Fold Cross Validation
econ = []
for i in economists.values():
    econ += i
    
alldata = onions + econ # this order matters: Onion documents first, then Economist
num = list(range(len(alldata))) # indices 0-191 are Onion, 192-724 are Economist
random.shuffle(num)

folds = []
start = 0
end = N // 10
for i in range(9):
    folds.append(num[start:end])
    start = end
    end += N//10
folds.append(num[start:])

In [46]:
PRECISIONS = {i:[] for i in range(2)}
RECALLS = {i:[] for i in range(2)}
ACCURACIES = {i:[] for i in range(2)}
FSCORES = {i:[] for i in range(2)}

for ind,fold in enumerate(folds):
    test,train,counts,order = Test_Train_Split(fold, alldata, num.copy(),[192,533])
    P_area = [i/sum(counts) for i in counts]
    dic_area, C_area = counting(train,order, [192,533])

    # Conditional Probabilities
    P_x_area = []
    for i in range(2):
        _x_area = {j:(dic_area[i][j]+1)/C_area[i] for j in dic_area[i]}
        P_x_area.append(_x_area)

    # give a test set and predict
    y_pred = []
    for t in range(len(test)):
        p_class_doc = []
        for j in range(2):
            p = P_class_doc(P_x_area[j],P_area[j],test[t],C_area[j])
            p_class_doc.append(p)
        
        y_pred.append(p_class_doc.index(max(p_class_doc)))
            
    print('Confusion Matrix of fold {}'.format(ind))
    
    # Confusion Matrix
    y_test = [0 if i < 192 else 1 for i in fold]
    Precisions, Recalls, Accuracies, F = CM_Descprition(y_test, y_pred)
    for i in range(2):
        PRECISIONS[i].append(Precisions[i])
        RECALLS[i].append(Recalls[i])
        ACCURACIES[i].append(Accuracies[i])
        FSCORES[i].append(F[i])

for ind,v in enumerate(['Onion','Economist']):
    print(v)
    print('Overall Precision: ', sum(PRECISIONS[ind])/10)
    print('Overall Recall: ',sum(RECALLS[ind])/10)
    print('Overall Accuracy: ',sum(ACCURACIES[ind])/10)
    print('Overall F: ',sum(FSCORES[ind])/10)
    
    print('='*20)

Confusion Matrix of fold 0
[[18  1]
 [ 1 52]]
Confusion Matrix of fold 1
[[22  0]
 [ 0 50]]
Confusion Matrix of fold 2
[[24  0]
 [ 1 47]]
Confusion Matrix of fold 3
[[17  0]
 [ 0 55]]
Confusion Matrix of fold 4
[[15  0]
 [ 2 55]]
Confusion Matrix of fold 5
[[20  1]
 [ 2 49]]
Confusion Matrix of fold 6
[[13  0]
 [ 0 59]]
Confusion Matrix of fold 7
[[16  0]
 [ 0 56]]
Confusion Matrix of fold 8
[[22  0]
 [ 0 50]]
Confusion Matrix of fold 9
[[22  1]
 [ 2 52]]
Onion
Overall Precision:  0.9615478937986678
Overall Recall:  0.985627111256402
Overall Accuracy:  0.26051587301587303
Overall F:  0.9730863028692817
Economist
Overall Precision:  0.9942264150943396
Overall Recall:  0.9848958299528574
Overall Accuracy:  0.724476911976912
Overall F:  0.9895008263350482


### 1-2

### Summary:
The statistics for both classes, Onion and Economist, are very impressive because all of them are close to 1. The total accuracy is $\approx$ 0.72 + 0.26 = 0.98, where each per-class "accuracy" printed above is the share of all test documents that belong to that class and are classified correctly. This indicates that the model classifies almost all files correctly. The main reason is that the two classes are very different from each other. As the confusion matrices show, there are only a small number of false negative and false positive classifications. Overall, this model is good enough to separate the two classes.

The result is what I expected because I used all words in each test document for the prediction, even though this makes computing the posterior probabilities slower. I also removed stopwords such as 'we', 'the', and 'I', so that the training and test sets contain more informative words. The model still makes some systematic errors; otherwise it would predict perfectly. These errors perhaps come from text cleaning: for example, a file may contain words to which both classes give similar weights, so a slightly higher weight tips the result in the wrong direction. This could be blamed on insufficient or excessive text cleaning; for example, spelling correction is not used in this experiment.
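
The relation between the per-class "accuracy" values and the total accuracy can be checked by hand on a single fold. The sketch below uses the confusion matrix of fold 0 from the output above; the variable names are chosen for illustration only.

```python
# Confusion matrix of fold 0 from the output above (rows: true class, columns: predicted)
cm = [[18, 1],
      [1, 52]]

total = sum(sum(row) for row in cm)

# Share of all test documents that are in class i and classified correctly
per_class_share = [cm[i][i] / total for i in range(len(cm))]

# The shares add up to the usual overall accuracy (correct / total)
overall_accuracy = sum(per_class_share)

print(per_class_share)
print(overall_accuracy)
```

For this fold, 70 of the 72 test documents are classified correctly, so the two shares sum to 70/72.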

### 1-3: Most Representative Words

In [49]:
fold = []
test,train,counts, order = Test_Train_Split(fold, alldata, num.copy(),[192,533])

# counts follows the class order [Onion, Econ]
n_onion,n_econ = counts[0],counts[1]
P_econ = n_econ / (n_econ+n_onion)
P_onion = 1 - P_econ

dic_area, C_area = counting(train,order, [192,533])

In [52]:
# Onion
k = Counter(dic_area[0])
high = k.most_common(5)
for i in high: 
    print(i[0]," :",i[1]," ") 
    
print('='*20)
    
# Econ
k = Counter(dic_area[1])
high = k.most_common(5)
for i in high: 
    print(i[0]," :",i[1]," ") 

one  : 253  
year  : 252  
time  : 243  
ad  : 243  
imag  : 241  
year  : 1110  
say  : 880  
one  : 873  
countri  : 669  
govern  : 666  


### Summary:

After removing the 10 most frequent words from each document, the results are what I expected and are explainable. Economist files mention more words related to people, government, trade, economics, and finance. As we see, 'countri', 'govern', and 'say' are among the most frequent words in the Economist class. But 'year' and 'one' have high frequencies in both classes. Therefore, 'year' and 'one' can be bad words for prediction and may cause false negatives and false positives. Words I would want to list instead are those with the prefixes "financ-" and "econom-", which are more representative of the Economist class. The Onion class gives "ad" (advertisement), "time", and "imag" (image) as its most representative words. Since The Onion is a satirical news publication, "ad" and "imag" probably come from advertisement and image-caption text left in the article files rather than from the articles themselves. Again, "one" and "year" would ideally be replaced by words that rarely appear in the Economist class.
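
One way to separate shared words such as "one" and "year" from words that favor a single class is to rank words by the difference of their smoothed log-probabilities instead of by raw count. This is not what the notebook does; it is a hedged sketch of an alternative, with a hypothetical helper `log_ratio_ranking`. The counts are only the top-5 values printed above, so the vocabulary is a toy one.

```python
from math import log

def log_ratio_ranking(counts_a, counts_b, top=3):
    """Hypothetical helper: rank words by log P(w|A) - log P(w|B), add-one smoothed."""
    vocab = set(counts_a) | set(counts_b)
    denom_a = sum(counts_a.values()) + len(vocab)
    denom_b = sum(counts_b.values()) + len(vocab)
    score = {
        w: log((counts_a.get(w, 0) + 1) / denom_a)
           - log((counts_b.get(w, 0) + 1) / denom_b)
        for w in vocab
    }
    # Highest score first; ties are broken alphabetically
    return sorted(vocab, key=lambda w: (-score[w], w))[:top]

# Only the printed top-5 counts are real; everything else is treated as unseen
onion_counts = {"one": 253, "year": 252, "time": 243, "ad": 243, "imag": 241}
econ_counts = {"year": 1110, "say": 880, "one": 873, "countri": 669, "govern": 666}

print(log_ratio_ranking(onion_counts, econ_counts))
print(log_ratio_ranking(econ_counts, onion_counts))
```

With this score, words that are frequent in both classes fall to the middle of the ranking, while words seen mostly in one class rise to the top.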

### 1-3: Least Representative Words

In [55]:
n = 5

# Onion
k = Counter(dic_area[0])
low = k.most_common()[:-n-1:-1]
for i in low: 
    print(i[0]," :",i[1]," ") 
    
print('='*50)

# Econ
k = Counter(dic_area[1])
low = k.most_common()[:-n-1:-1]
for i in low: 
    print(i[0]," :",i[1]," ") 

guerrilla  : 1  
blum  : 1  
neuropsychologist  : 1  
mayo  : 1  
stanford  : 1  
ineptitud  : 1  
brawl  : 1  
gasolin  : 1  
hamo  : 1  
blagojevich  : 1  


### Summary:
For both classes, it is hard to say why these are the least common words. All of them have the same count, 1, so many words are tied for last place, and which five are printed depends only on their order in the list rather than on anything meaningful. We can still find some interesting things. In the Onion class, the words "neuropsychologist" and "stanford" come from research or science articles, so these types of articles are probably the rarest in the class. In the Economist class, the name "blagojevich" appears only once, which makes sense. "gasolin" is a typical economic word, yet for some reason it appears here. Overall, most of these words are rare and hard to read after stemming, so they are the least representative words.
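
The tie behavior can be seen on a toy example. `Counter.most_common()` keeps words with equal counts in the order they were first encountered, so the slice used above returns the last-seen words among those tied at the lowest count. The toy document below is invented for illustration.

```python
from collections import Counter

# Toy document: "a" appears twice, the other words once each
cnt = Counter("a b c d a".split())

ordered = cnt.most_common()          # ties keep first-seen order
n = 2
low = cnt.most_common()[:-n-1:-1]    # same slice as in the cell above

# Number of words that appear exactly once
n_hapax = sum(1 for c in cnt.values() if c == 1)
print(ordered, low, n_hapax)
```

Counting the words that appear exactly once, as `n_hapax` does here, gives a sense of how many candidates are tied for "least representative".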

# Experiment 2

    [0,74) = africa => 0
    [74,157) - 74 = [0,83) = asia => 1
    [157,257) - 157 = [0,100) = britain => 2
    [257,369) - 257 = [0,112) = europe => 3
    [369,407) - 369 = [0,38) = international => 4
    [407,473) - 407 = [0,66) = latin_america => 5
    [473,533) - 473 = [0,60) = north_america => 6
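
The index ranges above can be turned into a lookup with cumulative sums. The sketch below is one possible way to do it with `bisect`; the helper name `class_of` is hypothetical, and the notebook itself uses a `map`/`index(True)` search over the same cumulative sums.

```python
from bisect import bisect_right
from itertools import accumulate

# Class sizes in the order listed above
sizes = [74, 83, 100, 112, 38, 66, 60]
bounds = list(accumulate(sizes))   # upper (exclusive) bound of each class range

def class_of(index):
    """Hypothetical helper: class whose half-open index range contains `index`."""
    return bisect_right(bounds, index)

print(bounds)
print([class_of(i) for i in (0, 73, 74, 256, 257, 532)])
```

`bisect_right` returns the position of the first bound that is strictly greater than the index, which matches the half-open ranges in the table.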

In [56]:
def counting_area(train, order, org_n):
    V = set()
    for lst in train:
        V.update(lst)
    
    lst = [{} for _ in range(len(org_n))]
    C_class = [len(V)]*len(org_n)
    
    accum = list(itertools.accumulate(org_n))
    
    for i,pos in enumerate(order):
        cnt = Counter(train[i])
        
        res = list(map(lambda i: i > pos, accum)).index(True)
        C_class[res] += len(train[i])
        for k,v in cnt.items():
            if k not in lst[res]:
                lst[res][k] = 0
            lst[res][k] += v
              
    return lst,C_class.copy()

Create the 10-fold cross-validation folds and the train/test sets

In [57]:
# Create 10 Fold Cross Validation
econ = []
for i in economists.values():
    econ += i
    
N = len(econ)
num = [i for i in range(N)]
random.shuffle(num)

folds = []
start = 0
end = N // 10
for i in range(9):
    folds.append(num[start:end])
    start = end
    end += N//10
    
folds.append(num[start:])


In [59]:

PRECISIONS = {i:[] for i in range(7)}
RECALLS = {i:[] for i in range(7)}
ACCURACIES = {i:[] for i in range(7)}
FSCORES = {i:[] for i in range(7)}

for fold in folds:
    test,train,counts,order = Test_Train_Split(fold, econ, num.copy(),list(N_econ.values()))
        
    P_area = [i/sum(counts) for i in counts]
    
    dic_area, C_area = counting(train,order, list(N_econ.values()))
    
    # Conditional Probabilities
    P_x_area = []
    for i in range(7):
        _x_area = {j:(dic_area[i][j]+1)/C_area[i] for j in dic_area[i]}
        P_x_area.append(_x_area)

    # give a test set and predict
    y_pred = []
    for t in range(len(test)):
        p_class_doc = []
        for j in range(7):
            p = P_class_doc(P_x_area[j],P_area[j],test[t],C_area[j])
            p_class_doc.append(p)
        
        y_pred.append(p_class_doc.index(max(p_class_doc)))

    y_test = []
    for ind in fold:
        if ind < 74:
            y_test.append(0)
        elif ind < 157:
            y_test.append(1)
        elif ind < 257:
            y_test.append(2)
        elif ind < 369:
            y_test.append(3)
        elif ind < 407:
            y_test.append(4)
        elif ind < 473:
            y_test.append(5)
        else:
            y_test.append(6)
    
    # Confusion Matrix
    Precisions, Recalls, Accuracies, F = CM_Descprition(y_test, y_pred)
    for i in range(7):
        PRECISIONS[i].append(Precisions[i])
        RECALLS[i].append(Recalls[i])
        ACCURACIES[i].append(Accuracies[i])
        FSCORES[i].append(F[i])
    
    print('='*20)

for ind,v in enumerate(econ_dir):
    print(v)
    print('Overall Precision: ', sum(PRECISIONS[ind])/10)
    print('Overall Recall: ',sum(RECALLS[ind])/10)
    print('Overall Accuracy: ',sum(ACCURACIES[ind])/10)
    print('Overall F: ',sum(FSCORES[ind])/10)
    
    print('='*20)

[[ 9  0  0  0  0  0  0]
 [ 0  7  0  1  0  0  0]
 [ 0  0  6  0  0  0  0]
 [ 0  0  1 12  0  0  0]
 [ 2  0  1  1  1  0  0]
 [ 0  0  0  0  0  5  0]
 [ 0  0  3  0  0  1  3]]
[[ 3  0  0  0  0  0  0]
 [ 1  4  0  1  1  0  0]
 [ 0  0 15  0  0  0  0]
 [ 0  0  1  9  0  0  0]
 [ 2  0  2  1  0  0  0]
 [ 0  0  1  1  0  6  0]
 [ 0  0  2  0  0  0  3]]
[[7 0 1 0 1 0 0]
 [0 6 1 1 1 0 0]
 [0 0 7 1 0 0 0]
 [0 0 0 6 0 0 0]
 [0 0 2 1 3 0 1]
 [0 1 2 2 0 7 0]
 [0 0 0 0 0 0 2]]
[[ 8  0  1  0  0  0  0]
 [ 1  5  2  1  0  0  0]
 [ 0  0 13  0  0  0  0]
 [ 2  0  1  4  1  0  0]
 [ 0  0  0  0  2  0  0]
 [ 0  0  0  0  0  7  0]
 [ 0  0  0  0  0  0  5]]
[[ 2  0  0  0  0  0  0]
 [ 0  6  0  2  0  0  0]
 [ 0  0  9  0  0  0  0]
 [ 1  0  2 10  0  0  1]
 [ 1  0  3  0  0  0  0]
 [ 0  1  0  0  0  9  0]
 [ 0  0  0  0  0  1  5]]
[[12  0  0  0  1  0  0]
 [ 1  6  0  0  0  0  0]
 [ 0  0  6  1  0  0  0]
 [ 0  0  0 15  0  0  0]
 [ 1  0  1  1  0  0  0]
 [ 0  0  0  0  1  3  0]
 [ 0  0  2  0  0  0  2]]
[[ 7  0  0  1  0  0  0]
 [ 1  6  4 

### 2-2

### Summary
From the results above, International and Britain have lower precision. To some extent this makes sense, because "international" is a broad category that most likely covers, or is covered by, the other 6 areas. Its recall is also very low, which supports this explanation. Britain has high recall, meaning that most Britain files are found (few false negatives), but it has many false positives, probably for the same reason as International. Here I suspect that the publication dates of the files matter: they might have been published before Brexit, in which case the Europe class influences the predictions for the Britain class. The other areas have high precision in general and recall above 70%, which seems good.

As in the previous experiment between Onion and Economist, there must be systematic errors caused by the text I gave to the model as input. Here, however, we need to do more text cleaning because all classes come from the same general source, the Economist, so the 7 areas share much of their vocabulary. If all areas put high weights on the same common words, the model struggles to decide which class to choose, and the results may become worse.

In conclusion, the total accuracy is about 78%, which means the predictions are good and acceptable.
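
Precision and recall for a single area can be read off a confusion matrix directly. The sketch below uses the first confusion matrix printed above and assumes the class order given at the start of Experiment 2; the helper names are illustrative. A single fold is shown, so the values need not match the averages over all folds.

```python
# First confusion matrix printed above (rows: true area, columns: predicted area)
areas = ["africa", "asia", "britain", "europe",
         "international", "latin_america", "north_america"]
cm = [
    [9, 0, 0, 0, 0, 0, 0],
    [0, 7, 0, 1, 0, 0, 0],
    [0, 0, 6, 0, 0, 0, 0],
    [0, 0, 1, 12, 0, 0, 0],
    [2, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 5, 0],
    [0, 0, 3, 0, 0, 1, 3],
]

def precision(cm, i):
    """Correct predictions of class i divided by all predictions of class i."""
    col = sum(row[i] for row in cm)
    return cm[i][i] / col if col else 0.0

def recall(cm, i):
    """Correct predictions of class i divided by all true members of class i."""
    row = sum(cm[i])
    return cm[i][i] / row if row else 0.0

for name in ("britain", "international"):
    i = areas.index(name)
    print(name, round(precision(cm, i), 3), round(recall(cm, i), 3))
```

In this fold, Britain finds all 6 of its own files but also attracts 5 files from other areas, while International recovers only 1 of its 5 files.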

### 2-3: Most Representative Words

In [61]:
fold = []

test,train,counts,order = Test_Train_Split(fold, econ, num.copy(),list(N_econ.values()))
        
P_area = [i/sum(counts) for i in counts]

dic_area, C_area = counting_area(train,order,list(N_econ.values()))

In [62]:
for ind,area in enumerate(econ_dir):
    print(area)
    k = Counter(dic_area[ind])
    high = k.most_common(5)    
    for i in high: 
        print(i[0]," :",i[1]," ") 
    print('='*20)

africa
year  : 163  
say  : 142  
one  : 117  
countri  : 108  
govern  : 107  
asia
year  : 197  
one  : 143  
say  : 142  
two  : 112  
parti  : 110  
britain
one  : 210  
year  : 199  
use  : 159  
say  : 154  
make  : 151  
europe
year  : 202  
say  : 158  
one  : 148  
minist  : 138  
govern  : 137  
international
countri  : 122  
one  : 113  
year  : 105  
say  : 92  
use  : 83  
latin_america
year  : 133  
say  : 115  
state  : 107  
countri  : 92  
govern  : 91  
north_america
american  : 111  
year  : 111  
democrat  : 109  
like  : 102  
peopl  : 88  


### Summary
Some of the top 5 words are very representative of the corresponding area. For example, in North America, "american", "democrat", and "peopl" (people) are strong words that say, "hey, this is the US." But some words have high frequencies in several areas, such as "say", "year", "one", and "govern", so they are not representative. For Africa, words related to the economy could be "poor", "lagged", "underdeveloped", and "low", since most countries in Africa are developing or underdeveloped countries; their economies have grown fast in recent years but still have a long way to go. For Asia, we can suggest words such as "china", "pacific", "APEC" (Asia-Pacific Economic Cooperation), "manufacturing", and "environment", because there are more business opportunities and jobs in emerging markets, especially China and India, and "environment" is the topic that developing countries are most concerned about. For Britain, words such as "London", "LSEG", and "Pounds" can be used. For Latin America, we could say "Brazil", "Cuba", "Coco", and "Cigar". For North America, we could update the words to "Trump", "Tesla", "Twitter", "Bloomberg", etc.
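
The overlap between areas can be quantified from the printed lists. The sketch below copies the top-5 words per area from the output above and counts in how many areas each word reaches the top 5; the variable names are chosen for illustration.

```python
from collections import Counter

# Top-5 words per area, copied from the output above
top5 = {
    "africa": ["year", "say", "one", "countri", "govern"],
    "asia": ["year", "one", "say", "two", "parti"],
    "britain": ["one", "year", "use", "say", "make"],
    "europe": ["year", "say", "one", "minist", "govern"],
    "international": ["countri", "one", "year", "say", "use"],
    "latin_america": ["year", "say", "state", "countri", "govern"],
    "north_america": ["american", "year", "democrat", "like", "peopl"],
}

# Number of areas in which each word reaches the top 5
area_count = Counter(w for words in top5.values() for w in words)

# Words that are in the top 5 of exactly one area
unique = {area: [w for w in words if area_count[w] == 1]
          for area, words in top5.items()}

print(area_count.most_common(3))
print(unique)
```

Africa and International have no top-5 word of their own, whereas North America has four, which is consistent with North America being the most recognizable area in the lists.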

### 2-3: Least Representative Words

In [63]:
for ind,area in enumerate(econ_dir):
    print(area)
    for i in Counter(dic_area[ind]).most_common()[:-n-1:-1]:  
        print(i[0]," :",i[1]," ") 
    print('='*20)

africa
imara  : 1  
shortli  : 1  
66  : 1  
32m  : 1  
lonzim  : 1  
asia
sooner  : 1  
escal  : 1  
unilater  : 1  
react  : 1  
prabhakaran  : 1  
britain
blaze  : 1  
hotli  : 1  
fixer  : 1  
languid  : 1  
fascin  : 1  
europe
wili  : 1  
michnik  : 1  
adam  : 1  
liabil  : 1  
acerb  : 1  
international
clever  : 1  
bloodsh  : 1  
venu  : 1  
coverag  : 1  
satellit  : 1  
latin_america
richer  : 1  
exclud  : 1  
rightli  : 1  
saddest  : 1  
prevail  : 1  
north_america
laughabl  : 1  
be  : 1  
zani  : 1  
flaunt  : 1  
vibrant  : 1  


### Summary:
It is not surprising that these least representative words have very low weights, on the order of $10^{-5}$ or smaller. Some of these words may also show up in other areas. As I mentioned for the Onion-Economist model, these words are the least likely to be read in the files, and they were chosen here by chance among the many words that appear only once. As we see, many of the results look like typos, partly because of stemming, so they are not representative and do not make much sense to us.
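
The order of magnitude of such a weight follows from the add-one estimate used in the experiments, (count + 1) divided by the number of tokens in the class plus the vocabulary size. The sketch below uses hypothetical sizes, since the actual token and vocabulary counts are not printed in the notebook; `smoothed_prob` is an illustrative helper.

```python
from math import log

def smoothed_prob(count, class_tokens, vocab_size):
    """Add-one estimate: (count + 1) / (tokens in class + |V|)."""
    return (count + 1) / (class_tokens + vocab_size)

# Hypothetical sizes, chosen only to show the order of magnitude
class_tokens = 150_000
vocab_size = 30_000

p_once = smoothed_prob(1, class_tokens, vocab_size)
print(p_once, log(p_once))
```

Under these assumed sizes a word seen once gets a probability of roughly 1.1e-5, and larger corpora push the value lower still.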