# Feature engineering and evaluation using a baseline model

In this section we apply the traditional approach to feature engineering for text: Bag of Words (BoW) counts weighted by TF-IDF. We analyse BoW features as uni-grams alone and as a combination of uni-grams and bi-grams. The baseline model for evaluating the features is a multinomial Naive Bayes classifier.

We will carry out the experiment using different combinations of input data from our baselined dataset.
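To make the two feature variants concrete, here is a minimal sketch on two made-up token lists (they stand in for rows of the `normal_text` column and are not taken from the dataset). It shows how the vocabulary, and therefore the width of the TF-IDF matrix, grows when bi-grams are added.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Made-up token lists standing in for rows of 'normal_text'.
docs = [["star", "magazine", "report"], ["star", "report", "secret"]]
joined = [" ".join(tokens) for tokens in docs]

shapes = {}
for ngram_range in [(1, 1), (1, 2)]:
    bow = CountVectorizer(ngram_range=ngram_range)
    counts = bow.fit_transform(joined)                 # raw term counts
    tfidf = TfidfTransformer().fit_transform(counts)   # counts re-weighted by TF-IDF
    shapes[ngram_range] = tfidf.shape
    print(ngram_range, sorted(bow.vocabulary_), tfidf.shape)
```

On real articles the bi-gram vocabulary is usually far larger than the uni-gram one, which is the processing cost mentioned later in this section.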

In [1]:
import pandas as pd
import numpy as np
import pickle
import gc
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.naive_bayes import ComplementNB
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import scipy.sparse  # import the submodule explicitly; scipy.sparse.csr_matrix is used below
from sklearn.pipeline import Pipeline

In [2]:
with open('pickle_file', 'rb') as pickle_file:  # the file is closed automatically
    df = pickle.load(pickle_file)

In [3]:
# drop the old index and the raw text/title; the normalised token columns are kept
df.drop(columns=['index', 'text', 'title'], inplace=True)

In [17]:
df.head()

|   | authors | fake | site_name | text_length | type | normal_text | proper_nouns | verbs | adj | ner | normal_title | adverbs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1080 | True | 735 | 723 | 19 | [star, magazine, release, explosive, report, t... | [star, brad, pitt, bard, star, brad] | [release, come, learn, wish, remain, come, cla... | [explosive, pregnant, old, former, early, anon... | [star, brad pitt's, bard, star, brad] | [us, report, brad, pitt, s, secret, lover, pre... | [forward, allegedly, forward, effectively] |
| 1 | 1898 | True | 358 | 4749 | 19 | [early, year, buzz, around, megyn, kellys, mov... | [megyn, kellys, fox, news, nbc, fox, nbc, nbc,... | [buzz, move, get, break, capture, pay, turn, f... | [early, loud, insular, loyal, high, political,... | [fox news, nbc, fox, nbc, nbc, kellys, matt la... | [megyn, kelly, make, list, highest, pay, tv, h... | [typically, alike, nowhere, also, annually, ac... |
| 2 | 606 | True | 456 | 1673 | 19 | [first, time, since, involvement, fatal, car, ... | [kenneth, mosher, april, bachelor, chris, soul... | [kill, speak, reveal, lean, read, click, overw... | [first, fatal, fellow, overwhelmed, full, fell... | [kenneth mosher, chris soules, chris soules, c... | [chris, soule, break, silence, fatal, accident... | [finally, extremely, together, well, previousl... |
| 3 | 3791 | True | 1280 | 811 | 19 | [heel, cute, last, long, selena, gomez, wear, ... | [selena, gomez, hotel, transylvania, selena, s... | [wear, keep, ditch, go, sign, add, ruin, top, ... | [cute, last, sparkly, mid, -, heel, red, baref... | [heels cute, selena gomez, hotel transylvania ... | [selena, gomez, go, barefoot, street, hotel, t... | [also, cashmere, frankly, kind, hey, still, tw... |
| 4 | 3535 | True | 816 | 871 | 19 | [jessica, simpsons, beloved, dog, daisy, snatc... | [jessica, simpsons, daisy, monday, simpson, tw... | [snatch, watch, happen, reveal, break, take, w... | [beloved, stunned, precious, right, front, sho... | [jessica simpsons, daisy, simpson, twitter, da... | [jessica, simpson, heartbroken, missing, dog] | [] |


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14968 entries, 0 to 14967
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   authors       14968 non-null  int16 
 1   fake          14968 non-null  bool  
 2   site_name     14968 non-null  int16 
 3   text_length   14968 non-null  int64 
 4   type          14968 non-null  int8  
 5   normal_text   14968 non-null  object
 6   proper_nouns  14968 non-null  object
 7   verbs         14968 non-null  object
 8   adj           14968 non-null  object
 9   ner           14968 non-null  object
 10  normal_title  14968 non-null  object
 11  adverbs       14968 non-null  object
dtypes: bool(1), int16(2), int64(1), int8(1), object(7)
memory usage: 1023.3+ KB


In [19]:
df_train, df_test = train_test_split(df, test_size=0.2, random_state=149)  # skip this cell when running the sub-prediction variant; start from C6 instead

## Input data - normal text

### BoW with TF-IDF (uni-grams and bi-grams)

In [20]:
pipeline = Pipeline([
    ('bow', CountVectorizer(input='content',strip_accents='unicode',lowercase=True,ngram_range=(1,2),max_df=1.0,min_df=1)),  
    ('tfidf', TfidfTransformer()),  # BOW counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # Naive Bayes classifier
])

pipeline.fit(df_train['normal_text'].apply(' '.join),df_train['fake'])
predictions = pipeline.predict(df_test['normal_text'].apply(' '.join))
print(classification_report(df_test['fake'],predictions))


              precision    recall  f1-score   support

       False       0.76      1.00      0.87      2268
        True       1.00      0.03      0.06       726

    accuracy                           0.77      2994
   macro avg       0.88      0.52      0.46      2994
weighted avg       0.82      0.77      0.67      2994



### BoW with TF-IDF (only uni-grams)
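Adding bi-grams does not help: fake-class recall is 0.03 with uni-grams and bi-grams combined, against 0.09 with uni-grams alone (report below), while accuracy is about the same (0.77 vs 0.78). In both cases the model labels almost every article as not fake. For the rest of the section we therefore consider only uni-grams, which are also cheaper to process.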

In [21]:
pipeline = Pipeline([
    ('bow', CountVectorizer(input='content',strip_accents='unicode',lowercase=True,ngram_range=(1,1),max_df=1.0,min_df=1)),  
    ('tfidf', TfidfTransformer()),  # BOW counts to weighted TF-IDF scores
    ('classifier', MultinomialNB()),  # Naive Bayes classifier
])

pipeline.fit(df_train['normal_text'].apply(' '.join),df_train['fake'])
predictions = pipeline.predict(df_test['normal_text'].apply(' '.join))
print(classification_report(df_test['fake'],predictions))


              precision    recall  f1-score   support

       False       0.77      1.00      0.87      2268
        True       0.98      0.09      0.16       726

    accuracy                           0.78      2994
   macro avg       0.88      0.54      0.51      2994
weighted avg       0.82      0.78      0.70      2994
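As a rough sanity check on the two reports above, the sketch below uses only the support and recall values printed in them. It computes the accuracy of a hypothetical classifier that always answers "not fake", and the approximate number of fake articles each pipeline detects. Because the recall values are rounded to two decimals, the counts are approximate.

```python
# Support values copied from the classification reports above.
support_real, support_fake = 2268, 726

# Accuracy of a hypothetical classifier that always predicts "not fake".
majority_accuracy = support_real / (support_real + support_fake)

# Approximate number of fake articles detected, from the rounded recall values.
detected_with_bigrams = round(0.03 * support_fake)
detected_unigrams_only = round(0.09 * support_fake)

print(f"majority-class accuracy: {majority_accuracy:.3f}")
print(f"fake articles detected: {detected_with_bigrams} vs {detected_unigrams_only} of {support_fake}")
```

Both reported accuracies (0.77 and 0.78) sit only slightly above the majority-class figure, which is why the fake-class recall is the more informative number for comparing the two runs.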



## Helper function for fitting the BoW and TF-IDF transformers

In [9]:
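def get_tfidf(in_text_series):
    """Fit BoW and TF-IDF transformers on a Series of token lists.

    Returns the TF-IDF matrix of the input together with both fitted
    transformers, so the same vocabulary and weights can be applied to test data.
    """
    joined_text = in_text_series.apply(' '.join)  # each row is a list of tokens
    bow_transformer = CountVectorizer(input='content', strip_accents='unicode', lowercase=True, ngram_range=(1, 1), max_df=1.0, min_df=1).fit(joined_text)
    text_bow = bow_transformer.transform(joined_text)
    tfidf_transformer = TfidfTransformer().fit(text_bow)
    text_tfidf = tfidf_transformer.transform(text_bow)
    return {'text_tfidf': text_tfidf, 'bow_transformer': bow_transformer, 'tfidf_transformer': tfidf_transformer}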
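The function above chains `CountVectorizer` and `TfidfTransformer`. The scikit-learn documentation describes `TfidfVectorizer` as equivalent to that chain, so a single object could replace the two. The sketch below checks this on made-up token lists; it is an optional simplification, not what the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Made-up token lists in the same shape as the notebook's text columns.
tokens = pd.Series([["brad", "pitt", "secret"], ["fox", "news", "secret"], ["fox", "brad"]])
joined = tokens.apply(" ".join)

# Two-step route, as in get_tfidf.
counts = CountVectorizer(strip_accents="unicode").fit_transform(joined)
two_step = TfidfTransformer().fit_transform(counts)

# One-step route with the same settings.
one_step = TfidfVectorizer(strip_accents="unicode").fit_transform(joined)

same = np.allclose(two_step.toarray(), one_step.toarray())
print("identical TF-IDF matrices:", same)
```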

In [23]:
# skip this cell when running the sub-prediction variant; start from C6 instead
df_train_sub_features = df_train[['type', 'site_name', 'authors', 'text_length']].copy()
df_test_sub_features = df_test[['type', 'site_name', 'authors', 'text_length']].copy() 

## Input data Combination 1 (C1): normalised text, authors, site_name, text_length, type

In [19]:
normal_text_tfidf = get_tfidf(df_train['normal_text'])
df_train_normal_text_tfidf = pd.DataFrame(normal_text_tfidf['text_tfidf'].toarray())

df_train_C1 = pd.concat([df_train_normal_text_tfidf.reset_index(drop=True), df_train_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_train_C1_sm=scipy.sparse.csr_matrix(df_train_C1.values).copy() #sparse matrix


In [20]:
fakenews_model = MultinomialNB().fit(df_train_C1_sm, df_train['fake'])


In [21]:
test_normal_text_bow = normal_text_tfidf['bow_transformer'].transform(df_test['normal_text'].apply(' '.join))
test_normal_text_tfidf = normal_text_tfidf['tfidf_transformer'].transform(test_normal_text_bow)
df_test_normal_text_tfidf = pd.DataFrame(test_normal_text_tfidf.toarray())

df_test_C1 = pd.concat([df_test_normal_text_tfidf.reset_index(drop=True), df_test_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_test_C1_sm=scipy.sparse.csr_matrix(df_test_C1.values).copy()
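The C1 cells above convert the TF-IDF matrix to a dense DataFrame, concatenate the extra columns, and then convert the result back to a sparse matrix. With a large vocabulary the dense step can use a lot of memory. One alternative, sketched here on made-up data, is `scipy.sparse.hstack`, which joins the blocks without leaving the sparse format. The column values below are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up stand-ins for the joined text and the numeric sub-features.
text = pd.Series(["star report secret", "fox news report", "star fox"])
sub_features = pd.DataFrame({"type": [19, 19, 3], "text_length": [723, 4749, 1673]})

text_tfidf = TfidfVectorizer().fit_transform(text)  # sparse matrix
sub_block = sparse.csr_matrix(sub_features.to_numpy(dtype=float))

# Sparse route: columns are appended without building a dense array.
combined = sparse.hstack([text_tfidf, sub_block], format="csr")

# Dense route, mirroring the notebook, kept only to check both agree.
dense_route = np.hstack([text_tfidf.toarray(), sub_features.to_numpy(dtype=float)])
matches = np.allclose(combined.toarray(), dense_route)
print(combined.shape, matches)
```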


In [22]:
predictions = fakenews_model.predict(df_test_C1_sm)
print (classification_report(df_test['fake'], predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      1535
        True       0.74      0.43      0.55       471

    accuracy                           0.83      2006
   macro avg       0.79      0.69      0.72      2006
weighted avg       0.82      0.83      0.81      2006



## Input data Combination 2 (C2): normalised title, authors, site_name, text_length, type

In [23]:
normal_title_tfidf = get_tfidf(df_train['normal_title'])
df_train_normal_title_tfidf = pd.DataFrame(normal_title_tfidf['text_tfidf'].toarray())

df_train_C2 = pd.concat([df_train_normal_title_tfidf.reset_index(drop=True), df_train_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_train_C2_sm=scipy.sparse.csr_matrix(df_train_C2.values).copy() #sparse matrix


In [24]:
fakenews_model = MultinomialNB().fit(df_train_C2_sm, df_train['fake'])

In [25]:
test_normal_title_bow = normal_title_tfidf['bow_transformer'].transform(df_test['normal_title'].apply(' '.join))
test_normal_title_tfidf = normal_title_tfidf['tfidf_transformer'].transform(test_normal_title_bow)
df_test_normal_title_tfidf = pd.DataFrame(test_normal_title_tfidf.toarray())

df_test_C2 = pd.concat([df_test_normal_title_tfidf.reset_index(drop=True), df_test_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_test_C2_sm=scipy.sparse.csr_matrix(df_test_C2.values).copy()


In [26]:
predictions = fakenews_model.predict(df_test_C2_sm)
print (classification_report(df_test['fake'], predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      1535
        True       0.74      0.44      0.55       471

    accuracy                           0.83      2006
   macro avg       0.79      0.69      0.72      2006
weighted avg       0.82      0.83      0.81      2006



## Input data Combination 3 (C3): proper nouns, verbs, adjectives, authors, site_name, text_length, type

In [27]:
proper_nouns_tfidf = get_tfidf(df_train['proper_nouns'])
verbs_tfidf = get_tfidf(df_train['verbs'])
adj_tfidf = get_tfidf(df_train['adj'])

df_train_proper_nouns_tfidf = pd.DataFrame(proper_nouns_tfidf['text_tfidf'].toarray())
df_train_verbs_tfidf = pd.DataFrame(verbs_tfidf['text_tfidf'].toarray())
df_train_adj_tfidf = pd.DataFrame(adj_tfidf['text_tfidf'].toarray())

df_train_C3 = pd.concat([df_train_proper_nouns_tfidf.reset_index(drop=True),df_train_verbs_tfidf.reset_index(drop=True),df_train_adj_tfidf.reset_index(drop=True),df_train_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_train_C3_sm=scipy.sparse.csr_matrix(df_train_C3.values).copy()

In [28]:
fakenews_model = MultinomialNB().fit(df_train_C3_sm, df_train['fake'])

In [29]:
test_proper_nouns_bow = proper_nouns_tfidf['bow_transformer'].transform(df_test['proper_nouns'].apply(' '.join))
test_proper_nouns_tfidf = proper_nouns_tfidf['tfidf_transformer'].transform(test_proper_nouns_bow)
df_test_proper_nouns_tfidf = pd.DataFrame(test_proper_nouns_tfidf.toarray())

test_verbs_bow = verbs_tfidf['bow_transformer'].transform(df_test['verbs'].apply(' '.join))
test_verbs_tfidf = verbs_tfidf['tfidf_transformer'].transform(test_verbs_bow)
df_test_verbs_tfidf = pd.DataFrame(test_verbs_tfidf.toarray())

test_adj_bow = adj_tfidf['bow_transformer'].transform(df_test['adj'].apply(' '.join))
test_adj_tfidf = adj_tfidf['tfidf_transformer'].transform(test_adj_bow)
df_test_adj_tfidf = pd.DataFrame(test_adj_tfidf.toarray())

df_test_C3 = pd.concat([df_test_proper_nouns_tfidf.reset_index(drop=True),df_test_verbs_tfidf.reset_index(drop=True),df_test_adj_tfidf.reset_index(drop=True),df_test_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_test_C3_sm=scipy.sparse.csr_matrix(df_test_C3.values).copy()


In [30]:
predictions = fakenews_model.predict(df_test_C3_sm)
print (classification_report(df_test['fake'], predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      1535
        True       0.74      0.43      0.55       471

    accuracy                           0.83      2006
   macro avg       0.79      0.69      0.72      2006
weighted avg       0.82      0.83      0.81      2006



## Input data Combination 4 (C4): named entities, verbs, adjectives, authors, site_name, text_length, type

In [31]:
ner_tfidf = get_tfidf(df_train['ner'])
verbs_tfidf = get_tfidf(df_train['verbs'])
adj_tfidf = get_tfidf(df_train['adj'])

df_train_ner_tfidf = pd.DataFrame(ner_tfidf['text_tfidf'].toarray())
df_train_verbs_tfidf = pd.DataFrame(verbs_tfidf['text_tfidf'].toarray())
df_train_adj_tfidf = pd.DataFrame(adj_tfidf['text_tfidf'].toarray())

df_train_C4 = pd.concat([df_train_ner_tfidf.reset_index(drop=True),df_train_verbs_tfidf.reset_index(drop=True),df_train_adj_tfidf.reset_index(drop=True),df_train_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_train_C4_sm=scipy.sparse.csr_matrix(df_train_C4.values).copy()


In [32]:
fakenews_model = MultinomialNB().fit(df_train_C4_sm, df_train['fake'])

In [33]:
test_ner_bow = ner_tfidf['bow_transformer'].transform(df_test['ner'].apply(' '.join))
test_ner_tfidf = ner_tfidf['tfidf_transformer'].transform(test_ner_bow)
df_test_ner_tfidf = pd.DataFrame(test_ner_tfidf.toarray())

test_verbs_bow = verbs_tfidf['bow_transformer'].transform(df_test['verbs'].apply(' '.join))
test_verbs_tfidf = verbs_tfidf['tfidf_transformer'].transform(test_verbs_bow)
df_test_verbs_tfidf = pd.DataFrame(test_verbs_tfidf.toarray())

test_adj_bow = adj_tfidf['bow_transformer'].transform(df_test['adj'].apply(' '.join))
test_adj_tfidf = adj_tfidf['tfidf_transformer'].transform(test_adj_bow)
df_test_adj_tfidf = pd.DataFrame(test_adj_tfidf.toarray())

df_test_C4 = pd.concat([df_test_ner_tfidf.reset_index(drop=True),df_test_verbs_tfidf.reset_index(drop=True),df_test_adj_tfidf.reset_index(drop=True),df_test_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_test_C4_sm=scipy.sparse.csr_matrix(df_test_C4.values).copy()

In [34]:
predictions = fakenews_model.predict(df_test_C4_sm)
print (classification_report(df_test['fake'], predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      1535
        True       0.74      0.43      0.55       471

    accuracy                           0.83      2006
   macro avg       0.79      0.69      0.72      2006
weighted avg       0.82      0.83      0.81      2006



## Input data Combination 5 (C5): adverbs, verbs, adjectives, authors, site_name, text_length, type

In [35]:
adverbs_tfidf = get_tfidf(df_train['adverbs'])
verbs_tfidf = get_tfidf(df_train['verbs'])
adj_tfidf = get_tfidf(df_train['adj'])

df_train_adverbs_tfidf = pd.DataFrame(adverbs_tfidf['text_tfidf'].toarray())
df_train_verbs_tfidf = pd.DataFrame(verbs_tfidf['text_tfidf'].toarray())
df_train_adj_tfidf = pd.DataFrame(adj_tfidf['text_tfidf'].toarray())

df_train_C5 = pd.concat([df_train_adverbs_tfidf.reset_index(drop=True),df_train_verbs_tfidf.reset_index(drop=True),df_train_adj_tfidf.reset_index(drop=True),df_train_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_train_C5_sm=scipy.sparse.csr_matrix(df_train_C5.values).copy()

In [36]:
fakenews_model = MultinomialNB().fit(df_train_C5_sm, df_train['fake'])

In [37]:
test_adverbs_bow = adverbs_tfidf['bow_transformer'].transform(df_test['adverbs'].apply(' '.join))
test_adverbs_tfidf = adverbs_tfidf['tfidf_transformer'].transform(test_adverbs_bow)
df_test_adverbs_tfidf = pd.DataFrame(test_adverbs_tfidf.toarray())

test_verbs_bow = verbs_tfidf['bow_transformer'].transform(df_test['verbs'].apply(' '.join))
test_verbs_tfidf = verbs_tfidf['tfidf_transformer'].transform(test_verbs_bow)
df_test_verbs_tfidf = pd.DataFrame(test_verbs_tfidf.toarray())

test_adj_bow = adj_tfidf['bow_transformer'].transform(df_test['adj'].apply(' '.join))
test_adj_tfidf = adj_tfidf['tfidf_transformer'].transform(test_adj_bow)
df_test_adj_tfidf = pd.DataFrame(test_adj_tfidf.toarray())

df_test_C5 = pd.concat([df_test_adverbs_tfidf.reset_index(drop=True),df_test_verbs_tfidf.reset_index(drop=True),df_test_adj_tfidf.reset_index(drop=True),df_test_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_test_C5_sm=scipy.sparse.csr_matrix(df_test_C5.values).copy()

In [38]:
predictions = fakenews_model.predict(df_test_C5_sm)
print (classification_report(df_test['fake'], predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      1535
        True       0.74      0.43      0.55       471

    accuracy                           0.83      2006
   macro avg       0.79      0.69      0.72      2006
weighted avg       0.82      0.83      0.81      2006



## Input data Combination 2 (C2) with ComplementNB: normalised title, authors, site_name, text_length, type

In [39]:
normal_title_tfidf = get_tfidf(df_train['normal_title'])
df_train_normal_title_tfidf = pd.DataFrame(normal_title_tfidf['text_tfidf'].toarray())

df_train_C2 = pd.concat([df_train_normal_title_tfidf.reset_index(drop=True), df_train_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_train_C2_sm=scipy.sparse.csr_matrix(df_train_C2.values).copy() #sparse matrix


In [40]:
fakenews_model = ComplementNB().fit(df_train_C2_sm, df_train['fake'])  # per the scikit-learn documentation, ComplementNB is particularly suited to imbalanced data sets

In [41]:
test_normal_title_bow = normal_title_tfidf['bow_transformer'].transform(df_test['normal_title'].apply(' '.join))
test_normal_title_tfidf = normal_title_tfidf['tfidf_transformer'].transform(test_normal_title_bow)
df_test_normal_title_tfidf = pd.DataFrame(test_normal_title_tfidf.toarray())

df_test_C2 = pd.concat([df_test_normal_title_tfidf.reset_index(drop=True), df_test_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_test_C2_sm=scipy.sparse.csr_matrix(df_test_C2.values).copy()


In [42]:
predictions = fakenews_model.predict(df_test_C2_sm)
print (classification_report(df_test['fake'], predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      1535
        True       0.74      0.44      0.55       471

    accuracy                           0.83      2006
   macro avg       0.79      0.69      0.72      2006
weighted avg       0.82      0.83      0.81      2006



Complement NB gives no better results than Multinomial NB here: the two C2 reports are identical to two decimal places. We will therefore not pursue this algorithm further.

## Input data Combination 6 (C6): normalised title + categorical prediction (authors, site_name, type)

In [4]:
df["type"]=df["type"].astype('category')
df["site_name"]=df["site_name"].astype('category')
df["authors"]=df["authors"].astype('category')
df_main, df_sub = train_test_split(df, test_size=0.33, random_state=149)  # df_sub (33%) trains the intermediate model; df_main (67%) is kept for the text model

df_train, df_test = train_test_split(df_sub, test_size=0.1, random_state=149) 


In [5]:
inter_prediction_model = CategoricalNB().fit(df_train[['type', 'site_name', 'authors']], df_train['fake'])

In [6]:
inter_predictions = inter_prediction_model.predict(df_test[['type', 'site_name', 'authors']])
print (classification_report(df_test['fake'], inter_predictions))

              precision    recall  f1-score   support

       False       0.83      0.96      0.89       368
        True       0.78      0.42      0.55       126

    accuracy                           0.82       494
   macro avg       0.80      0.69      0.72       494
weighted avg       0.82      0.82      0.80       494



The simplified model above makes an intermediate prediction from just the three categorical (non-textual) features: authors, site_name and type. We will feed this prediction as an input feature into the larger model that also includes the text.
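The idea of an intermediate categorical model can be sketched on a tiny made-up example. Two details here are assumptions that go beyond the notebook: `min_categories` is set so that a category code absent from the training rows does not cause an indexing error at prediction time, and `predict_proba` is shown next to `predict` as a softer alternative to the hard True/False output the notebook uses.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Hypothetical integer-encoded features; columns are (type, site_name).
X_sub = np.array([[0, 0], [0, 1], [1, 2], [1, 2], [0, 0], [1, 1]])
y_sub = np.array([False, False, True, True, False, True])
X_main = np.array([[0, 0], [1, 2], [1, 3]])  # site code 3 never appears in X_sub

# Number of categories per feature, counted over both sets (codes only, no labels).
n_categories = np.vstack([X_sub, X_main]).max(axis=0) + 1

model = CategoricalNB(min_categories=n_categories).fit(X_sub, y_sub)

hard = model.predict(X_main)               # True/False, as used in the notebook
soft = model.predict_proba(X_main)[:, 1]   # probability of model.classes_[1], i.e. True
print(hard, soft.round(2))
```

Either output can then be attached to the text features as one extra column for the larger model.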

In [7]:
# re-train inter_prediction_model on entire sub df
inter_prediction_model = CategoricalNB().fit(df_sub[['type', 'site_name', 'authors']], df_sub['fake'])

In [8]:
# apply on the main df
inter_predictions = inter_prediction_model.predict(df_main[['type', 'site_name', 'authors']])
print (classification_report(df_main['fake'], inter_predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      7582
        True       0.76      0.48      0.58      2446

    accuracy                           0.84     10028
   macro avg       0.80      0.71      0.74     10028
weighted avg       0.83      0.84      0.82     10028



In [10]:
inter_predictions=pd.Series(inter_predictions).astype('float64')
inter_predictions=inter_predictions*2000 # scale the 0/1 prediction to boost its weight relative to the TF-IDF values
df_main=pd.concat([df_main.reset_index(drop=True),inter_predictions.rename('inter_predictions').reset_index(drop=True)],axis=1)
# the Series is renamed because it would otherwise get a default column name such as 0
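One possible side effect of such a large scaling factor is that `MultinomialNB` ends up repeating the scaled feature and largely ignoring the text. The sketch below illustrates this on synthetic data only: the labels, the noisy intermediate prediction and the random stand-in for TF-IDF values are all made up, and whether the same happens on the notebook's data has not been verified here.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)

# Synthetic labels: 100 real (0) and 100 fake (1) articles.
y = np.array([0] * 100 + [1] * 100)

# Synthetic intermediate prediction: agrees with y except for 20 rows per class.
inter = y.copy()
inter[:20] = 1
inter[100:120] = 0

# Uninformative stand-in for TF-IDF features: small non-negative noise.
text = rng.random((200, 50)) * 0.1

# Append the intermediate prediction scaled by 2000, as in the cell above.
X = np.hstack([text, (inter * 2000.0).reshape(-1, 1)])
preds = MultinomialNB().fit(X, y).predict(X)

agreement = (preds == inter).mean()
print(f"share of rows where the model repeats the scaled feature: {agreement:.2f}")
```

If the share is close to 1, the combined model cannot do better than the intermediate prediction itself, so comparing its report with the intermediate model's report is a useful check.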

In [11]:
df_train, df_test = train_test_split(df_main, test_size=0.2, random_state=149)

In [12]:
df_train_sub_features = df_train[['inter_predictions']].copy()
df_test_sub_features = df_test[['inter_predictions']].copy()

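After running the step above, everything below follows the same pattern as C1 to C5, with inter_predictions as the only sub-feature; the cells reuse the C2 variable names. You can also rerun C1 to C5 at this point to see the effect of the sub-prediction. Note that the C1 to C5 reports shown earlier have a support of 2006, which equals a 20% test split of the 10,028-row df_main rather than of the full 14,968-row df (2994), so they appear to have been produced by such a rerun.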
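Since every combination repeats the same fit, transform, concatenate and predict steps, they could be wrapped in one helper. The function below is a hypothetical sketch, not part of the original notebook: `evaluate_combination` and the toy frame are made up, and `TfidfVectorizer` plus `scipy.sparse.hstack` are used in place of the notebook's two-step transformers and dense concatenation.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB


def evaluate_combination(train, test, text_columns, sub_columns, target="fake"):
    """Fit MultinomialNB on TF-IDF of each token-list column plus numeric columns."""
    train_blocks, test_blocks = [], []
    for column in text_columns:
        vectorizer = TfidfVectorizer(strip_accents="unicode")
        # Fit on the training rows only, then reuse the vocabulary on the test rows.
        train_blocks.append(vectorizer.fit_transform(train[column].apply(" ".join)))
        test_blocks.append(vectorizer.transform(test[column].apply(" ".join)))
    train_blocks.append(sparse.csr_matrix(train[sub_columns].to_numpy(dtype=float)))
    test_blocks.append(sparse.csr_matrix(test[sub_columns].to_numpy(dtype=float)))
    X_train = sparse.hstack(train_blocks, format="csr")
    X_test = sparse.hstack(test_blocks, format="csr")
    model = MultinomialNB().fit(X_train, train[target])
    return model.predict(X_test)


# Tiny made-up frame with the same column layout as the notebook's data.
toy = pd.DataFrame({
    "verbs": [["claim", "leak"], ["report", "say"], ["leak", "shock"], ["say", "confirm"]],
    "adj": [["secret", "explosive"], ["official"], ["shocking"], ["official", "public"]],
    "inter_predictions": [2000.0, 0.0, 2000.0, 0.0],
    "fake": [True, False, True, False],
})
train, test = toy, toy.iloc[:2]  # the toy test rows overlap the training rows; illustration only
predictions = evaluate_combination(train, test, ["verbs", "adj"], ["inter_predictions"])
print(classification_report(test["fake"], predictions, zero_division=0))
```

With such a helper, C3 would be a call with `["proper_nouns", "verbs", "adj"]` as the text columns, and the other combinations differ only in the two list arguments.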

In [15]:
normal_title_tfidf = get_tfidf(df_train['normal_title'])
df_train_normal_title_tfidf = pd.DataFrame(normal_title_tfidf['text_tfidf'].toarray())

df_train_C2 = pd.concat([df_train_normal_title_tfidf.reset_index(drop=True), df_train_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_train_C2_sm=scipy.sparse.csr_matrix(df_train_C2.values).copy() #sparse matrix


In [16]:
fakenews_model = MultinomialNB().fit(df_train_C2_sm, df_train['fake'])

In [17]:
test_normal_title_bow = normal_title_tfidf['bow_transformer'].transform(df_test['normal_title'].apply(' '.join))
test_normal_title_tfidf = normal_title_tfidf['tfidf_transformer'].transform(test_normal_title_bow)
df_test_normal_title_tfidf = pd.DataFrame(test_normal_title_tfidf.toarray())

df_test_C2 = pd.concat([df_test_normal_title_tfidf.reset_index(drop=True), df_test_sub_features.reset_index(drop=True)], ignore_index=True, sort=False, axis = 1)
df_test_C2_sm=scipy.sparse.csr_matrix(df_test_C2.values).copy()


In [18]:
predictions = fakenews_model.predict(df_test_C2_sm)
print (classification_report(df_test['fake'], predictions))

              precision    recall  f1-score   support

       False       0.85      0.95      0.90      1535
        True       0.74      0.44      0.55       471

    accuracy                           0.83      2006
   macro avg       0.79      0.69      0.72      2006
weighted avg       0.82      0.83      0.81      2006

