# Multinomial Logistic Regression on Liar Dataset
** Names: ** Barbara, Eva & Joyce
<br><br>
In this notebook we will implement multinomial logistic regression in various different ways, to see what the best multinomial model is. As we will include sentiment scores later on, we will use this dataset from the start rather than just the normal train dataset. This dataset includes all the information of the normal liar dataset, but includes the sentiment scores for each statement. After preprocessing the Liar Dataset we will apply Logistic Regression. We will first do this by using Count Vectorizer to represent the data of our train and test dataset. We will then use TF-IDF Vectorizer to represent our data and implement Logistic Regression. Once we have these two different logistic regression models, we reduce the dimensionality of the data by using SVD and apply this to our (so-far) best-performing model. Then, apply filtering of words in the statements and then test this on our best performing model. After doing this we will add the sentiment score and political score as features to our LR model. At the end of the notebook we will show whether these new features have any impact on the regression at all. Finally, we will present an overview of the accuracies of our different multinomial models.

### Index
1. ** Preprocessing Liar dataset ** (adding sentiment + political score)
2. ** Multinomial LR using `Count Vectorizer`**
    - regression
    - feature importance 
3. ** Multinomial LR with `TF-IDF Vectorizer`** (on data without sentiment & political score)
    - non SVD 
    - with SVD 
    
4. ** Filtering ** 
    - non SVD TF-IDF model (since this is the best performing model)
   
5. ** Implementing more features: sentiment & political score ** (on best multinomial model)
    - regression
    - feature importance

6. ** Final results & remarks ** 

In [28]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
import numpy as np
from sklearn import decomposition
from collections import OrderedDict

In [9]:
# open the csv file.
df_liar_sentiment = pd.read_csv("sentiment_train.csv", encoding="utf8", sep=",")
df_liar_sentiment.shape

(10240, 16)

# 1. Preprocessing

##### 1. Adding truth scoring

In [10]:
# scoring metric for the different truthvalues 
truthlabels ={"false":0, "barely-true":1,"half-true":2,"mostly-true":3,"true":4, "pants-fire":5}

# classification formula for truth value
def classify_truth(text):
    if text not in truthlabels.keys(): 
        return -1
    else:
        return truthlabels[text]

# add this new class of truth scores
df_liar_sentiment["truth-score"] = df_liar_sentiment["truth-value"].apply(classify_truth) 

##### 2. Adding political scoring
Since there are quite a lot of political labels in the dataset, we have focussed to only include the 3 labels that are the most occuring in the dataset: 'republican', 'democrat' and 'none'. 

In [11]:
# let's see how many political preferences there are
politics = dict()
for line in df_liar_sentiment["politics"]:
    if line not in politics.keys():
        politics[line] = 1
    else:
        politics[line] +=1
print(politics)

{'republican': 4497, 'democrat': 3336, 'none': 1744, 'organization': 219, 'independent': 147, 'columnist': 35, 'activist': 39, 'talk-show-host': 26, 'libertarian': 40, 'newsmaker': 56, 'journalist': 38, 'labor-leader': 11, 'state-official': 20, 'business-leader': 9, 'education-official': 2, 'tea-party-member': 10, nan: 2, 'green': 3, 'liberal-party-canada': 1, 'government-body': 1, 'Moderate': 1, 'democratic-farmer-labor': 1, 'ocean-state-tea-party-action': 1, 'constitution-party': 1}


In [12]:
# we'll only focus on the 3 most occuring political preferences
politics = dict({"republican": 0, "democrat" : 1, "none" : 2})

# classification formula for political background
def classify_politics(text):
    if text not in politics.keys(): 
        return -1
    else:
        return politics[text]

# add this new class of politic scores
df_liar_sentiment["political-score"] = df_liar_sentiment["politics"].apply(classify_politics) 

In [13]:
# filtering out the statement with a truth-score / a political preference that we don't like
df_reduced_sent1 = df_liar_sentiment[df_liar_sentiment["truth-score"] != -1]
df_reduced_sent2 = df_reduced_sent1[df_reduced_sent1["political-score"] != -1]
print(df_reduced_sent2.shape)
df_reduced_sent2.head(1)

(9577, 18)


Unnamed: 0,id,truth-value,text,topic,name,job,state,politics,count1,count2,count3,count4,count5,context,pos-sentiment,neg-sentiment,truth-score,political-score
0,2635.json,False,Says the Annies List political group supports ...,abortion,dwayne-bohac,State representative,Texas,republican,0.0,1.0,0.0,0.0,0.0,a mailer,0.007972,0.012908,0,0


In [14]:
# reducing the dataframe to only the important/most interesting data
df_reduced_sent = df_reduced_sent2.drop(['id', 'count1', 'count2', 'count3', 'count4', 'count5'], axis=1)  
df_reduced_sent.head(1)

Unnamed: 0,truth-value,text,topic,name,job,state,politics,context,pos-sentiment,neg-sentiment,truth-score,political-score
0,False,Says the Annies List political group supports ...,abortion,dwayne-bohac,State representative,Texas,republican,a mailer,0.007972,0.012908,0,0


##### 3. Doing the same for the test data

In [15]:
# create df of the test data
df_test_sent = pd.read_csv("sentiment_test.csv", encoding="utf8", sep=",", names=["id", "truth-value", 
                                                                     "text", "topic", "name", "job", 
                                                                     "state", "politics", "count1", "count2", 
                                                                     "count3", "count4", "count5", "context", 
                                                                                        "pos-sentiment", "neg-sentiment"])
df_test_sent["truth-score"] = df_test_sent["truth-value"].apply(classify_truth)
df_test_sent["political-score"] = df_test_sent["politics"].apply(classify_politics)

df_test_sent_reduced1 = df_test_sent[df_test_sent["truth-score"] != -1]
df_test_sent_reduced2 = df_test_sent_reduced1[df_test_sent_reduced1["political-score"] != -1]

In [16]:
df_test_sent_reduced = df_test_sent_reduced2.drop(['id', 'count1', 'count2', 'count3', 'count4', 'count5'], axis=1)  
df_test_sent_reduced.head(1)

Unnamed: 0,truth-value,text,topic,name,job,state,politics,context,pos-sentiment,neg-sentiment,truth-score,political-score
1,True,Building a wall on the U.S.-Mexico border will...,immigration,rick-perry,Governor,Texas,republican,Radio interview,0.0079719387755102,0.0129081632653061,4,0


## 2.  Regression using CountVec

#### 1. *Regression*

In [30]:
count_vect = CountVectorizer()
X_train_cv = count_vect.fit(df_reduced_sent.text)          #our X matrix is the text from the statements 
X_train_cv = count_vect.transform(df_reduced_sent.text)    

y_train_cv = df_reduced_sent["truth-score"].values         #our y vector is the list of all the truth labels     

In [34]:
logreg = LogisticRegression(solver='lbfgs', multi_class='multinomial')
logreg.fit(X_train_cv, y_train_cv)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='multinomial',
          n_jobs=1, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [32]:
X_test_cv = count_vect.transform(df_test_sent_reduced.text)        # X matrix is again text from statements
y_test_cv = df_test_sent_reduced["truth-score"].values             # y vector is list of all the truth values

In [35]:
y_hat_test_cv = logreg.predict(X_test_cv)

# evaluate using accuracy: proportion of correctly predicted over total
print(accuracy_score(y_test_cv, y_hat_test_cv))
print(accuracy_score(y_test_cv, y_hat_test_cv, normalize=False))

0.2434928631402183
290


#### Comments
> The accuracy of this multinomial model is unfortunately very low. However, we expected the accuracy to be low, as the accuracy of the binomial model was only around 60%. As there are 6 different labels for validity in this case with some labels between false and true the difference between the statements is smaller than in the case of only true and false and thus probably more difficult to distinguish. We saw from the binomial model that it was already difficult to predict the label of the false and true statements.  

#### 2. *Feature Importance*
Here we create dictionaries for all the features, which are all the different words of the train dataset, with their corresponding weight in this logistic regression model. All the words have a different weight for the six different labels. We therefor have six dictionaries with all the words and their importance weight.

In [36]:
print(logreg.coef_)
print(logreg.coef_.shape)
# This coefficient matrix has 6 rows as there are 6 different labels. 
#For every label each word has a different weight.

[[-0.07141639  0.00501307 -0.04506177 ... -0.04890659 -0.02738769
  -0.06304054]
 [-0.0401037  -0.09912629  0.23734855 ... -0.01895194  0.31864874
  -0.25438965]
 [-0.11112725  0.21189963 -0.03978087 ... -0.11737302 -0.07922386
  -0.04646419]
 [-0.01736026  0.33066743 -0.07158738 ...  0.34770544 -0.15955491
  -0.14338132]
 [-0.0070142  -0.02009042 -0.06017099 ... -0.13507738 -0.04778029
  -0.18076825]
 [ 0.2470218  -0.42836342 -0.02074755 ... -0.02739652 -0.00470198
   0.68804395]]
(6, 11765)


In [37]:
#Coefficient dictionaries for the different labels

#label false 
coef_dict1_false = dict()                                    
for n, key in enumerate(count_vect.vocabulary_.keys()):
    coef_dict1_false[key] = logreg.coef_[0][n] 
    
#label barely-true
coef_dict1_barelytrue = dict()
for n, key in enumerate(count_vect.vocabulary_.keys()):
    coef_dict1_barelytrue[key] = logreg.coef_[1][n] 

#label half-true
coef_dict1_halftrue = dict()
for n, key in enumerate(count_vect.vocabulary_.keys()):
    coef_dict1_halftrue[key] = logreg.coef_[2][n] 
    
#label mostly-true
coef_dict1_mostlytrue = dict()
for n, key in enumerate(count_vect.vocabulary_.keys()):
    coef_dict1_mostlytrue[key] = logreg.coef_[3][n] 
    
#label true
coef_dict1_true = dict()
for n, key in enumerate(count_vect.vocabulary_.keys()):
    coef_dict1_true[key] = logreg.coef_[4][n] 
    
#label pants-fire 
coef_dict1_pantsfire = dict()
for n, key in enumerate(count_vect.vocabulary_.keys()):
    coef_dict1_pantsfire[key] = logreg.coef_[5][n] 

In [38]:
#Ordering the different coefficient dictionaries

ordered_coefs_false = [(k, coef_dict1_false[k]) for k in sorted(coef_dict1_false, key=coef_dict1_false.get, reverse=True)]

ordered_coefs_barelytrue = [(k, coef_dict1_barelytrue[k]) for k in sorted(coef_dict1_barelytrue, key=coef_dict1_barelytrue.get, reverse=True)]

ordered_coefs_halftrue = [(k, coef_dict1_halftrue[k]) for k in sorted(coef_dict1_halftrue, key=coef_dict1_halftrue.get, reverse=True)]

ordered_coefs_mostlytrue = [(k, coef_dict1_mostlytrue[k]) for k in sorted(coef_dict1_mostlytrue, key=coef_dict1_mostlytrue.get, reverse=True)]

ordered_coefs_true = [(k, coef_dict1_true[k]) for k in sorted(coef_dict1_true, key=coef_dict1_true.get, reverse=True)]

ordered_coefs_pantsfire = [(k, coef_dict1_pantsfire[k]) for k in sorted(coef_dict1_pantsfire, key=coef_dict1_pantsfire.get, reverse=True)]

In [39]:
ordered_coefs_false[0:10]

[('reasonable', 1.551299571555561),
 ('authored', 1.4732789953865582),
 ('verifies', 1.3096547758452828),
 ('concludes', 1.2655602916155169),
 ('guards', 1.2114423464254092),
 ('dioxide', 1.206493037160284),
 ('ira', 1.2055905251486978),
 ('spill', 1.19673027826057),
 ('farouk', 1.1833886135054632),
 ('rihanna', 1.1783243697466481)]

In [40]:
ordered_coefs_barelytrue[0:10]

[('drop', 1.3007233358230523),
 ('offender', 1.2934828071725997),
 ('airports', 1.2718334283154786),
 ('identity', 1.2370813286008355),
 ('heidi', 1.2303936122788284),
 ('raise', 1.2295257720098194),
 ('weakening', 1.1762635684278318),
 ('webb', 1.1716577894897826),
 ('regularly', 1.1669798094247215),
 ('package', 1.158812276538333)]

In [41]:
ordered_coefs_halftrue[0:10]

[('steady', 1.5143594902381408),
 ('crushes', 1.5054199700817747),
 ('cato', 1.4872095839764192),
 ('himself', 1.3580486698153738),
 ('11023', 1.2894362510873283),
 ('2021', 1.2448034686282377),
 ('wilson', 1.2268877620782388),
 ('indefinitely', 1.202461399199106),
 ('stadium', 1.2008768750511016),
 ('midwestern', 1.1989869225218057)]

In [42]:
ordered_coefs_mostlytrue[0:10]

[('concept', 1.8823921133934578),
 ('oscar', 1.798451391812844),
 ('stimulate', 1.6314212493249507),
 ('stands', 1.4542596366590907),
 ('seeking', 1.2675739020164032),
 ('thousand', 1.224958779688441),
 ('weems', 1.1937898948326273),
 ('amending', 1.1647361671150147),
 ('focused', 1.1640392201365017),
 ('favor', 1.1407078796881214)]

In [43]:
ordered_coefs_true[0:10]

[('arent', 1.5985036509803325),
 ('create', 1.407444839586859),
 ('thompson', 1.3721746625624904),
 ('burdensome', 1.2948823119855133),
 ('hated', 1.2527226155427307),
 ('calderon', 1.2409849744771508),
 ('sayspeter', 1.2004616692205996),
 ('projects', 1.1673800329571766),
 ('earthquake', 1.1385532376865188),
 ('directed', 1.1379934450253566)]

In [44]:
ordered_coefs_pantsfire[0:10]

[('alliance', 1.8405279368534364),
 ('avowed', 1.4767565631882693),
 ('jv', 1.453418944083574),
 ('phds', 1.4309000472424644),
 ('greensboro', 1.4211868545478252),
 ('probationers', 1.3432274525330101),
 ('khrushchev', 1.3238150037630398),
 ('promptly', 1.3120623369091902),
 ('internationally', 1.2939867159179281),
 ('adjust', 1.2744458387466617)]

### Comments
>Here we can see which words, for each label, were important in determining its label. Interestingly, the words with the highest weigt for "true", are different from the words with the highest weight in binomial logistic regression. Furthermore, the words with the lowest weight in the binomial logistic regression, which were the words contributed most to labeling the statement as false, did not correspond with the highest weights of the "false" label. As there are multiple labels in this case, the whole model is ofcourse different, however we would have expected some simmilarities. Furthermore some of the words with the highest weights seem very strange to actually be of importance in determining the label of the statement such as "jv" and "11023". However, as our model has very low accuracy, we cannot really extract useful information from these coefficient matrices. 

## 3. Regression using TF-IDF Vectorizer
Here, we will do a Logistic Regression using the TF-IDF vectorizer, but not yet taking into acount the sentiment/political scorings. 

##### 1. Regression

In [17]:
tfidf_vect = TfidfVectorizer()

# creating the training vector
X_train = tfidf_vect.fit(df_reduced_sent.text)
X_train = tfidf_vect.transform(df_reduced_sent.text)
y_train = df_reduced_sent["truth-score"].values

X_train.shape

(9577, 11765)

In [18]:
lr = LogisticRegression(solver='lbfgs',multi_class='multinomial')

In [19]:
# Create an instance of Logistic Regression Classifier and fit the data.
lr.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='multinomial',
          n_jobs=1, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)

In [20]:
# transform the test data to the right format, aligning with the training data 
# (so that it has the size of the vocab of the training set)
X_test = tfidf_vect.transform(df_test_sent_reduced.text) 
y_test = df_test_sent_reduced["truth-score"].values
X_test.shape

(1191, 11765)

In [21]:
# evaluating the tfidf model
lr.fit(X_train, y_train)
y_hat_test = lr.predict(X_test)

# evaluate using accuracy: proportion of correctly predicted over total
print(accuracy_score(y_test, y_hat_test))
print(accuracy_score(y_test, y_hat_test, normalize=False))

0.2510495382031906
299


#### Comments
> The accuracy is not that much higher in comparison with the countvectorizer LR from the other notebook (0.245 vs. 0.251). That's kind of disappointing, but we did expect it since for binomial regression it also wasn't that much better. 

##### 2. Let's try dimensionality reduction
Since the Tfidf model gives the highest accuracy we will apply SVD to this model.

In [18]:
# converting train to a matrix
new_train = np.empty([9577, 11765])
array_X_train = X_train.toarray()

for n in range(9577):
    new_train[n] = array_X_train[n]

# converting test to a matrix
new_test = np.empty([1191, 11765])
array_X_test = X_test.toarray()

for n in range(1191):
    new_test[n] = array_X_test[n]

In [19]:
# 50 dimensions
train_SVD50Mat = decomposition.TruncatedSVD(n_components = 50, algorithm = "arpack").fit_transform(new_train)
test_SVD50Mat = decomposition.TruncatedSVD(n_components = 50, algorithm = "arpack").fit_transform(new_test)

In [20]:
# evaluating 50 dimensions SVD model
lr.fit(train_SVD50Mat, y_train)
y_hat_test_SVD = lr.predict(test_SVD50Mat)

# evaluate using accuracy: proportion of correctly predicted over total
print(accuracy_score(y_test, y_hat_test_SVD))
print(accuracy_score(y_test, y_hat_test_SVD, normalize=False))

0.172124265323
205


#### Comments
> Again, as we also saw in the binomial regression, after SVD it doesn't give better results than normal, so there's no point for implementing it. 

## 4. Filtering 
Here we will lemmatize all words, filter out numbers and stop words. After applying the filtering to the text of the statements, we can see if our LR model will iprove by including filtering.

In [6]:
import nltk 
from nltk.corpus import stopwords 
import string 
# preprocessing function to lemmatize and to filter out unimportant words. 
def filtering(text):
    lostrings = text.split(' ')
    new_lostrings = []
    for word in lostrings:
        word = nltk.WordNetLemmatizer().lemmatize(
                word.translate(str.maketrans('', '', string.punctuation)).lower()) # remove punctuation & lemmatize
        if word not in stopwords.words('english') and not word.isdigit(): # remove stopwords & digits
            new_lostrings.append(word)
    return ' '.join(new_lostrings)

In [22]:
df_reduced_sent["filteredtext"] = df_reduced_sent["text"].apply(filtering)

In [23]:
df_test_sent_reduced["filteredtext"] = df_test_sent_reduced["text"].apply(filtering)

In [24]:
tfidf_vect = TfidfVectorizer()

# creating the training vector
X_train_filter = tfidf_vect.fit(df_reduced_sent.filteredtext)
X_train_filter = tfidf_vect.transform(df_reduced_sent.filteredtext)

# creating the test vector
X_test_filter = tfidf_vect.transform(df_test_sent_reduced.filteredtext) 

print(X_train_filter.shape, X_test_filter.shape)

(9577, 10667) (1191, 10667)


In [26]:
# evaluating the filtered tfidf model

lr.fit(X_train_filter, y_train)
y_hat_test = lr.predict(X_test_filter)

# evaluate using accuracy: proportion of correctly predicted over total
print(accuracy_score(y_test, y_hat_test))
print(accuracy_score(y_test, y_hat_test, normalize=False))

0.24769101595298068
295


Comments
>Opposed to what we expected, applying filtering and lemmatization does not seem to improve our model (0.248 vs 0.251). 

## 5. Implementing more features: politics & sentiment
Now, it's finally time to try to implement the new scorings that we added to our data. We will perform the same regression as above, but this time adding 3 new colums signifying the positive score, the negative score and the political score of the statement respectively.  

##### 1. Adding the new columns to the train and test matrices 

In [21]:
# we need to add 3 new columns: one for the positive sentiment score, 
# one for the negative score and one for the political score. 
X_train2 = np.empty([9577, 11768])

for n in range(9577):
    pos = df_reduced_sent["pos-sentiment"].values[n]
    neg = df_reduced_sent["neg-sentiment"].values[n]
    pol = df_reduced_sent["political-score"].values[n]
    X_train2[n] = np.append(new_train[n], [pos, neg, pol])

print(X_train2)   

[[ 0.          0.          0.         ...,  0.00797194  0.01290816  0.        ]
 [ 0.          0.          0.         ...,  0.01148072  0.0142225   1.        ]
 [ 0.          0.          0.         ...,  0.00925808  0.01196854  1.        ]
 ..., 
 [ 0.          0.          0.         ...,  0.00999878  0.01135029  0.        ]
 [ 0.          0.          0.         ...,  0.00741618  0.01049563  1.        ]
 [ 0.          0.          0.         ...,  0.01075098  0.00883072  0.        ]]


In [22]:
# we also need to do this for the test set.
X_test2 = np.empty([1191, 11768])

for n in range(1191):
    pos = df_test_sent_reduced["pos-sentiment"].values[n]
    neg = df_test_sent_reduced["neg-sentiment"].values[n]
    pol = df_test_sent_reduced["political-score"].values[n]
    X_test2[n] = np.append(new_test[n], [pos, neg, pol])

print(X_test2) 

[[ 0.          0.          0.         ...,  0.00797194  0.01290816  0.        ]
 [ 0.          0.          0.         ...,  0.01148072  0.0142225   1.        ]
 [ 0.          0.          0.         ...,  0.00925808  0.01196854  0.        ]
 ..., 
 [ 0.          0.          0.         ...,  0.00970238  0.01297619  1.        ]
 [ 0.          0.          0.         ...,  0.00965795  0.00716801  0.        ]
 [ 0.          0.          0.         ...,  0.00975765  0.01259566  1.        ]]


##### 2. Evaluation / regression

In [23]:
# evaluating the new model
lr.fit(X_train2, y_train)
y_hat_test = lr.predict(X_test2)

# evaluate using accuracy: proportion of correctly predicted over total
print(accuracy_score(y_test, y_hat_test))
print(accuracy_score(y_test, y_hat_test, normalize=False))

0.252728799328
301




#### Comments
> Here you will see that it only classifies 2 more test statements correctly in comparison with our previous model (with accuracy 0.251 vs. 0.253). Therefore, we don't really think the sentiment scoring and the political scoring have any effect on the results, but it was at least interesting to try...

##### 3. Feature importance
We also wanted to know *how* significant our new features were in the regression. Therefore, we evaluated there feature importance below. 

In [24]:
# print out the coefficients 
print(lr.coef_)
print(lr.coef_.shape) #is of size (n_classes, n_features)

[[-0.04752767 -0.41708012 -0.0371028  ..., -0.07067812 -0.27048791
  -0.15765825]
 [-0.03456917 -0.2710325   0.17343884 ...,  0.00767341  0.07877831
  -0.15604081]
 [-0.06063145  0.66533971 -0.03342733 ..., -0.02959719  0.09569888
  -0.0153626 ]
 [-0.03132313  1.26773272 -0.0475349  ...,  0.12738169  0.19015447
   0.10484853]
 [-0.01839588 -0.20426712 -0.03560098 ..., -0.03611805  0.02074004
   0.00946544]
 [ 0.1924473  -1.0406927  -0.01977283 ...,  0.00133826 -0.11488379
   0.21474769]]
(6, 11768)


In [25]:
# adding the names of the newly added columns to the vocabulary of our features
features = []
for key in tfidf_vect.vocabulary_.keys():
    features.append(key)

features = features + ["positive-score", "negative-score", "political-score"]

In [26]:
# creating a dictionary of the coefficient scores of each feature, for each truth-value label
coef_dict_false = dict()
start = 0
for feature in features: 
    coef_dict_false[feature] = lr.coef_[0][start]
    start += 1 
    
coef_dict_barely = dict()
start = 0
for feature in features: 
    coef_dict_barely[feature] = lr.coef_[1][start]
    start += 1 

coef_dict_half = dict()
start = 0
for feature in features: 
    coef_dict_half[feature] = lr.coef_[2][start]
    start += 1 
    
coef_dict_mostly = dict()
start = 0
for feature in features: 
    coef_dict_mostly[feature] = lr.coef_[3][start]
    start += 1 
    
coef_dict_true = dict()
start = 0
for feature in features: 
    coef_dict_true[feature] = lr.coef_[4][start]
    start += 1 
    
coef_dict_pantsfire = dict()
start = 0
for feature in features: 
    coef_dict_pantsfire[feature] = lr.coef_[5][start]
    start += 1 

In [27]:
# most important features determining the "false" labeled statements 
ordered_false_coefs = [(k, coef_dict_false[k]) for k in sorted(coef_dict_false, key=coef_dict_false.get, reverse=True)]
ordered_false_coefs[0:10]

[('campbell', 1.5514009372940964),
 ('reveal', 1.4204777899958099),
 ('vetoing', 1.3378204600073866),
 ('160', 1.3333355222925962),
 ('dioxide', 1.2893928312562191),
 ('authored', 1.2651959516125),
 ('karen', 1.2618004549643798),
 ('fsas', 1.2395506990394158),
 ('spill', 1.2237527964892905),
 ('bureaus', 1.223451243322266)]

In [28]:
# most important features determining the "barely true" labeled statements 
ordered_barely_coefs = [(k, coef_dict_barely[k]) for k in sorted(coef_dict_barely, key=coef_dict_barely.get, reverse=True)]
ordered_barely_coefs[0:10]

[('regularly', 1.4891080350411461),
 ('safer', 1.2653403063840096),
 ('identity', 1.2348181864638454),
 ('topics', 1.2292261122743249),
 ('offender', 1.2274607718830501),
 ('package', 1.189296131791274),
 ('wastes', 1.133781841091499),
 ('thatcher', 1.1283452001889964),
 ('lazy', 1.1278096347111295),
 ('drop', 1.1199054660037748)]

In [29]:
# etc..

#### Comments 
> As you can see here, our new features do not really occur in the top-10 important features for our labels, so therefore we decided to just check per label what their coefficient scores are, as seen below. 

##### 4. Coefficient scores per new feature

In [35]:
# let's see what the importance is of the negative score for all labels. 
labellist = ["false", "barely-true","half-true","mostly-true","true", "pants-fire"]

for n, line in enumerate(lr.coef_):
    print("negative scoring coefficient for", labellist[n], ":", line[-3])

negative scoring coefficient for false : -0.0706781212921
negative scoring coefficient for barely-true : 0.00767340941101
negative scoring coefficient for half-true : -0.0295971886476
negative scoring coefficient for mostly-true : 0.127381687191
negative scoring coefficient for true : -0.0361180504101
negative scoring coefficient for pants-fire : 0.00133826374786


In [36]:
# let's see what the importance is of the positive score for all labels. 
for n, line in enumerate(lr.coef_):
    print("positive scoring coefficient for", labellist[n], ":", line[-2])

positive scoring coefficient for false : -0.27048790946
positive scoring coefficient for barely-true : 0.0787783058196
positive scoring coefficient for half-true : 0.095698883791
positive scoring coefficient for mostly-true : 0.190154473183
positive scoring coefficient for true : 0.0207400362779
positive scoring coefficient for pants-fire : -0.114883789612


In [37]:
# let's see what the importance is of the political score for all labels. 
for n, line in enumerate(lr.coef_):
    print("political scoring coefficient for", labellist[n], ":", line[-1])

political scoring coefficient for false : -0.157658250775
political scoring coefficient for barely-true : -0.156040812142
political scoring coefficient for half-true : -0.0153626006286
political scoring coefficient for mostly-true : 0.104848530231
political scoring coefficient for true : 0.00946543984909
political scoring coefficient for pants-fire : 0.214747693466


#### Comments
> We see that the positive score has the highest coefficient value in the labeling of "mostly true" (giving coefficient = 0.127). For the negative score this is in labeling of "mostly true" too (coefficient = 0.190). And for political score this is in "mostly true" (coefficient = 0.105) and "pants on fire" (coefficient = 0.215). However, in comparison with the most important features of these labels, these coefficient scores are still not really significant, since they all score far below 1 still.

## 6. Final results & remarks
Here are the overall results, nicely presented in a table. So in conclusion, our model *with* the new sentiment and political features works the best, but it doesn't show too much of a difference with the normal tfidf model that we already had.

Model | Accuracy 
--- | --- 
nonSVDCountvec | 0.243492
nonSVDtfidf | 0.25105
SVD50tfidf | 0.17212
nonSVDtfidf + filter | 0.24769
nonSVDtfidf + new features | 0.25273