<h3>Comparing All Titles from News Sources</h3>

In [1]:
def title_extract(file_name):
    """
    Extracts all article titles from a txt file, one title per line
    Input: path to a txt file (string)
    Output: article titles (list of strings)
    """
    titles = []
    # The with block closes the file even if reading fails
    with open(file_name, "r") as text_file:
        for line in text_file:
            titles.append(line.strip())
    return titles

In [2]:
#Extract all article names from txt file
fox_titles = title_extract("fox.txt")
abc_titles = title_extract("abc.txt")
wsj_titles = title_extract("wsj.txt")
npr_titles = title_extract("npr.txt")
nyt_titles = title_extract("nytimes.txt")
washington_titles = title_extract("washington.txt")
huffpost_titles = title_extract("Huffpost.txt")
breit_titles = title_extract("breitbart.txt")
econ_titles = title_extract("economist.txt")

In [44]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

In [48]:
#Make a dataframe for all titles
organize = pd.DataFrame(columns =['title','type'])
liberal = huffpost_titles+nyt_titles+abc_titles+washington_titles
conservative = breit_titles+fox_titles
other = npr_titles+econ_titles+wsj_titles
total_titles = liberal+conservative+other

In [51]:
#Label each title by the group its source belongs to
n_liberal = len(liberal)
n_conservative = len(conservative)
for i in range(len(total_titles)):
    #Strict < keeps the first title of each group out of the previous group
    if i < n_liberal:
        organize.loc[i] = [total_titles[i], "Liberal"]
    elif i < n_liberal + n_conservative:
        organize.loc[i] = [total_titles[i], "Conservative"]
    else:
        organize.loc[i] = [total_titles[i], "Other"]

In [57]:
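#TF-IDF rows are L2-normalized by default, so this matrix product is the cosine similarity of every pair of titles
tfidf = TfidfVectorizer().fit_transform(total_titles)
pairwise_similarity = tfidf @ tfidf.T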
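
<p>The product above yields cosine similarity only because <code>TfidfVectorizer</code> L2-normalizes each row by default (<code>norm='l2'</code>). The following is a minimal sketch on three made-up titles that are not part of the data set; every name starting with <code>toy_</code> is an illustrative assumption. It shows the properties the later cells rely on: each title has similarity 1 with itself, near-duplicate titles score high, and titles with no words in common score 0.</p>

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up titles for illustration only (not taken from the txt files)
toy_titles = [
    "Trump meets Russian ambassador",
    "Donald Trump meets Russian ambassador",
    "Stock markets rally on jobs report",
]

# Each row of the TF-IDF matrix has unit L2 norm by default
toy_tfidf = TfidfVectorizer().fit_transform(toy_titles)

# Dot products of unit vectors are cosine similarities
toy_similarity = (toy_tfidf @ toy_tfidf.T).toarray()

print(np.round(toy_similarity, 3))
```

<p>The diagonal of ones is why the next cell has to filter out scores that round to 1; that filter also discards pairs of titles that are exact duplicates.</p>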

In [58]:
#Get all tuples of titles with greatest similarity
x = []
y = []
for row in range(pairwise_similarity.shape[0]):
    for col in range(pairwise_similarity.shape[1]):
        if pairwise_similarity[row,col] >= 0.75:
            if round(pairwise_similarity[row,col],4) != 1:
                x.append(row)
                y.append(col)
#Materialize the pairs so they can be iterated more than once in Python 3
similar_titles = list(zip(x,y))

In [60]:
#Get all unique tuples
unique = []
for item in similar_titles:
    if not (item in unique or tuple([item[1], item[0]]) in unique):
        unique.append(item)                    
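
<p>Because the similarity matrix is symmetric, every match appears twice, once as (row, col) and once as (col, row). The list-membership test above rescans <code>unique</code> for every pair; a sketch of an equivalent approach that normalizes each pair to a sorted key and tracks the keys in a set is given below. The helper name <code>unique_pairs</code> is hypothetical and not part of the original notebook.</p>

```python
def unique_pairs(pairs):
    """Return pairs with mirrored duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for a, b in pairs:
        # (a, b) and (b, a) map to the same key
        key = (min(a, b), max(a, b))
        if key not in seen:
            seen.add(key)
            result.append((a, b))
    return result


# Made-up index pairs for illustration
example_pairs = [(0, 1), (1, 0), (2, 3), (0, 1)]
print(unique_pairs(example_pairs))
```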

In [146]:
#Find the titles that are associated with tuples
sim_titles = []
for t in unique:
    #Pair up the two titles and the two source types for each match
    sim_titles.append(list(zip(organize.loc[t[0]], organize.loc[t[1]])))
pd.set_option('display.width', 1000)
similar_title = pd.DataFrame(sim_titles, columns =['Similar Titles','Type'])
#Keep only selected rows for display
similar_title = pd.concat([similar_title.iloc[0:5],similar_title.iloc[8:14],similar_title.iloc[16:20],similar_title.iloc[24:25]])
#Renumber the remaining rows from 0
similar_title = similar_title.reset_index(drop=True)

In [147]:
#Dataframe of pairwise title comparisons
pd.options.display.max_colwidth = 300
similar_title

,Similar Titles,Type
0,"(Here Are The Many Ways Trump Scares The Crap Out Of Democrats, Here Are The Many Ways Donald Trump Scares The Crap Out Of Democrats)","(Liberal, Liberal)"
1,"(Obama EPA Head Savages Trump's Environmental Policies, Obama EPA Head Savages Donald Trump's Environmental Policies)","(Liberal, Liberal)"
2,"(U.S. Politicians Want To Quiz British Spy Who Wrote Russian Dossier On Trump, U.S. Politicians Seek To Quiz British Spy Who Wrote Russian Dossier On Donald Trump)","(Liberal, Liberal)"
3,"(McCain Calls On Trump To Retract Obama Wiretap Claim Or Prove It, McCain to Trump: Retract wiretapping claim or prove it)","(Liberal, Liberal)"
4,"(Tracing where Trump gets some of his news ,, Tracing where President Trump gets some of his news ,)","(Liberal, Liberal)"
5,"(WH: Trump meeting with Russian ambassador absurd, WH calls reported Trump meeting with Russian ambassador absurd)","(Liberal, Liberal)"
6,"(What we know about Trumps unsubstantiated wiretapping..., What we know about Trumps unsubstantiated wiretapping allegations)","(Liberal, Liberal)"
7,"(Trump, Netanyahu speak by phone, discuss Irans malevolent..., Trump, Netanyahu speak by phone, discuss Irans malevolent behavior)","(Liberal, Liberal)"
8,"(White House calls reported Trump meeting with Russian ambassador absurd, WH calls reported Trump meeting with Russian ambassador absurd)","(Liberal, Liberal)"
9,"(Flynns lawyer told Trump team about lobbying during..., Flynns lawyer told Trump team about lobbying during transition)","(Liberal, Liberal)"


<p>Comparing titles across liberal, conservative, and other news sources (treating two titles as similar when their TF-IDF cosine similarity is at least 0.75 and they are not identical), most similar pairs occur within groups rather than between groups. Liberal news sources share the most similar titles with other liberal news sources, and conservative news sources share the most similar titles with other conservative news sources. This may mean that the kind of news liberal sources cover is very different from the kind conservative sources cover. Conservative news sources never share a similar article title with liberal news sources. As shown in the table, conservative and liberal news sources also share similar titles with "other" news sources, which implies that "other" news sources cover some liberal and some conservative news.</p>
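
<p>The within-group versus between-group comparison can be made explicit by counting how often each combination of source types occurs among the similar pairs. The sketch below uses made-up label pairs rather than the notebook's results, and the helper <code>tally_pair_types</code> is a hypothetical name; in the notebook the input would be the tuples in the <code>Type</code> column.</p>

```python
from collections import Counter


def tally_pair_types(pair_labels):
    """Count similar-title pairs by the (unordered) source types involved."""
    counts = Counter()
    for first, second in pair_labels:
        # Sort so ("Other", "Liberal") and ("Liberal", "Other") are counted together
        counts[tuple(sorted((first, second)))] += 1
    return counts


# Made-up labels for illustration only
example_labels = [
    ("Liberal", "Liberal"),
    ("Other", "Liberal"),
    ("Conservative", "Conservative"),
    ("Liberal", "Other"),
]

counts = tally_pair_types(example_labels)
within = sum(n for (a, b), n in counts.items() if a == b)
between = sum(n for (a, b), n in counts.items() if a != b)
print(counts, within, between)
```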