# News Bias analysis notebook
#### Authors: Alexander Lambert, Casey Mathews, and Shivam Patel
#### Githubs: alambe22, cmathew9, and spatel90
## Description: 
The goal of this notebook is to analysze a given set of articles (found in datasets/articles.txt) with known biases and leanings to see if patterns can be determined in their writing. 
We analyze the following parts of the articles:
- Buzzwords and phrases count
- Emotional word count
- Average word length
- Use of words with negative connotations
- Use of words with positive connotations
- Use of words that indicate opinion (I think, I believe, etc.)
- Use of words that indicate fact (We know, research indicates, etc)
- First person pronoun usage (Does the author present this as their perspective, or as information)

Using the data we gather, the hope is to find patterns that could be used to analyze new articles for bias or factuality.
 


In [4]:
#Article class defintion and reading in
class article:
        __slots__ =["link", "bias", "cont", "buzz", "emo", "neg", "pos", "avgLen", 
                    "opi", "fac", "fPro"]
        def __init__ (self, link, bias):
            self.link = link
            self.bias = bias
            
articles = []
with open("datasets/articles.txt") as fin:
    for line in fin:
        lineList = line.rstrip().split(" ")
        articles.append(article(lineList[1], lineList[0]))


In [7]:
from bs4 import BeautifulSoup
import requests, re

# Return a list of all words in <img> alt text
def get_alt_text(soup):
    img_elements = soup.find_all("img")
    alt_text_words = []
    for img_element in img_elements:
        if('alt' in img_element):
            img_text = img_element['alt']
            img_text = string_cleaner(img_text)
            img_text = img_text.lower()
            words = list(filter(None, img_text.split(" ")))
            alt_text_words += words
    return alt_text_words

# Clean up a string for splitting on space
def string_cleaner(paragraph):
    # Remove apostrophes from the word
    paragraph = paragraph.replace("'", "")
    paragraph = paragraph.replace("’", "")
    # Replace non-alpha-numeric characters with a space
    paragraph = re.sub('[^A-Za-z]+', ' ', paragraph)
    return paragraph

# Returns a list of all words in an article
def get_article(url):
    # Setup
    r = requests.get(url)
    html = r.text
    soup = BeautifulSoup(html, 'html.parser')
    
    # Contains a list of all words from <p> and <li> elements
    article_words = []
    
    # Contains all <p> and <li> elements
    p_li_elements = soup.find_all(["p", "li"])
    for p_li_element in p_li_elements:
        p_li = p_li_element.getText()
        p_li = string_cleaner(p_li)
        # Convert the string to lowercase
        p_li = p_li.lower()
        # Filter out empty strings created by cleaning
        words = list(filter(None, p_li.split(" ")))
        article_words += words
    
    alt_text = get_alt_text(soup)
    article_words += alt_text
    return " ".join(article_words)



In [12]:
def parsePhraseFile(file, phrases):
    with open(file) as fin:
        for line in fin:
            phrases.append(line.strip())

def filterWords(file, words):
    phrases = []
    count = 0
    parsePhraseFile(file, phrases)
    
    for phrase in phrases:
        count += words.count(phrase)
    return count
    
def wordLength(words):
    sum = 0
    wordSet = set(words.split(' '))
    for word in wordSet:
        sum = sum + len(word)
    
    return sum/len(wordSet)

for article in articles:
    print("Processing article:" + article.link)
    article.cont = get_article(article.link)
    article.buzz = filterWords("./datasets/buzzwords.txt", article.cont)
    article.emo = filterWords("./datasets/emotional_words.txt", article.cont)
    article.neg = filterWords("./datasets/negative-words.txt", article.cont)
    article.pos = filterWords("./datasets/positive-words.txt", article.cont)
    article.avgLen = wordLength(article.cont)
    article.opi = filterWords("./datasets/opinion.txt", article.cont)
    article.fac = filterWords("./datasets/fact_phrases.txt", article.cont)
    article.fPro = filterWords("./datasets/first_person.txt", article.cont)
    print("Processed article:" + article.link +" avg_len: " + str(article.avgLen))

Processing article:https://abort73.com/abortion/medical_testimony/
Processed article:https://abort73.com/abortion/medical_testimony/ avg_len: 6.5762081784386615
Processing article:https://abort73.com/abortion/prenatal_development/
Processed article:https://abort73.com/abortion/prenatal_development/ avg_len: 6.666666666666667
Processing article:https://abort73.com/abortion/personhood/
Processed article:https://abort73.com/abortion/personhood/ avg_len: 6.44874715261959
Processing article:https://abort73.com/abortion/abortion_techniques/
Processed article:https://abort73.com/abortion/abortion_techniques/ avg_len: 6.9061547836684944
Processing article:https://abort73.com/abortion/abortion_pictures/
Processed article:https://abort73.com/abortion/abortion_pictures/ avg_len: 6.246516613076099
Processing article:https://www.academia.org/oops-government-agency-allegedly-lost-gibill-com-rights/
Processed article:https://www.academia.org/oops-government-agency-allegedly-lost-gibill-com-rights/ av

Processed article:https://www.12news.com/article/news/nation-world/us-jobless-report-october-15/507-a77b846c-59b4-4aa9-91f2-539b13c9c2b7 avg_len: 6.072413793103448
Processing article:https://www.9news.com/article/news/local/mile-high-mornings/former-sky9-pilot-forced-to-evacuate-east-troublesome-fire-grand-county/73-2ba65b6d-ae0a-444b-b225-0b32fdc63b0d
Processed article:https://www.9news.com/article/news/local/mile-high-mornings/former-sky9-pilot-forced-to-evacuate-east-troublesome-fire-grand-county/73-2ba65b6d-ae0a-444b-b225-0b32fdc63b0d avg_len: 5.55
Processing article:https://www.9news.com/article/news/local/wildfire/cameron-peak-fire-october-21/73-f6e6f94b-ad10-45e8-9f59-26691b06a6bc
Processed article:https://www.9news.com/article/news/local/wildfire/cameron-peak-fire-october-21/73-f6e6f94b-ad10-45e8-9f59-26691b06a6bc avg_len: 5.913636363636364
Processing article:https://www.9news.com/article/news/local/wildfire/blm-land-closures-wildfire-risk/73-33f8c47a-52d7-4a8d-a7e6-a60884d668d

Processed article:https://achnews.org/2020/03/10/smartphone-addiction-changes-the-brain-to-resemble-drug-addicts/ avg_len: 5.3
Processing article:https://www.altnews.in/pm-modi-quotes-selective-data-to-paint-a-rosy-picture-of-indias-covid-19-response/
Processed article:https://www.altnews.in/pm-modi-quotes-selective-data-to-paint-a-rosy-picture-of-indias-covid-19-response/ avg_len: 5.668989547038327
Processing article:https://www.altnews.in/surendra-poonia-passes-off-unrelated-photo-as-teacher-recently-killed-in-france/
Processed article:https://www.altnews.in/surendra-poonia-passes-off-unrelated-photo-as-teacher-recently-killed-in-france/ avg_len: 5.645051194539249
Processing article:https://www.altnews.in/fact-check-did-congress-leader-aslam-khans-men-stopped-bhajan-at-durga-pandal-in-mumbai-malad/
Processed article:https://www.altnews.in/fact-check-did-congress-leader-aslam-khans-men-stopped-bhajan-at-durga-pandal-in-mumbai-malad/ avg_len: 5.693009118541034
Processing article:https:

Processed article:https://www.discovermagazine.com/health/to-stop-coronavirus-spread-well-need-new-testing-technology avg_len: 6.124567474048443
Processing article:https://www.ama-assn.org/press-center/ama-statements/statement-cdc-s-recommendation-public-cloth-masks
Processed article:https://www.ama-assn.org/press-center/ama-statements/statement-cdc-s-recommendation-public-cloth-masks avg_len: 6.5840867992766725
Processing article:https://insidescience.org/news/deep-ocean-currents-carry-plastic-microfibers-seafloor-hot-spots
Processed article:https://insidescience.org/news/deep-ocean-currents-carry-plastic-microfibers-seafloor-hot-spots avg_len: 6.123737373737374
Processing article:https://www.livescience.com/september-worlds-hottest-record.html
Processed article:https://www.livescience.com/september-worlds-hottest-record.html avg_len: 6.297429620563036
Processing article:https://www.livescience.com/turbulent-environment-human-evolution.html
Processed article:https://www.livescience.co