### Library import

In [5]:
from matplotlib import pyplot as plt
import pandas as pd
import glob
import os

from string import punctuation
from nltk.corpus import stopwords
from collections import defaultdict, Counter
from wordcloud import WordCloud

!pip install textblob
from textblob import TextBlob

import warnings
warnings.filterwarnings('ignore')



In [7]:
df = pd.read_csv('data/news_cleaned.csv')
#df['tokens'] = df['tokens'].str.replace("'", "")
#df['tokens_no_climate'] = df['tokens_no_climate'].str.replace("'", "")

In [8]:
len(df)

90863

### Sentiment analysis with TextBlob

TextBlob polarity scores range from -1.0 to 1.0, where -1.0 indicates the most negative sentiment and 1.0 the most positive.

TextBlob subjectivity scores range from 0.0 to 1.0, where 0.0 is very objective and 1.0 is very subjective.

In [9]:
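# Score each snippet once; .sentiment holds both polarity and subjectivity
sentiment = df['snippet'].apply(lambda x: TextBlob(x).sentiment)
df['polarity'] = sentiment.apply(lambda s: s.polarity)
df['subjectivity'] = sentiment.apply(lambda s: s.subjectivity)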

In [15]:
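# Collapse the continuous scores into binary labels
def get_polarity(num):
    # A score of exactly 0 (neutral) is counted as 'positive'
    if num < 0:
        return 'negative'
    else:
        return 'positive'

def get_subjectivity(num):
    # Scores at or above the 0.50 midpoint are counted as 'subjective'
    if num < .50:
        return 'objective'
    else:
        return 'subjective'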
    
df['polarity_binary'] = df['polarity'].apply(get_polarity)
df['subjectivity_binary'] = df['subjectivity'].apply(get_subjectivity)
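
Because `get_polarity` sends every non-negative score to 'positive', snippets that score exactly 0.0 are counted as positive. A three-way labelling is one possible alternative. The sketch below is an illustration only: the `label_polarity` helper and its `neutral_band` parameter are hypothetical names that are not used elsewhere in this notebook, and the scores are toy values rather than dataset rows.

```python
import pandas as pd

def label_polarity(score, neutral_band=0.0):
    """Label a polarity score, keeping a separate 'neutral' class.

    neutral_band is a hypothetical tolerance: scores within
    [-neutral_band, neutral_band] are treated as neutral.
    """
    if score > neutral_band:
        return 'positive'
    if score < -neutral_band:
        return 'negative'
    return 'neutral'

# Toy scores (not taken from the dataset)
toy_scores = pd.Series([-0.4, 0.0, 0.15])
toy_labels = toy_scores.apply(label_polarity)
print(toy_labels.tolist())
```

Whether a neutral class is worth adding depends on how many snippets actually score 0.0, which could be checked with `(df['polarity'] == 0).sum()`.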

In [16]:
df[['matchdatetime', 'station', 'snippet', 'polarity', 'polarity_binary', 'subjectivity', 'subjectivity_binary']].sample(n=5)

Unnamed: 0,matchdatetime,station,snippet,polarity,polarity_binary,subjectivity,subjectivity_binary
83263,2019-12-23 04:03:51,BBCNEWS,"from the weekend's bushfires, and while the we...",-0.103935,negative,0.362963,objective
84708,2019-06-18 11:04:13,BBCNEWS,londoners will be affected by concentrated fli...,-0.152273,negative,0.509091,subjective
345,2013-01-15 13:13:18,FOXNEWS,black workers and the black family. i would ch...,0.177778,positive,0.411111,objective
86254,2009-10-23 05:50:58,MSNBC,legislation to try to stop climate change but ...,-0.433333,negative,0.6,subjective
24163,2016-11-19 08:18:36,FOXNEWS,warming. clean air and clean water. when we ta...,0.146667,positive,0.58,subjective


In [12]:
# see 10 most subjective snippets
s = df.nlargest(10, 'subjectivity')[['snippet','station']].index
for index in s:
    print(df.loc[index, 'snippet'])

environmental catastrophe in another part of the world. so far, administration officials are not backing away from nuclear. which they said will reduce emissions and prevent climate
strict greenhouse gas reduction law. prop 23 would suspend that law and that, of course, would be awesome for companies that make a lot of money by making a lot of pollution. 97% of the funding for prop 23 so far comes from oil and chemical companies, including a
and state chapters of the naacp. the letters urged perriello to vote against climate change legislation. the letters were fake. tea party groups camped out mr. perriello's virginia office, one
targeting climate change, is there a bit of hypocrisy of it? i disagree with it. i think you find it funny. he made hundred million. he is 3500 votes of being
producing countries? yes, it is. so you al gore are doing business with this country. [ laughter ] that's enabling your ultimate foe, climate change? i think i understand what you are getting at. [ laug

In [13]:
# see 10 most negative snippets
s = df.nsmallest(10, 'polarity')[['snippet','station']].index
for index in s:
    print(df.loc[index, 'snippet'])

by 2050, countries like that might not exist. closer to home, things like wildfires, devastating hurricanes, food shortages, migrations, they're all a host of awful things associated with climate change. we're already seeing the beginnings of this now. and this report just underscores
report warns of devastating effects from climate change. president trump suggested that he doesn't believe it, what's your response to the president? look, the climate den
we're going to have to build shelters so people can escape when these terrible fires get out of hand. and yes, we're going to have to deal with climate change. all of that. reporter: meanwhile, 145 evacuees and workers in shelters around butte county are suffering from norovirus.
published scientific literature. so what this report will tell us is that we are seeing the impact of climate change on our coastlines here in the united states, in terms of devastating superstorms. you add a foot of sea level rise and we could see six feet to


In [14]:
# see 10 most positive snippets
s = df.nlargest(10, 'polarity')[['snippet','station']].index
for index in s:
    print(df.loc[index, 'snippet'])

issues, or pressing concerns - whether it be climate change, animal exploitation or refugees. at the forefront of films addressing the refugee crisis was 80-year-old legendary actress, vanessa redgrave,
some scientists have called climate change the greatest threat that humanity changes. president trump's defense secretar james mattis called it a challenge to national security. the president said he would make
that is not all. causation is the republican resolution that climate change is happening and we need to find a solution. while she has had an impressive start in congress, she does not plan to be there forever. i do think institutionally congress benefits from having a
candidates. by the way, in massachusetts they say the shape of the field determines the winner. here's the people that look like they may run against her. maybe ed markey, very impressive senior who did all this mark pushing the climate change and
truly greatest weapons. but the speech had nothing to say about clim

In [None]:
# calculate average polarity and subjectivity for each station
station_stats = df.groupby('station').agg({'polarity': 'mean', 'subjectivity': 'mean'})

# rename column
station_stats.columns = ['Average Polarity', 'Average Subjectivity']
print(station_stats)

         Average Polarity  Average Subjectivity
station                                        
BBCNEWS          0.088726              0.395586
CNN              0.097142              0.397627
FOXNEWS          0.075069              0.369970
MSNBC            0.099143              0.396647


When we examine the polarity and subjectivity scores, we see that:
* Mean polarity is slightly positive for all four stations, ranging from 0.075 (FOXNEWS) to 0.099 (MSNBC), with little difference between them.
* Mean subjectivity falls below the 0.5 midpoint for all four stations, ranging from 0.370 (FOXNEWS) to 0.398 (CNN), which suggests largely objective reporting on climate change, again with little difference between stations.
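
Means alone do not show how much the scores vary within each station. As a sketch of one way to add that context, the same `groupby` can report the standard deviation and the number of snippets alongside the mean. The toy frame below is made up for illustration and only mirrors the `station` and `polarity` column names used above.

```python
import pandas as pd

# Toy data (not taken from the dataset)
toy = pd.DataFrame({
    'station': ['A', 'A', 'A', 'B', 'B', 'B'],
    'polarity': [0.1, 0.2, 0.3, -0.1, 0.0, 0.4],
})

# Mean, sample standard deviation, and row count per station
summary = toy.groupby('station')['polarity'].agg(['mean', 'std', 'count'])
print(summary)
```

On the real data, the equivalent call would be `df.groupby('station')['polarity'].agg(['mean', 'std', 'count'])`.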




In [17]:
df['polarity_binary'].value_counts()

polarity_binary
positive    70137
negative    20726
Name: count, dtype: int64

In [25]:
p_counts = df.groupby('station')['polarity_binary'].value_counts()
# Fraction of ALL snippets in the corpus (not of each station's own snippets)
p_perc = p_counts / len(df)

polarity_table = pd.DataFrame({
    'counts':p_counts,
    'percentage': p_perc
})

polarity_table

station,polarity_binary,counts,percentage
BBCNEWS,positive,17533,0.192961
BBCNEWS,negative,5160,0.056789
CNN,positive,14729,0.162101
CNN,negative,4011,0.044143
FOXNEWS,positive,17767,0.195536
FOXNEWS,negative,6070,0.066804
MSNBC,positive,20108,0.2213
MSNBC,negative,5485,0.060366
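
The `percentage` column above is each group's share of the whole corpus, so it partly reflects how many snippets each station contributed. To compare stations directly, the share within each station may be more informative. A minimal sketch, using made-up toy data that only mirrors the `station` and `polarity_binary` column names:

```python
import pandas as pd

# Toy data (not taken from the dataset)
toy = pd.DataFrame({
    'station': ['A', 'A', 'A', 'A', 'B', 'B'],
    'polarity_binary': ['positive', 'positive', 'positive', 'negative',
                        'positive', 'negative'],
})

# normalize=True divides each count by its own station's total,
# so the proportions sum to 1 within every station
within_station = (
    toy.groupby('station')['polarity_binary']
       .value_counts(normalize=True)
)
print(within_station)
```

The same pattern should apply unchanged to `subjectivity_binary`.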


In [19]:
df['subjectivity_binary'].value_counts()

subjectivity_binary
objective     61377
subjective    29486
Name: count, dtype: int64

In [27]:
s_counts = df.groupby('station')['subjectivity_binary'].value_counts()
# Fraction of ALL snippets in the corpus (not of each station's own snippets)
s_perc = s_counts / len(df)

subjectivity_table = pd.DataFrame({
    'counts':s_counts,
    'percentage': s_perc
})

subjectivity_table

station,subjectivity_binary,counts,percentage
BBCNEWS,objective,15160,0.166845
BBCNEWS,subjective,7533,0.082905
CNN,objective,12320,0.135589
CNN,subjective,6420,0.070656
FOXNEWS,objective,16938,0.186413
FOXNEWS,subjective,6899,0.075927
MSNBC,objective,16959,0.186644
MSNBC,subjective,8634,0.095022
