# Sentiment Analyzer for News Quotes

1. Read cleaned data into a pandas dataframe
2. Pass quotes from each article into sentiment analyzer
3. Save output into new columns 'negative', 'neutral', 'positive', 'compound'
4. Save output into a new Excel workbook with three sheets: one for quotes, one for non-quotes, and one comparing their compound scores

In [None]:
# run this cell only when working in Google Colab with data stored on Google Drive
from google.colab import drive

drive.mount('/content/drive')

In [1]:
!pip install vaderSentiment

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl.metadata (572 bytes)
Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
   ---------------------------------------- 126.0/126.0 kB 2.5 MB/s eta 0:00:00
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2



In [2]:
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

## Extracting Data from Excel Files

In [3]:
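# replace with the path to your own input file, e.g. quotes_input.xlsx
# raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
fp = r'C:\Maite\MOD\projects\Monika_Bednarek\Evaluation_quotes\Data\CBC_qt_output\jan_CBC_news_qt_clean.xlsx'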

quotes_df = pd.read_excel(fp, usecols = ["text_id", "text_name", "quote", "speaker", "verb"])

non_quotes_df = pd.read_excel(fp, usecols = ["text_id", "text_name", "non_quoted_text"])

In [4]:
# add empty float columns to both dataframes to hold the VADER scores
score_cols = ['negative', 'neutral', 'positive', 'compound']

for df in (quotes_df, non_quotes_df):
  for col in score_cols:
    df[col] = pd.Series(dtype='float')

## Running Quotes through VADER

VADER documentation: https://github.com/cjhutto/vaderSentiment

Guide to using VADER: https://medium.com/@rslavanyageetha/vader-a-comprehensive-guide-to-sentiment-analysis-in-python-c4f1868b0d2e

The VADER sentiment analyzer returns a dictionary of sentiment intensity scores
for a text input, with four keys: 'neg', 'neu', 'pos', and 'compound'. The
negative, neutral, and positive scores each range from 0 to 1 and give the
proportion of the text that falls into that category, so they sum to
approximately 1. The compound score is a normalized measure of overall
sentiment intensity that ranges from -1 (most negative) through 0 (neutral)
to 1 (most positive).
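The VADER README suggests typical thresholds for turning a compound score into a label: positive at 0.05 or above, negative at -0.05 or below, and neutral in between. A minimal sketch of that rule follows. The function name `label_compound` is hypothetical, and this notebook stores the raw scores without labelling them.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label.

    The default threshold follows the convention described in the VADER
    README; label_compound itself is a hypothetical helper.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# a few compound scores, including values that appear in the quotes output
for score in (-0.9804, 0.9422, 0.057, 0.0):
    print(score, label_compound(score))
```

The threshold is a parameter because the right cut-off may differ for long news texts, where compound scores tend to sit near the extremes.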


In [5]:
# helper function to extract scores for each story
# args: dataframe, column name of text to be analyzed as a string
# the scores are written into the dataframe in place; nothing is returned
def get_sentiment_score(df, col):
  # build the analyzer once; each construction reloads the lexicon files
  analyzer = SentimentIntensityAnalyzer()

  for index, row in df.iterrows():
    score = analyzer.polarity_scores(row[col])

    df.loc[index, 'negative'] = score['neg']
    df.loc[index, 'neutral'] = score['neu']
    df.loc[index, 'positive'] = score['pos']
    df.loc[index, 'compound'] = score['compound']
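The helper above assumes every cell in the scored column holds a string. As a hedged sketch of one way to guard against empty cells, the function below scores a list of texts and returns `None` for anything that is not a non-empty string. The names `score_texts` and `StubAnalyzer` are hypothetical and belong neither to this notebook nor to vaderSentiment. The stub only stands in for `SentimentIntensityAnalyzer` so the example runs without the library installed.

```python
class StubAnalyzer:
    """Hypothetical stand-in with the same polarity_scores() method name
    as SentimentIntensityAnalyzer; it returns fixed placeholder scores."""

    def polarity_scores(self, text):
        return {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


def score_texts(texts, analyzer):
    """Return one score dict per text, or None where the text is missing."""
    results = []
    for text in texts:
        # pandas represents an empty Excel cell as float NaN, which is not a str
        if isinstance(text, str) and text.strip():
            results.append(analyzer.polarity_scores(text))
        else:
            results.append(None)
    return results


scores = score_texts(["a quote", float('nan'), ""], StubAnalyzer())
print(scores)
```

In the notebook, an instance of `SentimentIntensityAnalyzer` would be passed in place of the stub, and rows scored as `None` could be left as NaN in the output columns.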

In [6]:
get_sentiment_score(quotes_df, 'quote')

In [7]:
quotes_df.head()

Unnamed: 0,text_id,text_name,quote,speaker,verb,negative,neutral,positive,compound
0,0000aa72772aa87768a64b07ad46b2b2,text4186,the supports available to parents have been re...,Regev\nRegev\nCraigen Ecsy\nCraigen Etsy\nCrai...,said\nsaid\nsays\nsaid\nsaid\nsaid\nsaid\nsaid...,0.197,0.757,0.046,-0.9804
1,00037f2c22648f99e666262e78237795,text3794,Wednesday's order from the 6th U.S. Circuit Co...,The Detroit News\nThe suit\nSteeh\nThe appeals...,reported\nclaims\nsaid\nsaying\nsaid\nsaying\n...,0.035,0.859,0.106,0.9422
2,000a58dd6ee070d2490a2c029ce5e459,text3135,Mahomes underwent an MRI exam Sunday to better...,the person\nMRI\nMahomes\nMahomes\nMahomes\nKa...,told\nconfirmed\nsaid\nsaid\nsaid\nsaid\nsaid\...,0.044,0.808,0.148,0.9811
3,002a0b5bca38ecf64277e8efab0fab8b,text896,"""Maybe we use it for a couple of weeks, but\nt...",Bowen\nBowen\nAnyone with a newspaper subscrip...,said\nsaid\ntell\naccording to\nsaid\npredicts...,0.087,0.82,0.094,0.8682
4,0031f692b335fa20e28f398813f73afd,text4277,"that based on preliminary evidence, five of th...","She\nMonica Pirous, director of child, family ...",said\nsaid\nsaid\nsaid\nsaid\nsaid\nsaid\nsaid...,0.083,0.823,0.094,0.057


In [8]:
get_sentiment_score(non_quotes_df, 'non_quoted_text')

In [9]:
non_quotes_df.head()

Unnamed: 0,text_id,text_name,non_quoted_text,negative,neutral,positive,compound
0,0000aa72772aa87768a64b07ad46b2b2,text4186,This story is part of Amy Bell's Parental Guid...,0.147,0.715,0.137,-0.8884
1,00037f2c22648f99e666262e78237795,text3794,A U.S. federal appeals court has ordered a Det...,0.03,0.958,0.012,-0.4019
2,000a58dd6ee070d2490a2c029ce5e459,text3135,Patrick Mahomes sustained a right high ankle s...,0.056,0.881,0.063,0.4338
3,002a0b5bca38ecf64277e8efab0fab8b,text896,"When Mississauga, Ont.-based money coach Vanes...",0.073,0.863,0.064,-0.8461
4,0031f692b335fa20e28f398813f73afd,text4277,"Residents of Hay River, N.W.T., are answering ...",0.071,0.863,0.066,-0.0978


In [10]:
# create a third dataframe to directly compare compound scores between quotes and non quotes
scores_comp_df = pd.merge(quotes_df[['text_id', 'text_name', 'compound']],
                          non_quotes_df[['text_id', 'text_name', 'compound']],
                          on=['text_id', 'text_name'],
                          suffixes=('_quotes', '_non_quotes'))

scores_comp_df.rename(columns={'compound_quotes': 'quote_score', 'compound_non_quotes': 'non_quote_score'}, inplace=True)

scores_comp_df.head()

Unnamed: 0,text_id,text_name,quote_score,non_quote_score
0,0000aa72772aa87768a64b07ad46b2b2,text4186,-0.9804,-0.8884
1,00037f2c22648f99e666262e78237795,text3794,0.9422,-0.4019
2,000a58dd6ee070d2490a2c029ce5e459,text3135,0.9811,0.4338
3,002a0b5bca38ecf64277e8efab0fab8b,text896,0.8682,-0.8461
4,0031f692b335fa20e28f398813f73afd,text4277,0.057,-0.0978
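One way to quantify how far quotes diverge from the surrounding text is the per-article difference between the two compound scores. The sketch below applies this to the five rows displayed above, in plain Python so it runs without the workbook. The name `score_gap` is hypothetical; in the notebook, the same values would come from subtracting the two score columns of `scores_comp_df`.

```python
# (text_name, quote_score, non_quote_score) copied from the output above
rows = [
    ('text4186', -0.9804, -0.8884),
    ('text3794', 0.9422, -0.4019),
    ('text3135', 0.9811, 0.4338),
    ('text896', 0.8682, -0.8461),
    ('text4277', 0.057, -0.0978),
]


def score_gap(quote_score, non_quote_score):
    """Positive when the quotes score higher than the non-quoted text."""
    return round(quote_score - non_quote_score, 4)


gaps = {name: score_gap(q, n) for name, q, n in rows}

# articles whose quotes and non-quoted text fall on opposite sides of zero
sign_flips = [name for name, q, n in rows if q * n < 0]

print(gaps)
print(sign_flips)
```

A large gap or a sign flip only flags an article for closer reading; it does not by itself show that the quotes carry a different stance from the reporting.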


## Write Output to New Workbook

In [11]:
!pip install xlsxwriter




In [12]:
# replace with the path to your own output file, e.g. 'quotes_sentiment.xlsx'
# raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
output = r'C:\Maite\MOD\projects\Monika_Bednarek\Evaluation_quotes\Data\CBC_sentiment_output\jan_CBC_news_sentiment.xlsx'

# create excel writer object to initialize new workbook;
# the with-block saves and closes the file on exit, even if a write fails
with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
  # write dataframes to different worksheets
  quotes_df.to_excel(writer, sheet_name="quotes", index=False)
  non_quotes_df.to_excel(writer, sheet_name="non_quotes", index=False)
  scores_comp_df.to_excel(writer, sheet_name="scores_comp", index=False)