# Sentiment and Time plots

The idea is to check how the accumulated sentiment of quotes about movies changes over time. The output therefore consists mainly of interactive, animated plots made with `plotly`.

We take inspiration from the way [this article](https://towardsdatascience.com/how-to-produce-an-animated-bar-plot-in-plotly-using-python-2b5b360492f8) uses plotly express to generate animated bar plots that change from year to year. We adapt the approach to our data and wrap it in auxiliary functions, which are imported from the auxiliary Python script `plotly_aux.py`.


In [1]:
import os
import pandas as pd

from plotly_aux import * 
from raceplotly.plots import barplot

Load the sentiment data from the pickle containing all types of sentiment scores.

In [2]:
# Specify the data directory and filename
data_dir = os.path.join(os.getcwd(), 'data')
filepath = os.path.join(data_dir, 'Quotebank_sentiment.pkl')

df_Quotebank = pd.read_pickle(filepath)
df_Quotebank.head()

Unnamed: 0,quotation,speaker,qids,date,numOccurrences,probas,urls,movie,shared_ID,AFINN_label,AFINN_score,VADER_label,VADER_score,BERT_label,BERT_score,positive_BERT_score,scaledReverted_BERT_score
0,Is Ferguson like Mockingjay?,Laci Green,[Q16843606],2015-11-15,1,"[[Laci Green, 0.9013], [None, 0.0987]]",[http://www.dailykos.com/story/2015/11/15/1450...,The Hunger Games: Mockingjay - Part 2,1751,POSITIVE,0.5,POSITIVE,0.3612,NEGATIVE,0.989802,0.010198,-0.541032
1,I want to clarify my interview on the `Charlie...,George Lucas,"[Q38222, Q1507803]",2015-12-31,7,"[[George Lucas, 0.5327], [None, 0.4248], [Char...",[http://www.escapistmagazine.com/news/view/165...,Star Wars: Episode VII - The Force Awakens,6724,POSITIVE,0.165563,POSITIVE,0.991,POSITIVE,0.999293,0.999293,0.787788
2,Is Daredevil joining the Avengers for Infinity...,Scott Davis,"[Q1373440, Q7436227, Q7436228, Q7436226, Q1619...",2015-12-10,2,"[[None, 0.4806], [Scott Davis, 0.4017], [Antho...",[http://www.flickeringmyth.com/2015/12/is-dare...,Avengers: Age of Ultron,692,NEGATIVE,-0.153846,NEGATIVE,-0.6369,NEGATIVE,0.833872,0.166128,-0.208302
3,"They were saying, `Well, since when has Star W...",J.J. Abrams,[Q188137],2015-12-21,1,"[[J.J. Abrams, 0.5868], [None, 0.2584], [Lupit...",[http://rssfeeds.usatoday.com/~/129385923/0/us...,Star Wars: Episode VII - The Force Awakens,2394,POSITIVE,0.0,NEGATIVE,-0.3612,NEGATIVE,0.991336,0.008664,-0.559515
4,You meet new characters and you learn about Ha...,Kevin Feige,[Q515161],2015-05-06,1,"[[Kevin Feige, 0.9108], [None, 0.0782], [Scott...",[http://www.digitaltrends.com/movies/ant-man-m...,Avengers: Age of Ultron,8789,NEGATIVE,-0.011111,POSITIVE,0.1901,POSITIVE,0.999218,0.999218,0.776419


We now add a simpler time index to the data: the year-month pair (as a string) of each quote. We need this because the interactive animation (made with plotly express) steps through discrete time frames. The year-month pair is our initial choice of granularity for the time series of quotes about movies, as it is neither as coarse as the year nor as fine-grained as the individual dates.

In [3]:
df_Quotebank['year-month'] = [("-").join(date.split("-")[:2]) for date in df_Quotebank['date']]
df_Quotebank['year-month'].unique()[0]

'2015-11'
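
The same year-month strings can also be obtained with vectorised pandas operations. The sketch below assumes that the `date` column holds ISO-formatted strings (`YYYY-MM-DD`), as the table above suggests; the toy frame is illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({"date": ["2015-11-15", "2015-12-31", "2015-05-06"]})

# Option 1: slice the first seven characters of each ISO date string.
toy["year-month"] = toy["date"].str[:7]

# Option 2: parse to datetimes first, which also works for other date formats.
toy["year-month (parsed)"] = pd.to_datetime(toy["date"]).dt.strftime("%Y-%m")

print(toy["year-month"].tolist())  # ['2015-11', '2015-12', '2015-05']
```

A convenient property of the `YYYY-MM` format is that sorting the strings alphabetically also sorts them chronologically.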

We now investigate the running average of the weighted daily average BERT sentiment. The functions have a parameter for subsampling, so we start with the quotes related to 10 subsampled movies (sampled with replacement) and then repeat the analysis for all movies.

In [4]:
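# Rescale the positive BERT score from [0, 1] to [-1, 1].
# Note: this overwrites the column in place, so run this cell only once.
df_Quotebank['positive_BERT_score'] = (df_Quotebank.positive_BERT_score - 0.5) * 2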

# specifying parameters
N_samples = 10
dataset = 'Quotebank'
attribute = 'positive_BERT_score'
y_label = 'sentiment (BERT)'
time_attribute = 'year-month'
title = 'Sentiment on Quotes evolving through time'

save_fig = False

# creating dataframes
df_Quotebank_wrangled = wrangleData(df_Quotebank, time_attribute=time_attribute)
df_plot, df_sample = createPlotDF(df_Quotebank_wrangled, attribute=attribute, dataset=dataset, time_attribute=time_attribute, y_label=y_label, N_samples=N_samples)
    
# plotting figure
fig = animatedBarPlot(df_plot, y_label=y_label, time_attribute=time_attribute, speed=0.4, title=title, save_fig=save_fig)

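
Since `wrangleData` and `createPlotDF` live in `plotly_aux.py` and are not shown here, the following is only a sketch of one plausible reading of a "running average of weighted daily average sentiment". It assumes that quotes are weighted by `numOccurrences` and that the running average is an expanding mean over the daily values of each movie; both choices are assumptions, and the toy values are made up.

```python
import pandas as pd

# Toy quotes (hypothetical values) for two movies.
toy = pd.DataFrame({
    "movie": ["A", "A", "A", "B", "B"],
    "date": ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-01", "2015-01-03"],
    "score": [1.0, -1.0, 0.5, 0.2, 0.6],
    "numOccurrences": [3, 1, 2, 1, 1],
})

# Weighted daily average: sum(score * weight) / sum(weight) per movie and day.
toy["weighted"] = toy["score"] * toy["numOccurrences"]
daily = toy.groupby(["movie", "date"], as_index=False)[["weighted", "numOccurrences"]].sum()
daily["daily_avg"] = daily["weighted"] / daily["numOccurrences"]

# Running average: expanding mean of the daily averages within each movie.
daily = daily.sort_values(["movie", "date"])
daily["running_avg"] = (
    daily.groupby("movie")["daily_avg"]
    .expanding()
    .mean()
    .reset_index(level=0, drop=True)
)

print(daily[["movie", "date", "daily_avg", "running_avg"]])
```

With this weighting, a quote that occurs three times counts three times as much as a quote that occurs once on the same day.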

We now conduct a similar investigation with all movies included.

In [5]:
# specifying parameters
N_samples = None
dataset = 'Quotebank'
attribute = 'positive_BERT_score'
y_label = 'sentiment'
time_attribute = 'year-month'
title = ''  # alternative: 'Sentiment on Quotes evolving through time'

save_fig = True

# creating dataframes
df_Quotebank_wrangled = wrangleData(df_Quotebank, time_attribute=time_attribute)
df_plot, df_sample = createPlotDF(df_Quotebank_wrangled, attribute=attribute, dataset=dataset, time_attribute=time_attribute, y_label=y_label, N_samples=N_samples)
    
# plotting figure
fig = animatedBarPlot(df_plot, y_label=y_label, time_attribute=time_attribute, speed=0.4, title=title, save_fig=save_fig)


saved html version of plot to: C:\Users\Albert Kjøller\Documents\EPFL\Courses\CS-401_ADA\ada-2021-project-f-jab\exploratory\plotlyplots


Now we do the same for the VADER sentiment scores: first for a subsample of 10 movies, then for all movies.

In [6]:
# specifying parameters
N_samples = 10
dataset = 'Quotebank'
attribute = 'VADER_score'
y_label = 'sentiment (VADER)'
time_attribute = 'year-month'
title = f'Evolving {y_label}'

save_fig = False

# creating dataframes
df_Quotebank_wrangled = wrangleData(df_Quotebank, time_attribute=time_attribute)
df_plot, df_sample = createPlotDF(df_Quotebank_wrangled, attribute=attribute, dataset=dataset, time_attribute=time_attribute, y_label=y_label, N_samples=N_samples)
    
# plotting figure
fig = animatedBarPlot(df_plot, y_label=y_label, time_attribute=time_attribute, speed=0.4, title=title, save_fig=save_fig)

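
Because the subsample is drawn with replacement, the same movie can be picked more than once, so fewer than 10 distinct movies may end up in the plot. The sketch below illustrates this with `numpy`; the titles, the pool size and the seed are placeholders, and the actual sampling inside `createPlotDF` may be implemented differently.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed, chosen arbitrarily for reproducibility

movies = [f"Movie {i}" for i in range(20)]  # hypothetical titles

# Draw 10 titles with replacement: duplicates are possible.
sample = rng.choice(movies, size=10, replace=True)
unique_movies = sorted(set(sample))

print(f"{len(sample)} draws, {len(unique_movies)} distinct movies")
```

Passing `replace=False` instead would guarantee 10 distinct movies, provided that the pool contains at least 10 titles.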

In [7]:
# specifying parameters
N_samples = None
dataset = 'Quotebank'
attribute = 'VADER_score'
y_label = 'sentiment (VADER)'
time_attribute = 'year-month'
title = f'Running mean of {y_label}'

save_fig = False

# creating dataframes
df_Quotebank_wrangled = wrangleData(df_Quotebank, time_attribute=time_attribute)
df_plot, df_sample = createPlotDF(df_Quotebank_wrangled, attribute=attribute, dataset=dataset, time_attribute=time_attribute, y_label=y_label, N_samples=N_samples)
    
# plotting figure
fig = animatedBarPlot(df_plot, y_label=y_label, time_attribute=time_attribute, speed=0.4, title=title, save_fig=save_fig)
