# Aim

In this notebook, we look at the trend over time of:
- keyword prominence in the press
- sentiment expressed toward the keyword (as manually annotated by researchers)

For each keyword, we report both basic summary measures and charts.

## Keyword prominence

A timeline of the COVID-19 pandemic in the United Kingdom is here: https://en.wikipedia.org/wiki/Timeline_of_the_COVID-19_pandemic_in_the_United_Kingdom
and here: https://inews.co.uk/news/politics/lockdown-rules-easing-coronavirus-uk-measures-lifted-when-start-453497

Key dates we will be looking at:

- (week including) 11-03-2020: "herd immunity" approach is mentioned on BBC, generating controversy. 
- (week including) 23-03-2020: the UK enters strict lockdown
- (week including) 10-05-2020: the Government's message changes from "stay at home" to "stay alert", and a roadmap to ease lockdown is issued

Note: our corpus consists of articles from the top 15 UK newspapers, filtered to those containing at least one of a set of pre-selected keywords ("pre-filtered articles" from now on).

#### Measures 

- **normalised keyword frequency (week or fortnight)** (nkf): a keyword's raw count of occurrences in a week (fortnight) divided by the total word count for that week (fortnight) across all published pre-filtered articles. The word count includes nouns only.

- **relative document frequency (week or fortnight)** (rdf): the number of pre-filtered articles in a week (fortnight) that contain the keyword divided by the total number of pre-filtered articles published that week (fortnight).

- **nkf x rdf**: the product nkf * rdf (our final metric). In the data files and code below, nkf appears under the column name `rkf`.
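
As a minimal sketch of how the three measures combine, the toy example below computes them for one hypothetical keyword over two fortnights. The counts and the column names `word_count`, `article_count`, `nkf` and `nkf_x_rdf` are used here for illustration only; the numbers are invented and do not come from the corpus.

```python
import pandas as pd

# Hypothetical counts for one keyword over two fortnights (not real corpus data)
toy = pd.DataFrame({
    "fortnight_starting": ["2020-03-09", "2020-03-23"],
    "term_freq": [4, 10],          # raw keyword occurrences
    "word_count": [20000, 25000],  # total nouns in pre-filtered articles
    "doc_freq": [2, 5],            # pre-filtered articles containing the keyword
    "article_count": [100, 125],   # all pre-filtered articles
})

# normalised keyword frequency and relative document frequency
toy["nkf"] = toy["term_freq"] / toy["word_count"]
toy["rdf"] = toy["doc_freq"] / toy["article_count"]

# final metric: the product of the two
toy["nkf_x_rdf"] = toy["nkf"] * toy["rdf"]

print(toy[["fortnight_starting", "nkf", "rdf", "nkf_x_rdf"]])
```

Because both factors are fractions, the product is very small, which is one reason to rescale it (for example per 10,000 words) before charting.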



## Sentiment

For each keyword occurrence, we extracted the sentence in which the keyword occurred (the "opinion context"). The sentiment expressed in that sentence toward the keyword was manually coded on a -2 to +2 scale.

# Set up and get data

In [None]:
import os
import pickle

In [None]:
import numpy as np
import pandas as pd

In [None]:
import psutil

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
%matplotlib inline

In [None]:
import plotly.express as px
import chart_studio.plotly as py
import plotly.graph_objects as go
import plotly.offline as pyo
pyo.init_notebook_mode()

In [None]:
pd.set_option('display.max_colwidth', None)

### Constants

In [None]:
DIR_DATA = os.environ.get("DIR_DATA_INTERIM")

In [None]:
DIR_DATA_EXTRA = os.environ.get("DIR_DATA_EXTRA")

In [None]:
# prominence
term_freqs_nm = "kword_rawfreq_2W-MON.csv"
doc_freqs_nm = "kword_docfreq_2W-MON.csv"
metrics_nm = "kword_rfrdf_2W-MON.csv"

In [None]:
# sentiment
sentiment_nm = "preproc_kword_sent.pickle"

In [None]:
# keywords to be excluded because of low frequency in the corpus
EXCLUDE_KWORDS = ['behav_insight', 'behavioural_economist', 'behav_analysis', 'chater', 'american_behav_scientists']
NON_KWORDS = ['herd_immunity', 'behavioural_fatigue']

## Import data

In [None]:
doc_freqs = pd.read_csv(os.path.join(DIR_DATA, doc_freqs_nm))

In [None]:
term_freqs = pd.read_csv(os.path.join(DIR_DATA, term_freqs_nm))

In [None]:
metrics = pd.read_csv(os.path.join(DIR_DATA, metrics_nm))

In [None]:
with open(os.path.join(DIR_DATA, sentiment_nm), "rb") as input_file:
    sentiments = pickle.load(input_file)

## Exclude words

In [None]:
EXCLUDE_LIST = EXCLUDE_KWORDS + NON_KWORDS

In [None]:
doc_freqs.drop(EXCLUDE_LIST, axis=1, inplace=True)

In [None]:
term_freqs.drop(EXCLUDE_LIST, axis=1, inplace=True)

In [None]:
metrics = metrics[~metrics.kword.isin(EXCLUDE_LIST)].copy()

In [None]:
sentiments = sentiments[~sentiments.kword.isin(EXCLUDE_LIST)].copy()

In [None]:
term_freqs

## Preprocessing of prominence/frequency data

In [None]:
def wide2long(df: pd.DataFrame,
              new_value_var: str,
              time_var: str = 'fortnight_starting',
              new_group_var: str = 'kword') -> pd.DataFrame:
    """Reshape a wide table (one column per keyword) into long format."""
    return pd.melt(df,
                   id_vars=time_var,
                   var_name=new_group_var,
                   value_name=new_value_var)
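
To make the reshaping step concrete, here is a small sketch of what `pd.melt` does to a wide table with one column per keyword. The table and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per fortnight, one column per keyword
wide = pd.DataFrame({
    "fortnight_starting": ["2020-03-09", "2020-03-23"],
    "nudge": [3, 7],
    "michie": [1, 4],
})

# Long format: one row per (fortnight, keyword) pair
long_df = pd.melt(wide,
                  id_vars="fortnight_starting",
                  var_name="kword",
                  value_name="doc_freq")

print(long_df)
```

The long format has one row per fortnight and keyword, which is the shape needed for merging on `['fortnight_starting', 'kword']`.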

In [None]:
doc_freqs

In [None]:
doc_freqs_l = wide2long(doc_freqs.drop("article_count", axis=1), new_value_var='doc_freq')


In [None]:
term_freqs_l = wide2long(term_freqs.drop("word_count", axis=1), new_value_var='term_freq')


In [None]:
prominence_df = metrics.merge(doc_freqs_l, on=['fortnight_starting', 'kword']).merge(term_freqs_l, on=['fortnight_starting', 'kword'])


In [None]:
prominence_df

In [None]:
prominence_df.drop("Unnamed: 0", axis=1, inplace=True)

#### Express normalised frequency (and hence prominence) per 10,000 words

In [None]:
prominence_df['rkf'] = prominence_df.rkf * 10000

In [None]:
prominence_df['rkf*rdf'] = prominence_df['rkf*rdf'] * 10000

In [None]:
prominence_df

## Preprocessing of sentiment data

Only keep the relevant columns

In [None]:
cols_to_keep = ['pub_date_dt', 'kword', 'opinion_context_id', 'keyword_sentiment', 'refers_to_gov', 'gov_sentiment']

In [None]:
sentiments = sentiments[cols_to_keep].copy()

In [None]:
sentiments

Assign one unique id to each keyword-sentiment pair

In [None]:
sentiments['id'] = range(1, sentiments.shape[0]+1)

In [None]:
sentiments.id.count()

### Code sentiment scores as labels: 'pos', 'neu' and 'neg'

In [None]:
int2labels = {
    -2: 'neg',
    -1: 'neg',
    0: 'neu',
    1: 'pos',
    2: 'pos'
}

In [None]:
sentiments['kword_sent_label'] = [int2labels.get(score) for score in sentiments.keyword_sentiment]

In [None]:
sentiments['gov_sent_label'] = [int2labels.get(score) for score in sentiments.gov_sentiment]

In [None]:
sentiments

## Aggregate sentiments

We will report:
- the number of positive vs. neutral vs. negative sentiments expressed in a fortnight toward a keyword
- the corresponding proportions
- the change in number from fortnight to fortnight as a % change

In some fortnights, very few sentences express a sentiment toward a given keyword, so % changes must be interpreted carefully.
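
One possible way to compute the fortnight-to-fortnight % change is sketched below with `groupby(...).pct_change()`. This is an assumption about how the measure could be derived, not the notebook's own implementation, and the counts are invented.

```python
import pandas as pd

# Hypothetical fortnightly counts for one keyword and one sentiment label
toy_counts = pd.DataFrame({
    "kword": ["nudge"] * 4,
    "kword_sent_cat": ["neg"] * 4,
    "fortnight_starting": pd.to_datetime(
        ["2020-03-09", "2020-03-23", "2020-04-06", "2020-04-20"]),
    "sentiment_count": [2, 4, 3, 6],
})

# Sort chronologically within each keyword/sentiment group before differencing
toy_counts = toy_counts.sort_values(
    ["kword", "kword_sent_cat", "fortnight_starting"])

# % change relative to the previous fortnight; the first fortnight has no predecessor (NaN)
toy_counts["pct_change"] = (
    toy_counts.groupby(["kword", "kword_sent_cat"])["sentiment_count"]
    .pct_change()
    .mul(100)
)

print(toy_counts)
```

In this toy series, going from 2 to 4 sentences already counts as a 100% increase, which illustrates why small counts call for caution.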

In [None]:
sentiments['kword_sent_cat'] = sentiments['kword_sent_label'].astype('category')

In [None]:
sent_counts = sentiments.set_index('pub_date_dt').groupby(['kword', 'kword_sent_cat', pd.Grouper(freq="2W-MON", closed="left", label="left")]).agg(
    {'id':"count",'refers_to_gov':'count'})

In [None]:
sent_counts

In [None]:
sent_counts.index.names = ['kword', 'kword_sent_cat', 'fortnight_starting']
sent_counts.rename(columns={'id': 'sentiment_count', 'refers_to_gov': 'refers_to_gov_count'}, inplace=True)

In [None]:
sent_counts

In [None]:
sent_counts = sent_counts.reset_index(level="kword_sent_cat").merge(sent_counts.groupby(['kword', 'fortnight_starting'], observed=False)['sentiment_count'].sum(),
                                                           left_index=True, right_index=True).rename(columns={'sentiment_count_x': 'sentiment_count', 
                                                                                                              'sentiment_count_y': 'tot_counts'})

Calculate the proportion of each sentiment type per keyword and fortnight.

In [None]:
sent_counts['prop_sentiments'] = round(sent_counts['sentiment_count'] / sent_counts['tot_counts'],2)
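
As an alternative sketch, the same proportions can be obtained without an intermediate merge by broadcasting the group totals with `groupby(...).transform("sum")`. The data below is hypothetical; only the column names follow the ones used in this notebook.

```python
import pandas as pd

# Hypothetical counts for two keywords in a single fortnight
toy = pd.DataFrame({
    "kword": ["nudge", "nudge", "nudge", "michie", "michie"],
    "fortnight_starting": ["2020-03-23"] * 5,
    "kword_sent_cat": ["neg", "neu", "pos", "neg", "pos"],
    "sentiment_count": [3, 5, 2, 1, 3],
})

# Total per keyword/fortnight, repeated on every row of the group
toy["tot_counts"] = (
    toy.groupby(["kword", "fortnight_starting"])["sentiment_count"]
    .transform("sum")
)

# Share of each sentiment label within its keyword/fortnight
toy["prop_sentiments"] = (toy["sentiment_count"] / toy["tot_counts"]).round(2)

print(toy)
```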

In [None]:
sent_counts

In [None]:
# reset axis
sent_counts.reset_index(["kword", "fortnight_starting"], inplace=True)

In [None]:
from datetime import datetime

In [None]:
date2num = {
    datetime.strptime('2020-01-27', '%Y-%m-%d'): -4, 
    datetime.strptime('2020-02-10', '%Y-%m-%d'): -3, 
    datetime.strptime('2020-02-24', '%Y-%m-%d'): -2, 
    datetime.strptime('2020-03-09', '%Y-%m-%d'): -1,
    datetime.strptime('2020-03-23', '%Y-%m-%d'): 0, 
    datetime.strptime('2020-04-06', '%Y-%m-%d'): 1, 
    datetime.strptime('2020-04-20', '%Y-%m-%d'): 2, 
    datetime.strptime('2020-05-04', '%Y-%m-%d'): 3, 
    datetime.strptime('2020-05-18', '%Y-%m-%d'): 4, 
    datetime.strptime('2020-06-01', '%Y-%m-%d'): 5, 
    datetime.strptime('2020-06-15', '%Y-%m-%d'): 6, 
    datetime.strptime('2020-06-29', '%Y-%m-%d'): 7
}

datestr2num = {
    '2020-01-27': -4, 
    '2020-02-10': -3, 
    '2020-02-24': -2, 
    '2020-03-09': -1,
    '2020-03-23': 0, 
    '2020-04-06': 1, 
    '2020-04-20': 2, 
    '2020-05-04': 3, 
    '2020-05-18': 4, 
    '2020-06-01': 5, 
    '2020-06-15': 6, 
    '2020-06-29': 7
}

In [None]:
sent_counts['fortnight_to_lockdown'] = [date2num.get(fortnight) for fortnight in sent_counts.fortnight_starting]

In [None]:
prominence_df['fortnight_to_lockdown'] = [datestr2num.get(fortnight) for fortnight in 
                                          prominence_df.fortnight_starting]
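
The two lookup tables above could also be derived rather than typed out. The sketch below is one way to do this, assuming that every value is the start date of a fortnight and that the fortnight starting 2020-03-23 is the one mapped to 0; the helper name `fortnights_to_lockdown` is hypothetical.

```python
import pandas as pd

# Start of the fortnight mapped to 0 in the lookup tables
LOCKDOWN_FORTNIGHT = pd.Timestamp("2020-03-23")


def fortnights_to_lockdown(dates) -> pd.Series:
    """Signed number of whole fortnights between each date and the lockdown fortnight.

    Accepts date strings or datetime objects, so a single helper covers both
    the sentiment data (datetimes) and the prominence data (strings).
    """
    dates = pd.to_datetime(pd.Series(dates))
    return (dates - LOCKDOWN_FORTNIGHT).dt.days // 14


offsets = fortnights_to_lockdown(
    ["2020-01-27", "2020-03-09", "2020-03-23", "2020-06-29"])
print(offsets.tolist())
```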

## Save data

In [None]:
#sent_counts.to_csv(os.path.join(DIR_DATA_EXTRA, "kwords_sentiments_fortnight.csv"))
#prominence_df.to_csv(os.path.join(DIR_DATA_EXTRA, "kwords_prominence_fortnight.csv"))

In [None]:
sent_counts

### Settings for charts and tables

In [None]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go


In [None]:
fortnights = ['2020-01-27', '2020-02-10', '2020-02-24', '2020-03-09','2020-03-23', '2020-04-06', '2020-04-20', 
 '2020-05-04', '2020-05-18', '2020-06-01', '2020-06-15', '2020-06-29']

In [None]:
fortnights_around_lockdown = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]

In [None]:
from typing import List, Optional

def draw_plot_combo(keyword: str,
                    x_axis_var="fortnight_to_lockdown",
                    x_axis_tickvals=fortnights_around_lockdown,
                    x_axis_title="Time (lockdown at 0)",
                    # annotated as a list of floats; None lets plotly choose the range
                    y_axis_range: Optional[List[float]] = None,
                    y2_title_text="<b>Polarity of sentiment</b>",
                    y1_title_text="Salience",
                    font_size=28,
                    sent_df: pd.DataFrame = sent_counts,
                    prominence_df: pd.DataFrame = prominence_df,
                    y2_showticklabels=True,
                    y1_showticklabels=True,
                    y1_color="white",
                    y2_color="white",
                    x_showticklabels=True
                    ):
    
    # subset datasets 
    sent_df = sent_df[(sent_df.kword == keyword)].copy()
    prominence_df = prominence_df[prominence_df.kword == keyword].copy()
    
    fig = make_subplots(specs=[[{"secondary_y": True}]])
    
    sent2fillcol = {
        "neg": "red",
        "neu": "white",
        "pos": "darkgreen"
    }
    
    bubble_fillcols = [sent2fillcol.get(sent) for sent in sent_df['kword_sent_cat']]
    
    sent2linecol = {
        "neg": "red",
        "neu": "black",
        "pos": "darkgreen"
    }

    bubble_linecols = [sent2linecol.get(sent) for sent in sent_df['kword_sent_cat']]
    
    # customise text of sentiment label
    sent_df['kword_sent_cat'] = [" " + str(label) for label in sent_df['kword_sent_cat']]
    sent_df['kword_sent_cat'] = sent_df['kword_sent_cat'].astype('category')
    
    
    trace1 = go.Scatter(
        x=prominence_df[x_axis_var], 
        y=prominence_df['salience'],
        line=dict(color='black'),
        name="Prominence (RKF * RDF)" 
    )
    
    trace2 = go.Scatter(
        x=sent_df[x_axis_var],
        y=sent_df['kword_sent_cat'],
        mode='markers',
        name="Polarity of sentiment",
        yaxis='y2',
        marker=dict(
            color=bubble_fillcols,
            line={"color":bubble_linecols},
            size=sent_df['sentiment_count'],
            sizemode='area',
            # sizeref=2.*max(sent_df['sentiment_count'])/(50.**2),
            sizeref=2.*20/(50.**2),
            sizemin=2,
        ))

    fig.add_trace(trace1)
    fig.add_trace(trace2,secondary_y=True)
    
    # Set x-axis title
    fig.update_xaxes(
        showgrid=False,
        zeroline=False,
        showline=True, linewidth=1, linecolor='black', mirror=True,
        # tickangle =-45,
        tickvals= x_axis_tickvals,
        title_text=x_axis_title,
    showticklabels=x_showticklabels)

    # Set y-axes titles
    fig.update_yaxes(title_text=y2_title_text, showgrid=False, zeroline=False, secondary_y=True,
                    showticklabels=y2_showticklabels,
                    tickfont=dict(color=y2_color))
    fig.update_yaxes(title_text=y1_title_text, range=y_axis_range, showgrid=False, zeroline=False, secondary_y=False,
                    showticklabels=y1_showticklabels,
                    tickfont=dict(color=y1_color))

    fig.update_shapes(dict(xref='x', yref='y'))
    fig.update_layout(height=700, width=1000,
                  #title_text=f'{keyword}',
                      showlegend=False,
                      paper_bgcolor='rgba(0,0,0,0)',
                      plot_bgcolor='rgba(0,0, 0,0)',
                      yaxis=dict(showline=True, linewidth=1, linecolor='black', mirror=True),
                      font=dict(
                        family="Helvetica",
                        size=font_size,
                        color="black"
                        ),
                      margin=dict(l=5, r=5, t=5, b=5),
                      shapes=[
        # 1st highlight during March 23 - May 10
        dict(
            type="rect",
            # x-reference is assigned to the x-values
            xref="x",
            # y-reference is assigned to the plot paper [0,1]
            yref="paper",
            x0=0,
            #x0="2020-03-23",
            y0=0,
            x1=3,
            #x1="2020-05-04",
            y1=1,
            fillcolor='rgba(30,30,30,0.4)', 
            opacity=0.2,
            layer="below",
            line_width=0)])

    return pyo.iplot(fig)

### Table of sentiment (function)

In [None]:
def table_sentiment(df: pd.DataFrame, kword: str) -> pd.DataFrame:
    """Pivot counts and proportions for one keyword: one row per fortnight, one column per sentiment label."""

    # assumes the keyword has exactly one row per fortnight, for all 12 fortnights
    fortnight_to_lockdown = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
    # assert {'fortnight_starting', 'kword_sent_cat', 'sentiment_count', 'prop_sentiments'}.issubset(df.columns)

    df = df.reset_index()
    pivott = df[df.kword == kword].pivot(index='fortnight_starting',
                                         columns='kword_sent_cat',
                                         values=['sentiment_count', 'prop_sentiments'])
    pivott.sentiment_count = pivott.sentiment_count.astype('int')

    # add the fortnight offset as a second index level
    pivott.set_index([pivott.index, fortnight_to_lockdown], inplace=True)
    pivott.index.names = ['fortnight_starting', 'fortnight_to_lockdown']

    return pivott

#### Create and save table for all keywords

In [None]:
sent_table = pd.DataFrame()

In [None]:
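# DataFrame.append was removed in pandas 2.0, so collect the per-keyword
# tables in a list and concatenate them once at the end
kw_tables = []

for kw in sent_counts.kword.unique():
    subdata = table_sentiment(df=sent_counts, kword=kw)

    # add the keyword as an extra index level
    kw_list = [kw] * subdata.shape[0]
    subdata.set_index([subdata.index, kw_list], inplace=True)

    kw_tables.append(subdata)

sent_table = pd.concat(kw_tables)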

In [None]:
#sent_table

In [None]:
sent_table.to_csv(os.path.join(DIR_DATA_EXTRA, "news_sentiment_table.csv"))

## Prepare prominence tables

In [None]:
prominence_df.columns

In [None]:
prominence_df.rename(columns={'rkf*rdf':'salience', 'rkf':'normTF', 
                             'rdf':'relDF', 'term_freq': 'TF', 'doc_freq':'DF'}, inplace=True)

In [None]:
prominence_df = prominence_df[['kword', 'fortnight_starting', 'fortnight_to_lockdown', 'salience', 'normTF', 
                               'relDF', 'TF', 'DF']].copy()

### Calculate the difference in salience between adjacent time points

In [None]:
prominence_df['salience_diff'] = prominence_df.groupby("kword")['salience'].diff(1)

In [None]:
prominence_df = prominence_df[['kword', 'fortnight_starting', 'fortnight_to_lockdown', 'salience', 'salience_diff', 'normTF', 
                               'relDF', 'TF', 'DF']]

In [None]:
# check
prominence_df[prominence_df.kword == "michie"]

In [None]:
max_salience = max(prominence_df[prominence_df.kword != 'behav_science'].salience) + 0.5

In [None]:
print(max_salience)

# Named key actors

Here we look at the main named key actors:
- halpern
- michie
- spi-b
- behavioural_insights_team / nudge_unit
- american_behavioural_scientist (thaler, sunstein, kahneman)

## Michie

### Prominence

In [None]:
prominence_df[prominence_df.kword == "michie"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="michie")

In [None]:
draw_plot_combo(keyword="michie", y_axis_range=[-0.1, max_salience],
                x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="black",
                    x_showticklabels=True)

## Halpern

### Prominence

In [None]:
prominence_df[prominence_df.kword == "halpern"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="halpern")

In [None]:
draw_plot_combo(keyword="halpern", y_axis_range=[-0.1, max_salience],
                x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

## Behavioural Insights Team / Nudge Unit

### Prominence

In [None]:
prominence_df[prominence_df.kword == "behav_insights_team"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="behav_insights_team")

In [None]:
draw_plot_combo(keyword="behav_insights_team", y_axis_range=[-0.1, max_salience],
                x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

## SPI-B

### Prominence

In [None]:
prominence_df[prominence_df.kword == "spi-b"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="spi-b")

In [None]:
draw_plot_combo(keyword="spi-b", y_axis_range=[-0.1, max_salience],
                x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

## American behavioural scientists 

### Prominence

In [None]:
#prominence_df[prominence_df.kword == "american_behav_scientists"]

### Sentiment

In [None]:
#table_sentiment(df=sent_counts, kword="american_behav_scientists")

In [None]:
#draw_plot_combo(keyword="american_behav_scientists", y_axis_range=[-.00003, 0.0017])

# General actors

## Behavioural scientist

### Prominence

In [None]:
prominence_df[prominence_df.kword == "behav_scientist"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="behav_scientist")

In [None]:
draw_plot_combo(keyword="behav_scientist", y_axis_range=[-0.1, max_salience],
                x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

In [None]:
prominence_df.kword.unique()

## Psychologist

### Prominence

In [None]:
prominence_df[prominence_df.kword == "psychologist"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="psychologist")

In [None]:
draw_plot_combo(keyword="psychologist", y_axis_range=[-0.1, max_salience],
               x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

# Disciplines

## Behavioural science

### Prominence

In [None]:
prominence_df[prominence_df.kword == "behav_science"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="behav_science")

In [None]:
draw_plot_combo(keyword="behav_science", y_axis_range=[-0.1, 38],
               x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    #x_axis_title="Time of 2-week periods (gray area covers strict lockdown)",
                    y2_title_text="Sentiment",
                    y1_title_text="Salience",
                font_size=28,
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_showticklabels=True,
                    y1_showticklabels=True,
                    y1_color = "black", 
                    y2_color = "black",
                    x_showticklabels=True)

## Psychology

### Prominence

In [None]:
prominence_df[prominence_df.kword == "psychology"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="psychology")

In [None]:
draw_plot_combo(keyword="psychology", y_axis_range=[-0.1, max_salience],
               x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

## Behavioural economics

### Prominence

In [None]:
prominence_df[prominence_df.kword == "behav_econ"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="behav_econ")

In [None]:
draw_plot_combo(keyword="behav_econ", y_axis_range=[-0.1, max_salience],
               x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

# Key concepts and techniques

## Nudge

### Prominence

In [None]:
prominence_df[prominence_df.kword == "nudge"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="nudge")

In [None]:
draw_plot_combo(keyword="nudge", y_axis_range=[-0.1, max_salience],
               x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="white",
                    x_showticklabels=True)

## Behavioural change

### Prominence

In [None]:
prominence_df[prominence_df.kword == "behav_change"]

### Sentiment

In [None]:
table_sentiment(df=sent_counts, kword="behav_change")

In [None]:
draw_plot_combo(keyword="behav_change", y_axis_range=[-0.1, max_salience],
               x_axis_var = "fortnight_to_lockdown",
                    x_axis_tickvals = fortnights_around_lockdown,
                    x_axis_title="",
                    y2_title_text="",
                    y1_title_text="",
                    sent_df=sent_counts, 
                    prominence_df= prominence_df,
                    y2_color="white",
                    y1_color="black",
                    x_showticklabels=True)

## Muses

### Kendall's Tau

Correlation between the corpus sub-periods (time as a sequence) and a keyword's salience:

 - 0 = absence of a monotonic trend
 - 1 or -1 = salience consistently increases / decreases with the passage of time
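
To illustrate how these values should be read, the sketch below applies `scipy.stats.kendalltau` to three hypothetical salience series: one strictly rising, one strictly falling and one without a consistent direction. The series are invented for illustration and are not corpus data.

```python
from scipy.stats import kendalltau

# Time as a plain sequence of sub-periods
time_index = list(range(1, 7))

# Hypothetical salience series (not corpus data)
toy_series = {
    "rising": [0.1, 0.4, 0.5, 0.9, 1.2, 1.5],
    "falling": [1.5, 1.2, 0.9, 0.5, 0.4, 0.1],
    "mixed": [0.5, 0.1, 0.9, 0.4, 1.5, 1.2],
}

taus = {}
for name, series in toy_series.items():
    tau, p_value = kendalltau(time_index, series)
    taus[name] = tau
    print(f"{name}: tau={tau:.2f}, p={p_value:.3f}")
```

The strictly rising and strictly falling series give a tau of 1 and -1 respectively, while the mixed series falls in between. With only a dozen time points per keyword, the p-value returned alongside tau is worth reporting as well.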

In [None]:
from scipy.stats import kendalltau 

In [None]:
michie_salience = prominence_df[prominence_df.kword == "michie"].salience.to_list()

In [None]:
time_series = list(range(1, prominence_df.fortnight_to_lockdown.nunique() + 1))

In [None]:
michie_salience

In [None]:
time_series

In [None]:
# Calculating Kendall Rank correlation 
corr, _ = kendalltau(time_series, michie_salience) 
print(f'Kendall Rank correlation: {corr}') 

In [None]:
halpern_salience = prominence_df[prominence_df.kword == "halpern"].salience.to_list()

In [None]:
halpern_salience

In [None]:
# Calculating Kendall Rank correlation 
corr, _ = kendalltau(time_series, halpern_salience) 
print(f'Kendall Rank correlation: {corr}') 

In [None]:
beh_science_salience = prominence_df[prominence_df.kword == "behav_science"].salience.to_list()

In [None]:
# Calculating Kendall Rank correlation 
corr, _ = kendalltau(time_series, beh_science_salience) 
print(f'Kendall Rank correlation: {corr}') 

In [None]:
spib_salience = prominence_df[prominence_df.kword == "spi-b"].salience.to_list()

In [None]:
spib_salience

In [None]:
# Calculating Kendall Rank correlation 
corr, _ = kendalltau(time_series, spib_salience) 
print(f'Kendall Rank correlation: {corr}') 

In [None]:
behav_science_salience = prominence_df[prominence_df.kword == "behav_science"].salience.to_list()
behav_science_salience