# Notebook: Metaphor analysis

This notebook analyses the metaphors detected by the regex and part-of-speech (POS) approaches.

The analysis focuses on:
- the frequency of the word "inflation" in relation to the inflation rate
- sentiment analysis of the flagged metaphors
- Poisson regression using the metaphors' sentiment scores
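The Poisson regression can be sketched as a Poisson GLM with a log link: the number of flagged metaphors per interview is the count outcome, and the log of the text length enters as an offset so that longer interviews are not mistaken for more metaphorical ones. This is a minimal numpy sketch under stated assumptions, not the notebook's final model: `fit_poisson_glm` is a hypothetical helper, and the sentiment score is assumed to be a numeric column that the notebook does not compute yet.

```python
import numpy as np


def fit_poisson_glm(X, y, offset=None, max_iter=100, tol=1e-8):
    """Fit a Poisson GLM with log link by Newton-Raphson (equivalent to IRLS here).

    X      : (n, p) design matrix; include a column of ones for the intercept.
    y      : (n,) non-negative counts, e.g. metaphors per interview.
    offset : (n,) optional log-exposure, e.g. np.log(text_length).
    Returns the (p,) coefficient vector on the log scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)

    # Start from a least-squares fit on the log scale to avoid overflow in exp()
    beta = np.linalg.lstsq(X, np.log(y + 0.5) - offset, rcond=None)[0]
    for _ in range(max_iter):
        mu = np.exp(X @ beta + offset)   # expected counts under the current beta
        score = X.T @ (y - mu)           # gradient of the Poisson log-likelihood
        info = X.T @ (X * mu[:, None])   # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With a hypothetical `sentiment_score` column, a call could look like `fit_poisson_glm(np.column_stack([np.ones(len(df)), df["sentiment_score"]]), df["pos_metaphors_len"], np.log(df["text_length"]))`; `np.exp` of the second coefficient is then the multiplicative change in the expected metaphor rate per unit of sentiment. statsmodels' `GLM` with `family=sm.families.Poisson()` and its `offset` argument fits the same model and also reports standard errors.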

In [9]:
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from ecbdata import ecbdata

tqdm.pandas()

# ECB colour as an RGB tuple scaled to the 0-1 range used by matplotlib
color = (17/255, 49/255, 147/255)

In [None]:
# Target words:
words_to_match = ["inflation", "deflation", "inflationary", "disinflationary", "hyperinflation", "disinflation"]
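Counting the target words per interview could be done with a single compiled regex. This is a minimal sketch assuming whole-word, case-insensitive matching is the intended behaviour; `build_target_pattern` and `count_target_words` are hypothetical helpers, not functions defined in this notebook.

```python
import re


def build_target_pattern(words):
    # \b anchors keep whole-word matches only, so "inflation" is not counted
    # inside "hyperinflation" and "inflationary" is not cut short to "inflation"
    return re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", flags=re.IGNORECASE)


def count_target_words(text, words):
    """Return a dict mapping each (lower-cased) target word to its number of occurrences in text."""
    counts = {w.lower(): 0 for w in words}
    if not isinstance(text, str):  # missing texts (NaN) count as zero occurrences
        return counts
    for match in build_target_pattern(words).finditer(text):
        counts[match.group(1).lower()] += 1
    return counts
```

For example, a hypothetical `inflation_count` column could be built with `df["Answers"].apply(lambda t: count_target_words(t, words_to_match)["inflation"])`.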

In [10]:
# Functions

In [11]:
# Import the dataset containing the interview data
df = pd.read_csv('data_complete.csv')

# Convert the 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# For computation speed, the data can be limited to the last 50 rows.
# Commented out: the outputs below were produced on the full dataset (519 rows).
# df = df.tail(50)

df.head()

| Unnamed: 0 | Date | Media | Member | Link | Information | Questions | Answers | list_regex | list_regex_reduced | list_regex_len | list_regex_reduced_len | text_length | pos_metaphors | pos_metaphors_len |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-12-23 | Interview with Der Spiegel | Jean-Claude Trichet | https://www.ecb.europa.eu/press/inter/date/200... | Information not found | SPIEGEL: Monsieur Trichet, any concrete uttera... | The publication of the translation was authori... | ['we will in the future take the decision that... | ['we will in the future take the decision that... | 3 | 3 | 9371 | [] | 0 |
| 1 | 2005-12-19 | Interview with Hospodářské Noviny | Otmar Issing | https://www.ecb.europa.eu/press/inter/date/200... | Information not found | The new EU member states want to adopt the eur... | These questions are all closely related to eac... | ['we be not confront with deflation but with i... | ['we be not confront with deflation but with i... | 1 | 1 | 3209 | [] | 0 |
| 2 | 2005-12-19 | Interview with Financial Times and Financial T... | Lucas Papademos | https://www.ecb.europa.eu/press/inter/date/200... | Information not found | Mr Papademos, you have responsibility as ECB V... | A comparison of the risks involved when short-... | ['if longterm interest rate remain at a low le... | ['if longterm interest rate remain at a low le... | 11 | 10 | 16231 | [('low', 'inflation'), ('current', 'inflation')] | 2 |
| 3 | 2005-12-15 | Interview with Paris Match | Jean-Claude Trichet | https://www.ecb.europa.eu/press/inter/date/200... | Information not found | Paris Match. After two uneventful years at the... | The publication of the translation was authori... | ['you be exaggerate the increase in the cost o... | ['you be exaggerate the increase in the cost o... | 5 | 4 | 8186 | [] | 0 |
| 4 | 2005-12-09 | Interview in Il Giornale | Lorenzo Bini Smaghi | https://www.ecb.europa.eu/press/inter/date/200... | Information not found | However, Europe’s politicians, with few except... | By Angelo Allegri, our correspondent in Frankf... | ['the rise have help to keep inflation expecta... | ['the rise have help to keep inflation expecta... | 7 | 7 | 6765 | [] | 0 |


In [12]:
print("Shape of the dataset before dropping missing values: ", df.shape)
df.isna().sum()

Shape of the dataset before dropping missing values:  (519, 14)


Date                       0
Media                      0
Member                     0
Link                       0
Information                0
Questions                 51
Answers                   10
list_regex                 0
list_regex_reduced         0
list_regex_len             0
list_regex_reduced_len     0
text_length                0
pos_metaphors              0
pos_metaphors_len          0
dtype: int64
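Dropping every row with a missing value would remove between 51 and 61 of the 519 interviews (51 without questions, 10 without answers, depending on overlap). If the text analysis runs on the questions and answers together, one alternative is to treat a missing part as empty text. A minimal sketch; `combine_text` is a hypothetical helper.

```python
import pandas as pd


def combine_text(df, cols=("Questions", "Answers")):
    """Join the given text columns into one string per row, treating missing values as empty."""
    filled = df[list(cols)].fillna("")
    # Join each row's values with a space, then drop the stray space left by an empty part
    return filled.apply(" ".join, axis=1).str.strip()
```

This keeps all 519 interviews available for counting, which matters for a count model such as the Poisson regression.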

In [14]:
# Note: no dropna() is applied above, so the shape is unchanged
print("Shape of the dataset (missing values not dropped): ", df.shape)
df.nunique()

Shape of the dataset (missing values not dropped):  (519, 14)


Date                      489
Media                     259
Member                     19
Link                      519
Information               361
Questions                 465
Answers                   509
list_regex                407
list_regex_reduced        407
list_regex_len             38
list_regex_reduced_len     38
text_length               504
pos_metaphors             175
pos_metaphors_len          15
dtype: int64

In [13]:
# Data type of each column
df.dtypes

Date                      datetime64[ns]
Media                             object
Member                            object
Link                              object
Information                       object
Questions                         object
Answers                           object
list_regex                        object
list_regex_reduced                object
list_regex_len                     int64
list_regex_reduced_len             int64
text_length                        int64
pos_metaphors                     object
pos_metaphors_len                  int64
dtype: object
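The list columns (`list_regex`, `list_regex_reduced`, `pos_metaphors`) have dtype `object` because, after the CSV round trip, they hold the string representation of the lists, as the `df.head()` output shows (e.g. `"[('low', 'inflation'), ...]"`). A minimal sketch that converts them back to Python objects with `ast.literal_eval`, assuming every non-missing cell is a valid Python literal; `parse_list_columns` is a hypothetical helper.

```python
import ast

import pandas as pd

LIST_COLUMNS = ("list_regex", "list_regex_reduced", "pos_metaphors")


def parse_list_columns(df, columns=LIST_COLUMNS):
    """Return a copy of df where the given columns hold Python lists instead of their string repr."""
    out = df.copy()
    for col in columns:
        # literal_eval only evaluates literals (lists, tuples, strings), unlike eval()
        out[col] = out[col].apply(lambda s: ast.literal_eval(s) if isinstance(s, str) else s)
    return out
```

After parsing, a consistency check such as comparing `df["list_regex"].str.len()` with `df["list_regex_len"]` becomes possible.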

In [None]:
# TODO: Run both POS and regex
# TODO: Plot the frequency of "inflation", the frequency of metaphors and their ratio per interview -> replace the text length value in the graphs with one of these features -> scale
# TODO: Poisson regression -> with POS and regex, then labeled
# TODO: POS extract_relationships() - also keep the whole sentence?
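The plotting TODO could start from a per-interview table holding the two frequencies, their ratio and min-max scaled copies on a common 0-1 axis. This is a sketch under assumptions: `frequency_table` is a hypothetical helper, `inflation_count` stands for a not-yet-computed column with the number of "inflation" mentions per interview, and the metaphor count is taken from an existing column such as `pos_metaphors_len` or `list_regex_len`.

```python
import numpy as np
import pandas as pd


def frequency_table(df, inflation_col, metaphor_col):
    """Per-interview counts, metaphor-to-inflation ratio and min-max scaled copies for plotting."""
    out = df[["Date", inflation_col, metaphor_col]].copy()
    # The ratio is undefined when an interview never mentions inflation, so keep it as NaN
    out["ratio"] = out[metaphor_col] / out[inflation_col].replace(0, np.nan)
    for col in [inflation_col, metaphor_col, "ratio"]:
        lo, hi = out[col].min(), out[col].max()
        # Min-max scaling to [0, 1]; a constant column is mapped to 0
        out[f"{col}_scaled"] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out.sort_values("Date").reset_index(drop=True)
```

The scaled columns can then be drawn against `Date`, for instance with `DataFrame.plot(x="Date", y=[...])` and the ECB colour defined at the top of the notebook.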
