## Descriptive Analysis - ECB Metaphors

Mathieu Notebook

19/04/2024

In [45]:
# Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import os
import aux_functions 

In [46]:
workingdir = os.getcwd()
# read cleaned_data.csv file
df = pd.read_csv(workingdir + '/cleaned_data.csv')

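# copy the dataset so the preprocessed text can be compared with the original later
df_copy = df.copy()
# row 171 is dropped here too, mirroring the removal applied to df further down
df_copy = df_copy.drop(df.index[171])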
df_copy.reset_index(drop=True, inplace=True)

### First look at the data frame

In [47]:
#head of the data
df.head()

Unnamed: 0,date,speakers,title,subtitle,contents
0,2024-02-07,Isabel Schnabel,Interview with Financial Times,"Interview with Isabel Schnabel, Member of the ...",INTERVIEW Interview with Financial Times ...
1,2024-02-03,Frank Elderson,Interview with De Volkskrant,"Interview with Frank Elderson, Member of the E...",INTERVIEW Interview with De Volkskrant In...
2,2024-01-31,Luis de Guindos,Interview with Die Zeit,"Interview with Luis de Guindos, Vice-President...",INTERVIEW Interview with Die Zeit Intervi...
3,2024-01-22,Christine Lagarde,Thanks to Wolfgang Schäuble,"Contribution by Christine Lagarde, President o...",CONTRIBUTION Thanks to Wolfgang Schäuble ...
4,2024-01-13,Philip R. Lane,Interview with Corriere della Sera,"Interview with Philip R. Lane, Member of the E...",INTERVIEW Interview with Corriere della Ser...


In [48]:
# tail of the data
df.tail()

Unnamed: 0,date,speakers,title,subtitle,contents
533,2004-12-06,Otmar Issing,"Interview with Prof. Otmar Issing, (Delo, Slov...",published on 4 December 2004,
534,2004-10-18,José Manuel González-Páramo,"Interview with Mr José Manuel González-Páramo,...","by Mrs Marietta Kurm-Engels, Handelsblatt",
535,2004-10-09,Jean-Claude Trichet,"Interview with Jean-Claude Trichet, President ...",Conducted by Corinne Lhaïk (L'Express) on 29 S...,
536,2004-08-08,Lucas Papademos,"Interview with Lucas Papademos, Vice-President...",Conducted by Beda Romano (Il Sole 24 Ore) on 5...,
537,2004-06-18,Jean-Claude Trichet,"Interview with Jean-Claude Trichet, President ...","conducted by Andrea Bonanni (La Repubblica), J...",


In [49]:
# describe the data
df.describe()

Unnamed: 0,date,speakers,title,subtitle,contents
count,538,538,538,538,537.0
unique,507,20,284,537,292.0
top,2020-11-17,Benoît Cœuré,Interview with Il Sole 24 Ore,"Interview with Jean-Claude Trichet, President ...",
freq,3,88,14,2,246.0


In [50]:
# count of the data
df.count()

date        538
speakers    538
title       538
subtitle    538
contents    537
dtype: int64

In [51]:
# shape of data
df.shape

(538, 5)

In [52]:
# remove missing values
df = df.dropna()
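
`dropna()` only removes rows whose value is actually missing. The `describe()` output above reports a most frequent "contents" value that prints as blank, with a frequency of 246, which suggests that many rows may hold whitespace-only text that `dropna()` keeps. A minimal sketch for counting such rows, shown on a toy frame (the helper name `count_blank_text` is hypothetical and not part of `aux_functions`):

```python
import pandas as pd

def count_blank_text(frame: pd.DataFrame, column: str) -> int:
    """Count rows whose text is missing or contains only whitespace."""
    as_text = frame[column].fillna("").astype(str)
    return int((as_text.str.strip() == "").sum())

# Toy data: one real text, one whitespace-only text, one missing value, one empty string.
toy = pd.DataFrame({"contents": ["INTERVIEW ...", "   ", None, ""]})
blank_rows = count_blank_text(toy, "contents")
print(blank_rows)
```

Running the same helper on `df` before and after `dropna()` would show how many blank texts remain in the dataset.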

### First look at the content of the interviews

In [53]:
aux_functions.print_contents(df,20, "contents")


Content 2:   INTERVIEW  Interview with De Volkskrant   Interview with Frank Elderson, Member of the Executive Board of the ECB and Vice-Chair of the Supervisory Board of the ECB, conducted by Jonathan Witteman on 29 January 2024 3 February 2024  In its introduction to the interview, the newspaper refers to and quotes from speeches of   September 2023   and   November 2023   as well as from the interview itself, ending with references to the sanctions that the ECB can impose and the fit and proper requirements for bankers.    Coming from a supervisor charged with assessing bankers, those words sound quite threatening. What happens if the ECB finds a banker inadequate on climate?  We obviously have the advantage of being able to inspect the inner workings of all banks. And we see that a lot is going well at banks in the area of climate risks, even if no single bank has currently met all of our expectations. But I don’t see any bank completely ignoring climate risks either. Should this ha

The output above is a sample of the text stored in the column "contents". The texts typically start with the type of the document in capital letters (for example INTERVIEW or CONTRIBUTION). Ideally, this type would be extracted consistently into a new column and removed from the column "contents".

### Formatting the dataset
+ Creating new columns: "Speaker_Position", "Interviewer_Name", "Interview_Type"
+ Removing "Subtitle" column
+ Keeping only the content in the column "contents"

In [54]:
# # Libraries for text processing
# import re
# import nltk
# from nltk.corpus import stopwords
# from nltk.stem.porter import PorterStemmer
# from nltk.tokenize import RegexpTokenizer
# from nltk.stem.wordnet import WordNetLemmatizer


1. Extracting the type of the interview (probably impossible to do given that some rows are missing that information)
+ solution: simply remove that part from the contents column
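
A minimal sketch of how the leading type label could be split off when it is present, and left alone when it is missing. It assumes the only labels are `INTERVIEW` and `CONTRIBUTION`, the two values visible in the samples of this notebook; the function name `split_type_label` is hypothetical and not part of `aux_functions`.

```python
import re

# Assumed labels: only these two appear in the samples shown in this notebook.
TYPE_RE = re.compile(r"^\s*(INTERVIEW|CONTRIBUTION)\b\s*")

def split_type_label(text):
    """Return (label, remaining_text); label is None when no label leads the text."""
    match = TYPE_RE.match(text)
    if match is None:
        return None, text
    return match.group(1), text[match.end():]

label, rest = split_type_label("  INTERVIEW  Interview with De Volkskrant")
print(label, "|", rest)
```

Because rows without a label are returned unchanged, this approach does not depend on every row carrying the information.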

2. Extracting the position of the speaker
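
The code of `aux_functions` is not shown in this notebook, so the exact logic of `extract_position_and_clean_content` is unknown. A minimal sketch of one way to obtain the behaviour visible in the outputs below, assuming the position is the first occurrence of one of three fixed titles and that this occurrence is removed from the text (the function name and the list of titles are assumptions):

```python
import re
import pandas as pd

# Assumed titles, taken from the values printed in this notebook.
POSITION_RE = re.compile(
    r"Member of the Executive Board|Vice-President|President",
    flags=re.IGNORECASE,
)

def extract_position_sketch(frame, text_col="contents"):
    """Add a 'position_speaker' column and remove the first matched title from the text."""
    out = frame.copy()

    def first_position(text):
        match = POSITION_RE.search(text)
        return match.group(0) if match else None

    out["position_speaker"] = out[text_col].map(first_position)
    out[text_col] = out[text_col].map(lambda t: POSITION_RE.sub("", t, count=1))
    return out

toy = pd.DataFrame({"contents": [
    "Interview with Frank Elderson, Member of the Executive Board of the ECB",
    "Interview with Luis de Guindos, Vice-President of the ECB",
    "Intervista con un giornale",
]})
result = extract_position_sketch(toy)
print(result)
```

The case-insensitive match keeps the original spelling, which would explain values such as "member of the Executive Board" in the output, and a text without any title yields a missing value.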

In [55]:
# Apply the function
df = aux_functions.extract_position_and_clean_content(df)
df.head(5)


Unnamed: 0,date,speakers,title,subtitle,contents,position_speaker
0,2024-02-07,Isabel Schnabel,Interview with Financial Times,"Interview with Isabel Schnabel, Member of the ...",INTERVIEW Interview with Financial Times In...,Member of the Executive Board
1,2024-02-03,Frank Elderson,Interview with De Volkskrant,"Interview with Frank Elderson, Member of the E...",INTERVIEW Interview with De Volkskrant Inte...,Member of the Executive Board
2,2024-01-31,Luis de Guindos,Interview with Die Zeit,"Interview with Luis de Guindos, Vice-President...",INTERVIEW Interview with Die Zeit Interview...,Vice-President
3,2024-01-22,Christine Lagarde,Thanks to Wolfgang Schäuble,"Contribution by Christine Lagarde, President o...",CONTRIBUTION Thanks to Wolfgang Schäuble Co...,President
4,2024-01-13,Philip R. Lane,Interview with Corriere della Sera,"Interview with Philip R. Lane, Member of the E...",INTERVIEW Interview with Corriere della Sera ...,Member of the Executive Board


In [56]:
aux_functions.print_contents(df,20, "contents")

Content 2: INTERVIEW  Interview with De Volkskrant   Interview with Frank Elderson,  of the ECB and Vice-Chair of the Supervisory Board of the ECB, conducted by Jonathan Witteman on 29 January 2024 3 February 2024  In its introduction to the interview, the newspaper refers to and quotes from speeches of   September 2023   and   November 2023   as well as from the interview itself, ending with references to the sanctions that the ECB can impose and the fit and proper requirements for bankers.    Coming from a supervisor charged with assessing bankers, those words sound quite threatening. What happens if the ECB finds a banker inadequate on climate?  We obviously have the advantage of being able to inspect the inner workings of all banks. And we see that a lot is going well at banks in the area of climate risks, even if no single bank has currently met all of our expectations. But I don’t see any bank completely ignoring climate risks either. Should this happen in the future, a moment wo

In [57]:
aux_functions.print_contents(df,30, "position_speaker")

Content 1: Member of the Executive Board
Content 2: Member of the Executive Board
Content 3: Vice-President
Content 4: President
Content 5: Member of the Executive Board
Content 6: Member of the Executive Board
Content 7: President
Content 8: Member of the Executive Board
Content 9: Vice-President
Content 10: Member of the Executive Board
Content 11: Vice-President
Content 12: Vice-President
Content 13: President
Content 14: member of the Executive Board
Content 15: President
Content 16: Member of the Executive Board
Content 17: Vice-President
Content 18: Member of the Executive Board
Content 19: Member of the Executive Board
Content 20: Member of the Executive Board
Content 21: President
Content 22: President
Content 23: Vice-President
Content 24: Member of the Executive Board
Content 25: Member of the Executive Board
Content 26: Member of the Executive Board
Content 27: Member of the Executive Board
Content 28: Vice-President
Content 29: President
Content 30: Member of the Executive 

*Content 172* (row index 171) contains the value **None**. A closer look at this interview shows that it was conducted in Italian, so we remove it.
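
Rather than hard-coding the row number, the affected rows can also be located programmatically. A minimal sketch on a toy frame, assuming a missing position is stored as a null value in "position_speaker" (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "speakers": ["A", "B", "C"],
    "position_speaker": ["President", None, "Vice-President"],
})

# Positional indices of the rows where no position was extracted.
missing_positions = np.flatnonzero(toy["position_speaker"].isna().to_numpy()).tolist()
print(missing_positions)

# Drop those rows by position and rebuild a clean 0..n-1 index.
cleaned = toy.drop(toy.index[missing_positions]).reset_index(drop=True)
```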

In [58]:
# remove the row 171 only from df
df = df.drop(df.index[171])
df.reset_index(drop=True, inplace=True)

In [59]:
aux_functions.print_contents(df,30, "position_speaker")

Content 1: Member of the Executive Board
Content 2: Member of the Executive Board
Content 3: Vice-President
Content 4: President
Content 5: Member of the Executive Board
Content 6: Member of the Executive Board
Content 7: President
Content 8: Member of the Executive Board
Content 9: Vice-President
Content 10: Member of the Executive Board
Content 11: Vice-President
Content 12: Vice-President
Content 13: President
Content 14: member of the Executive Board
Content 15: President
Content 16: Member of the Executive Board
Content 17: Vice-President
Content 18: Member of the Executive Board
Content 19: Member of the Executive Board
Content 20: Member of the Executive Board
Content 21: President
Content 22: President
Content 23: Vice-President
Content 24: Member of the Executive Board
Content 25: Member of the Executive Board
Content 26: Member of the Executive Board
Content 27: Member of the Executive Board
Content 28: Vice-President
Content 29: President
Content 30: Member of the Executive 

3. Extracting who is conducting the interview

In [60]:
aux_functions.print_contents(df,30, "title")

Content 1: Interview with Financial Times
Content 2: Interview with De Volkskrant
Content 3: Interview with Die Zeit
Content 4: Thanks to Wolfgang Schäuble
Content 5: Interview with Corriere della Sera
Content 6: Q&A on X
Content 7: Tribute article on Wolfgang Schäuble for Die Zeit
Content 8: Interview with Süddeutsche Zeitung
Content 9: Interview with 20 Minutos
Content 10: Interview with Reuters
Content 11: Interview with De Standaard and La Libre Belgique
Content 12: Interview with Finance
Content 13: Interview with Kathimerini
Content 14: Interview with Het Financieele Dagblad
Content 15: Interview with La Tribune Dimanche
Content 16: Interview with Jutarnji list
Content 17: Interview with Financial Times
Content 18: Interview with Market News International 
Content 19: Interview with Yahoo Finance
Content 20: Interview with The Currency
Content 21: Interview with Le Figaro 
Content 22: Interview with La Provence
Content 23: Interview with ABC
Content 24: Interview with the Financi

In [61]:
# Apply the function to extract interviewer information
df = aux_functions.extract_interviewer(df)
df.head(10)


Unnamed: 0,date,speakers,title,subtitle,contents,position_speaker,interviewer
0,2024-02-07,Isabel Schnabel,Interview with Financial Times,"Interview with Isabel Schnabel, Member of the ...",INTERVIEW Interview with Financial Times In...,Member of the Executive Board,Financial Times
1,2024-02-03,Frank Elderson,Interview with De Volkskrant,"Interview with Frank Elderson, Member of the E...",INTERVIEW Interview with De Volkskrant Inte...,Member of the Executive Board,De Volkskrant
2,2024-01-31,Luis de Guindos,Interview with Die Zeit,"Interview with Luis de Guindos, Vice-President...",INTERVIEW Interview with Die Zeit Interview...,Vice-President,Die Zeit
3,2024-01-22,Christine Lagarde,Thanks to Wolfgang Schäuble,"Contribution by Christine Lagarde, President o...",CONTRIBUTION Thanks to Wolfgang Schäuble Co...,President,Wolfgang Schäuble
4,2024-01-13,Philip R. Lane,Interview with Corriere della Sera,"Interview with Philip R. Lane, Member of the E...",INTERVIEW Interview with Corriere della Sera ...,Member of the Executive Board,Corriere della Sera
5,2024-01-10,Isabel Schnabel,Q&A on X,"Interview with Isabel Schnabel, Member of the ...",INTERVIEW Q&A on X Interview with Isabel Sc...,Member of the Executive Board,X
6,2024-01-03,Christine Lagarde,Tribute article on Wolfgang Schäuble for Die Zeit,Tribute article on Wolfgang Schäuble for Die Z...,INTERVIEW Tribute article on Wolfgang Schäubl...,President,Wolfgang Schäuble for Die Zeit
7,2023-12-22,Isabel Schnabel,Interview with Süddeutsche Zeitung,"Interview with Isabel Schnabel, Member of the ...",INTERVIEW Interview with Süddeutsche Zeitung ...,Member of the Executive Board,Süddeutsche Zeitung
8,2023-12-21,Luis de Guindos,Interview with 20 Minutos,"Interview with Luis de Guindos, Vice-President...",INTERVIEW Interview with 20 Minutos Intervi...,Vice-President,20 Minutos
9,2023-12-05,Isabel Schnabel,Interview with Reuters,"Interview with Isabel Schnabel, Member of the ...",INTERVIEW Interview with Reuters Interview ...,Member of the Executive Board,Reuters


In [62]:
aux_functions.print_contents(df,30, "interviewer")

Content 1: Financial Times
Content 2: De Volkskrant
Content 3: Die Zeit
Content 4: Wolfgang Schäuble
Content 5: Corriere della Sera
Content 6: X
Content 7: Wolfgang Schäuble for Die Zeit
Content 8: Süddeutsche Zeitung
Content 9: 20 Minutos
Content 10: Reuters
Content 11: De Standaard and La Libre Belgique
Content 12: Finance
Content 13: Kathimerini
Content 14: Het Financieele Dagblad
Content 15: La Tribune Dimanche
Content 16: Jutarnji list
Content 17: Financial Times
Content 18: Market News International
Content 19: Yahoo Finance
Content 20: The Currency
Content 21: Le Figaro
Content 22: La Provence
Content 23: ABC
Content 24: the Financial Times
Content 25: De Tijd
Content 26: Le Monde
Content 27: Les Echos
Content 28: Il Sole 24 Ore
Content 29: Nikkei
Content 30: Le Monde
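
The values above are consistent with a simple rule: keep whatever follows the first "with", "to" or "on" in the title. A minimal sketch of that rule, which is an assumption about `extract_interviewer` rather than its actual code (the name `outlet_from_title` is hypothetical):

```python
import re

# Keep the text after the first standalone "with", "to" or "on".
OUTLET_RE = re.compile(r"\b(?:with|to|on)\b\s+(.+)")

def outlet_from_title(title):
    """Return the part of the title after the first keyword, or None if none is found."""
    match = OUTLET_RE.search(title)
    return match.group(1).strip() if match else None

for title in ["Interview with Financial Times", "Thanks to Wolfgang Schäuble", "Q&A on X"]:
    print(outlet_from_title(title))
```

Such a rule also explains why non-interview titles produce values like "Wolfgang Schäuble" or "Wolfgang Schäuble for Die Zeit", which are not interviewers.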


4. Removing the information from the column "subtitle" already included in the other columns.
+ Keeping only the text after "conducted" or "published"
+ Keeping untouched the exceptions 
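
The two rules above can be sketched as follows, assuming the exceptions are simply the subtitles that contain neither keyword. This is an illustration, not the code of `extract_subtitle`, and the name `trim_subtitle` is hypothetical:

```python
import re

# Match from the first standalone "conducted" or "published" to the end of the string.
KEYWORD_RE = re.compile(r"\b(?:conducted|published)\b.*", flags=re.DOTALL)

def trim_subtitle(subtitle):
    """Keep the text from the first keyword onwards; leave other subtitles untouched."""
    match = KEYWORD_RE.search(subtitle)
    return match.group(0).strip() if match else subtitle

example = ("Interview with Isabel Schnabel, Member of the Executive Board of the ECB, "
           "conducted by Martin Arnold on 2 February 2024")
print(trim_subtitle(example))
```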

In [63]:
aux_functions.print_contents(df,20, "subtitle")

Content 1: Interview with Isabel Schnabel, Member of the Executive Board of the ECB, conducted by Martin Arnold on 2 February 2024
Content 2: Interview with Frank Elderson, Member of the Executive Board of the ECB and Vice-Chair of the Supervisory Board of the ECB, conducted by Jonathan Witteman on 29 January 2024
Content 3: Interview with Luis de Guindos, Vice-President of the ECB, conducted by Kolja Rudzio
Content 4: Contribution by Christine Lagarde, President of the ECB, French and German members of parliament and other personalities, published on n-tv.de
Content 5: Interview with Philip R. Lane, Member of the Executive Board of the ECB, conducted by Federico Fubini
Content 6: Interview with Isabel Schnabel, Member of the Executive Board of the ECB, conducted and published on 10 January 2024
Content 7: Tribute article on Wolfgang Schäuble for Die Zeit by Christine Lagarde, President of the ECB 
Content 8: Interview with Isabel Schnabel, Member of the Executive Board of the ECB, con

In [64]:
# Apply the function to trim the subtitle
df = aux_functions.extract_subtitle(df)
df.head(10)

Unnamed: 0,date,speakers,title,subtitle,contents,position_speaker,interviewer
0,2024-02-07,Isabel Schnabel,Interview with Financial Times,conducted by Martin Arnold on 2 February 2024,INTERVIEW Interview with Financial Times In...,Member of the Executive Board,Financial Times
1,2024-02-03,Frank Elderson,Interview with De Volkskrant,conducted by Jonathan Witteman on 29 January 2024,INTERVIEW Interview with De Volkskrant Inte...,Member of the Executive Board,De Volkskrant
2,2024-01-31,Luis de Guindos,Interview with Die Zeit,conducted by Kolja Rudzio,INTERVIEW Interview with Die Zeit Interview...,Vice-President,Die Zeit
3,2024-01-22,Christine Lagarde,Thanks to Wolfgang Schäuble,published on n-tv,CONTRIBUTION Thanks to Wolfgang Schäuble Co...,President,Wolfgang Schäuble
4,2024-01-13,Philip R. Lane,Interview with Corriere della Sera,conducted by Federico Fubini,INTERVIEW Interview with Corriere della Sera ...,Member of the Executive Board,Corriere della Sera
5,2024-01-10,Isabel Schnabel,Q&A on X,conducted and published on 10 January 2024,INTERVIEW Q&A on X Interview with Isabel Sc...,Member of the Executive Board,X
6,2024-01-03,Christine Lagarde,Tribute article on Wolfgang Schäuble for Die Zeit,Tribute article on Wolfgang Schäuble for Die Z...,INTERVIEW Tribute article on Wolfgang Schäubl...,President,Wolfgang Schäuble for Die Zeit
7,2023-12-22,Isabel Schnabel,Interview with Süddeutsche Zeitung,conducted by Meike Schreiber und Markus Zydra ...,INTERVIEW Interview with Süddeutsche Zeitung ...,Member of the Executive Board,Süddeutsche Zeitung
8,2023-12-21,Luis de Guindos,Interview with 20 Minutos,conducted by Emilio Ordiz and Jorge Millán,INTERVIEW Interview with 20 Minutos Intervi...,Vice-President,20 Minutos
9,2023-12-05,Isabel Schnabel,Interview with Reuters,conducted by Balázs Korányi on 1 December 2023,INTERVIEW Interview with Reuters Interview ...,Member of the Executive Board,Reuters


In [65]:
aux_functions.print_contents(df,20, "subtitle")

Content 1: conducted by Martin Arnold on 2 February 2024
Content 2: conducted by Jonathan Witteman on 29 January 2024
Content 3: conducted by Kolja Rudzio
Content 4: published on n-tv
Content 5: conducted by Federico Fubini
Content 6: conducted and published on 10 January 2024
Content 7: Tribute article on Wolfgang Schäuble for Die Zeit by Christine Lagarde, President of the ECB 
Content 8: conducted by Meike Schreiber und Markus Zydra on 18 December 2023
Content 9: conducted by Emilio Ordiz and Jorge Millán
Content 10: conducted by Balázs Korányi on 1 December 2023
Content 11: conducted by Ruben Mooijman and Ariane van Caloen, on 23 November 2023
Content 12: conducted by Albina Kenda
Content 13: conducted by Alexis Papahelas on 30 October
Content 14: conducted by Marcel de Boer, Marijn Jongsma and Joost van Kuppeveld on 11 October 2023
Content 15: conducted by Marie-Pierre Gröndahl on 2 October 2023
Content 16: conducted by Marina Klepo on 29 September 2023
Content 17: conducted by Ma

5. Removing all the duplicate information in the column "contents" to leave only the content of the interview itself

In [66]:
aux_functions.print_contents(df,20, "contents")

Content 2: INTERVIEW  Interview with De Volkskrant   Interview with Frank Elderson,  of the ECB and Vice-Chair of the Supervisory Board of the ECB, conducted by Jonathan Witteman on 29 January 2024 3 February 2024  In its introduction to the interview, the newspaper refers to and quotes from speeches of   September 2023   and   November 2023   as well as from the interview itself, ending with references to the sanctions that the ECB can impose and the fit and proper requirements for bankers.    Coming from a supervisor charged with assessing bankers, those words sound quite threatening. What happens if the ECB finds a banker inadequate on climate?  We obviously have the advantage of being able to inspect the inner workings of all banks. And we see that a lot is going well at banks in the area of climate risks, even if no single bank has currently met all of our expectations. But I don’t see any bank completely ignoring climate risks either. Should this happen in the future, a moment wo

First, we keep only the text that follows the value of the column "subtitle" inside the column "contents".

This allows us to safely remove the unnecessary header without deleting any text of the interview itself.
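
A minimal sketch of this step, assuming the (already trimmed) subtitle appears verbatim inside the text and that a text is left unchanged when the subtitle cannot be found. The helper name `text_after_subtitle` is hypothetical and is not the code of `refine_contents`:

```python
import pandas as pd

def text_after_subtitle(row):
    """Return the part of 'contents' that follows the subtitle, if the subtitle is found."""
    contents, subtitle = row["contents"], row["subtitle"]
    position = contents.find(subtitle)
    if not subtitle or position == -1:
        return contents  # nothing to anchor on: keep the text as it is
    return contents[position + len(subtitle):].lstrip()

toy = pd.DataFrame({
    "subtitle": ["conducted by Jonathan Witteman on 29 January 2024", "not in the text"],
    "contents": [
        "INTERVIEW Interview with De Volkskrant conducted by Jonathan Witteman "
        "on 29 January 2024 3 February 2024  In its introduction",
        "Some interview text.",
    ],
})
toy["contents"] = toy.apply(text_after_subtitle, axis=1)
print(toy["contents"].tolist())
```

Leaving unmatched rows untouched is what makes the step safe: text is only removed when the anchor is actually present.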

In [67]:
# Apply the function to refine contents
df = aux_functions.refine_contents(df)



In [68]:
aux_functions.print_contents(df,20, "contents")

Content 2: 3 February 2024  In its introduction to the interview, the newspaper refers to and quotes from speeches of   September 2023   and   November 2023   as well as from the interview itself, ending with references to the sanctions that the ECB can impose and the fit and proper requirements for bankers.    Coming from a supervisor charged with assessing bankers, those words sound quite threatening. What happens if the ECB finds a banker inadequate on climate?  We obviously have the advantage of being able to inspect the inner workings of all banks. And we see that a lot is going well at banks in the area of climate risks, even if no single bank has currently met all of our expectations. But I don’t see any bank completely ignoring climate risks either. Should this happen in the future, a moment would come where we would have to ask ourselves whether the people at the helm are still fit for their task.   How do banks endanger the economy by underestimating climate change?   Through

Second, we keep only the text that follows the first date written in the format "19 March 2020". However, if that date appears after the end of the first sentence (delimited by a dot), the text is left unchanged. This prevents useful text from being removed when a date appears in the middle of the interview.

The function only looks for dates before the first dot.
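
A minimal sketch of the rule described above, assuming English month names and the "day month year" format. It works on a single string and is an illustration rather than the code of `keep_text_after_date` (the name `keep_after_leading_date` is hypothetical):

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")

def keep_after_leading_date(text):
    """Drop everything up to the first date, unless the date comes after the first dot."""
    match = DATE_RE.search(text)
    if match is None:
        return text
    first_dot = text.find(".")
    if first_dot != -1 and match.start() > first_dot:
        return text  # the date belongs to the interview body
    return text[match.end():].lstrip()

once = keep_after_leading_date("29 January 2024 3 February 2024 In its introduction.")
twice = keep_after_leading_date(once)
print(once)
print(twice)
```

The example also shows why one pass may not be enough: when two dates lead the text, each call removes only the first of them.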

In [69]:

# Strip a leading date from the 'contents' column.
# The function is applied twice in case a second leading date remains after the first pass.
df = aux_functions.keep_text_after_date(df, 'contents')
df = aux_functions.keep_text_after_date(df, 'contents')


In [70]:
aux_functions.print_contents(df,20, "contents")

Content 2: In its introduction to the interview, the newspaper refers to and quotes from speeches of   September 2023   and   November 2023   as well as from the interview itself, ending with references to the sanctions that the ECB can impose and the fit and proper requirements for bankers.    Coming from a supervisor charged with assessing bankers, those words sound quite threatening. What happens if the ECB finds a banker inadequate on climate?  We obviously have the advantage of being able to inspect the inner workings of all banks. And we see that a lot is going well at banks in the area of climate risks, even if no single bank has currently met all of our expectations. But I don’t see any bank completely ignoring climate risks either. Should this happen in the future, a moment would come where we would have to ask ourselves whether the people at the helm are still fit for their task.   How do banks endanger the economy by underestimating climate change?   Through credit risks, fo

We observe that some texts start with a dot. For the function "keep_text_after_date" to work properly on these texts, we remove the dot whenever one appears within the first 3 characters.
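
A minimal sketch of that rule on a single string, assuming only the first dot inside the 3-character window is removed per call (the name `strip_early_dot` and the `window` parameter are hypothetical, not the signature of `remove_initial_dot`):

```python
def strip_early_dot(text, window=3):
    """Remove the first dot found within the first `window` characters, if any."""
    head, tail = text[:window], text[window:]
    return head.replace(".", "", 1) + tail

print(repr(strip_early_dot(". 3 February 2024 Text.")))
print(repr(strip_early_dot("Hello. World.")))
```

A dot that ends a normal first sentence lies outside the window and is therefore kept.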

In [71]:
df = aux_functions.remove_initial_dot(df, 'contents')
df = aux_functions.remove_initial_dot(df, 'contents')

df = aux_functions.keep_text_after_date(df, 'contents')


In [72]:
aux_functions.print_contents(df,20, "contents")

Content 2: In its introduction to the interview, the newspaper refers to and quotes from speeches of   September 2023   and   November 2023   as well as from the interview itself, ending with references to the sanctions that the ECB can impose and the fit and proper requirements for bankers.    Coming from a supervisor charged with assessing bankers, those words sound quite threatening. What happens if the ECB finds a banker inadequate on climate?  We obviously have the advantage of being able to inspect the inner workings of all banks. And we see that a lot is going well at banks in the area of climate risks, even if no single bank has currently met all of our expectations. But I don’t see any bank completely ignoring climate risks either. Should this happen in the future, a moment would come where we would have to ask ourselves whether the people at the helm are still fit for their task.   How do banks endanger the economy by underestimating climate change?   Through credit risks, fo

We now verify that the preprocessing of the interview contents worked properly, in other words that the pipeline did not remove any sentence of the interview itself.

We manually compare the preprocessed and original texts at row indices 0, 1, 50, 150, 200 and 210.
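
The manual spot checks could be complemented by an automated one. A minimal sketch, assuming that a correctly preprocessed text must appear unchanged at the end of its original (only the header is removed). The helper name `is_suffix_of_original` and the toy data are illustrative:

```python
import pandas as pd

def is_suffix_of_original(processed, original):
    """True when the processed text appears unchanged at the end of the original."""
    return original.rstrip().endswith(processed.rstrip())

processed = pd.Series(["Question? Answer.", "Answer only."])
original = pd.Series([
    "INTERVIEW header 3 February 2024 Question? Answer.",
    "INTERVIEW header Question? Answer only, but cut differently",
])

flags = [is_suffix_of_original(p, o) for p, o in zip(processed, original)]
print(flags)
```

Rows flagged `False` would be the ones worth reading by hand.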

In [73]:
print(f"Interview Preprocessed: '{df['contents'][0]}'")
print(f"Interview Original: '{df_copy['contents'][0]}'")



In [74]:
print(f"Interview Preprocessed: '{df['contents'][1]}'")
print(f"Interview Original: '{df_copy['contents'][1]}'")

Interview Preprocessed: 'In its introduction to the interview, the newspaper refers to and quotes from speeches of   September 2023   and   November 2023   as well as from the interview itself, ending with references to the sanctions that the ECB can impose and the fit and proper requirements for bankers.    Coming from a supervisor charged with assessing bankers, those words sound quite threatening. What happens if the ECB finds a banker inadequate on climate?  We obviously have the advantage of being able to inspect the inner workings of all banks. And we see that a lot is going well at banks in the area of climate risks, even if no single bank has currently met all of our expectations. But I don’t see any bank completely ignoring climate risks either. Should this happen in the future, a moment would come where we would have to ask ourselves whether the people at the helm are still fit for their task.   How do banks endanger the economy by underestimating climate change?   Through cr

In [75]:
print(f"Interview Preprocessed: '{df['contents'][50]}'")
print(f"Interview Original: '{df_copy['contents'][50]}'")

Interview Preprocessed: 'Has the euro zone reached peak inflation?  It’s probably too early to make that judgement, but I would be reasonably confident in saying that it is likely we are close to peak inflation. But whether this already is the peak or whether it will arrive at the start of 2023, is still uncertain. The main uncertainty is that we’ve seen so much volatility in gas prices. In some countries, consumer prices have moved a lot, while in others for example some utility companies have not yet finished hiking prices. Given the significant increase in prices, I don’t rule out some extra inflation early next year. Once we are past the initial months of 2023, later on in 2023 – in the spring or summer – we should see a sizeable drop in the inflation rate. That said, the journey of inflation from the current very high levels back to 2% will take time.  Next year, will inflation go down to 6-7%?  The initial downshift from the current high rates will be to around that level but I w

In [76]:
print(f"Interview Preprocessed: '{df['contents'][150]}'")
print(f"Interview Original: '{df_copy['contents'][150]}'")

Interview Preprocessed: 'The latest ECB forecasts weren’t much changed, some of the data is good, some of it not so good. What’s your view of the outlook at the moment?  I must say that since the last time we took action, the least one could say is that things in the economy have not gone for the worse. And even with sometimes conflicting incoming information, most of it has led us to say that the risk balance may still be somewhat to the downside but less so than it has been, and that led us to no change because we said we are broadly in line with our baseline. Looking also at new incoming information I think nothing is pointing to a further deterioration at least not on the front of prices and production. I do not speak on the health system, and that is why I have to bring in a caveat. This is based on an assumption that things continue as they are right now, that there is no major deterioration on the health front. Now a deterioration on the health front might not be only an increas

In [77]:
print(f"Interview Preprocessed: '{df['contents'][200]}'")
print(f"Interview Original: '{df_copy['contents'][200]}'")

Interview Preprocessed: 'Good morning, Christine Lagarde. This is your first television interview in France since taking up your new position at the European Central Bank in November. Thank you for giving it to France 2 and Télématin. French people knew you as the Minister of Economy and then as Managing Director of the International Monetary Fund (IMF). And now, as head of the ECB, you have an even more powerful role and the financial markets scrutinise everything you say. Do you have to weigh every word?   You have to speak sparingly enough for your message to remain solid and credible. And central bankers obviously have to be careful in sending out signals because financial markets and analysts are watching and will draw certain conclusions in order to decipher what they should focus their efforts on and where they should move their funds to.   Let me briefly touch on France. Today is a day of strikes in protest against the pension reform which will be presented to the cabinet today

In [78]:
print(f"Interview Preprocessed: '{df['contents'][210]}'")
print(f"Interview Original: '{df_copy['contents'][210]}'")

Interview Preprocessed: 'Mr Cœuré, the working group on stablecoins you’ve been chairing is expected to provide policy recommendations to G7 ministers on 17 October. What will be the gist of what you’re going to tell them?  It’s important to expand the discussion on global stablecoins to embrace the broader context of technological changes in payments so that we may have a wider discussion on the mix of solutions – both public and private – that can bring the benefits of technological improvements to the end users. Stablecoins are only one component of this discussion. They raise a multifaceted technological and regulatory debate, covering financial regulation and anti-money laundering, as well as non-financial issues such as privacy, data and taxes. Stablecoins also raise deeper public-policy issues, particularly regarding the definition of money and monetary sovereignty. The report offers a framework for exploring these and related topics. It helps to structure the discussion, starti

6. Renaming columns

In [79]:
# rename the columns in a single call
df = df.rename(columns={
    'speakers': 'speaker',
    'position_speaker': 'speaker_position',
    'subtitle': 'extra_info',
})

## Final Data Frame

In [80]:
df.head(20)

Unnamed: 0,date,speaker,title,extra_info,contents,speaker_position,interviewer
0,2024-02-07,Isabel Schnabel,Interview with Financial Times,conducted by Martin Arnold on 2 February 2024,"Now that inflation is fading, some say it was ...",Member of the Executive Board,Financial Times
1,2024-02-03,Frank Elderson,Interview with De Volkskrant,conducted by Jonathan Witteman on 29 January 2024,"In its introduction to the interview, the news...",Member of the Executive Board,De Volkskrant
2,2024-01-31,Luis de Guindos,Interview with Die Zeit,conducted by Kolja Rudzio,"Mr de Guindos, Germany is in a recession, the ...",Vice-President,Die Zeit
3,2024-01-22,Christine Lagarde,Thanks to Wolfgang Schäuble,published on n-tv,A Franco-German homage and appeal As a young ...,President,Wolfgang Schäuble
4,2024-01-13,Philip R. Lane,Interview with Corriere della Sera,conducted by Federico Fubini,The rate hike in September was meant to increa...,Member of the Executive Board,Corriere della Sera
5,2024-01-10,Isabel Schnabel,Q&A on X,conducted and published on 10 January 2024,"Hi all, this is @Isabel_Schnabel, Executive Bo...",Member of the Executive Board,X
6,2024-01-03,Christine Lagarde,Tribute article on Wolfgang Schäuble for Die Zeit,Tribute article on Wolfgang Schäuble for Die Z...,"When I think of Wolfgang Schäuble, the first i...",President,Wolfgang Schäuble for Die Zeit
7,2023-12-22,Isabel Schnabel,Interview with Süddeutsche Zeitung,conducted by Meike Schreiber und Markus Zydra ...,"Inflation has recently fallen to 2.4%, more ra...",Member of the Executive Board,Süddeutsche Zeitung
8,2023-12-21,Luis de Guindos,Interview with 20 Minutos,conducted by Emilio Ordiz and Jorge Millán,Wage cost data in Spain point to an increase o...,Vice-President,20 Minutos
9,2023-12-05,Isabel Schnabel,Interview with Reuters,conducted by Balázs Korányi on 1 December 2023,What is your take on the unexpectedly benign N...,Member of the Executive Board,Reuters


7. Saving to a CSV file

In [81]:
#saving to csv file
df.to_csv(workingdir + '/ECB_data.csv', index=False)