In [22]:
import pandas as pd
from pprint import pprint
# Importing package and summarizer
import gensim

In [2]:
data = r'data/all_ECB_speeches.csv'

In [3]:
df = pd.read_csv(data,sep='|')
df.head()

Unnamed: 0,date,speakers,title,subtitle,contents
0,2021-10-20,Frank Elderson,Overcoming the tragedy of the horizon: requiri...,"Keynote speech by Frank Elderson, Member of th...",SPEECH Overcoming the tragedy of the horiz...
1,2021-10-19,Fabio Panetta,“Hic sunt leones” – open research questions on...,"Speech by Fabio Panetta, Member of the Executi...",SPEECH “Hic sunt leones” – open research q...
2,2021-10-19,Frank Elderson,The role of supervisors and central banks in t...,"Keynote speech by Frank Elderson, Member of th...",SPEECH The role of supervisors and central...
3,2021-10-16,Christine Lagarde,Globalisation after the pandemic,2021 Per Jacobsson Lecture by Christine Lagard...,SPEECH Globalisation after the pandemic ...
4,2021-10-14,Christine Lagarde,IMFC Statement,"Statement by Christine Lagarde, President of t...",SPEECH IMFC Statement Statement by Chri...


In [4]:
#Extractive summary methods

In [7]:
#!pip install sumy
import sumy
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
# Import the LexRank summarizer
from sumy.summarizers.lex_rank import LexRankSummarizer

import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /Users/jbrettl/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [8]:
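# Initializing the parser
# `text` is never defined in the cells shown; the outputs below match the first
# speech in the DataFrame, so it is assumed here to be that row's contents.
text = df.loc[0, 'contents']
my_parser = PlaintextParser.from_string(text,Tokenizer('english'))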

In [9]:
# Creating a summary of 5 sentences.
lex_rank_summarizer = LexRankSummarizer()
lexrank_summary = lex_rank_summarizer(my_parser.document,sentences_count=5)

# Printing the summary
for sentence in lexrank_summary:
    print(sentence)

I will then turn to what I’ve just referred to as the next important step in risk management − transition planning − and what banks, as well as supervisors and other competent authorities, need to do in order to make it work.
On the supervisory side, we have been taking steps to increase our understanding of the impact of the climate crisis from a financial risk perspective and ensure that banks have a comprehensive, strategic and forward-looking approach to disclosing and managing all climate-related and environmental risks, or C&E risks for short.
Indeed, we see that some banks that declare carbon neutrality by 2050 have no strategy in place to reduce their exposures to carbon-intensive industries by 2030, nor have they made any progress in assessing their role in financing the transition of such companies.
Even if banks comply with our supervisory expectations for the management and disclosure of C&E risks, that does not automatically mean that they have a Paris-compliant transition
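
The idea behind LexRank can be sketched in a few lines of plain Python. This is a simplified illustration, not sumy's implementation: it assumes raw word counts instead of TF-IDF weights, ignores self-similarity, and uses a hypothetical helper name (`lexrank_scores`) and a damping factor of 0.85 chosen for illustration. Sentences are nodes in a similarity graph, and a PageRank-style power iteration gives higher scores to sentences that are similar to many others.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Return one centrality score per sentence (scores sum to 1)."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Pairwise similarity graph; self-similarity is set to 0 (a simplification).
    sim = [[0.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
           for i in range(n)]
    # Row-normalise so each row is a probability distribution over neighbours.
    # A sentence with no similar neighbour spreads its weight uniformly.
    for row in sim:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(scores[i] * sim[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


sentences = [
    "banks manage climate risks",
    "climate risks affect banks",
    "lunch was served",
]
for sentence, score in zip(sentences, lexrank_scores(sentences)):
    print(round(score, 3), sentence)
```

The two overlapping sentences reinforce each other and end up with higher scores than the unrelated one, which is the behaviour the summarizer relies on when it picks its top sentences.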

In [10]:
#LSA (Latent semantic analysis)
# Import the summarizer, tokenizer and parser
from sumy.summarizers.lsa import LsaSummarizer
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser

# Parsing the text string using PlaintextParser
parser=PlaintextParser.from_string(text,Tokenizer('english'))

In [11]:
# creating the summarizer
lsa_summarizer=LsaSummarizer()
lsa_summary= lsa_summarizer(parser.document,5)

# Printing the summary
for sentence in lsa_summary:
    print(sentence)

This is all the more problematic when our benchmarking shows that a majority of banks said they were exposed to medium and long-term C&E risks.
There are also encouraging examples: many banks have started to proactively manage their transition risks and set goals for decarbonisation, net zero emissions or alignment.
But my bottom line today is this: having processes and procedures in place, assigning responsibilities, conducting risk analyses of various sorts – these are all means to an end.
These templates can be supplemented with tools that give us information on banks’ forward-looking alignment – or a lack thereof – vis-à-vis the European intermediate targets for carbon emissions.
Banks should develop transition plans compatible with EU policies implementing the Paris Agreement that include concrete intermediate milestones from now until 2050, and disclose progress towards these goals on an annual basis.


In [12]:
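#Luhn - The Luhn summarization algorithm scores each sentence by how many frequent, non-stop ("significant") words it contains and how closely they cluster; it is a forerunner of TF-IDF (Term Frequency-Inverse Document Frequency) weighting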

# Import the summarizer
from sumy.summarizers.luhn import LuhnSummarizer
# Creating the parser
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
parser=PlaintextParser.from_string(text,Tokenizer('english'))

In [13]:
# Creating the summarizer
luhn_summarizer=LuhnSummarizer()
luhn_summary=luhn_summarizer(parser.document,sentences_count=3)

# Printing the summary
for sentence in luhn_summary:
    print(sentence)

On the supervisory side, we have been taking steps to increase our understanding of the impact of the climate crisis from a financial risk perspective and ensure that banks have a comprehensive, strategic and forward-looking approach to disclosing and managing all climate-related and environmental risks, or C&E risks for short.
But in addition to banks’ exposures to an economy increasingly marred by climate change and banks’ own stated intentions, a legal obligation for banks to have a clear, detailed and prudent transition plan in place would increase the consistency of the regulatory and supervisory framework and contribute to maintaining a level playing field.
This is why I believe that having in place a formal, legally binding requirement to adopt such transition plans will push banks to go beyond the mere point-in-time measurement of C&E risks and to embark on a thorough assessment of the structural changes that are likely to occur within the industries they derive their business 
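
The Luhn scoring idea used above can be sketched as follows. This is a minimal illustration under stated assumptions, not sumy's code: the stop-word list, the `min_count` threshold, the `max_gap` value and all function names are hypothetical choices made for this example. A sentence's score is taken from its densest cluster of significant words, computed as (significant words in the cluster)² divided by the cluster's length in words.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; sumy ships its own lists).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on", "that", "it"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def significant_words(sentences, min_count=2):
    """Return non-stop words that occur at least `min_count` times overall."""
    counts = Counter(w for s in sentences for w in tokenize(s) if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}


def luhn_score(sentence, significant, max_gap=4):
    """Score a sentence by its densest cluster of significant words.

    Two significant words belong to the same cluster when their
    positions in the sentence are at most `max_gap` apart.
    """
    words = tokenize(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev <= max_gap:
            count += 1
        else:
            # Close the current cluster and start a new one.
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def luhn_summarize(sentences, n=1):
    """Return the `n` best-scoring sentences in their original order."""
    significant = significant_words(sentences)
    ranked = sorted(sentences, key=lambda s: luhn_score(s, significant), reverse=True)
    top = set(ranked[:n])
    return [s for s in sentences if s in top]


sentences = [
    "Banks must manage climate risks.",
    "The weather is nice today.",
    "Climate risks affect banks and banks need plans.",
]
print(luhn_summarize(sentences, n=1))
```

Sentences without any significant word score zero, so filler sentences drop out first; the squared count rewards sentences that pack several key terms close together.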

In [14]:
#KL-Sum: Another extractive method is the KL-Sum algorithm.
#It selects sentences so that the word distribution of the summary stays as close as possible to that of the original text, i.e. it aims to minimize the KL divergence between the two.
#It uses a greedy optimization approach and keeps adding sentences as long as the KL divergence decreases.

from sumy.summarizers.kl import KLSummarizer
# Creating the parser
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
parser=PlaintextParser.from_string(text,Tokenizer('english'))

In [15]:
# Instantiating the KLSummarizer
kl_summarizer=KLSummarizer()
kl_summary=kl_summarizer(parser.document,sentences_count=3)

# Printing the summary
for sentence in kl_summary:
    print(sentence)

This means that the time for preparations is over and the time for action is now.
It should contain concrete intermediate milestones from now until 2050 and the associated key and performance indicators so that the bank’s management and the competent authorities can at all times understand the risks arising from a possible misalignment with the transition path.
But these are global scenarios and they do not reflect the ongoing developments within the European legal framework.
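
The greedy selection behind KL-Sum can be sketched as follows. This is an illustrative simplification rather than sumy's implementation: the additive smoothing constant, the function names and the choice to stop at a fixed sentence count (instead of stopping when the divergence no longer decreases) are assumptions made for this example. At each step it adds the sentence that brings the summary's word distribution closest to the document's.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def distribution(words, vocab, smoothing=0.01):
    """Smoothed unigram distribution over `vocab` (avoids zero probabilities)."""
    counts = Counter(words)
    total = len(words) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def kl_summarize(sentences, n=2):
    """Greedily pick `n` sentences that minimise KL(document || summary)."""
    doc_words = [w for s in sentences for w in tokenize(s)]
    vocab = set(doc_words)
    doc_dist = distribution(doc_words, vocab)
    chosen = []         # indices of the selected sentences
    summary_words = []  # words of the summary built so far
    while len(chosen) < min(n, len(sentences)):
        best_idx, best_kl = None, float("inf")
        for i, sentence in enumerate(sentences):
            if i in chosen:
                continue
            candidate = summary_words + tokenize(sentence)
            kl = kl_divergence(doc_dist, distribution(candidate, vocab))
            if kl < best_kl:
                best_idx, best_kl = i, kl
        chosen.append(best_idx)
        summary_words += tokenize(sentences[best_idx])
    # Return the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]


print(kl_summarize(["apple banana", "apple banana", "cherry"], n=2))
```

In the toy input, the second pick is "cherry" rather than the repeated "apple banana" sentence, because covering the missing word lowers the divergence more than repeating words the summary already has.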


In [16]:
#Abstractive summary methods
#!pip install transformers
#!pip install h5py

In [17]:
#Summarization with T5 Transformers

In [18]:
# Importing requirements
from transformers import T5Tokenizer, T5Config, T5ForConditionalGeneration

In [19]:
# Instantiating the model and tokenizer
my_model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')


In [21]:
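# Prepending the T5 task prefix "summarize:" to the raw text
# (this overwrites `text`, so the models in later cells also receive the prefixed string)
text = "summarize:" + text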
text

'summarize:   SPEECH  Overcoming the tragedy of the horizon: requiring banks to translate 2050 targets into milestones   Keynote speech by Frank Elderson, Member of the Executive Board of the ECB and Vice-Chair of the Supervisory Board of the ECB, at the Financial Market Authority’s Supervisory Conference Vienna, 20 October 2021 Introduction It is a great pleasure for me to speak here and I would like to thank the organisers for inviting me.  Today, I want to look ahead – to next year, to the next five years and to the three decades up to 2050, the year by which the EU has pledged to become carbon neutral under the Paris Agreement. Thirty years is a hair’s breadth of time: merely one-hundred-and-fifty-millionth of the earth’s 4.5 billion years of existence. A brief span indeed.  There is no doubt that time is running out for us to tackle the climate and environmental crises. What does this mean for banks? This means that the time for preparations is over and the time for action is now.

In [24]:
# Encoding the input text; truncation=True makes the cut-off at max_length explicit
input_ids=tokenizer.encode(text, return_tensors='pt', max_length=512, truncation=True)



In [41]:
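# Generating summary ids
# Note: generate() falls back to its default max_length of 20 tokens here, which is why the
# output below stops after 20 ids despite min_length=300; pass max_length explicitly for a longer summary.
summary_ids = my_model.generate(input_ids,min_length=300)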
summary_ids

tensor([[    0,     3,     9,   843,  7977,  5023,    57,  4937, 21297,   739,
             6,  1144,    13,     8,     3,  3073,   279,    11,  6444,    18]])

In [43]:
# Decoding the tensor and printing the summary.
t5_summary = tokenizer.decode(summary_ids[0])
print(t5_summary)

<pad> a keynote speech by Frank Elderson, member of the ECB and vice-


In [44]:
# Importing the model
from transformers import BartForConditionalGeneration, BartTokenizer, BartConfig

In [45]:
# Loading the model and tokenizer for bart-large-cnn

tokenizer=BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model=BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')


In [59]:
# Encoding the inputs and passing them to model.generate()
inputs = tokenizer.batch_encode_plus([text],return_tensors='pt',truncation=True,max_length=1000)
summary_ids = model.generate(inputs['input_ids'], early_stopping=True)

In [62]:
# Decoding and printing the summary
bart_summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(bart_summary)

Time is running out for us to tackle the climate and environmental crises. Banks can no longer simply declare their intention to be Paris-compliant by 2050. They must make structural changes to their way of doing business so as to make sure that they actually reach that goal and avoid the build-up of risks.


In [68]:
# Importing model and tokenizer
from transformers import GPT2Tokenizer,GPT2LMHeadModel

# Instantiating the model and tokenizer with gpt-2
tokenizer=GPT2Tokenizer.from_pretrained('gpt2')
model=GPT2LMHeadModel.from_pretrained('gpt2')

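# Encoding text to get input ids & pass them to model.generate()
# Note: generate() is left at its default max_length of 20 tokens, which the 300-token input
# already exceeds (see the warning below), so the output is essentially the truncated input, not a summary.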
inputs=tokenizer.batch_encode_plus([text],return_tensors='pt',truncation=True,max_length=300)
summary_ids=model.generate(inputs['input_ids'],early_stopping=True)

# Decoding and printing summary

GPT_summary=tokenizer.decode(summary_ids[0],skip_special_tokens=True)
print(GPT_summary)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Input length of input_ids is 300, but ``max_length`` is set to 20. This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


summarize:   SPEECH  Overcoming the tragedy of the horizon: requiring banks to translate 2050 targets into milestones   Keynote speech by Frank Elderson, Member of the Executive Board of the ECB and Vice-Chair of the Supervisory Board of the ECB, at the Financial Market Authority’s Supervisory Conference Vienna, 20 October 2021 Introduction It is a great pleasure for me to speak here and I would like to thank the organisers for inviting me.  Today, I want to look ahead – to next year, to the next five years and to the three decades up to 2050, the year by which the EU has pledged to become carbon neutral under the Paris Agreement. Thirty years is a hair’s breadth of time: merely one-hundred-and-fifty-millionth of the earth’s 4.5 billion years of existence. A brief span indeed.  There is no doubt that time is running out for us to tackle the climate and environmental crises. What does this mean for banks? This means that the time for preparations is over and the time for action is now. 

In [69]:
# Importing model and tokenizer
from transformers import XLMWithLMHeadModel, XLMTokenizer

# Instantiating the model and tokenizer
tokenizer=XLMTokenizer.from_pretrained('xlm-mlm-en-2048')
model=XLMWithLMHeadModel.from_pretrained('xlm-mlm-en-2048')

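# Encoding text to get input ids & pass them to model.generate()
# Note: as with GPT-2 above, the default max_length of 20 tokens is shorter than the 300-token input,
# so the output is essentially the (lowercased) truncated input rather than a summary.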
inputs=tokenizer.batch_encode_plus([text],return_tensors='pt',truncation=True,max_length=300)
summary_ids=model.generate(inputs['input_ids'],early_stopping=True)

# Decode and print the summary
XLM_summary=tokenizer.decode(summary_ids[0],skip_special_tokens=True)
print(XLM_summary)

Some weights of XLMWithLMHeadModel were not initialized from the model checkpoint at xlm-mlm-en-2048 and are newly initialized: ['transformer.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Input length of input_ids is 300, but ``max_length`` is set to 20. This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.


summarize : speech overcoming the tragedy of the horizon : requiring banks to translate 2050 targets into milestones keynote speech by frank elderson, member of the executive board of the ecb and vice-chair of the supervisory board of the ecb, at the financial market authority's supervisory conference vienna, 20 october 2021 introduction it is a great pleasure for me to speak here and i would like to thank the organisers for inviting me. today, i want to look ahead - to next year, to the next five years and to the three decades up to 2050, the year by which the eu has pledged to become carbon neutral under the paris agreement. thirty years is a hair's breadth of time : merely one-hundred-and-fifty-millionth of the earth's 4.5 billion years of existence. a brief span indeed. there is no doubt that time is running out for us to tackle the climate and environmental crises. what does this mean for banks? this means that the time for preparations is over and the time for action is now. bank