<a href="https://colab.research.google.com/github/simranbains9810/mark_carney_speech_analysis/blob/main/speech_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Key risk identification from Mark Carney speeches**

Install (or upgrade) the PDF text-extraction library pdfplumber in the Colab environment.

In [1]:
!pip install --upgrade pdfplumber



# **Web scraping, text extraction and text pre-processing of Mark Carney speeches**

This section outlines the text mining operations used to analyse Mark Carney's speeches. The speeches were sourced from the Bank of England's website, with filters applied to isolate those given by Mark Carney. The pipeline has three stages: web scraping to download the speeches, text extraction and conversion to plain text, and pre-processing (word tokenization and stop word removal) to prepare the data for analysis.

In [2]:
import os
import requests
import pdfplumber
import pandas as pd
import re
import string
import unicodedata
import nltk
from nltk.tokenize import word_tokenize
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

In [3]:
# Define the directory path and create it if it does not already exist
directory_path = "/content/speeches"
os.makedirs(directory_path, exist_ok=True)

In [4]:
%cd speeches

/content/speeches


In [5]:
urls = ["https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/the-grand-unifying-theory-and-practice-of-macroprudential-policy-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/the-road-to-glasgow-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/speech/2020/mark-carney-opening-remarks-at-the-future-of-inflation-targeting-conference",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-by-mark-carney-at-the-ecb-farewell-board-dinner-for-benoit-coeure.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-by-mark-carney-at-the-us-climate-action-centre-madrid.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/addressing-the-growing-challenges-in-the-international-monetary-and-financial-system-slides.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/light-is-therefore-colour-governor-remarks-at-the-new-20-launch.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/tcfd-strengthening-the-foundations-of-sustainable-finance-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-given-during-the-un-secretary-generals-climate-actions-summit-2019-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/the-growing-challenges-for-monetary-policy-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/speech/2019/50-note-character-selection-announcement",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/sea-change-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/enable-empower-ensure-a-new-finance-for-the-new-economy-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-to-open-policy-panel-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/finance-by-all-for-all-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/pull-push-pipes-sustainable-capital-flows-for-a-new-world-order-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/a-platform-for-innovation-remarks-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/investing-in-ethnicity-and-race-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/speech/2019/mark-carney-speech-at-european-commission-high-level-conference-brussels",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/the-global-outlook-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/remarks-at-the-accounting-for-sustainability-summit-2018.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/50-character-selection-and-future-forum-launch.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/ai-and-the-global-economy-mark-carney-slides.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/true-finance-ten-years-after-the-financial-crisis-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/the-future-of-work-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/from-protectionism-to-prosperity-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/new-economy-new-finance-new-bank-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/guidance-contingencies-and-brexit-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/staying-connected-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/opening-remarks-by-mark-carney-at-the-econome-launch-event.pdf",
        "https://www.bankofengland.co.uk/speech/2018/mark-carney-speech-at-the-public-policy-forum-toronto",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/a-transition-in-thinking-and-action-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/the-future-of-money-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/reflections-on-leadership-in-a-disruptive-age-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/turning-back-the-tide-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/opening-remarks-at-future-forum-2017.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/opening-remarks-at-the-boe-independence-20-years-on-conference.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/de-globalisation-and-inflation.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/policy-panel-investment-and-growth-in-advanced-economies.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/a-fine-balance.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/what-a-difference-a-decade-makes.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/building-the-infrastructure-to-realise-fintechs-promise.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/the-high-road-to-a-responsible-open-financial-system.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/banking-standards-board-worthy-of-trust-law-ethics-and-culture-in-banking.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/reflecting-diversity-choosing-the-inclusion.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/the-promise-of-fintech-something-new-under-the-sun.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/lambda.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/remarks-on-the-launch-of-the-recommendations-of-the-task-force-on-climate-related.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/the-spectre-of-monetarism.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/resolving-the-climate-paradox.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/uncertainty-the-economy-and-policy.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/enabling-the-fintech-transformation-revolution-restoration-or-reformation.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/the-sustainable-development-goal-imperative.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/opening-remarks-by-mark-carney-to-the-empowering-productivity-harnessing-the-talents.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/redeeming-an-unforgiving-world.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/the-turn-of-the-year.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/opening-statement-at-the-european-parliaments-econ-committee.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/closing-remarks-to-the-boe-open-forum.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/introduction-to-the-open-forum.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/the-european-union-monetary-and-financial-stability-and-the-boe.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/breaking-the-tragedy-of-the-horizon-climate-change-and-financial-stability.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/three-truths-for-finance.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/inflation-in-a-globalised-world.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/from-lincoln-to-lothbury-magna-carta-and-the-boe.pdf",
        #"https://www.bankofengland.co.uk/speech/2015/inclusive-capitalism-conference-in-conversation-with-governor-mark-carney",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/building-real-markets-for-the-good-of-the-people.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/writing-the-path-back-to-target.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/one-bank-research-agenda-launch-conference.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/fortune-favours-the-bold.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/the-future-of-financial-reform.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/regulatory-work-underway-and-lessons-learned.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/putting-the-right-ideas-into-practice.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/mark-carney-speech-at-the-trades-union-congress.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/winning-the-economic-marathon.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/mark-carney-speech-at-the-lord-mayors-banquet-for-bankers-and-merchants-of-the-city-of-london.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/inclusive-capitalism-creating-a-sense-of-the-systemic.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/one-mission-one-bank-promoting-the-good-of-the-people-of-the-uk.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/the-economics-of-currency-unions.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/remarks-given-by-mark-carney-at-davos-cbi-british-business-leaders-lunch.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/remarks-given-by-mark-carney-governor-regarding-polymer-notes-and-the-review-of-the-banknote-charact.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/the-spirit-of-the-season.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/the-uk-at-the-heart-of-a-renewed-globalisation.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/crossing-the-threshold-to-recovery.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/jane-austens-house-museum-remarks-by-mark-carney.pdf"
]
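Most entries in `urls` point straight at a PDF, but a few (for example the `/speech/2019/50-note-character-selection-announcement` page) are ordinary web pages that a PDF reader cannot open. The sketch below shows one way to separate the two groups before downloading. `split_by_type` is a hypothetical helper and is not part of the original notebook.

```python
import os
from urllib.parse import urlparse


def split_by_type(url_list):
    """Partition URLs into direct PDF links and HTML speech pages."""
    pdf_urls, html_urls = [], []
    for url in url_list:
        # Look only at the URL path, so a query string cannot hide the extension
        extension = os.path.splitext(urlparse(url).path)[1].lower()
        (pdf_urls if extension == ".pdf" else html_urls).append(url)
    return pdf_urls, html_urls


# Two entries copied from the list above
sample = [
    "https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/the-road-to-glasgow-speech-by-mark-carney.pdf",
    "https://www.bankofengland.co.uk/speech/2019/50-note-character-selection-announcement",
]
pdf_urls, html_urls = split_by_type(sample)
print(len(pdf_urls), "PDF link(s),", len(html_urls), "HTML page(s)")
```

Keeping the two groups apart lets the PDF links go through pdfplumber while the HTML pages are handled by an HTML parser such as BeautifulSoup.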

In [10]:
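# Download each PDF into the speeches directory
for url in urls:
    # The few HTML speech pages in `urls` are not PDFs, so they are skipped here
    if not url.lower().endswith(".pdf"):
        continue
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        file_path = os.path.join(directory_path, os.path.basename(url))
        with open(file_path, "wb") as f:
            f.write(response.content)
    else:
        print(f"Skipped {url}: HTTP {response.status_code}")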
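A 200 status code does not guarantee that the body is a PDF: a server can answer with an HTML landing or error page instead. A minimal check, assuming it is enough to inspect the first bytes of the payload, is to look for the `%PDF-` marker that opens every PDF file. `looks_like_pdf` is a hypothetical helper, not part of the original notebook.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload begins with the PDF header marker."""
    # PDF files start with "%PDF-"; tolerate leading whitespace before the marker
    return data.lstrip()[:5] == b"%PDF-"


print(looks_like_pdf(b"%PDF-1.7\n..."))           # True
print(looks_like_pdf(b"<!DOCTYPE html><html>"))   # False
```

Such a check could be applied to `response.content` before the file is written, so that only genuine PDFs reach the extraction step.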

# **Text Extraction**

In [11]:
# List the downloaded PDFs and the matching text file names
pdf_list = sorted(f for f in os.listdir(directory_path) if f.lower().endswith(".pdf"))
txt_list = [pdf[:-4] + ".txt" for pdf in pdf_list]

# Extract the text of each PDF and save it to a text file
for pdf_name, txt_name in zip(pdf_list, txt_list):
    pdf_path = os.path.join(directory_path, pdf_name)
    txt_path = os.path.join(directory_path, txt_name)
    with pdfplumber.open(pdf_path) as pdf, open(txt_path, "wt", encoding="utf-8") as out:
        for pdf_page in pdf.pages:
            # extract_text() may yield nothing for pages without a text layer
            page_text = pdf_page.extract_text() or ""
            out.write(page_text + "\n")
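Pages without a text layer (scanned images, blank pages) can come back from `extract_text()` as an empty string or, depending on the pdfplumber version, as `None`, and writing `None` to a file raises a `TypeError`. The sketch below shows one way to join per-page text defensively. `join_pages` is a hypothetical helper, and the list `pages` stands in for the values that would come from `pdf.pages`.

```python
from typing import Iterable, Optional


def join_pages(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with newlines, skipping pages that produced no text."""
    kept = [text.strip() for text in page_texts if text and text.strip()]
    return "\n".join(kept)


# Stand-in for [page.extract_text() for page in pdf.pages]
pages = ["Opening remarks", None, "", "Closing remarks  "]
print(join_pages(pages))
```

Separating pages with a newline also prevents the last word of one page from fusing with the first word of the next.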

In [None]:
# Create a list of speech titles without any suffix
speech = [pdf[:-4] for pdf in pdf_list]

# Read each text file into a dictionary, as one single-line string per speech
speech_1 = {}

for i in range(0, len(speech)):
    with open(os.path.join(directory_path, txt_list[i]), "rt", encoding="utf-8") as f:
        # join the non-empty lines with spaces so the speech becomes one string
        speech_1[speech[i]] = " ".join(line.strip() for line in f if line.strip())
    # convert the string to lowercase
    speech_1[speech[i]] = speech_1[speech[i]].lower()
    # normalise Unicode compatibility characters (NFKD)
    speech_1[speech[i]] = unicodedata.normalize("NFKD", speech_1[speech[i]])
    # remove punctuation, except the hyphen (string.punctuation[12])
    speech_1[speech[i]] = speech_1[speech[i]].translate(
        str.maketrans(
            "", "", string.punctuation[:12] + string.punctuation[13:]
        )
    )
    # collapse runs of spaces into a single space
    speech_1[speech[i]] = re.sub(" +", " ", speech_1[speech[i]])


# Boilerplate strings to remove from every speech
del_strings = [
    "all speeches are available online at",
    "all speeches are available online",
    "all speeches available online",
    "speeches are available online at",
    "speeches available online at",
    "speeches available online",
    "wwwbankofenglandcoukpublicationspagesspeechesdefaultaspx",
    "wwwbankofenglandcoukpublicationsspeeches",
    "wwwbankofenglandcouknewsspeeches",
    "wwwbankofenglandcoukspeeches",
    "boepressoffice",
    "remarks by",
    "speech given by",
    "mark carney",
    "the views are not necessarily those of the bank of england or the monetary policy committee",
    "the views are not necessarily those of the bank of england or the financial policy committee",
    "the views expressed within are not necessarily those of the bank of england or the monetary policy committee",
    "the views expressed within are not necessarily those of the bank of england or the financial policy committee",
    "i would like to thank",
    "and the staff of the bank’s archives",
    "for comments and contributions",
    "for their comments and contributions",
    "et al"
]

for i in range(0, len(speech)):
    for j in range(0, len(del_strings)):
        # replace each boilerplate string with a space
        speech_1[speech[i]] = re.sub(del_strings[j], " ", speech_1[speech[i]])
        # collapse runs of spaces so later strings still match
        speech_1[speech[i]] = re.sub(" +", " ", speech_1[speech[i]])


# Remove references section
for i in range(0, len(speech)):
    # the greedy match keeps everything up to and including the last
    # "references", which drops whatever follows it
    match = re.search(".*references", speech_1[speech[i]])
    if match:
        speech_1[speech[i]] = match.group(0)
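The cleaning steps in the cell above are easier to follow on a single short string. The sketch below reproduces them as two small functions. `clean_text` and `drop_after_references` are hypothetical names, the sample strings are made up for illustration, and the final `.strip()` is an addition that the notebook code does not perform.

```python
import re
import string
import unicodedata

# Every ASCII punctuation mark except the hyphen, so "forward-looking" stays intact
PUNCT_NO_HYPHEN = string.punctuation.replace("-", "")


def clean_text(raw: str) -> str:
    """Lowercase, NFKD-normalise, strip punctuation and collapse spaces."""
    text = unicodedata.normalize("NFKD", raw.lower())
    text = text.translate(str.maketrans("", "", PUNCT_NO_HYPHEN))
    return re.sub(" +", " ", text).strip()


def drop_after_references(text: str) -> str:
    """Keep everything up to and including the last occurrence of 'references'."""
    match = re.search(".*references", text)
    return match.group(0) if match else text


sample = "Forward-looking  guidance, www.bankofengland.co.uk/speeches!"
print(clean_text(sample))
print(drop_after_references("main text references smith 2012"))
```

The first call shows why the removal list contains entries such as `wwwbankofenglandcoukspeeches`: once dots and slashes are stripped, URLs collapse into a single token. The second shows a limitation worth keeping in mind: because the match is greedy, a speech that uses the word "references" only in its body text would be cut off at that word.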



In [22]:
nltk.download("punkt")
nltk.download("stopwords")

# Tokenise speech words
speech_2 = {}
for i in range(0, len(speech)):
    speech_2[speech[i]] = word_tokenize(speech_1[speech[i]])


# Remove stop words (a set gives fast membership tests)
stopwords = set(nltk.corpus.stopwords.words("english"))

for i in range(0, len(speech)):
    speech_2[speech[i]] = [word for word in speech_2[speech[i]] if word not in stopwords]
    speech_2[speech[i]] = " ".join(speech_2[speech[i]])

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
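To see what tokenisation and stop word removal do to a sentence, the sketch below runs both steps on a toy example using only the standard library. It is a simplified stand-in, not the notebook's method: a regular expression replaces NLTK's `word_tokenize`, and `STOP_WORDS` is a tiny hand-picked set rather than NLTK's English list.

```python
import re

# Tiny illustrative stop word set; the notebook uses NLTK's English list instead
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "is"}


def simple_tokens(text: str) -> list:
    """Return runs of lowercase letters, digits and hyphens as tokens."""
    return re.findall(r"[a-z0-9-]+", text)


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens found in the stop word set (set lookup is O(1) on average)."""
    return [token for token in tokens if token not in stop_words]


cleaned = "the future of money is a question of trust"
tokens = simple_tokens(cleaned)
print(" ".join(remove_stop_words(tokens)))  # future money question trust
```

Holding the stop words in a set rather than a list matters for long speeches, since every token is tested for membership once.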


In [30]:
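# Combine the PDF speeches into a DataFrame
speeches = pd.DataFrame({"speech": speech, "text": list(speech_1.values()), "token": list(speech_2.values())})
# `speech_html` and `speech_3` are not defined in this section; they are assumed to hold
# the titles and processed text of the HTML-only speeches from a step not shown here
temp = pd.DataFrame({"speech": speech_html, "text": list(speech_3.values()), "token": list(speech_3.values())})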

# Concatenate both dataframes
speeches = pd.concat([speeches, temp], ignore_index=True)
speeches

| | speech | text | token |
|---|---|---|---|
| 0 | | inclusive capitalism conference conversation g... | inclusive capitalism conference conversation g... |
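Building the columns from `list(speech_1.values())` and `list(speech_2.values())` relies on both dictionaries having been filled in the same order as the `speech` list. A more defensive sketch, assuming each dictionary is keyed by speech title, looks every value up by key instead. `build_rows` is a hypothetical helper, and the text values below are placeholders rather than real speech content.

```python
def build_rows(titles, text_by_title, token_by_title):
    """Return one record per title, looking each value up by key."""
    rows = []
    for title in titles:
        # A KeyError here means a speech is missing from one of the dictionaries
        rows.append({
            "speech": title,
            "text": text_by_title[title],
            "token": token_by_title[title],
        })
    return rows


titles = ["a-fine-balance", "lambda"]
# Dictionaries deliberately filled in a different order from `titles`
text_by_title = {"lambda": "text of lambda", "a-fine-balance": "text of a fine balance"}
token_by_title = {"lambda": "lambda", "a-fine-balance": "fine balance"}
rows = build_rows(titles, text_by_title, token_by_title)
print(rows[0]["speech"], "->", rows[0]["token"])
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)`, which yields the same `speech`, `text` and `token` columns.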
