<a href="https://colab.research.google.com/github/simranbains9810/mark_carney_speech_analysis/blob/main/speech_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Key risk identification from Mark Carney speeches**

Installing and updating the text mining library pdfplumber into the local Colab environment

In [1]:
!pip install --upgrade pdfplumber



# **Web scraping, text extraction and text pre-processing Mark Carney speeches**

This section outlies the comprehensive text mining operations executed to analyse Mark Carney's speeches. These operations encompass several stages, including scripts for web scraping to gather speeches from the Bank of England's website, text extraction, and conversion to plain text format. Further preprocessing steps involve word tokenization and the removal of stop words to prepare the data for analysis. The speeches were specifically sourced from the Bank of England's website by applying filters to isolate those given by Mark Carney.

In [2]:
# Define the directory path
import os
directory_path = "/content/speeches"
os.mkdir(directory_path)

FileExistsError: [Errno 17] File exists: '/content/speeches'

In [3]:
%cd speeches

/content/speeches


In [4]:
import requests
urls = ["https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/the-grand-unifying-theory-and-practice-of-macroprudential-policy-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2020/the-road-to-glasgow-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/speech/2020/mark-carney-opening-remarks-at-the-future-of-inflation-targeting-conference",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-by-mark-carney-at-the-ecb-farewell-board-dinner-for-benoit-coeure.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-by-mark-carney-at-the-us-climate-action-centre-madrid.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/addressing-the-growing-challenges-in-the-international-monetary-and-financial-system-slides.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/light-is-therefore-colour-governor-remarks-at-the-new-20-launch.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/tcfd-strengthening-the-foundations-of-sustainable-finance-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-given-during-the-un-secretary-generals-climate-actions-summit-2019-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/the-growing-challenges-for-monetary-policy-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/speech/2019/50-note-character-selection-announcement",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/sea-change-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/enable-empower-ensure-a-new-finance-for-the-new-economy-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/remarks-to-open-policy-panel-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/finance-by-all-for-all-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/pull-push-pipes-sustainable-capital-flows-for-a-new-world-order-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/a-platform-for-innovation-remarks-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/investing-in-ethnicity-and-race-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/speech/2019/mark-carney-speech-at-european-commission-high-level-conference-brussels",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2019/the-global-outlook-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/remarks-at-the-accounting-for-sustainability-summit-2018.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/50-character-selection-and-future-forum-launch.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/ai-and-the-global-economy-mark-carney-slides.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/true-finance-ten-years-after-the-financial-crisis-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/the-future-of-work-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/from-protectionism-to-prosperity-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/new-economy-new-finance-new-bank-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/guidance-contingencies-and-brexit-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/staying-connected-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/opening-remarks-by-mark-carney-at-the-econome-launch-event.pdf",
        "https://www.bankofengland.co.uk/speech/2018/mark-carney-speech-at-the-public-policy-forum-toronto",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/a-transition-in-thinking-and-action-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/the-future-of-money-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2018/reflections-on-leadership-in-a-disruptive-age-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/turning-back-the-tide-speech-by-mark-carney.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/opening-remarks-at-future-forum-2017.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/opening-remarks-at-the-boe-independence-20-years-on-conference.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/de-globalisation-and-inflation.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/policy-panel-investment-and-growth-in-advanced-economies.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/a-fine-balance.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/what-a-difference-a-decade-makes.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/building-the-infrastructure-to-realise-fintechs-promise.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/the-high-road-to-a-responsible-open-financial-system.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/banking-standards-board-worthy-of-trust-law-ethics-and-culture-in-banking.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/reflecting-diversity-choosing-the-inclusion.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/the-promise-of-fintech-something-new-under-the-sun.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2017/lambda.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/remarks-on-the-launch-of-the-recommendations-of-the-task-force-on-climate-related.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/the-spectre-of-monetarism.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/resolving-the-climate-paradox.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/uncertainty-the-economy-and-policy.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/enabling-the-fintech-transformation-revolution-restoration-or-reformation.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/the-sustainable-development-goal-imperative.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/opening-remarks-by-mark-carney-to-the-empowering-productivity-harnessing-the-talents.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/redeeming-an-unforgiving-world.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2016/the-turn-of-the-year.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/opening-statement-at-the-european-parliaments-econ-committee.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/closing-remarks-to-the-boe-open-forum.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/introduction-to-the-open-forum.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/the-european-union-monetary-and-financial-stability-and-the-boe.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/breaking-the-tragedy-of-the-horizon-climate-change-and-financial-stability.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/three-truths-for-finance.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/inflation-in-a-globalised-world.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/from-lincoln-to-lothbury-magna-carta-and-the-boe.pdf",
        #"https://www.bankofengland.co.uk/speech/2015/inclusive-capitalism-conference-in-conversation-with-governor-mark-carney",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/building-real-markets-for-the-good-of-the-people.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/writing-the-path-back-to-target.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/one-bank-research-agenda-launch-conference.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2015/fortune-favours-the-bold.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/the-future-of-financial-reform.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/regulatory-work-underway-and-lessons-learned.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/putting-the-right-ideas-into-practice.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/mark-carney-speech-at-the-trades-union-congress.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/winning-the-economic-marathon.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/mark-carney-speech-at-the-lord-mayors-banquet-for-bankers-and-merchants-of-the-city-of-london.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/inclusive-capitalism-creating-a-sense-of-the-systemic.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/one-mission-one-bank-promoting-the-good-of-the-people-of-the-uk.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/the-economics-of-currency-unions.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2014/remarks-given-by-mark-carney-at-davos-cbi-british-business-leaders-lunch.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/remarks-given-by-mark-carney-governor-regarding-polymer-notes-and-the-review-of-the-banknote-charact.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/the-spirit-of-the-season.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/the-uk-at-the-heart-of-a-renewed-globalisation.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/crossing-the-threshold-to-recovery.pdf",
        "https://www.bankofengland.co.uk/-/media/boe/files/speech/2013/jane-austens-house-museum-remarks-by-mark-carney.pdf"
]

In [5]:
# Scraping PDFs from URLs
for url in urls:
    response = requests.get(url)
    if response.status_code == 200:
        file_path = os.path.join(directory_path, os.path.basename(url))
        with open(file_path, "wb") as f:
            f.write(response.content)

# **Text Extraction**

In [6]:
#@title Text extraction from PDFs
import pdfplumber

# Create a list of PDF file names and text file names
pdf_list = os.listdir(directory_path)
txt_list = [pdf[:-4] + ".txt" for pdf in pdf_list]

# Extracting text and saving output in dictionary
for i in range(0, len(pdf_list)):
    out = open(txt_list[i], "wt")  # open text output
    with pdfplumber.open(os.fsdecode(pdf_list[i])) as pdf:
        for pdf_page in pdf.pages:
            page_text = pdf_page.extract_text()
            out.write(page_text)
        out.close()

In [31]:
import pandas as pd  # data processing, CSV file I/O
import re            # regular expression operations
import unicodedata
import spacy         # modern NLP library

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Create a list of speech titles without any suffix
speech = [pdf[:-4] for pdf in pdf_list]

# Read text files as CSV files into a dictionary
d1 = {}

for i in range(len(speech)):
    # Read the text file
    d1[speech[i]] = pd.read_csv(
        txt_list[i], delimiter="\n", names=["text"]
    )
    # Concatenate all rows into one string
    d1[speech[i]]["text"] = d1[speech[i]]["text"].str.cat(sep=" ")
    # Replace the DataFrame with the text content
    d1[speech[i]] = d1[speech[i]]["text"]

    # Pass text through spaCy NLP pipeline for normalization and processing
    doc = nlp(d1[speech[i]])

    # Normalize and clean text
    tokens = [
        token.text.lower() for token in doc
        if not token.is_stop and not token.is_punct
    ]
    d1[speech[i]] = " ".join(tokens)

# Remove repeating strings
del_strings = [
    "all speeches are available online at",
    "all speeches are available online",
    "all speeches available online",
    "speeches are available online at",
    "speeches available online at",
    "speeches available online",
    "wwwbankofenglandcoukpublicationspagesspeechesdefaultaspx",
    "wwwbankofenglandcoukpublicationsspeeches",
    "wwwbankofenglandcouknewsspeeches",
    "wwwbankofenglandcoukspeeches",
    "boepressoffice",
    "remarks by",
    "speech given by",
    "mark carney",
    "the views are not necessarily those of the bank of england or the monetary policy committee",
    "the views are not necessarily those of the bank of england or the financial policy committee",
    "the views expressed within are not necessarily those of the bank of england or the monetary policy committee",
    "the views expressed within are not necessarily those of the bank of england or the financial policy committee",
    "i would like to thank",
    "and the staff of the bank’s archives",
    "for comments and contributions",
    "for their comments and contributions",
    "et al",
]

for i in range(len(speech)):
    for pattern in del_strings:
        d1[speech[i]] = re.sub(pattern, " ", d1[speech[i]])
    # Remove excess whitespace
    d1[speech[i]] = re.sub(r"\s+", " ", d1[speech[i]])

# Remove references section
for i in range(len(speech)):
    if "references" in d1[speech[i]]:
        # Use greedy regex to remove everything after the last "references"
        d1[speech[i]] = re.split(r"references", d1[speech[i]])[0]


In [34]:
# Tokenize and remove stop words using spaCy
d2 = {}

for i in range(len(speech)):
    # Process the text with spaCy
    doc = nlp(d1[speech[i]])

    # Extract tokens, removing stop words and punctuation
    tokens = [token.text for token in doc if not token.is_stop and not token.is_punct]

    # Store the cleaned and tokenized text
    d2[speech[i]] = " ".join(tokens)

# Print tokenized and cleaned speeches
for key, text in d2.items():
    print(f"Tokenized and processed text for {key}:\n{text}\n")

In [35]:
import spacy
from bs4 import BeautifulSoup
import requests
import re
import unicodedata

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# List of URLs
url_html = [
    "https://www.bankofengland.co.uk/speech/2015/inclusive-capitalism-conference-in-conversation-with-governor-mark-carney",
]

speech_html = []
for i in range(len(url_html)):
    speech_html.append(url_html[i].split("-mark-carney", 1)[1])

# Parsing HTML using requests and BeautifulSoup
soup = {}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}

for i, url in enumerate(url_html):
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        soup[i] = BeautifulSoup(response.text, features="html.parser")
    else:
        print(f"Failed to fetch {url} with status code {response.status_code}")

# Extracting and cleaning text
d3 = {}

for i in range(len(soup)):
    d3[speech_html[i]] = soup[i].get_text()
    lines = (line.strip() for line in d3[speech_html[i]].splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    d3[speech_html[i]] = " ".join(chunk for chunk in chunks if chunk)
    d3[speech_html[i]] = d3[speech_html[i]].lower()
    d3[speech_html[i]] = unicodedata.normalize("NFKD", d3[speech_html[i]])
    d3[speech_html[i]] = re.sub(r"[^\w\s]", "", d3[speech_html[i]])  # Remove punctuation
    d3[speech_html[i]] = re.sub(r"\s+", " ", d3[speech_html[i]])  # Remove excess whitespaces

# Tokenization and stop word removal using spaCy
processed_text = {}

for key, text in d3.items():
    doc = nlp(text)
    tokens = [token.text for token in doc if not token.is_stop and not token.is_punct]
    processed_text[key] = " ".join(tokens)

# Print the processed text
for key, text in processed_text.items():
    print(f"Processed text for {key}:\n{text}\n")


# Tokenize speech words using spaCy and remove stop words
d4 = {}

for key, text in d3.items():
    doc = nlp(text)
    # Extract tokens and remove stop words
    tokens = [token.text for token in doc if not token.is_stop and not token.is_punct]
    d4[key] = tokens

# Join tokens back into a single string for further processing
for key in d4:
    d4[key] = " ".join(d4[key])

# Print the tokenized and cleaned speeches
for key, text in d4.items():
    print(f"Tokenized and processed text for {key}:\n{text}\n")


Processed text for :
inclusive capitalism conference conversation governor mark carney bank england use cookies use necessary cookies site work example manage session d like use nonessential cookies including thirdparty cookies help improve site clicking accept recommended settings banner accept use optional cookies necessary cookies analytics cookies yes yes accept recommended cookies yes proceed necessary cookies necessary cookies necessary cookies enable core functionality website security network management accessibility disable changing browser settings affect website functions analytics cookies use analytics cookies track number visitors parts site understand website information cookies work cookie policy skip main content main menu close main menu topics open topics sub menu main menu banknotes open banknotes sub menu main menu choosing banknote characters counterfeit banknotes current banknotes damaged contaminated banknotes exchanging old banknotes note circulation scheme advi

In [36]:

#@title Create dataframe of speech titles, texts, and tokens
# Combine elements from d1 and d2 first
speeches = pd.DataFrame(data = {"speech": speech})
speeches["text"] = speeches["speech"].map(d1)
speeches["token"] = speeches["speech"].map(d2)

# Create a temporary dataframe that combines elements from d3 and d4
temp = pd.DataFrame(data = {"speech": speech_html})
temp["text"] = temp["speech"].map(d3)
temp["token"] = temp["speech"].map(d3)

# Merge both dataframes
speeches = pd.concat([speeches, temp], ignore_index = True)
speeches


Unnamed: 0,speech,text,token
0,,inclusive capitalism conference in conversatio...,inclusive capitalism conference in conversatio...
