<a href="https://colab.research.google.com/github/sheldonkemper/bank_of_england/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
"""
===================================================
Team Name: Quant Collective
Authors: Sheldon Kemper, Rita, Kasia, Chiaki, Oscar, Arijit

LinkedIn Profiles:
    Sheldon: https://www.linkedin.com/in/sheldon-kemper
    Rita: [Insert Rita's LinkedIn URL]
    Kasia: [Insert Kasia's LinkedIn URL]
    Chiaki: [Insert Chiaki's LinkedIn URL]
    Oscar: [Insert Oscar's LinkedIn URL]
    Arijit: [Insert Arijit's LinkedIn URL]

Date: 2025-02-04
Version: 1.1

Description:
    This notebook serves as the central orchestrator of our end-to-end NLP pipeline,
    which has been developed to transform unstructured quarterly announcements and Q&A
    transcripts from Global Systemically Important Banks (G-SIBs) into actionable insights
    for the Bank of England. The processes integrated into this pipeline are:

    1. Data Collection & Preprocessing:
       - Ingestion of raw data from multiple sources (e.g., PDFs, HTML, transcripts)
       - Data cleaning, noise removal, and formatting standardisation
       - Initial Exploratory Data Analysis (EDA) to understand data characteristics

    2. Methodology & Modelling:
       - Topic Modelling with BERTopic to extract latent themes and topics
       - Sentiment Analysis with FinBERT to gauge market sentiment within transcript segments
       - Summarisation Pipeline to generate concise summaries from lengthy texts

    3. Integration & Pipeline Development:
       - Sequential execution of the above processes to ensure a cohesive workflow
       - Handling of inter-process dependencies and data hand-offs
       - Iterative refinement based on issues encountered and model performance evaluations

    4. Results and Reporting:
       - Aggregation of model outputs, key findings, and visualisations
       - Generation of actionable insights and business recommendations for risk assessment

This notebook is a collaborative effort by Quant Collective to build a robust, scalable data engineering
solution tailored to complex financial datasets.

===================================================
"""



In [1]:
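# Fetch the helper notebooks that this orchestrator executes with %run.
# -O overwrites any copy from a previous run (plain wget would save a new "*.ipynb.1" file instead).
!wget -q -O sk_import_PDF.ipynb https://raw.githubusercontent.com/sheldonkemper/bank_of_england/refs/heads/main/notebooks/import/sk_import_PDF.ipynb
!wget -q -O sk_processed_ubs.ipynb https://raw.githubusercontent.com/sheldonkemper/bank_of_england/refs/heads/main/notebooks/cleansed/sk_processed_ubs.ipynb
!wget -q -O sk_processed_jpmorgan.ipynb https://raw.githubusercontent.com/sheldonkemper/bank_of_england/refs/heads/main/notebooks/cleansed/sk_processed_jpmorgan.ipynb
!wget -q -O kk_eda.ipynb https://raw.githubusercontent.com/sheldonkemper/bank_of_england/refs/heads/main/notebooks/cleansed/kk_eda.ipynb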


In [2]:
import os
from google.colab import drive

# Mount Google Drive at /content/drive; force_remount=True remounts even if Drive is already mounted
drive.mount('/content/drive', force_remount=True)

# Assumes the shared 'BOE' folder has been added to this account's MyDrive
BOE_path = '/content/drive/MyDrive/BOE/bank_of_england/data'

# List the data directory to confirm that the mount and the path are correct
print(os.listdir(BOE_path))

Mounted at /content/drive
['raw', 'jpm_presentation_final.csv', 'cleansed', 'model_outputs', 'preprocessed_data']
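Later cells read from and write to fixed subfolders of `BOE_path`: the listing above shows `raw`, `cleansed` and `preprocessed_data`, and the cleansing output below names files such as `ubs_management_discussion.csv` and `ubs_qna_section.csv`. A minimal sketch of a helper that centralises those paths, assuming that layout and naming hold for every bank; `bank_paths` is a hypothetical name, not part of the team's notebooks.

```python
from pathlib import Path


def bank_paths(data_root: str, bank: str) -> dict:
    """Build the per-bank input/output paths (hypothetical helper, assumed folder layout)."""
    root = Path(data_root)
    return {
        "raw_dir": root / "raw" / bank,  # downloaded PDFs
        "management_csv": root / "cleansed" / f"{bank}_management_discussion.csv",
        "qna_csv": root / "cleansed" / f"{bank}_qna_section.csv",
        "preprocessed_dir": root / "preprocessed_data",  # inputs read by the EDA step
    }


paths = bank_paths("/content/drive/MyDrive/BOE/bank_of_england/data", "ubs")
```

In the notebook it would be called as `bank_paths(BOE_path, sk_bank_17)`, so a change to the folder layout only needs to be made in one place.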


# 1. Data Collection & Preprocessing

## Ingestion of raw data from multiple sources

In [None]:
sk_bank_17 = "ubs"
%run sk_import_PDF.ipynb

Mounted at /content/drive
The value of my_variable is: ubs

Summary of Downloads:
('https://www.ubs.com/global/en/investor-relations/financial-information/quarterly-reporting/qr-shared/2023/4q23/_jcr_content/mainpar/toplevelgrid_copy_co/col1/linklistreimagined_c/link_984441358_copy_.1148964796.file/PS9jb250ZW50L2RhbS9hc3NldHMvY2MvaW52ZXN0b3ItcmVsYXRpb25zL3F1YXJ0ZXJsaWVzLzIwMjMvNHEyMy80cTIzLWVhcm5pbmdzLWNhbGwtcmVtYXJrcy5wZGY%3D/4q23-earnings-call-remarks.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/data/raw/ubs/4q23-earnings-call-remarks.pdf')
('https://www.ubs.com/global/en/investor-relations/financial-information/quarterly-reporting/qr-shared/2023/2q23/_jcr_content/mainpar/toplevelgrid_copy_co/col1/linklistreimagined_c/link_2038370922_copy.1634234040.file/PS9jb250ZW50L2RhbS9hc3NldHMvY2MvaW52ZXN0b3ItcmVsYXRpb25zL3F1YXJ0ZXJsaWVzLzIwMjMvMnEyMy8ycTIzLWVhcm5pbmdzLWNhbGwtcmVtYXJrcy5wZGY=/2q23-earnings-call-remarks.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/da

In [None]:
sk_bank_17 = "jpmorgan"
%run sk_import_PDF.ipynb

Mounted at /content/drive
The value of my_variable is: jpmorgan

Summary of Downloads:
('https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/quarterly-earnings/2023/4th-quarter/jpm-4q23-earnings-call-transcript.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/data/raw/jpmorgan/jpm-4q23-earnings-call-transcript.pdf')
('https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/quarterly-earnings/2023/3rd-quarter/jpm-3q23-earnings-call-transcript.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/data/raw/jpmorgan/jpm-3q23-earnings-call-transcript.pdf')
('https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/quarterly-earnings/2024/1st-quarter/jpm-1q24-earnings-call-transcript.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/data/raw/jpmorgan/jpm-1q24-earnings-call-transcript.pdf')
('https://www.jpmorganchase.com/content/dam/jpmc/
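Each download summary above is a `(url, status, local_path)` tuple, with status `'exists'` when the PDF is already on Drive, which makes the import step safe to re-run. A minimal sketch of that skip-if-present pattern using only the standard library; `fetch_pdf` and the `'downloaded'` status are assumptions for illustration, and `sk_import_PDF.ipynb` may implement this differently (for example with `requests`).

```python
import os
import tempfile
import urllib.request
from urllib.parse import unquote, urlparse


def fetch_pdf(url: str, dest_dir: str) -> tuple:
    """Download url into dest_dir unless a file of the same name is already there.

    Returns (url, status, local_path), mirroring the download summary above.
    """
    filename = os.path.basename(unquote(urlparse(url).path))
    local_path = os.path.join(dest_dir, filename)
    if os.path.exists(local_path):
        return (url, "exists", local_path)  # skip: already downloaded on a previous run
    os.makedirs(dest_dir, exist_ok=True)
    urllib.request.urlretrieve(url, local_path)
    return (url, "downloaded", local_path)


# Offline demo: a pre-existing file is reported as 'exists' without any network call
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "jpm-4q23-earnings-call-transcript.pdf"), "wb").close()
result = fetch_pdf(
    "https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/"
    "documents/quarterly-earnings/2023/4th-quarter/jpm-4q23-earnings-call-transcript.pdf",
    demo_dir,
)
```

Keying on the file name alone means a corrected PDF republished under the same name is not re-downloaded; deleting the local copy forces a refresh.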

## Data cleaning, noise removal, and formatting standardisation

### Initial data cleaning
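Text extracted from PDF transcripts typically carries layout noise: page numbers on their own lines, words hyphenated across line breaks, and irregular whitespace. A minimal sketch of such clean-up rules with regular expressions; the patterns are assumptions about the PDFs, not the rules used in `sk_processed_ubs.ipynb` or `sk_processed_jpmorgan.ipynb`.

```python
import re


def clean_pdf_text(text: str) -> str:
    """Remove common PDF-extraction noise from transcript text (illustrative rules only)."""
    # Drop lines containing only a page number, e.g. "12" or "Page 3 of 20"
    text = re.sub(r"(?im)^[ \t]*(?:page[ \t]+)?\d+(?:[ \t]+of[ \t]+\d+)?[ \t]*$", "", text)
    # Re-join words split by a hyphen at a line break: "capi-\ntal" -> "capital"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse remaining runs of whitespace (including newlines) into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()


raw = "Our CET1 capi-\ntal ratio was 14.5%.\n\n12\n\nPage 3 of 20\nThank you."
cleaned = clean_pdf_text(raw)
```

The hyphen rule also merges genuine compounds that happen to break at a line end ("long-\nterm" becomes "longterm"), so it is worth spot-checking against the source PDFs.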

In [None]:
%run sk_processed_ubs.ipynb

Mounted at /content/drive
Processing file: 1q23-earnings-call-remarks.pdf
Processing file: 1q24-earnings-call-remarks.pdf
Processing file: 2q23-earnings-call-remarks.pdf
Processing file: 2q24-earnings-call-remarks.pdf
Processing file: 3q23-earnings-call-remarks.pdf
Processing file: 3q24-earnings-call-remarks.pdf
Processing file: 4q23-earnings-call-remarks.pdf
Processing file: 4q24-earnings-call-remarks.pdf
Management announcements saved to: /content/drive/MyDrive/BOE/bank_of_england/data/cleansed/ubs_management_discussion.csv
Q&A section saved to: /content/drive/MyDrive/BOE/bank_of_england/data/cleansed/ubs_qna_section.csv
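The UBS cleansing step writes the prepared management remarks and the Q&A section to two separate CSVs. A minimal sketch of that split, assuming the Q&A starts at a heading line such as "Question and answer session"; the marker pattern and the `split_transcript` name are hypothetical, and the real transcripts may use different wording.

```python
import re

# Hypothetical marker; adjust to the heading actually used in each bank's transcripts
QNA_MARKER = re.compile(
    r"^[ \t]*question[- ]and[- ]answer(?:[ \t]+session)?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_transcript(text: str) -> tuple:
    """Return (management_remarks, qna_section); qna_section is '' if no marker is found."""
    match = QNA_MARKER.search(text)
    if match is None:
        return text.strip(), ""
    return text[: match.start()].strip(), text[match.end():].strip()


sample = "Opening remarks on capital.\nQuestion and answer session\nAnalyst: What about costs?"
management, qna = split_transcript(sample)
```

Returning an empty Q&A part instead of raising makes transcripts without a recognised marker easy to list and inspect manually.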


In [None]:
%run sk_processed_jpmorgan.ipynb

Mounted at /content/drive
Processing file: /content/drive/My Drive/BOE/bank_of_england/data/raw/jpmorgan/1q23-earnings-transcript.pdf
Processed file: 1q23-earnings-transcript.pdf
Processing file: /content/drive/My Drive/BOE/bank_of_england/data/raw/jpmorgan/2q23-earnings-transcript.pdf
Processed file: 2q23-earnings-transcript.pdf
Processing file: /content/drive/My Drive/BOE/bank_of_england/data/raw/jpmorgan/4q24-earnings-transcript.pdf
Processed file: 4q24-earnings-transcript.pdf
Processing file: /content/drive/My Drive/BOE/bank_of_england/data/raw/jpmorgan/jpm-1q24-earnings-call-transcript.pdf
Processed file: jpm-1q24-earnings-call-transcript.pdf
Processing file: /content/drive/My Drive/BOE/bank_of_england/data/raw/jpmorgan/jpm-2q24-earnings-call-transcript-final.pdf
Processed file: jpm-2q24-earnings-call-transcript-final.pdf
Processing file: /content/drive/My Drive/BOE/bank_of_england/data/raw/jpmorgan/jpm-3q23-earnings-call-transcript.pdf
Processed file: jpm-3q23-earnings-call-trans
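The JPMorgan files mix naming schemes (`1q23-earnings-transcript.pdf`, `jpm-1q24-earnings-call-transcript.pdf`, `jpm-2q24-earnings-call-transcript-final.pdf`), while the preprocessed Q&A data carries a single `Quarter-Year` column. A minimal sketch that normalises the quarter tag in any of these names; the `"Q1 2024"` label format is an assumption, since the column's actual values are not shown here.

```python
import re

# Matches a tag like "1q23" that is not glued to other letters or digits
QUARTER_RE = re.compile(r"(?<![0-9a-z])([1-4])q(\d{2})(?!\d)", re.IGNORECASE)


def quarter_year(filename: str) -> str:
    """Map e.g. 'jpm-1q24-earnings-call-transcript.pdf' to 'Q1 2024' (assumed label format)."""
    match = QUARTER_RE.search(filename)
    if match is None:
        raise ValueError(f"No quarter tag found in {filename!r}")
    quarter, year = match.groups()
    return f"Q{quarter} {2000 + int(year)}"


labels = [
    quarter_year(name)
    for name in [
        "1q23-earnings-transcript.pdf",
        "jpm-2q24-earnings-call-transcript-final.pdf",
        "4q23-earnings-call-remarks.pdf",
    ]
]
```

Strings like "Q1 2024" do not sort chronologically; sort on `(year, quarter)` tuples when ordering quarters.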

### Preprocessing
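The preprocessed Q&A files inspected in the EDA step pair each raw `Question`/`Answer` with a `Question_cleaned`/`Answer_cleaned` column. A minimal sketch of one common normalisation for bag-of-words steps: lower-casing, keeping alphanumeric tokens (with decimals and percentages), and dropping stop words. The tiny stop-word list is a placeholder assumption, and the team's actual cleaning rules are not shown here.

```python
import re

# Placeholder stop-word list; a real pipeline would use a fuller list (e.g. from NLTK or spaCy)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "was", "we", "our"}


def clean_for_modelling(text: str) -> str:
    """Lower-case, keep alphanumeric tokens (including values like '14.5%'), drop stop words."""
    tokens = re.findall(r"[a-z0-9]+(?:\.[0-9]+)?%?", text.lower())
    return " ".join(token for token in tokens if token not in STOP_WORDS)


cleaned = clean_for_modelling("We expect the CET1 ratio to be 14.5% in Q4.")
```

Transformer models such as FinBERT are generally given the raw text instead, since negations and function words carry sentiment; this kind of cleaning suits topic representations better.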

## Initial Exploratory Data Analysis (EDA) to understand data characteristics

In [3]:
%run kk_eda.ipynb

Mounted at /content/drive
['jpmorgan_qna_df_preprocessed_final.csv', 'jpmorgan_management_discussion.csv', 'jpmorgan_qna preprocessed.csv', 'archived', 'ubs_qa_df_preprocessed_ver2.csv']

📌 **File: jpmorgan_qna_df_preprocessed_final.csv**
['Index', 'Quarter-Year', 'Question', 'Question_cleaned', 'Asked By', 'Role of the person asked the question', 'Answer', 'Answer_cleaned', 'Answered By', 'Role of the person answered the question']

⚠️ Could not read jpmorgan_management_df_preprocessed_final.csv: [Errno 2] No such file or directory: '/content/drive/MyDrive/BOE/bank_of_england/data/preprocessed_data/jpmorgan_management_df_preprocessed_final.csv'


FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/BOE/bank_of_england/data/preprocessed_data/jpmorgan_management_df_preprocessed_final.csv'
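The EDA run above stops with a `FileNotFoundError` because `jpmorgan_management_df_preprocessed_final.csv` is not in `preprocessed_data/`; the directory listing printed at the start of the run does not include it. A minimal sketch of a check that reports missing inputs up front and inspects only the files that exist; `inspect_inputs` is a hypothetical helper, and it reads only each CSV's header row with the standard `csv` module.

```python
import csv
import os
import tempfile


def inspect_inputs(directory: str, filenames: list) -> tuple:
    """Return ({filename: column_names} for files that exist, [missing filenames])."""
    columns, missing = {}, []
    for name in filenames:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            missing.append(name)  # report instead of raising, so other files are still checked
            continue
        with open(path, newline="", encoding="utf-8") as handle:
            columns[name] = next(csv.reader(handle), [])  # header row only
    return columns, missing


# Offline demo with one present and one absent file
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "jpmorgan_qna_df_preprocessed_final.csv"), "w", newline="", encoding="utf-8") as f:
    f.write("Index,Quarter-Year,Question\n0,Q1 2023,What about costs?\n")
found, missing = inspect_inputs(
    demo_dir,
    ["jpmorgan_qna_df_preprocessed_final.csv", "jpmorgan_management_df_preprocessed_final.csv"],
)
```

Running this before the EDA cells turns a mid-run crash into a single list of files that still need to be produced by the preprocessing step.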

# 2. Methodology & Modelling

## Topic Modelling with BERTopic to extract latent themes and topics
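BERTopic embeds documents, clusters the embeddings, and then labels each cluster with class-based TF-IDF (c-TF-IDF): all documents in a topic are treated as one document, and a term's weight is its normalised frequency in that topic times log(1 + A / f), where A is the average number of words per topic and f is the term's frequency across all topics. In the library itself, `BERTopic().fit_transform(docs)` runs the whole procedure. A simplified pure-Python sketch of the weighting step only, assuming topic assignments are already known; the library's own implementation (built on scikit-learn's `CountVectorizer`) differs in detail.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic: dict, top_n: int = 3) -> dict:
    """Return the top_n (term, weight) pairs per topic using c-TF-IDF weighting."""
    # Concatenate each topic's documents into one "class document" and count its terms
    counts = {t: Counter(" ".join(docs).lower().split()) for t, docs in docs_by_topic.items()}
    # f: frequency of each term across all topics
    total_freq = Counter()
    for topic_counts in counts.values():
        total_freq.update(topic_counts)
    # A: average number of words per topic
    avg_words = sum(total_freq.values()) / len(counts)
    result = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        weights = {
            term: (tf / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, tf in topic_counts.items()
        }
        # Highest weight first; ties broken alphabetically for a stable order
        result[topic] = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return result


topics = c_tf_idf({
    0: ["capital ratio capital buffer", "capital requirements"],
    1: ["loan losses credit", "credit card losses"],
})
```

Terms frequent in one topic but rare elsewhere rise to the top, which is why "capital" labels topic 0 here.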

## Sentiment Analysis with FinBERT to gauge market sentiment within transcript segments
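FinBERT classifies each transcript segment as positive, negative or neutral; for instance, `transformers.pipeline("text-classification", model="ProsusAI/finbert")` returns one `label`/`score` pair per input. A minimal sketch that turns per-segment labels into a net sentiment per quarter, defined here (an assumption, not necessarily the team's metric) as the share of positive segments minus the share of negative ones. The rows are illustrative placeholders, not model outputs, and confidence scores are ignored.

```python
from collections import defaultdict


def net_sentiment_by_quarter(rows: list) -> dict:
    """rows: (quarter, label) pairs with label in {'positive', 'negative', 'neutral'}.

    Returns {quarter: (n_positive - n_negative) / n_segments}, a value between -1 and 1.
    """
    tallies = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0})
    for quarter, label in rows:
        tallies[quarter][label.lower()] += 1
    return {
        quarter: (t["positive"] - t["negative"]) / sum(t.values())
        for quarter, t in tallies.items()
    }


# Illustrative rows only; in the pipeline these would come from FinBERT predictions
rows = [
    ("Q1 2024", "positive"), ("Q1 2024", "neutral"), ("Q1 2024", "negative"), ("Q1 2024", "positive"),
    ("Q2 2024", "negative"), ("Q2 2024", "neutral"),
]
scores = net_sentiment_by_quarter(rows)
```

Counting neutral segments in the denominator keeps quarters with mostly factual remarks close to zero rather than exaggerating a few polar segments.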

## Summarisation Pipeline to generate concise summaries from lengthy texts
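Transformer summarisation models accept a bounded input length (BART-based models, for example, are limited to 1,024 tokens), so lengthy transcripts are usually split into overlapping chunks, summarised chunk by chunk, and the partial summaries combined. A minimal sketch of the splitting step, using whitespace-separated words as a rough stand-in for model tokens (an assumption; a tokenizer-based count would be more precise). The default sizes are illustrative.

```python
def chunk_words(text: str, max_words: int = 400, overlap: int = 50) -> list:
    """Split text into windows of at most max_words words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    chunks, start, step = [], 0, max_words - overlap
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


demo = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(demo, max_words=4, overlap=1)
```

The overlap keeps a sentence that straddles a boundary visible to both neighbouring chunks, at the cost of some repeated content in the partial summaries.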

# 3. Integration & Pipeline Development

## Sequential execution of the above processes to ensure a cohesive workflow
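The `%run` cells above execute the stages one after another. A minimal sketch of a runner that executes named stages in sequence and stops at the first failure, recording which stage failed (as the EDA stage did with a missing file) instead of letting later stages run on incomplete inputs. The stage functions are hypothetical stand-ins for the helper notebooks.

```python
def run_stages(stages: list) -> list:
    """stages: list of (name, callable). Runs them in order and returns [(name, status)],
    stopping after the first stage that raises."""
    log = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:  # record the failure and stop the pipeline here
            log.append((name, f"failed: {type(exc).__name__}: {exc}"))
            break
        log.append((name, "ok"))
    return log


def missing_input():
    # Hypothetical stage that fails the way the EDA notebook did above
    raise FileNotFoundError("jpmorgan_management_df_preprocessed_final.csv")


log = run_stages([
    ("import_pdf", lambda: None),
    ("eda", missing_input),
    ("topic_modelling", lambda: None),
])
```

Here `topic_modelling` never runs, and the log states exactly which stage and which file to fix.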

## Handling of inter-process dependencies and data hand-offs
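Instead of hard-coding the cell order, each hand-off can be declared as a dependency (a stage lists the stages whose output files it reads), and an execution order can be derived from those declarations. A minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the first four stage names mirror the helper notebooks, while the modelling and reporting stages and their dependencies are assumptions.

```python
from graphlib import TopologicalSorter

# Each key depends on (reads outputs from) the stages in its set
dependencies = {
    "sk_processed_ubs": {"sk_import_PDF"},
    "sk_processed_jpmorgan": {"sk_import_PDF"},
    "kk_eda": {"sk_processed_ubs", "sk_processed_jpmorgan"},
    "topic_modelling": {"sk_processed_ubs", "sk_processed_jpmorgan"},
    "sentiment": {"sk_processed_ubs", "sk_processed_jpmorgan"},
    "summarisation": {"sk_processed_ubs", "sk_processed_jpmorgan"},
    "reporting": {"topic_modelling", "sentiment", "summarisation"},
}

# static_order() yields every stage after all of its dependencies (CycleError on cycles)
order = list(TopologicalSorter(dependencies).static_order())
```

A circular hand-off raises `graphlib.CycleError` instead of silently running stages on stale files.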

## Iterative refinement based on issues encountered and model performance evaluations
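One way to ground each refinement in numbers is to score model output against a small hand-labelled sample. A minimal sketch computing accuracy and macro-averaged F1 for three-class sentiment labels; the labels are illustrative placeholders, and whether the team evaluated its models this way is an assumption.

```python
def accuracy_and_macro_f1(gold: list, pred: list) -> tuple:
    """Return (accuracy, macro-F1) over all labels that appear in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    f1_scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return accuracy, sum(f1_scores) / len(f1_scores)


gold = ["positive", "negative", "neutral", "neutral"]
pred = ["positive", "neutral", "neutral", "negative"]
acc, macro_f1 = accuracy_and_macro_f1(gold, pred)
```

Macro-F1 weights each class equally, so a model that rarely predicts "negative" is penalised even when neutral segments dominate the transcripts.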

# 4. Results and Reporting

## Aggregation of model outputs, key findings, and visualisations
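Aggregating model outputs largely means joining each segment's topic with its sentiment. A minimal sketch that joins the two by a shared segment identifier and reports, per topic, the number of segments and the share classified negative; the segment-id keys and the dictionary layout are assumptions for illustration, and the values are placeholders.

```python
from collections import defaultdict


def negative_share_by_topic(topics: dict, sentiments: dict) -> dict:
    """topics: {segment_id: topic_label}; sentiments: {segment_id: sentiment_label}.

    Returns {topic_label: (n_segments, share_negative)} over segments present in both.
    """
    counts = defaultdict(lambda: [0, 0])  # topic -> [segments, negative segments]
    for segment_id, topic in topics.items():
        if segment_id not in sentiments:
            continue  # skip segments without a sentiment prediction
        counts[topic][0] += 1
        counts[topic][1] += int(sentiments[segment_id] == "negative")
    return {topic: (n, negative / n) for topic, (n, negative) in counts.items()}


summary = negative_share_by_topic(
    {1: "capital", 2: "capital", 3: "credit risk", 4: "credit risk"},
    {1: "neutral", 2: "negative", 3: "negative", 4: "negative", 5: "positive"},
)
```

Reporting the segment count next to each share shows when a striking percentage rests on only a handful of segments.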

## Generation of actionable insights and business recommendations for risk assessment
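Turning aggregates into recommendations for risk assessment is easier to review when the rules are explicit. A minimal sketch that flags topics whose negative-segment share rose by more than a threshold between two quarters; the 0.15 threshold and the input values are illustrative assumptions, not findings of this analysis.

```python
def flag_rising_negativity(previous: dict, current: dict, threshold: float = 0.15) -> list:
    """previous/current: {topic: share_negative} for consecutive quarters.

    Returns [(topic, change)] for topics whose share rose by more than threshold, largest rise first.
    """
    flags = [
        (topic, current[topic] - previous[topic])
        for topic in current
        if topic in previous and current[topic] - previous[topic] > threshold
    ]
    return sorted(flags, key=lambda item: item[1], reverse=True)


flags = flag_rising_negativity(
    {"capital": 0.10, "credit risk": 0.20, "liquidity": 0.30},
    {"capital": 0.12, "credit risk": 0.50, "liquidity": 0.55},
)
```

Flags like these are prompts for a supervisor to read the underlying transcript segments, not conclusions in themselves.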