<a href="https://colab.research.google.com/github/sheldonkemper/bank_of_england/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
"""
===================================================
Author/s: Sheldon Kemper, Rita, Kasia, Chiaki, Oscar, Arijit

LinkedIn Profiles:
    Sheldon: https://www.linkedin.com/in/sheldon-kemper
    Rita: [Insert Rita's LinkedIn URL]
    Kasia: [Insert Kasia's LinkedIn URL]
    Chiaki: [Insert Chiaki's LinkedIn URL]
    Oscar: [Insert Oscar's LinkedIn URL]
    Arijit: [Insert Arijit's LinkedIn URL]

Date: 2025-02-04
Version: 1.1

Description:
    This notebook is the main pipeline runner for the team project. It downloads the individual
    project notebooks from GitHub and executes them in order: first the notebook that imports the
    raw PDF data, then the notebook that processes the UBS-related files. Data is ingested from a
    shared Google Drive folder. The results feed a Retrieval-Augmented Generation (RAG) system and
    its downstream tasks: text extraction, embedding generation, and contextual query handling.

===================================================
"""

In [12]:
!wget -q https://raw.githubusercontent.com/sheldonkemper/bank_of_england/main/notebooks/import/sk_import_PDF.ipynb
!wget -q https://raw.githubusercontent.com/sheldonkemper/bank_of_england/main/notebooks/cleansed/sk_processed_ubs.ipynb
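
The download step above can also be sketched in plain Python. With `wget -q`, a failed download produces no message and the later `%run` fails with a less obvious error, whereas `urllib.request.urlretrieve` raises on an HTTP error. The helper names `raw_notebook_url` and `download_notebook` are hypothetical and not part of the project; the repository owner, name, branch, and notebook paths are taken from the cell above.

```python
from pathlib import Path
from urllib.request import urlretrieve

RAW_BASE = "https://raw.githubusercontent.com"


def raw_notebook_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for a file in a GitHub repository."""
    return f"{RAW_BASE}/{owner}/{repo}/{branch}/{path.lstrip('/')}"


def download_notebook(url: str, dest_dir: str = ".") -> Path:
    """Download one notebook into dest_dir and return its local path.

    urlretrieve raises an exception on HTTP errors (for example a 404),
    so a missing notebook is reported here rather than at the %run step.
    """
    target = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urlretrieve(url, target)
    return target


# Notebook paths as used in the wget cell above.
NOTEBOOK_PATHS = [
    "notebooks/import/sk_import_PDF.ipynb",
    "notebooks/cleansed/sk_processed_ubs.ipynb",
]
NOTEBOOK_URLS = [
    raw_notebook_url("sheldonkemper", "bank_of_england", "main", p)
    for p in NOTEBOOK_PATHS
]
```

Calling `download_notebook(url)` for each entry in `NOTEBOOK_URLS` would then stand in for the two `wget` lines.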


In [13]:
import os
from google.colab import drive
# Mount Google Drive at /content/drive; force_remount=True remounts it if it is already mounted
drive.mount('/content/drive', force_remount=True)

# Assumes the shared 'BOE' folder is available under 'MyDrive'
BOE_path = '/content/drive/MyDrive/BOE/bank_of_england/data'

# List the data directory to confirm that the mount worked and the folder is accessible
print(os.listdir(BOE_path))

Mounted at /content/drive
['cleansed', 'preprocessed_data', 'raw', 'model_outputs', 'processed']
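
The listing above shows five subfolders in the data directory. A minimal sketch of a pre-flight check, assuming the downstream notebooks expect all five to be present (the notebook itself does not state this), could look like the following. `EXPECTED_SUBDIRS` and `missing_subdirs` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Folder names copied from the directory listing above; treating them as
# required is an assumption, not something the notebook states.
EXPECTED_SUBDIRS = ("raw", "cleansed", "processed", "preprocessed_data", "model_outputs")


def missing_subdirs(base, expected=EXPECTED_SUBDIRS):
    """Return the expected subfolder names that do not exist under base."""
    base = Path(base)
    return [name for name in expected if not (base / name).is_dir()]
```

Running `missing_subdirs(BOE_path)` right after the mount and stopping when the result is non-empty would surface a wrong or unshared Drive path before any notebook is executed.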


In [14]:
%run sk_import_PDF.ipynb

Mounted at /content/drive
['cleansed', 'preprocessed_data', 'raw', 'model_outputs', 'processed']
('https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/quarterly-earnings/2023/3rd-quarter/jpm-3q23-earnings-call-transcript.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/data/raw/jpmorgan/jpm-3q23-earnings-call-transcript.pdf')
('https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/quarterly-earnings/2023/4th-quarter/jpm-4q23-earnings-call-transcript.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/data/raw/jpmorgan/jpm-4q23-earnings-call-transcript.pdf')
('https://www.jpmorganchase.com/content/dam/jpmc/jpmorgan-chase-and-co/investor-relations/documents/quarterly-earnings/2024/2nd-quarter/jpm-2q24-earnings-call-transcript-final.pdf', 'exists', '/content/drive/MyDrive/BOE/bank_of_england/data/raw/jpmorgan/jpm-2q24-earnings-call-transcript-final.pdf')
('https://www.jpmorganchase
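
Each tuple in the output above appears to follow the pattern (source URL, status, local path), with the status `'exists'` indicating that the file was already on Drive and was not fetched again. The code of `sk_import_PDF.ipynb` is not shown here, so the following is only a sketch of that skip-if-present pattern. The function name `fetch_if_missing`, the injected `download` callable, and the `'downloaded'` status label are assumptions.

```python
from pathlib import Path
from typing import Callable, Tuple


def fetch_if_missing(
    url: str,
    dest_dir,
    download: Callable[[str, Path], None],
) -> Tuple[str, str, str]:
    """Download url into dest_dir unless a file with the same name already exists.

    Returns (url, status, local_path). The status is 'exists' when the file
    was already present and 'downloaded' (a hypothetical label) otherwise.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if target.exists():
        return (url, "exists", str(target))
    download(url, target)  # the caller supplies the actual transfer function
    return (url, "downloaded", str(target))
```

Passing the transfer function in as an argument keeps the skip logic independent of the HTTP client, which also makes it easy to exercise without network access.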

In [15]:
%run sk_processed_ubs.ipynb

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 74.7 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
Mounted at /content/drive
['cleansed', 'preprocessed_data', 'raw', 'model_outputs', 'processed']
Processing file: 1q23-earnings-call-remarks.pdf
Processing file: 1q24-earnings-call-remarks.pdf
Processing file: 2q23-earnings-call-remarks.pdf
Processing file: 2q24-earnings-call-remarks.pdf
Processing file: 3q23-earnings-call-remarks.pdf
Proc
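
The log above lists the UBS files in lexicographic order, so `1q24` comes before `2q23`. If a downstream step needs them in chronological order (an assumption; the notebook does not say so), a small sort key can be derived from the filename prefix. `QUARTER_RE` and `quarter_key` are hypothetical names, and the two-digit year is assumed to mean 20xx.

```python
import re

# Matches prefixes such as "1q23-": quarter digit, the letter q, two-digit year.
QUARTER_RE = re.compile(r"^(?P<q>[1-4])q(?P<yy>\d{2})-")


def quarter_key(filename: str):
    """Return (year, quarter) parsed from a name like '1q23-earnings-call-remarks.pdf'."""
    match = QUARTER_RE.match(filename)
    if match is None:
        raise ValueError(f"no quarter prefix in {filename!r}")
    # Assumption: two-digit years refer to the 2000s.
    return (2000 + int(match["yy"]), int(match["q"]))


# File names copied from the processing log above.
files = [
    "1q23-earnings-call-remarks.pdf",
    "1q24-earnings-call-remarks.pdf",
    "2q23-earnings-call-remarks.pdf",
    "2q24-earnings-call-remarks.pdf",
    "3q23-earnings-call-remarks.pdf",
]
ordered = sorted(files, key=quarter_key)
```

Here `ordered` places the three 2023 files before the two 2024 files.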