<a href="https://colab.research.google.com/github/sheldonkemper/bank_of_england/blob/main/notebooks/modelling/rb_jomorgan_summarisation_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **FLAN-T5-Large** is tested for text summarisation using JPMorgan Financial transcripts.

The model is applied to two parts of the transcripts:

**1)** individual financial analyst questions

**2)** responses to these questions.

**Different prompts** are applied to these parts to extract tailored outputs that serve different purposes; the third prompt is a tuned alternative for summarising responses:

**1)** *prompt = f"Extract and summarize the key questions asked by analysts in the following text: {text}"*

**2)** *prompt = f"Summarize by extracting different statements the following text: {text}"*

**3)** *prompt = f"Rewrite the following text into a concise and original summary while maintaining its key ideas: {text}"*

**ROUGE scores**, which measure overlap with a reference text through precision, recall, and F-measure, are used to assess the model's performance.
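
To make the three ROUGE quantities concrete, here is a minimal sketch of ROUGE-1 computed by hand. It assumes plain whitespace tokenisation and no stemming, so its numbers will not exactly match the `rouge_score` package used later; the function name `rouge1` and the two example sentences are illustrative only.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap precision, recall and F-measure (no stemming)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    fmeasure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# Illustrative texts: the candidate is a shortened copy of the reference.
reference = "capital requirements could go down if the regulatory framework changes"
candidate = "capital requirements could go down"
scores = rouge1(reference, candidate)
print(scores)
```

Every candidate word appears in the reference, so precision is 1.0, while only half of the reference words are covered, so recall is 0.5. A short extractive summary scored against its own source tends to show this pattern.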


In [None]:
!pip install bertopic umap-learn hdbscan sentence-transformers
!pip install transformers torch
!pip install rouge_score
!pip install evaluate
!pip install --upgrade protobuf
!pip install tensorboard


In [None]:
!pip install tensorflow
import tensorflow as tf
import numpy as np
import random



In [None]:
import time
import torch
from google.colab import drive
import os
import sys
import pandas as pd
import re
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    DataCollatorForSeq2Seq,
)
import evaluate
from rouge_score import rouge_scorer
from typing import List, Union, Optional
import logging

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

In [None]:
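def reset_session():
    """Clear Keras state and reseed every RNG so repeated runs are comparable."""
    tf.keras.backend.clear_session()
    np.random.seed(42)
    random.seed(42)
    tf.random.set_seed(42)
    torch.manual_seed(42)  # the summarisation model runs on PyTorch, so seed it too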

In [None]:

# Load data (questions and answers for JPM and UBS)

drive.mount('/content/drive', force_remount=True)

In [None]:

path1 = "/content/drive/MyDrive/BOE/bank_of_england/data/preprocessed_data/jp_morgan_qna.csv"

path2 = "/content/drive/MyDrive/BOE/bank_of_england/data/preprocessed_data/ubs_qa_df_preprocessed_ver2.csv"

JP_qna = pd.read_csv(path1)
UBS_qna = pd.read_csv(path2)

In [None]:
JP_qna = JP_qna[JP_qna["Quarter"] != "1Q23"]
UBS_qna = UBS_qna[UBS_qna["Quarter"] != "1Q23"]

## **Q&A summarisation**

### **Analysing Jim Mitchell data for Q4 2024**

In [None]:
reset_session()

In [None]:
# Filter the JPMorgan Q&A data for one analyst and one quarter
filtered_df = JP_qna[(JP_qna["Analyst"] == "Jim Mitchell") & (JP_qna["Quarter"] == "2024-Q4")]

# Display results
print(filtered_df)

   Quarter       Analyst Analyst_Role  \
2  2024-Q4  Jim Mitchell      Analyst   

                                            Question      Executive  \
2  Hey. Good morning. Maybe just on regulation, w...  Jeremy Barnum   

  Executive_Role                                           Response Type  \
2            CFO  Hey, Jim. I mean, it's obviously something we'...  Q&A   

                                      Response_clean  \
2  hey, jim. i mean, it's obviously something we'...   

                                      Question_clean  \
2  hey. good morning. maybe just on regulation, w...   

                                       Answer_tokens  \
2  [hey, jim, i, mean, its, obviously, something,...   

                                     Question_tokens  
2  [Hey, Good, morning, Maybe, just, on, regulati...  


**Summarising questions**

In [None]:
analyst_q = filtered_df["Question"].tolist()  # generate the list of questions for modelling

The code below sets the modelling parameters (prompt and output size) for question summarisation.
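
The `chunk_text` method in the class below splits long inputs into overlapping word windows. As a rough standalone sketch of the same idea, using a hypothetical helper `chunk_words` and a toy ten-word text, the overlap works like this:

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 400, overlap: int = 50) -> List[str]:
    """Split text into word windows; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("Invalid chunk_size or overlap parameters")
    words = text.split()
    step = chunk_size - overlap  # how far the window start moves each time
    return [" ".join(words[start:start + chunk_size])
            for start in range(0, len(words), step)]


# Toy input (w0 ... w9) with small sizes so the overlap is easy to see.
toy_text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(toy_text, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts on the last word of the previous one, which preserves some context across chunk boundaries. Note that the sizes are counted in words, while the tokenizer limit of 512 is counted in tokens, so a 400-word chunk plus the prompt may still be truncated.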

In [None]:

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TextSummarizer:
    def __init__(self, model_name: str = "google/flan-t5-large", device: Optional[str] = None):
        """Initialize the summarizer with model and device."""
        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
        logger.info(f"Using device: {self.device}")

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
            logger.info(f"Successfully loaded {model_name}")
        except Exception as e:
            logger.error(f"Error loading model: {str(e)}")
            raise

    def chunk_text(self,
                  text: Union[str, List[str]],
                  chunk_size: int = 400,
                  overlap: int = 50) -> List[str]:
        """Split text into overlapping chunks."""
        if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
            raise ValueError("Invalid chunk_size or overlap parameters")

        try:
            if isinstance(text, list):
                text = " ".join(text)

            if not text.strip():
                return []

            words = text.split()
            chunks = []
            start = 0

            while start < len(words):
                end = min(start + chunk_size, len(words))
                chunk = " ".join(words[start:end])
                chunks.append(chunk)
                start += chunk_size - overlap

            logger.debug(f"Split text into {len(chunks)} chunks")
            return chunks

        except Exception as e:
            logger.error(f"Error in chunk_text: {str(e)}")
            raise

    def summarize_text_q(self,
                      text: str,
                      min_new_tokens: int = 50,
                      max_new_tokens: int = 250) -> str:
        """Summarize a single piece of text focusing on analyst questions."""
        if pd.isna(text) or not text.strip():
            logger.warning("Empty or NaN text provided")
            return ""

        try:
            # Specialized prompt for analyst questions
            prompt = f"Extract and summarize the key questions asked by analysts in the following text: {text}"

            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=512
            ).to(self.device)

            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_ids,
                    min_new_tokens=min_new_tokens,
                    max_new_tokens=max_new_tokens,
                    num_beams=4,
                    length_penalty=2.0,
                    no_repeat_ngram_size=3,
                    early_stopping=True,
                    do_sample=False
                )

            summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            return summary

        except Exception as e:
            logger.error(f"Error in summarize_text_q: {str(e)}")
            raise

    def summarize_long_text_q(self,
                          text: Union[str, List[str]],
                          chunk_size: int = 400,
                          overlap: int = 50) -> str:
        """Handle long text summarization focusing on analyst questions."""
        try:
            chunks = self.chunk_text(text, chunk_size, overlap)
            if not chunks:
                logger.warning("No valid chunks to summarize")
                return ""

            chunk_summaries = []
            for i, chunk in enumerate(chunks):
                logger.debug(f"Summarizing chunk {i+1}/{len(chunks)}")
                summary = self.summarize_text_q(chunk)
                if summary.strip():
                    chunk_summaries.append(summary)

            if not chunk_summaries:
                logger.warning("No valid summaries generated")
                return ""

            if len(chunk_summaries) == 1:
                return chunk_summaries[0]

            logger.debug("Generating final summary")
            final_summary = self.summarize_text_q(
                " ".join(chunk_summaries),
                min_new_tokens=150,
                max_new_tokens=300
            )

            return final_summary

        except Exception as e:
            logger.error(f"Error in summarize_long_text_q: {str(e)}")
            raise

# Running the model
try:

    summarizer_q = TextSummarizer()


    logger.info("Starting summarization of analyst questions")
    question_summary = summarizer_q.summarize_long_text_q(analyst_q)


    print("\nSummary of Analyst Questions:")
    print(question_summary)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Analyst Questions:
Q&A: What areas of the regulatory structure would be most impactful if it were to change? Is there any area where capital requirements could actually go down? Are there any areas where requirements just simply stop going up? Are you starting to see any improvement in demand on lending?


**Adding ROUGE scores for evaluation**

In [None]:
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

In [None]:
analyst_q_str = " ".join(analyst_q)

# Calculate ROUGE scores
scores_q = scorer.score(analyst_q_str, question_summary)
for key in scores_q:
    print(f'{key}: {scores_q[key]}')

rouge1: Score(precision=0.92, recall=0.34074074074074073, fmeasure=0.49729729729729727)
rouge2: Score(precision=0.7346938775510204, recall=0.26865671641791045, fmeasure=0.39344262295081966)
rougeL: Score(precision=0.8, recall=0.2962962962962963, fmeasure=0.43243243243243246)


The ROUGE results show high precision (0.73 to 0.92) but low recall (0.27 to 0.34), giving moderate F-measures of 0.39 to 0.50. Because the reference is the full question text, high precision means the summary reuses the analyst's own wording, while low recall is expected for a summary much shorter than its source. These scores therefore measure overlap with the source rather than summary quality against a human-written reference.
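
As a quick check on how the F-measure relates to the other two numbers, the sketch below recomputes it from the ROUGE-1 precision and recall printed above. The helper name `f_measure` is illustrative; the formula is the standard harmonic mean.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# ROUGE-1 values taken from the question-summary output above.
precision = 0.92
recall = 0.34074074074074073
f1 = f_measure(precision, recall)
print(round(f1, 4))
```

The harmonic mean sits much closer to the lower of the two values, which is why a precision above 0.9 still yields an F-measure just under 0.5.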

**Summarising answers**

In [None]:
analyst_a = filtered_df["Response"].tolist()  # generate the list of responses for modelling

In [None]:
reset_session()

The code below updates the modelling parameters (prompt and output size) for answer summarisation.
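
The `summarize_long_text` method in the next cell follows a two-pass pattern: summarise each chunk, then summarise the joined chunk summaries. The sketch below isolates that control flow. It replaces the model with a hypothetical stand-in, `first_sentence`, so it runs without loading FLAN-T5; the chunk texts are invented for illustration.

```python
from typing import Callable, List


def summarize_long(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarise each chunk, then summarise the concatenated partial summaries."""
    partial = [s for s in (summarize(c) for c in chunks) if s.strip()]
    if not partial:
        return ""
    if len(partial) == 1:
        return partial[0]  # a single chunk needs no second pass
    return summarize(" ".join(partial))


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for the model: keep only the first sentence."""
    return text.split(".")[0].strip() + "." if text.strip() else ""


# Invented two-chunk input.
chunks = [
    "Capital is high. Deployment is slow.",
    "Expenses are flat. Hiring is paused.",
]
result = summarize_long(chunks, first_sentence)
print(result)
```

With this crude stand-in, the second pass keeps only material from the first chunk. A real model is less extreme, but the same structure can still drop content from later chunks, which is one possible contributor to low recall on long answers.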

In [None]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TextSummarizer:
    def __init__(self, model_name: str = "google/flan-t5-large", device: Optional[str] = None):
        """Initialize the summarizer with model and device."""
        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
        logger.info(f"Using device: {self.device}")

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
            logger.info(f"Successfully loaded {model_name}")
        except Exception as e:
            logger.error(f"Error loading model: {str(e)}")
            raise

    def chunk_text(self,
                  text: Union[str, List[str]],
                  chunk_size: int = 400,
                  overlap: int = 50) -> List[str]:
        """Split text into overlapping chunks."""
        if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
            raise ValueError("Invalid chunk_size or overlap parameters")

        try:
            if isinstance(text, list):
                text = " ".join(text)

            if not text.strip():
                return []

            words = text.split()
            chunks = []
            start = 0

            while start < len(words):
                end = min(start + chunk_size, len(words))
                chunk = " ".join(words[start:end])
                chunks.append(chunk)
                start += chunk_size - overlap

            logger.debug(f"Split text into {len(chunks)} chunks")
            return chunks

        except Exception as e:
            logger.error(f"Error in chunk_text: {str(e)}")
            raise

    def summarize_text(self,
                      text: str,
                      min_new_tokens: int = 100,
                      max_new_tokens: int = 500) -> str:
        """Summarize a single piece of text focusing on executive answers."""
        if pd.isna(text) or not text.strip():
            logger.warning("Empty or NaN text provided")
            return ""

        try:
            # Specialized prompt for answer to analyst
            prompt = f"Summarize by extracting different statements the following text: {text}"

            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=512
            ).to(self.device)

            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_ids,
                    min_new_tokens=min_new_tokens,
                    max_new_tokens=max_new_tokens,
                    num_beams=4,
                    length_penalty=2.0,
                    no_repeat_ngram_size=3,
                    early_stopping=True,
                    do_sample=False
                )

            summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            return summary

        except Exception as e:
            logger.error(f"Error in summarize_text: {str(e)}")
            raise

    def summarize_long_text(self,
                          text: Union[str, List[str]],
                          chunk_size: int = 400,
                          overlap: int = 50) -> str:
        """Handle long text summarization focusing on executive answers."""
        try:
            chunks = self.chunk_text(text, chunk_size, overlap)
            if not chunks:
                logger.warning("No valid chunks to summarize")
                return ""

            chunk_summaries = []
            for i, chunk in enumerate(chunks):
                logger.debug(f"Summarizing chunk {i+1}/{len(chunks)}")
                summary = self.summarize_text(chunk)
                if summary.strip():
                    chunk_summaries.append(summary)

            if not chunk_summaries:
                logger.warning("No valid summaries generated")
                return ""

            if len(chunk_summaries) == 1:
                return chunk_summaries[0]

            logger.debug("Generating final summary")
            final_summary = self.summarize_text(
                " ".join(chunk_summaries),
                min_new_tokens=150,
                max_new_tokens=500
            )

            return final_summary

        except Exception as e:
            logger.error(f"Error in summarize_long_text: {str(e)}")
            raise

# Running the model
try:

    summarizer = TextSummarizer()


    logger.info("Starting summarization of answer")
    answer_summary = summarizer.summarize_long_text(analyst_a)


    print("\nSummary of Answer:")
    print(answer_summary)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Answer:
Jamie Dimon: What do you think about the regulatory framework for banks? Jim: We've been saying for a long time that we want a coherent, rational, holistically-assessed regulatory framework that allows banks to do their job, supporting the economy, that isn't reflexively anti-bank, that doesn't default to the answer every question being more of everything, more capital, more liquidity, that uses data and that balances the obvious goal that we all share of a safe and sound banking system with actually recognizing that banks play a critical role in supporting growth, and the hope is that we get some of that There's a bit of caution in some areas, but we'll see what the new year brings as the current optimism starts getting tested with reality, one way or the other, and you'll actually see that come through c&i loan growth in particular.


In [None]:
analyst_a_str = " ".join(analyst_a)

# Calculate ROUGE scores
scores_a = scorer.score(analyst_a_str, answer_summary)
for key in scores_a:
    print(f'{key}: {scores_a[key]}')

rouge1: Score(precision=0.9671052631578947, recall=0.31343283582089554, fmeasure=0.47342995169082125)
rouge2: Score(precision=0.8278145695364238, recall=0.2670940170940171, fmeasure=0.4038772213247173)
rougeL: Score(precision=0.9210526315789473, recall=0.29850746268656714, fmeasure=0.45088566827697263)


The model demonstrates strong precision across all ROUGE metrics, with a particularly high ROUGE-1 precision of 96.7%. However, recall values are relatively lower, indicating that while the model excels at accurately identifying relevant content, it may miss some aspects of the full text. Overall, the F1 scores suggest a moderate balance between precision and recall, with ROUGE-2 showing a solid performance in capturing key bigrams and ROUGE-L reflecting good overall summary structure.
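
ROUGE-L, mentioned above, is based on the longest common subsequence (LCS) of reference and candidate tokens, so unlike ROUGE-1 it rewards words that appear in the same order. The following is a simplified sketch assuming whitespace tokenisation and no stemming; the function names and example sentences are illustrative, and the `rouge_score` package may differ in detail.

```python
from typing import Dict, List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, candidate: str) -> Dict[str, float]:
    """LCS-based precision, recall and F-measure."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand) if cand else 0.0
    recall = lcs / len(ref) if ref else 0.0
    fmeasure = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


reference = "we want a coherent and rational regulatory framework for banks"
in_order = rouge_l(reference, "we want a rational framework")
reordered = rouge_l(reference, "framework rational we want a")
print(in_order)
print(reordered)
```

Both candidates use the same five words from the reference, so their ROUGE-1 precision would be identical, but the reordered one only shares the subsequence "we want a" and its ROUGE-L precision drops from 1.0 to 0.6.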

### **Analysing John McDonald data for Q4 2024**

In [None]:
# Filter the JPMorgan Q&A data for one analyst and one quarter
filtered_df2 = JP_qna[(JP_qna["Analyst"] == "John McDonald") & (JP_qna["Quarter"] == "2024-Q4")]

print(filtered_df2)

   Quarter        Analyst                      Analyst_Role  \
0  2024-Q4  John McDonald  Analyst, Truist Securities, Inc.   

                                            Question      Executive  \
0  Hi. Good morning. Jeremy, I wanted to ask abou...  Jeremy Barnum   

  Executive_Role                                           Response Type  \
0            CFO  Yeah. Good question, John, and welcome back, b...  Q&A   

                                      Response_clean  \
0  yeah. good question, john, and welcome back, b...   

                                      Question_clean  \
0  hi. good morning. jeremy, i wanted to ask abou...   

                                       Answer_tokens  \
0  [yeah, good, question, john, and, welcome, bac...   

                                     Question_tokens  
0  [Hi, Good, morning, Jeremy, I, wanted, to, ask...  


**Summarising questions**

In [None]:
analyst_q2 = filtered_df2["Question"].tolist()  # generate the list of questions for modelling

In [None]:
reset_session()

In [None]:
try:

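    # Reuse the summarizer_q instance created earlier: TextSummarizer has since been
    # redefined without summarize_long_text_q, so a fresh instance would lack that method.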


    logger.info("Starting summarization of analyst questions")
    question_summary2 = summarizer_q.summarize_long_text_q(analyst_q2)

    print("\nSummary of Analyst Questions:")
    print(question_summary2)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Analyst Questions:
Key questions asked by analysts: What's the framework for thinking about the opportunity cost of sitting on the growing base of capital and how high you might let that go versus your patience in waiting for more attractive deployment opportunities? When we think about the investment spend agenda this year, how does it differ from last year or last couple of years across lines of business?


In [None]:
analyst_q2_str = " ".join(analyst_q2)

# Calculate ROUGE scores
scores_q2 = scorer.score(analyst_q2_str, question_summary2)
for key in scores_q2:
    print(f'{key}: {scores_q2[key]}')

rouge1: Score(precision=0.5294117647058824, recall=0.24161073825503357, fmeasure=0.33179723502304154)
rouge2: Score(precision=0.16417910447761194, recall=0.07432432432432433, fmeasure=0.10232558139534885)
rougeL: Score(precision=0.35294117647058826, recall=0.1610738255033557, fmeasure=0.2211981566820276)


**Summarising answers**

In [None]:
analyst_text2 = filtered_df2["Response"].tolist()  # generate the list of responses for modelling

In [None]:
reset_session()

In [None]:
# Running the model
try:

    summarizer = TextSummarizer()


    logger.info("Starting summarization of answer")
    answer_summary2 = summarizer.summarize_long_text(analyst_text2)


    print("\nSummary of Answer:")
    print(answer_summary2)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Answer:
"We're going to try to run things, with some important exceptions that I'll highlight in a second, on roughly flat head count and have that lead to people generating internal efficiencies as they get creative with their teams," he said. "The obvious exceptions are the ongoing areas of high certainty investment and growth, so, obviously, branches and bankers, so on. and also, critical non-negotiable areas of risk and control like cyber or whatever independent risk management needs to ensure that we're running the company safely." He added that the company has been able to generate a little bit of efficiency over the last few years, and that's a testament to the bottoms-up culture that they've developed at the company.


In [None]:
analyst_text2_str = " ".join(analyst_text2)

# Calculate ROUGE scores
scores2 = scorer.score(analyst_text2_str, answer_summary2)
for key in scores2:
    print(f'{key}: {scores2[key]}')

rouge1: Score(precision=0.9603174603174603, recall=0.12683438155136267, fmeasure=0.22407407407407406)
rouge2: Score(precision=0.832, recall=0.10912906610703044, fmeasure=0.19294990723562153)
rougeL: Score(precision=0.7142857142857143, recall=0.09433962264150944, fmeasure=0.16666666666666669)


## **Tuning parameters and changing the prompt to summarise answer text**

In [None]:
reset_session()

**Changing the prompt and reducing the chunk size from 400 to 300 words.**
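
A smaller chunk size means more chunks, and so more model calls and more partial summaries feeding the final pass. The sketch below estimates the chunk count for the word-window scheme used in `chunk_text`, where the window start advances by `chunk_size - overlap` words. The helper `count_chunks` and the 1,000-word length are hypothetical, chosen only to illustrate the effect.

```python
import math


def count_chunks(n_words: int, chunk_size: int, overlap: int) -> int:
    """Number of windows produced when the start advances by chunk_size - overlap."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("Invalid chunk_size or overlap parameters")
    step = chunk_size - overlap
    return math.ceil(n_words / step) if n_words > 0 else 0


n_words = 1000  # hypothetical answer length in words
for chunk_size in (400, 300):
    print(chunk_size, count_chunks(n_words, chunk_size, overlap=50))

# Edge case: a 360-word text with 400-word chunks still yields two windows,
# because the start advances by 350; the second window repeats words 350-359.
print(count_chunks(360, 400, 50))
```

For this hypothetical length, moving from 400 to 300 words raises the count from 3 to 4 chunks. The edge case shows that a text only slightly longer than the step can produce a trailing chunk that contains nothing new.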

In [None]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class TextSummarizer:
    def __init__(self, model_name: str = "google/flan-t5-large", device: Optional[str] = None):
        """Initialize the summarizer with model and device."""
        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
        logger.info(f"Using device: {self.device}")

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
            logger.info(f"Successfully loaded {model_name}")
        except Exception as e:
            logger.error(f"Error loading model: {str(e)}")
            raise

    def chunk_text(self,
                  text: Union[str, List[str]],
                  chunk_size: int = 400,
                  overlap: int = 50) -> List[str]:
        """Split text into overlapping chunks."""

        if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
            raise ValueError("Invalid chunk_size or overlap parameters")

        try:

            if isinstance(text, list):
                text = " ".join(text)

            if not text.strip():
                return []

            words = text.split()
            chunks = []
            start = 0

            while start < len(words):
                end = min(start + chunk_size, len(words))
                chunk = " ".join(words[start:end])
                chunks.append(chunk)
                start += chunk_size - overlap

            logger.debug(f"Split text into {len(chunks)} chunks")
            return chunks

        except Exception as e:
            logger.error(f"Error in chunk_text: {str(e)}")
            raise

    def summarize_text(self,
                      text: str,
                      min_new_tokens: int = 100,
                      max_new_tokens: int = 400) -> str:
        """Summarize a single piece of text."""

        if pd.isna(text) or not text.strip():
            logger.warning("Empty or NaN text provided")
            return ""

        try:
            prompt = f"Rewrite the following text into a concise and original summary while maintaining its key ideas: {text}"


            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=512
            ).to(self.device)


            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_ids,
                    min_new_tokens=min_new_tokens,
                    max_new_tokens=max_new_tokens,
                    num_beams=4,
                    length_penalty=2,
                    no_repeat_ngram_size=3,
                    early_stopping=True,
                    do_sample=False
                )

            summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            return summary

        except Exception as e:
            logger.error(f"Error in summarize_text: {str(e)}")
            raise

    def summarize_long_text_2(self,
                          text: Union[str, List[str]],
                          chunk_size: int = 300,
                          overlap: int = 50) -> str:
        """Handle long text summarization."""
        try:
            # Get chunks
            chunks = self.chunk_text(text, chunk_size, overlap)
            if not chunks:
                logger.warning("No valid chunks to summarize")
                return ""

            # Summarize chunks
            chunk_summaries = []
            for i, chunk in enumerate(chunks):
                logger.debug(f"Summarizing chunk {i+1}/{len(chunks)}")
                summary = self.summarize_text(chunk)
                if summary.strip():
                    chunk_summaries.append(summary)

            if not chunk_summaries:
                logger.warning("No valid summaries generated")
                return ""


            if len(chunk_summaries) == 1:
                return chunk_summaries[0]

            # Summarize the combined summaries
            logger.debug("Generating final summary")
            final_summary = self.summarize_text(
                " ".join(chunk_summaries),
                min_new_tokens=150,
                max_new_tokens=300
            )

            return final_summary

        except Exception as e:
            logger.error(f"Error in summarize_long_text_2: {str(e)}")
            raise


In [None]:
# Running the model
try:

    summarizer = TextSummarizer()


    logger.info("Starting summarization of answer")
    answer_summary3 = summarizer.summarize_long_text_2(analyst_text2)


    print("\nSummary of Answer:")
    print(answer_summary3)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")

In [None]:
analyst_text3_str = " ".join(analyst_text2)

# Calculate ROUGE scores
scores2 = scorer.score(analyst_text3_str, answer_summary3)
for key in scores2:
    print(f'{key}: {scores2[key]}')