<a href="https://colab.research.google.com/github/sheldonkemper/bank_of_england/blob/main/notebooks/modelling/rb_jomorgan_summarisation_v4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **FLAN-T5-Large** is tested for text summarisation using JPMorgan Financial transcripts.

The model is applied to three different parts of the transcritps:

**1)** management discussion

**2)**individual financial analyst questions

**3)** responses to these questions.

**Different prompts** are applied on the three points above to allow the extraction of tailored outputs that serve different purposes:

**1)** *prompt = f"Generate a detailed and comprehensive summary that captures all key points: {text}"*

**2)** *prompt = f"Extract and summarize the key questions asked by analysts in the following text: {text}"*

**3)** *prompt = f"Summarize by extracting different statements the following text: {text}"*

**ROUGE scores**, which measure alignment with reference texts through precision, recall, and F-measure, are used, helping assess models performance.


In [1]:
!pip install bertopic umap-learn hdbscan sentence-transformers
!pip install transformers torch
!pip install rouge_score
!pip install evaluate
!pip install --upgrade protobuf
!pip install tensorboard

Collecting bertopic
  Downloading bertopic-0.16.4-py3-none-any.whl.metadata (23 kB)
Collecting umap-learn
  Downloading umap_learn-0.5.7-py3-none-any.whl.metadata (21 kB)
Collecting hdbscan
  Downloading hdbscan-0.8.40-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (15 kB)
Collecting pynndescent>=0.5 (from umap-learn)
  Downloading pynndescent-0.5.13-py3-none-any.whl.metadata (6.8 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12=

In [2]:
!pip install tensorflow
import tensorflow as tf
import numpy as np
import random



In [3]:
import time
import torch
from google.colab import drive
import os
import sys
import pandas as pd
import re
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer,Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq
import evaluate
from rouge_score import rouge_scorer
from typing import List, Union, Optional
import logging

import warnings
warnings.filterwarnings("ignore", category=UserWarning)

In [4]:
def reset_session():
    tf.keras.backend.clear_session()
    np.random.seed(42)
    random.seed(42)
    tf.random.set_seed(42)

In [5]:
# creating a pdf reader object
df_qna = pd.read_csv('/content/sample_data/JPMorgan_QNA_processed_data.csv', header=0)
df_mgmt = pd.read_csv('/content/sample_data/jpmorgan_management_df_preprocessed_final.csv', header=0)

print("Q&A DataFrame:")
display(df_qna.head(3))

print("\nManagement Discussion DataFrame:")
display(df_mgmt.head(3))

Q&A DataFrame:


Unnamed: 0,Index,Quarter-Year,Asked By,Role of the person Asked the question,Question,Answered By,Role of the person answered the question,Answer
0,1,4Q24,John McDonald,"Analyst, Truist Securities, Inc.","Hi. Good morning. Jeremy, I wanted to ask abou...",Jeremy Barnum,"Chief Financial Officer, JPMorganChase","Yeah. Good question, John, and welcome back, b..."
1,2,4Q24,Mike Mayo,"Analyst, Wells Fargo Securities LLC","Hi. Simple and then more difficult, I guess. J...",Jamie Dimon,"Chairman & Chief Executive Officer, JPMorganChase",I do love what I do. And answering the second ...
2,3,4Q24,Jim Mitchell,"Analyst, Seaport Global Securities LLC","Hey. Good morning. Maybe just on regulation, w...",Jeremy Barnum,"Chief Financial Officer, JPMorganChase","Hey, Jim. I mean, it's obviously something we'..."



Management Discussion DataFrame:


Unnamed: 0,Index,Quarter-Year,Text,Text_cleaned
0,,4Q24,MANAGEMENT DISCUSSION SECTION \n \nOperator : ...,['management discussion section operator : goo...
1,,3Q24,MANAGEMENT DISCUSSION SECTION \n \n...,['management discussion section operator : goo...
2,,2Q24,MANAGEMENT DISCUSSION SECTION \n \n...,['management discussion section operator : goo...


###**Data Preparation**

In [6]:
# Drop Unnecessary Columns
df_qna.drop(columns=["Index"], inplace=True, errors='ignore')
df_mgmt.drop(columns=["Index"], inplace=True, errors='ignore')

# Standardize Column Names
df_qna.rename(columns={
    "Quarter-Year": "Quarter",
    "Asked By": "Analyst",
    "Answer": "Response",
    "Answered By": "Executive",
    "Role of the person Asked the question": "Analyst_Role",
    "Role of the person answered the question": "Executive_Role"
}, inplace=True)

In [7]:
df_mgmt.rename(columns={
    "Quarter-Year": "Quarter",
    "Text": "Transcript"
}, inplace=True)

# Drop Missing Q&A Entries (2 rows in the Q&A transcript)
df_qna.dropna(subset=["Question", "Response"], inplace=True)

In [8]:
# Format `Quarter` Properly
def format_quarter(quarter_str):
    match = re.search(r'(\d)Q(\d{2})', quarter_str)
    if match:
        return f"20{match.group(2)}-Q{match.group(1)}"
    return quarter_str


In [9]:
df_qna["Quarter"] = df_qna["Quarter"].astype(str).apply(format_quarter)
df_mgmt["Quarter"] = df_mgmt["Quarter"].astype(str).apply(format_quarter)

In [10]:
# Standardize Executive & Analyst Roles
role_mapping = {
    "Chief Executive Officer": "CEO",
    "Chairman & Chief Executive Officer": "CEO",
    "Chief Financial Officer": "CFO",
    "Chief Operating Officer": "COO",
    "President": "President",
    "Vice Chairman": "Vice Chairman",
    "Head of Investor Relations": "Head of IR",
    "Managing Director": "Managing Director",
    "Analyst, Wolfe Research LLC": "Analyst",
    "Analyst, Jefferies LLC": "Analyst",
    "Analyst, Autonomous Research": "Analyst",
    "Analyst, UBS Securities LLC": "Analyst",
    "Analyst, Seaport Global Securities LLC": "Analyst"
}

# Apply role mapping (handles cases where multiple roles are listed)
def standardize_role(role):
    if pd.isna(role):
        return None
    for key, value in role_mapping.items():
        if key.lower() in role.lower():
            return value
    return role

In [11]:
df_qna["Executive_Role"] = df_qna["Executive_Role"].apply(standardize_role)
df_qna["Analyst_Role"] = df_qna["Analyst_Role"].apply(standardize_role)

In [12]:
# Add `Type` Column
df_qna["Type"] = "Q&A"
df_mgmt["Type"] = "Management Discussion"

print("Q&A DataFrame:")
display(df_qna.head(3))

print("\nManagement Discussion DataFrame:")
display(df_mgmt.head(3))


Q&A DataFrame:


Unnamed: 0,Quarter,Analyst,Analyst_Role,Question,Executive,Executive_Role,Response,Type
0,2024-Q4,John McDonald,"Analyst, Truist Securities, Inc.","Hi. Good morning. Jeremy, I wanted to ask abou...",Jeremy Barnum,CFO,"Yeah. Good question, John, and welcome back, b...",Q&A
1,2024-Q4,Mike Mayo,"Analyst, Wells Fargo Securities LLC","Hi. Simple and then more difficult, I guess. J...",Jamie Dimon,CEO,I do love what I do. And answering the second ...,Q&A
2,2024-Q4,Jim Mitchell,Analyst,"Hey. Good morning. Maybe just on regulation, w...",Jeremy Barnum,CFO,"Hey, Jim. I mean, it's obviously something we'...",Q&A



Management Discussion DataFrame:


Unnamed: 0,Quarter,Transcript,Text_cleaned,Type
0,2024-Q4,MANAGEMENT DISCUSSION SECTION \n \nOperator : ...,['management discussion section operator : goo...,Management Discussion
1,2024-Q3,MANAGEMENT DISCUSSION SECTION \n \n...,['management discussion section operator : goo...,Management Discussion
2,2024-Q2,MANAGEMENT DISCUSSION SECTION \n \n...,['management discussion section operator : goo...,Management Discussion


In [13]:
# Recheck for short, non-substantive responses as indicated by EDA (separate notebook)

# convert Answer_cleaned from string to a list of words
df_qna["Response_clean"] = df_qna["Response"].apply(lambda x: str(x).lower().split() if isinstance(x, str) else [])

# define a threshold for what is considered a "short" response
SHORT_RESPONSE_THRESHOLD = 5

# filter for responses that contain very few words
short_responses = df_qna[df_qna["Response_clean"].apply(lambda x: isinstance(x, list) and len(x) <= SHORT_RESPONSE_THRESHOLD)]

print("Examples of Short Responses:")
print(short_responses[["Quarter", "Response_clean"]].head())

print(f"\nTotal number of short responses: {len(short_responses)}")

Examples of Short Responses:
    Quarter                       Response_clean
30  2024-Q2  [we, have, no, further, questions.]
78  2023-Q2                 [thank, you,, guys.]

Total number of short responses: 2


In [14]:
# Recheck for short, non-substantive responses as indicated by EDA (separate notebook)

# convert Answer_cleaned from string to a list of words
df_qna["Question_clean"] = df_qna["Question"].apply(lambda x: str(x).lower().split() if isinstance(x, str) else [])

# define a threshold for what is considered a "short" response
SHORT_RESPONSE_THRESHOLD = 5

# filter for responses that contain very few words
short_responses = df_qna[df_qna["Question_clean"].apply(lambda x: isinstance(x, list) and len(x) <= SHORT_RESPONSE_THRESHOLD)]

print("Examples of Short Questions:")
print(short_responses[["Quarter", "Question_clean"]].head())

print(f"\nTotal number of short questions: {len(short_responses)}")

Examples of Short Questions:
    Quarter                           Question_clean
9   2024-Q4            [great., thanks, very, much.]
52  2023-Q4  [okay., thanks, very, much,, everyone.]
64  2023-Q3                [thank, you, very, much.]
78  2023-Q2                [thank, you, very, much.]

Total number of short questions: 4


In [15]:
# Remove short, non-informative responses

# flatten nested lists
def flatten_list(nested_list):
    if isinstance(nested_list, list) and len(nested_list) == 1 and isinstance(nested_list[0], list):
        return nested_list[0]
    return nested_list

df_qna["Response_clean"] = df_qna["Response_clean"].apply(flatten_list)
df_qna_filtered = df_qna[df_qna["Response_clean"].apply(lambda x: isinstance(x, list) and len(x) >= SHORT_RESPONSE_THRESHOLD)]

print(f"Removed {len(df_qna) - len(df_qna_filtered)} short non-informative responses.")
df_qna = df_qna_filtered

Removed 1 short non-informative responses.


In [16]:
# Remove short, non-informative responses

# flatten nested lists
def flatten_list(nested_list):
    if isinstance(nested_list, list) and len(nested_list) == 1 and isinstance(nested_list[0], list):
        return nested_list[0]
    return nested_list

df_qna["Question_clean"] = df_qna["Question_clean"].apply(flatten_list)
df_qna_filtered = df_qna[df_qna["Question_clean"].apply(lambda x: isinstance(x, list) and len(x) >= SHORT_RESPONSE_THRESHOLD)]

print(f"Removed {len(df_qna) - len(df_qna_filtered)} short non-informative questions.")
df_qna = df_qna_filtered

Removed 2 short non-informative questions.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_qna["Question_clean"] = df_qna["Question_clean"].apply(flatten_list)


In [17]:
# ensure cleaned text is a proper string
df_qna["Question_clean"] = df_qna["Question_clean"].apply(lambda x: " ".join(x) if isinstance(x, list) else str(x))
df_qna["Response_clean"] = df_qna["Response_clean"].apply(lambda x: " ".join(x) if isinstance(x, list) else str(x))
df_mgmt["Text_cleaned"] = df_mgmt["Text_cleaned"].apply(lambda x: " ".join(x) if isinstance(x, list) else str(x))

# convert text into tokenized lists (split by space)
df_qna["Answer_tokens"] = df_qna["Response_clean"].apply(lambda x: x.split())
df_qna["Question_tokens"] = df_qna["Question"].apply(lambda x: x.split())
df_mgmt["Text_tokens"] = df_mgmt["Text_cleaned"].apply(lambda x: x.split())

In [18]:
# Remove artifacts

def clean_tokens(token_list):
    if isinstance(token_list, list):
        refined_tokens = []
        for token in token_list:
            token = re.sub(r"â", "", token)
            token = re.sub(r"heldtomaturity", "held to maturity", token)
            token = re.sub(r"yearonyear", "year on year", token)
            token = re.sub(r"cohead", "co-head", token)
            token = re.sub(r"typesize", "type size", token)
            token = re.sub(r"[^\w$%&-]", "", token)
            if token.strip():
                refined_tokens.append(token)
        return refined_tokens
    return token_list

df_qna["Question_tokens"] = df_qna["Question_tokens"].apply(clean_tokens)
df_qna["Answer_tokens"] = df_qna["Answer_tokens"].apply(clean_tokens)
df_mgmt["Text_tokens"] = df_mgmt["Text_tokens"].apply(clean_tokens)


In [19]:
# Remove operator text from management discussion

operator_phrases = {
    "operator", "good morning", "ladies", "gentlemen", "welcome",
    "muted", "duration", "call", "please", "refer", "stand", "turn",
    "line", "available", "website", "ahead", "go"
}

def remove_operator_text(tokens):
    if isinstance(tokens, list):
        return [word for word in tokens if word.lower() not in operator_phrases]
    return tokens

df_mgmt["Text_tokens"] = df_mgmt["Text_tokens"].apply(remove_operator_text)


In [20]:
# Convert token lists back to full sentences
df_mgmt["Text_processed"] = df_mgmt["Text_tokens"].apply(lambda x: " ".join(x))

## **1) Management discussion summarisation**

Filtering data on Q4-2024 for testing

In [21]:
df_mgmt_q2 = df_mgmt[df_mgmt["Quarter"] == "2024-Q4"]   #### filtering on selected quarter

management_discussion = df_mgmt_q2["Text_processed"].tolist()  #### genertaing list for modeling

### **Running summarisation model Flan-T5**

In [23]:
# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TextSummarizer:
    def __init__(self, model_name: str = "google/flan-t5-large", device: Optional[str] = None):
        """Initialize the summarizer with model and device."""
        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
        logger.info(f"Using device: {self.device}")

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
            logger.info(f"Successfully loaded {model_name}")
        except Exception as e:
            logger.error(f"Error loading model: {str(e)}")
            raise

    def chunk_text(self,
                  text: Union[str, List[str]],
                  chunk_size: int = 400,
                  overlap: int = 50) -> List[str]:
        """Split text into overlapping chunks."""
        # Validate parameters
        if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
            raise ValueError("Invalid chunk_size or overlap parameters")

        try:
            # Handle list input
            if isinstance(text, list):
                text = " ".join(text)

            # Handle empty text
            if not text.strip():
                return []

            words = text.split()
            chunks = []
            start = 0

            while start < len(words):
                end = min(start + chunk_size, len(words))
                chunk = " ".join(words[start:end])
                chunks.append(chunk)
                start += chunk_size - overlap

            logger.debug(f"Split text into {len(chunks)} chunks")
            return chunks

        except Exception as e:
            logger.error(f"Error in chunk_text: {str(e)}")
            raise

    def summarize_text(self,
                      text: str,
                      min_new_tokens: int = 200,
                      max_new_tokens: int = 400) -> str:
        """Summarize a single piece of text."""
        # Validate input
        if pd.isna(text) or not text.strip():
            logger.warning("Empty or NaN text provided")
            return ""

        try:
            prompt = f"Generate a detailed and comprehensive summary that captures all key points: {text}"

            # Prepare inputs
            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=512
            ).to(self.device)

            # Generate summary with improved parameters
            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_ids,
                    min_new_tokens=min_new_tokens,
                    max_new_tokens=max_new_tokens,
                    num_beams=4,  # Use beam search
                    length_penalty=2.0,  # Encourage longer summaries
                    no_repeat_ngram_size=3,  # Prevent repetition
                    early_stopping=True,
                    do_sample=False  # Deterministic generation
                )

            summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            return summary

        except Exception as e:
            logger.error(f"Error in summarize_text: {str(e)}")
            raise

    def summarize_long_text(self,
                          text: Union[str, List[str]],
                          chunk_size: int = 400,
                          overlap: int = 50) -> str:
        """Handle long text summarization."""
        try:
            # Get chunks
            chunks = self.chunk_text(text, chunk_size, overlap)
            if not chunks:
                logger.warning("No valid chunks to summarize")
                return ""

            # Summarize chunks
            chunk_summaries = []
            for i, chunk in enumerate(chunks):
                logger.debug(f"Summarizing chunk {i+1}/{len(chunks)}")
                summary = self.summarize_text(chunk)
                if summary.strip():  # Only add non-empty summaries
                    chunk_summaries.append(summary)

            if not chunk_summaries:
                logger.warning("No valid summaries generated")
                return ""

            # If single chunk, return its summary
            if len(chunk_summaries) == 1:
                return chunk_summaries[0]

            # Summarize the combined summaries
            logger.debug("Generating final summary")
            final_summary = self.summarize_text(
                " ".join(chunk_summaries),
                min_new_tokens=150,  # Longer final summary
                max_new_tokens=300
            )

            return final_summary

        except Exception as e:
            logger.error(f"Error in summarize_long_text: {str(e)}")
            raise

# Running the model
try:
    # Initialize the summarizer
    summarizer = TextSummarizer()

    # Summarizing the management discussion
    logger.info("Starting summarization of management discussion")
    summary_mgmt = summarizer.summarize_long_text(management_discussion)

    # Print the final summary
    print("Final Summary:", summary_mgmt)

# Save summary to a CSV file
    df_summary = pd.DataFrame({"Summary": [summary_mgmt]})  # Create a DataFrame
    df_summary.to_csv("summary_output.csv", index=False)  # Export to CSV

    logger.info("Summary saved to summary_output.csv")

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")

Final Summary: jpmorganchases reported net income of $14 billion eps $481 revenue of $437 billion rotce 21% in the fourth quarter of fiscal year 2024 and revenue of $38 billion nii ex markets $548 million 2% driven impact lower rates associated with deposit margin compression well lower deposit balances ccb largely offset impact securities reinvestment higher revolving balances card higher wholesale depositbalances net investment securities losses 21% largely higher asset management fees investment banking fees markets revenue $12 billion 21% expenses $228 billion $17 billion 7% year-on-year excluding prior years fdic special assessment expenses $12 billion 5% predominantly driven compensation well higher brokerage distribution fees credit costs $26 billion reflecting net charge-offs $24 billion net reserve $267 million page 3 see reported results full year ill remind number significant items 2024 excluding items firm reports net income $54 billion ips $1822 revenue $173 billion delive

Some of the punctuation is not captured in the numerical extracts suggesting that the model may need retraining to retain number formatting.

**Adding ROUGE score for valuation**

In [28]:
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

In [43]:
# Join the list of management discussion strings into a single string
management_discussion_str = " ".join(management_discussion)

# Calculate ROUGE scores
scores = scorer.score(management_discussion_str, summary_mgmt)
for key in scores:
    print(f'{key}: {scores[key]}')

rouge1: Score(precision=0.9325842696629213, recall=0.11976911976911978, fmeasure=0.21227621483375958)
rouge2: Score(precision=0.847457627118644, recall=0.10830324909747292, fmeasure=0.19206145966709348)
rougeL: Score(precision=0.9101123595505618, recall=0.11688311688311688, fmeasure=0.2071611253196931)


The model demonstrates strong precision, producing concise and relevant summaries, though there is room to improve recall to ensure a more comprehensive representation of the reference content.

**ROUGE-1** shows a high precision of 93.26%, meaning the model is effective in selecting relevant unigrams (words) from the reference summaries. However, its recall stands at 11.98%, indicating that it captures only a small portion of the reference summary's unigrams. This disparity results in a relatively modest F1 score of 21.23%, highlighting that while the model's choices are accurate, it misses a considerable amount of content.

**ROUGE-2**, which evaluates the precision and recall of bigrams, exhibits a precision of 84.75% and a recall of 10.83%. The model is again very precise but struggles more with recall, yielding an F1 score of 19.21%, which further emphasizes the need for better coverage in the summarization.

**ROUGE-L**, which assesses the longest common subsequence, presents a precision of 91.01% and a recall of 11.69%, with an F1 score of 20.72%. Similar to the other ROUGE metrics, the model is precise but lacks completeness in terms of recall, suggesting that it may need additional mechanisms for capturing more comprehensive information.


##**2) Q&A summarisation**

### **Analysing Jim Mitchell data Q2-2024 data**

In [103]:
reset_session()

In [24]:
filtered_df = df_qna[(df_qna["Analyst"] == 'Jim Mitchell')& (df_qna["Quarter"] == "2024-Q4")]

# Display results
print(filtered_df)

   Quarter       Analyst Analyst_Role  \
2  2024-Q4  Jim Mitchell      Analyst   

                                            Question      Executive  \
2  Hey. Good morning. Maybe just on regulation, w...  Jeremy Barnum   

  Executive_Role                                           Response Type  \
2            CFO  Hey, Jim. I mean, it's obviously something we'...  Q&A   

                                      Response_clean  \
2  hey, jim. i mean, it's obviously something we'...   

                                      Question_clean  \
2  hey. good morning. maybe just on regulation, w...   

                                       Answer_tokens  \
2  [hey, jim, i, mean, its, obviously, something,...   

                                     Question_tokens  
2  [Hey, Good, morning, Maybe, just, on, regulati...  


**Summarising questions**

In [80]:
analyst_q = filtered_df["Question_clean"].tolist()  #### genertaing list for modeling

Updating code to reflect new modeling parameters, i.e. prompt and output size, for questions summarisation

In [120]:

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TextSummarizer:
    def __init__(self, model_name: str = "google/flan-t5-large", device: Optional[str] = None):
        """Initialize the summarizer with model and device."""
        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
        logger.info(f"Using device: {self.device}")

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
            logger.info(f"Successfully loaded {model_name}")
        except Exception as e:
            logger.error(f"Error loading model: {str(e)}")
            raise

    def chunk_text(self,
                  text: Union[str, List[str]],
                  chunk_size: int = 400,
                  overlap: int = 50) -> List[str]:
        """Split text into overlapping chunks."""
        if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
            raise ValueError("Invalid chunk_size or overlap parameters")

        try:
            if isinstance(text, list):
                text = " ".join(text)

            if not text.strip():
                return []

            words = text.split()
            chunks = []
            start = 0

            while start < len(words):
                end = min(start + chunk_size, len(words))
                chunk = " ".join(words[start:end])
                chunks.append(chunk)
                start += chunk_size - overlap

            logger.debug(f"Split text into {len(chunks)} chunks")
            return chunks

        except Exception as e:
            logger.error(f"Error in chunk_text: {str(e)}")
            raise

    def summarize_text_q(self,
                      text: str,
                      min_new_tokens: int = 50,
                      max_new_tokens: int = 250) -> str:
        """Summarize a single piece of text focusing on analyst questions."""
        if pd.isna(text) or not text.strip():
            logger.warning("Empty or NaN text provided")
            return ""

        try:
            # Specialized prompt for analyst questions
            prompt = f"Extract and summarize the key questions asked by analysts in the following text: {text}"

            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=512
            ).to(self.device)

            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_ids,
                    min_new_tokens=min_new_tokens,
                    max_new_tokens=max_new_tokens,
                    num_beams=4,
                    length_penalty=2.0,
                    no_repeat_ngram_size=3,
                    early_stopping=True,
                    do_sample=False
                )

            summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            return summary

        except Exception as e:
            logger.error(f"Error in summarize_text: {str(e)}")
            raise

    def summarize_long_text_q(self,
                          text: Union[str, List[str]],
                          chunk_size: int = 400,
                          overlap: int = 50) -> str:
        """Handle long text summarization focusing on analyst questions."""
        try:
            chunks = self.chunk_text(text, chunk_size, overlap)
            if not chunks:
                logger.warning("No valid chunks to summarize")
                return ""

            chunk_summaries = []
            for i, chunk in enumerate(chunks):
                logger.debug(f"Summarizing chunk {i+1}/{len(chunks)}")
                summary = self.summarize_text_q(chunk)
                if summary.strip():
                    chunk_summaries.append(summary)

            if not chunk_summaries:
                logger.warning("No valid summaries generated")
                return ""

            if len(chunk_summaries) == 1:
                return chunk_summaries[0]

            logger.debug("Generating final summary")
            final_summary = self.summarize_text_q(
                " ".join(chunk_summaries),
                min_new_tokens=150,
                max_new_tokens=300
            )

            return final_summary

        except Exception as e:
            logger.error(f"Error in summarize_long_text: {str(e)}")
            raise

# Running the model
try:
    # Initialize the summarizer
    summarizer_q = TextSummarizer()

    # Summarizing the analyst questions
    logger.info("Starting summarization of analyst questions")
    question_summary = summarizer_q.summarize_long_text_q(analyst_q)

    # Print the summary of questions
    print("\nSummary of Analyst Questions:")
    print(question_summary)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Analyst Questions:
Q&A: What areas of the regulatory structure would be most impactful if it were to change? Is there any area where capital requirements could actually go down? Are there any areas where requirements just simply stop going up? Are you starting to see any improvement in demand on lending?


In [95]:
# Join the list of question strings into a single string
analyst_q_str = " ".join(analyst_q)

# Calculate ROUGE scores
scores_q = scorer.score(analyst_q_str, question_summary)
for key in scores_q:
    print(f'{key}: {scores_q[key]}')

rouge1: Score(precision=0.92, recall=0.34074074074074073, fmeasure=0.49729729729729727)
rouge2: Score(precision=0.7346938775510204, recall=0.26865671641791045, fmeasure=0.39344262295081966)
rougeL: Score(precision=0.8, recall=0.2962962962962963, fmeasure=0.43243243243243246)


ROUGE results reveal good model's ability to capture both the relevance and completeness of the content. The higher recall values, combined with solid precision, have resulted in strong F1 scores across all ROUGE metrics, demonstrating that the model is  producing comprehensive and accurate summaries.

**Summarising answers**

In [40]:
analyst_a = filtered_df["Response_clean"].tolist()  #### genertaing list for modeling

In [41]:
reset_session()

Updating code to reflect new modeling parameters, i.e. prompt and output size, for questions summarisation

In [42]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TextSummarizer:
    def __init__(self, model_name: str = "google/flan-t5-large", device: Optional[str] = None):
        """Initialize the summarizer with model and device."""
        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
        logger.info(f"Using device: {self.device}")

        try:
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
            logger.info(f"Successfully loaded {model_name}")
        except Exception as e:
            logger.error(f"Error loading model: {str(e)}")
            raise

    def chunk_text(self,
                  text: Union[str, List[str]],
                  chunk_size: int = 400,
                  overlap: int = 50) -> List[str]:
        """Split text into overlapping chunks."""
        if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
            raise ValueError("Invalid chunk_size or overlap parameters")

        try:
            if isinstance(text, list):
                text = " ".join(text)

            if not text.strip():
                return []

            words = text.split()
            chunks = []
            start = 0

            while start < len(words):
                end = min(start + chunk_size, len(words))
                chunk = " ".join(words[start:end])
                chunks.append(chunk)
                start += chunk_size - overlap

            logger.debug(f"Split text into {len(chunks)} chunks")
            return chunks

        except Exception as e:
            logger.error(f"Error in chunk_text: {str(e)}")
            raise

    def summarize_text(self,
                      text: str,
                      min_new_tokens: int = 100,
                      max_new_tokens: int = 500) -> str:
        """Summarize a single piece of text focusing on analyst questions."""
        if pd.isna(text) or not text.strip():
            logger.warning("Empty or NaN text provided")
            return ""

        try:
            # Specialized prompt for analyst questions
            prompt = f"Summarize by extracting different statements the following text: {text}"

            inputs = self.tokenizer(
                prompt,
                return_tensors="pt",
                truncation=True,
                max_length=512
            ).to(self.device)

            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_ids,
                    min_new_tokens=min_new_tokens,
                    max_new_tokens=max_new_tokens,
                    num_beams=4,
                    length_penalty=2.0,
                    no_repeat_ngram_size=3,
                    early_stopping=True,
                    do_sample=False
                )

            summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
            return summary

        except Exception as e:
            logger.error(f"Error in summarize_text: {str(e)}")
            raise

    def summarize_long_text(self,
                          text: Union[str, List[str]],
                          chunk_size: int = 400,
                          overlap: int = 50) -> str:
        """Handle long text summarization focusing on analyst questions."""
        try:
            chunks = self.chunk_text(text, chunk_size, overlap)
            if not chunks:
                logger.warning("No valid chunks to summarize")
                return ""

            chunk_summaries = []
            for i, chunk in enumerate(chunks):
                logger.debug(f"Summarizing chunk {i+1}/{len(chunks)}")
                summary = self.summarize_text(chunk)
                if summary.strip():
                    chunk_summaries.append(summary)

            if not chunk_summaries:
                logger.warning("No valid summaries generated")
                return ""

            if len(chunk_summaries) == 1:
                return chunk_summaries[0]

            logger.debug("Generating final summary")
            final_summary = self.summarize_text(
                " ".join(chunk_summaries),
                min_new_tokens=150,
                max_new_tokens=500
            )

            return final_summary

        except Exception as e:
            logger.error(f"Error in summarize_long_text: {str(e)}")
            raise

# Running the model
try:
    # Initialize the summarizer
    summarizer = TextSummarizer()

    # Summarizing the analyst questions
    logger.info("Starting summarization of answer")
    answer_summary = summarizer.summarize_long_text(analyst_a)

    # Print the summary of questions
    print("\nSummary of Answer:")
    print(answer_summary)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Answer:
Jamie Dimon: What do you think about the regulatory framework for banks? Jim: We've been saying for a long time that we want a coherent, rational, holistically-assessed regulatory framework that allows banks to do their job, supporting the economy, that isn't reflexively anti-bank, that doesn't default to the answer every question being more of everything, more capital, more liquidity, that uses data and that balances the obvious goal that we all share of a safe and sound banking system with actually recognizing that banks play a critical role in supporting growth, and the hope is that we get some of that There's a bit of caution in some areas, but we'll see what the new year brings as the current optimism starts getting tested with reality, one way or the other, and you'll actually see that come through c&i loan growth in particular.


In [43]:
# Join the list of management discussion strings into a single string
analyst_a_str = " ".join(analyst_a)

# Calculate ROUGE scores
scores_a = scorer.score(analyst_a_str, answer_summary)
for key in scores_a:
    print(f'{key}: {scores_a[key]}')

rouge1: Score(precision=0.9671052631578947, recall=0.31343283582089554, fmeasure=0.47342995169082125)
rouge2: Score(precision=0.8278145695364238, recall=0.2670940170940171, fmeasure=0.4038772213247173)
rougeL: Score(precision=0.9210526315789473, recall=0.29850746268656714, fmeasure=0.45088566827697263)


The model demonstrates strong precision across all ROUGE metrics, with a particularly high ROUGE-1 precision of 96.7%. However, recall values are relatively lower, indicating that while the model excels at accurately identifying relevant content, it may miss some aspects of the full text. Overall, the F1 scores suggest a moderate balance between precision and recall, with ROUGE-2 showing a solid performance in capturing key bigrams and ROUGE-L reflecting good overall summary structure.

### **Analysing John McDonald data Q2-2024 data**

In [32]:
filtered_df2 = df_qna[(df_qna["Analyst"] == 'John McDonald')& (df_qna["Quarter"] == "2024-Q4")]

# Display results
print(filtered_df2)

   Quarter        Analyst                      Analyst_Role  \
0  2024-Q4  John McDonald  Analyst, Truist Securities, Inc.   

                                            Question      Executive  \
0  Hi. Good morning. Jeremy, I wanted to ask abou...  Jeremy Barnum   

  Executive_Role                                           Response Type  \
0            CFO  Yeah. Good question, John, and welcome back, b...  Q&A   

                                      Response_clean  \
0  yeah. good question, john, and welcome back, b...   

                                      Question_clean  \
0  hi. good morning. jeremy, i wanted to ask abou...   

                                       Answer_tokens  \
0  [yeah, good, question, john, and, welcome, bac...   

                                     Question_tokens  
0  [Hi, Good, morning, Jeremy, I, wanted, to, ask...  


**Summarising questions**

In [122]:
analyst_q2 = filtered_df2["Question_clean"].tolist()  #### genertaing list for modeling

In [None]:
reset_session()

In [124]:
try:
    # Initialize the summarizer
    summarizer_q = TextSummarizer()

    # Summarizing the analyst questions
    logger.info("Starting summarization of analyst questions")
    question_summary2 = summarizer_q.summarize_long_text_q(analyst_q2)

    # Print the summary of questions
    print("\nSummary of Analyst Questions:")
    print(question_summary2)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Analyst Questions:
Key questions asked by analysts: What's the framework for thinking about the opportunity cost of sitting on the growing base of capital and how high you might let that go versus your patience in waiting for more attractive deployment opportunities? When we think about the investment spend agenda this year, how does it differ from last year or last couple of years across lines of business?


In [116]:
# Join the list of management discussion strings into a single string
analyst_q2_str = " ".join(analyst_q2)

# Calculate ROUGE scores
scores_q2 = scorer.score(analyst_q2_str, question_summary2)
for key in scores_a:
    print(f'{key}: {scores_a[key]}')

rouge1: Score(precision=0.5294117647058824, recall=0.24161073825503357, fmeasure=0.33179723502304154)
rouge2: Score(precision=0.16417910447761194, recall=0.07432432432432433, fmeasure=0.10232558139534885)
rougeL: Score(precision=0.35294117647058826, recall=0.1610738255033557, fmeasure=0.2211981566820276)


**Summarinsing answsers**

In [33]:
analyst_text2 = filtered_df2["Response_clean"].tolist()  #### genertaing list for modeling

In [34]:
reset_session()

In [111]:
# Running the model
try:
   # Initialize the summarizer
    summarizer = TextSummarizer()

    # Summarizing the analyst questions
    logger.info("Starting summarization of answer")
    answer_summary2 = summarizer.summarize_long_text(analyst_text2)

    # Print the summary of questions
    print("\nSummary of Answer:")
    print(answer_summary2)

except Exception as e:
    logger.error(f"Error during summarization: {str(e)}")
    print(f"An error occurred: {str(e)}")


Summary of Answer:
"We're going to try to run things, with some important exceptions that I'll highlight in a second, on roughly flat head count and have that lead to people generating internal efficiencies as they get creative with their teams," he said. "The obvious exceptions are the ongoing areas of high certainty investment and growth, so, obviously, branches and bankers, so on. and also, critical non-negotiable areas of risk and control like cyber or whatever independent risk management needs to ensure that we're running the company safely." He added that the company has been able to generate a little bit of efficiency over the last few years, and that's a testament to the bottoms-up culture that they've developed at the company.


In [112]:
# Join the list of management discussion strings into a single string
analyst_text2_str = " ".join(analyst_text2)

# Calculate ROUGE scores
scores2 = scorer.score(analyst_text2_str, answer_summary2)
for key in scores2:
    print(f'{key}: {scores2[key]}')

rouge1: Score(precision=0.9603174603174603, recall=0.12683438155136267, fmeasure=0.22407407407407406)
rouge2: Score(precision=0.832, recall=0.10912906610703044, fmeasure=0.19294990723562153)
rougeL: Score(precision=0.7142857142857143, recall=0.09433962264150944, fmeasure=0.16666666666666669)
