# AI-Assisted Financial Analysis Report

This Google Colab notebook shows how to use a large language model (LLM) to help draft a financial analysis report.
It uses the project instructions and lecture notes from the **Financial Accounting for Managers** course to guide the model and generate a structured report.

The workflow will:

1. Install required libraries from Hugging Face.
2. Load the provided Quarto (`.qmd`) files containing the project instructions and lecture topics.
3. Summarise long texts using a pre‑trained summarisation model.
4. Build a prompt that combines the key points from the instructions and lectures.
5. Use a text generation model to draft a report with six sections: Company description, Financing, Investing, Operating performance, Forecast, and Conclusions.
6. Save the generated report to a local file for further editing and review.

**Note**: This notebook is meant as a pedagogical example. You can adapt the choice of model or parameters to your needs. Colab provides internet access, so installing and loading models from the Hugging Face Hub should work. Make sure you have uploaded the `.qmd` files (project instructions and topics) to the Colab environment, or adjust the file paths accordingly.


## 1. Setup

First we install and import the necessary libraries.  The `transformers` library from Hugging Face provides easy‑to‑use pipelines for text summarisation and generation.

The install commands in the cell below are commented out. Uncomment them the first time you run the notebook (installation may take a few minutes).

In [1]:
# Install Hugging Face Transformers (uncomment if running for the first time)

#%pip install torch --quiet
#%pip install ipywidgets --quiet
#%pip install -q transformers sentencepiece

# Import required modules
from transformers import pipeline
import os


## 2. Load project instructions and lecture topics

The project instructions and lecture topics are stored in Quarto (`.qmd`) files.
To use them in Colab, you can either upload the files manually (`File` → `Upload`) or mount them from Google Drive.
Make sure the file names below match the uploaded files.

Below we define a helper function to read a text file and return its contents as a string.  Then we read the project instructions and each of the six lecture topic files.

In [2]:
# Helper function to read a text file
def read_file(path):
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

# Change to the directory where your .qmd files are located.
# The path below is a local example; in Colab, uploaded files land in /content by default.
os.chdir('/home/miortiz/Documents/GitHub/lecture_financial_statements_analysis/')

# Paths to the .qmd files (adjust these if your files have different names or locations)
instruction_path = 'project_instructions.qmd'
topic_files = [
    'topic_1_foundations.qmd',
    'topic_2_financing.qmd',
    'topic_3_investing.qmd',
    'topic_4_operating.qmd',
    'topic_5_cashflows.qmd',
    'topic_6_forecasting.qmd'
]

# Read the project instructions
instructions_text = read_file(instruction_path)

# Read and concatenate the lecture topics into a single string
topics_text = "\n".join([read_file(path) for path in topic_files])

# Show a short preview of the instruction file (optional)
print('Instruction file preview:')
print(instructions_text[:500] + '\n...')


Instruction file preview:
---
title: "Financial Accounting for Managers: Final Project"
author: Marcelo Ortiz M.
date: November 2024
format:
  pdf:
    toc: true
    number-sections: false
---

# General instructions

In groups of 3 or 4 students, pursue a financial analysis of a publicly-listed company from @tbl-companies. This analysis must be based on the latest annual reports available, and complemented with additional information when needed.

| Industry                              | Company                 |
|----
...
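
The preview above shows that each `.qmd` file starts with a YAML front-matter block (title, author, date, format options). That metadata is passed to the summariser together with the body text. A minimal sketch of removing it first is given below. The helper name `strip_front_matter` is hypothetical and not part of the original notebook, and the sketch assumes the front matter is delimited by `---` lines at the very top of the file.

```python
import re

def strip_front_matter(qmd_text):
    """Return the document body without a leading YAML front-matter block."""
    # Front matter: a block at the very top, opened and closed by lines containing only '---'
    match = re.match(r'\A---[ \t]*\n.*?\n---[ \t]*\n', qmd_text, flags=re.DOTALL)
    return qmd_text[match.end():] if match else qmd_text

# Small made-up sample mimicking the structure seen in the preview
sample = '---\ntitle: "Demo"\nformat:\n  pdf:\n    toc: true\n---\n\n# General instructions\n\nBody text.\n'
print(strip_front_matter(sample))
```

In the notebook this could be applied as `instructions_text = strip_front_matter(read_file(instruction_path))`, and likewise to each topic file before joining them.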


## 3. Summarise long documents

Language models typically have a limit on the number of tokens they can process at once. To summarise long documents like our instruction file or lecture notes, we split the text into manageable chunks (measured here in characters, as a rough proxy for tokens), summarise each chunk, and, when there is more than one chunk, summarise the combined summaries.

The function below uses a summarisation pipeline with a BART model (`facebook/bart-large-cnn`) to produce concise summaries.  Feel free to experiment with other models available on the Hugging Face Hub.

In [5]:
#%pip install nltk --quiet

# Create a summarisation pipeline
# You can choose another summarisation model from Hugging Face if you prefer
summariser = pipeline('summarization', model='facebook/bart-large-cnn')

# Helper function to summarise text in chunks
def summarise_text(text, max_chunk=1000, summary_max_length=200, summary_min_length=50):
    """Summarise a long piece of text by splitting it into chunks."""
    # Split text into sentences to avoid cutting sentences in half
    import nltk
    from nltk.tokenize import sent_tokenize
    # Newer NLTK releases use the 'punkt_tab' resource; older ones use 'punkt'
    nltk.download('punkt_tab', quiet=True)
    nltk.download('punkt', quiet=True)

    sentences = sent_tokenize(text)
    chunks = []
    current_chunk = ''
    current_len = 0

    for sentence in sentences:
        # If adding the next sentence would exceed max_chunk characters, start a new chunk
        # (only when the current chunk is non-empty, so no empty chunk is ever stored)
        if current_chunk and current_len + len(sentence) > max_chunk:
            chunks.append(current_chunk.strip())
            current_chunk = sentence + ' '
            current_len = len(sentence)
        else:
            current_chunk += sentence + ' '
            current_len += len(sentence)

    # Append the last chunk
    if current_chunk:
        chunks.append(current_chunk.strip())

    summaries = []
    for chunk in chunks:
        # truncation=True guards against a chunk that exceeds the model's input limit
        summary = summariser(chunk, max_length=summary_max_length, min_length=summary_min_length,
                             do_sample=False, truncation=True)[0]['summary_text']
        summaries.append(summary)

    # Nothing to summarise (e.g. empty input)
    if not summaries:
        return ''

    # If there is more than one summary, summarise the summaries
    if len(summaries) > 1:
        combined_text = " ".join(summaries)
        # Truncate to 2000 characters (adjust as needed)
        combined_text = combined_text[:2000]
        return summariser(combined_text, max_length=summary_max_length, min_length=summary_min_length,
                          do_sample=False, truncation=True)[0]['summary_text']
    return summaries[0]

# Generate summaries of the instructions and topics
instructions_summary = summarise_text(instructions_text, max_chunk=1500, summary_max_length=250, summary_min_length=80)
topics_summary = summarise_text(topics_text, max_chunk=1500, summary_max_length=250, summary_min_length=80)

print("Summary of project instructions:\n", instructions_summary)

Device set to use cpu
Your max_length is set to 250, but your input_length is only 74. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=37)
Your max_length is set to 250, but your input_length is only 244. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=122)

Summary of project instructions:
 In groups of 3 or 4 students, pursue a financial analysis of a publicly-listed company from @tbl-companies. This analysis must be based on the latest annual reports available, and complemented with additional information when needed. The final project is titled "Financial Accounting for Managers: Final Project" It will be published in November 2024 and will be available to all students at the end of the project.
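
The `max_length` warnings in the output above appear when a chunk is shorter than the requested summary length. One way to avoid them is to derive the length bounds from the chunk's token count. The sketch below follows the heuristic in the warning text (a `max_length` of about half the input). The helper `summary_length_bounds` and its defaults are assumptions for illustration, not part of the original notebook.

```python
def summary_length_bounds(input_tokens, max_cap=250, min_floor=80):
    """Pick (max_length, min_length) for a chunk containing `input_tokens` tokens."""
    # Cap the summary at half the input length, as the pipeline warning suggests
    max_length = max(1, min(max_cap, input_tokens // 2))
    # Keep min_length at or below half of max_length so the two never conflict
    min_length = min(min_floor, max(1, max_length // 2))
    return max_length, min_length

# Token counts taken from the warnings above, plus one long chunk
for n in (74, 244, 1000):
    print(n, summary_length_bounds(n))
```

Inside `summarise_text`, the token count of a chunk could be obtained with `len(summariser.tokenizer(chunk)['input_ids'])` and the resulting bounds passed as `max_length` and `min_length`.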


## 4. Generate a draft report using a text generation model

With our summaries ready, we can now build a prompt that instructs the language model to generate a structured financial analysis report.
We'll use the FLAN‑T5 model (`google/flan-t5-base`) for text generation.  You can swap in `google/flan-t5-large` or any other instruction‑tuned model depending on the resources you have available.

The prompt explicitly lists the six sections required in the project: (1) company description and recent news, (2) financing situation, (3) investing situation, (4) operating performance, (5) forecast, and (6) conclusions.  The model uses the summaries we generated to inform its output.

In [6]:
# Create a text generation pipeline using FLAN‑T5
report_generator = pipeline('text2text-generation', model='google/flan-t5-base')

# Build the prompt for report generation
prompt = f"""
You are a financial analyst asked to write a detailed report for a publicly listed company.
The report should follow the structure below and be based on the provided summaries.

Project instructions summary:
{instructions_summary}

Lecture topics summary:
{topics_summary}

Write a report with the following sections:
1. Company description and recent news
2. Analysis of the financing situation
3. Analysis of the investing situation
4. Analysis of operating performance
5. Forecast and assumptions
6. Conclusions and recommendations

For each section, provide clear, concise paragraphs explaining the key points, incorporating information from the summaries where relevant.
End the report with a brief discussion of potential limitations and risks associated with the analysis.
"""

# Generate the report
# max_new_tokens caps the generated text. The pipeline sets its own default for it,
# which would take precedence over max_length if that were passed instead.
report_output = report_generator(prompt, max_new_tokens=1024, num_beams=4, do_sample=False)[0]['generated_text']

print("Generated report:\n", report_output)


Device set to use cpu


Generated report:
 Summary of a financial analysis of a publicly listed company.
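
The output above is a single sentence rather than a six-section report. One possible workaround is to prompt the model once per section and assemble the results. Whether this improves the output of a small model such as `google/flan-t5-base` is an assumption to test, not a guarantee. The names `SECTIONS`, `build_section_prompt`, `draft_report`, and `echo_generator` are hypothetical, and the generator here is a stand-in so the sketch runs without downloading a model.

```python
# Section titles as listed in the prompt above
SECTIONS = [
    "Company description and recent news",
    "Analysis of the financing situation",
    "Analysis of the investing situation",
    "Analysis of operating performance",
    "Forecast and assumptions",
    "Conclusions and recommendations",
]

def build_section_prompt(section, instructions_summary, topics_summary):
    """Build a prompt that asks for one section only."""
    return (
        "You are a financial analyst writing a report on a publicly listed company.\n"
        f"Project instructions summary:\n{instructions_summary}\n\n"
        f"Lecture topics summary:\n{topics_summary}\n\n"
        f"Write the section titled '{section}' in clear, concise paragraphs."
    )

def draft_report(generate, instructions_summary, topics_summary):
    """Assemble a report; `generate` is any callable mapping a prompt to text."""
    parts = []
    for number, section in enumerate(SECTIONS, start=1):
        text = generate(build_section_prompt(section, instructions_summary, topics_summary))
        parts.append(f"## {number}. {section}\n\n{text.strip()}")
    return "\n\n".join(parts)

# Stand-in generator used only to demonstrate the control flow
def echo_generator(prompt):
    return f"[draft based on a {len(prompt)}-character prompt]"

report = draft_report(echo_generator, "instructions summary", "topics summary")
print(report)
```

With the notebook's pipeline, `generate` could be `lambda p: report_generator(p, max_new_tokens=256)[0]['generated_text']`.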


In [None]:
# Save the generated report to a text file for later editing or inclusion in your project
# ('/content' is Colab's working directory; use a different path when running elsewhere)
output_path = '/content/generated_financial_report.txt'
with open(output_path, 'w', encoding='utf-8') as f:
    f.write(report_output)

print(f"Report saved to {output_path}")


## 5. Next steps

The generated report serves as a draft that you can refine and expand.  Consider the following when working on your project:

- **Validate facts and figures**: The language model sees only the summaries of the project instructions and lecture notes. It has no access to company filings or real‑time financial data, so any company‑specific statement it produces is unverified. Cross‑check numbers and company details against the latest annual reports and reliable news sources.
- **Incorporate charts and tables**: Supplement the textual analysis with your own calculations in Excel and visualise key metrics (e.g., debt structure, revenue breakdown, forecast scenarios).
- **Tailor the narrative**: Adjust the tone and focus of the report to match the intended audience (investors, creditors, etc.) and highlight insights that are most relevant.
- **Document your assumptions**: Clearly state the assumptions underlying your forecast and discuss how sensitive your conclusions are to changes in these assumptions.
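
As an illustration of the last point, the sketch below projects revenue under three growth assumptions so that the spread between scenarios is visible. All figures are made up for the example, and the constant-growth model and the name `project_revenue` are assumptions for illustration, not a method prescribed by the course material.

```python
def project_revenue(base_revenue, growth_rate, years):
    """Compound base_revenue forward at a constant annual growth_rate."""
    return [base_revenue * (1 + growth_rate) ** t for t in range(1, years + 1)]

# Hypothetical inputs: replace with figures from the company's annual report
base_revenue = 1000.0
scenarios = {"pessimistic": 0.00, "base": 0.03, "optimistic": 0.06}

for name, growth in scenarios.items():
    path = project_revenue(base_revenue, growth, years=3)
    print(f"{name:>11}: " + ", ".join(f"{value:,.1f}" for value in path))
```

Reporting the scenarios side by side makes it clear how much the conclusions depend on the growth assumption.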

By combining structured financial analysis with AI‑powered drafting, you can accelerate the reporting process while maintaining a high level of clarity and professionalism.


In [None]:
# Step 1: Upload the annual report PDF file (requires Google Colab)
!pip install -q pdfplumber
from google.colab import files
import pdfplumber

uploaded = files.upload()
pdf_filename = list(uploaded.keys())[0]
print(f"Uploaded file: {pdf_filename}")

# Step 2: Extract text from the PDF file
page_texts = []
with pdfplumber.open(pdf_filename) as pdf:
    for page in pdf.pages:
        # Depending on the pdfplumber version, extract_text() may return None
        # for pages without a text layer (e.g. scanned images)
        page_texts.append(page.extract_text() or "")
pdf_text = "\n".join(page_texts)

print("First 1000 characters of extracted text:")
print(pdf_text[:1000])