# Financial Report Generation

<a href="https://colab.research.google.com/github/run-llama/llamacloud-demo/blob/main/examples/report_generation/report_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we show you how to use LlamaCloud to generate financial reports that interleave text and tables, given an existing bank of reports.

LlamaCloud provides advanced retrieval endpoints that let you fetch chunk-level and document-level context from complex financial reports containing text, tables, and sometimes images or diagrams.

We build an agentic workflow on top of LlamaCloud, consisting of a researcher step and a writer step, to generate the final response.

## Setup

Install the core packages and download the 10-K filings for Apple and Tesla.

You will need to upload these documents to LlamaCloud. For best results, we recommend:
- Setting the parse mode to "Accurate", "Premium", or "3rd Party multimodal".
- Setting the "Segmentation Configuration" to "Page" and the "Chunking Configuration" to "None". This gives you page-level chunks.

In [None]:
!pip install llama-index
!pip install llama-index-core
!pip install llama-index-embeddings-openai
!pip install llama-index-question-gen-openai
!pip install llama-index-postprocessor-flag-embedding-reranker
!pip install git+https://github.com/FlagOpen/FlagEmbedding.git
!pip install llama-parse

In [None]:
!mkdir data
# download Apple 
!wget "https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf" -O data/apple_2023.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2022/q4/_10-K-2022-(As-Filed).pdf" -O data/apple_2022.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf" -O data/apple_2021.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2020/ar/_10-K-2020-(As-Filed).pdf" -O data/apple_2020.pdf
!wget "https://www.dropbox.com/scl/fi/i6vk884ggtq382mu3whfz/apple_2019_10k.pdf?rlkey=eudxh3muxh7kop43ov4bgaj5i&dl=1" -O data/apple_2019.pdf

# download Tesla
!wget "https://ir.tesla.com/_flysystem/s3/sec/000162828024002390/tsla-20231231-gen.pdf" -O data/tesla_2023.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000095017023001409/tsla-20221231-gen.pdf" -O data/tesla_2022.pdf
!wget "https://www.dropbox.com/scl/fi/ptk83fmye7lqr7pz9r6dm/tesla_2021_10k.pdf?rlkey=24kxixeajbw9nru1sd6tg3bye&dl=1" -O data/tesla_2021.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459021004599/tsla-10k_20201231-gen.pdf" -O data/tesla_2020.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459020004475/tsla-10k_20191231-gen_0.pdf" -O data/tesla_2019.pdf
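Before uploading, it can be worth confirming that every download is actually a PDF, since a failed or redirected `wget` may save an HTML page under a `.pdf` name. The helper below is a minimal sketch (the name `find_non_pdfs` is hypothetical, not part of any library); it checks each file in `data/` for the `%PDF-` signature that PDF files begin with.

```python
from pathlib import Path
from typing import List


def find_non_pdfs(data_dir: str = "data") -> List[str]:
    """Return names of *.pdf files in data_dir that lack the %PDF- header."""
    bad = []
    for path in sorted(Path(data_dir).glob("*.pdf")):
        with path.open("rb") as fp:
            # PDF files start with the ASCII signature "%PDF-".
            if fp.read(5) != b"%PDF-":
                bad.append(path.name)
    return bad


# An empty list means every downloaded file looks like a PDF.
print(find_non_pdfs("data"))
```

Any file it reports can simply be downloaded again before you upload the set to LlamaCloud.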

Configure `nest_asyncio`, the LlamaCloud and OpenAI API keys, and the OpenAI models. The OpenAI LLM is used for the agent's tool calling and for response synthesis.

In [1]:
# llama-parse is async-first, running the async code in a notebook requires the use of nest_asyncio
import nest_asyncio
nest_asyncio.apply()

In [2]:
import os
# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-"

In [3]:
# Using OpenAI API for embeddings/llms
os.environ["OPENAI_API_KEY"] = "sk-"
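Hardcoding keys in a notebook makes them easy to leak when the notebook is shared. A minimal alternative sketch (the helper `ensure_env` is hypothetical) reuses a key that is already set in the environment and only prompts for it, via the standard-library `getpass`, when it is missing.

```python
import os
from getpass import getpass


def ensure_env(name: str) -> str:
    """Return os.environ[name], prompting for it (input hidden) if unset or empty."""
    value = os.environ.get(name)
    if not value:
        value = getpass(f"Enter {name}: ")
        os.environ[name] = value
    return value


# ensure_env("LLAMA_CLOUD_API_KEY")
# ensure_env("OPENAI_API_KEY")
```

Because the function writes the value back into `os.environ`, the later cells that read `os.environ["LLAMA_CLOUD_API_KEY"]` work unchanged.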

In [2]:
# setup embedding/LLM model
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(model="gpt-4o-mini")

Settings.embed_model = embed_model
Settings.llm = llm

## Load Documents into LlamaCloud

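The first order of business is to upload the five Apple and five Tesla 10-Ks downloaded above into LlamaCloud.

You can do this by creating a pipeline and uploading the documents via the "Files" mode. Make sure the pipeline name and project name match the `name` and `project_name` passed to `LlamaCloudIndex` below.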

After this is done, proceed to the next section.

## Define LlamaCloud File/Chunk Retriever over Documents

In this section we define both a file-level and chunk-level LlamaCloud Retriever over these documents.

The file-level LlamaCloud retriever returns entire documents; `files_top_k` sets how many are returned. There are two retrieval modes:
- `files_via_content`: Retrieve the top-k chunks and dereference them into their source files, then use a weighted-average heuristic to determine the top files to return.
- `files_via_metadata`: Use an LLM to analyze the metadata of each file and determine the files most relevant to the query.
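To make the `files_via_content` idea concrete, here is a toy sketch of dereferencing ranked chunks into files. LlamaCloud's actual weighting is not specified in this notebook, so the reciprocal-rank weights below are purely an assumption for illustration, and `rank_files` is a hypothetical helper, not a LlamaCloud API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_files(
    chunks: List[Tuple[str, float]], files_top_k: int = 1
) -> List[Tuple[str, float]]:
    """Rank source files from (file_id, score) chunk hits, given best-first.

    Each file's score is a weighted average over all retrieved chunks, where a
    chunk contributes its score only to its own file. Weights are 1 / (rank + 1),
    an assumed heuristic, so higher-ranked chunks count more.
    """
    weighted: Dict[str, float] = defaultdict(float)
    total_weight = 0.0
    for rank, (file_id, score) in enumerate(chunks):
        weight = 1.0 / (rank + 1)
        weighted[file_id] += weight * score
        total_weight += weight
    if total_weight == 0:
        return []  # no chunks retrieved
    ranked = sorted(
        ((file_id, s / total_weight) for file_id, s in weighted.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:files_top_k]


hits = [("tesla_2019", 0.9), ("tesla_2020", 0.8), ("tesla_2019", 0.7), ("apple_2019", 0.5)]
print(rank_files(hits, files_top_k=1))
```

Averaging over all retrieved chunks (rather than per file) rewards a file that contributes several strong chunks, which is one plausible reading of a "weighted average" over chunk hits.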

The chunk-level LlamaCloud retriever is our default retriever that returns chunks via hybrid search + reranking.
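Hybrid search merges a dense (vector) ranking with a sparse (keyword) ranking. One common way to fuse two rankings is reciprocal rank fusion (RRF); whether LlamaCloud uses RRF or another fusion method is not stated here, so treat this as a generic sketch. The constant `k=60` is the value commonly used with RRF, and `reciprocal_rank_fusion` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several best-first rankings of chunk IDs into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense = ["c3", "c1", "c7"]   # e.g. from vector similarity
sparse = ["c1", "c9", "c3"]  # e.g. from keyword (BM25-style) matching
print(reciprocal_rank_fusion([dense, sparse]))
```

Chunks that rank well in both lists (`c1`, `c3`) rise to the top; the fused list would then be passed to the reranker.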

In [3]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex
import os

index = LlamaCloudIndex(
    name="apple_tesla_demo_2",
    project_name="llamacloud_demo",
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
)

#### Define File Retriever

In this section we define the file-level retriever. By default we use `retrieval_mode="files_via_content"`, but you can also change it to `files_via_metadata`.

In [4]:
doc_retriever = index.as_retriever(
    retrieval_mode="files_via_content",
    # retrieval_mode="files_via_metadata",
    files_top_k=1
)

In [5]:
nodes = doc_retriever.retrieve("Give me a summary of Tesla in 2019") 

In [6]:
# print(nodes[0].get_content())

#### Define Chunk Retriever

The chunk-level retriever retrieves candidate chunks and reranks them, returning the top `rerank_top_n=5`.

In [7]:
chunk_retriever = index.as_retriever(
    retrieval_mode="chunks",
    rerank_top_n=5
)
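Conceptually, `rerank_top_n` caps how many chunks survive reranking: a larger candidate set is rescored by a reranker and only the best `n` are kept. The sketch below shows just that control flow. The word-overlap scorer is a deliberately naive stand-in (an assumption) for the learned reranking model a real system would use, and `rerank` is a hypothetical helper.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / max(len(q_words), 1)


def rerank(
    query: str,
    candidates: List[str],
    top_n: int = 5,
    scorer: Callable[[str, str], float] = overlap_score,
) -> List[Tuple[str, float]]:
    """Rescore every candidate against the query and keep the best top_n."""
    scored = [(text, scorer(query, text)) for text in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


candidate_texts = [
    "tesla revenue grew in 2019",
    "apple gross margin 2021",
    "tesla gross margin 2019",
]
print(rerank("tesla gross margin", candidate_texts, top_n=2))
```

Swapping `scorer` for a cross-encoder call would keep the same shape: retrieval decides what is considered, reranking decides what is kept.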

#### Define Retriever Tools

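Wrap each retriever in a Python function and turn it into a tool object with `FunctionTool`. The LLM calls these tools directly, and each function's docstring becomes part of the tool description it sees, so the docstrings explain when to use each tool.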

In [8]:
from llama_index.core.tools import FunctionTool
from llama_index.core.schema import NodeWithScore
from typing import List

# function tools
def chunk_retriever_fn(query: str) -> List[NodeWithScore]:
    """Retrieves a small set of relevant document chunks from the corpus.

    ONLY use for research questions that want to look up specific facts from the knowledge corpus,
    and don't need entire documents.

    """
    return chunk_retriever.retrieve(query)

def doc_retriever_fn(query: str) -> List[NodeWithScore]:
    """Document retriever that retrieves entire documents from the corpus.

    ONLY use for research questions that may require searching over entire research reports.

    Will be slower and more expensive than chunk-level retrieval but may be necessary.
    """
    return doc_retriever.retrieve(query)

chunk_retriever_tool = FunctionTool.from_defaults(fn=chunk_retriever_fn)
doc_retriever_tool = FunctionTool.from_defaults(fn=doc_retriever_fn)
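`FunctionTool.from_defaults` takes the tool name from the function name and builds the description from its signature and docstring. The sketch below imitates that idea with the standard-library `inspect` module, producing an OpenAI-style function schema. It is an illustration only, not LlamaIndex's implementation; `function_to_tool_schema`, its type mapping, and the stub `example_retriever_fn` are all hypothetical.

```python
import inspect
from typing import Any, Callable, Dict

# Assumed mapping from Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style function schema from a function's signature and docstring."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def example_retriever_fn(query: str) -> list:
    """Retrieves a small set of relevant document chunks from the corpus."""
    return []  # stub for this sketch; a real tool would call a retriever


print(function_to_tool_schema(example_retriever_fn))
```

Seeing the schema spelled out makes clear why the docstrings matter: they are the only guidance the LLM gets when choosing between the two retrievers.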

## Build a Report Generation Workflow

Now that we've defined the retrievers, we're ready to build the report generation workflow.

The workflow contains roughly the following steps:

1. **Research Gathering**: Run a function-calling loop in which the agent reasons about which tool to call (chunk-level or document-level retrieval) to gather more information. All retrieved chunks are stored in the shared workflow context (`ctx.data`), which is propagated through every step, while the agent itself receives a short synthesized answer from each tool call. Once the agent decides it has gathered enough information, the workflow moves on to the next phase.
2. **Report Generation**: Generate a research report from the pooled research. For now, we simply stuff as much of the retrieved content as possible into the context window, using a tree-summarize response synthesizer.
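Stripped of LlamaIndex specifics, the two phases reduce to a loop like the one below. This is a framework-free sketch under stated assumptions: `decide` stands in for the function-calling LLM (returning a `(tool_name, query)` pair, or `None` when research is done), the retrievers are plain callables, and the report step is any function over the pooled chunks. None of these names are LlamaIndex APIs, and the `max_steps` cap is an addition of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

ToolCall = Tuple[str, str]  # (tool_name, query)


def run_report_agent(
    question: str,
    decide: Callable[[List[str]], Optional[ToolCall]],
    tools: Dict[str, Callable[[str], List[str]]],
    write_report: Callable[[str, List[str]], str],
    max_steps: int = 10,
) -> str:
    """Research loop followed by a single report-generation step."""
    history: List[str] = [question]  # what the "LLM" sees
    stored_chunks: List[str] = []    # pooled context, like ctx.data["stored_chunks"]
    for _ in range(max_steps):
        call = decide(history)
        if call is None:  # no tool call: the research phase is done
            break
        tool_name, query = call
        if tool_name not in tools:
            return "Invalid tool."
        chunks = tools[tool_name](query)
        stored_chunks.extend(chunks)
        # The agent only sees a short summary; the full chunks stay in the pool.
        history.append(f"{tool_name}({query!r}) returned {len(chunks)} chunks")
    return write_report(question, stored_chunks)


# Usage with a scripted "LLM" that makes two tool calls and then stops.
plan = iter([("doc", "apple 2021 10-K"), ("chunk", "tesla 2021 balance sheet"), None])
report = run_report_agent(
    "Compare Apple and Tesla balance sheets in 2021",
    decide=lambda history: next(plan),
    tools={"doc": lambda q: [f"full doc for {q}"], "chunk": lambda q: [f"chunk for {q}"]},
    write_report=lambda q, chunks: f"{q}: built from {len(chunks)} chunks",
)
print(report)
```

Keeping the raw chunks out of the agent's history is what lets the research loop stay within the context window while the writer step later sees everything at once.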

This implementation is inspired by our [Function Calling Agent](https://docs.llamaindex.ai/en/stable/examples/workflow/function_calling_agent/) workflow implementation.

In [9]:
from llama_index.llms.openai import OpenAI
from pydantic import BaseModel, Field
from typing import List, Tuple
import pandas as pd
from IPython.display import display, Markdown


class TextBlock(BaseModel):
    """Text block."""

    text: str = Field(..., description="The text for this block.")


class TableBlock(BaseModel):
    """Table block."""

    caption: str = Field(..., description="Caption of the table.")
    col_names: List[str] = Field(..., description="Names of the columns.")
    rows: List[Tuple] = Field(
        ...,
        description=(
            "List of rows. Each row is a data entry tuple, "
            "where each element of the tuple corresponds positionally to the column name."
        )
    )

    def to_df(self) -> pd.DataFrame:
        """To dataframe."""
        return pd.DataFrame(self.rows, columns=self.col_names)


class ReportOutput(BaseModel):
    """Data model for a report.

    Can contain a mix of text and table blocks. Use table blocks to present any quantitative metrics and comparisons.

    """

    blocks: List[TextBlock | TableBlock] = Field(
        ..., description="A list of text and table blocks."
    )

    def render(self) -> None:
        """Render as formatted text within a jupyter notebook."""
        for b in self.blocks:
            if isinstance(b, TextBlock):
                display(Markdown(b.text))
            else:
                # `DataFrame.style` builds a new Styler on each access, so the
                # caption must be set on the object that is actually displayed.
                display(b.to_df().style.set_caption(b.caption))


report_gen_system_prompt = """\
You are a report generation assistant tasked with producing a well-formatted report given parsed context.

You will be given context from one or more reports that take the form of parsed text + tables.

You are responsible for producing a report that interleaves text blocks and table blocks.

You MUST output your response as a tool call in order to adhere to the required output format. Do NOT give back normal text.

"""
report_gen_llm = OpenAI(
    model="gpt-4o-mini", 
    system_prompt=report_gen_system_prompt, 
    max_tokens=2048,
    context_window=126000
)
report_gen_sllm = report_gen_llm.as_structured_llm(output_cls=ReportOutput)
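`TableBlock` relies on each row lining up positionally with `col_names`, and an LLM will occasionally emit a row with too many or too few cells, which can make `pd.DataFrame(rows, columns=...)` raise or misalign the data. A small pre-check, sketched below with a hypothetical `check_table` helper, can catch this before rendering. The sketch also shows a pandas-free markdown rendering of the same table for use outside Jupyter (`table_to_markdown` is likewise hypothetical, and the example values are placeholders).

```python
from typing import List, Sequence


def check_table(col_names: Sequence[str], rows: Sequence[Sequence[object]]) -> List[int]:
    """Return indices of rows whose length differs from the number of columns."""
    return [i for i, row in enumerate(rows) if len(row) != len(col_names)]


def table_to_markdown(
    caption: str, col_names: Sequence[str], rows: Sequence[Sequence[object]]
) -> str:
    """Render a caption plus a GitHub-style markdown table (no pandas needed)."""
    bad = check_table(col_names, rows)
    if bad:
        raise ValueError(f"rows {bad} do not match {len(col_names)} columns")
    lines = [
        f"**{caption}**",
        "",
        "| " + " | ".join(col_names) + " |",
        "| " + " | ".join("---" for _ in col_names) + " |",
    ]
    for row in rows:
        # Escape pipes so cell text cannot break the table layout.
        lines.append("| " + " | ".join(str(cell).replace("|", "\\|") for cell in row) + " |")
    return "\n".join(lines)


print(table_to_markdown("Example table", ["Item", "Value"], [("A", "1"), ("B", "2")]))
```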

In [10]:
report_gen_sllm.metadata.context_window

126000

In [11]:
# Standalone check of the report writer: retrieve whole documents for each
# company and synthesize a structured report with TreeSummarize.
from llama_index.core.response_synthesizers import TreeSummarize

apple_doc = doc_retriever.retrieve("apple balance sheet 2021")
tesla_doc = doc_retriever.retrieve("tesla balance sheet 2021")
docs = apple_doc + tesla_doc
query = "Give me the consolidated balance sheet for Apple and Tesla in 2021"
summarizer = TreeSummarize(llm=report_gen_sllm)

response = summarizer.synthesize(query, nodes=docs)

Num prompt tokens: 131
System prompt tokens: 91
Text chunk tokens: 123783
Full summary messages tokens: 123911
Text chunk tokens: 123815
Full summary messages tokens: 123943
Text chunk tokens: 21185
Full summary messages tokens: 21312
Full messages tokens: 124101
blocks=[TextBlock(text='# Apple Inc. Consolidated Balance Sheet (as of September 25, 2021)')]
<class '__main__.ReportOutput'>
Full messages tokens: 124133
blocks=[TextBlock(text='As of December 31, 2021, the consolidated balance sheet for Tesla is as follows:'), TableBlock(caption='Tesla Consolidated Balance Sheet (in millions)', col_names=['Assets', '2021'], rows=[('Cash and cash equivalents', '$17,576'), ('Restricted cash included in prepaid expenses and other current assets', '345'), ('Restricted cash included in other non-current assets', '223'), ('Total cash and cash equivalents and restricted cash', '$18,144'), ('Accounts receivable, net', '$2,627'), ('Inventory', '$5,757'), ('Solar energy systems, net', '$5,765'), ('Pro
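The log above suggests how `TreeSummarize` handles input that is too large for one call: the retrieved text appears to have been packed into three groups (about 123.8k, 123.8k, and 21.2k tokens), each group was summarized separately, and the partial answers were then combined. The sketch below shows that pack-then-recurse pattern in plain Python. Whitespace word counts stand in for real tokenization and `summarize` stands in for the LLM call (both assumptions), and `pack` and `tree_summarize` are hypothetical helpers rather than LlamaIndex's code.

```python
from typing import Callable, List


def pack(chunks: List[str], budget: int) -> List[str]:
    """Greedily concatenate chunks into groups of at most `budget` words."""
    groups: List[str] = []
    current: List[str] = []
    size = 0
    for chunk in chunks:
        n = len(chunk.split())
        if current and size + n > budget:
            groups.append("\n".join(current))
            current, size = [], 0
        current.append(chunk)
        size += n
    if current:
        groups.append("\n".join(current))
    return groups


def tree_summarize(
    query: str,
    chunks: List[str],
    summarize: Callable[[str, str], str],
    budget: int,
    max_depth: int = 10,
) -> str:
    """Summarize each packed group, then repeat on the summaries until one remains."""
    if not chunks:
        return ""
    for _ in range(max_depth):
        chunks = [summarize(query, group) for group in pack(chunks, budget)]
        if len(chunks) == 1:
            return chunks[0]
    raise RuntimeError("summaries did not converge; increase budget or max_depth")


# Toy summarizer that just reports how many words it was given.
print(tree_summarize("q", ["a b c", "d e", "f g h i"], lambda q, text: f"S({len(text.split())})", budget=5))  # prints S(2)
```

Each level shrinks the text only if summaries are shorter than their inputs, which is why the sketch caps the number of levels.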

In [15]:
response.response.render()

# Apple Inc. Consolidated Balance Sheet (as of September 25, 2021)

|   | Assets | 2021 |
|---|---|---|
| 0 | Cash and cash equivalents | $17,576 |
| 1 | Restricted cash included in prepaid expenses a... | 345 |
| 2 | Restricted cash included in other non-current ... | 223 |
| 3 | Total cash and cash equivalents and restricted... | $18,144 |
| 4 | Accounts receivable, net | $2,627 |
| 5 | Inventory | $5,757 |
| 6 | Solar energy systems, net | $5,765 |
| 7 | Property, plant and equipment, net | $18,884 |
| 8 | Total assets | $24,649 |


The consolidated balance sheet for Apple and Tesla in 2021 is not provided in the available context. However, the context includes various sections of agreements and corporate information related to Tesla, Inc. and its subsidiaries, but does not contain specific financial statements or balance sheets for either company.

|   | Assets | Liabilities | Equity |
|---|---|---|---|
| 0 | Total Assets: $XX | Total Liabilities: $XX | Total Equity: $XX |


The specific financial statements for Apple Inc. are not available in the provided context.

In [None]:
from typing import Any, List

from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.llms.structured_llm import StructuredLLM
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms import ChatMessage
from llama_index.core.tools.types import BaseTool
from llama_index.core.tools import ToolSelection
from llama_index.core.workflow import Workflow, StartEvent, StopEvent, Context, step
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import TreeSummarize, CompactAndRefine
from llama_index.core.workflow import Event


class InputEvent(Event):
    input: List[ChatMessage]


class ChunkRetrievalEvent(Event):
    tool_call: ToolSelection


class DocRetrievalEvent(Event):
    tool_call: ToolSelection


class ReportGenerationEvent(Event):
    pass



class ReportGenerationAgent(Workflow):
    """Report generation agent."""

    def __init__(
        self,
        chunk_retriever_tool: BaseTool,
        doc_retriever_tool: BaseTool,
        llm: FunctionCallingLLM | None = None,
        report_gen_sllm: StructuredLLM | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        self.chunk_retriever_tool = chunk_retriever_tool
        self.doc_retriever_tool = doc_retriever_tool

        self.llm = llm or OpenAI()
        self.summarizer = CompactAndRefine(llm=self.llm)
        assert self.llm.metadata.is_function_calling_model

        self.report_gen_sllm = report_gen_sllm or self.llm.as_structured_llm(
            ReportOutput, system_prompt=report_gen_system_prompt
        )
        # self.report_gen_summarizer = CompactAndRefine(llm=self.report_gen_sllm)
        self.report_gen_summarizer = TreeSummarize(llm=self.report_gen_sllm)

        self.memory = ChatMemoryBuffer.from_defaults(llm=self.llm)
        self.sources = []

    @step(pass_context=True)
    async def prepare_chat_history(self, ctx: Context, ev: StartEvent) -> InputEvent:
        # clear sources
        self.sources = []

        ctx.data["stored_chunks"] = []
        ctx.data["query"] = ev.input

        # get user input
        user_input = ev.input
        user_msg = ChatMessage(role="user", content=user_input)
        self.memory.put(user_msg)

        # get chat history
        chat_history = self.memory.get()
        return InputEvent(input=chat_history)

    @step(pass_context=True)
    async def handle_llm_input(
        self, ctx: Context, ev: InputEvent
    ) -> ChunkRetrievalEvent | DocRetrievalEvent | ReportGenerationEvent | StopEvent:
        chat_history = ev.input

        response = await self.llm.achat_with_tools(
            [self.chunk_retriever_tool, self.doc_retriever_tool],
            chat_history=chat_history,
        )
        self.memory.put(response.message)

        tool_calls = self.llm.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )
        if not tool_calls:
            # all the content should be stored in the context, so just pass along input
            return ReportGenerationEvent(input=ev.input)

        for tool_call in tool_calls:
            if tool_call.tool_name == self.chunk_retriever_tool.metadata.name:
                return ChunkRetrievalEvent(tool_call=tool_call)
            elif tool_call.tool_name == self.doc_retriever_tool.metadata.name:
                return DocRetrievalEvent(tool_call=tool_call)
            else:
                return StopEvent(result={"response": "Invalid tool."})

    @step(pass_context=True)
    async def handle_retrieval(
        self, ctx: Context, ev: ChunkRetrievalEvent | DocRetrievalEvent
    ) -> InputEvent:
        """Handle retrieval.

        Store retrieved chunks, and go back to agent reasoning loop.

        """
        query = ev.tool_call.tool_kwargs["query"]
        if isinstance(ev, ChunkRetrievalEvent):
            retrieved_chunks = self.chunk_retriever_tool(query).raw_output
        else:
            retrieved_chunks = self.doc_retriever_tool(query).raw_output
        ctx.data["stored_chunks"].extend(retrieved_chunks)

        # synthesize an answer given the query to return to the LLM.
        response = self.summarizer.synthesize(query, nodes=retrieved_chunks)
        self.memory.put(
            ChatMessage(
                role="tool",
                content=str(response),
                additional_kwargs={
                    "tool_call_id": ev.tool_call.tool_id,
                    "name": ev.tool_call.tool_name,
                },
            )
        )

        # send input event back with updated chat history
        return InputEvent(input=self.memory.get())

    @step(pass_context=True)
    async def generate_report(
        self, ctx: Context, ev: ReportGenerationEvent
    ) -> StopEvent:
        """Generate report."""
        # given all the pooled context, generate the report
        response = self.report_gen_summarizer.synthesize(
            ctx.data["query"], nodes=ctx.data["stored_chunks"]
        )

        return StopEvent(result={"response": response})

In [None]:
agent = ReportGenerationAgent(
    chunk_retriever_tool,
    doc_retriever_tool,
    llm=llm,
    report_gen_sllm=report_gen_sllm,
    verbose=True,
    timeout=120.0,
)

In [12]:
ret = await agent.run(
    input="Give me the consolidated balance sheet for Apple and Tesla in 2021"
)

Running step prepare_chat_history
Step prepare_chat_history produced event InputEvent
Running step handle_llm_input
Step handle_llm_input produced event DocRetrievalEvent
Running step handle_retrieval
Step handle_retrieval produced event InputEvent
Running step handle_llm_input
Step handle_llm_input produced event DocRetrievalEvent
Running step handle_retrieval
Step handle_retrieval produced event InputEvent
Running step handle_llm_input
Step handle_llm_input produced event ReportGenerationEvent
Running step generate_report


BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, you requested 128740 tokens (126530 in the messages, 162 in the functions, and 2048 in the completion). Please reduce the length of the messages, functions, or completion.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
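The error message itemizes why the request failed: the packed messages alone (126,530 tokens) already exceed the `context_window=126000` configured for `report_gen_llm`, and once the function schema and the 2,048-token completion budget are added, the total passes the model's 128,000-token limit. The arithmetic, using only the figures from the error:

```python
model_limit = 128_000
messages, functions, completion = 126_530, 162, 2_048

requested = messages + functions + completion
print(requested)                # 128740, as reported in the error
print(requested - model_limit)  # 740 tokens over the limit
```

One way to leave headroom, offered as a suggestion rather than a tested fix, is to set `context_window` on `report_gen_llm` noticeably below the model limit. The packed prompt apparently came out larger than the configured window, and the function schema and completion budget also have to fit.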

In [19]:
ret["response"].response.render()

NameError: name 'ret' is not defined

In [None]:
ret = await agent.run(
    input="Tell me about the gross margin breakdown of Apple 2020-2023."
)

In [None]:
ret["response"].response.render()