# Report Generation with LlamaReport

In this notebook, we'll walk through the basic process of generating a report with LlamaReport, and highlight some of the key features of the library.

TLDR:
1. Download source data to use as knowledge base for the report
2. Kick off report generation with a template
3. Get the plan and review/accept/reject suggestions
4. Get the final report
5. Review/accept/reject suggestions to edit the final report
6. Print the final report

In [None]:
%pip install llama-cloud-services

## 1. Download Source Data

Here, we download the `Attention is All You Need` paper as a PDF.

LlamaReport currently supports up to 5 files as input, and essentially any file type that can be parsed by LlamaParse.
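A small pre-flight check can catch an oversized input list before anything is uploaded. The sketch below is a hypothetical helper (`check_input_files` is not part of `llama-cloud-services`). It only enforces the 5-file limit stated above and does not check whether LlamaParse can read a given file type.

```python
from pathlib import Path

MAX_INPUT_FILES = 5  # limit stated in this notebook; the service may change it


def check_input_files(paths, max_files=MAX_INPUT_FILES):
    """Return the paths as Path objects, or raise if the list is empty or too long."""
    files = [Path(p) for p in paths]
    if not files:
        raise ValueError("At least one input file is required")
    if len(files) > max_files:
        raise ValueError(
            f"Got {len(files)} files, but at most {max_files} are supported"
        )
    return files


input_files = check_input_files(["./attention.pdf"])
```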


In [None]:
!wget "https://arxiv.org/pdf/1706.03762.pdf" -O "./attention.pdf"

## 2. Kick off Report Generation

Here, we kick off report generation with a template.

The template can either be a string or a file path, but here we'll use a string.

In our experiments, anything works as a template, but here are some general guidelines:

- Use markdown formatting + instructions in each section to guide the report generation
- If using an existing file as a template, provide extra instructions to guide the report generation
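To make the first guideline concrete, here is a minimal sketch that assembles a markdown template from section headings and per-section instructions. `build_template` is a hypothetical helper written for this notebook, not a library function. The resulting string is the kind of value that could be passed as `template_text`.

```python
def build_template(title_hint, sections):
    """Build a markdown template: a title line, then a `##` heading plus instruction per section."""
    parts = [f"# [{title_hint}]"]
    for heading, instruction in sections:
        parts.append(f"## {heading}\n{instruction}")
    return "\n\n".join(parts) + "\n"


template_text = build_template(
    "Some title",
    [
        ("TLDR", "A quick summary of the paper."),
        ("Details", "More details about the paper, possibly more than one section here."),
    ],
)
print(template_text)
```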

**NOTE:** Since we are in a notebook, we will use async functions and `await` throughout. Each async method has a synchronous counterpart: drop the leading `a` from the method name and remove the `await` keyword (for example, `asuggest_edits` becomes `suggest_edits`).
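Top-level `await` works in a notebook because the notebook already runs an event loop. In a plain `.py` script there is no running loop, so async calls need to be wrapped in a coroutine and started with `asyncio.run`. The sketch below shows that pattern with a stand-in coroutine. `afetch_greeting` is hypothetical and only plays the role of an async client method.

```python
import asyncio


async def afetch_greeting(name):
    """Stand-in coroutine (hypothetical) playing the role of an async client method."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"hello, {name}"


async def main():
    # Inside a coroutine, `await` works the same way it does in a notebook cell.
    return await afetch_greeting("report")


def run_from_script():
    """Entry point for a plain script, where no event loop is running yet."""
    return asyncio.run(main())
```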

In [None]:
from llama_cloud_services import LlamaReport

llama_report = LlamaReport(
    api_key="llx-...",
)

report_client = await llama_report.acreate_report(
    name="my_cool_report_on_attention",
    # can pass in file paths or bytes
    input_files=["./attention.pdf"],
    template_text="""\
# [Some title]\n\n
## TLDR\n
A quick summary of the paper.\n\n
## Details\n
More details about the paper, possibly more than one section here.\n
""",
    # optional additional instructions for the report generation
    # template_instructions=None,
    # optional file path to an existing template instead of template_text
    # template_file=None,
)

The returned `ReportClient` object is used to interact with the report generation process for this specific report.

In [None]:
print(report_client)

Report(id=0a394b33-1a3e-463c-b5cb-7ff8ab827d0a, name=my_cool_report_on_attention)


## 3. Get the plan

The first phases of report generation involve ingesting the source data and generating a plan.

The plan is a list of instructions for the report generation, and can be reviewed/accepted/rejected by the user.


In [None]:
plan = await report_client.await_for_plan(
    timeout=10000,
    poll_interval=10,
)
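The `timeout` and `poll_interval` arguments describe a polling loop: check for a result, sleep, and give up after a deadline. The sketch below is a generic, synchronous illustration of that idea, assuming both values are in seconds. It is not the library's implementation, and `poll_until` is a hypothetical name.

```python
import time


def poll_until(check, timeout, poll_interval, sleep=time.sleep, clock=time.monotonic):
    """Call `check()` until it returns a non-None value, sleeping between attempts.

    Raises TimeoutError if `timeout` seconds pass without a result.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"No result after {timeout} seconds")
        sleep(poll_interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.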

In [None]:
for plan_block in plan.blocks:
    print(plan_block.block.template)
    print(plan_block.queries)
    print("==================")

# {title}
[ReportQuery(field='title', prompt='Generate a clear and concise title for this paper about the Transformer model and attention mechanisms', context='The paper discusses the Transformer architecture for sequence transduction using attention mechanisms, focusing on machine translation applications')]
## TLDR

{tldr_content}
[ReportQuery(field='tldr_content', prompt='Write a brief, clear summary of the key points about the Transformer model', context='Focus on the main innovations: attention mechanisms, efficiency improvements, and state-of-the-art results in machine translation')]
## Details

{details_content}
[ReportQuery(field='details_content', prompt='Provide detailed information about the Transformer model architecture and its applications', context='Include information about:\n- The attention mechanism implementation\n- Advantages over recurrent and convolutional models\n- Performance in machine translation tasks\n- Training efficiency improvements')]
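In the output above, each block template contains `{placeholder}` fields, and each query names the `field` it fills. When editing a plan by hand, it can help to check that every placeholder still has a matching query. The sketch below does this with plain stand-in objects. `Query`, `placeholder_fields`, and `missing_queries` are hypothetical names, and only the `field` attribute is taken from the printed `ReportQuery` objects.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass
class Query:
    """Stand-in for the `ReportQuery` objects printed above (only `field` is used)."""
    field: str


def placeholder_fields(template):
    """Return the `{name}` placeholders that appear in a block template, in order."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def missing_queries(template, queries):
    """Return placeholders that have no query with a matching `field`."""
    covered = {q.field for q in queries}
    return [name for name in placeholder_fields(template) if name not in covered]


print(missing_queries("## Details\n\n{details_content}", [Query("title")]))
```

With a real plan, the same check could be run as `missing_queries(plan_block.block.template, plan_block.queries)` for each block.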


With the plan, we can either use it to kick off generation of the final report, or we can edit the plan and adjust it as needed.

While we could manually edit the objects here and use `await report_client.aupdate_plan(action="edit", updated_plan=plan)`, we can also use `LlamaReport` to agentically edit the plan.

In [None]:
suggestions = await report_client.asuggest_edits(
    "Can you split the details section into two sections?"
)

In [None]:
for suggestion in suggestions:
    print("Justification for change:", suggestion.justification)
    print("Proposed changes:")
    for plan_block in suggestion.blocks:
        print(plan_block.block.template)
        print(plan_block.queries)
        print("==================")

Justification for change: 
I'll help you break down the details section into two distinct parts - one focusing on the architecture and another on the practical applications and performance. This will make the content more organized and easier to follow. The original block at index 2 will be replaced with these two new sections.

Proposed changes:

## Architecture Details

{architecture_content}

[ReportQuery(field='architecture_content', prompt='Describe the technical details of the Transformer model architecture', context='Focus on:\n- Core components of the Transformer architecture\n- Self-attention mechanism implementation\n- Multi-head attention details\n- Position encoding approach\n- Feed-forward network structure')]

## Performance and Applications

{applications_content}

[ReportQuery(field='applications_content', prompt='Explain the practical applications and performance advantages of the Transformer model', context='Cover:\n- Comparison with RNN and CNN models\n- Machine tran

This looks pretty good! We can also use the client to automatically accept and apply, or reject, these suggestions.

This will (locally) keep track of the history of changes, so that future suggestions can be based on the previous changes.

In [None]:
for suggestion in suggestions:
    await report_client.aaccept_edit(suggestion)

What effect did that have on the tracked local history? Let's see!

In [None]:
report_client.edit_history

[EditAction(block_idx=2, old_content='## Details\n\n{details_content}\n\nField: details_content, Prompt: Provide detailed information about the Transformer model architecture and its applications, Context: Include information about:\n- The attention mechanism implementation\n- Advantages over recurrent and convolutional models\n- Performance in machine translation tasks\n- Training efficiency improvements\nDepends on: none', new_content='\n## Architecture Details\n\n{architecture_content}\n\n\nField: architecture_content, Prompt: Describe the technical details of the Transformer model architecture, Context: Focus on:\n- Core components of the Transformer architecture\n- Self-attention mechanism implementation\n- Multi-head attention details\n- Position encoding approach\n- Feed-forward network structure\nDepends on: none', action='approved', timestamp=datetime.datetime(2025, 2, 4, 20, 59, 55, 773558)),
 EditAction(block_idx=3, old_content='[No old content]', new_content='\n## Performan
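The raw `edit_history` output is hard to scan. The sketch below condenses each entry to one line, using only the attribute names visible in the output above (`block_idx`, `old_content`, `new_content`, `action`, `timestamp`). `EditRecord` is a hypothetical stand-in for the real `EditAction` class so that the example runs on its own.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EditRecord:
    """Stand-in with the attribute names visible in the `EditAction` output above."""
    block_idx: int
    old_content: str
    new_content: str
    action: str
    timestamp: datetime


def first_heading(content):
    """Return the first non-empty line of a block's content, or a placeholder."""
    for line in content.splitlines():
        if line.strip():
            return line.strip()
    return "(empty)"


def summarize(history):
    """Format each edit as 'block N: action, old first line -> new first line'."""
    return [
        f"block {e.block_idx}: {e.action}, "
        f"{first_heading(e.old_content)} -> {first_heading(e.new_content)}"
        for e in history
    ]
```

Assuming the real entries expose the same attributes, `summarize(report_client.edit_history)` would give one line per edit.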

In [None]:
report_client.chat_history

[Message(role=<MessageRole.USER: 'user'>, content='Can you split the details section into two sections?', timestamp=datetime.datetime(2025, 2, 4, 20, 59, 47, 754848)),
 Message(role=<MessageRole.ASSISTANT: 'assistant'>, content="\nI'll help you break down the details section into two distinct parts - one focusing on the architecture and another on the practical applications and performance. This will make the content more organized and easier to follow. The original block at index 2 will be replaced with these two new sections.\n", timestamp=datetime.datetime(2025, 2, 4, 20, 59, 55, 482070))]

These two items are used to provide context for future suggestions! You can always clear them, or provide your own history.

In [None]:
# report_client.suggest_edits("....", chat_history=[{"role": "user", "content": "..."}, ...])
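The commented call above passes `chat_history` as a list of `{"role": ..., "content": ...}` dictionaries. Here is a minimal sketch for building such a list from `(role, content)` pairs. `make_chat_history` is a hypothetical helper, and restricting roles to `"user"` and `"assistant"` is an assumption based on the two roles seen in the printed history.

```python
def make_chat_history(turns):
    """Turn (role, content) pairs into a list of {"role", "content"} dictionaries."""
    allowed = {"user", "assistant"}  # assumption: the only roles seen in this notebook
    history = []
    for role, content in turns:
        if role not in allowed:
            raise ValueError(f"Unexpected role: {role!r}")
        history.append({"role": role, "content": content})
    return history


custom_history = make_chat_history(
    [
        ("user", "Can you split the details section into two sections?"),
        ("assistant", "The details block was split into two sections."),
    ]
)
```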

## 4. Get the final report

Now that we have a plan, we can kick off generation of the final report.

In [None]:
# kicks off report generation
await report_client.aupdate_plan(action="approve")

# waits for report generation to complete
report = await report_client.await_completion(
    timeout=10000,
    poll_interval=10,
)

In [None]:
report_text = "\n\n".join([block.template for block in report.blocks])
print(report_text)

# Attention Is All You Need: A Pure Attention-Based Architecture for Neural Machine Translation

## TLDR

The Transformer introduced a revolutionary architecture that relies entirely on attention mechanisms, eliminating the need for recurrence or convolution in sequence processing. Its key innovations include multi-head self-attention for parallel processing of input sequences, scaled dot-product attention for efficient computation, and positional encodings for sequence order awareness. The model achieved breakthrough results in machine translation (28.4 BLEU on English-to-German, 41.8 BLEU on English-to-French) while requiring significantly less training time than previous approaches, training in 3.5 days on 8 GPUs. This architecture demonstrated that attention mechanisms alone are sufficient for state-of-the-art sequence modeling, setting a new direction for natural language processing.


## Architecture Details

The Transformer architecture represents a groundbreaking approach to se

## 5. Edit the final report

Now that we have a report, we can edit it.

We can use the `asuggest_edits` method to get suggestions for edits, and then use the `aaccept_edit`/`areject_edit` methods to apply them.


In [None]:
suggestions = await report_client.asuggest_edits(
    "Can you change the TLDR header to something more professional?"
)
for suggestion in suggestions:
    print("Justification for change:", suggestion.justification)
    print("Proposed changes:")
    for block in suggestion.blocks:
        print(block.template)
        print("==================")

Justification for change: 
I'd suggest changing "TLDR" to "Executive Summary" which is more appropriate for a professional or academic report. This term is widely used in formal documents and better reflects the nature of this concise overview section while maintaining the same function of providing a quick summary of the key points.

Proposed changes:
## Executive Summary

The Transformer introduced a revolutionary architecture that relies entirely on attention mechanisms, eliminating the need for recurrence or convolution in sequence processing. Its key innovations include multi-head self-attention for parallel processing of input sequences, scaled dot-product attention for efficient computation, and positional encodings for sequence order awareness. The model achieved breakthrough results in machine translation (28.4 BLEU on English-to-German, 41.8 BLEU on English-to-French) while requiring significantly less training time than previous approaches, training in 3.5 days on 8 GPUs. Th

Changing to "Executive Summary" sounds reasonable, let's accept that!


In [None]:
for suggestion in suggestions:
    await report_client.aaccept_edit(suggestion)
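The loop above accepts every suggestion. When only some suggestions should be applied, the accept and reject calls can be driven by a predicate. The sketch below assumes `areject_edit` takes the suggestion as its argument, like `aaccept_edit` does. `FakeClient` is a hypothetical stand-in that records calls so the example runs without contacting the service.

```python
import asyncio


async def review_suggestions(client, suggestions, should_accept):
    """Accept suggestions that pass `should_accept`, reject the rest; return the decisions."""
    decisions = []
    for suggestion in suggestions:
        if should_accept(suggestion):
            await client.aaccept_edit(suggestion)
            decisions.append("accepted")
        else:
            await client.areject_edit(suggestion)
            decisions.append("rejected")
    return decisions


class FakeClient:
    """Hypothetical stand-in that records calls instead of contacting the service."""

    def __init__(self):
        self.calls = []

    async def aaccept_edit(self, suggestion):
        self.calls.append(("accept", suggestion))

    async def areject_edit(self, suggestion):
        self.calls.append(("reject", suggestion))


def demo():
    """Run the review loop against the fake client, using strings as suggestions."""
    client = FakeClient()
    decisions = asyncio.run(
        review_suggestions(client, ["Executive Summary", "tl;dr!!"], lambda s: "!" not in s)
    )
    return decisions, client.calls
```

In the notebook itself, the call would be `await review_suggestions(report_client, suggestions, ...)` rather than going through `asyncio.run`.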

## 6. Print the final report

With the edits applied, we can fetch the latest version of the report and print it.

In [None]:
report_response = await report_client.aget()
report_text = "\n\n".join([block.template for block in report_response.report.blocks])
print(report_text)

# Attention Is All You Need: A Pure Attention-Based Architecture for Neural Machine Translation

## Executive Summary

The Transformer introduced a revolutionary architecture that relies entirely on attention mechanisms, eliminating the need for recurrence or convolution in sequence processing. Its key innovations include multi-head self-attention for parallel processing of input sequences, scaled dot-product attention for efficient computation, and positional encodings for sequence order awareness. The model achieved breakthrough results in machine translation (28.4 BLEU on English-to-German, 41.8 BLEU on English-to-French) while requiring significantly less training time than previous approaches, training in 3.5 days on 8 GPUs. This architecture demonstrated that attention mechanisms alone are sufficient for state-of-the-art sequence modeling, setting a new direction for natural language processing.


## Architecture Details

The Transformer architecture represents a groundbreaking a
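Since the report is just markdown text joined from block templates, it is easy to save to disk. The sketch below uses hypothetical helpers (`report_to_markdown`, `save_report`) and stand-in blocks that expose only a `template` attribute, matching how blocks are used in the cell above.

```python
from pathlib import Path
from types import SimpleNamespace


def report_to_markdown(blocks):
    """Join block templates with blank lines, as in the cell above."""
    return "\n\n".join(block.template for block in blocks)


def save_report(blocks, path):
    """Write the joined report to `path` as UTF-8 and return the text that was written."""
    text = report_to_markdown(blocks)
    Path(path).write_text(text, encoding="utf-8")
    return text


# Stand-in blocks for illustration; a real report would supply its own blocks.
demo_blocks = [
    SimpleNamespace(template="# Title"),
    SimpleNamespace(template="## Executive Summary\n\nText."),
]
```

With a real report, `save_report(report_response.report.blocks, "report.md")` would write the same text that the cell above prints.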

We can also see the sources for each block!

In [None]:
for block in report_response.report.blocks:
    # Each block has a list of sources, which are the nodes that were used to generate the block
    for source in block.sources:
        print(source.score)
        print(source.node.text[:100])
        print("==================")

0.99687636
# Abstract

The dominant sequence transduction models are based on complex recurrent or convolutiona
0.99591404
# 2 Background

The goal of reducing sequential computation also forms the foundation of the Extende
0.9951325
# 1 Introduction

Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neu
0.99442345
# 7 Conclusion

In this work, we presented the Transformer, the first sequence transduction model ba
0.9967649
# 3.2.3 Applications of Attention in our Model

The Transformer uses multi-head attention in three d
0.99533635
# 2 Background

The goal of reducing sequential computation also forms the foundation of the Extende
0.9935868
# Abstract

The dominant sequence transduction models are based on complex recurrent or convolutiona
0.98780584
# Outputs

(shifted right)

Figure 1: The Transformer - model architecture.

The Transformer follows
0.9205043
# 3.3 Position-wise Feed-Forward Networks

In addition to attention sub-layers, each of the layer
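The listing above repeats some passages (for example, the abstract and the background section each appear twice with different scores). A short post-processing step can keep the best-scoring copy of each passage and rank the result. The sketch below works on a hypothetical `Source` stand-in holding the two values printed above (`score` and the node text).

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Stand-in for a source entry; the real objects expose `score` and `node.text`."""
    score: float
    text: str


def top_unique_sources(sources, limit=5):
    """Keep the highest-scoring entry per distinct text, sorted by score descending."""
    best = {}
    for src in sources:
        if src.text not in best or src.score > best[src.text].score:
            best[src.text] = src
    return sorted(best.values(), key=lambda s: s.score, reverse=True)[:limit]


sources = [
    Source(0.99687636, "# Abstract"),
    Source(0.99591404, "# 2 Background"),
    Source(0.9935868, "# Abstract"),
    Source(0.99533635, "# 2 Background"),
]
for src in top_unique_sources(sources):
    print(src.score, src.text)
```

To use it on a real report, one option is to build the list with `Source(s.score, s.node.text)` for every `s` in each block's `sources`.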