In [11]:
from llama_index.core import StorageContext, load_index_from_storage
from constants import embed_model

# Load the index that was previously persisted to the "index/" directory.
storage_context = StorageContext.from_defaults(persist_dir="index/")
index = load_index_from_storage(storage_context, embed_model=embed_model)

In [12]:
from llama_index.core.tools import QueryEngineTool
from constants import llm_model

# The LLM is passed through the `llm` keyword; similarity_top_k=5 retrieves
# the five most similar chunks for each query.
query_engine = index.as_query_engine(llm=llm_model, similarity_top_k=5)
rag_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="research_paper_query_engine_tool",
    description="A RAG engine with recent research papers.",
)

In [13]:
from IPython.display import Markdown, display

def display_prompt_dict(prompts_dict):
    for key, prompt in prompts_dict.items():
        display(Markdown(f"**Prompt key**: {key}"))
        print(prompt.get_template())

In [14]:
prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)

**Prompt key**: response_synthesizer:text_qa_template

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 


**Prompt key**: response_synthesizer:refine_template

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 
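
The two templates above suggest a two-stage flow: answer from the first chunk of context, then refine that answer with each further chunk. The block below is a simplified sketch of that idea in plain Python, not LlamaIndex's actual synthesizer. The function name `synthesize` and the `complete` callable (a stand-in for an LLM call) are assumptions made for illustration.

```python
from typing import Callable, List

TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

REFINE_TEMPLATE = (
    "The original query is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer (only if needed) "
    "with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better answer the query. "
    "If the context isn't useful, return the original answer.\n"
    "Refined Answer: "
)


def synthesize(query_str: str, chunks: List[str], complete: Callable[[str], str]) -> str:
    """Answer from the first chunk, then refine with each remaining chunk."""
    if not chunks:
        raise ValueError("at least one context chunk is required")
    # First pass: plain question answering over the first chunk.
    answer = complete(
        TEXT_QA_TEMPLATE.format(context_str=chunks[0], query_str=query_str)
    )
    # Later passes: show the model its previous answer plus one more chunk.
    for chunk in chunks[1:]:
        answer = complete(
            REFINE_TEMPLATE.format(
                query_str=query_str, existing_answer=answer, context_msg=chunk
            )
        )
    return answer
```

With `similarity_top_k=5`, a flow like this could issue up to five LLM calls per query if each retrieved chunk were handled separately; packing several chunks into one prompt would reduce that number.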


In [15]:
from tools import download_pdf, fetch_arxiv_papers
from llama_index.core.tools import FunctionTool

download_pdf_tool = FunctionTool.from_defaults(
    download_pdf,
    name="download_pdf_file_tool",
    description="Downloads a PDF file from the given URL and saves it locally."
)

fetch_arxiv_tool = FunctionTool.from_defaults(
    fetch_arxiv_papers,
    name="fetch_from_arxiv",
    description="Fetches the max_results most recent papers about the topic given in title from arXiv."
)
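
The `tools` module is not shown in this notebook. As a rough sketch of what `fetch_arxiv_papers` might do, the block below queries the public arXiv API and parses its Atom feed with the standard library. The parameter names `title` and `max_results` are taken from the tool description above; the helper `parse_arxiv_feed`, the returned dictionary keys, and the default of five results are assumptions.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by the Atom feed
ARXIV_API = "http://export.arxiv.org/api/query"


def parse_arxiv_feed(xml_data: Union[str, bytes]) -> List[Dict[str, object]]:
    """Extract title, summary, authors and PDF link from an arXiv Atom feed."""
    papers = []
    for entry in ET.fromstring(xml_data).iter(f"{ATOM}entry"):
        # The PDF link is the <link> element whose title attribute is "pdf".
        pdf_url = next(
            (
                link.get("href")
                for link in entry.findall(f"{ATOM}link")
                if link.get("title") == "pdf"
            ),
            None,
        )
        papers.append(
            {
                # Collapse the line breaks and indentation found in feed text.
                "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
                "summary": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
                "authors": [
                    author.findtext(f"{ATOM}name", "")
                    for author in entry.findall(f"{ATOM}author")
                ],
                "pdf_url": pdf_url,
            }
        )
    return papers


def fetch_arxiv_papers(title: str, max_results: int = 5) -> List[Dict[str, object]]:
    """Return the most recently submitted papers matching `title` (hypothetical version)."""
    query = urllib.parse.urlencode(
        {
            "search_query": f"all:{title}",
            "sortBy": "submittedDate",
            "sortOrder": "descending",
            "max_results": max_results,
        }
    )
    with urllib.request.urlopen(f"{ARXIV_API}?{query}", timeout=30) as response:
        return parse_arxiv_feed(response.read())
```

Keeping the parsing separate from the network call makes the parser easy to check offline against a saved feed.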

In [16]:
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    [rag_tool, download_pdf_tool, fetch_arxiv_tool],
    llm=llm_model,
    verbose=True,  # print each Thought / Action / Observation step
)
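
A ReAct agent alternates between model output ("Thought", "Action", "Action Input") and tool results ("Observation") until the model produces a final answer. The block below is a toy sketch of that loop in plain Python, under the assumption that tool arguments arrive as JSON. It is not how `ReActAgent` is implemented, and the names `react_loop`, `llm`, and `tools` are hypothetical.

```python
import json
import re
from typing import Callable, Dict

# Matches a tool call of the form "Action: <name>\nAction Input: {...}".
ACTION_RE = re.compile(r"Action:\s*(\w+)\s*\nAction Input:\s*(\{.*\})", re.DOTALL)


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[..., str]],
    max_steps: int = 5,
) -> str:
    """Alternate between model output and tool calls until a final answer appears."""
    transcript = question
    for _ in range(max_steps):
        output = llm(transcript)
        match = ACTION_RE.search(output)
        if match is None:
            # No tool call requested: the text after "Answer:" is the final response.
            return output.split("Answer:", 1)[-1].strip()
        name, raw_args = match.group(1), match.group(2)
        if name in tools:
            observation = tools[name](**json.loads(raw_args))
        else:
            observation = f"Error: unknown tool '{name}'"
        # Feed the observation back so the next step can reason over it.
        transcript += f"\n{output}\nObservation: {observation}"
    return "Stopped: step limit reached without a final answer."
```

The step limit is a design choice that keeps a confused model from calling tools indefinitely.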

In [17]:
query_template = """I am interested in {topic}
Find papers in your knowledge database related to this topic.
Use the following template to query research_paper_query_engine_tool tool: 'Provide title, summary, authors and link to download for papers related to {topic}'.
If there are none, could you fetch recent ones from arXiv?
IMPORTANT: do not download papers unless the user asks for it explicitly.
"""

In [18]:
answer = agent.chat(query_template.format(topic="Multi-Modal Models"))

> Running step e48990a3-907a-4321-a208-3e5d39099b77. Step input: I am interested in Multi-Modal Models
Find papers in your knowledge database related to this topic.
Use the following template to query research_paper_query_engine_tool tool: 'Provide title, summary, authors and link to download for papers related to Multi-Modal Models'.
If there are none, could you fetch recent ones from arXiv?
IMPORTANT: do not download papers unless the user asks for it explicitly.

Thought: The current language of the user is: English. I need to use a tool to help me find papers related to Multi-Modal Models.
Action: research_paper_query_engine_tool
Action Input: {'input': 'Provide title, summary, authors and link to download for papers related to Multi-Modal Models'}
Observation: Title: OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation
Authors: Size Wu, Zhonghua Wu, Zerui Gong, Qingyi Tao, Sheng Jin, Qinyue Li, Wei Li, Chen Change Loy
Summary

In [19]:
Markdown(answer.response)

Here are some recent papers related to Multi-Modal Models:

1. **Title:** OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation  
   **Authors:** Size Wu, Zhonghua Wu, Zerui Gong, Qingyi Tao, Sheng Jin, Qinyue Li, Wei Li, Chen Change Loy  
   **Summary:** In this report, we present OpenUni, a simple, lightweight, and fully open-source baseline for unifying multimodal understanding and generation. Inspired by prevailing practices in unified model learning, we adopt an efficient training strategy that minimizes the training complexity and overhead by bridging the off-the-shelf multimodal large language models (LLMs) and diffusion models through a set of learnable queries and a light-weight transformer-based connector.  
   **PDF URL:** [Download PDF](http://arxiv.org/pdf/2505.23661v1)

2. **Title:** VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos  
   **Authors:** Tingyu Song, Tongyan Hu, Guo Gan, Yilun Zhao  
   **Summary:** MLLMs have been widely studied for video question answering recently. However, most existing assessments focus on natural videos, overlooking synthetic videos, such as AI-generated content (AIGC). Meanwhile, some works in video generation rely on MLLMs to evaluate the quality of generated videos, but the capabilities of MLLMs on interpreting AIGC videos remain largely underexplored. To address this, we propose a new benchmark, VF-Eval, which introduces four tasks-coherence validation, error awareness, error type detection, and reasoning evaluation-to comprehensively evaluate the abilities of MLLMs on AIGC videos.  
   **PDF URL:** [Download PDF](http://arxiv.org/pdf/2505.23693v1)

3. **Title:** Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence  
   **Authors:** Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan  
   **Summary:** Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced performance on 2D visual tasks. However, improving their spatial intelligence remains a challenge. Existing 3D MLLMs always rely on additional 3D or 2.5D data to incorporate spatial awareness, restricting their utility in scenarios with only 2D inputs, such as images or videos. In this paper, we present Spatial-MLLM, a novel framework for visual-based spatial reasoning from purely 2D observations.  
   **PDF URL:** [Download PDF](http://arxiv.org/pdf/2505.23747v1)

If you would like to download any of these papers, please let me know!

In [20]:
answer = agent.chat("""Download the following papers:
                    1. Process one paper at a time.
                    2. State which paper number you are processing.
                    3. Complete a full download cycle before moving to the next paper.
                    4. Explicitly state when you move to the next paper.
                    5. Provide a final summary only after all papers are downloaded.
                    """)

> Running step e860cfd6-0392-410a-86ca-4ffec3522a64. Step input: Download the following papers:
                    1. Process one paper at a time.
                    2. State which paper number you are processing.
                    3. Complete a full download cycle before moving to the next paper.
                    4. Explicitly state when you move to the next paper.
                    5. Provide a final summary only after all papers are downloaded.
                    
Thought: I will start downloading the first paper related to Multi-Modal Models.
Action: download_pdf_file_tool
Action Input: {'pdf_url': 'http://arxiv.org/pdf/2505.23661v1', 'output_file_name': 'OpenUni.pdf'}
Observation: PDF downloaded successfully and saved as 'papers/OpenUni.pdf'.
> Running step 7c43df4d-ea71-4ab6-8cac-dd18123f5dda. Step input: None
Thought: I have successfully downloaded the first paper titled "OpenUni: A Simple Baseline for Unified Multi

In [21]:
Markdown(answer.response)

1. **OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation**  
   - **Authors:** Size Wu, Zhonghua Wu, Zerui Gong, Qingyi Tao, Sheng Jin, Qinyue Li, Wei Li, Chen Change Loy  
   - **Summary:** This paper presents OpenUni, a lightweight and open-source baseline for unifying multimodal understanding and generation, utilizing a training strategy that bridges multimodal large language models and diffusion models.

2. **VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos**  
   - **Authors:** Tingyu Song, Tongyan Hu, Guo Gan, Yilun Zhao  
   - **Summary:** This work introduces VF-Eval, a benchmark for evaluating multimodal large language models on AI-generated content videos, focusing on tasks like coherence validation and reasoning evaluation.

3. **Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence**  
   - **Authors:** Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan  
   - **Summary:** The paper presents Spatial-MLLM, a framework for visual-based spatial reasoning from 2D observations, addressing the challenge of improving spatial intelligence in multimodal large language models.

All papers have been successfully downloaded and summarized. If you need further assistance, feel free to ask!
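
The `Action Input` and `Observation` lines in the download trace show that `download_pdf` accepts `pdf_url` and `output_file_name` and returns a status string that the agent reads as its observation. The block below is a minimal sketch of a function with that shape, using only the standard library. The `output_dir` parameter, the timeout, and the error message are assumptions, since the real `tools.download_pdf` is not shown here.

```python
import os
import urllib.request


def download_pdf(pdf_url: str, output_file_name: str, output_dir: str = "papers") -> str:
    """Download a PDF and return a status message the agent can read."""
    os.makedirs(output_dir, exist_ok=True)
    file_path = os.path.join(output_dir, output_file_name)
    try:
        with urllib.request.urlopen(pdf_url, timeout=30) as response:
            data = response.read()
        with open(file_path, "wb") as pdf_file:
            pdf_file.write(data)
    except (OSError, ValueError) as exc:
        # URLError is a subclass of OSError; ValueError covers malformed URLs.
        return f"Failed to download PDF: {exc}"
    return f"PDF downloaded successfully and saved as '{file_path}'."
```

Returning a message instead of raising lets the agent see the failure as an observation and decide how to proceed.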

In [22]:
answer = agent.chat(query_template.format(topic="The history of Soccer"))

> Running step 08b61ef7-9132-4440-8c30-ebe63f397cf1. Step input: I am interested in The history of Soccer
Find papers in your knowledge database related to this topic.
Use the following template to query research_paper_query_engine_tool tool: 'Provide title, summary, authors and link to download for papers related to The history of Soccer'.
If there are none, could you fetch recent ones from arXiv?
IMPORTANT: do not download papers unless the user asks for it explicitly.

Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: research_paper_query_engine_tool
Action Input: {'input': 'Provide title, summary, authors and link to download for papers related to The history of Soccer'}
Observation: Title: Not Found
Authors: Not Found
Summary: Not Found
PDF URL: Not Found
> Running step 725db800-42ed-44cd-891d-c75b00360d2b. Step input: None
Thought: It seems there are no papers availa

In [23]:
Markdown(answer.response)

Unfortunately, I could not find any papers related to the history of soccer. If you have any other topics in mind or need assistance with something else, please let me know!
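
The prompt leaves the decision to fall back on arXiv to the model, and the final response here only reports that nothing was found. One option is to make that fallback deterministic in ordinary code. The block below is a sketch of that idea, assuming the query engine signals a miss with the text "Not Found" as in the trace above. `find_papers`, `query_local`, and `fetch_remote` are hypothetical names, not part of the notebook.

```python
from typing import Callable, List

NOT_FOUND = "Not Found"  # marker the query engine returned in the trace above


def find_papers(
    topic: str,
    query_local: Callable[[str], str],
    fetch_remote: Callable[[str, int], List[str]],
    max_results: int = 3,
) -> List[str]:
    """Return local results if there are any, otherwise fall back to the remote source."""
    local_answer = query_local(
        "Provide title, summary, authors and link to download "
        f"for papers related to {topic}"
    )
    if NOT_FOUND not in local_answer:
        return [local_answer]
    # Nothing in the local index: fetch recent papers instead.
    return fetch_remote(topic, max_results)
```

Here `query_local` could wrap the query engine and `fetch_remote` could wrap the arXiv tool, so the language model is only asked to summarize results, not to choose the route.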