In [3]:
from llama_index.core import StorageContext, load_index_from_storage
from constants import embed_model

storage_context = StorageContext.from_defaults(persist_dir="index/")
index = load_index_from_storage(storage_context, embed_model=embed_model)

In [4]:
from llama_index.core.tools import QueryEngineTool
from constants import llm_model

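# as_query_engine takes the model through the `llm` keyword; an unrecognized
# keyword such as `llm_model` would not select the intended model.
query_engine = index.as_query_engine(llm=llm_model, similarity_top_k=5)
rag_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="research_paper_query_engine_tool",
    description="A RAG engine with recent research papers.",
)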

In [5]:
from IPython.display import Markdown, display

def display_prompt_dict(prompts_dict):
    for key, prompt in prompts_dict.items():
        display(Markdown(f"**Prompt key**: {key}"))
        print(prompt.get_template())

In [6]:
prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)

**Prompt key**: response_synthesizer:text_qa_template

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 


**Prompt key**: response_synthesizer:refine_template

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 
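
The two templates above suggest a two-phase flow: answer from the first retrieved chunk, then revise that answer as each further chunk arrives. The following is a simplified sketch of that idea, not the LlamaIndex implementation (which, among other things, may pack several chunks into one prompt). The names `refine_answer` and `fake_llm` are hypothetical, and the stand-in LLM only records prompts and returns a canned string.

```python
from typing import Callable, List

# Templates copied from the prompt dump above.
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
REFINE_TEMPLATE = (
    "The original query is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer (only if needed) "
    "with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better answer the query. "
    "If the context isn't useful, return the original answer.\n"
    "Refined Answer: "
)


def refine_answer(query: str, chunks: List[str], complete: Callable[[str], str]) -> str:
    """Answer from the first chunk, then refine once per remaining chunk."""
    if not chunks:
        raise ValueError("at least one context chunk is required")
    # Phase 1: initial answer from the first chunk only.
    answer = complete(TEXT_QA_TEMPLATE.format(context_str=chunks[0], query_str=query))
    # Phase 2: feed the running answer plus one new chunk into the refine prompt.
    for chunk in chunks[1:]:
        answer = complete(
            REFINE_TEMPLATE.format(
                query_str=query, existing_answer=answer, context_msg=chunk
            )
        )
    return answer


# Hypothetical stand-in for an LLM call: records each prompt, returns a canned reply.
seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"answer v{len(seen_prompts)}"


final = refine_answer("What is Spatial-MLLM?", ["chunk A", "chunk B", "chunk C"], fake_llm)
print(final)  # answer v3
```

With three chunks the stand-in model is called three times: once with the QA template and twice with the refine template, each refine prompt carrying the previous answer.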


In [9]:
from tools import download_pdf, fetch_arxiv_papers
from llama_index.core.tools import FunctionTool

download_pdf_tool = FunctionTool.from_defaults(
    download_pdf,
    name="download_pdf_file_tool",
    description="Downloads a PDF file from the given link."
)

fetch_arxiv_tool = FunctionTool.from_defaults(
    fetch_arxiv_papers,
    name="fetch_from_arxiv",
    description="Fetches the `max_results` most recent arXiv papers on the topic given by `title`."
)
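
`download_pdf` and `fetch_arxiv_papers` come from a local `tools` module that is not shown in this notebook. The block below is a rough guess at what such helpers might look like, using only the standard library and the public arXiv Atom API. The signatures (`title`, `max_results`, `pdf_url`, `output_file`), the returned dictionary keys, and the helper names `build_arxiv_url` and `parse_arxiv_feed` are all assumptions, not the author's code.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_url(title: str, max_results: int) -> str:
    """Build a query URL asking for the newest submissions matching `title`."""
    params = {
        "search_query": f'all:"{title}"',
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_arxiv_feed(xml_text: str) -> List[Dict[str, object]]:
    """Extract title, summary, authors and PDF link from an Atom feed string."""
    papers = []
    for entry in ET.fromstring(xml_text).findall("atom:entry", ATOM):
        pdf_links = [
            link.get("href")
            for link in entry.findall("atom:link", ATOM)
            if link.get("type") == "application/pdf"
        ]
        papers.append(
            {
                # Collapse the line breaks and repeated spaces arXiv puts in titles.
                "title": " ".join(
                    entry.findtext("atom:title", default="", namespaces=ATOM).split()
                ),
                "summary": entry.findtext(
                    "atom:summary", default="", namespaces=ATOM
                ).strip(),
                "authors": [
                    author.findtext("atom:name", default="", namespaces=ATOM)
                    for author in entry.findall("atom:author", ATOM)
                ],
                "pdf_url": pdf_links[0] if pdf_links else None,
            }
        )
    return papers


def fetch_arxiv_papers(title: str, max_results: int = 5) -> List[Dict[str, object]]:
    """Hypothetical version of the tool function: query arXiv and parse the feed."""
    with urllib.request.urlopen(build_arxiv_url(title, max_results), timeout=30) as resp:
        return parse_arxiv_feed(resp.read().decode("utf-8"))


def download_pdf(pdf_url: str, output_file: str) -> str:
    """Hypothetical version of the tool function: save the PDF and return its path."""
    urllib.request.urlretrieve(pdf_url, output_file)
    return output_file
```

Splitting URL construction and feed parsing from the network call keeps the parsing logic checkable offline. Whatever the real helpers look like, `FunctionTool.from_defaults` exposes the function signature and docstring to the agent, so clear parameter names and type hints help the LLM call the tool correctly.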

In [10]:
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools([rag_tool, download_pdf_tool, fetch_arxiv_tool], llm=llm_model, verbose=True)
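
A ReAct agent such as the one created above alternates between a model-written `Thought`/`Action`/`Action Input` step and a tool-produced `Observation`, until the model emits a final answer. The toy loop below sketches that control flow with a scripted stand-in for the LLM and a stub tool. It is an illustration only, not how `ReActAgent` is implemented, and `react_loop`, `scripted_llm`, and `fake_rag_tool` are hypothetical names.

```python
import json
import re
from typing import Callable, Dict, List

# Matches a tool call of the form "Action: <name>\nAction Input: {json}".
ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>\S+)\s*\nAction Input:\s*(?P<args>\{.*\})", re.DOTALL
)


def react_loop(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[..., str]],
    max_steps: int = 5,
) -> str:
    """Run Thought -> Action -> Observation steps until the model answers."""
    transcript: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm("\n".join(transcript))
        transcript.append(reply)
        match = ACTION_RE.search(reply)
        if match is None:
            # No tool call: treat the text after "Answer:" as the final answer.
            return reply.split("Answer:", 1)[-1].strip()
        # Dispatch to the named tool with the JSON arguments the model supplied.
        tool = tools[match.group("tool")]
        observation = tool(**json.loads(match.group("args")))
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("no final answer within max_steps")


# Scripted stand-in for the LLM: first asks for a tool, then answers.
scripted_replies = iter(
    [
        "Thought: I need to use a tool to help me answer the question.\n"
        "Action: research_paper_query_engine_tool\n"
        'Action Input: {"input": "papers related to Multi-Modal Models"}',
        "Thought: I can answer without using any more tools.\n"
        "Answer: Found one paper.",
    ]
)
tool_calls: List[str] = []


def scripted_llm(prompt: str) -> str:
    return next(scripted_replies)


def fake_rag_tool(input: str) -> str:  # parameter name mirrors the tool's 'input' key
    tool_calls.append(input)
    return f"(stub result for: {input})"


result = react_loop(
    "Find papers on Multi-Modal Models",
    scripted_llm,
    {"research_paper_query_engine_tool": fake_rag_tool},
)
print(result)  # Found one paper.
```

The `max_steps` cap is a design choice worth keeping in any such loop: without it, a model that never produces a final answer would call tools indefinitely.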

In [11]:
query_template = """I am interested in {topic}.
Find papers in your knowledge database related to this topic.
Use the following template to query the research_paper_query_engine_tool tool: 'Provide title, summary, authors and link to download for papers related to {topic}'.
If there are none, fetch the most recent ones from arXiv.
"""

In [12]:
answer = agent.chat(query_template.format(topic="Multi-Modal Models"))

> Running step facf5855-34b0-44f5-816d-b3a3076ae88d. Step input: I am interested in Multi-Modal Models.
Find papers in your knowledge database related to this topic.
Use the following template to query the research_paper_query_engine_tool tool: 'Provide title, summary, authors and link to download for papers related to Multi-Modal Models'.
If there are none, fetch the most recent ones from arXiv.

Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: research_paper_query_engine_tool
Action Input: {'input': 'Provide title, summary, authors and link to download for papers related to Multi-Modal Models'}
Observation: Title: Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Authors: Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan
Summary: Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced performance on 2D visual tasks. However, impr

In [13]:
Markdown(answer.response)

Here are some recent papers related to Multi-Modal Models:

1. **Title**: Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence  
   **Authors**: Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yueqi Duan  
   **Summary**: This paper presents Spatial-MLLM, a novel framework for visual-based spatial reasoning from purely 2D observations. It proposes a dual-encoder architecture to enhance spatial understanding and introduces a space-aware frame sampling strategy for improved performance in visual-based spatial reasoning tasks.  
   **Link to Download**: [PDF](http://arxiv.org/pdf/2505.23747v1)  
   **Project Page**: [Spatial-MLLM Project](https://diankun-wu.github.io/Spatial-MLLM/)

2. **Title**: To Trust Or Not To Trust Your Vision-Language Model's Prediction  
   **Authors**: Hao Dong, Moru Liu, Jian Liang, Eleni Chatzi, Olga Fink  
   **Summary**: This work introduces TrustVLM, a framework designed to estimate the trustworthiness of predictions made by Vision-Language Models (VLMs). It proposes a confidence-scoring function to improve misclassification detection and demonstrates significant performance improvements across various datasets.  
   **Link to Download**: [PDF](http://arxiv.org/pdf/2505.23745v1)  

These papers explore advancements in Multi-Modal Models and their applications in visual and language understanding.