# 📊 Multimodal Financial Report RAG Assistant (ColPali + Qwen-VL)
This notebook demonstrates how to build a RAG system capable of "seeing" financial charts and tables:
1. **Visual Indexing**: Using the ColPali model to convert PDF pages directly into visual embeddings.
2. **Multi-page Retrieval**: Retrieving Top-K raw page screenshots based on user queries.
3. **Intelligent Analysis**: Sending multiple screenshots to Qwen2.5-VL-72B for deep financial analysis.

### 1. Environment Preparation
Install `byaldi` (a wrapper for ColPali) and related dependencies.

In [None]:
%pip install byaldi openai tqdm
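
byaldi relies on `pdf2image` to rasterize PDF pages, and that library needs the Poppler command-line tools on the system. A minimal pre-flight check is sketched below, assuming `pdftoppm` and `pdfinfo` are the binaries to look for; the helper name `missing_tools` is illustrative and not part of any library.

```python
import shutil

def missing_tools(tools=("pdftoppm", "pdfinfo")):
    """Return the names in `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]

missing = missing_tools()
if missing:
    # Typical fixes: `apt-get install poppler-utils` or `brew install poppler`.
    print(f"Missing Poppler tools: {missing}")
else:
    print("Poppler tools found")
```

Running this before indexing surfaces a missing system dependency early, rather than partway through PDF conversion.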

### 2. Phase 1: Building the Visual Index
Use the ColPali model to index the PDF. The advantage of ColPali is that it bypasses traditional OCR, directly understanding page layouts, charts, and tables.
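
To give some intuition for how page embeddings are matched to a query: ColPali represents each page as many patch vectors and scores it with ColBERT-style late interaction (MaxSim). The sketch below illustrates that scoring rule on tiny hand-written vectors; the vectors and function name are hypothetical, and byaldi performs the real computation internally on model outputs.

```python
def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query vector, take its best
    dot product over all page vectors, then sum those maxima."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)

# Toy 2-dimensional embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
page_b = [[0.1, 0.1], [0.3, 0.2]]

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Because every query token is matched against every image patch, a small region such as one table cell or chart label can drive the score without any OCR step.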

In [None]:
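import os

# Optional environment configuration. Set these before importing byaldi,
# because huggingface_hub reads them at import time.
# Offline mode uses only local files; the mirror endpoint only matters when offline mode is off.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

from byaldi import RAGMultiModalModel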

MODEL_PATH = "/home/xuxin123/book/project_5_rag/models/colpali-v1_2-merged"
PDF_PATH = "../data/annual_report_2024_cn.pdf"
INDEX_NAME = "finance_report_2024"

def build_visual_index():
    if not os.path.exists(MODEL_PATH):
        print("❌ Model folder not found. Please verify the path.")
        return

    # Load the model. Note: check your installed byaldi version before passing
    # quantization options such as load_in_4bit; not every release accepts them.
    RAG = RAGMultiModalModel.from_pretrained(MODEL_PATH, verbose=1)

    print(f"📖 Building visual index for {PDF_PATH}...")
    RAG.index(
        input_path=PDF_PATH,
        index_name=INDEX_NAME,
        store_collection_with_index=True,
        overwrite=True
    )
    print(f"✅ Index saved to: .byaldi/{INDEX_NAME}")

if os.path.exists(PDF_PATH):
    build_visual_index()
else:
    print("❌ PDF file not found. Please check the path.")

### 3. Phase 2: Multimodal Chat & Multi-page Augmented Retrieval
Configure the LLM client and implement Top-K retrieval logic to handle cross-page financial analysis.

In [None]:
from openai import OpenAI
import base64

# --- Configuration ---
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.siliconflow.cn/v1"
MODEL_NAME = "Qwen/Qwen2.5-VL-72B-Instruct"
RETRIEVAL_K = 4 # Retrieve Top 4 pages to mitigate noise from TOC or cover pages

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

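# Load the index built in Phase 1
try:
    RAG = RAGMultiModalModel.from_index(INDEX_NAME)
    print("✅ Retriever ready")
except Exception as exc:  # a bare `except:` would also swallow KeyboardInterrupt
    print(f"❌ Could not load the index ({exc}). Please run the previous phase to build the index first")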

def ask_finance_helper(query):
    # 1. Visual Retrieval
    results = RAG.search(query, k=RETRIEVAL_K)
    
    # 2. Construct Multi-image Payload
    messages_content = [
        {
            "type": "text", 
            "text": f"You are a professional CFO. Based on the following {len(results)} screenshots, answer this query: {query}. Please ignore the Table of Contents if present."
        }
    ]
    
    print(f"🔍 Hit Page Numbers: {[res.page_num for res in results]}")
    
    for res in results:
        messages_content.append({
            "type": "image_url",
            # byaldi stores page images as PNG; adjust the MIME type if your index uses another format
            "image_url": {"url": f"data:image/png;base64,{res.base64}", "detail": "high"}
        })
        
    # 3. Cloud Inference
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": messages_content}],
        temperature=0.1
    )
    return response.choices[0].message.content
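
The payload construction above can be factored into a small, testable helper. The sketch below is one possible variant, not the notebook's method: it drops pages whose retrieval score falls well below the best hit and sends the rest in document order. The function name, the `min_score_ratio` threshold, and the `mime` default are all assumptions, and the fake results merely stand in for objects exposing `page_num`, `score`, and `base64`.

```python
from types import SimpleNamespace

def build_payload(query, results, min_score_ratio=0.5, mime="image/png"):
    """Build OpenAI-style multimodal content from retrieval results.

    min_score_ratio is a hypothetical knob: pages scoring below that
    fraction of the best score are dropped (assumes positive scores).
    """
    if not results:
        return [{"type": "text", "text": query}]
    best = max(r.score for r in results)
    kept = [r for r in results if r.score >= best * min_score_ratio]
    kept.sort(key=lambda r: r.page_num)  # present pages in document order
    content = [{
        "type": "text",
        "text": f"Based on the following {len(kept)} screenshots, "
                f"answer this query: {query}",
    }]
    for r in kept:
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{r.base64}"},
        })
    return content

# Stand-in results with made-up scores and placeholder image data.
fake_results = [
    SimpleNamespace(page_num=12, score=20.0, base64="AAAA"),
    SimpleNamespace(page_num=3, score=18.0, base64="BBBB"),
    SimpleNamespace(page_num=1, score=6.0, base64="CCCC"),
]
content = build_payload("What is the revenue growth rate?", fake_results)
print(len(content) - 1, "pages would be sent")
```

Keeping this logic separate from the API call makes it possible to check which pages would be sent without spending tokens on a cloud request.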

### 4. Running a Test Query

In [None]:
test_query = "What is the revenue growth rate for 2024? Please explain based on the profit and loss statement charts."
answer = ask_finance_helper(test_query)
print("\n🤖 CFO Assistant Answer:\n", answer)
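
Since the test query asks for a growth rate, it can help to cross-check the model's arithmetic against figures read off the retrieved pages. A minimal sketch follows; the revenue numbers are made-up placeholders, not values from any report, and `growth_rate` is an illustrative helper.

```python
def growth_rate(current, previous):
    """Year-over-year growth: (current - previous) / previous."""
    if previous == 0:
        raise ValueError("previous-period value must be non-zero")
    return (current - previous) / previous

# Placeholder figures; substitute the values shown on the retrieved pages.
revenue_2023 = 1000.0
revenue_2024 = 1150.0

rate = growth_rate(revenue_2024, revenue_2023)
print(f"Revenue growth: {rate:.1%}")
```

If the model's stated percentage disagrees with this calculation, the retrieved pages are worth inspecting manually before the answer is trusted.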