# Multimodal RAG with RAG-Anything and Ollama

This notebook demonstrates how to build a multimodal Retrieval-Augmented Generation (RAG) system using the RAG-Anything library integrated with local Ollama models for chat, embedding, and vision capabilities.

In [None]:
!pip install 'raganything[all]'

## Prerequisites

- Ollama installed and running locally (download from https://ollama.ai)
- Pull required models:
  - `ollama pull llama3.2` (for text generation)
  - `ollama pull qwen3-vl:8b` (for vision tasks)
  - `ollama pull nomic-embed-text` (for embeddings)
- Python 3.8+ with asyncio support
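
Before running the remaining cells, it can help to confirm that the Ollama server is reachable and that the models have been pulled. The sketch below assumes Ollama's `/api/tags` endpoint, which lists locally available models; the helper names `installed_models` and `missing_models` are illustrative and not part of any library. It uses only the standard library.

```python
import json
import urllib.request
from typing import List


def installed_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Return the names of locally available models, e.g. 'llama3.2:latest'."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]


def missing_models(required: List[str], installed: List[str]) -> List[str]:
    """Return the required models that have no installed match.

    A bare name such as 'llama3.2' matches any installed tag of that model;
    a tagged name such as 'qwen3-vl:8b' must match exactly.
    """
    def is_present(name: str) -> bool:
        if ":" in name:
            return name in installed
        return any(entry.split(":")[0] == name for entry in installed)

    return [name for name in required if not is_present(name)]
```

Calling `missing_models(["llama3.2", "nomic-embed-text"], installed_models())` returns the names that still need an `ollama pull`; add your vision model to the list as well. If the server is not running, `urlopen` raises `urllib.error.URLError`.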

In [None]:
import asyncio
import requests
import json
from typing import List, Dict, Any
from raganything import RAGAnything, RAGAnythingConfig

## Ollama Utility Functions

These functions handle communication with the local Ollama server for:
- Text chat generation
- Text embedding
- Vision-based image analysis

In [None]:
import base64

OLLAMA_BASE_URL = "http://localhost:11434"

def _post_json(endpoint: str, payload: Dict[str, Any], timeout: int) -> Dict[str, Any]:
    """Blocking POST to the Ollama server; returns the decoded JSON body."""
    response = requests.post(f"{OLLAMA_BASE_URL}{endpoint}", json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()

async def _post_json_async(endpoint: str, payload: Dict[str, Any], timeout: int) -> Dict[str, Any]:
    """Run the blocking request in a worker thread so the event loop is not stalled."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _post_json, endpoint, payload, timeout)

async def chat_with_ollama(prompt: str, model: str = "llama3.2") -> str:
    """Generate a text response with an Ollama chat model."""
    try:
        data = await _post_json_async(
            "/api/generate", {"model": model, "prompt": prompt, "stream": False}, timeout=30
        )
        return data["response"]
    except requests.RequestException as e:
        raise RuntimeError(f"Ollama chat error: {e}") from e

async def embed_with_ollama(text: str, model: str = "nomic-embed-text") -> List[float]:
    """Generate an embedding vector for text with an Ollama embedding model."""
    try:
        data = await _post_json_async(
            "/api/embeddings", {"model": model, "prompt": text}, timeout=30
        )
        return data["embedding"]
    except requests.RequestException as e:
        raise RuntimeError(f"Ollama embedding error: {e}") from e

async def vision_with_ollama(image_path: str, prompt: str, model: str = "qwen3-vl:8b") -> str:
    """Analyze an image with an Ollama vision model."""
    try:
        with open(image_path, "rb") as img_file:
            # Ollama expects images as base64-encoded strings, not hex
            image_b64 = base64.b64encode(img_file.read()).decode("ascii")

        data = await _post_json_async(
            "/api/generate",
            {"model": model, "prompt": prompt, "images": [image_b64], "stream": False},
            timeout=60,
        )
        return data["response"]
    except FileNotFoundError as e:
        raise FileNotFoundError(f"Image file not found: {image_path}") from e
    except requests.RequestException as e:
        raise RuntimeError(f"Ollama vision error: {e}") from e
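
A quick way to sanity-check the embedding helper is to compare vectors for related and unrelated texts. The sketch below is a plain cosine similarity in pure Python; the function name `cosine_similarity` is illustrative and not something Ollama or RAG-Anything provides.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In a notebook cell, embedding two paraphrases with `await embed_with_ollama(...)` and passing the results to `cosine_similarity` should give a noticeably higher score than embedding two unrelated sentences.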

## Custom Functions for RAG-Anything

Define the required function interfaces that RAG-Anything expects for LLM, vision, and embedding operations.

In [None]:
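import numpy as np
from lightrag.utils import EmbeddingFunc

async def llm_model_func(prompt, system_prompt=None, history_messages=None, **kwargs) -> str:
    """LLM function for RAG-Anything: prepends the system prompt, ignores history and extra kwargs."""
    full_prompt = f"{system_prompt}\n\n{prompt}" if system_prompt else prompt
    return await chat_with_ollama(full_prompt)

async def vision_model_func(prompt, system_prompt=None, history_messages=None, image_data=None, **kwargs) -> str:
    """Vision function for RAG-Anything: image_data is a base64 string; without it, answer from text."""
    if not image_data:
        return await llm_model_func(prompt, system_prompt=system_prompt)
    payload = {"model": "qwen3-vl:8b", "prompt": prompt, "images": [image_data], "stream": False}
    post = lambda: requests.post(f"{OLLAMA_BASE_URL}/api/generate", json=payload, timeout=60)
    response = await asyncio.get_running_loop().run_in_executor(None, post)
    response.raise_for_status()
    return response.json()["response"]

async def _embed_texts(texts: List[str]) -> np.ndarray:
    """Embed a batch of texts one by one and stack the vectors into an array."""
    return np.array([await embed_with_ollama(text) for text in texts])

# nomic-embed-text returns 768-dimensional vectors
embedding_func = EmbeddingFunc(embedding_dim=768, max_token_size=8192, func=_embed_texts)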
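
When many chunks need embeddings, the requests can be issued concurrently with an upper bound so the local server is not flooded. The following is a minimal sketch, assuming an async single-text embedder is passed in as `embed_one`; the name `embed_batch` and the default `max_concurrency` are illustrative choices, and the gain only materializes if `embed_one` does not block the event loop.

```python
import asyncio
from typing import Awaitable, Callable, List


async def embed_batch(
    texts: List[str],
    embed_one: Callable[[str], Awaitable[List[float]]],
    max_concurrency: int = 4,
) -> List[List[float]]:
    """Embed texts concurrently, at most max_concurrency at a time, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(text: str) -> List[float]:
        # The semaphore caps how many requests are in flight at once
        async with semaphore:
            return await embed_one(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(text) for text in texts))
```

In the notebook this could be called as `await embed_batch(chunks, embed_with_ollama)`.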

## Configuration

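Set up the RAGAnythingConfig and initialize the RAG system, passing the custom Ollama-based functions to RAGAnything.

In [None]:
# Configure RAG-Anything; the model functions go to RAGAnything, not to the config
config = RAGAnythingConfig(
    working_dir="./vector_db",
    enable_image_processing=True,
)

# Initialize RAG system (keyword arguments are required here)
rag = RAGAnything(
    config=config,
    llm_model_func=llm_model_func,
    vision_model_func=vision_model_func,
    embedding_func=embedding_func,
    # Chunking is a LightRAG setting, measured in tokens and forwarded via lightrag_kwargs
    lightrag_kwargs={"chunk_token_size": 1000, "chunk_overlap_token_size": 200},
)
print("RAG-Anything initialized with Ollama integration")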
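
To make the chunk size of 1000 and overlap of 200 concrete, here is a minimal sliding-window sketch. It splits by characters purely for illustration; the library's own chunking is token-based and may differ, and `chunk_text` is a hypothetical helper, not a RAG-Anything function.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters, each sharing chunk_overlap with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

With the defaults, a 2000-character text yields windows starting at offsets 0, 800, and 1600, so each chunk repeats the last 200 characters of the previous one. The overlap keeps sentences that straddle a boundary retrievable from at least one chunk.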

## Document Processing

Process a sample multimodal document (containing text and potentially images) for the RAG system.

In [None]:
# Process a sample document (assuming it exists in the docs folder)
# This could be a PDF, HTML, or markdown file with embedded images
sample_doc_path = "docs/sample.md"  # Adjust path as needed

try:
    await rag.process_document_complete(file_path=sample_doc_path, output_dir="./output")
    print(f"Successfully processed document: {sample_doc_path}")
except FileNotFoundError:
    print(f"Document not found: {sample_doc_path}. Please ensure the file exists.")
except Exception as e:
    print(f"Error processing document: {e}")

## Querying

Demonstrate text-based queries and vision-enhanced queries using the multimodal RAG system.

In [None]:
# Example 1: Text-based query
async def text_query_example():
    query = "What are the main topics covered in the document?"
    try:
        result = await rag.aquery(query, mode="hybrid")
        print(f"Query: {query}")
        print(f"Response: {result}")
    except Exception as e:
        print(f"Error in text query: {e}")

# Example 2: Vision-enhanced query (assuming document contains images)
async def vision_query_example():
    image_path = "docs/sample_image.jpg"  # Adjust path to an actual image
    prompt = "Describe what you see in this image and how it relates to the document content."
    try:
        result = await rag.aquery_with_multimodal(
            prompt,
            multimodal_content=[{"type": "image", "img_path": image_path}],
            mode="hybrid",
        )
        print(f"Vision Query: {prompt}")
        print(f"Response: {result}")
    except FileNotFoundError:
        print(f"Image not found: {image_path}. Vision query skipped.")
    except Exception as e:
        print(f"Error in vision query: {e}")

# Run examples
await text_query_example()
print("\n---\n")
await vision_query_example()
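
The top-level `await` calls above work in Jupyter because the notebook already runs an event loop. In a plain Python script the same examples need an explicit entry point. The sketch below shows that pattern with placeholder coroutines standing in for the notebook's `text_query_example` and `vision_query_example`.

```python
import asyncio


async def text_query_example() -> None:
    """Placeholder for the notebook's text query example."""
    print("text query would run here")


async def vision_query_example() -> None:
    """Placeholder for the notebook's vision query example."""
    print("vision query would run here")


async def main() -> None:
    # Same order as the notebook cell, wrapped in a single coroutine
    await text_query_example()
    print("\n---\n")
    await vision_query_example()


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main, and closes the loop
    asyncio.run(main())
```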
