# Multimodal RAG usage

This notebook walks through the usage of the provided [multimodal rag src module](../src/multimodal.py) and some query examples. For the details of the implementation, refer to the [multimodal rag notebook](./2_multimodal_rag.ipynb).

In [2]:
import os
import sys

current_dir = os.getcwd()
kit_dir = os.path.abspath(os.path.join(current_dir, ".."))
repo_dir = os.path.abspath(os.path.join(kit_dir, ".."))

sys.path.append(kit_dir)
sys.path.append(repo_dir)

from src.multimodal_rag import MultimodalRetrieval

from dotenv import load_dotenv

load_dotenv(os.path.join(repo_dir, '.env'), override=True)

True

## Instantiate the MultimodalRetrieval module

In [3]:
multimodal = MultimodalRetrieval(
    sambanova_api_base="https://api.sambanova.ai/v1",
    sambanova_api_key=os.environ.get("SAMBANOVA_API_KEY"),
)
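
`os.environ.get` returns `None` when the variable is missing, so a misconfigured `.env` file would only surface later as a failed API request. A minimal sketch of an earlier check follows; `require_env` is a hypothetical helper and not part of the kit.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Fail before any request is sent with an empty key.
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it first."
        )
    return value
```

With this helper, the key could be passed as `sambanova_api_key=require_env("SAMBANOVA_API_KEY")` in the cell above.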

## Parse input document

In [6]:
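# Ensure the poppler binaries are visible to Python.
# /opt/homebrew/bin is the Homebrew location on Apple Silicon macOS; adjust it for your system.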
os.environ["PATH"] = "/opt/homebrew/bin:" + os.environ["PATH"]
os.environ["POPPLER_PATH"] = "/opt/homebrew/bin"

filepath = os.path.join(kit_dir, "data", "sample_docs", "invoicesample.pdf")
raw_pdf_elements, output_path = multimodal.extract_pdf(filepath)

2025-12-09 19:12:43,555 [INFO] - Reading PDF for file: /Users/jorgep/Documents/ask_public_own/ai-starter-kit-snova/multimodal_knowledge_retriever/data/sample_docs/invoicesample.pdf ...

2025-12-09 19:12:47,241 [INFO] - Loading the Table agent ...
The `max_size` parameter is deprecated and will be removed in v4.26. Please specify in `size['longest_edge'] instead`.
2025-12-09 19:12:47,491 [INFO] - Loading the table structure model ...
2025-12-09 19:12:47,725 [INFO] - Loading pretrained weights from Hugging Face hub (timm/resnet18.a1_in1k)
2025-12-09 19:12:47,993 [INFO] - [timm/resnet18.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
2025-12-09 19:12:48,001 [INFO] - Missing keys (fc.weight, fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.
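
The parsing cell hard-codes a Homebrew path for poppler. On other systems, one way to confirm that poppler is reachable before parsing is to look for one of its command-line tools on `PATH`. The sketch below assumes that `pdftoppm`, a tool shipped with poppler, is a good enough indicator; `find_poppler_dir` is a hypothetical helper.

```python
import os
import shutil
from typing import Optional


def find_poppler_dir() -> Optional[str]:
    """Return the directory that holds poppler's `pdftoppm`, or None if it is not on PATH."""
    exe = shutil.which("pdftoppm")
    return os.path.dirname(exe) if exe else None


poppler_dir = find_poppler_dir()
if poppler_dir is None:
    print("poppler not found on PATH; install it or add its bin directory to PATH")
else:
    print(f"poppler found in {poppler_dir}")
```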


## Process parsing outputs

In [7]:
text_docs, table_docs, image_paths = multimodal.process_raw_elements(raw_pdf_elements, output_path)

## Create a vectorstore

In [8]:
retriever = multimodal.create_vectorstore()

2025-12-09 19:12:52,547 [INFO] - This is the collection name: collection_ef2be80f-2060-415f-83b6-7622f3611ff2


In [9]:
retriever = multimodal.vectorstore_ingest(
    retriever,
    text_docs,
    table_docs,
    image_paths,
    summarize_texts=True,
    summarize_tables=True,
)

2025-12-09 19:12:55,939 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-09 19:12:56,582 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-09 19:12:57,602 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:01,277 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-09 19:13:02,018 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"


## Using summaries of images to get a final response

In [10]:
multimodal.set_retrieval_chain(retriever, image_retrieval_type="summary")

In [11]:
multimodal.call("how many apples they bought")

2025-12-09 19:13:02,484 [INFO] - USER QUERY: how many apples they bought
2025-12-09 19:13:03,214 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:04,630 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'how many apples they bought',
 'answer': 'Based on the invoice, they purchased **1\u202fkg of apples**.',
 'source_documents': [Document(metadata={'filetype': 'application/pdf', 'languages': ['eng'], 'last_modified': '2024-10-08T16:44:41', 'page_number': 1, 'text_as_html': '<table><tbody><tr><td>Apple</td><td>$5.00</td><td>1</td><td>$5.00</td></tr><tr><td>Orange</td><td>$1.99</td><td>2</td><td>$3.98</td></tr><tr><td>Watermelon</td><td>$1.69</td><td>3</td><td>$5.07</td></tr><tr><td>Mango</td><td>$9.56</td><td>2</td><td>$19.12</td></tr><tr><td>Peach</td><td>$2.99</td><td>1</td><td>$2.99</td></tr></tbody></table>', 'orig_elements': 'eJy9V11v2zYU/SuE1ocWCCV+SwyKAsU6DHlokiHu9hAUBkVeO2pkyZDopGnR/z6ScpN0dgc0gPOQxPf4Hn6cw0veXH7NoIUVdH7euOwYZUIyShRobG1pseCmwnWlHaa0ZhVnUshSZEcoW4E3zngTOF8z2/eDazrjYUxxa+76jZ9fQbO88gFhnOvA2cK3jfNXAaVK8oCu+6bzkXd5qTXL5RGiQuTi4xH6HjOqchljKkmZsz3AxAhINt6NHlZxJ+fNZ2gv1sZC9i184cCD9U3fzW1rxnG+Hvo6pJGcKS5VSFg0Lfi7NSTu+fssLbhbbswy7eoyg26ZfUzo6Oer3jWLBpJmjDCB

In [12]:
multimodal.call("whats the address of the store")

2025-12-09 19:13:06,714 [INFO] - USER QUERY: whats the address of the store
2025-12-09 19:13:07,441 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:08,545 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'whats the address of the store',
 'answer': 'The store’s address is:\n\n**123 Somewhere St, Melbourne VIC 3000**.',
 'source_documents': [Document(metadata={'filetype': 'application/pdf', 'languages': ['cat'], 'last_modified': '2024-10-08T16:44:41', 'page_number': 1, 'orig_elements': 'eJzNlltv5DQYhv+KFYEEaGz5FB96t1cgAWWlzl6gqho5Pkwt5TAbO7Bl2f+OcyhUu7OgIs2qVxm/4zex3+fzl9y+r3zrO9/nQ3TVFagwabwOGkPiJIa8lgTq2jZQyiCs4NhjLqsdqDqfjTPZFM/7yg7D6GJvsk/LuDUPw5QP9z4e73NRKGO6eDb59+jyfVGJqFlRT0Ps8+y7vRUSlXlE1wTxux3YxhRLuY6VZEidETZHUar0kLLv5p28ju98e3My1lcfyh/OZ29zHPqDbU1Kh9M4NGUaRkIzLMuEEFufH05+8b7+uVoW3B8nc1x2dVv5/ljdLWrKh25wMUS/ZEYx5ZBgiNWeiCvOrziZ3afiPPRT1/hx3u28iOzfzXlU+x9eXf8Ifv3lzTzv8aH7mNtlrR8z4Y0gOnADRa2awsR52AgboGN1HWRgnkl+MSY1RaRETghGeo78cawFogsCoRE/KyyO/8dEU8zpF2byHfhpGH0H4ilNHXBDO4wgxQxMiXUH7NCnslafpxEYF08x2dgfgW9jRuBVG99OpgNpcKb1CThzis2UQPDjjHHqELieevvEuAOdOfYGJO9AsgX3GNPbyQM7jWlKO+BHk0FbHjj9c7dp7M1ucQRjYxtTTGXKANwUgc/rutHTgro2Y7lN/M3v5y2eKSwpaNMQSmEtS1pcYweNxQrWnqnakMCIdBcr

In [13]:
multimodal.call("what is the main color in the business logo")

2025-12-09 19:13:12,778 [INFO] - USER QUERY: what is the main color in the business logo
2025-12-09 19:13:13,506 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:14,677 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'what is the main color in the business logo',
 'answer': 'The primary color featured in the logo is gold.',
 'source_documents': [Document(metadata={'type': 'image', 'file_directory': '/Users/jorgep/Documents/ask_public_own/ai-starter-kit-snova/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a digital graphic of a logo for Sunny Farm, a produce company based in Victoria, Australia.\n\nAt the center of the image is a circular logo with a gold border and a brown ribbon banner across its middle. The logo features a stylized illustration of a green hill with trees on either side, set against a yellow sunburst background. The words "AUSTRALIA FRESH PRODUCE" are written in white text along the top curve of the circle, while "VICTORIA" is written in smaller white text along the bottom curve. In the center of the banner, the words "SUNNY FARM" are written in large white text.\n\nBelow the logo, the company\

In [14]:
multimodal.call("what is written in the logo")

2025-12-09 19:13:18,018 [INFO] - USER QUERY: what is written in the logo
2025-12-09 19:13:18,744 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:19,817 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'what is written in the logo',
 'answer': 'The logo contains three lines of text:\n\n- **“AUSTRALIA\u202fFRESH\u202fPRODUCE”** – written in white along the top curve of the circle.  \n- **“SUNNY\u202fFARM”** – written in large white letters on the brown ribbon banner across the middle.  \n- **“VICTORIA”** – written in smaller white letters along the bottom curve of the circle.',
 'source_documents': [Document(metadata={'type': 'image', 'file_directory': '/Users/jorgep/Documents/ask_public_own/ai-starter-kit-snova/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a digital graphic of a logo for Sunny Farm, a produce company based in Victoria, Australia.\n\nAt the center of the image is a circular logo with a gold border and a brown ribbon banner across its middle. The logo features a stylized illustration of a green hill with trees on either side, set against a yellow sunburst background. The words "AUS
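
The dictionaries returned by `multimodal.call` above include the full source documents, which makes the output hard to scan. A minimal sketch for condensing a result follows, assuming only what the outputs above show: the keys `question`, `answer`, and `source_documents`, and documents that expose a `metadata` dictionary. `summarize_result` is a hypothetical helper.

```python
from typing import Any, Dict, List


def summarize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the question, the answer, and a one-line description of each source."""
    sources: List[str] = []
    for doc in result.get("source_documents", []):
        meta = getattr(doc, "metadata", {}) or {}
        # In the outputs above, image documents carry 'type' and 'filename',
        # while PDF elements carry 'filetype' and 'page_number'.
        kind = meta.get("type") or meta.get("filetype", "unknown")
        where = meta.get("filename") or f"page {meta.get('page_number', '?')}"
        sources.append(f"{kind}: {where}")
    return {
        "question": result.get("question"),
        "answer": result.get("answer"),
        "sources": sources,
    }
```

For example, `summarize_result(multimodal.call("what is written in the logo"))` would return the answer together with a short list of sources instead of the raw documents.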

## Using raw images to get a final response

In [15]:
multimodal.set_retrieval_chain(retriever, image_retrieval_type="raw")

In [16]:
multimodal.call("how many apples they bought")

2025-12-09 19:13:23,372 [INFO] - USER QUERY: how many apples they bought
2025-12-09 19:13:24,094 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:27,071 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-09 19:13:27,082 [INFO] - PARTIAL ANSWERS FROM IMAGES: ['answer not in context']
2025-12-09 19:13:27,969 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'how many apples they bought',
 'answer': 'They purchased\u202f1\u202fkg of apples.',
 'source_documents': [Document(metadata={'type': 'image', 'file_directory': '/Users/jorgep/Documents/ask_public_own/ai-starter-kit-snova/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a digital graphic of a logo for Sunny Farm, a produce company based in Victoria, Australia.\n\nAt the center of the image is a circular logo with a gold border and a brown ribbon banner across its middle. The logo features a stylized illustration of a green hill with trees on either side, set against a yellow sunburst background. The words "AUSTRALIA FRESH PRODUCE" are written in white text along the top curve of the circle, while "VICTORIA" is written in smaller white text along the bottom curve. In the center of the banner, the words "SUNNY FARM" are written in large white text.\n\nBelow the logo, the company\'s address is displayed

In [17]:
multimodal.call("what is the main color in the business logo")

2025-12-09 19:13:28,952 [INFO] - USER QUERY: what is the main color in the business logo
2025-12-09 19:13:29,685 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:32,244 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-09 19:13:32,254 [INFO] - PARTIAL ANSWERS FROM IMAGES: ['The main color in the business logo is yellow. The logo features a prominent yellow sunburst in the center, which is the dominant visual element. The yellow color is used to represent the sun and the sunny aspect of the farm\'s name, "Sunny Farm."']
2025-12-09 19:13:34,483 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'what is the main color in the business logo',
 'answer': 'The main color in the business logo is yellow.',
 'source_documents': [Document(metadata={'type': 'image', 'file_directory': '/Users/jorgep/Documents/ask_public_own/ai-starter-kit-snova/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a digital graphic of a logo for Sunny Farm, a produce company based in Victoria, Australia.\n\nAt the center of the image is a circular logo with a gold border and a brown ribbon banner across its middle. The logo features a stylized illustration of a green hill with trees on either side, set against a yellow sunburst background. The words "AUSTRALIA FRESH PRODUCE" are written in white text along the top curve of the circle, while "VICTORIA" is written in smaller white text along the bottom curve. In the center of the banner, the words "SUNNY FARM" are written in large white text.\n\nBelow the logo, the company\'

In [18]:
multimodal.call("what is written in the logo")

2025-12-09 19:13:35,433 [INFO] - USER QUERY: what is written in the logo
2025-12-09 19:13:36,432 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/embeddings "HTTP/1.1 200 OK"
2025-12-09 19:13:39,169 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-09 19:13:39,180 [INFO] - PARTIAL ANSWERS FROM IMAGES: ['The logo features the text "Australia Fresh Produce", "Sunny Farm", and "Victoria". The words are arranged in a circular pattern around a central image of a sun rising over a landscape, with "Australia Fresh Produce" at the top, "Sunny Farm" in the middle, and "Victoria" at the bottom.\n\nTherefore, the text written in the logo is: Australia Fresh Produce, Sunny Farm, Victoria.']
2025-12-09 19:13:39,645 [INFO] - HTTP Request: POST https://api.sambanova.ai/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'what is written in the logo',
 'answer': 'The logo contains the text\u202f“Australia Fresh Produce,” “Sunny Farm,” and “Victoria.”',
 'source_documents': [Document(metadata={'type': 'image', 'file_directory': '/Users/jorgep/Documents/ask_public_own/ai-starter-kit-snova/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a digital graphic of a logo for Sunny Farm, a produce company based in Victoria, Australia.\n\nAt the center of the image is a circular logo with a gold border and a brown ribbon banner across its middle. The logo features a stylized illustration of a green hill with trees on either side, set against a yellow sunburst background. The words "AUSTRALIA FRESH PRODUCE" are written in white text along the top curve of the circle, while "VICTORIA" is written in smaller white text along the bottom curve. In the center of the banner, the words "SUNNY FARM" are written in large white text.\n\nBel
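
In the runs above, the two image retrieval modes do not always agree: the logo color is reported as gold with `"summary"` and as yellow with `"raw"`. A small sketch for collecting the answers side by side follows. It uses only the `set_retrieval_chain` and `call` methods shown in this notebook; `compare_modes` itself is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def compare_modes(
    multimodal: Any,
    retriever: Any,
    questions: Iterable[str],
    modes: Iterable[str] = ("summary", "raw"),
) -> Dict[str, Dict[str, str]]:
    """Ask every question under each image retrieval mode and collect the answers."""
    questions = list(questions)
    answers: Dict[str, Dict[str, str]] = {q: {} for q in questions}
    for mode in modes:
        # Rebuild the retrieval chain for this mode, as in the cells above.
        multimodal.set_retrieval_chain(retriever, image_retrieval_type=mode)
        for q in questions:
            answers[q][mode] = multimodal.call(q)["answer"]
    return answers
```

Calling `compare_modes(multimodal, retriever, ["what is the main color in the business logo"])` would return one entry per question with an answer for each mode. Every question triggers new API requests for each mode, so the list is best kept short.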