# Multimodal RAG

This notebook shows how to build a RAG system that uses the image and text capabilities of the SambaNova multimodal models.

In [1]:
import os
import sys
import glob

current_dir = os.getcwd()
kit_dir = os.path.abspath(os.path.join(current_dir, '..'))
repo_dir = os.path.abspath(os.path.join(kit_dir, '..'))

sys.path.append(kit_dir)
sys.path.append(repo_dir)

from dotenv import load_dotenv

load_dotenv(os.path.join(repo_dir, '.env'), override=True)

import requests
import json
import base64
from pprint import pprint

## Multimodal call

In [None]:
from utils.model_wrappers.multimodal_models import SambastudioMultimodal

lvlm=SambastudioMultimodal(
    api_key = os.environ.get('SAMBANOVA_API_KEY'),
    temperature = 0.01,
    max_tokens_to_generate = 1024,
    model = "Llama-3.2-11B-Vision-Instruct",
)

### QA Call

In [None]:
prompt = 'how many birds could you find at 4pm:'
image_path = os.path.join(kit_dir, 'data', 'sample_docs', 'sample.png')
lvlm.invoke(prompt, image_path)

'**Analysis of Bird Count at 4pm**\n\nBased on the provided graph, we can observe the number of birds present at different times of the day. The x-axis represents the time of day, ranging from 6 AM to 4 PM, while the y-axis indicates the number of birds.\n\n**Observations:**\n\n* At 4 PM, the graph shows a significant decrease in the number of birds compared to other times of the day.\n* The highest number of birds is observed at 2 PM, with approximately 40 birds.\n* The lowest number of birds is recorded at 4 PM, with around 10 birds.\n\n**Conclusion:**\n\nGiven the data presented in the graph, it is evident that there are approximately **10 birds** at 4 PM.'
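The wrapper hides how the image reaches the model. A common pattern for vision chat endpoints is to send the image as a base64 data URL next to the text, and the sketch below shows only that encoding step. The helper names `image_to_data_url` and `build_message` and the message shape are assumptions for illustration, not the `SambastudioMultimodal` implementation.

```python
import base64
import mimetypes
import os
import tempfile


def image_to_data_url(image_path: str) -> str:
    """Read an image file and return it as a base64 data URL (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'
    with open(image_path, 'rb') as f:
        encoded = base64.b64encode(f.read()).decode('utf-8')
    return f'data:{mime_type};base64,{encoded}'


def build_message(prompt: str, image_path: str) -> dict:
    """Assumed OpenAI-style chat message carrying text plus one image."""
    return {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': image_to_data_url(image_path)}},
        ],
    }


# Demo with a throwaway file so the sketch runs without the sample data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.png')
    with open(demo_path, 'wb') as f:
        f.write(b'\x89PNG\r\n\x1a\n')
    message = build_message('how many birds could you find at 4pm:', demo_path)
    demo_url = message['content'][1]['image_url']['url']
```

Whether the endpoint expects a data URL, a raw base64 string, or a file upload depends on the API, so check the wrapper source before reusing this shape.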

### Summary call

In [4]:
prompt = 'A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the humans question. USER: <image>\nDescribe the image in detail. Be specific about graphs, such as bar plots, scatter plots, or others. ASSISTANT:'
lvlm.invoke(prompt, image_path)

'The image presents a scatter plot illustrating the relationship between the number of birds on a tree and the time of day. The x-axis represents the time of day, ranging from 6 AM to 4 PM, while the y-axis represents the number of birds, spanning from 0 to 40.\n\n**Key Features:**\n\n*   **Scatter Plot:** The graph features a scatter plot with orange dots representing the number of birds at each time interval.\n*   **Time Intervals:** The x-axis is divided into hourly intervals, starting from 6 AM and ending at 4 PM.\n*   **Number of Birds:** The y-axis displays the number of birds, ranging from 0 to 40.\n*   **Data Points:** Each orange dot on the graph corresponds to a specific time interval and the number of birds observed during that time.\n*   **Trend:** The graph reveals a general downward trend in the number of birds as the time of day progresses from morning to afternoon.\n*   **Peak:** The highest number of birds is observed at 8 AM, with approximately 40 birds present.\n*   

## Doc Extraction

### Unstructured PDF extraction

In [5]:
from unstructured.partition.pdf import partition_pdf

# Input PDF; extracted images are saved to a folder named after the file (output_path)
file_path = os.path.join(kit_dir, 'data', 'sample_docs', 'invoicesample.pdf')
output_path = os.path.splitext(file_path)[0]

# Get elements
raw_pdf_elements = partition_pdf(
    filename=file_path,
    extract_images_in_pdf=True,
    strategy='hi_res',
    hi_res_model_name='yolox',
    # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles
    # Titles are any sub-section of the document
    infer_table_structure=True,
    chunking_strategy='by_title',
    max_characters=1000,
    new_after_n_chars=800,
    combine_text_under_n_chars=500,
    extract_image_block_output_dir=output_path,
)

Some weights of the model checkpoint at microsoft/table-transformer-structure-recognition were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


### View Elements

In [6]:
for i, element in enumerate(raw_pdf_elements):
    print(f'\033[95m ELEMENT {i}\033[00m')
    print(f'TYPE: {type(element)}')
    print(f'META: {element.metadata.to_dict()}')
    print(f'TEXT: {element.text}')
    print('\n\n##########\n')

[95m ELEMENT 0[00m
TYPE: <class 'unstructured.documents.elements.CompositeElement'>
META: {'filetype': 'application/pdf', 'languages': ['eng'], 'last_modified': '2025-03-06T15:35:28', 'page_number': 1, 'orig_elements': 'eJy9Vttu3DYQ/RVB7UMLLCXeL34LmqIwijgpvOmLYSwocbimo8tC4tpxgv57KWlTpMkmQBZYP56jGYo8Z2bIm485NNBCFzfB5RdZXmnKia8YskQaxDXFyGLDEQDTRjnvmCX5KstbiNbZaFPOx7zu+8GFzkYYZ9zYp34fN3cQtncxMZQxk3IO9GNw8S6xRAqW2F0fujjl3dwYQwuxygjnBb9dZZ8wJbIQEyYCq4IeIZaMxOTj0xihnU7yJryH5npna8j/SR8cRKhj6LtN3dhx3OyGvkphuKCSCZkCfGggPu1gzn3zKp833G33djuf6iaHbpvfzuwYN23vgg8wa0YxFQgzhOWaiAsmLqiesncpc9Pt2wqG6bTTJiK8n/TIX6zXv1+tL19fZevXU+in/65DbObtfmkL11gqqgxitCaIM65QRUEgZi0GiYnmWJ3NFqJEoZLqAhdkVn3BUoiCTVgTPtn0JV7iTzNFM/kNT0I7ybqz817z8u0Iw1juIA79fVu+7Ov9pFpiEk5Lj+WL6z/L9PNded8PWyhtQGO0Q4QBvQuxbPdNDMlL22zedf1jAy4tPqTVAjzAUE5SlqNtdw1sXF+PZege+lDDQpU+bPcDIIJIcb/bnrViPq+Sy0mCY1UiidegoUaaekDcM4+srTEigjliauIxk+erEkzmsmCULWXwHyFloWeCcVmYY8Sccmr7UiyeuX1fQtc9ZX/sO/tou8+dubLDYGN4gPUUecQhJTmmtQJkOJdpvJoKGYIBWWO4NdQJyd05Hd

In [7]:
# Create a dictionary to store counts of each type
category_counts = {}

for element in raw_pdf_elements:
    category = str(type(element))
    if category in category_counts:
        category_counts[category] += 1
    else:
        category_counts[category] = 1

# unique_categories holds the distinct element types found in the PDF
# A TableChunk type appears when a table exceeds the max_characters set above
unique_categories = set(category_counts.keys())
category_counts

{"<class 'unstructured.documents.elements.CompositeElement'>": 2,
 "<class 'unstructured.documents.elements.Table'>": 1}

In [8]:
from langchain.schema import Document


# Categorize by type
categorized_elements = []
for element in raw_pdf_elements:
    if 'unstructured.documents.elements.Table' in str(type(element)):
        meta = element.metadata.to_dict()
        meta['type'] = 'table'
        categorized_elements.append(Document(page_content=element.metadata.text_as_html, metadata=meta))
    elif 'unstructured.documents.elements.CompositeElement' in str(type(element)):
        meta = element.metadata.to_dict()
        meta['type'] = 'text'
        categorized_elements.append(Document(page_content=str(element), metadata=meta))

# Tables
table_docs = [e for e in categorized_elements if e.metadata['type'] == 'table']
print(len(table_docs))

# Text
text_docs = [e for e in categorized_elements if e.metadata['type'] == 'text']
print(len(text_docs))

1
2
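The tally and categorization cells above key on `str(type(element))`. A minimal sketch of the same two steps using `collections.Counter` and `isinstance` follows. The `CompositeElement` and `Table` classes here are hypothetical stand-ins so the example runs without `unstructured`, and the dictionaries stand in for LangChain `Document` objects.

```python
from collections import Counter


# Stand-in classes (hypothetical) so the sketch runs without unstructured;
# in the notebook these would be the element classes returned by partition_pdf.
class CompositeElement:
    def __init__(self, text):
        self.text = text


class Table:
    def __init__(self, text, html):
        self.text = text
        self.html = html


elements = [
    CompositeElement('Invoice header'),
    Table('Apple 1 $2', '<table><tr><td>Apple</td></tr></table>'),
    CompositeElement('Totals'),
]

# Tally by class name in one pass.
type_counts = Counter(type(el).__name__ for el in elements)

# Route each element by class instead of by the string form of its type.
categorized = []
for el in elements:
    if isinstance(el, Table):
        categorized.append({'type': 'table', 'content': el.html})
    elif isinstance(el, CompositeElement):
        categorized.append({'type': 'text', 'content': el.text})

table_items = [c for c in categorized if c['type'] == 'table']
text_items = [c for c in categorized if c['type'] == 'text']
```

Checking the class directly avoids accidental substring matches between type names, at the cost of importing the element classes.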


### Text and table summaries

In [9]:
from utils.model_wrappers.langchain_llms import SambaNovaCloud, SambaStudio
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import load_prompt

In [10]:
text_prompt = load_prompt(os.path.join(kit_dir, 'prompts', 'llama3-text_summary.yaml'))
table_prompt = load_prompt(os.path.join(kit_dir, 'prompts', 'llama3-table_summary.yaml'))

# Summary chain
model = SambaNovaCloud(
    max_tokens=500,
    model='Meta-Llama-3.1-8B-Instruct',
)

# model = SambaStudio(
#     model_kwargs={
#         'do_sample': False,
#         'temperature': 0.01,
#         'max_tokens': 256,
#         'process_prompt': False,
#         'model': 'Meta-Llama-3-70B-Instruct-4096',
#     },
# )

text_summarize_chain = {'element': lambda x: x} | text_prompt | model | StrOutputParser()
table_summarize_chain = {'element': lambda x: x} | table_prompt | model | StrOutputParser()
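Each chain above runs its steps left to right: the dictionary maps the input to the prompt variable `element`, the prompt is formatted, the model generates, and the parser returns a plain string. The plain-Python sketch below mirrors that flow. `make_chain`, `fake_model`, and the template text are stand-ins chosen for illustration; the real prompt comes from the YAML file and the real model is a SambaNova LLM.

```python
from typing import Callable


def make_chain(*steps: Callable) -> Callable:
    """Compose steps left to right, like `a | b | c` in the notebook."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


# Stand-in template (assumption); the notebook loads its prompt from YAML.
TEMPLATE = 'Summarize the following text:\n{element}'


def to_inputs(x: str) -> dict:
    # Same role as {'element': lambda x: x}
    return {'element': x}


def format_prompt(inputs: dict) -> str:
    return TEMPLATE.format(**inputs)


def fake_model(prompt: str) -> dict:
    # Pretend the model returns the first 20 characters of the element.
    element = prompt.split('\n', 1)[1]
    return {'content': element[:20]}


def parse_output(message: dict) -> str:
    # Same role as StrOutputParser()
    return message['content']


summarize = make_chain(to_inputs, format_prompt, fake_model, parse_output)
summary = summarize('Invoice #20130304, total $39.60')

# Rough equivalent of .batch(...) with a concurrency of 1.
summaries = [summarize(t) for t in ['first text', 'second text']]
```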

### Text Summaries

In [11]:
# Apply to text
texts = [i.page_content for i in text_docs if i.page_content != '']
if texts:
    text_summaries = text_summarize_chain.batch(texts, {'max_concurrency': 1})

In [12]:
text_summaries

['The text appears to be an invoice from Denny Gunawan, with the following details:\n\n- Address: 221 Queen St, Melbourne VIC 3000 (also listed as 123 Somewhere St, Melbourne VIC 3000)\n- Phone number: (03) 1234 5678\n- Total amount: $39.60\n- Invoice number: #20130304',
 'A receipt summary: A subtotal of $36.00 was calculated, followed by an additional 10% GST (Goods and Services Tax), resulting in a total of $39.60.']

### Table summaries

In [13]:
# Apply to tables
tables = [i.page_content for i in table_docs]
if tables:
    table_summaries = table_summarize_chain.batch(tables, {'max_concurrency': 1})

In [14]:
table_summaries

["The table contains a list of fruits, their prices, quantities, and total costs. Here's a concise summary:\n\n- Total items: 7\n- Total revenue: $46.77\n- Average price per item: $6.71"]

### Image summary

In [15]:
prompt = 'Describe the image in detail. Be specific about graphs include name of axis, labels, legends and important numerical information'
image_paths = []
image_paths.extend(glob.glob(os.path.join(output_path, '*.jpg')))
image_paths.extend(glob.glob(os.path.join(output_path, '*.png')))

image_summaries = []
image_docs = []

for image_path in image_paths:
    result = lvlm.invoke(prompt, image_path)
    image_summaries.append(result)
    image_docs.append(
        Document(
            page_content=result,
            metadata={
                'type': 'image',
                'file_directory': os.path.dirname(image_path),
                'filename': os.path.basename(image_path),
            },
        )
    )
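The two `glob` calls above return files in whatever order the file system lists them, so the order of `image_summaries` can differ between runs. A small sketch of one way to collect several extensions in a sorted, reproducible order follows; `find_images` is a hypothetical helper, and the temporary directory stands in for `output_path`.

```python
import tempfile
from pathlib import Path


def find_images(directory, extensions=('.jpg', '.png')):
    """Return image paths in `directory`, sorted for a reproducible order."""
    root = Path(directory)
    return sorted(str(p) for p in root.iterdir() if p.suffix.lower() in extensions)


# Demo on a throwaway directory with two images and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('figure-1-2.png', 'figure-1-1.jpg', 'notes.txt'):
        (Path(tmp) / name).touch()
    found = [Path(p).name for p in find_images(tmp)]
```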

In [16]:
image_summaries

['The image is a logo for Sunny Farm, an Australian fresh produce farm. The logo features a circular design with a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side.\n\n*   **Circular Design:**\n    *   The circular design is the main element of the logo.\n    *   It has a gold border with a thin white outline.\n    *   The circle is divided into two sections: the top half features the sunburst, while the bottom half features the green field and trees.\n*   **Sunburst:**\n    *   The sunburst is a yellow and green graphic that represents the sun.\n    *   It is positioned at the top of the circle, above the green field.\n    *   The sunburst is surrounded by a thin white outline.\n*   **Green Field:**\n    *   The green field is a graphic representation of a field of crops.\n    *   It is positioned below the sunburst, taking up the bottom half of the circle.\n    *   The field is depicted in various shades 

In [17]:
image_docs

[Document(metadata={'type': 'image', 'file_directory': '/Users/petrojm/Documents/projects/ASK/temp/jorge/ai-starter-kit/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a logo for Sunny Farm, an Australian fresh produce farm. The logo features a circular design with a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side.\n\n*   **Circular Design:**\n    *   The circular design is the main element of the logo.\n    *   It has a gold border with a thin white outline.\n    *   The circle is divided into two sections: the top half features the sunburst, while the bottom half features the green field and trees.\n*   **Sunburst:**\n    *   The sunburst is a yellow and green graphic that represents the sun.\n    *   It is positioned at the top of the circle, above the green field.\n    *   The sunburst is surrounded by a thin white outline.\n*   

### Add to vectorstore

In [18]:
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryByteStore
from utils.model_wrappers.api_gateway import APIGateway
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

# The vectorstore to use to index the child chunks
vectorstore = Chroma(
    collection_name='summaries',
    embedding_function=APIGateway.load_embedding_model(
        type='sambastudio', batch_size=1, bundle=False, select_expert='e5-mistral-7b-instruct-8192'
    ),
)

# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = 'doc_id'

# The retriever (empty to start)
retriever = MultiVectorRetriever(vectorstore=vectorstore, docstore=store, id_key=id_key, search_kwargs={'k': 2})




In [19]:
# Add texts
if texts:
    doc_ids = [str(uuid.uuid4()) for _ in text_docs]
    summary_texts = [Document(page_content=s, metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]
    retriever.vectorstore.add_documents(summary_texts)
    retriever.docstore.mset(list(zip(doc_ids, text_docs)))

# Add tables
if tables:
    table_ids = [str(uuid.uuid4()) for _ in table_docs]
    summary_tables = [Document(page_content=s, metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]
    retriever.vectorstore.add_documents(summary_tables)
    retriever.docstore.mset(list(zip(table_ids, table_docs)))

# Add images
if image_summaries:
    img_ids = [str(uuid.uuid4()) for _ in image_summaries]
    summary_img = [Document(page_content=s, metadata={id_key: img_ids[i]}) for i, s in enumerate(image_summaries)]
    retriever.vectorstore.add_documents(summary_img)
    retriever.docstore.mset(list(zip(img_ids, image_docs)))  # Store the image summary as the raw document
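The cell above follows the multi-vector pattern: summaries are what gets searched, while the document store returns the original content linked through a shared `doc_id`. The toy sketch below illustrates that linkage with a word-overlap score in place of embeddings. All names in it (`summary_index`, `docstore`, `add_documents`, `retrieve`) are stand-ins for illustration, not the LangChain or Chroma API.

```python
import uuid

# Toy stand-ins (assumptions): a real setup uses an embedding model and Chroma.
summary_index = []   # list of (summary_text, doc_id); plays the vector store
docstore = {}        # doc_id -> original content; plays the byte store


def add_documents(summaries, originals):
    """Index each summary and store its original under a shared doc_id."""
    for summary, original in zip(summaries, originals):
        doc_id = str(uuid.uuid4())
        summary_index.append((summary, doc_id))
        docstore[doc_id] = original


def word_overlap(a: str, b: str) -> int:
    """Crude similarity: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def retrieve(query: str, k: int = 2):
    """Rank summaries against the query, then return the linked originals."""
    ranked = sorted(summary_index, key=lambda item: word_overlap(query, item[0]), reverse=True)
    return [docstore[doc_id] for _, doc_id in ranked[:k]]


add_documents(
    summaries=['invoice total amount is $39.60', 'logo of sunny farm with a sunburst'],
    originals=['<raw invoice text>', '<raw image document>'],
)
top_hit = retrieve('what is the logo of the company', k=1)
```

Because `zip` pairs summaries with originals by position, both lists need the same length and order. The same holds for the ID lists built in the cell above.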

In [20]:
retriever.invoke('what is the final price in the invoice?')

[Document(metadata={'filetype': 'application/pdf', 'languages': ['eng'], 'last_modified': '2025-03-06T15:35:28', 'page_number': 1, 'orig_elements': 'eJy9Vttu3DYQ/RVB7UMLLCXeL34LmqIwijgpvOmLYSxIcbSmo8tC4tpxgv57KWlTpMkmQBZYP56jGYo8Z2bIm485NNBCFzfB5xdZ7oi32tEKMWk04q4myClPETFgMTUEgIp8leUtROtttCnnY171/eBDZyOMM27sU7+PmzsI27uYGMqYSTkH+jH4eJdYIgVL7K4PXZzybm6MoYVYZYTzgt+usk+YElmICROBVUGPEEtGYvLxaYzQTid5E95Dc72zFeT/pA8eIlQx9N2mauw4bnZD71IYLqhkQqaAOjQQn3Yw5755lc8b7rZ7u51PdZNDt81vZ3aMm7b3oQ4wa0YxFQgzhOWaiAsmLqiesncpc9PtWwfDdNppExHeT3rkL9br36/Wl6+vsvXrKfTTf9chNvN2v7TFeOGNZDVyTEvEqfPIVpwjbA04rKzj/ny2ECUKlVQXuCCz6guWQhRswprwyaYv8RJ/mimayW94EtpJ1p2d95qXb0cYxnIHcejv2/JlX+0n1RKTcFp6LF9c/1mmn+/K+37YQmkDGqMdIgzoXYhlu29iSF7aZvOu6x8b8GnxIa0W4AGGcpKyHG27a2Dj+2osQ/fQhwoWqqzDdj8AIogU97vtWSvm8yq5nCQ4ViUClOZEc8Sl9YjryiKnfY2scsZTZoiA+nxVgslcFoyypQz+I6Qs9EwwLgtzjJhTTm1fisUzt+9L6Lqn7I99Zx9t97kzV3YYbAwPsJ4ijzikFTEVlxVyFa4Rx7pCtuYCKZwmLeZK1+aMfYyJLsgq4zLJdnBoJgTjB8sYFVPrfkUsKac5xDQ1zz1gKSXZX3uALruO2StoXL8fOsj+vvwtYxj

In [21]:
retriever.invoke('what is the logo of the company')

[Document(metadata={'type': 'image', 'file_directory': '/Users/petrojm/Documents/projects/ASK/temp/jorge/ai-starter-kit/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a logo for Sunny Farm, an Australian fresh produce farm. The logo features a circular design with a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side.\n\n*   **Circular Design:**\n    *   The circular design is the main element of the logo.\n    *   It has a gold border with a thin white outline.\n    *   The circle is divided into two sections: the top half features the sunburst, while the bottom half features the green field and trees.\n*   **Sunburst:**\n    *   The sunburst is a yellow and green graphic that represents the sun.\n    *   It is positioned at the top of the circle, above the green field.\n    *   The sunburst is surrounded by a thin white outline.\n*   

## Retrieval with raw text, raw tables and image summaries

In [22]:
from langchain.chains import RetrievalQA

prompt = load_prompt(os.path.join(kit_dir, 'prompts', 'llama3-knowledge_retriever_custom_qa_prompt.yaml'))

chain = RetrievalQA.from_llm(
    llm=model, retriever=retriever, return_source_documents=True, input_key='question', output_key='answer'
)
chain.combine_documents_chain.llm_chain.prompt = prompt

In [23]:
chain.invoke({'question': 'what is the final price in the invoice?'})

{'question': 'what is the final price in the invoice?',
 'answer': 'The final price in the invoice is $39.60.',
 'source_documents': [Document(metadata={'filetype': 'application/pdf', 'languages': ['eng'], 'last_modified': '2025-03-06T15:35:28', 'page_number': 1, 'orig_elements': 'eJy9Vttu3DYQ/RVB7UMLLCXeL34LmqIwijgpvOmLYSxIcbSmo8tC4tpxgv57KWlTpMkmQBZYP56jGYo8Z2bIm485NNBCFzfB5xdZ7oi32tEKMWk04q4myClPETFgMTUEgIp8leUtROtttCnnY171/eBDZyOMM27sU7+PmzsI27uYGMqYSTkH+jH4eJdYIgVL7K4PXZzybm6MoYVYZYTzgt+usk+YElmICROBVUGPEEtGYvLxaYzQTid5E95Dc72zFeT/pA8eIlQx9N2mauw4bnZD71IYLqhkQqaAOjQQn3Yw5755lc8b7rZ7u51PdZNDt81vZ3aMm7b3oQ4wa0YxFQgzhOWaiAsmLqiesncpc9PtWwfDdNppExHeT3rkL9br36/Wl6+vsvXrKfTTf9chNvN2v7TFeOGNZDVyTEvEqfPIVpwjbA04rKzj/ny2ECUKlVQXuCCz6guWQhRswprwyaYv8RJ/mimayW94EtpJ1p2d95qXb0cYxnIHcejv2/JlX+0n1RKTcFp6LF9c/1mmn+/K+37YQmkDGqMdIgzoXYhlu29iSF7aZvOu6x8b8GnxIa0W4AGGcpKyHG27a2Dj+2osQ/fQhwoWqqzDdj8AIogU97vtWSvm8yq5nCQ4ViUClOZEc8Sl9YjryiKnfY2scsZTZoiA+nxVgslcFoyypQz+I6Qs9EwwLgtzjJhTTm1fisUzt+9L6Lqn7I

In [24]:
chain.invoke('what is the logo of the company')

{'question': 'what is the logo of the company',
 'answer': 'The logo of the company is a circular design with a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side. It features the text "SUNNY FARM" in white letters across the center of the circle and the text "AUSTRALIA FRESH PRODUCE" written in smaller white letters above the sunburst.',
 'source_documents': [Document(metadata={'type': 'image', 'file_directory': '/Users/petrojm/Documents/projects/ASK/temp/jorge/ai-starter-kit/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a logo for Sunny Farm, an Australian fresh produce farm. The logo features a circular design with a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side.\n\n*   **Circular Design:**\n    *   The circular design is the main element of the logo

## Retrieval with raw text, raw tables and raw images

In [25]:
query = 'what is the logo of the company?'

In [26]:
retriever.invoke(query)

[Document(metadata={'type': 'image', 'file_directory': '/Users/petrojm/Documents/projects/ASK/temp/jorge/ai-starter-kit/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a logo for Sunny Farm, an Australian fresh produce farm. The logo features a circular design with a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side.\n\n*   **Circular Design:**\n    *   The circular design is the main element of the logo.\n    *   It has a gold border with a thin white outline.\n    *   The circle is divided into two sections: the top half features the sunburst, while the bottom half features the green field and trees.\n*   **Sunburst:**\n    *   The sunburst is a yellow and green graphic that represents the sun.\n    *   It is positioned at the top of the circle, above the green field.\n    *   The sunburst is surrounded by a thin white outline.\n*   

In [27]:
chain.invoke({'question': query})

{'question': 'what is the logo of the company?',
 'answer': 'The logo of the company is a circular design featuring a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side. The company name is written in white letters across the center of the circle, with additional text written in smaller white letters. The logo features a predominantly gold and green color scheme, with accents of yellow and brown, and represents the theme of a sunny farm in Australia, with a focus on fresh produce.',
 'source_documents': [Document(metadata={'type': 'image', 'file_directory': '/Users/petrojm/Documents/projects/ASK/temp/jorge/ai-starter-kit/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a logo for Sunny Farm, an Australian fresh produce farm. The logo features a circular design with a gold border and a green and yellow sunburst in the center. The sunburst

### Filter image results

In [28]:
def get_retrieved_images(retriever, query):
    results = retriever.invoke(query)
    results = [result for result in results if result.metadata['type'] == 'image']
    return results

In [29]:
retrieved_images = get_retrieved_images(retriever, query)
retrieved_images

[Document(metadata={'type': 'image', 'file_directory': '/Users/petrojm/Documents/projects/ASK/temp/jorge/ai-starter-kit/multimodal_knowledge_retriever/data/sample_docs/invoicesample', 'filename': 'figure-1-1.jpg'}, page_content='The image is a logo for Sunny Farm, an Australian fresh produce farm. The logo features a circular design with a gold border and a green and yellow sunburst in the center. The sunburst is surrounded by a green field with trees on either side.\n\n*   **Circular Design:**\n    *   The circular design is the main element of the logo.\n    *   It has a gold border with a thin white outline.\n    *   The circle is divided into two sections: the top half features the sunburst, while the bottom half features the green field and trees.\n*   **Sunburst:**\n    *   The sunburst is a yellow and green graphic that represents the sun.\n    *   It is positioned at the top of the circle, above the green field.\n    *   The sunburst is surrounded by a thin white outline.\n*   

### Generate response over retrieved raw images 

In [30]:
def get_image_answers(retrieved_image_docs, query):
    image_answer_prompt_template = load_prompt(os.path.join(kit_dir, 'prompts', 'multimodal-qa.yaml'))
    image_answer_prompt = image_answer_prompt_template.format(question=query)
    answers = []
    for doc in retrieved_image_docs:
        image_path = os.path.join(doc.metadata['file_directory'], doc.metadata['filename'])
        answers.append(lvlm.invoke(image_answer_prompt, image_path))
    return answers

In [31]:
image_answers = get_image_answers(retrieved_images, query)
image_answers

['The logo of the company is a circular emblem featuring a stylized sunburst design, accompanied by the text "Australia Fresh Produce" and "Sunny Farm Victoria". The logo\'s color scheme is predominantly gold, with green accents and a white background. The overall design conveys a sense of warmth, freshness, and quality, suggesting that the company values its products and aims to convey a positive image to its customers.']

In [32]:
def get_retrieved_docs(retriever, query):
    results = retriever.invoke(query)
    results = [result for result in results if result.metadata['type'] != 'image']
    return results

In [33]:
context_docs = get_retrieved_docs(retriever, query)
context_docs

[Document(metadata={'filetype': 'application/pdf', 'languages': ['eng'], 'last_modified': '2025-03-06T15:35:28', 'page_number': 1, 'orig_elements': 'eJy9Vttu3DYQ/RVB7UMLLCXeL34LmqIwijgpvOmLYSxIcbSmo8tC4tpxgv57KWlTpMkmQBZYP56jGYo8Z2bIm485NNBCFzfB5xdZ7oi32tEKMWk04q4myClPETFgMTUEgIp8leUtROtttCnnY171/eBDZyOMM27sU7+PmzsI27uYGMqYSTkH+jH4eJdYIgVL7K4PXZzybm6MoYVYZYTzgt+usk+YElmICROBVUGPEEtGYvLxaYzQTid5E95Dc72zFeT/pA8eIlQx9N2mauw4bnZD71IYLqhkQqaAOjQQn3Yw5755lc8b7rZ7u51PdZNDt81vZ3aMm7b3oQ4wa0YxFQgzhOWaiAsmLqiesncpc9PtWwfDdNppExHeT3rkL9br36/Wl6+vsvXrKfTTf9chNvN2v7TFeOGNZDVyTEvEqfPIVpwjbA04rKzj/ny2ECUKlVQXuCCz6guWQhRswprwyaYv8RJ/mimayW94EtpJ1p2d95qXb0cYxnIHcejv2/JlX+0n1RKTcFp6LF9c/1mmn+/K+37YQmkDGqMdIgzoXYhlu29iSF7aZvOu6x8b8GnxIa0W4AGGcpKyHG27a2Dj+2osQ/fQhwoWqqzDdj8AIogU97vtWSvm8yq5nCQ4ViUClOZEc8Sl9YjryiKnfY2scsZTZoiA+nxVgslcFoyypQz+I6Qs9EwwLgtzjJhTTm1fisUzt+9L6Lqn7I99Zx9t97kzV3YYbAwPsJ4ijzikFTEVlxVyFa4Rx7pCtuYCKZwmLeZK1+aMfYyJLsgq4zLJdnBoJgTjB8sYFVPrfkUsKac5xDQ1zz1gKSXZX3uALruO2StoXL8fOsj+vvwtYxj
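`get_retrieved_images` and `get_retrieved_docs` each call `retriever.invoke` for the same query. A minimal sketch of splitting a single result list into both groups follows. The `Doc` dataclass is a stand-in for a LangChain `Document`, and `split_by_type` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (assumption for this sketch)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_by_type(results):
    """Partition retrieved documents into (image_docs, other_docs)."""
    images = [doc for doc in results if doc.metadata.get('type') == 'image']
    others = [doc for doc in results if doc.metadata.get('type') != 'image']
    return images, others


results = [
    Doc('logo summary', {'type': 'image'}),
    Doc('invoice text', {'type': 'text'}),
    Doc('<table>...</table>', {'type': 'table'}),
]
image_hits, context_hits = split_by_type(results)
```

Retrieving once keeps the two groups consistent with each other and avoids a second embedding call.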

In [35]:
prompt = load_prompt(os.path.join(kit_dir, 'prompts', 'llama3-knowledge_retriever_custom_qa_prompt.yaml'))
text_contexts = [doc.page_content for doc in context_docs]
full_context = '\n\n'.join(image_answers) + '\n\n' + '\n\n'.join(text_contexts)
formatted_prompt = prompt.format(context=full_context, question=query)
model.invoke(formatted_prompt)

'The logo of the company is a circular emblem featuring a stylized sunburst design, accompanied by the text "Australia Fresh Produce" and "Sunny Farm Victoria". The logo\'s color scheme is predominantly gold, with green accents and a white background.'
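The raw-image flow above has three steps: answer the question against each retrieved image, join those answers with the text and table context, and pass the combined context to the text model. The sketch below wraps those steps in one function. The function name, the stub models, and the template are assumptions so the example runs on its own; in the notebook the callables would be `lvlm.invoke` and `model.invoke`.

```python
def answer_with_raw_images(query, image_paths, text_contexts,
                           ask_image_model, ask_text_model, prompt_template):
    """Answer each image directly, then let the text model combine everything."""
    # Step 1: one answer per retrieved image, produced by the vision model.
    image_answers = [ask_image_model(query, path) for path in image_paths]
    # Step 2: image answers first, then the raw text and table context.
    full_context = '\n\n'.join(image_answers + list(text_contexts))
    # Step 3: fill the QA prompt and call the text model.
    prompt = prompt_template.format(context=full_context, question=query)
    return ask_text_model(prompt)


# Stand-ins (assumptions) for the vision model, the text model and the YAML prompt.
def fake_image_model(question, path):
    return f'[image answer for {path}]'


def fake_text_model(prompt):
    return prompt  # echo the prompt so the assembled context is visible


TEMPLATE = 'Context:\n{context}\n\nQuestion: {question}\nAnswer:'

final = answer_with_raw_images(
    query='what is the logo of the company?',
    image_paths=['figure-1-1.jpg'],
    text_contexts=['Invoice total: $39.60'],
    ask_image_model=fake_image_model,
    ask_text_model=fake_text_model,
    prompt_template=TEMPLATE,
)
```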

This example workflow is consolidated in the provided [multimodal rag src module](../src/multimodal.py). For a usage example, refer to the [multimodal rag notebook](./3_multimodal_rag_usage.ipynb).