Multimodal LLMs include Google’s Gemini, Meta’s Llama 3.2 and OpenAI’s GPT-4 and GPT-4o.


Execute the following in a single line copy paste after creating the terminal. 

DO NOT Execute in the Jupyter Noteboll Cell since it may have issues  

uv pip install git+https://github.com/ibm-granite-community/utils.git \
    transformers \
    pillow \
    langchain_classic \
    langchain_core \
    langchain_huggingface sentence_transformers \
    langchain_milvus 'pymilvus[milvus_lite]' \
    docling \
    'langchain_replicate @ git+https://github.com/ibm-granite-community/langchain-replicate.git'

In [1]:
import logging
import sys
import time

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)


************************************************************
Executed Date & Time: 2025-12-14 11:14:44
************************************************************


Load the Granite models
Specify the embeddings model to use for generating text embedding vectors. Here we will use one of the Granite Embeddings models

To use a different embeddings model, replace this code cell with one from this Embeddings Model recipe.

### langchain-huggingface

The langchain-huggingface package is an official partner integration that connects Hugging Face models and services with the LangChain framework. It is available on the Python Package Index (PyPI) and can be installed using pip. 


In [2]:
from langchain_huggingface import HuggingFaceEmbeddings
from transformers import AutoModel, AutoTokenizer

embeddings_model_path = "ibm-granite/granite-embedding-30m-english"

embeddings_model = HuggingFaceEmbeddings(

    model_name=embeddings_model_path,

)


print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)

  from .autonotebook import tqdm as notebook_tqdm
2025-12-14 11:14:57,770 - INFO - Use pytorch device_name: cpu
2025-12-14 11:14:57,771 - INFO - Load pretrained SentenceTransformer: ibm-granite/granite-embedding-30m-english
2025-12-14 11:14:57,770 - INFO - Use pytorch device_name: cpu
2025-12-14 11:14:57,771 - INFO - Load pretrained SentenceTransformer: ibm-granite/granite-embedding-30m-english


************************************************************
Executed Date & Time: 2025-12-14 11:14:58
************************************************************


In [3]:
from dotenv import load_dotenv, find_dotenv

import sys
import os


def env(var_name, default=None):
    """Get environment variable with a default value."""
    return os.getenv(var_name, default)

def get_env_var(var_name):
    """Get environment variable or raise an error if not found."""
    value = os.getenv(var_name)
    if value is None:
        raise ValueError(f"Environment variable '{var_name}' not found.")
    return value



load_dotenv()
if not find_dotenv():
    logging.warning("No .env file found. Ensure environment variables are set correctly.")
else:
    logging.info(".env file loaded successfully.")
 

# Initialize the vision model using Replicate
replicate_api_token = os.getenv("REPLICATE_API_TOKEN")
if not replicate_api_token:
    replicate_api_token = get_env_var("REPLICATE_API_TOKEN")
    if not replicate_api_token:
        logging.error("REPLICATE_API_TOKEN not found in environment variables.")
    else:
        logging.info("Replicate API token loaded successfully.")
        
else:
    logging.info("Replicate API token loaded successfully.")


print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)


2025-12-14 11:15:01,351 - INFO - .env file loaded successfully.
2025-12-14 11:15:01,352 - INFO - Replicate API token loaded successfully.
2025-12-14 11:15:01,352 - INFO - Replicate API token loaded successfully.


************************************************************
Executed Date & Time: 2025-12-14 11:15:01
************************************************************


In [4]:
#%pip install git+https://github.com/ibm-granite-community/utils.git

# Run the following command on Terminal

#uv pip install git+https://github.com/ibm-granite-community/utils.git

from ibm_granite_community.notebook_utils import get_env_var

from langchain_community.llms import Replicate

from transformers import AutoProcessor



# Define the vision model path
vision_model_path = "ibm-granite/granite-vision-3.2-2b"

# Initialize the embeddings_tokenizer using the embeddings_model_path
embeddings_tokenizer = AutoTokenizer.from_pretrained(embeddings_model_path)

 
  


vision_model = Replicate(
    model=vision_model_path,
    replicate_api_token=replicate_api_token,
    model_kwargs={
        "max_tokens": embeddings_tokenizer.model_max_length,  # Set the maximum number of tokens to generate as output.
        "min_tokens": 100,  # Set the minimum number of tokens to generate as output.
    },
)

vision_processor = AutoProcessor.from_pretrained(vision_model_path)

print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


************************************************************
Executed Date & Time: 2025-12-14 11:15:04
************************************************************


Do an install as follows in the terminal to resolve the errors for from langchain_replicate import ChatReplicate

You could also do a 

%uv pip install git+https://github.com/ibm-granite-community/langchain-replicate.git

thro' the jupyter cell, but doing it thro' Terminal in an set environment is preferred. 


uv pip install git+https://github.com/ibm-granite-community/langchain-replicate.git

In [5]:
from langchain_replicate import ChatReplicate

model_path = "ibm-granite/granite-4.0-h-small"
model = ChatReplicate(
    model=model_path,
    replicate_api_token=get_env_var("REPLICATE_API_TOKEN"),
    model_kwargs={
        "max_tokens": 1000, # Set the maximum number of tokens to generate as output.
        "min_tokens": 100, # Set the minimum number of tokens to generate as output.
    },
)


print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)

************************************************************
Executed Date & Time: 2025-12-14 11:15:20
************************************************************


In [6]:
from docling.datamodel import base_models
print(dir(base_models))

['AbstractDocumentBackend', 'AssembledUnit', 'BaseFormatOption', 'BaseModel', 'BasePageElement', 'BoundingBox', 'Cluster', 'ConfidenceReport', 'ConfigDict', 'ContainerElement', 'ConversionStatus', 'DocInputType', 'DocItemLabel', 'DoclingComponentType', 'DocumentStream', 'Enum', 'EquationPrediction', 'ErrorItem', 'Field', 'FieldSerializationInfo', 'FigureClassificationPrediction', 'FigureElement', 'FormatToExtensions', 'FormatToMimeType', 'Image', 'InputFormat', 'ItemAndImageEnrichmentElement', 'LayoutPrediction', 'MimeTypeToFormat', 'NodeItem', 'OpenAiApiResponse', 'OpenAiChatMessage', 'OpenAiResponseChoice', 'OpenAiResponseUsage', 'Optional', 'OutputFormat', 'Page', 'PageConfidenceScores', 'PageElement', 'PagePredictions', 'PictureDataType', 'PipelineOptions', 'PydanticSerCtxKey', 'QualityGrade', 'ScoreValue', 'SegmentedPdfPage', 'Size', 'TYPE_CHECKING', 'Table', 'TableCell', 'TableStructurePrediction', 'TextCell', 'TextElement', 'Type', 'Union', 'VlmPrediction', 'VlmPredictionToken',

In [7]:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
# from docling.datamodel.base_models import BaseFormatOption #FormatOption

from docling.datamodel.pipeline_options import PdfPipelineOptions  

pdf_pipeline_options = PdfPipelineOptions(
    do_ocr =False,
    generate_picture_images = True,
)

format_options: dict[InputFormat, PdfFormatOption] = {
    InputFormat.PDF: PdfFormatOption(
        pipeline_options=pdf_pipeline_options
    )
}

#format_options = {
#    InputFormat.PDF: PdfFormatOption(
#        pipeline_options=pdf_pipeline_options
#    ) 
#}   
converter = DocumentConverter( allowed_formats=[InputFormat.PDF])

sources = [
    "https://midwestfoodbank.org/images/AR_2020_WEB2.pdf",
]

conversions = { source:converter.convert(source=source).document for source in sources  }

print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)

2025-12-14 11:15:32,493 - INFO - detected formats: [<InputFormat.PDF: 'pdf'>]
2025-12-14 11:15:32,639 - INFO - Going to convert document batch...
2025-12-14 11:15:32,641 - INFO - Initializing pipeline for StandardPdfPipeline with options hash e15bc6f248154cc62f8db15ef18a8ab7
2025-12-14 11:15:32,639 - INFO - Going to convert document batch...
2025-12-14 11:15:32,641 - INFO - Initializing pipeline for StandardPdfPipeline with options hash e15bc6f248154cc62f8db15ef18a8ab7
2025-12-14 11:15:32,668 - INFO - Loading plugin 'docling_defaults'
2025-12-14 11:15:32,673 - INFO - Registered picture descriptions: ['vlm', 'api']
2025-12-14 11:15:32,700 - INFO - Loading plugin 'docling_defaults'
2025-12-14 11:15:32,668 - INFO - Loading plugin 'docling_defaults'
2025-12-14 11:15:32,673 - INFO - Registered picture descriptions: ['vlm', 'api']
2025-12-14 11:15:32,700 - INFO - Loading plugin 'docling_defaults'
2025-12-14 11:15:32,712 - INFO - Registered ocr engines: ['auto', 'easyocr', 'ocrmac', 'rapidocr

************************************************************
Executed Date & Time: 2025-12-14 11:15:57
************************************************************


With the documents processed, we then further process the text elements in the documents. We chunk them into appropriate sizes for the embeddings model we are using. A list of LangChain documents are created from the text chunks.

In [8]:
from docling_core.transforms.chunker.hybrid_chunker import HybridChunker

from docling_core.types.doc.document import TableItem
from langchain_core.documents import Document

doc_id = 0

texts:list[Document] = []

max_tokens = 512  # Set your model's max token length

for source, docling_document in conversions.items():
    for chunk in HybridChunker(tokenizer=embeddings_tokenizer).chunk(docling_document):
        items = chunk.meta.doc_items  # This shows warning or error, ignore it for now. 
        if len(items) == 1 and isinstance(items[0], TableItem):
            continue  # We will process Tables later

        refs = " ".join(map(lambda item: item.get_ref().cref, items))
        print(refs)

        text = chunk.text
        tokens = embeddings_tokenizer.encode(text)

        # If chunk is too long, split it into smaller pieces
        if len(tokens) > max_tokens:
            # Split tokens into sublists of max_tokens size
            for i in range(0, len(tokens), max_tokens):
                sub_tokens = tokens[i:i+max_tokens]
                sub_text = embeddings_tokenizer.decode(sub_tokens)
                document = Document(
                    page_content=sub_text,
                    metadata={
                        "source": source,
                        "doc_id": (doc_id := doc_id + 1),
                        "refs": refs,
                    },
                )
                texts.append(document)
        else:
            document = Document(
                page_content=text,
                metadata={
                    "source": source,
                    "doc_id": (doc_id := doc_id + 1),
                    "refs": refs,
                },
            )
            texts.append(document)

print(f"Total chunks: {len(texts)} text document chunks created")

print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)


 


Token indices sequence length is longer than the specified maximum sequence length for this model (669 > 512). Running this sequence through the model will result in indexing errors


#/texts/4
#/texts/7 #/texts/8 #/texts/9 #/texts/10 #/texts/11 #/texts/12 #/texts/13 #/texts/14 #/texts/15 #/texts/16
#/tables/0 #/texts/19 #/texts/20 #/texts/21 #/texts/23
#/texts/26
#/texts/32
#/texts/34
#/texts/36
#/texts/38
#/texts/40
#/texts/44
#/texts/46 #/texts/49 #/texts/50 #/texts/51 #/texts/52
#/texts/56 #/texts/57
#/texts/59 #/texts/60 #/texts/61 #/texts/62 #/texts/63 #/texts/64 #/texts/65 #/texts/66 #/texts/67 #/texts/68
#/texts/70 #/texts/71 #/texts/72 #/texts/73 #/texts/74 #/texts/75 #/texts/77
#/texts/154
#/tables/1 #/texts/158 #/texts/166
#/texts/170 #/texts/171
#/texts/179 #/texts/180 #/texts/181
#/texts/185 #/texts/186 #/texts/187 #/texts/188
#/texts/201
#/texts/203 #/texts/204 #/texts/205 #/texts/206 #/texts/207 #/texts/208 #/texts/209
#/texts/212
#/texts/214
#/texts/216
#/texts/218
#/texts/220
#/texts/222
#/texts/224
#/texts/226
#/texts/228
#/texts/230
#/texts/232
#/texts/234
#/texts/236
#/texts/265 #/texts/266 #/texts/267
#/texts/269 #/texts/270
#/texts/272
#/texts/

Next we process any tables in the documents. We convert the table data to markdown format for passing into the language model. A list of LangChain documents are created from the table’s markdown renderings.

In [9]:
from docling_core.types.doc.labels import DocItemLabel

doc_id = len(texts)

tables:list[Document] = []

for source,docling_document in conversions.items():

    for table in docling_document.tables : 
        if table.label in [DocItemLabel.TABLE]:
            ref = table.get_ref().cref
            print(ref)
            
            text = table.export_to_markdown()
            document = Document(
                page_content=text,
                metadata={
                    "source": source,
                    "doc_id": (doc_id := doc_id + 1),
                    "refs": ref,
                },
            )
            tables.append(document)

print(f"Total tables: {len(tables)} table document chunks created")

print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)




#/tables/1
Total tables: 1 table document chunks created
************************************************************
Executed Date & Time: 2025-12-14 11:16:09
************************************************************


Finally we process any images in the documents. Here we use the vision language model to understand the content of an image. In this example, we are interested in any textual information in the image. You might want to experiment with different prompt text to see how it might improve the results.

NOTE: Processing the images can take a very long time depending upon the number of images and the service running the vision language model.

In [10]:
# Print the chat template to understand its structure
print(vision_processor.chat_template)

{%- if tools %}
    {{- '<|start_of_role|>available_tools<|end_of_role|>
' }}
    {%- for tool in tools %}
    {{- tool | tojson(indent=4) }}
    {%- if not loop.last %}
        {{- '

' }}
    {%- endif %}
    {%- endfor %}
    {{- '<|end_of_text|>
' }}
{%- endif %}
{%- for message in messages if message['role'] == 'system'%}{% else %}<|system|>
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
{% endfor %}{%- for message in messages %}
    {%- if message['role'] == 'system' %}
    {{- '<|system|>
' + message['content'][0]['text'] + '
' }}
    {%- elif message['role'] == 'user' %}<|user|>
 {# Render all images first #}{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}{{ '<image>
' }}{% endfor %}{# Render all text next #}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ content['text'] + '
' }}{% endfor %}
{%- elif messag

In [11]:
import base64
import io
import PIL.Image
import PIL.ImageOps

from IPython.display import display, HTML
from shapely import buffer

def encode_image(image:PIL.Image.Image, format:str ="png") -> str:
    image = PIL.ImageOps.exif_transpose(image) or image
    image = image.convert("RGB")
    buffered = io.BytesIO()
    image.save(buffered, format=format)
    encoding = base64.b64encode(buffered.getvalue()).decode("utf-8")
    uri = f"data:image/{format.lower()};base64,{encoding}"
    return uri


# feel free to experiment with the prompt below

image_prompt = " if the Image contains text, explain the text in the image in detail. " 

#conversation = [
#    {"role": "system",
#      "content": [
#          {"type": "image"},
#         {"type": "text", "text": image_prompt},
#      ],
#   },
#]

conversation = [
    {
        "role": "user",
        "content": image_prompt,  # Just a simple string
    },
]


vision_prompt = vision_processor.apply_chat_template(
    conversation, #  = conversation, Use positional argument instead of assigning to parameter name
    add_generation_prompt = True,
)

pictures:list[Document] =[]

doc_id = len(texts) + len(tables)

for source,docling_document in conversions.items():

    for picture in docling_document.pictures :
        ref = picture.get_ref().cref 
        print(ref)

        image = picture.get_image(docling_document)

        if image is not None:
            result = vision_model.invoke(vision_prompt, image=encode_image(image))
            # If result is a dict, get the 'text' key if it exists, else use the whole result
            if isinstance(result, dict):
                text = result.get("text", str(result))
            else:
                text = str(result)

            document = Document(
                page_content=text,
                metadata={
                    "source": source,
                    "doc_id": (doc_id := doc_id + 1),
                    "refs": ref,
                },
            )
            pictures.append(document)
        else:
            print("No image found, skipping.")

print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)




#/pictures/0
No image found, skipping.
#/pictures/1
No image found, skipping.
#/pictures/2
No image found, skipping.
#/pictures/3
No image found, skipping.
#/pictures/4
No image found, skipping.
#/pictures/5
No image found, skipping.
#/pictures/6
No image found, skipping.
#/pictures/7
No image found, skipping.
#/pictures/8
No image found, skipping.
#/pictures/9
No image found, skipping.
#/pictures/10
No image found, skipping.
#/pictures/11
No image found, skipping.
#/pictures/12
No image found, skipping.
#/pictures/13
No image found, skipping.
#/pictures/14
No image found, skipping.
#/pictures/15
No image found, skipping.
#/pictures/16
No image found, skipping.
#/pictures/17
No image found, skipping.
#/pictures/18
No image found, skipping.
#/pictures/19
No image found, skipping.
#/pictures/20
No image found, skipping.
#/pictures/21
No image found, skipping.
#/pictures/22
No image found, skipping.
#/pictures/23
No image found, skipping.
#/pictures/24
No image found, skipping.
#/pictures

In [12]:
import itertools
from pydoc import resolve

from docling_core.types.doc.document import RefItem
from pydantic import BaseModel



# Print all created documents

for document in itertools.chain(texts, tables):

    print(f"Document ID: {document.metadata['doc_id']}")

    print(f"Source: {document.metadata['source']}")

    print(f"Content:\n{document.page_content}")

    print("=" * 80) # Separator for clarity



for document in pictures:

    print(f"Document ID: {document.metadata['doc_id']}")

    source = document.metadata['source']

    print(f"Source: {source}")
    print(f"Content:\n{document.page_content}")

    docling_document = conversions[source]

    ref = document.metadata['ref']

    
    
    #picture = RefItem(cref=ref).resolve(docling_document)

    image = picture.get_image(docling_document)

    print("Image:")

    display(image)

    print("=" * 80) # Separator for clarity

print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)

Document ID: 1
Source: https://midwestfoodbank.org/images/AR_2020_WEB2.pdf
Content:
bridging the gap between poverty and prosperity
Document ID: 2
Source: https://midwestfoodbank.org/images/AR_2020_WEB2.pdf
Content:
No one could have predicted the events of 2020. The global COVID-19 pandemic created a dynamic year. With the help of volunteers, donors, staff, and most importantly, the blessings of God, Midwest Food Bank responded nimbly to the changing landscape.
All  MFB  locations  remained  open  and  responsive  to  the  need  of  our nonprofit partners. We enacted safety protocols and reduced volunteer numbers  to  maintain  social  distancing  guidelines.  To  allow  partner agencies to receive food from MFB safely, we altered our distribution model.  Community,  business,  and  donor  support  funded  operations and helped with food purchases. More details on our response to the pandemic are on page 14.
Noteworthy in 2020:
- MFB distributed a record amount of food, 37% more than 

Populate the vector database
Using the embedding model, we load the documents from the text chunks and generated image captioning into a vector database. Creating this vector database allows us to easily conduct a semantic similarity search across our documents.

NOTE: Population of the vector database can take some time depending on your embedding model and service.

Choose your vector database
Specify the database to use for storing and retrieving embedding vectors.

To connect to a vector database other than Milvus, replace this code cell with one from this Vector Store recipe.

In [13]:
# %pip install langchain_milvus

import os
from langchain_core.vectorstores import VectorStore
# from langchain_milvus import Milvus

from langchain_community.vectorstores import Annoy

# Define the directory and file path for the vector database
vector_db_dir = os.path.join(os.getcwd(), "vectorDB")
os.makedirs(vector_db_dir, exist_ok=True)
db_file = os.path.join(vector_db_dir, "vectorstore.db")

print(f"The vector database will be saved to {db_file}")

#vector_db: VectorStore = Milvus(
#    embedding_function=embeddings_model,
#    connection_args={"uri": db_file},
#    auto_id=True,
#    enable_dynamic_field=True,
#    index_params={"index_type": "AUTOINDEX"},
#)

vector_db = Annoy.from_documents(
    documents=texts + tables + pictures,  # or whatever list of LangChain Documents you have
    embedding=embeddings_model,
    persist_path=db_file  # path to save the Annoy index
)

vector_db.save_local(db_file)

# To Reload later 
#vector_db = Annoy.load_local(
#    persist_path=db_file,
#    embedding=embeddings_model
#)

print(f"*"*60)
print("Executed Date & Time:", time.strftime("%Y-%m-%d %H:%M:%S"))
print(f"*"*60)

The vector database will be saved to d:\AiCode\MultiModelDocklingGraniteRAG\vectorDB\vectorstore.db
************************************************************
Executed Date & Time: 2025-12-14 11:16:31
************************************************************
************************************************************
Executed Date & Time: 2025-12-14 11:16:31
************************************************************


We now add all the LangChain documents for the text, tables and image descriptions to the vector database.

In [16]:
import itertools

documents = list(itertools.chain(texts, tables, pictures))

ids = vector_db.add_documents(documents)

print(f"{len(ids)} documents added to the vector database")

NotImplementedError: Annoy does not allow to add new data once the index is build.

Step 4: RAG with Granite
Now that we have successfully converted our documents and vectorized them, we can set up out RAG pipeline.

Retrieve relevant chunks
Here we test the vector database by searching for chunks with relevant information to our query in the vector space. We display the documents associated with the retrieved image description.

Feel free to try different queries.

In [14]:
query = "How much was spent on food distribution relative to the amount of food distributed?"

for doc in vector_db.as_retriever().invoke(query):

    print(doc)

    print("=" * 80) # Separator for clarity

page_content='Midwest Food Bank Growth Value of food distributed (millions) Value of food distributed (millions)' metadata={'source': 'https://midwestfoodbank.org/images/AR_2020_WEB2.pdf', 'doc_id': 15, 'refs': '#/texts/154'}
page_content='We  receive  donated  food  from  all  over  the country  -  food  produced  in  excess,  incorrect labeling,  and  more.  Donated  food  comes  from various sources:
- Food manufacturers
- Food distribution centers
- USDA programs
- Grocery stores
- Private food drives' metadata={'source': 'https://midwestfoodbank.org/images/AR_2020_WEB2.pdf', 'doc_id': 43, 'refs': '#/texts/323 #/texts/324 #/texts/325 #/texts/326 #/texts/327 #/texts/328'}
page_content='Provide industry-leading food relief to those in need while feeding them spiritually.
NOUN
the state of being without reliable access to a sufficient quantity of affordable, nutritious food
Food-insecure households have difficulty at some time during the year providing enough food for all  their  memb

The returned document should be responsive to the query. Let's go ahead and construct our RAG pipeline.


Create the RAG pipeline for Granite
First we create the prompts for Granite to perform the RAG query. We use the Granite chat template and supply the placeholder values that the LangChain RAG pipeline will replace.

Next, we construct the RAG pipeline by using the Granite prompt templates previously created.

In [15]:
from ibm_granite_community.langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains.retrieval import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate

# Create a Granite prompt for question-answering with the retrieved context
prompt_template = ChatPromptTemplate.from_template("{input}")

# Assemble the retrieval-augmented generation chain
combine_docs_chain = create_stuff_documents_chain(
    llm=model,
    prompt=prompt_template,
)
rag_chain = create_retrieval_chain(
    retriever=vector_db.as_retriever(),
    combine_docs_chain=combine_docs_chain,
)

Generate a retrieval-augmented response to a question
The pipeline uses the query to locate documents from the vector database and use them as context for the query.

In [16]:
from ibm_granite_community.notebook_utils import wrap_text

output = rag_chain.invoke({"input": query})

print(wrap_text(output['answer']))

2025-12-14 11:17:55,303 - INFO - HTTP Request: GET https://api.replicate.com/v1/models/ibm-granite/granite-4.0-h-small "HTTP/1.1 200 OK"
2025-12-14 11:17:55,536 - INFO - HTTP Request: POST https://api.replicate.com/v1/models/ibm-granite/granite-4.0-h-small/predictions "HTTP/1.1 201 Created"
2025-12-14 11:17:55,536 - INFO - HTTP Request: POST https://api.replicate.com/v1/models/ibm-granite/granite-4.0-h-small/predictions "HTTP/1.1 201 Created"
2025-12-14 11:17:56,115 - INFO - HTTP Request: GET https://api.replicate.com/v1/predictions/hxag2wc0q5rj40cv3pkv6ekzz8 "HTTP/1.1 200 OK"
2025-12-14 11:17:56,115 - INFO - HTTP Request: GET https://api.replicate.com/v1/predictions/hxag2wc0q5rj40cv3pkv6ekzz8 "HTTP/1.1 200 OK"
2025-12-14 11:17:56,697 - INFO - HTTP Request: GET https://api.replicate.com/v1/predictions/hxag2wc0q5rj40cv3pkv6ekzz8 "HTTP/1.1 200 OK"
2025-12-14 11:17:56,697 - INFO - HTTP Request: GET https://api.replicate.com/v1/predictions/hxag2wc0q5rj40cv3pkv6ekzz8 "HTTP/1.1 200 OK"
2025-

Based on the provided context, it is not possible to determine the exact amount
spent on food distribution relative to the amount of food distributed. The
information given does not include any specific financial data or metrics on the
costs associated with the food distribution efforts. The context focuses on the
sources of donated food, the prevalence of food insecurity in the US, and the
impact of the pandemic on food access, but does not provide any quantitative
information on the financial aspects of the food distribution operations.
Without additional data on the costs incurred, it is not possible to calculate
the ratio of spending to the amount of food distributed.


Awesome! We have created an AI application that can successfully leverage knowledge from the source documents' text and images.

Next Steps
Explore advanced RAG workflows for other industries.
Experiment with other document types and larger datasets.
Optimize prompt engineering for better Granite responses.

In [17]:
query = "How many Mercies meals were distributed in Haiti in 2020?"

for doc in vector_db.as_retriever().invoke(query):

    print(doc)

    print("=" * 80) # Separator for clarity

page_content='Haiti  added  the  COVID-19  pandemic  to  its  list  of challenges in 2020. All airports, seaports, factories, and schools were closed for a time.
Most Haitian children receive their primary daily  nourishment  from  their  school  lunch.  MFB Haiti  was  able  to  get  Tender  Mercies  distributed through partnerships with faith-based schools. In 2020, Midwest Food Bank Haiti more than doubled shipments of food to Haiti. Over 160 tons of food relief were shipped, nearly three-quarters of which was Tender Mercies meals.' metadata={'source': 'https://midwestfoodbank.org/images/AR_2020_WEB2.pdf', 'doc_id': 36, 'refs': '#/texts/269 #/texts/270'}
page_content='Tender Mercies® is Midwest Food Bank's nutritious bagged meal of rice, beans, and soy protein, making a delicious meal. It is an integral part of fighting food insecurity in the United States. Tender Mercies is also a mainstay of our international  efforts.  In  East  Africa and Haiti, it is distributed to schools and 

In [18]:
query = "List five national board members"

for doc in vector_db.as_retriever().invoke(query):

    print(doc)

    print("=" * 80) # Separator for clarity

page_content='Merilee Baptiste, Executive Director Eric Sheldahl, Divisional Board President' metadata={'source': 'https://midwestfoodbank.org/images/AR_2020_WEB2.pdf', 'doc_id': 54, 'refs': '#/texts/432'}
page_content='John Whitaker, Executive Director Jim Gapinski, Divisional Board President' metadata={'source': 'https://midwestfoodbank.org/images/AR_2020_WEB2.pdf', 'doc_id': 64, 'refs': '#/texts/459'}
page_content='Karl Steidinger, Executive Director Stanley Sinn, Divisional Board President' metadata={'source': 'https://midwestfoodbank.org/images/AR_2020_WEB2.pdf', 'doc_id': 55, 'refs': '#/texts/434'}
page_content='Jerry Koehl, Divisional Board President' metadata={'source': 'https://midwestfoodbank.org/images/AR_2020_WEB2.pdf', 'doc_id': 69, 'refs': '#/texts/469'}


In [19]:
query = "Who is the CEO of Midwest Food Bank? Name the Vice-President of Human Resources"

for doc in vector_db.as_retriever().invoke(query):

    print(doc)

    print("=" * 80) # Separator for clarity

page_content='One  of  the  key  strengths  of  Midwest  Food  Bank is  its  volunteers.  They  are  the  life-blood  of  the organization.  From  leading  volunteer  groups  to driving  semi-trucks,  people  generously  give  of their time and talents to further MFB's mission. In 2020 volunteer service hours equaled 150 full-time employees.
The pandemic created safety challenges for MFB. Safety protocols were put in place at each location. Volunteer groups were limited to allow for social distancing. While we saw an increase in the amount of food needed, fewer volunteers were able to help.
Multiple  MFB  locations  received  the  invaluable assistance of the National Guard. They filled many of the volunteer positions vital to our operations driving trucks and fork-lifts and helping with food distributions. Their presence was a blessing.
300,898 HOURS OF SERVICE WERE VOLUNTEERED BY 17,930 INDIVIDUALS IN 2020.
Many volunteers demonstrated their courage and dedication by increasing their