# Multi-Modal RAG Hands-On Exercises

In these exercises, two PDFs are provided to the RAG pipeline: `Explainable_machine_learning_prediction_of_edema_a.pdf` and `Modeling tumor size dynamics based on real‐world electronic health records.pdf`.


## Setup

In [None]:
import sys

# Make the project-level `helpers` package importable from this notebook
sys.path.append("../../")

In [None]:
%pip install -r ../../requirements.txt

In [None]:
import os
import getpass
import json
from tqdm import tqdm

import numpy as np

from helpers.data_processing import SimpleChunker, PDFExtractor
from helpers.embedding import (
    OpenAITextEmbeddings,
    OpenAITextEmbeddingsAzure,
    ImageEmbeddings,
    ImageEmbeddingsForText,
)
from helpers.vectorstore import (
    ChromaDBVectorStore,
    VectorStoreRetriever,
)
from helpers.constants_and_data_classes import Roles
from helpers.llm import OpenAILLM, OpenAILLMAzure
from helpers.rag import Generator, DefaultRAG

In [None]:
data_folder = "../../data"

pdf_files = [
    "Explainable_machine_learning_prediction_of_edema_a.pdf",
    "Modeling tumor size dynamics based on real‐world electronic health records.pdf",
]

text_vector_store_collection = "text_collection"
image_vector_store_collection = "image_collection"

text_vector_store_full_collection = "text_collection_full"
image_vector_store_full_collection = "image_collection_full"

In [None]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

# If you use an Azure endpoint, OPENAI_API_KEY is not needed; set these instead:
# os.environ["AZURE_API_KEY"] = ""
# os.environ["AZURE_API_BASE"] = ""
# os.environ["AZURE_API_VERSION"] = ""

## Define the RAG pipeline

In [None]:
data_extractor = PDFExtractor()
chunker = SimpleChunker(max_chunk_size=1000)


text_chunks = []
image_chunks = []

for pdf_file in pdf_files:
    pdf_path = os.path.join(data_folder, pdf_file)
    _, text, images = data_extractor.extract_text_and_images(pdf_path)
    text_chunks_curr = chunker.chunk_text(text, {"source_text": pdf_file})
    image_chunks_curr = chunker.chunk_images(images, {"source_text": pdf_file})
    text_chunks.extend(text_chunks_curr)
    image_chunks.extend(image_chunks_curr)
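The internals of `SimpleChunker` live in `helpers.data_processing` and are not shown in this notebook. As a rough mental model, a character-based chunker with a `max_chunk_size` limit might look like the sketch below. The function name `chunk_text_fixed`, the paragraph-first packing strategy, and the `Chunk` dataclass are assumptions made for illustration, not the helper's actual implementation (it may, for example, count tokens rather than characters).

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk container: text content plus metadata."""
    content: str
    metadata: dict = field(default_factory=dict)


def chunk_text_fixed(text: str, metadata: dict, max_chunk_size: int = 1000) -> list[Chunk]:
    """Greedily pack paragraphs into chunks of at most max_chunk_size characters."""
    pieces = []
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split oversized paragraphs so no piece exceeds the limit.
        for start in range(0, len(paragraph), max_chunk_size):
            pieces.append(paragraph[start:start + max_chunk_size])

    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        if len(candidate) <= max_chunk_size:
            current = candidate
        else:
            # Close the current chunk; each chunk gets its own copy of the metadata.
            chunks.append(Chunk(current, dict(metadata)))
            current = piece
    if current:
        chunks.append(Chunk(current, dict(metadata)))
    return chunks


chunks = chunk_text_fixed("a" * 2500, {"source_text": "doc.pdf"}, max_chunk_size=1000)
print([len(c.content) for c in chunks])  # → [1000, 1000, 500]
```

Copying the metadata per chunk mirrors how the notebook attaches `{"source_text": pdf_file}` to every chunk, which is what later lets you trace a retrieved chunk back to its PDF.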

In [None]:
# Check if both Azure environment variables exist
azure_endpoint = os.getenv("AZURE_API_BASE")
azure_api_key = os.getenv("AZURE_API_KEY")
if azure_endpoint and azure_api_key:
    text_embedding_model = OpenAITextEmbeddingsAzure()
    print("Using AzureOpenAI client")
else:
    text_embedding_model = OpenAITextEmbeddings()
    print("Using OpenAI client")

text_embeddings = text_embedding_model.get_embedding(
    [chunk.content for chunk in text_chunks]
)

In [None]:
image_embeddings = []

image_embedding_model = ImageEmbeddings()
for chunk in tqdm(image_chunks):
    image_embeddings.append(image_embedding_model.get_embedding(chunk.content))

image_embeddings = np.array(image_embeddings)

image_text_embedding_model = ImageEmbeddingsForText()
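Two image-related models are created here: `ImageEmbeddings` for the extracted figures and `ImageEmbeddingsForText` for the query. This suggests (an assumption, since the helpers are not shown) a CLIP-style model that maps text and images into one shared vector space, so a text question can be compared directly against figure embeddings. The NumPy sketch below shows that comparison with cosine similarity on made-up 3-D vectors; the function name `top_k_by_cosine` is hypothetical.

```python
import numpy as np


def top_k_by_cosine(query: np.ndarray, candidates: np.ndarray, k: int) -> list[int]:
    """Return the indices of the k candidate rows most similar to query."""
    # Normalize so that the dot product equals cosine similarity.
    query_norm = query / np.linalg.norm(query)
    cand_norm = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = cand_norm @ query_norm
    # argsort is ascending, so reverse it to put the highest scores first.
    return np.argsort(scores)[::-1][:k].tolist()


# Toy "embeddings": one text query and four figure embeddings.
text_query = np.array([1.0, 0.0, 0.0])
figure_vectors = np.array([
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
    [-1.0, 0.0, 0.0],
])
print(top_k_by_cosine(text_query, figure_vectors, k=2))  # → [1, 2]
```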

In [None]:
vector_store_text = ChromaDBVectorStore(text_vector_store_full_collection)
vector_store_text.insert_chunks(text_chunks, text_embeddings)

vector_store_image = ChromaDBVectorStore(image_vector_store_full_collection)
vector_store_image.insert_chunks(image_chunks, image_embeddings)
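`ChromaDBVectorStore` stores chunks and their embeddings in a named Chroma collection. Conceptually, a vector store only needs two operations: insert vectors with their payloads, and return the nearest ones for a query vector. The in-memory class below is a hypothetical stand-in that illustrates those two operations with a brute-force scan; its `insert`/`query` methods are not the helper's API, and Chroma itself uses an approximate nearest-neighbour (HNSW) index rather than scanning every vector.

```python
import numpy as np


class InMemoryVectorStore:
    """Hypothetical brute-force vector store, for illustration only."""

    def __init__(self) -> None:
        self.payloads: list = []
        self.vectors: list[np.ndarray] = []

    def insert(self, payloads, embeddings) -> None:
        """Store each payload alongside its L2-normalized embedding."""
        if len(payloads) != len(embeddings):
            raise ValueError("payloads and embeddings must have the same length")
        for payload, vector in zip(payloads, embeddings):
            vector = np.asarray(vector, dtype=float)
            self.payloads.append(payload)
            self.vectors.append(vector / np.linalg.norm(vector))

    def query(self, query_vector, top_k: int) -> list[tuple]:
        """Return (payload, cosine_similarity) pairs, best match first."""
        if not self.vectors:
            return []
        q = np.asarray(query_vector, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q
        order = np.argsort(scores)[::-1][:top_k]
        return [(self.payloads[i], float(scores[i])) for i in order]


store = InMemoryVectorStore()
store.insert(["edema chunk", "tumor chunk"], [[1.0, 0.0], [0.0, 1.0]])
print(store.query([0.8, 0.2], top_k=1))
```

Keeping text and images in two separate stores, as the notebook does, lets each modality be queried with its own embedding model and its own `top_k`.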

In [None]:
retriever = VectorStoreRetriever(
    text_embedding_model,
    vector_store_text,
    image_text_embedding_model,
    vector_store_image,
)

In [None]:
# Check if both Azure environment variables exist
azure_endpoint = os.getenv("AZURE_API_BASE")
azure_api_key = os.getenv("AZURE_API_KEY")
if azure_endpoint and azure_api_key:
    llm = OpenAILLMAzure(temperature=0.3)
    print("Using AzureOpenAI client")
else:
    llm = OpenAILLM(temperature=0.3)
    print("Using OpenAI client")
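The Azure-versus-OpenAI check appears twice (once for the embedding model, once for the LLM). One way to avoid the duplication is a small helper that reads the same two environment variables. The name `use_azure` is hypothetical; the helper classes would still be constructed exactly as in the cells above.

```python
import os


def use_azure(env=None) -> bool:
    """Return True only when both Azure variables are set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("AZURE_API_BASE")) and bool(env.get("AZURE_API_KEY"))


# Possible usage in the notebook (classes come from the helpers package):
# llm = OpenAILLMAzure(temperature=0.3) if use_azure() else OpenAILLM(temperature=0.3)
print(use_azure({"AZURE_API_BASE": "https://example.invalid", "AZURE_API_KEY": ""}))  # → False
```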

In [None]:
developer_prompt = """You are a helpful assistant, and your task is to answer questions using relevant chunks and images. Please first think step-by-step by mentioning which chunks you used and then answer the question. Organize your output in a json formatted as dict{"step_by_step_thinking": Str(explanation), "chunk_used": List(integers), "answer": Str(answer)}. Your responses will be read by someone without specialized knowledge, so please have a definite and concise answer."""
print(developer_prompt)
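The developer prompt asks the model for a JSON object with the keys `step_by_step_thinking`, `chunk_used`, and `answer`. Models sometimes wrap JSON in a Markdown fence or drop a key, so it can help to parse and validate the raw reply defensively. Whether `Generator` already does this is not shown here; `parse_rag_answer` below is a hypothetical helper that uses only the standard library.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep the snippet readable
EXPECTED_KEYS = {"step_by_step_thinking", "chunk_used", "answer"}
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, flags=re.DOTALL)


def parse_rag_answer(raw: str) -> dict:
    """Parse the model's JSON reply, tolerating a surrounding Markdown fence."""
    match = _FENCED.search(raw)
    payload = match.group(1) if match else raw
    data = json.loads(payload)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    chunk_used = data["chunk_used"]
    if not isinstance(chunk_used, list) or not all(isinstance(i, int) for i in chunk_used):
        raise ValueError("chunk_used must be a list of integers")
    return data
```

The `answer` returned by `rag.execute` is already passed to `json.dumps` below, so it appears to be parsed by the pipeline; a helper like this matters mainly if you handle raw model text yourself.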

In [None]:
rag_template = """
Here are the relevant CHUNKS:
{context}

--------------------------------------------

Here is the USER QUESTION:
{query}

--------------------------------------------

Please think step-by-step and generate your output in json:
"""
print(rag_template)
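The template exposes two placeholders, `{context}` and `{query}`. A plausible way to fill it (an assumption; `Generator` may do this differently) is to number each retrieved chunk so that the integers the model reports in `chunk_used` map back to specific chunks. The helper name `build_prompt`, the `[Chunk i]` label, and numbering from 0 are all hypothetical choices.

```python
def build_prompt(template: str, chunks: list[str], query: str) -> str:
    """Fill a RAG template with numbered chunks and the user question."""
    # Number chunks from 0 so "chunk_used" indices can be mapped back to them.
    context = "\n\n".join(f"[Chunk {i}]\n{text}" for i, text in enumerate(chunks))
    # Only the template is parsed by str.format; braces inside chunk text are safe.
    return template.format(context=context, query=query)


template = "CHUNKS:\n{context}\n\nQUESTION:\n{query}\n"
print(build_prompt(template, ["Edema was graded...", "SHAP values showed..."], "What drives edema?"))
```

`rag_template` contains no braces besides its two placeholders, so `str.format` can fill it directly. The developer prompt, by contrast, contains literal `{` and `}` in its JSON description and should not be passed through `str.format`.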

In [None]:
generator = Generator(llm, developer_prompt, rag_template)

In [None]:
rag_without_images = DefaultRAG(
    llm=llm,
    text_embedding_model=text_embedding_model,
    text_vector_store=vector_store_text,
    generator=generator,
    params={"top_k_text": 5},
)

In [None]:
rag = DefaultRAG(
    llm=llm,
    text_embedding_model=text_embedding_model,
    text_vector_store=vector_store_text,
    image_text_embedding_model=image_text_embedding_model,
    image_vector_store=vector_store_image,
    generator=generator,
    params={"top_k_text": 5, "top_k_image": 3},
)

In [None]:
answer, sources, cost = rag.execute(
    "Here goes my amazing question!",
    {},
    verbose=True,
)

In [None]:
print(json.dumps(answer, indent=3))

In [None]:
# The chunks retrieved by the retriever:
print(len(sources))
print(sources[0])

In [None]:
print(cost)

# Hands-on Exercises

1. Explore the code
2. Test questions and evaluate answers
3. Discuss possible improvements
4. (Optional - Advanced) Implement query expansion

## 1. Explore the code

Skim the code in this notebook and in the `helpers` modules it imports to make sure you understand how each block works.

## 2. Test questions and evaluate answers

The second exercise consists of asking questions and evaluating the answers. To do so, call the `rag` and `rag_without_images` pipelines defined above, in the same way as the example call.

### 2.1 Question about text (1/2)

Ask a question about `Explainable_machine_learning_prediction_of_edema_a.pdf` that can be answered from the text alone. Use `rag_without_images`.

Check the answer and verify that the chunks used belong to the correct document.

If you need inspiration, you can ask: "How did cumulative tepotinib dose impact edema predictions, and what insights did SHAP provide about this relationship?"

In [None]:
answer, sources, cost = rag_without_images.execute(
    "Here goes my amazing question!",
    {},
    verbose=True,
)

In [None]:
print(json.dumps(answer, indent=3))

In [None]:
print(len(sources))
for source in sources:
    print(source["chunk"].metadata)
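Each retrieved chunk carries the `source_text` metadata attached during chunking, so checking that an answer came from the right PDF can be automated by tallying that field. The sketch assumes, as the cell above suggests, that each entry in `sources` is a dict whose `"chunk"` has a `metadata` dict; the `SimpleNamespace` objects only stand in for the real chunk objects, and `count_source_documents` is a hypothetical name.

```python
from collections import Counter
from types import SimpleNamespace


def count_source_documents(sources: list[dict]) -> Counter:
    """Count retrieved chunks per source PDF using the 'source_text' metadata."""
    return Counter(s["chunk"].metadata.get("source_text", "<unknown>") for s in sources)


# Stand-in objects shaped like the notebook's `sources` entries.
fake_sources = [
    {"chunk": SimpleNamespace(metadata={"source_text": "edema.pdf"})},
    {"chunk": SimpleNamespace(metadata={"source_text": "edema.pdf"})},
    {"chunk": SimpleNamespace(metadata={"source_text": "tumor.pdf"})},
]
print(count_source_documents(fake_sources).most_common())  # → [('edema.pdf', 2), ('tumor.pdf', 1)]
```

In the notebook you would call `count_source_documents(sources)` right after `execute` to see at a glance how many chunks came from each PDF.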

### 2.2 Question about text (2/2)

Ask a question about `Modeling tumor size dynamics based on real‐world electronic health records.pdf` that can be answered from the text alone. Use `rag_without_images`.

Check the answer and verify that the chunks used belong to this document.

If you need inspiration, you can ask: "What was the rationale for using an ON/OFF treatment effect model instead of a dose-dependent model?"

In [None]:
answer, sources, cost = rag_without_images.execute(
    "Here goes my amazing question about the second PDF!",
    {},
    verbose=True,
)

In [None]:
print(json.dumps(answer, indent=3))

In [None]:
print(len(sources))
for source in sources:
    print(source["chunk"].metadata)

### 2.3 Question about a plot

Find a question about a plot in one of the two documents that cannot be answered from the text alone.

First, ask the question to the text-only RAG pipeline (`rag_without_images`) and verify that it cannot answer it.

Second, ask it to the multi-modal RAG pipeline (`rag`) and check the answer. Verify that the chunks used belong to the correct document.

If you don't know which question to ask, you can try: "What is the lowest SHAP value observed for 'weight' on probability of severe edema?"

In [None]:
answer, sources, cost = rag_without_images.execute(
    "Here goes my amazing question about a plot!",
    {},
    verbose=True,
)

In [None]:
print(json.dumps(answer, indent=3))

In [None]:
answer, sources, cost = rag.execute(
    "Here goes my amazing question about a plot!",
    {},
    verbose=True,
)

In [None]:
print(json.dumps(answer, indent=3))

In [None]:
print(len(sources))
for source in sources:
    print(source["chunk"].metadata)
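Exercise 2.3 asks the same question to two pipelines, and a small loop keeps that comparison in one place. It relies only on the call pattern used in this notebook, `execute(question, {}, verbose=...)` returning `(answer, sources, cost)`; the function name `compare_pipelines` and the `DummyPipeline` used to demonstrate it are hypothetical, and the meaning of the second positional argument is not documented here.

```python
import json


def compare_pipelines(question: str, pipelines: dict) -> dict:
    """Run one question through several pipelines and collect the results."""
    results = {}
    for name, pipeline in pipelines.items():
        answer, sources, cost = pipeline.execute(question, {}, verbose=False)
        results[name] = {"answer": answer, "n_sources": len(sources), "cost": cost}
    return results


class DummyPipeline:
    """Stand-in with the same execute() call shape as the notebook's pipelines."""

    def __init__(self, reply: str, n_sources: int) -> None:
        self.reply, self.n_sources = reply, n_sources

    def execute(self, query, options, verbose=False):
        # `options` mirrors the `{}` passed as the second argument in the notebook.
        return {"answer": self.reply}, [None] * self.n_sources, 0.0


report = compare_pipelines(
    "What is the lowest SHAP value observed for 'weight' on probability of severe edema?",
    {"text_only": DummyPipeline("<text-only answer>", 5), "multimodal": DummyPipeline("<answer from figure>", 8)},
)
print(json.dumps(report, indent=3, default=str))
```

With the real pipelines you would pass `{"text_only": rag_without_images, "multimodal": rag}`; `default=str` guards against a `cost` object that is not JSON-serializable.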

## 3. Discuss possible improvements

Identify the current pain points and discuss how the pipeline could be improved to produce better answers. How would the results differ with a different multi-modal RAG architecture?

If time permits, try changing some parameters of the pipeline (for example `top_k_text`, `top_k_image`, or the chunker's `max_chunk_size`) and observe how they affect the results.
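One systematic way to vary parameters is to sweep a small grid of `top_k_text` / `top_k_image` values and rebuild the pipeline for each combination. The sketch below only generates the parameter dicts and calls a factory you supply; `sweep_params` and `make_pipeline` are hypothetical names, and in the notebook the factory would construct a `DefaultRAG` with the same arguments as `rag` but a different `params` dict.

```python
from itertools import product


def sweep_params(grid: dict, make_pipeline, question: str) -> list[tuple]:
    """Build one pipeline per parameter combination and run the same question."""
    keys = sorted(grid)
    runs = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        answer, sources, cost = make_pipeline(params).execute(question, {}, verbose=False)
        runs.append((params, answer, cost))
    return runs


# In the notebook, a factory could look like this (not executed here):
# make_pipeline = lambda params: DefaultRAG(
#     llm=llm, text_embedding_model=text_embedding_model, text_vector_store=vector_store_text,
#     image_text_embedding_model=image_text_embedding_model, image_vector_store=vector_store_image,
#     generator=generator, params=params,
# )
# runs = sweep_params({"top_k_text": [3, 5, 10], "top_k_image": [1, 3]}, make_pipeline, question)
```

Each extra combination is a full RAG call, so keep the grid small when the LLM is billed per token.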

## 4. (Optional - Advanced) Implement query expansion

Implement query expansion by defining the prompt for the LLM to generate alternative queries to search more broadly in the vector store.

You should provide a developer prompt that explains the LLM's role: rephrasing the user query into alternative queries.

You should also write a query template that asks the LLM to produce the alternative queries from the user query. In the template, use `{query}` for the user query and `{expansion_number}` for the number of alternative queries.

The LLM should write each query on a new line.

Try it on one of the previous questions. How does query expansion affect the quality of the answer? And how does it affect the cost?
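Because the LLM is asked to write each alternative query on its own line, its reply has to be split back into a list before each query can be embedded and searched. Models often add numbering or bullets anyway, so the hypothetical `parse_expanded_queries` below strips those, drops blank lines and duplicates, and keeps the original query first. Whether `DefaultRAG` also searches with the original query is not shown here; including it is an assumption.

```python
import re


def parse_expanded_queries(reply: str, original: str, expansion_number: int) -> list[str]:
    """Turn a one-query-per-line LLM reply into a clean, de-duplicated list."""
    queries = [original]
    seen = {original.strip().lower()}
    for line in reply.splitlines():
        # Remove leading bullets or numbering such as "1.", "2)", "-", "*".
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            queries.append(cleaned)
            seen.add(key)
        if len(queries) == expansion_number + 1:
            break  # original query + expansion_number alternatives
    return queries


reply = (
    "1. How does tepotinib dose affect edema?\n"
    "2) Effect of cumulative dose on edema\n"
    "\n"
    "- How does tepotinib dose affect edema?"
)
print(parse_expanded_queries(reply, "Impact of tepotinib dose on edema?", 3))
```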

In [None]:
query_expansion_developer_message = {
    "role": Roles.DEVELOPER,
    "content": "Explain the role here",
}

query_expansion_template_query = """
        Write the template here, use {query} and {expansion_number}
        As a reminder each expanded query should be on its own line
    """

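Before plugging your template into `DefaultRAG`, it is worth checking that it contains both `{query}` and `{expansion_number}` and no other replacement fields: a stray `{...}`, such as an inline JSON example, would break `str.format`-style filling. That the helper fills the template with `str.format` is an assumption; `check_template_fields` is a hypothetical helper built on the standard library's `string.Formatter`, and the example template is a toy, not a recommended prompt.

```python
from string import Formatter

REQUIRED_FIELDS = {"query", "expansion_number"}


def check_template_fields(template: str) -> set[str]:
    """Return the replacement-field names in template, raising if they are wrong."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    fields = {name for _, name, _, _ in Formatter().parse(template) if name is not None}
    missing, extra = REQUIRED_FIELDS - fields, fields - REQUIRED_FIELDS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    return fields


toy_template = "Give {expansion_number} rephrasings of: {query}\nOne per line."
check_template_fields(toy_template)
print(toy_template.format(query="How is edema graded?", expansion_number=3))
```

If your template genuinely needs literal braces, double them (`{{` and `}}`) so that `str.format` leaves them alone.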
In [None]:
rag_with_query_expansion = DefaultRAG(
    llm=llm,
    text_embedding_model=text_embedding_model,
    text_vector_store=vector_store_text,
    image_text_embedding_model=image_text_embedding_model,
    image_vector_store=vector_store_image,
    generator=generator,
    params={"top_k_text": 5, "top_k_image": 1, "number_query_expansion": 3},
    query_expansion_developer_message=query_expansion_developer_message,
    query_expansion_template_query=query_expansion_template_query,
)
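With query expansion, the original query and each alternative query return their own ranked list of chunks, and those lists must be merged before generation. How `DefaultRAG` merges them is not shown in this notebook. Reciprocal rank fusion (RRF) is one common choice: each chunk scores the sum of 1/(k + rank) over the lists it appears in, so chunks found by several queries rise to the top. The sketch below implements RRF over made-up chunk IDs; `k=60` is the constant commonly used for RRF, and `reciprocal_rank_fusion` is a hypothetical name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge several ranked lists of chunk IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


per_query_results = [
    ["c3", "c1", "c7"],  # original query
    ["c1", "c3", "c9"],  # expansion 1
    ["c1", "c4", "c3"],  # expansion 2
]
print(reciprocal_rank_fusion(per_query_results, top_n=3))  # → ['c1', 'c3', 'c4']
```

Note that `c1` wins even though it is not the top hit for the original query, because all three queries retrieve it near the top.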

In [None]:
answer, sources, cost = rag_with_query_expansion.execute(
    "Here goes my amazing question!",
    {},
    verbose=True,
)

In [None]:
print(json.dumps(answer, indent=3))

In [None]:
print(cost)

----------------