## Gemini, Qdrant and LlamaIndex - Multimodal RAG

### Installation

In [1]:
!pip install llama-index
!pip install 'google-generativeai>=0.3.0' qdrant_client

!pip install llama-index-multi-modal-llms-gemini
!pip install llama-index-vector-stores-qdrant
!pip install llama-index-embeddings-gemini

Installing collected packages: llama-index-embeddings-gemini
Successfully installed llama-index-embeddings-gemini-0.1.5


### Set up Gemini API and Check available Models

In [3]:
import google.generativeai as genai

In [8]:
import os
from getpass import getpass

GOOGLE_API_KEY = getpass()
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
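When the notebook is re-run often, it can be convenient to prompt for the key only if it is not already in the environment. The following is a minimal sketch; `load_api_key` is a hypothetical helper, not part of any Google or LlamaIndex library.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment, prompting only if it is missing.

    Hypothetical helper for illustration; the notebook itself calls getpass() directly.
    """
    key = os.environ.get(env_var)
    if not key:
        # Prompt without echoing the key, then cache it for later cells.
        key = getpass(f"{env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `GOOGLE_API_KEY = load_api_key()` would then behave like the cell above on a first run and skip the prompt on later runs in the same session.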

··········


In [4]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-pro
models/gemini-pro-vision
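
Only some of the listed models accept images. As a rough sketch, the names can be filtered on the substring `vision`; this is a naming heuristic assumed here for illustration, not a capability check provided by the API. The list is copied from the output above so the snippet runs without an API key.

```python
# Model names copied from the listing above.
model_names = [
    "models/gemini-1.0-pro",
    "models/gemini-1.0-pro-001",
    "models/gemini-1.0-pro-latest",
    "models/gemini-1.0-pro-vision-latest",
    "models/gemini-pro",
    "models/gemini-pro-vision",
]


def vision_models(names):
    """Keep only the names containing 'vision' (a heuristic, not an API guarantee)."""
    return [name for name in names if "vision" in name]


print(vision_models(model_names))
```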


## Data Loading and extraction

Download a few PNG images of famous Indian places into `./indian_places` to build a knowledge base.

In [6]:
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.schema import TextNode
from llama_index.core import SimpleDirectoryReader

In [11]:
from pydantic import BaseModel
from PIL import Image
import matplotlib.pyplot as plt

class Indian_Places(BaseModel):
    city_name: str
    state_name: str
    famous_food: str
    history: str
    review: str
    description: str
    nearby_tourist_places: str

In [14]:
# Load every file in ./indian_places as a document (one per image).
reader = SimpleDirectoryReader("./indian_places")
documents = reader.load_data()
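
Before loading, it can help to confirm that the folder actually contains images. The sketch below uses only the standard library; `list_images` and the suffix set are assumptions made for illustration and are independent of `SimpleDirectoryReader`.

```python
from pathlib import Path

# Assumed set of image suffixes; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder is reported as "no images" rather than an error.
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


print(len(list_images("./indian_places")), "image(s) found")
```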

## Pydantic Multimodal Initialization for Gemini

In [15]:
prompt_template_str = """\
You are an AI assistant. Your job is to summarize images, tables and text CONTEXT for retrieval.
You MUST do this job coherently and honestly.
You MUST return the answer in JSON format.
"""

def pydantic_gemini(
    model_name, output_class, image_documents, prompt_template_str
):
    gemini_llm = GeminiMultiModal(model_name=model_name)
    llm_program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(output_class),
        image_documents=image_documents,
        prompt_template_str=prompt_template_str,
        multi_modal_llm=gemini_llm,
        verbose=True,
    )
    response = llm_program()
    return response
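
To make the role of the output parser more concrete, here is a standard-library sketch of the general idea: pull a JSON object out of the model's raw text and check it against a fixed set of fields. It uses a dataclass stand-in (`PlaceRecord`) with the same field names as `Indian_Places`, and it is not how `PydanticOutputParser` is implemented; `parse_place` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class PlaceRecord:
    """Dataclass stand-in mirroring the fields of Indian_Places."""
    city_name: str
    state_name: str
    famous_food: str
    history: str
    review: str
    description: str
    nearby_tourist_places: str


def parse_place(raw: str) -> PlaceRecord:
    """Parse raw model text into a PlaceRecord, failing loudly on missing fields."""
    # Model replies may wrap the JSON in extra text, so keep the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    expected = [f.name for f in fields(PlaceRecord)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return PlaceRecord(**{name: str(data[name]) for name in expected})
```

A truncated or incomplete reply then surfaces as a `ValueError` (or a `json.JSONDecodeError`, which is a subclass of it) instead of silently producing a partial record.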

In [16]:
type(documents)

list

## Extract data in JSON format

In [27]:
from PIL import Image

In [17]:
results = []
for img_doc in documents:
    pydantic_response = pydantic_gemini(
        "models/gemini-pro-vision",
        Indian_Places,
        [img_doc],
        prompt_template_str,
    )
    if "coimbatore" in img_doc.image_path:
        for r in pydantic_response:
            print(r)
    results.append(pydantic_response)

> Raw output:  {
  "city_name": "Coimbatore",
  "state_name": "Tamil Nadu",
  "famous_food": "South Indian",
  "history": "Coimbatore is a city in the Indian state of Tamil Nadu. It is the second largest city in the state after Chennai. Coimbatore is known for its textile industry and is often referred to as the \"Manchester of South India\". The city is also home to several educational institutions and research centers.",
  "review": "Coimbatore is a beautiful city with a rich history and culture. The city is home to several temples, mosques, and churches. The climate is tropical and the city experiences hot summers and mild winters. The city is well-connected by air, rail, and road. Coimbatore is a major industrial and commercial center and is home to several large corporations. The city is also a major educational center and is home to several universities and colleges.",
  "description": "Coimbatore is a city in the Indian state of Tamil Nadu. It is the second

In [18]:
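nodes = []
for res in results:
    text_node = TextNode()
    metadata = {}
    # Iterating a Pydantic model yields (field_name, value) pairs.
    for field_name, value in res:
        if field_name == "description":
            # The description becomes the text that gets embedded.
            text_node.text = value
        else:
            # Every other field is stored as node metadata.
            metadata[field_name] = value
    text_node.metadata = metadata
    nodes.append(text_node)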
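
The loop above splits each extracted record into one text field and a metadata dictionary. The same split can be sketched on a plain dictionary, without LlamaIndex; `split_record` is a hypothetical helper, and the sample values are taken from the Coimbatore output shown earlier (only a subset of fields is included).

```python
def split_record(record: dict, text_key: str = "description"):
    """Return (text, metadata): the value under text_key is the text, the rest is metadata."""
    text = record.get(text_key, "")
    metadata = {key: value for key, value in record.items() if key != text_key}
    return text, metadata


record = {
    "city_name": "Coimbatore",
    "state_name": "Tamil Nadu",
    "famous_food": "South Indian",
    "description": "Coimbatore is a city in the Indian state of Tamil Nadu.",
}
text, metadata = split_record(record)
print(text)
print(sorted(metadata))
```

Only the text is embedded for retrieval in this setup; the metadata travels with the node so it is available once a node has been retrieved.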

## Create Qdrant Client to store the knowledge base

In [7]:
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import Settings
from llama_index.core import StorageContext
import qdrant_client


client = qdrant_client.QdrantClient(path="qdrant_gemini_3")
vector_store = QdrantVectorStore(client=client, collection_name="collection")

In [19]:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.llms.gemini import Gemini

## Set up Gemini Embeddings and LLM for RAG

In [21]:
Settings.embed_model = GeminiEmbedding(
    model_name="models/embedding-001", api_key=GOOGLE_API_KEY
)
Settings.llm = Gemini(api_key=GOOGLE_API_KEY)

In [22]:
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
)

In [23]:
query_engine = index.as_query_engine(
    similarity_top_k=1,
)
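
With `similarity_top_k=1`, only the single closest node is handed to the LLM as context. The sketch below illustrates top-k retrieval in pure Python, assuming cosine similarity and made-up two-dimensional vectors; the metric Qdrant actually uses depends on how the collection is configured, and real embeddings have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=1):
    """Return the names of the k vectors most similar to query_vec."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in doc_vecs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Toy vectors, invented for illustration only.
doc_vecs = {"node_a": [0.9, 0.1], "node_b": [0.1, 0.9], "node_c": [0.7, 0.7]}
print(top_k([1.0, 0.0], doc_vecs, k=1))
```

Raising `k` returns more candidates, which gives the LLM more context at the cost of a longer prompt.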

In [29]:
response = query_engine.query(
    "which place belongs to Coimbatore from the given context, and tell about that given place history. Also tell whats the best food one can eat there?"
)
print(response)

The Adiyogi Shiva statue is a 112-foot tall statue of Shiva located in Coimbatore, Tamil Nadu. It is the largest Shiva statue in the world. The statue was consecrated on 24 February 2017 by Sadhguru Jaggi Vasudev, the founder of the Isha Foundation.

The statue is made of concrete and steel and is covered with 500 copper plates. The copper plates were donated by devotees from all over the world. The statue is a symbol of peace and unity and is a popular tourist destination.

The best food to eat in Coimbatore is South Indian food. South Indian food is known for its use of spices and flavors. Some of the most popular South Indian dishes include idli, dosa, vada, and sambar.
