Cell 1: Setup Environment Variables and API Key
Load the API keys and environment variables required by the embedding and generative AI services.

In [None]:
import warnings
warnings.filterwarnings("ignore")

# Load environment variables and API keys
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # Load local .env file
EMBEDDING_API_KEY = os.getenv("EMBEDDING_API_KEY")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
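
Because os.getenv() returns None for a variable that is not set, a missing key would only surface later as an authentication error. The following is a minimal sketch of an early check; missing_env_vars is a hypothetical helper, not part of the notebook, and the list of names is assumed from the variables this notebook reads (GOOGLE_API_BASE is read in Cell 7).

```python
import os

def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not environ.get(name)]

# Names assumed from the variables this notebook reads.
REQUIRED = ["EMBEDDING_API_KEY", "GOOGLE_API_KEY", "GOOGLE_API_BASE"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dictionary.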


Cell 2: Connect to Weaviate
Establish a connection with the Weaviate instance and verify that the connection is ready.

In [None]:
import weaviate

# Connect to Weaviate and set environment variables
client = weaviate.connect_to_embedded(
    version="1.24.4",
    environment_variables={
        "ENABLE_MODULES": "backup-filesystem,multi2vec-palm",
        "BACKUP_FILESYSTEM_PATH": "/home/jovyan/work/L4/backups",
    },
    headers={
        "X-PALM-Api-Key": EMBEDDING_API_KEY,
    }
)

# Check if the client is ready
client.is_ready()


Cell 3: Restore Prevectorized Data
Restore the prevectorized Resources collection from a filesystem backup into the Weaviate instance.

In [None]:
# Restore a collection from a backup
client.backup.restore(
    backup_id="resources-img-and-vid",
    include_collections="Resources",
    backend="filesystem"
)

# Add a delay to ensure the collection is fully restored before proceeding
import time
time.sleep(5)
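
A fixed delay can be too short on a slow machine and needlessly long on a fast one. One alternative is to poll a readiness condition with a timeout, as in the sketch below. wait_until is a hypothetical helper, and the predicate shown in the closing comment is only an assumption about how it might be wired to the client.

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds have passed.

    Returns True if the predicate succeeded, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Possible use in this notebook (assumption, not run here):
# wait_until(lambda: client.collections.exists("Resources"))
```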


Cell 4: Preview Data Count
Retrieve the count of different media types (images, videos, etc.) in the collection.

In [None]:
from weaviate.classes.aggregate import GroupByAggregate

# Get the 'Resources' collection
resources = client.collections.get("Resources")

# Aggregate and count the number of items in each mediaType
response = resources.aggregate.over_all(
    group_by=GroupByAggregate(prop="mediaType")
)

# Print the counts of each media type
for group in response.groups:
    print(f"{group.grouped_by.value} count: {group.total_count}")
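
To make the aggregation concrete, here is a small sketch of the same group-and-count idea in plain Python. The records are invented stand-ins for objects in the Resources collection, and count_by is a hypothetical helper; Weaviate computes the real counts server-side.

```python
from collections import Counter

# Hypothetical stand-in records; the real objects live in the "Resources" collection.
sample_objects = [
    {"mediaType": "image", "path": "a.jpg"},
    {"mediaType": "image", "path": "b.jpg"},
    {"mediaType": "video", "path": "c.mp4"},
]

def count_by(objects, prop):
    """Count how many objects share each value of the given property."""
    return Counter(obj[prop] for obj in objects)

counts = count_by(sample_objects, "mediaType")
for value, total in counts.items():
    print(f"{value} count: {total}")
```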


Cell 5: Retrieve Image from Query
Define a function that retrieves an image from the Weaviate collection based on a text query.

In [None]:
from IPython.display import Image
from weaviate.classes.query import Filter

# Retrieve the path of the image that best matches a text query
def retrieve_image(query):
    resources = client.collections.get("Resources")
    response = resources.query.near_text(
        query=query,
        filters=Filter.by_property("mediaType").equal("image"),  # Only return image objects
        return_properties=["path"],
        limit=1,
    )
    if not response.objects:  # Guard against an empty result set
        raise ValueError(f"No image found for query: {query!r}")
    result = response.objects[0].properties
    return result["path"]  # Return the path to the image
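
Conceptually, a filtered near_text search embeds the query, restricts the candidates to objects that pass the filter, and returns the closest vector. The sketch below illustrates that idea with a brute-force cosine-similarity scan over invented two-dimensional vectors. It is an illustration only: the function names and data are hypothetical, and Weaviate itself uses a vector index and the distance metric configured for the collection rather than this loop.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_path(query_vector, objects, media_type="image"):
    """Filter by media type, then return the path of the most similar object."""
    candidates = [o for o in objects if o["mediaType"] == media_type]
    if not candidates:
        return None
    best = max(candidates, key=lambda o: cosine_similarity(query_vector, o["vector"]))
    return best["path"]

# Invented toy data: the video is closest overall but is excluded by the filter.
toy_objects = [
    {"mediaType": "image", "path": "lake.jpg", "vector": [0.9, 0.1]},
    {"mediaType": "image", "path": "city.jpg", "vector": [0.1, 0.9]},
    {"mediaType": "video", "path": "lake.mp4", "vector": [1.0, 0.0]},
]
print(nearest_path([1.0, 0.0], toy_objects))
```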


Cell 6: Run Image Retrieval
Run the retrieve_image function with a specific query and display the result.

In [None]:
# Retrieve an image based on a query
img_path = retrieve_image("fishing with my buddies")

# Display the retrieved image
display(Image(img_path))


Cell 7: Set Up Google Generative AI
Configure the Google Generative AI client with the API key and endpoint used to call the Gemini model.

In [None]:
import google.generativeai as genai
from google.api_core.client_options import ClientOptions

# Configure the Google Generative AI library with the API key and a custom endpoint
genai.configure(
    api_key=GOOGLE_API_KEY,
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=os.getenv("GOOGLE_API_BASE"),
    ),
)


Cell 8: Helper Functions
Create helper functions to format text as Markdown and call the Large Multimodal Model (LMM) to generate content based on an image.

In [None]:
import textwrap
import PIL.Image
from IPython.display import Markdown, Image

# Convert text to Markdown for better display in notebooks
def to_markdown(text):
    text = text.replace("•", "  *")
    return Markdown(textwrap.indent(text, "> ", predicate=lambda _: True))

# Call the LMM (Large Multimodal Model) to generate content from an image and a prompt
def call_LMM(image_path: str, prompt: str) -> Markdown:
    img = PIL.Image.open(image_path)  # Load the image
    model = genai.GenerativeModel("gemini-1.5-flash")  # Select the generative model
    response = model.generate_content([prompt, img], stream=True)  # Send the prompt and image
    response.resolve()  # Wait for the streamed response to complete
    return to_markdown(response.text)  # Return the response formatted as Markdown
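
To see what the formatting step does without a notebook, the sketch below applies the same two transformations as to_markdown() but returns a plain string instead of an IPython Markdown object. quote_block is a hypothetical name used only for this illustration, and the sample text is invented.

```python
import textwrap

def quote_block(text):
    """Turn bullet characters into Markdown list items and quote every line."""
    text = text.replace("•", "  *")
    # predicate=lambda _: True also prefixes blank lines, so the quote block is not split.
    return textwrap.indent(text, "> ", predicate=lambda _: True)

sample = "Summary:\n\n• two people\n• a boat"
print(quote_block(sample))
```

Without the predicate, textwrap.indent skips whitespace-only lines, which would end the Markdown blockquote at the first blank line.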


Cell 9: Generate Image Description
Call the LMM to generate a detailed description of the retrieved image.

In [None]:
# Use the LMM to generate a description for the retrieved image
call_LMM(img_path, "Please describe this image in detail.")


Cell 10: Combine Multimodal RAG Workflow
Define a function that integrates both retrieval and generative capabilities in a single multimodal RAG process.

In [None]:
def mm_rag(query):
    # Step 1: Retrieve an image using Weaviate
    source_image = retrieve_image(query)
    display(Image(source_image))  # Display the retrieved image

    # Step 2: Generate a description using the LMM
    description = call_LMM(source_image, "Please describe this image in detail.")
    return description
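
The retrieve-then-generate structure can be tried without Weaviate or Gemini by passing the two steps in as functions. The following is a sketch under that assumption: mm_rag_pipeline, fake_retrieve, and fake_describe are hypothetical names, and the stubs return canned values in place of real service calls.

```python
def mm_rag_pipeline(query, retrieve, describe,
                    prompt="Please describe this image in detail."):
    """Run retrieval, then generation, and return both the path and the description."""
    image_path = retrieve(query)                 # Step 1: retrieval
    description = describe(image_path, prompt)   # Step 2: generation
    return image_path, description

# Stand-in functions with canned results (no network calls).
def fake_retrieve(query):
    return "images/example.jpg"

def fake_describe(image_path, prompt):
    return f"[description of {image_path}]"

path, description = mm_rag_pipeline(
    "paragliding through the mountains", fake_retrieve, fake_describe
)
print(path, description)
```

In the notebook, retrieve_image and call_LMM could be passed in place of the stubs.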


Cell 11: Run Multimodal RAG Workflow
Run the entire multimodal RAG process with a specific query.

In [None]:
# Execute the multimodal RAG function with a sample query
mm_rag("paragliding through the mountains")


Cell 12: Close the Weaviate Client
Ensure the Weaviate client is closed after completing the task to free up resources.

In [None]:
# Close the Weaviate client connection
client.close()
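
If an earlier cell raises an exception, a trailing close() call may never run. One way to guarantee cleanup when the code is run as a script is contextlib.closing, which calls close() on exit even after an error. The sketch below uses a hypothetical FakeClient stand-in so that it runs without Weaviate.

```python
from contextlib import closing

class FakeClient:
    """Hypothetical stand-in for a client object that exposes close()."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

fake = FakeClient()
try:
    with closing(fake) as client:
        raise RuntimeError("simulated failure mid-workflow")
except RuntimeError:
    pass

print(fake.closed)  # close() ran despite the exception
```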


The code above demonstrates a multimodal retrieval-augmented generation (MM-RAG) workflow using Weaviate and Google Gemini (the gemini-1.5-flash model). The process has two main steps:

Image Retrieval: The code connects to a Weaviate instance, restores prevectorized data, and retrieves images based on a text query. The retrieve_image() function runs a near_text search restricted to image objects and returns the path of the best-matching image.

Image Description Generation: After retrieving the image, the code uses Gemini (a large multimodal model) to generate a detailed description of it. This is done via the call_LMM() function, which takes the image path and a prompt, calls the model, and returns the description formatted as Markdown.

The workflow is encapsulated in the mm_rag() function, which integrates both steps: retrieving an image for a user query and generating its description. The code concludes by closing the Weaviate client to free up resources.