Cell 1: Setup Environment Variables and API Keys
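Load environment variables from a local .env file and read the API keys for the multimodal embedding model (Google PaLM, used for poster images) and the text embedding model (OpenAI), plus an optional OpenAI base URL.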

In [None]:
import warnings
warnings.filterwarnings("ignore")

# Load environment variables and API keys
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # Read local .env file

# Get API keys from environment variables
MM_EMBEDDING_API_KEY = os.getenv("EMBEDDING_API_KEY")
TEXT_EMBEDDING_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_BASEURL = os.getenv("OPENAI_BASE_URL")
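If a key is missing from .env, os.getenv silently returns None, and the problem only surfaces later as an authentication error from a vectorizer module. A minimal fail-fast sketch, assuming you want the notebook to stop at this cell instead; require_env is a hypothetical helper, not part of python-dotenv. OPENAI_BASE_URL is left out of the usage line because it is optional.

```python
import os

def require_env(*names):
    """Return the values of the given environment variables, raising if any is unset or empty.

    Hypothetical helper: it reports every missing name at once rather than the first one.
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[name] for name in names]

# Usage with the keys this notebook needs (raises RuntimeError if either is absent):
# MM_EMBEDDING_API_KEY, TEXT_EMBEDDING_API_KEY = require_env("EMBEDDING_API_KEY", "OPENAI_API_KEY")
```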


Cell 2: Connect to Weaviate
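Start an embedded Weaviate instance with the multi2vec-palm (image) and text2vec-openai (text) vectorizer modules enabled, and pass the API keys as request headers so the modules can call the embedding services. The two vector spaces themselves are defined when the collection is created in Cell 3.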

In [None]:
import weaviate

# Connect to Weaviate and set the vectorizer modules
client = weaviate.connect_to_embedded(
    version="1.24.4",
    environment_variables={
        "ENABLE_MODULES": "multi2vec-palm,text2vec-openai"
    },
    headers={
        "X-PALM-Api-Key": MM_EMBEDDING_API_KEY,
        "X-OpenAI-Api-Key": TEXT_EMBEDDING_API_KEY,
        "X-OpenAI-BaseURL": OPENAI_BASEURL
    }
)

# Check if the client is ready
client.is_ready()
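Cell 2 always sends the X-OpenAI-BaseURL header, even when OPENAI_BASE_URL is unset and OPENAI_BASEURL is therefore None; depending on the HTTP layer, a None header value may raise an error or be sent as an unusable value. A sketch, assuming you prefer to omit unset headers; build_module_headers is a hypothetical helper, while the header names are the ones Cell 2 already uses.

```python
def build_module_headers(palm_key, openai_key, openai_base_url=None):
    """Build the vectorizer-module headers from Cell 2, dropping any whose value is unset."""
    headers = {
        "X-PALM-Api-Key": palm_key,
        "X-OpenAI-Api-Key": openai_key,
        "X-OpenAI-BaseURL": openai_base_url,
    }
    # Keep only headers with a non-empty value so None is never sent
    return {name: value for name, value in headers.items() if value}

# Usage in Cell 2:
# headers=build_module_headers(MM_EMBEDDING_API_KEY, TEXT_EMBEDDING_API_KEY, OPENAI_BASEURL)
```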


Cell 3: Create Multivector Collection
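Create a Movies collection with text, numeric, and image (BLOB) properties and two named vectors: txt_vector, computed by text2vec-openai from the title and overview, and poster_vector, computed by multi2vec-palm from the poster image.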

In [None]:
from weaviate.classes.config import Configure, DataType, Property

# Create the 'Movies' collection with text and image properties
client.collections.create(
    name="Movies",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="overview", data_type=DataType.TEXT),
        Property(name="vote_average", data_type=DataType.NUMBER),
        Property(name="release_year", data_type=DataType.INT),
        Property(name="tmdb_id", data_type=DataType.INT),
        Property(name="poster", data_type=DataType.BLOB),
        Property(name="poster_path", data_type=DataType.TEXT),
    ],
    
    # Configure the vector spaces for text and images
    vectorizer_config=[
        Configure.NamedVectors.text2vec_openai(
            name="txt_vector",
            source_properties=["title", "overview"],
        ),
        Configure.NamedVectors.multi2vec_palm(
            name="poster_vector",
            image_fields=["poster"],
            project_id="semi-random-dev",
            location="us-central1",
            model_id="multimodalembedding@001",
            dimensions=1408,
        ),
    ]
)


Cell 4: Load Movie Data
Load movie data from a JSON file to insert into the Weaviate collection.



In [None]:
import pandas as pd

# Load movie data from JSON file
df = pd.read_json("movies_data.json")
df.head()  # Display the first few rows of the data
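The batch-insert cell reads the id, title, overview, and vote_average columns of each row. Checking for them right after loading gives a clearer error than an AttributeError halfway through the import. A minimal sketch; missing_columns and REQUIRED_COLUMNS are hypothetical names, and the function accepts any iterable of column names, such as df.columns.

```python
# Columns the insert loop accesses on each row (taken from the batch-insert cell)
REQUIRED_COLUMNS = ("id", "title", "overview", "vote_average")

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `columns`, in the order they are required."""
    present = set(columns)
    return [name for name in required if name not in present]

# Usage with the DataFrame loaded above:
# assert not missing_columns(df.columns), f"Missing columns: {missing_columns(df.columns)}"
```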


Cell 5: Helper Function for Image Processing
Define a helper function that reads an image file and returns its contents as a base64-encoded string, the format used here both for the poster BLOB property and for near_image queries.

In [None]:
import base64

# Convert an image file to base64 encoding
def toBase64(path):
    with open(path, 'rb') as file:
        return base64.b64encode(file.read()).decode('utf-8')
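Base64 maps arbitrary bytes to ASCII text losslessly, so decoding the string returns the original file bytes, at the cost of roughly 33% more size (every 3 input bytes become 4 characters, padded to a multiple of 4). A quick round-trip check; it writes placeholder bytes (not a real image) to a temporary file, and to_base64 simply mirrors toBase64 so the block runs on its own.

```python
import base64
import os
import tempfile

def to_base64(path):
    # Same logic as toBase64 above: read raw bytes, base64-encode, return a str
    with open(path, "rb") as file:
        return base64.b64encode(file.read()).decode("utf-8")

# Placeholder bytes standing in for an image file
payload = b"\x89PNG\r\n\x1a\n fake image bytes"
with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

encoded = to_base64(tmp_path)
assert base64.b64decode(encoded) == payload                # decoding restores the exact bytes
assert len(encoded) == 4 * ((len(payload) + 2) // 3)      # 4 chars per 3-byte group, padded
os.remove(tmp_path)
```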


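Cell 6: Insert Movie Data
Batch-insert each movie's text fields and base64-encoded poster into the Movies collection, skipping movies that are already stored.

In [None]: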
from weaviate.util import generate_uuid5

# Get the 'Movies' collection from Weaviate
movies = client.collections.get("Movies")

# Batch process to insert movie data
with movies.batch.rate_limit(20) as batch:
    for index, movie in df.iterrows():
        
        # Skip the movie if it already exists
        if movies.data.exists(generate_uuid5(movie.id)):
            print(f'{index}: Skipping insert. The movie "{movie.title}" is already in the database.')
            continue
        
        print(f'{index}: Adding "{movie.title}"')
        
        # Path to the movie poster image file
        poster_path = f"./posters/{movie.id}_poster.jpg"
        posterb64 = toBase64(poster_path)  # Convert poster to base64

        # Build the movie object with text and image data
        movie_obj = {
            "title": movie.title,
            "overview": movie.overview,
            "vote_average": movie.vote_average,
            "tmdb_id": movie.id,
            "poster_path": poster_path,
            "poster": posterb64
        }

        # Add object to batch queue
        batch.add_object(
            properties=movie_obj,
            uuid=generate_uuid5(movie.id),
        )

# Check for any failed inserts
if len(movies.batch.failed_objects) > 0:
    print(f"Failed to import {len(movies.batch.failed_objects)} objects")
else:
    print("Import complete with no errors")
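Deriving each object's UUID from its TMDB id makes the import idempotent: re-running the cell yields the same UUIDs, so movies.data.exists finds the objects already stored and the loop skips them. The sketch below shows the idea with the standard library's uuid.uuid5 (a name-based, SHA-1 UUID); movie_uuid is a hypothetical helper, and it is not guaranteed to produce the same values as weaviate.util.generate_uuid5, so do not mix the two for the same collection.

```python
import uuid

def movie_uuid(tmdb_id):
    # Name-based UUID (version 5): the same input always maps to the same UUID
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, str(tmdb_id)))

first_run = movie_uuid(862)
second_run = movie_uuid(862)
assert first_run == second_run        # stable across runs and processes
assert movie_uuid(863) != first_run   # different movies get different IDs
```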


Cell 7: Text Search in Text Vector Space
Run a semantic search with a text query against txt_vector, the embedding of each movie's title and overview.

In [None]:
from IPython.display import Image, display

# Perform a text search in the text vector space
response = movies.query.near_text(
    query="Movie about lovable cute pets",
    target_vector="txt_vector",
    limit=3,
)

# Display the search results
for item in response.objects:
    print(item.properties["title"])
    print(item.properties["overview"])
    display(Image(item.properties["poster_path"], width=200))
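Conceptually, near_text embeds the query with the vectorizer configured for target_vector and returns the limit objects whose stored vectors in that space are closest to it (Weaviate's default distance metric is cosine). A toy in-memory illustration; the titles and vectors are invented, real embeddings have far more dimensions, and top_k is a hypothetical function, not how Weaviate's index works internally. The two named vectors deliberately have different lengths to show why a query can only be compared within one space.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    if len(a) != len(b):
        raise ValueError("vectors from different spaces cannot be compared")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def top_k(query_vector, objects, target_vector, limit=3):
    """Rank objects by cosine distance between the query and their `target_vector` embedding."""
    scored = [
        (cosine_distance(query_vector, obj["vectors"][target_vector]), obj["title"])
        for obj in objects
    ]
    return [title for _, title in sorted(scored)[:limit]]

# Hypothetical objects, each holding two named vectors like the Movies collection
objects = [
    {"title": "Pet movie", "vectors": {"txt_vector": [0.9, 0.1, 0.0], "poster_vector": [0.2, 0.8]}},
    {"title": "Space movie", "vectors": {"txt_vector": [0.0, 0.2, 0.9], "poster_vector": [0.9, 0.1]}},
    {"title": "Dog movie", "vectors": {"txt_vector": [0.8, 0.3, 0.1], "poster_vector": [0.3, 0.7]}},
]
print(top_k([1.0, 0.2, 0.0], objects, "txt_vector", limit=2))  # ['Pet movie', 'Dog movie']
```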


Cell 8: Text Search in Poster Vector Space
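Run the same text query against poster_vector. Because the multimodal embedding model maps text and images into a shared vector space, a text query can be compared directly with the poster embeddings.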

In [None]:
# Perform a text query but search in the poster vector space
response = movies.query.near_text(
    query="Movie about lovable cute pets",
    target_vector="poster_vector",
    limit=3,
)

# Display the search results
for item in response.objects:
    print(item.properties["title"])
    print(item.properties["overview"])
    display(Image(item.properties["poster_path"], width=200))


Cell 9: Image Search in Poster Vector Space
Use an image as a query to search in the poster vector space.

In [None]:
# Display the input image for the query (display() is required because this is not the cell's last expression)
display(Image("test/spooky.jpg", width=300))

# Perform an image-based search in the poster vector space
response = movies.query.near_image(
    near_image=toBase64("test/spooky.jpg"),
    target_vector="poster_vector",
    limit=3,
)

# Display the search results
for item in response.objects:
    print(item.properties["title"])
    display(Image(item.properties["poster_path"], width=200))


Cell 10: Another Image Search Example
Perform another image-based search using a different image as the query.



In [None]:
# Display the input image for the query (display() is required because this is not the cell's last expression)
display(Image("test/superheroes.png", width=300))

# Perform an image-based search in the poster vector space
response = movies.query.near_image(
    near_image=toBase64("test/superheroes.png"),
    target_vector="poster_vector",
    limit=3,
)

# Display the search results
for item in response.objects:
    print(item.properties["title"])
    display(Image(item.properties["poster_path"], width=200))
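Cells 9 and 10 differ only in the query image path. A sketch of factoring out the shared steps into a hypothetical helper, search_by_image (not part of the Weaviate client). It takes the query object as a parameter (movies.query in this notebook) and relies only on the near_image call and the response.objects / item.properties shape already used above, which also lets it be exercised with a stand-in object.

```python
import base64

def search_by_image(query_api, image_path, target_vector="poster_vector", limit=3):
    """Base64-encode an image file, run near_image against one named vector, return the titles."""
    with open(image_path, "rb") as file:
        encoded = base64.b64encode(file.read()).decode("utf-8")
    response = query_api.near_image(
        near_image=encoded,
        target_vector=target_vector,
        limit=limit,
    )
    # Titles only; posters can still be shown from item.properties["poster_path"] as above
    return [item.properties["title"] for item in response.objects]

# Usage in the notebook:
# for path in ["test/spooky.jpg", "test/superheroes.png"]:
#     print(path, search_by_image(movies.query, path))
```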


Cell 11: Close the Weaviate Client
Ensure the Weaviate client is closed to free up resources.

In [None]:
# Close the Weaviate client
client.close()
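If an earlier cell raises before client.close() runs, the embedded instance keeps running. A sketch of guaranteeing cleanup with contextlib.closing, which calls close() on exit even when an exception propagates; StandInClient is a hypothetical placeholder with the same close() method, so the block runs without a Weaviate server. The v4 Weaviate client can also be used directly as a context manager (with weaviate.connect_to_embedded(...) as client:).

```python
from contextlib import closing

class StandInClient:
    """Hypothetical stand-in exposing the same close() method as the Weaviate client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

client_stub = StandInClient()
try:
    with closing(client_stub):
        raise RuntimeError("query failed")  # simulate an error partway through the notebook
except RuntimeError:
    pass

print(client_stub.closed)  # True: close() ran despite the exception
```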


Description of the Code:
This notebook demonstrates the creation of a Multimodal Recommender System using Weaviate. The system uses both text embeddings and image embeddings to perform semantic searches. The process includes:

Environment Setup: Load the API keys and configure access to the OpenAI (text) and Google PaLM (multimodal) embedding services used by the embedded Weaviate instance.
Collection Creation: A Weaviate collection (Movies) is created, with properties for movie titles, overviews, ratings, release years, TMDB IDs, and posters. The collection is configured with two named vector spaces: one for text (titles and overviews) and one for images (posters).
Data Insertion: Movie data, including text and poster images, is inserted into the collection in batch mode, with deterministic UUIDs so that re-runs skip existing movies.
Multimodal Search: The system performs text-based searches in both the text and image vector spaces, as well as image-based searches that use an arbitrary image (for example, test/spooky.jpg) as the query.
Closing Resources: After the search tasks are completed, the Weaviate client is closed to free up resources.
This system allows users to search for movies using both text queries and image queries, demonstrating a multimodal recommender approach.
