### Building RAG Multimodal Search: Weaviate + Ollama + CLIP Multimodal Embedding Model

In [93]:
from weaviate.classes.config import Configure, Property, DataType, Multi2VecField
from weaviate.util import generate_uuid5
from tqdm import tqdm
from pathlib import Path
from PIL import Image
import weaviate
import pandas as pd
import base64
import weaviate.classes.query as wq


In [22]:
# Connect to the local Weaviate instance
client = weaviate.connect_to_local()

In [23]:
assert client.is_live() # Checks if the Weaviate server is live.

### ◻️ Create `WomenShoesMM` Collection adding Generative Integration Capability
A generative integration connects Weaviate to an external generative AI model (such as an LLM) to enable retrieval augmented generation (RAG). Weaviate supports many providers, including OpenAI, Cohere, Anthropic, Mistral, NVIDIA, Anyscale, FriendliAI, xAI, OctoAI, and AWS, and any of them can be configured as the generative module of a collection. Each collection's generative module is set independently of its vectorizer module.

⚠️ Important: 
* `.Vectorizer` (Deprecated):
In older versions of the Weaviate Python client (before v4.16.0), you would pass `Configure.Vectorizer` to `vectorizer_config` to specify the vectorizer module for your collection.
```python
vectorizer_config=Configure.Vectorizer.text2vec_openai()
```

* `.Vectors` (Current):
Starting with Weaviate Python client v4.16.0, the API uses `Configure.Vectors` (passed as `vector_config`) instead. This approach is more flexible and supports both a single vector and multiple named vectors.
```python
vector_config=Configure.Vectors.multi2vec_clip(
    image_fields=[Multi2VecField(name="image")],
    text_fields=[Multi2VecField(name="description")]
)
```

⚠️ Important: To apply changes to your `docker-compose.yml` (such as adding `generative-ollama` to `ENABLE_MODULES`), restart the Weaviate container so it picks up the new environment variable. The standard way to do this is:

1. Stop the running containers
```bash
docker compose down
```
2. Start the containers again
```bash
docker compose up -d
```

This recreates the Weaviate container with the updated configuration. You do not need to rebuild the image unless you have changed the Dockerfile itself; updating environment variables in `docker-compose.yml` and restarting is sufficient.
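After the restart, it can be worth confirming that the server actually loaded the modules this notebook relies on. The sketch below assumes that `client.get_meta()` returns a dictionary with a `modules` key listing the enabled modules; the helper name `missing_modules` and the list of required modules are illustrative choices, not part of Weaviate.

```python
def missing_modules(meta: dict, required: list[str]) -> list[str]:
    """Return the required module names that are absent from the server metadata."""
    enabled = meta.get("modules") or {}
    return [name for name in required if name not in enabled]


# Hypothetical usage with the connected `client` from the cells above:
# meta = client.get_meta()
# print(missing_modules(meta, ["multi2vec-clip", "generative-ollama"]))
```

An empty list means every required module is enabled; any name that is returned still needs to be added to `ENABLE_MODULES`.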

In [110]:
# Check whether a collection named "WomenShoesMM" exists on the Weaviate server. If it does, delete it.
if client.collections.exists("WomenShoesMM"):
    client.collections.delete("WomenShoesMM")

In [111]:
# Create a WomenShoesMM collection with multi-modal vectorization and generative AI capabilities
client.collections.create(
    name="WomenShoesMM",
    properties=[
        Property(name="name", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT),
        Property(name="currency", data_type=DataType.TEXT),
        Property(name="price", data_type=DataType.NUMBER),
        Property(name="image_path", data_type=DataType.TEXT),
        Property(name="image", data_type=DataType.BLOB),
    ],
    vector_config=Configure.Vectors.multi2vec_clip(
        image_fields=[Multi2VecField(name="image")],
        text_fields=[Multi2VecField(name="description")]
    ),
    generative_config=Configure.Generative.ollama(
        api_endpoint="http://host.docker.internal:11434",  # Adjust as needed for your setup
        model="llava7b"  # Must match a model name listed by `ollama list` (the stock LLaVA 7B tag is usually `llava:7b`)
    )
)

<weaviate.collections.collection.sync.Collection at 0x1edbac63310>

### ◻️ Add women shoes info to WomenShoesMM Collection

In [112]:
# Load the dataset from a CSV file
women_df = pd.read_csv("data/women_shoes.csv", sep=",")
print(women_df.shape)
women_df.head(2)

(277, 8)


| | name | description | price | currency | terms | image_downloads | extracted_footwear_type | iso_image |
|---|---|---|---|---|---|---|---|---|
| 0 | PEARL HEELED SLINGBACKS | Slingback heels with front pearl detail. Point... | 79.9 | USD | shoes | ['82aaf07d-4981-46d9-ab54-56d82c40cfa5', 'ea8b... | SLINGBACKS | ea8b9976-7cda-4e4c-9926-b357b1aa65c7 |
| 1 | LEATHER BALLET FLATS | Mary Jane style leather ballet flats. Buckled ... | 59.9 | USD | shoes | ['3c5a4b1d-407b-4683-8489-795e1177aa4a', '769a... | BALLET FLATS | 769a6f42-94a6-451b-9014-5fcf1100d212 |


In [113]:
# Get the collection
women_shoes_mm = client.collections.use("WomenShoesMM")

In [114]:
# Directory that holds one JPEG per product, named after its `iso_image` ID
img_dir = "data/iso_women_shoes/"

# Enter the batch context manager; objects are sent in batches of 50
with women_shoes_mm.batch.fixed_size(50) as batch:
    # Loop through the data
    for i, row in tqdm(women_df.iterrows()):
        # Read the image and encode it as a base64 string for the BLOB property
        img_path = img_dir + f"{row['iso_image']}.jpg"
        with open(img_path, "rb") as file:
            image_b64 = base64.b64encode(file.read()).decode("utf-8")

        # Build the object payload
        shoe_obj = {
            "name": row["name"],
            "description": row["description"],
            "price": row["price"],
            "currency": row["currency"],
            "image_path": img_path,
            "image": image_b64
        }

        # Add data to the batch
        batch.add_object(
            properties=shoe_obj,
            uuid=generate_uuid5(row["iso_image"])
        )  # The batcher sends each full batch automatically

        
# Check for failed objects
if len(women_shoes_mm.batch.failed_objects) > 0:
    print(f"Failed to import {len(women_shoes_mm.batch.failed_objects)} objects")
    for failed in women_shoes_mm.batch.failed_objects:
        print(f"e.g. Failed to import object with error: {failed.message}")
        

277it [00:30,  9.21it/s] 
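The import loop above reads each image file and base64-encodes it inline. If the same step is needed elsewhere (for example, to encode a query image), it could be factored into a small helper. This is a minimal sketch; the name `image_to_base64` is hypothetical and not part of the Weaviate client.

```python
import base64
from pathlib import Path


def image_to_base64(path) -> str:
    """Read a file from disk and return its bytes as a base64-encoded UTF-8 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")
```

Inside the loop, `image_b64 = image_to_base64(img_path)` would then replace the `open(...)` block.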


### ◻️ Retrieval Augmented Generation (RAG) 
Retrieval Augmented Generation (RAG) combines information retrieval with generative AI models.

In Weaviate, a RAG query consists of two parts: a search query, and a prompt for the model. Weaviate first performs the search, then passes both the search results and your prompt to a generative AI model before returning the generated response.

After configuring the generative AI integration, perform RAG operations with either the `single_prompt` or the `grouped_task` method.

`query`
- Purpose: Specifies the search criteria for retrieving objects from your Weaviate collection.
- How it works: The query is vectorized (by the model provider integration) and used to find the most relevant objects in the collection.


`single_prompt`
- Purpose: Generates a separate output for each object in the search results.
- How it works: You provide a prompt with placeholders (e.g., `{title}`) that are filled with properties from each object. The generative model produces an output for each object individually.
- Result: Each object in the response includes its own generated output.

![single_prompt.png](https://docs.weaviate.io/assets/images/integration_ollama_rag_single-e404950fa7a2120110acf80c697ef6ff.png)
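To make the placeholder mechanism concrete, the sketch below mimics the substitution step in plain Python. It is only an illustration of the idea, not Weaviate's implementation: the function `fill_prompt` and the sample `shoe` values are made up for this example.

```python
import re


def fill_prompt(template: str, properties: dict) -> str:
    """Replace each {name} placeholder with the matching property value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched instead of raising an error
        return str(properties[key]) if key in properties else match.group(0)

    return re.sub(r"\{(\w+)\}", substitute, template)


# Illustrative object properties (not taken from the dataset)
shoe = {"description": "Slingback heels with pearl detail.", "price": 79.9, "currency": "USD"}
prompt = fill_prompt("Describe each shoe using: {description}, {price}, {currency}.", shoe)
print(prompt)
```

With `single_prompt`, one such filled-in prompt is sent to the model for every object in the search results.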


`grouped_task`
- Purpose: Generates a single output for the entire set of search results.
- How it works: You provide a prompt that considers the whole group of results. The generative model produces one output based on all the objects together.
- Result: The response contains one generated output for the group, plus the list of objects.

![grouped_task.png](https://docs.weaviate.io/assets/images/integration_ollama_rag_grouped-190b744e3fcbe9dfb0a4ac25ae3e0792.png)

🔗 Resource: [Ollama Generative AI with Weaviate](https://docs.weaviate.io/weaviate/model-providers/ollama/generative#retrieval-augmented-generation)
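A `grouped_task` query might be wrapped as in the sketch below. The helper name `grouped_summary`, the example query, and the task text are hypothetical. Reading the result from `response.generated` is an assumption that mirrors the per-object `o.generated` attribute used in this notebook; newer client releases may expose the same text as `response.generative.text`.

```python
def grouped_summary(collection, query: str, task: str, limit: int = 3):
    """Run a near-text search and request one generated answer over all hits."""
    response = collection.generate.near_text(
        query=query,
        limit=limit,
        grouped_task=task,
    )
    # One generated text for the whole result set, plus the retrieved objects
    return response.generated, response.objects


# Hypothetical usage with the collection handle from this notebook:
# text, hits = grouped_summary(women_shoes_mm, "elegant heels", "Compare these shoes in two sentences.")
# print(text)
```

Unlike `single_prompt`, the task text needs no placeholders, because the model receives all retrieved objects at once.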

In [91]:
# Verify the number of objects in the collection
response_0 = women_shoes_mm.aggregate.over_all(total_count=True)
print(response_0.total_count)

277


## Multimodal Search

In [None]:
mm_response_0 = women_shoes_mm.generate.near_text(
    query='Show sandals with a heel height of 9cm',
    limit=1,
    single_prompt="Describe each shoe using: {description}, {price}, {currency}."
)

In [None]:
# Inspect the response
for o in mm_response_0.objects:
    display(Image.open(o.properties["image_path"]).resize((150, 100)))
    print(o.properties["description"], o.properties["price"], o.properties["currency"])  # Print the description, price, and currency
    print(o.generated)  # Print the generated text 