# Using Cohere Embed Models with Azure AI Search (Text and Image Embeddings)

This notebook demonstrates how to use the Cohere Embed model deployed on Azure AI to generate embeddings for both text and image data. The embeddings are stored in Azure AI Search for vector similarity search, and the search results are then used to ground answers from Cohere Command R+.

## Prerequisites

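- **Azure account**: An active Azure subscription with access to Azure AI services and Azure AI Search (formerly Azure Cognitive Search).
- **Cohere Embed model**: A deployment of Cohere Embed v3 (English) in Azure AI.
- **Cohere Command R+ model**: A deployment of Command R+ in Azure AI, used in the final RAG section.
- **API keys and endpoints**: The keys and endpoints for both Cohere deployments and for your Azure AI Search service, stored as environment variables (for example in a `.env` file loaded with `python-dotenv`).
- **Sample images**: The notebook uses publicly hosted sample images (see the URLs in the sample data); you can substitute your own image URLs.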

## Install Required Packages

First, install the necessary Python packages.

```python
!pip install --quiet azure-search-documents==11.6.0b6
!pip install --quiet cohere python-dotenv azure-identity tqdm requests tenacity
```


[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip

[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


## Import Libraries

```python
import base64
import os
from typing import List, Optional

import cohere
import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    AIStudioModelCatalogName,
    AzureMachineLearningParameters,
    AzureMachineLearningVectorizer,
    HnswAlgorithmConfiguration,
    HnswParameters,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SearchableField,
    SimpleField,
    VectorSearch,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    VectorSearchProfile,
)
from azure.search.documents.models import VectorizedQuery
from dotenv import load_dotenv
from tqdm.notebook import tqdm

# Load environment variables from a local .env file
load_dotenv()

# Azure AI Studio Cohere configuration (Embed and Command R+ deployments)
AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_KEY")
AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT")
AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT")
AZURE_AI_STUDIO_COHERE_COMMAND_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_KEY")

# Azure AI Search configuration
AZURE_SEARCH_SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
AZURE_SEARCH_ADMIN_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
INDEX_NAME = "multimodal-cohere-index"

# Cohere model information
EMBEDDING_MODEL_NAME = "embed-english-v3.0"
```
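The configuration cell reads every credential with `os.getenv`, which silently returns `None` for a missing variable, so the problem only surfaces in a later cell. A minimal sketch of an up-front check, assuming the variable names used above; the helper `missing_env_vars` is hypothetical and not part of any library.

```python
import os
from typing import Iterable, List

# Variable names read by the configuration cell above
REQUIRED_ENV_VARS = [
    "AZURE_AI_STUDIO_COHERE_EMBED_KEY",
    "AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT",
    "AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT",
    "AZURE_AI_STUDIO_COHERE_COMMAND_KEY",
    "AZURE_SEARCH_SERVICE_ENDPOINT",
    "AZURE_SEARCH_ADMIN_KEY",
]


def missing_env_vars(names: Iterable[str] = REQUIRED_ENV_VARS) -> List[str]:
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.getenv(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `load_dotenv()` points at the exact variable to add to the `.env` file.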


## Authenticate Clients

```python
# Authenticate the Azure AI Search clients with the admin key
azure_search_credential = AzureKeyCredential(AZURE_SEARCH_ADMIN_KEY)

# Index client: creates and manages index definitions
index_client = SearchIndexClient(
    endpoint=AZURE_SEARCH_SERVICE_ENDPOINT,
    credential=azure_search_credential,
)

# Search client: uploads and queries documents in a single index
search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE_ENDPOINT,
    index_name=INDEX_NAME,
    credential=azure_search_credential,
)

# Cohere client pointed at the Azure AI Embed deployment
cohere_client = cohere.Client(
    api_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,
    base_url=AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT,
)
```


## Sample Data
### Text Documents and Image URLs

```python
documents = [
    {
        "id": "1",
        "caption": "The image showcases a peaceful rural scene with several cows leisurely grazing in a sunlit pasture. The golden light of early morning creates a warm ambiance, illuminating the cows as they feed on the lush grass. In the background, soft shadows and a hazy atmosphere enhance the tranquil and idyllic nature of this farm setting",
        "imageUrl": "https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample1-bbe41ac5.png"
    },
    {
        "id": "2",
        "caption": "The image captures a solitary surfer standing on a rocky outcrop, gazing out at the ocean waves. Clad in a wetsuit and holding a surfboard, the figure embodies a spirit of adventure and anticipation as they prepare to embrace the surf. With the soft blue sky and gentle waves in the background, the scene conveys a sense of tranquility and the thrill of the ocean.",
        "imageUrl": "https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample2-72b3c1ca.png"
    },
    {
        "id": "3",
        "caption": "The image showcases the ornate architecture of a grand building, crowned by intricate sculptures that convey a sense of justice and authority. Prominently featured is a figure representing Lady Justice, holding scales, symbolizing fairness and the rule of law. Set against a backdrop of a bright blue sky, the details of the sculpture and the building's design reflect a commitment to justice and civic pride.",
        "imageUrl": "https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample3-e03062c2.png"
    },
    {
        "id": "4",
        "caption": "The image features a focused baseball player poised at the plate, ready to swing his bat. Dressed in a gray uniform with a striking red helmet, he embodies determination and athleticism under the stadium lights. The blurred background of the baseball field highlights the intensity of the moment, capturing the excitement of the game.",
        "imageUrl": "https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample4-0559774f.png"
    }
]
```


## Functions to Encode Images and Generate Embeddings
### Encode Image to Base64

```python
def encode_image_to_base64(image_url: str) -> Optional[str]:
    """
    Download an image and return it as a base64 data URI
    (data:<mime-type>;base64,<payload>), the format Cohere expects for image input.
    Returns None if the download fails.
    """
    try:
        response = requests.get(image_url, timeout=10)
        response.raise_for_status()
        content_type = response.headers["Content-Type"]
        base64_data = base64.b64encode(response.content).decode("utf-8")
        return f"data:{content_type};base64,{base64_data}"
    except Exception as e:
        print(f"Error encoding image {image_url}: {e}")
        return None
```
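`encode_image_to_base64` builds a data URI of the form `data:<mime-type>;base64,<payload>`. The same construction can be sketched without a network call, which makes the format easy to inspect. `bytes_to_data_uri` and `data_uri_to_bytes` are hypothetical helpers; unlike the notebook, this sketch strips parameters such as `; charset=...` from the content type, which is a choice made here rather than something the notebook does.

```python
import base64


def bytes_to_data_uri(data: bytes, content_type: str) -> str:
    """Encode raw bytes as a base64 data URI: data:<media-type>;base64,<payload>."""
    # Keep only the media type, dropping parameters such as "; charset=..."
    media_type = content_type.split(";")[0].strip()
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{payload}"


def data_uri_to_bytes(uri: str) -> bytes:
    """Decode a base64 data URI back to bytes (useful for round-trip checks)."""
    header, _, payload = uri.partition(",")
    if not (header.startswith("data:") and header.endswith(";base64")):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(payload)


# The 8-byte PNG file signature stands in for a downloaded image
png_signature = b"\x89PNG\r\n\x1a\n"
uri = bytes_to_data_uri(png_signature, "image/png")
print(uri)
```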


### Generate Text Embeddings

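```python
def generate_text_embedding(
    text: str,
    model: str = EMBEDDING_MODEL_NAME,
    input_type: str = "search_document",
) -> List[float]:
    # Cohere v3 embed models distinguish indexed text ("search_document") from
    # query text ("search_query"); the search examples below reuse the default.
    response = cohere_client.embed(
        texts=[text],
        model=model,
        input_type=input_type,
        truncate="NONE",  # return an error instead of truncating over-long input
    )
    return response.embeddings[0]
```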
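Cohere's v3 embed models take an `input_type` that describes how the text will be used: `search_document` for text stored in the index and `search_query` for text used to search it. A hypothetical helper that assembles the keyword arguments for `cohere_client.embed` makes that choice explicit; it only builds a dict, so it runs without credentials.

```python
from typing import Any, Dict, List


def build_text_embed_kwargs(
    texts: List[str],
    model: str = "embed-english-v3.0",
    for_query: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Cohere text-embedding call.

    Documents stored in the index use "search_document"; text typed by a
    user at query time uses "search_query".
    """
    return {
        "texts": texts,
        "model": model,
        "input_type": "search_query" if for_query else "search_document",
        "truncate": "NONE",  # error out rather than silently truncating
    }


doc_kwargs = build_text_embed_kwargs(["A surfer on a rocky outcrop"])
query_kwargs = build_text_embed_kwargs(["sports activities"], for_query=True)
print(doc_kwargs["input_type"], query_kwargs["input_type"])
```

In the notebook these would be passed as `cohere_client.embed(**query_kwargs)`. The search results recorded below were produced with `search_document` for the queries as well, so switching queries to `search_query` may change the scores.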


### Generate Image Embeddings

```python
def generate_image_embedding(image_base64: str, model: str = EMBEDDING_MODEL_NAME) -> List[float]:
    # image_base64 is a data URI produced by encode_image_to_base64
    response = cohere_client.embed(
        images=[image_base64],
        model=model,
        input_type="image",
        truncate="NONE",
    )
    return response.embeddings[0]
```
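Each embedding is a network call to the Azure AI deployment, so transient failures such as throttling or timeouts are possible. The notebook installs `tenacity` but never uses it; the sketch below shows the same exponential-backoff idea with only the standard library. `with_retries` is a hypothetical helper, and the attempt count and delays are illustrative rather than tuned values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call func(), retrying on any exception with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception once all attempts are used.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Usage in the notebook would look like `with_retries(lambda: generate_text_embedding(caption))`. With `tenacity`, a comparable effect comes from decorating the embedding functions with `@retry(stop=stop_after_attempt(3), wait=wait_exponential())`.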


## Process and Embed Data

```python
processed_documents = []

for doc in tqdm(documents, desc="Processing documents"):
    caption = doc["caption"]
    image_url = doc["imageUrl"]

    # Embed the caption text
    caption_embedding = generate_text_embedding(caption)

    # Download, encode, and embed the image; store None if the download failed
    image_base64 = encode_image_to_base64(image_url)
    if image_base64:
        image_embedding = generate_image_embedding(image_base64)
    else:
        image_embedding = None

    # Attach both embeddings (this updates the dict in `documents` in place)
    doc["captionVector"] = caption_embedding
    doc["imageVector"] = image_embedding

    processed_documents.append(doc)
```
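The index declares both vector fields with 1024 dimensions, the output size of `embed-english-v3.0`, and Azure AI Search rejects a vector whose length does not match its field. A quick local check before uploading can catch a misconfigured model. This is a sketch: `check_vector_dimensions` is hypothetical, and it skips `None` vectors only to mirror the loop above, which stores `None` when an image cannot be downloaded.

```python
from typing import Dict, List, Sequence


def check_vector_dimensions(
    docs: Sequence[Dict],
    vector_fields: Sequence[str] = ("captionVector", "imageVector"),
    expected_dims: int = 1024,
) -> List[str]:
    """Return one message per vector whose length differs from expected_dims."""
    problems = []
    for doc in docs:
        for field in vector_fields:
            vector = doc.get(field)
            if vector is None:
                continue  # the loop above stores None when an image fails to download
            if len(vector) != expected_dims:
                problems.append(
                    f"doc {doc.get('id')}: {field} has {len(vector)} dims, expected {expected_dims}"
                )
    return problems
```

In the notebook, `check_vector_dimensions(processed_documents)` should return an empty list before the upload step.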



## Create an Azure AI Search Index

```python
# Define the fields for the index
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    # Simple fields are retrievable by default; filterable allows $filter on the URL
    SimpleField(
        name="imageUrl",
        type=SearchFieldDataType.String,
        filterable=True,
    ),
    # SimpleField creates non-searchable fields; SearchableField (a String field)
    # makes the caption full-text searchable. If the index already exists with a
    # non-searchable caption, the update may be rejected; delete the index first.
    SearchableField(name="caption"),
    SearchField(
        name="imageVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1024,  # embed-english-v3.0 returns 1024-dimensional vectors
        vector_search_profile_name="vector_profile",
    ),
    SearchField(
        name="captionVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1024,  # embed-english-v3.0 returns 1024-dimensional vectors
        vector_search_profile_name="vector_profile",
    ),
]

# Define vector search configuration
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="hnsw_config",
            kind=VectorSearchAlgorithmKind.HNSW,
            # m: links per graph node; ef_construction / ef_search: candidate
            # list sizes while building the graph / while querying it
            parameters=HnswParameters(
                m=4,
                ef_construction=400,
                ef_search=500,
                metric=VectorSearchAlgorithmMetric.COSINE,
            ),
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="vector_profile",
            algorithm_configuration_name="hnsw_config",
            vectorizer_name="vectorizer",
        )
    ],
    # Lets the service vectorize query text itself (e.g. with VectorizableTextQuery);
    # the searches in this notebook send precomputed vectors instead
    vectorizers=[
        AzureMachineLearningVectorizer(
            vectorizer_name="vectorizer",
            aml_parameters=AzureMachineLearningParameters(
                scoring_uri=AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT,
                authentication_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,
                model_name=AIStudioModelCatalogName.COHERE_EMBED_V3_ENGLISH,
            ),
        )
    ],
)

# Build the index definition
index = SearchIndex(
    name=INDEX_NAME,
    fields=fields,
    vector_search=vector_search,
)

# Create or update the index
try:
    result = index_client.create_or_update_index(index)
    print(f"Index {INDEX_NAME} created successfully")
except Exception as e:
    print(f"Error creating index: {str(e)}")

# Optional: verify the index fields
try:
    index_info = index_client.get_index(INDEX_NAME)
    print("\nIndex fields:")
    for field in index_info.fields:
        print(f"- {field.name} ({field.type})")
except Exception as e:
    print(f"Error retrieving index fields: {str(e)}")
```


```
Index multimodal-cohere-index created successfully

Index fields:
- id (Edm.String)
- imageUrl (Edm.String)
- caption (Edm.String)
- imageVector (Collection(Edm.Single))
- captionVector (Collection(Edm.Single))
```
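Uploading a document that contains a property not defined in the index schema fails, so it can help to compare document keys against the field names printed above. A sketch using plain sets; `unknown_fields` is hypothetical, and in the notebook the names could come from `[f.name for f in index_client.get_index(INDEX_NAME).fields]`. The document values below are placeholders.

```python
from typing import Dict, Iterable, Set


def unknown_fields(docs: Iterable[Dict], index_field_names: Iterable[str]) -> Set[str]:
    """Return document keys that do not correspond to any index field."""
    known = set(index_field_names)
    extra: Set[str] = set()
    for doc in docs:
        extra.update(key for key in doc if key not in known)
    return extra


index_field_names = ["id", "imageUrl", "caption", "imageVector", "captionVector"]
sample_doc = {"id": "1", "caption": "...", "imageUrl": "...", "captionVector": [], "imageVector": []}
print(unknown_fields([sample_doc], index_field_names))  # set()
```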


## Upload Documents to the Index

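```python
try:
    results = search_client.merge_or_upload_documents(documents=processed_documents)
    # One IndexingResult per document; check all of them, not just the first
    failed_keys = [r.key for r in results if not r.succeeded]
    if not failed_keys:
        print("Documents uploaded successfully.")
    else:
        print(f"Failed to upload documents with keys: {failed_keys}")
except Exception as e:
    print(f"Error uploading documents: {e}")
```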


```
Documents uploaded successfully.
```
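The notebook uploads all four documents in one request. Azure AI Search caps both the number of documents and the payload size of a single indexing request, and 1024-dimension vectors make each document fairly large, so bigger collections are usually sent in chunks. A sketch of the chunking step; `chunked` is a hypothetical helper and the default batch size of 100 is an illustrative, conservative choice rather than a documented value.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of items with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Usage in the notebook (requires the search_client defined above):
# for batch in chunked(processed_documents, size=100):
#     search_client.merge_or_upload_documents(documents=batch)
```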


## Perform Multimodal Vector Search

### Text to Text Vector Search
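This section embeds the query text and searches the `captionVector` field for the most similar caption. The first cell defines the query text and a query image (a man on a skateboard, which is not one of the indexed images) that all of the following searches reuse.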

```python
from IPython.display import HTML, display

# Example inputs shared by the search functions below
query_text = "sports activities like skateboarding"
# Image of a man on a skateboard (not part of the indexed documents)
image_url = "https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample5-14b26724.png"

# Helper function to display results
def display_results(results):
    """Print caption, score, and URL for each result and render the image inline."""
    for result in results:
        print(f"Caption: {result['caption']}")
        print(f"Score: {result['@search.score']}")
        print(f"URL: {result['imageUrl']}")
        display(HTML(f'<img src="{result["imageUrl"]}" style="width:200px;"/>'))
        print("-" * 50)
```

```python
# Text to Text Vector Search
def text_to_text_search(query_text):
    # Generate text embedding
    text_embedding = generate_text_embedding(query_text)

    # Define the text vector query
    text_vector_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="captionVector"
    )

    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[text_vector_query], top=1
    )
    display_results(results)

# Example usage
print("Text to Text Vector Search Results:")
text_to_text_search(query_text)
```


```
Text to Text Vector Search Results:
Caption: The image features a focused baseball player poised at the plate, ready to swing his bat. Dressed in a gray uniform with a striking red helmet, he embodies determination and athleticism under the stadium lights. The blurred background of the baseball field highlights the intensity of the moment, capturing the excitement of the game.
Score: 0.5928263
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample4-0559774f.png
--------------------------------------------------
```
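The index uses the cosine metric, and for single-field queries like this one `@search.score` is a transform of cosine distance rather than the raw similarity. The sketch below assumes the mapping described in the Azure AI Search documentation on vector relevance, `score = 1 / (1 + (1 - cosine_similarity))`, and is only meant for interpreting numbers such as 0.5928 above; check the current service documentation before relying on it. All function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_score(similarity: float) -> float:
    """Assumed mapping from cosine similarity to @search.score (range 1/3 to 1)."""
    distance = 1.0 - similarity
    return 1.0 / (1.0 + distance)


def similarity_from_score(score: float) -> float:
    """Invert cosine_score to recover the cosine similarity."""
    return 2.0 - 1.0 / score


# Interpret the text-to-text score above under the assumed mapping
print(round(similarity_from_score(0.5928263), 3))
```

Under this assumption, the text-to-text score of 0.5928 corresponds to a cosine similarity of about 0.31 between the query and the baseball caption.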


### Text to Image Vector Search
This section embeds the query text and searches the `imageVector` field, relying on Cohere Embed v3 placing text and image embeddings in the same vector space.

```python
# Text to Image Vector Search
def text_to_image_search(query_text):
    # Generate text embedding
    text_embedding = generate_text_embedding(query_text)

    # Define the text-to-image vector query
    text_to_image_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="imageVector"
    )

    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[text_to_image_query], top=1
    )
    display_results(results)

# Example usage
print("Text to Image Vector Search Results:")
text_to_image_search(query_text)
```


```
Text to Image Vector Search Results:
Caption: The image captures a solitary surfer standing on a rocky outcrop, gazing out at the ocean waves. Clad in a wetsuit and holding a surfboard, the figure embodies a spirit of adventure and anticipation as they prepare to embrace the surf. With the soft blue sky and gentle waves in the background, the scene conveys a sense of tranquility and the thrill of the ocean.
Score: 0.612399
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample2-72b3c1ca.png
--------------------------------------------------
```


### Image to Text Vector Search
This section embeds the query image and searches the `captionVector` field for the caption closest to it.

```python
# Image to Text Vector Search
def image_to_text_search(image_url):
    # Download, encode, and embed the query image
    image_base64 = encode_image_to_base64(image_url)
    if image_base64 is None:
        return  # encode_image_to_base64 has already printed the error
    image_embedding = generate_image_embedding(image_base64)

    # Define the image-to-text vector query
    image_to_text_query = VectorizedQuery(
        vector=image_embedding,
        k_nearest_neighbors=1,
        fields="captionVector"
    )

    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[image_to_text_query], top=1
    )
    display_results(results)

# Example usage
print("Image to Text Vector Search Results:")
image_to_text_search(image_url)
```


```
Image to Text Vector Search Results:
Caption: The image captures a solitary surfer standing on a rocky outcrop, gazing out at the ocean waves. Clad in a wetsuit and holding a surfboard, the figure embodies a spirit of adventure and anticipation as they prepare to embrace the surf. With the soft blue sky and gentle waves in the background, the scene conveys a sense of tranquility and the thrill of the ocean.
Score: 0.6095421
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample2-72b3c1ca.png
--------------------------------------------------
```


### Image to Image Vector Search
This section embeds the query image and searches the `imageVector` field for the most similar indexed image.

```python
# Image to Image Vector Search
def image_to_image_search(image_url):
    # Download, encode, and embed the query image
    image_base64 = encode_image_to_base64(image_url)
    if image_base64 is None:
        return  # encode_image_to_base64 has already printed the error
    image_embedding = generate_image_embedding(image_base64)

    # Define the image-to-image vector query
    image_to_image_query = VectorizedQuery(
        vector=image_embedding,
        k_nearest_neighbors=1,
        fields="imageVector"
    )

    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[image_to_image_query], top=1
    )
    display_results(results)

# Example usage
print("Image to Image Vector Search Results:")
image_to_image_search(image_url)
```


```
Image to Image Vector Search Results:
Caption: The image captures a solitary surfer standing on a rocky outcrop, gazing out at the ocean waves. Clad in a wetsuit and holding a surfboard, the figure embodies a spirit of adventure and anticipation as they prepare to embrace the surf. With the soft blue sky and gentle waves in the background, the scene conveys a sense of tranquility and the thrill of the ocean.
Score: 0.7631374
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample2-72b3c1ca.png
--------------------------------------------------
```


### Cross-Field Vector Search: Text Embedding Query
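This section embeds the query text and runs a single vector query against both `imageVector` and `captionVector`. With `k_nearest_neighbors=1`, each field contributes its single nearest match, so at most two results come back even though `top=3`.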

```python
# Cross-Field Vector Search: Text Embedding Query
def text_embedding_cross_field_search(query_text):
    # Generate text embedding
    text_embedding = generate_text_embedding(query_text)

    # One vector query over both fields; k_nearest_neighbors=1 yields one
    # nearest match per field, so top=3 returns at most two documents here
    cross_field_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="imageVector, captionVector"
    )

    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[cross_field_query], top=3
    )
    display_results(results)

# Example usage
print("Cross-Field Vector Search Results (Text Embedding Query):")
text_embedding_cross_field_search(query_text)
```


```
Cross-Field Vector Search Results (Text Embedding Query):
Caption: The image captures a solitary surfer standing on a rocky outcrop, gazing out at the ocean waves. Clad in a wetsuit and holding a surfboard, the figure embodies a spirit of adventure and anticipation as they prepare to embrace the surf. With the soft blue sky and gentle waves in the background, the scene conveys a sense of tranquility and the thrill of the ocean.
Score: 0.01666666753590107
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample2-72b3c1ca.png
--------------------------------------------------
Caption: The image features a focused baseball player poised at the plate, ready to swing his bat. Dressed in a gray uniform with a striking red helmet, he embodies determination and athleticism under the stadium lights. The blurred background of the baseball field highlights the intensity of the moment, capturing the excitement of the game.
Score: 0.01666666753590107
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample4-0559774f.png
--------------------------------------------------
```
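When one vector query targets several fields, or several vector queries are sent together, Azure AI Search merges the per-field rankings with Reciprocal Rank Fusion (RRF), so `@search.score` is no longer a similarity. The scores above are consistent with each first-place hit contributing 1/60: the surfer is the top `imageVector` match and the baseball player the top `captionVector` match for this query (compare the text-to-image and text-to-text results), so each gets 1/60 ≈ 0.0167. In the image-query search that follows, the surfer ranks first in both fields and scores 2/60 ≈ 0.0333. The sketch reproduces that arithmetic; it uses zero-based ranks so a first-place hit contributes exactly 1/k with k = 60, which matches these numbers but is an assumption about the service's internals. `reciprocal_rank_fusion` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[str]],
    k: int = 60,
) -> List[Tuple[str, float]]:
    """Fuse several ranked lists of document ids with RRF.

    Each list adds 1 / (k + rank) to a document's score, with zero-based
    rank, so a first-place hit contributes exactly 1 / k.
    Returns (doc_id, score) pairs sorted by descending score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Top-1 ids per field, taken from the searches in this notebook ("2" = surfer, "4" = baseball)
text_query = reciprocal_rank_fusion([["2"], ["4"]])   # imageVector, captionVector
image_query = reciprocal_rank_fusion([["2"], ["2"]])  # surfer first in both fields
print(text_query)
print(image_query)
```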


### Cross-Field Vector Search: Image Embedding Query
This section embeds the query image and runs a single vector query against both `imageVector` and `captionVector`, returning the top fused result.

```python
# Cross-Field Vector Search: Image Embedding Query
def image_embedding_cross_field_search(image_url):
    # Download, encode, and embed the query image
    image_base64 = encode_image_to_base64(image_url)
    if image_base64 is None:
        return  # encode_image_to_base64 has already printed the error
    image_embedding = generate_image_embedding(image_base64)

    # Define the vector query for both caption and image fields
    cross_field_query = VectorizedQuery(
        vector=image_embedding,
        k_nearest_neighbors=1,
        fields="imageVector, captionVector"
    )

    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[cross_field_query], top=1
    )
    display_results(results)

# Example usage
print("Cross-Field Vector Search Results (Image Embedding Query):")
image_embedding_cross_field_search(image_url)
```


```
Cross-Field Vector Search Results (Image Embedding Query):
Caption: The image captures a solitary surfer standing on a rocky outcrop, gazing out at the ocean waves. Clad in a wetsuit and holding a surfboard, the figure embodies a spirit of adventure and anticipation as they prepare to embrace the surf. With the soft blue sky and gentle waves in the background, the scene conveys a sense of tranquility and the thrill of the ocean.
Score: 0.03333333507180214
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample2-72b3c1ca.png
--------------------------------------------------
```


### Multi-Vector Search: Text and Image Query
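This section sends two vector queries in one request: the text embedding against `captionVector` and the image embedding against `imageVector`. The service merges both result sets into a single ranked list.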

```python
# Multi-Vector Search: Text and Image Query
def text_and_image_query_multi_vector(query_text, image_url):
    # Generate the text embedding
    text_embedding = generate_text_embedding(query_text)

    # Encode the image and generate the image embedding
    image_base64 = encode_image_to_base64(image_url)
    if image_base64 is None:
        return  # encode_image_to_base64 has already printed the error
    image_embedding = generate_image_embedding(image_base64)

    # Define the text vector query
    text_vector_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="captionVector"
    )

    # Define the image vector query
    image_vector_query = VectorizedQuery(
        vector=image_embedding,
        k_nearest_neighbors=1,
        fields="imageVector"
    )

    # Perform the search with both vector queries
    results = search_client.search(
        search_text=None,
        vector_queries=[text_vector_query, image_vector_query],
        top=2
    )

    # Display the results
    display_results(results)

# Example usage
print("Text and Image Multi-Vector Query Results:")
text_and_image_query_multi_vector(query_text, image_url)
```


```
Text and Image Multi-Vector Query Results:
Caption: The image captures a solitary surfer standing on a rocky outcrop, gazing out at the ocean waves. Clad in a wetsuit and holding a surfboard, the figure embodies a spirit of adventure and anticipation as they prepare to embrace the surf. With the soft blue sky and gentle waves in the background, the scene conveys a sense of tranquility and the thrill of the ocean.
Score: 0.01666666753590107
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample2-72b3c1ca.png
--------------------------------------------------
Caption: The image features a focused baseball player poised at the plate, ready to swing his bat. Dressed in a gray uniform with a striking red helmet, he embodies determination and athleticism under the stadium lights. The blurred background of the baseball field highlights the intensity of the moment, capturing the excitement of the game.
Score: 0.01666666753590107
URL: https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample4-0559774f.png
--------------------------------------------------
```


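## Perform RAG Using Command R+

The cross-field text search retrieves captions, which are passed as documents to Command R+ so that its answer is grounded in, and cites, the indexed content.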

```python
# Initialize a Cohere client for the Command R+ chat deployment
co_chat = cohere.Client(
    base_url=f"{AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT}/v1",
    api_key=AZURE_AI_STUDIO_COHERE_COMMAND_KEY,
)

# Cross-field vector search, redefined to return results instead of displaying them
def text_embedding_cross_field_search(query_text):
    """
    Performs a cross-field vector search on both imageVector and captionVector fields.

    Args:
        query_text (str): The text query for search.

    Returns:
        An iterable of search result dictionaries (caption, imageUrl, score, ...).
    """
    # Generate text embedding
    text_embedding = generate_text_embedding(query_text)

    # Define the vector query for both caption and image fields
    cross_field_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="imageVector, captionVector"
    )

    # Perform search and return results
    return search_client.search(
        search_text=None, vector_queries=[cross_field_query], top=3
    )

# Chat function that grounds the response in the search results
def ask(query_text):
    """
    Ask a question and return a grounded response from the Cohere chat endpoint.

    Args:
        query_text (str): The question to ask.

    Returns:
        The Cohere chat response: `text` holds the answer and `citations`
        (possibly None) the spans grounded in the documents.
    """
    search_results = text_embedding_cross_field_search(query_text)

    # Pass each retrieved caption to the chat API as a grounding document
    chat_documents = [{"text": result["caption"]} for result in search_results]

    # Get a grounded response from Command R+
    chat_response = co_chat.chat(
        message=query_text,
        documents=chat_documents,
        max_tokens=100
    )
    return chat_response

# Format the grounded response with inline citations
def pretty_text(text, citations):
    """
    Bold each cited span and append its document ids as a superscript.

    Args:
        text (str): The response text from the chat.
        citations (list): Citation objects with start, end, and document_ids.

    Returns:
        str: The formatted text with citations.
    """
    # Apply citations from the end of the text backwards so that inserting
    # markup never shifts the start/end offsets of citations not yet processed
    for citation in sorted(citations, key=lambda c: c.start, reverse=True):
        doc_ids_str = ", ".join(citation.document_ids)
        citation_text = text[citation.start : citation.end]
        new_text = f"**{citation_text}**^({doc_ids_str})"
        text = text[: citation.start] + new_text + text[citation.end :]
    return text

# Example usage
query_text = "sports activities like skateboarding"
response = ask(query_text)

# Print raw response
print("Grounded Response:", response.text)

# citations can be None when the model cites nothing
pretty_text_output = pretty_text(response.text, response.citations or [])
print("\nFormatted Response with Citations:")
print(pretty_text_output)
```


```
Grounded Response: Other sports activities include baseball.

Formatted Response with Citations:
Other sports activities include **baseball.**^(doc_1)
```
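The `ask` function passes only the caption text to Command R+, so citations come back with generated ids such as `doc_1` that are hard to trace to an image. A sketch of a richer mapping, assuming (based on Cohere's Chat API documentation for the `documents` parameter) that documents are string-to-string dictionaries and that an `id` key, when present, is used as the citation's document id. `to_chat_documents` is hypothetical, and the sample values are placeholders.

```python
from typing import Dict, Iterable, List


def to_chat_documents(search_results: Iterable[Dict]) -> List[Dict[str, str]]:
    """Map search results to string-only chat documents that keep the index id."""
    documents = []
    for result in search_results:
        documents.append(
            {
                "id": str(result["id"]),
                "text": result["caption"],
                "url": result["imageUrl"],
            }
        )
    return documents


# Placeholder values shaped like the search results above
sample_results = [
    {"id": "4", "caption": "A baseball player at the plate.", "imageUrl": "https://example.com/4.png"},
]
print(to_chat_documents(sample_results))
```

In `ask`, this would replace the list comprehension that builds the chat documents, so a citation could point back to index document `4` and its image URL.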
