# Using Cohere Embed-4 Embedding Model with Azure AI Search for Multimodal Search

This notebook shows how to use Cohere Embed-4 on Azure to generate text and image embeddings and index them in Azure AI Search for multimodal vector search across documents and images.

## Prerequisites

* **Azure Account**: Ensure you have an active Azure account with access to Azure AI services and Azure AI Search.
* **Cohere Embed-4 Model**: Deploy the Cohere Embed-4 model in Azure AI Foundry.
* **API Keys and Endpoints**: Obtain the necessary API keys and endpoints for the Cohere Embed-4 deployment and for Azure AI Search.
* **Sample Data**: Have sample images available for embedding.

## Install Required Packages

First, install the necessary Python packages.

In [1]:
! pip install --upgrade azure-search-documents python-dotenv requests tenacity tqdm



## Import Libraries

In [3]:
import base64
import os
import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration,
    HnswParameters,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
    VectorSearch,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    VectorSearchProfile,
)
from azure.search.documents.models import (
    VectorizedQuery,
)
from dotenv import load_dotenv
from tenacity import retry, stop_after_attempt, wait_exponential
from tqdm import tqdm
from typing import Dict, List, Optional
from IPython.display import HTML, display

## Configuration

In [4]:
# Load environment variables
load_dotenv()

# Azure AI Studio Cohere Configuration
AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_KEY")
AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT")

# Azure AI Search Configuration
AZURE_SEARCH_SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
AZURE_SEARCH_ADMIN_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
INDEX_NAME = "multimodal-cohere-embed4-index"

# Cohere Model Information
EMBEDDING_MODEL_NAME = "embed-4"
EMBEDDING_DIMENSIONS = 1536  # Default Embed-4 output size; must match the index vector fields

In [5]:
# Validate required environment variables
required_vars = {
    "AZURE_AI_STUDIO_COHERE_EMBED_KEY": AZURE_AI_STUDIO_COHERE_EMBED_KEY,
    "AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT": AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT,
    "AZURE_SEARCH_SERVICE_ENDPOINT": AZURE_SEARCH_SERVICE_ENDPOINT,
    "AZURE_SEARCH_ADMIN_KEY": AZURE_SEARCH_ADMIN_KEY
}

missing_vars = [name for name, value in required_vars.items() if not value]
if missing_vars:
    print("❌ Missing required environment variables:")
    for var in missing_vars:
        print(f"  - {var}")
    print("\nPlease set these environment variables before proceeding.")
else:
    print("✅ All required environment variables are set")


✅ All required environment variables are set


## Authenticate Clients

In [6]:
# Authenticate Azure AI Search client
azure_search_credential = AzureKeyCredential(AZURE_SEARCH_ADMIN_KEY)

# Initialize Azure Search clients
index_client = SearchIndexClient(
    endpoint=AZURE_SEARCH_SERVICE_ENDPOINT,
    credential=azure_search_credential
)

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE_ENDPOINT,
    index_name=INDEX_NAME,
    credential=azure_search_credential
)

## Sample Data

### Text Documents and Image URLs

In [7]:
documents = [
    {
        "id": "1",
        "caption": "A man wearing a beanie and shirt works on a laptop, representing modern remote work and technology usage",
        "imageUrl": "https://images.unsplash.com/photo-1755541516517-bb95790dc7ad"
    },
    {
        "id": "2",
        "caption": "A baby elephant sits on a rock surrounded by adult elephants in their natural habitat, showcasing wildlife family dynamics and conservation",
        "imageUrl": "https://images.unsplash.com/photo-1756767265856-c0e726dd9896"
    },
    {
        "id": "3",
        "caption": "A man photographs two girls sitting in an open car trunk, capturing a casual outdoor photography session with friends",
        "imageUrl": "https://images.unsplash.com/photo-1756142007155-c8b4eb0c3808"
    },
    {
        "id": "4",
        "caption": "A hockey player takes aim for the goal while facing the goalie on an ice rink, capturing the intensity and precision of competitive ice hockey",
        "imageUrl": "https://images.unsplash.com/photo-1516226415502-d6624544376b"
    }
]

print(f"Total documents in dataset: {len(documents)}")

Total documents in dataset: 4


## Functions to Encode Images and Generate Embeddings

### Encode Image to Base64

In [8]:
def encode_image_to_base64(image_url: str) -> Optional[str]:
    """
    Download an image and convert it to the base64 data URI format required by Cohere Embed-4.
    Returns None if the download or encoding fails.
    """
    try:
        response = requests.get(image_url, timeout=10)
        response.raise_for_status()
        content_type = response.headers['Content-Type']
        base64_data = base64.b64encode(response.content).decode('utf-8')
        return f"data:{content_type};base64,{base64_data}"
    except Exception as e:
        print(f"Error encoding image {image_url}: {e}")
        return None
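
To make the data URI format concrete without a network call, here is a minimal sketch that builds the same `data:<content-type>;base64,<payload>` string from raw bytes. The helper name `bytes_to_data_uri` is hypothetical and not part of the notebook; the four bytes below only stand in for real image content.

```python
import base64


def bytes_to_data_uri(data: bytes, content_type: str = "image/png") -> str:
    """Build a base64 data URI from raw bytes (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{content_type};base64,{encoded}"


# The first four bytes of a PNG file, used here as a stand-in for an image.
uri = bytes_to_data_uri(b"\x89PNG", "image/png")
print(uri)
```

The same approach could be used for local files by reading them in binary mode and passing the bytes to the helper.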

### Generate Text Embeddings using REST API

In [9]:
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def generate_text_embedding(
    text: str,
    model: str = EMBEDDING_MODEL_NAME,
    input_type: str = "search_document",
) -> List[float]:
    """
    Generate text embeddings using Cohere Embed-4 via direct REST API calls.
    
    Args:
        text (str): The text to embed.
        model (str): The embedding model name.
        input_type (str): "search_document" for indexed content (the default),
            or "search_query" when embedding a search query.
    
    Returns:
        List[float]: The embedding vector.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {AZURE_AI_STUDIO_COHERE_EMBED_KEY}"
    }
    
    payload = {
        "model": model,
        "texts": [text],
        "input_type": input_type,
        "truncate": "NONE"
    }
    
    try:
        response = requests.post(
            f"{AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT}/v1/embed",
            headers=headers,
            json=payload,
            timeout=30
        )
        response.raise_for_status()
        result = response.json()
        return result["embeddings"][0]
    except Exception as e:
        print(f"Error generating text embedding: {e}")
        raise
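
Cohere's embed API distinguishes between `search_document` (content being indexed) and `search_query` (text a user searches with). The sketch below shows one way to make that choice explicit when building the request body. The helper `build_text_payload` is hypothetical, and restricting the allowed values to these two is a simplification for this notebook's use case.

```python
from typing import Dict, List, Union

Payload = Dict[str, Union[str, List[str]]]


def build_text_payload(
    text: str,
    input_type: str = "search_document",
    model: str = "embed-4",
) -> Payload:
    """Build an embed request body for a single text (hypothetical helper)."""
    allowed = {"search_document", "search_query"}
    if input_type not in allowed:
        raise ValueError(f"input_type must be one of {sorted(allowed)}")
    return {
        "model": model,
        "texts": [text],
        "input_type": input_type,
        "truncate": "NONE",
    }


doc_payload = build_text_payload("A hockey player takes aim for the goal")
query_payload = build_text_payload("sports activities", input_type="search_query")
print(doc_payload["input_type"], query_payload["input_type"])
```

Embedding queries with `search_query` and captions with `search_document` is the pairing Cohere describes for retrieval; whether it changes the rankings on a four-document dataset is something to verify empirically.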

### Generate Image Embeddings using REST API

In [10]:
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def generate_image_embedding(image_base64: str, model: str = EMBEDDING_MODEL_NAME) -> List[float]:
    """
    Generate image embeddings using Cohere Embed-4 via direct REST API calls.
    
    Args:
        image_base64 (str): The image as a base64 data URI (see encode_image_to_base64).
        model (str): The embedding model name.
    
    Returns:
        List[float]: The embedding vector.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {AZURE_AI_STUDIO_COHERE_EMBED_KEY}"
    }
    
    payload = {
        "model": model,
        "images": [image_base64],
        "input_type": "image",
        "truncate": "NONE"
    }
    
    try:
        response = requests.post(
            f"{AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT}/v1/embed",
            headers=headers,
            json=payload,
            timeout=30
        )
        response.raise_for_status()
        result = response.json()
        return result["embeddings"][0]
    except Exception as e:
        print(f"Error generating image embedding: {e}")
        raise
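
Both embedding functions share the same retry decorator, so it can help to see what wait schedule those settings imply. The sketch below is a plain-Python approximation and does not call tenacity. It assumes the wait before retry *n* is `multiplier * 2**(n - 1)`, clamped to the `[min, max]` range; that formula is an assumption about tenacity's behaviour and may differ between versions.

```python
from typing import List


def exponential_waits(
    attempts: int,
    multiplier: float = 1.0,
    minimum: float = 4.0,
    maximum: float = 10.0,
) -> List[float]:
    """Approximate the waits between attempts (assumed formula, see lead-in)."""
    waits = []
    # There is no wait after the final attempt, hence range(1, attempts).
    for attempt in range(1, attempts):
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(minimum, min(raw, maximum)))
    return waits


print(exponential_waits(3))
print(exponential_waits(6))
```

Under this assumption, three attempts means two waits that are both held at the 4-second floor, so the exponential growth only becomes visible if the attempt limit is raised.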

## Process and Embed Data

In [None]:
import time

# APPROACH: Manual embedding generation with rate limit handling
# We're generating embeddings manually to handle the strict rate limits of the Cohere Embed-4 API.
# This approach downloads images, converts them to base64, and calls the Embed-4 API directly.
# We add significant delays between API calls due to strict rate limits on the endpoint.
# 
# NOTE: Once rate limits are resolved (higher quota/tier or less busy endpoint), you could:
# - Remove the time.sleep() delays for faster processing
# - Use Azure's built-in vectorizers once Embed-4 is added to the model catalog
# - Process documents in parallel/batches for much better performance

processed_documents = []

print("Processing documents and generating embeddings...")
print("Using extended delays to handle API rate limits - this will take several minutes.")

for i, doc in enumerate(documents):
    print(f"\nProcessing document {i+1}/{len(documents)}: {doc['id']}")

    caption = doc["caption"]
    image_url = doc["imageUrl"]

    try:
        # Generate text embedding from caption
        print("  Waiting 30 seconds to avoid rate limits...")
        time.sleep(30)

        print("  Generating text embedding...")
        caption_embedding = generate_text_embedding(caption)
        print("  ✓ Text embedding generated")

        # Generate image embedding from URL
        print("  Waiting 45 seconds before image processing...")
        time.sleep(45)

        print("  Downloading and encoding image...")
        image_base64 = encode_image_to_base64(image_url)

        if image_base64:
            print("  Generating image embedding...")
            image_embedding = generate_image_embedding(image_base64)
            print("  ✓ Image embedding generated")
        else:
            image_embedding = None
            print("  ✗ Failed to encode image")

        # Create document with embeddings for indexing
        doc_copy = doc.copy()
        doc_copy["captionVector"] = caption_embedding
        doc_copy["imageVector"] = image_embedding

        processed_documents.append(doc_copy)
        print(f"  ✓ Document {doc['id']} processed successfully")

        # Wait between documents to respect rate limits
        if i < len(documents) - 1:
            print("  Waiting 60 seconds before next document...")
            time.sleep(60)

    except Exception as e:
        print(f"  ✗ Error processing document {doc['id']}: {e}")
        # Add document without embeddings to maintain data integrity
        doc_copy = doc.copy()
        doc_copy["captionVector"] = None
        doc_copy["imageVector"] = None
        processed_documents.append(doc_copy)

        if i < len(documents) - 1:
            print("  Waiting 60 seconds after error...")
            time.sleep(60)

print(f"\nProcessing complete! {len(processed_documents)} documents with embeddings ready for indexing.")

Starting processing with extended delays due to rate limiting...
This may take several minutes...

Processing document 1/4: 1
  Waiting 30 seconds before text embedding...
  Generating text embedding...
  ✓ Text embedding generated
  Waiting 45 seconds before image processing...
  Encoding image...
  Generating image embedding...
  ✓ Image embedding generated
  ✓ Document 1 processed successfully
  Waiting 60 seconds before next document...

Processing document 2/4: 2
  Waiting 30 seconds before text embedding...
  Generating text embedding...
  ✓ Text embedding generated
  Waiting 45 seconds before image processing...
  Encoding image...
  Generating image embedding...
  ✓ Image embedding generated
  ✓ Document 2 processed successfully
  Waiting 60 seconds before next document...

Processing document 3/4: 3
  Waiting 30 seconds before text embedding...
  Generating text embedding...
  ✓ Text embedding generated
  Waiting 45 seconds before image processing...
  Encoding image...
  Gene
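
The comments in the processing cell mention batching as a way to speed things up once rate limits allow. Because the request body carries `texts` as a list, several captions could in principle go into one call. The sketch below only shows the grouping step; the helper `chunked` is hypothetical, and the batch size of 2 is an arbitrary illustration, so check your endpoint's documented per-request limit before choosing a real value.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


captions = [f"caption {n}" for n in range(5)]
batches = list(chunked(captions, 2))
print(batches)
```

Each batch would then be sent as the `texts` list of a single request, and the returned embeddings matched back to documents by position.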

## Create an Azure AI Search Index

In [None]:
# Define the fields for the index
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SimpleField(
        name="imageUrl",
        type=SearchFieldDataType.String,
        filterable=True
    ),
    # SimpleField always creates a non-searchable field, so use SearchField
    # for the caption. Fields are retrievable by default.
    SearchField(
        name="caption",
        type=SearchFieldDataType.String,
        searchable=True
    ),
    SearchField(
        name="imageVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=EMBEDDING_DIMENSIONS,  # Embed-4 uses 1536 dimensions
        vector_search_profile_name="vector_profile"
    ),
    SearchField(
        name="captionVector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=EMBEDDING_DIMENSIONS,  # Embed-4 uses 1536 dimensions
        vector_search_profile_name="vector_profile"
    )
]
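
Since both vector fields are declared with a fixed dimension, a vector of any other length will be rejected at upload time. A minimal pre-upload check might look like the sketch below. The helper `find_dimension_mismatches` is hypothetical, the field names mirror the index definition, and the sample documents are made up with short vectors for readability.

```python
from typing import Dict, List, Sequence


def find_dimension_mismatches(
    docs: Sequence[Dict],
    vector_fields: Sequence[str] = ("captionVector", "imageVector"),
    expected: int = 1536,
) -> List[str]:
    """Return descriptions of vectors whose length differs from `expected`."""
    problems = []
    for doc in docs:
        for field in vector_fields:
            vector = doc.get(field)
            if vector is None:
                continue  # Missing embeddings are reported elsewhere
            if len(vector) != expected:
                problems.append(f"{doc['id']}.{field}: length {len(vector)}")
    return problems


# Made-up documents with 3-dimensional vectors for illustration.
sample_docs = [
    {"id": "1", "captionVector": [0.1, 0.2, 0.3], "imageVector": [0.1, 0.2, 0.3]},
    {"id": "2", "captionVector": [0.1, 0.2], "imageVector": None},
]
problems = find_dimension_mismatches(sample_docs, expected=3)
print(problems)
```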

In [None]:
# Define vector search configuration
# Using manual embedding generation for Embed-4 compatibility
vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="hnsw_config",
            kind=VectorSearchAlgorithmKind.HNSW,
            parameters=HnswParameters(
                m=4,
                ef_construction=400,
                ef_search=500,
                metric=VectorSearchAlgorithmMetric.COSINE,
            ),
        )
    ],
    profiles=[
        VectorSearchProfile(
            name="vector_profile", 
            algorithm_configuration_name="hnsw_config",
        )
    ],
)
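
The profile above uses the cosine metric, which compares the direction of two vectors and ignores their magnitude. As a reference point, here is a minimal pure-Python sketch of cosine similarity. It is for illustration only; the search service computes this internally over the HNSW graph.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same, orthogonal, opposite)
```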

In [None]:
# Create the index
index = SearchIndex(
    name=INDEX_NAME,
    fields=fields,
    vector_search=vector_search
)

# Create or update the index
try:
    result = index_client.create_or_update_index(index)
    print(f"Index {INDEX_NAME} created successfully")
except Exception as e:
    print(f"Error creating index: {str(e)}")

# Optional: Verify index fields
try:
    index_info = index_client.get_index(INDEX_NAME)
    print("\nIndex fields:")
    for field in index_info.fields:
        print(f"- {field.name} ({field.type})")
except Exception as e:
    print(f"Error retrieving index fields: {str(e)}")

Index multimodal-cohere-embed4-index created successfully

Index fields:
- id (Edm.String)
- imageUrl (Edm.String)
- caption (Edm.String)
- imageVector (Collection(Edm.Single))
- captionVector (Collection(Edm.Single))


## Upload Documents to the Index

In [None]:
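try:
    results = search_client.merge_or_upload_documents(documents=processed_documents)
    # Check every per-document result, not just the first one
    failed_keys = [r.key for r in results if not r.succeeded]
    if not failed_keys:
        print("Documents uploaded successfully.")
    else:
        print(f"Error uploading documents with keys: {failed_keys}")
except Exception as e:
    print(f"Error uploading documents: {e}")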

Documents uploaded successfully.
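
The upload call returns one result per document, so a partial failure can hide behind a single success message. The sketch below collects the failures with their messages. It runs against stand-in objects that carry the `key`, `succeeded`, and `error_message` attributes of the SDK's indexing results; the helper name `failed_uploads` and the sample values are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def failed_uploads(results: Iterable) -> List[Tuple[str, str]]:
    """Return (key, message) pairs for results that did not succeed."""
    return [
        (r.key, r.error_message or "unknown error")
        for r in results
        if not r.succeeded
    ]


# Stand-ins for the per-document results returned by the upload call.
sample_results = [
    SimpleNamespace(key="1", succeeded=True, error_message=None),
    SimpleNamespace(key="2", succeeded=False, error_message="invalid vector"),
    SimpleNamespace(key="3", succeeded=False, error_message=None),
]
failures = failed_uploads(sample_results)
print(failures)
```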


## Perform Multimodal Vector Search

In [None]:
# Query inputs shared by the search examples below
query_text = "sports activities like skateboarding"
image_url = "https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample5-14b26724.png"  # Image of a man on a skateboard

# Helper function to display results
def display_results(results):
    for result in results:
        print(f"Caption: {result['caption']}")
        print(f"Score: {result['@search.score']}")
        print(f"URL: {result['imageUrl']}")
        display(HTML(f'<img src="{result["imageUrl"]}" style="width:200px;"/>'))
        print("-" * 50)

### Text to Text Vector Search

This section embeds the query text and searches the `captionVector` field for the nearest caption.

In [None]:
# Text to Text Vector Search
def text_to_text_search(query_text):
    # Generate text embedding
    text_embedding = generate_text_embedding(query_text)
    
    # Define the text vector query
    text_vector_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="captionVector"
    )
    
    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[text_vector_query], top=1
    )
    display_results(results)

# Example usage
print("Text to Text Vector Search Results:")
text_to_text_search(query_text)

Text to Text Vector Search Results:
Caption: A man photographs two girls sitting in an open car trunk, capturing a casual outdoor photography session with friends
Score: 0.5598682
URL: https://images.unsplash.com/photo-1756142007155-c8b4eb0c3808


--------------------------------------------------
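
The `@search.score` shown above is not the raw cosine similarity. Azure AI Search documents the cosine-metric score as `1 / (1 + cosine_distance)`, with `cosine_distance = 1 - cosine_similarity`. Assuming that relationship holds for this index, the sketch below inverts it; the helper name `score_to_cosine` is hypothetical.

```python
def score_to_cosine(score: float) -> float:
    """Invert score = 1 / (1 + (1 - cos)) to recover cosine similarity."""
    if not 0 < score <= 1:
        raise ValueError("a cosine-metric score is expected in (0, 1]")
    return 2.0 - 1.0 / score


# Score taken from the text-to-text result above.
similarity = score_to_cosine(0.5598682)
print(round(similarity, 2))
```

Under this assumption the top hit has a cosine similarity of roughly 0.21, which is a weak match. That is consistent with the sample data containing no skateboarding caption.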


### Text to Image Vector Search

This section embeds the query text and searches the `imageVector` field, so the text query is matched directly against image embeddings.

In [None]:
# Text to Image Vector Search
def text_to_image_search(query_text):
    # Generate text embedding
    text_embedding = generate_text_embedding(query_text)
    
    # Define the text-to-image vector query
    text_to_image_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="imageVector"
    )
    
    
    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[text_to_image_query], top=1
    )
    display_results(results)

# Example usage
print("Text to Image Vector Search Results:")
text_to_image_search(query_text)

Text to Image Vector Search Results:
Caption: A hockey player takes aim for the goal while facing the goalie on an ice rink, capturing the intensity and precision of competitive ice hockey
Score: 0.554676
URL: https://images.unsplash.com/photo-1516226415502-d6624544376b


--------------------------------------------------


### Image to Text Vector Search

This section embeds a query image and searches the `captionVector` field, so the image is matched against caption embeddings.

In [None]:
# Image to Text Vector Search
def image_to_text_search(image_url):
    # Generate image embedding
    image_base64 = encode_image_to_base64(image_url)
    image_embedding = generate_image_embedding(image_base64)
    
    # Define the image-to-text vector query
    image_to_text_query = VectorizedQuery(
        vector=image_embedding,
        k_nearest_neighbors=1,
        fields="captionVector"
    )
    
    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[image_to_text_query], top=1
    )
    display_results(results)

# Example usage
print("Image to Text Vector Search Results:")
image_to_text_search(image_url)

Image to Text Vector Search Results:
Caption: A man wearing a beanie and shirt works on a laptop, representing modern remote work and technology usage
Score: 0.5489908
URL: https://images.unsplash.com/photo-1755541516517-bb95790dc7ad


--------------------------------------------------


### Cross-Field Vector Search: Text Embedding Query

This section performs a cross-field vector search, using a single text embedding to query both the `imageVector` and `captionVector` fields.

In [None]:
# Cross-Field Vector Search: Text Embedding Query
def text_embedding_cross_field_search(query_text, return_results=False):
    # Generate text embedding
    text_embedding = generate_text_embedding(query_text)
    
    # Define the vector query for both caption and image fields
    cross_field_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="imageVector, captionVector"
    )
    
    # Perform search
    results = search_client.search(
        search_text=None, vector_queries=[cross_field_query], top=3
    )
    
    if return_results:
        return results
    else:
        display_results(results)

# Example usage
print("Cross-Field Vector Search Results (Text Embedding Query):")
text_embedding_cross_field_search(query_text)

Cross-Field Vector Search Results (Text Embedding Query):
Caption: A man photographs two girls sitting in an open car trunk, capturing a casual outdoor photography session with friends
Score: 0.01666666753590107
URL: https://images.unsplash.com/photo-1756142007155-c8b4eb0c3808


--------------------------------------------------
Caption: A hockey player takes aim for the goal while facing the goalie on an ice rink, capturing the intensity and precision of competitive ice hockey
Score: 0.01666666753590107
URL: https://images.unsplash.com/photo-1516226415502-d6624544376b


--------------------------------------------------
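
The scores here are much smaller than in the single-field searches because results from several vector fields are merged by rank, not by similarity. Azure AI Search documents Reciprocal Rank Fusion (RRF) for this merge. The sketch below is a simplified illustration: the constant `k = 60` and the zero-based rank are assumptions chosen so that a single first-place hit scores 1/60, which matches the 0.0167 values printed above. The helper `rrf_fuse` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Sum 1 / (k + position) for each document across all rankings."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # position 0 is the best match within one field's result list
        for position, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + position)
    return dict(scores)


# One nearest neighbour per field: document "3" from captionVector,
# document "4" from imageVector, as in the output above.
fused = rrf_fuse([["3"], ["4"]])
print(fused)

# A document ranked first in both fields would score twice as high.
both = rrf_fuse([["3"], ["3"]])
print(both)
```

This also explains why both results tie: each document is the top hit in exactly one field, so each receives the same fused score.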


### Multi-Vector Search: Text and Image Query

This section performs a multi-vector search that sends a text embedding to `captionVector` and an image embedding to `imageVector` in the same request.

In [None]:
# Multi-Vector Search: Text and Image Query
def text_and_image_query_multi_vector(query_text, image_url):
    # Generate the text embedding
    text_embedding = generate_text_embedding(query_text)
    
    # Encode the image and generate the image embedding
    image_base64 = encode_image_to_base64(image_url)
    image_embedding = generate_image_embedding(image_base64)

    # Define the text vector query
    text_vector_query = VectorizedQuery(
        vector=text_embedding,
        k_nearest_neighbors=1,
        fields="captionVector"
    )
    
    # Define the image vector query
    image_vector_query = VectorizedQuery(
        vector=image_embedding,
        k_nearest_neighbors=1,
        fields="imageVector"
    )

    # Perform the search with both vector queries
    results = search_client.search(
        search_text=None,
        vector_queries=[text_vector_query, image_vector_query],
        top=2
    )
    
    # Display the results
    display_results(results)

# Example usage
print("Text and Image Multi-Vector Query Results:")
text_and_image_query_multi_vector(query_text, image_url)

Text and Image Multi-Vector Query Results:
Caption: A man wearing a beanie and shirt works on a laptop, representing modern remote work and technology usage
Score: 0.01666666753590107
URL: https://images.unsplash.com/photo-1755541516517-bb95790dc7ad


--------------------------------------------------
Caption: A man photographs two girls sitting in an open car trunk, capturing a casual outdoor photography session with friends
Score: 0.01666666753590107
URL: https://images.unsplash.com/photo-1756142007155-c8b4eb0c3808


--------------------------------------------------


## Summary

This notebook demonstrates the complete workflow for using Cohere Embed-4 with Azure AI Search for multimodal vector search applications. Key capabilities showcased include:

1. **Direct REST API Integration**: Call the Cohere Embed-4 endpoint directly over REST rather than through an SDK
2. **Multimodal Embeddings**: Generate embeddings for text and image content with the Embed-4 model
3. **Advanced Vector Search**: Implement various search patterns including cross-field and multi-vector queries
4. **Visual Results**: Display images inline for immediate visual feedback

In this notebook, Embed-4 produces 1536-dimensional embeddings for both text and images in a shared vector space, which is what allows image search and cross-modal retrieval against a single index.